Channel: Active questions tagged r - Stack Overflow

How to distribute data for each row in R?


I have the following data set:

library(tibble)

df <- tribble(
  ~id,  ~price, ~type, ~number_of_book,
  "1",    10,     "X",        3,
  "1",     2,     "X",        1,
  "1",     5,     "Y",        1,
  "2",     7,     "X",        4,
  "2",     6,     "X",        1,
  "2",     6,     "Y",        2,
  "3",     2,     "X",        4,
  "3",     8,     "X",        2,
  "3",     1,     "Y",        4,
  "3",     9,     "Y",        5
)

Now I want to answer this question: for each id and each price group, what percentage of the books is type X and what percentage is type Y? In other words, what is the distribution of book types within each id and price group?
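A first step could be to tag each row with its price bin. The sketch below assumes the bin edges implied by the tables in this question (2 dollars or less, more than 2 up to 5, more than 5 up to 6, more than 6) and uses base R `cut()` inside `dplyr::mutate()`. The labels `two_or_less`, `two_to_five`, `five_to_six` and `more_than_six` and the column name `price_bin` are my own hypothetical stand-ins.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~id, ~price, ~type, ~number_of_book,
  "1",  10,    "X",   3,
  "1",   2,    "X",   1,
  "1",   5,    "Y",   1,
  "2",   7,    "X",   4,
  "2",   6,    "X",   1,
  "2",   6,    "Y",   2,
  "3",   2,    "X",   4,
  "3",   8,    "X",   2,
  "3",   1,    "Y",   4,
  "3",   9,    "Y",   5
)

# Assumed bin edges. cut() closes intervals on the right by default
# (right = TRUE), giving (-Inf, 2], (2, 5], (5, 6] and (6, Inf].
price_breaks <- c(-Inf, 2, 5, 6, Inf)
# Hypothetical labels standing in for the question's column names.
price_labels <- c("two_or_less", "two_to_five", "five_to_six", "more_than_six")

df_binned <- df %>%
  mutate(price_bin = cut(price, breaks = price_breaks, labels = price_labels))

print(df_binned)
```

Because the intervals are closed on the right, a price of exactly 2 lands in the first bin and a price of exactly 6 in the third, which is how the rows are counted in the tables below.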

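To do this, I think I first need an intermediate data set like the one below, which sums number_of_book for each type, id and price bin. The price bins are: 2 dollars or less, more than 2 up to 5, more than 5 up to 6, and more than 6.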

agg_df <- tribble(
  ~type, ~id, ~two_or_less, ~two_to_five, ~five_to_six, ~more_than_six,
  "X",   "1",      1,            0,            0,             3,
  "Y",   "1",      0,            1,            0,             0,
  "X",   "2",      0,            0,            1,             4,
  "Y",   "2",      0,            0,            2,             0,
  "X",   "3",      4,            0,            0,             2,
  "Y",   "3",      4,            0,            0,             5
)
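A possible dplyr/tidyr sketch of this intermediate table, assuming the same bin edges as above: sum `number_of_book` per type, id and bin, then spread the bins into columns with `tidyr::pivot_wider()`, where `values_fill = 0` writes 0 for bins that have no books. The bin labels are my own syntactic stand-ins (hyphenated names such as `two-five` would need backticks).

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tribble(
  ~id, ~price, ~type, ~number_of_book,
  "1",  10,    "X",   3,
  "1",   2,    "X",   1,
  "1",   5,    "Y",   1,
  "2",   7,    "X",   4,
  "2",   6,    "X",   1,
  "2",   6,    "Y",   2,
  "3",   2,    "X",   4,
  "3",   8,    "X",   2,
  "3",   1,    "Y",   4,
  "3",   9,    "Y",   5
)

# Hypothetical bin labels; edges assumed to be (-Inf, 2], (2, 5], (5, 6], (6, Inf].
price_labels <- c("two_or_less", "two_to_five", "five_to_six", "more_than_six")

agg_df <- df %>%
  mutate(price_bin = cut(price, breaks = c(-Inf, 2, 5, 6, Inf),
                         labels = price_labels)) %>%
  # Total number of books per type, id and price bin (long format).
  group_by(type, id, price_bin) %>%
  summarise(n_books = sum(number_of_book), .groups = "drop") %>%
  # One column per bin; combinations with no books become 0 instead of NA.
  pivot_wider(names_from = price_bin, values_from = n_books, values_fill = 0) %>%
  # Force the bin columns into price order and sort rows like the table above.
  select(type, id, all_of(price_labels)) %>%
  arrange(id, type)

print(agg_df)
```

Computed this way from df, the type Y row for id "2" has 0 in the more-than-six bin, because df contains no type-Y book priced above six dollars for that id.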

Then, from that table, this is my desired data set, where each cell is the percentage of books of that type within its id and price bin:

desired_df <- tribble(
  ~type, ~id, ~two_or_less, ~two_to_five, ~five_to_six, ~more_than_six,
  "X",   "1",    "100%",        "0%",         "0%",         "100%",
  "Y",   "1",      "0%",      "100%",         "0%",           "0%",
  "X",   "2",      "0%",        "0%",      "33.3%",         "100%",
  "Y",   "2",      "0%",        "0%",      "66.7%",           "0%",
  "X",   "3",     "50%",        "0%",         "0%",        "28.6%",
  "Y",   "3",     "50%",        "0%",         "0%",        "71.4%"
)
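One way this could be computed, as a sketch under the same assumed bin edges: keep the data in long format, group by id and price bin, and divide each type's book count by the bin total before pivoting. Types absent from a bin are filled with 0, which matches the 0% cells above (including bins where neither type has books). The object name `shares` and the one-decimal `sprintf()` formatting are my own choices.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tribble(
  ~id, ~price, ~type, ~number_of_book,
  "1",  10,    "X",   3,
  "1",   2,    "X",   1,
  "1",   5,    "Y",   1,
  "2",   7,    "X",   4,
  "2",   6,    "X",   1,
  "2",   6,    "Y",   2,
  "3",   2,    "X",   4,
  "3",   8,    "X",   2,
  "3",   1,    "Y",   4,
  "3",   9,    "Y",   5
)

# Hypothetical bin labels; edges assumed to be (-Inf, 2], (2, 5], (5, 6], (6, Inf].
price_labels <- c("two_or_less", "two_to_five", "five_to_six", "more_than_six")

shares <- df %>%
  mutate(price_bin = cut(price, breaks = c(-Inf, 2, 5, 6, Inf),
                         labels = price_labels)) %>%
  group_by(type, id, price_bin) %>%
  summarise(n_books = sum(number_of_book), .groups = "drop") %>%
  # Within each id and price bin, divide each type's count by the bin total.
  group_by(id, price_bin) %>%
  mutate(share = n_books / sum(n_books)) %>%
  ungroup() %>%
  select(-n_books) %>%
  # A type with no books in a bin has no row, so pivot_wider fills it with 0.
  pivot_wider(names_from = price_bin, values_from = share, values_fill = 0) %>%
  select(type, id, all_of(price_labels)) %>%
  arrange(id, type)

# Optional: turn the numeric shares into percentage strings with one decimal.
desired_df <- shares %>%
  mutate(across(all_of(price_labels), ~ sprintf("%.1f%%", 100 * .x)))

print(desired_df)
```

Keeping `shares` numeric is probably more useful for further analysis; the string version only mirrors the layout of the desired table. For id "2", the more-than-six bin comes out as X 100% and Y 0%, since all four books in that bin are type X.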

For example, when id is "3" and the price is more than six dollars, there are two books of type X and five books of type Y, so the distribution is X (28.6%) and Y (71.4%).
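The arithmetic behind that example, as a quick base-R check: each type's count is divided by the bin total of 2 + 5 = 7 books. The vector name `n_books` is only illustrative.

```r
# Book counts for id "3" in the more-than-six-dollar bin (prices 8 and 9).
n_books <- c(X = 2, Y = 5)
pct <- 100 * n_books / sum(n_books)
round(pct, 1)  # X 28.6, Y 71.4
```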

Note: I asked a similar question before (How to manipulate (aggregate) the data in R?), but this manipulation is more complex and I could not manage it.

I would appreciate it if you could help me. Thanks in advance.

