Channel: Active questions tagged r - Stack Overflow

Binning data in R with the same output as in spreadsheet


I have this sample data:

library(data.table)

set.seed(25)

xx <- data.table(
  year = 2015,
  values = iris$Sepal.Length,
  score = sample(1:8, nrow(iris), replace = TRUE))

The actual data contains many years and rows. I want to group the values column using the cut() function in base R, but the result differs from the pivot table generated by LibreOffice Calc (and by MS Office Excel). This is what I have done so far:

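# Breaks every 0.5 from 0 up to the rounded-up maximum
brks <- seq(0, ceiling(max(xx$values)), 0.5)
# Assign each value to a bin; cut() uses right-closed intervals (a,b] by default
xx[, bins := cut(values, brks, ordered_result = TRUE)]
# Count rows per bin and year, then reshape back to long format
xx_binned <- dcast(xx, bins ~ year, length, value.var = "values")
xx_binned <- melt(xx_binned, id.vars = "bins", value.name = "value")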

I start the breaks at 0 so that the bins stay consistent when I use different data. The spreadsheet grouping also starts at 0.

The code above gives this result:

     bins variable value
1 (4,4.5]     2015     5
2 (4.5,5]     2015    27
3 (5,5.5]     2015    27
4 (5.5,6]     2015    30
5 (6,6.5]     2015    31
6 (6.5,7]     2015    18
7 (7,7.5]     2015     6
8 (7.5,8]     2015     6
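
One possible source of a mismatch is the interval boundaries: cut() closes intervals on the right by default, so a value of exactly 5 falls into (4.5,5], whereas a spreadsheet group labeled 5-5.5 would presumably include it. A minimal sketch, assuming the spreadsheet uses left-closed bins (this is not confirmed here), that compares the two conventions with right = FALSE:

```r
# Sketch: right-closed versus left-closed bins on the same values (base R only).
values <- iris$Sepal.Length
brks <- seq(0, ceiling(max(values)), 0.5)

right_closed <- cut(values, brks)                 # default: intervals (a,b]
left_closed  <- cut(values, brks, right = FALSE)  # intervals [a,b)

# A value sitting exactly on a break, such as 5.0, lands in a different bin.
on_break <- which(values == 5)[1]
as.character(right_closed[on_break])
as.character(left_closed[on_break])

# Counts per non-empty bin under each convention.
table(droplevels(right_closed))
table(droplevels(left_closed))
```

With data.table the same option can be passed through, for example cut(values, brks, right = FALSE, ordered_result = TRUE).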

This is the result of LibreOffice Calc:

values  2015
4-4.5   15
4.5-5   106
5-5.5   100
5.5-6   142
6-6.5   148
6.5-7   95
7-7.5   25
7.5-8   27
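
The spreadsheet values above add up to 658, while the sample data has only 150 rows, so the pivot table may be aggregating something other than a row count. A minimal sketch, assuming (unconfirmed) that the pivot's data field is a sum of score, which puts both aggregations side by side; data.frame is used instead of data.table only to keep the sketch free of dependencies:

```r
# Sketch: rows per bin versus sum of score per bin (base R only).
set.seed(25)
xx <- data.frame(
  year = 2015,
  values = iris$Sepal.Length,
  score = sample(1:8, nrow(iris), replace = TRUE)
)

brks <- seq(0, ceiling(max(xx$values)), 0.5)
# droplevels() removes the empty bins below the smallest value
xx$bins <- droplevels(cut(xx$values, brks))

n_rows    <- tapply(xx$values, xx$bins, length)  # what length gives in dcast()
score_sum <- tapply(xx$score, xx$bins, sum)      # hypothetical pivot aggregation

comparison <- data.frame(
  bins = names(n_rows),
  n_rows = as.vector(n_rows),
  score_sum = as.vector(score_sum)
)
comparison
```

If the score_sum column is closer to the spreadsheet output than n_rows, the aggregation function rather than the binning is the likely difference.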

How can I make the two results match? I am converting a spreadsheet tool into an R function, and I want its output to be the same as the spreadsheet's.

Thanks.

