Channel: Active questions tagged r - Stack Overflow

R group_by and summarize are not working as they should, no clue why


This should REALLY work, but it doesn't, and I'm losing my mind!

This is my data:

> head(dataset_2,n=5)
  CUSTOMER_NUMBER OLD_NEW_CLIENT COMPLETION_PRCT CRASH_RISK
1       535961675     Old client            0.06         25
2       223186690     Old client            0.04         24
3       217140964     Old client            0.05         32
4       514559839     Old client            0.10         52
5        10991413     Old client            0.53         15

> str(dataset_2)
'data.frame':   90405 obs. of  4 variables:
 $ CUSTOMER_NUMBER: int  535961675 223186690 217140964 514559839 10991413 506839750 15102896 34980927 578647941 804552857 ...
 $ OLD_NEW_CLIENT : chr  "Old client" "Old client" "Old client" "Old client" ...
 $ COMPLETION_PRCT: num  0.06 0.04 0.05 0.1 0.53 0.05 0.06 0.06 1 0.09 ...
 $ CRASH_RISK     : num  25 24 32 52 15 38 42 42 41 78 ...
 - attr(*, ".internal.selfref")=<externalptr> 

I want to count clients for each combination of the other columns (OLD_NEW_CLIENT, COMPLETION_PRCT and CRASH_RISK), i.e. one row per combination with the number of distinct clients falling into that bucket. But when I run this code:

by_parameters <- dataset_2 %>%
  group_by(OLD_NEW_CLIENT, COMPLETION_PRCT, CRASH_RISK) %>%
  summarize(clients = n_distinct(CUSTOMER_NUMBER))

I get a single row instead of one row per combination:

> by_parameters
  clients
1   90399
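
A single ungrouped row like this is a common symptom of `summarize` being masked by another package, for example when plyr is loaded after dplyr. The post does not confirm that this is the cause here, so treat the following as a minimal sketch under that assumption. It calls the dplyr functions with an explicit `dplyr::` prefix on a small made-up data frame (`toy` and its values are hypothetical stand-ins for `dataset_2`):

```r
library(dplyr)

# Toy data standing in for dataset_2; the values are invented for illustration.
toy <- data.frame(
  CUSTOMER_NUMBER = c(1L, 2L, 3L, 4L, 4L),
  OLD_NEW_CLIENT  = c("Old client", "Old client", "Old client",
                      "New client", "New client"),
  COMPLETION_PRCT = c(0.06, 0.06, 0.06, 0.10, 0.10),
  CRASH_RISK      = c(25, 25, 25, 52, 52),
  stringsAsFactors = FALSE
)

# Namespace-qualified calls guarantee the dplyr versions are used,
# even if another attached package defines summarize()/summarise().
by_parameters <- toy %>%
  dplyr::group_by(OLD_NEW_CLIENT, COMPLETION_PRCT, CRASH_RISK) %>%
  dplyr::summarise(clients = dplyr::n_distinct(CUSTOMER_NUMBER)) %>%
  dplyr::ungroup()

# One row per combination of the three grouping columns.
print(by_parameters)
```

If the qualified version returns one row per combination on the real data while the unqualified one does not, running `conflicts()` or checking the order of the `library()` calls should show which package is masking `summarize`.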

Thanks for any help!

