I am trying to build an association rules algorithm using sparklyr and have been following this blog, which is really well explained.
However, in the section just after the FPGrowth algorithm is fitted, the author extracts the rules from the returned "FPGrowthModel object", and I am not able to reproduce that step to extract my own rules.
The section where I am struggling is this piece of code:
rules = FPGmodel %>% invoke("associationRules")
Could someone please explain where FPGmodel comes from?
My code is shown below. I do not see an FPGmodel object in it that I can extract my rules from. Any help would be greatly appreciated.
# CACHE HIVE TABLE INTO SPARK
tbl_cache(sc, 'claims', force = TRUE)
med_tbl <- tbl(sc, 'claims')
# SELECT VARIABLES OF INTEREST
med_tbl <- med_tbl %>% select(proc_desc, alt_claim_id)
# REMOVE DUPLICATED ROWS
med_tbl <- dplyr::distinct(med_tbl)
med_tbl <- med_tbl %>% group_by(alt_claim_id)
# AGGREGATING CLAIMS BY CLAIM ID
med_agg <- med_tbl %>%
  group_by(alt_claim_id) %>%
  summarise(procedures = collect_list(proc_desc))
# CREATE UNIQUE STRING TO IDENTIFY THE MACHINE LEARNING ESTIMATOR
uid = sparklyr:::random_string("fpgrowth_")
# INVOKE THE FPGrowth JAVA CLASS
jobj = invoke_new(sc, "org.apache.spark.ml.fpm.FPGrowth", uid)
jobj %>%
  invoke("setItemsCol", "procedures") %>%
  invoke("setMinConfidence", 0.03) %>%
  invoke("setMinSupport", 0.01) %>%
  invoke("fit", spark_dataframe(med_agg))
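One possible reading, stated as an assumption rather than confirmed from the blog: FPGmodel is simply the value returned by the final invoke("fit", ...) call, which the pipeline above computes but never assigns to a name. A minimal sketch under that assumption, reusing jobj and med_agg from the code above and requiring a live Spark connection:

```r
library(sparklyr)
library(dplyr)

# Assumption: FPGmodel is the fitted FPGrowthModel returned by "fit".
# jobj and med_agg are the objects created earlier in the question.
FPGmodel <- jobj %>%
  invoke("setItemsCol", "procedures") %>%
  invoke("setMinConfidence", 0.03) %>%
  invoke("setMinSupport", 0.01) %>%
  invoke("fit", spark_dataframe(med_agg))

# Extract the association rules as a reference to a Spark DataFrame
rules <- FPGmodel %>% invoke("associationRules")

# Register the result so it can be used as a dplyr table;
# the table name "rules" is an arbitrary choice for illustration
rules_tbl <- sdf_register(rules, "rules")
```

If this reading is right, the only difference from my code is the assignment on the first line of the pipeline.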