I am trying to add a new column to an existing data frame using the withColumn
statement in Spark Dataframe API
. The below code works but I was wondering if there's a way that I can select more than one group. Let's say Group 1, 2, 3, 4 instead of only Group 1. I think I may be able to write when
statement four times. I have seen people do that in some posts. However, in R
, there is a %in%
operator that can specify if a variable contains values in a vector, but I don't know if there's such thing in Spark. I checked on the Spark API documentation but most of the functions don't contain any examples.
# R Sample Code:
library(dplyr)
df1 <- df %>% mutate( Selected_Group = (Group %in% 1:4))
Spark Dataframe Sample Code That Selects Group 1:
val df1 = df.withColumn("Selected_Group", when($"Group" === 1, 1).otherwise(0))
Data
ID, Group
1, 0
2, 1
3, 2
. .
. .
100, 99