Channel: Active questions tagged r - Stack Overflow

How to count the most frequent names in a column (the names are separated by commas) in R [closed]


I am trying to find the most frequent country of production in this Netflix dataset, but freq_terms() from the qdap package only gives me individual words:

    freq_terms(netflix$country)
       WORD      FREQ
    1  united    3001
    2  states    2421
    3  india      753
    4  kingdom    559
    5  canada     300
    6  france     255
    7  japan      216
    8  south      181
    9  spain      173
    10 korea      155
    11 germany    139
    12 mexico     122
    ...
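The split into "united" and "states" suggests that freq_terms() tokenizes its input into single words, so multi-word country names are broken apart before they are counted. The base R sketch below uses only strsplit(), not anything from qdap, and contrasts splitting on whitespace with splitting on commas. It is run on two country strings copied from the sample rows.

```r
# Two country strings copied from the sample rows in this question
country <- c("United States, Spain, Colombia, Mexico", "Canada, United Kingdom")

# Splitting on whitespace (and commas) breaks multi-word names apart,
# similar to the word-level tokens in the freq_terms() output
by_space <- unlist(strsplit(country, "[[:space:],]+"))

# Splitting on a comma followed by optional whitespace keeps each name intact
by_comma <- unlist(strsplit(country, ",[[:space:]]*"))

print(by_space)  # "United" "States" ... "United" "Kingdom"
print(by_comma)  # "United States" ... "United Kingdom"
```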

Here are the first 10 rows of my data:

    dput(droplevels(netflix[1:10, ]))
    structure(list(show_id = c(81193313L, 81197050L, 81213894L, 81082007L, 
    80213643L, 81172754L, 81120982L, 81227195L, 70205672L, 81172841L
    ), title = structure(c(3L, 5L, 10L, 1L, 2L, 4L, 6L, 7L, 8L, 9L
    ), .Label = c("Atlantics", "Chip and Potato", "Chocolate", "Crazy people", 
    "Guatemala: Heart of the Mayan World", "I Lost My Body", "Kalushi: The Story of Solomon Mahlangu", 
    "La Reina del Sur", "Lagos Real Fake Life", "The Zoya Factor"
    ), class = "factor"), director = structure(c(1L, 4L, 2L, 6L, 
    1L, 8L, 3L, 5L, 1L, 7L), .Label = c("", "Abhishek Sharma", "Jérémy Clapin", 
    "Luis Ara, Ignacio Jaunsolo", "Mandla Dube", "Mati Diop", "Mike Ezuruonye", 
    "Moses Inwang"), class = "factor"), cast = structure(c(3L, 2L, 
    9L, 6L, 1L, 8L, 4L, 10L, 5L, 7L), .Label = c("Abigail Oliver, Andrea Libman, Briana Buckmaster, Brian Dobson, Chance Hurstfield, Dominic Good, Emma Jayne Maas, Evan Byarushengo, Scotia Anderson, Alessandro Juliani", 
    "Christian Morales", "Ha Ji-won, Yoon Kye-sang, Jang Seung-jo, Kang Bu-ja, Lee Jae-ryong, Min Jin-woong, Kim Won-hae, Yoo Teo", 
    "Hakim Faris, Victoire Du Bois, Patrick d'Assumçao, Dev Patel, Alia Shawkat, George Wendt", 
    "Kate del Castillo, Cristina Urgel, Alberto Jiménez, Juan José Arjona, Humberto Zurita, Dagoberto Gama, Christian Tappán, Miguel de Miguel, Salvador Zerboni, Carmen Navarro, Santiago Meléndez, Juan Carlos Solarte", 
    "Mama Sane, Amadou Mbow, Ibrahima Traore, Nicole Sougou, Amina Kane, Mariama Gassama, Coumba Dieng, Ibrahima Mbaye, Diankou Sembene", 
    "Nonso Diobi, Mike Ezuruonye, Mercy Aigbe, Rex Nosa, Annie Macaulay Idibia, Ik Ogbonna, Nedu Wazobia, Uzee Usman, Odunlade Adekola, Mr Jollof, Efe Irele, Josh 2 Funny, Haillie Sumney, Emmanuella, MC Lively", 
    "Ramsey Nouah, Chigul, Sola Sobowale, Ireti Doyle, Ben Touitou, Francis Onwochei, Ememobong Nkana, Emem Inwang, Patrick Onyeke", 
    "Sonam Kapoor, Dulquer Salmaan, Sanjay Kapoor, Sikander Kher, Angad Bedi, Koel Purie, Pooja Bhamrah, Manu Rishi Chadha", 
    "Thabo Rametsi, Thabo Malema, Welile Nzuza, Jafta Mamabolo, Louw Venter, Pearl Thusi"
    ), class = "factor"), country = structure(c(8L, 1L, 5L, 4L, 2L, 
    6L, 3L, 7L, 9L, 1L), .Label = c("", "Canada, United Kingdom", 
    "France", "France, Senegal, Belgium", "India", "Nigeria", "South Africa", 
    "South Korea", "United States, Spain, Colombia, Mexico"), class = "factor"), 
        date_added = structure(c(3L, 3L, 3L, 2L, 1L, 2L, 2L, 2L, 
        1L, 2L), .Label = c("", "29-Nov-19", "30-Nov-19"), class = "factor"), 
        release_year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2018L, 
        2019L, 2016L, 2019L, 2018L), rating = structure(c(1L, 2L, 
        1L, 1L, 4L, 1L, 3L, 3L, 1L, 1L), .Label = c("TV-14", "TV-G", 
        "TV-MA", "TV-Y"), class = "factor"), duration = structure(c(1L, 
        7L, 5L, 2L, 6L, 3L, 8L, 3L, 6L, 4L), .Label = c("1 Season", 
        "106 min", "107 min", "118 min", "135 min", "2 Seasons", 
        "67 min", "81 min"), class = "factor"), listed_in = structure(c(8L, 
        5L, 1L, 6L, 9L, 3L, 6L, 7L, 4L, 2L), .Label = c("Comedies, Dramas, International Movies", 
        "Comedies, International Movies", "Comedies, International Movies, Thrillers", 
        "Crime TV Shows, International TV Shows, Spanish-Language TV Shows", 
        "Documentaries, International Movies", "Dramas, Independent Movies, International Movies", 
        "Dramas, International Movies", "International TV Shows, Korean TV Shows, Romantic TV Shows", 
        "Kids' TV"), class = "factor"), description = structure(c(3L, 
        4L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("A goofy copywriter unwittingly convinces the Indian cricket team that she's their lucky mascot, to the dismay of their superstition-shunning captain.", 
        "Arranged to marry a rich man, young Ada is crushed when her true love goes missing at sea during a migration attempt – until a miracle reunites them.", 
        "Brought together by meaningful meals in the past and present, a doctor and a chef are reacquainted when they begin working at a hospice ward.", 
        "From Sierra de las Minas to Esquipulas, explore Guatemala's cultural and geological wealth, including ancient Mayan cities and other natural wonders.", 
        "Lovable pug Chip starts kindergarten, makes new friends and tries new things – with a little help from Potato, her secret mouse pal.", 
        "Nollywood star Ramsey Nouah learns that someone is impersonating him and breaks out of a mental institution to expose the imposter.", 
        "Romance, mystery and adventure intertwine as a young man falls in love and a severed hand scours Paris for its owner in this mesmerizing animated film.", 
        "The life and times of iconic South African liberation fighter Solomon Mahlangu, who battled the forces of apartheid, come into focus.", 
        "This compelling show tells the story of the legendary Teresa Mendoza, a courageous woman who is perceived as conquering the world of drug trafficking.", 
        "Two mooching friends vie for the attention of wealthy, beautiful women only to discover that their lavish lifestyles are bogus."
        ), class = "factor"), type = structure(c(2L, 1L, 1L, 1L, 
        2L, 1L, 1L, 1L, 2L, 1L), .Label = c("Movie", "TV Show"), class = "factor")), row.names = c(NA, 
    10L), class = "data.frame")

I want "United States" to be counted as one name, not as "united" and "states" separately. The country names in my country column are separated by commas, so how do I specify that separator in freq_terms() from qdap? The director column has the same problem: each director name has at least two words, multiple directors in one cell are separated by commas, and I need to find the most prolific directors. Please let me know if I should switch from freq_terms() to another approach. Thanks, everyone!
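One possible alternative to freq_terms() is to split each cell on commas yourself and tabulate the pieces with base R. The sketch below assumes a hypothetical helper, count_split(), built only from strsplit(), trimws() and table(). It drops empty cells (several sample rows have country = "" or director = "") and returns counts sorted from most to least frequent. The two vectors are copied from the 10 sample rows shown above.

```r
# count_split() is a hypothetical helper, not part of qdap or base R.
# It counts names in a vector whose cells may hold several
# comma-separated names, e.g. "France, Senegal, Belgium".
count_split <- function(x, sep = ",") {
  x <- as.character(x)                              # factor -> character
  parts <- unlist(strsplit(x, sep, fixed = TRUE))   # split on the separator only
  parts <- trimws(parts)                            # remove spaces around each name
  parts <- parts[!is.na(parts) & nzchar(parts)]     # drop empty and missing values
  sort(table(parts), decreasing = TRUE)             # most frequent first
}

# country and director values from the 10 sample rows in the question
country <- c("South Korea", "", "India", "France, Senegal, Belgium",
             "Canada, United Kingdom", "Nigeria", "France", "South Africa",
             "United States, Spain, Colombia, Mexico", "")
director <- c("", "Luis Ara, Ignacio Jaunsolo", "Abhishek Sharma", "Mati Diop",
              "", "Moses Inwang", "Jérémy Clapin", "Mandla Dube", "",
              "Mike Ezuruonye")

country_counts <- count_split(country)
director_counts <- count_split(director)

print(country_counts)
print(director_counts)

# With the full data set, for example:
# head(count_split(netflix$country), 10)
# head(count_split(netflix$director), 10)
```

Because the split happens on the comma only and whitespace is trimmed afterwards, multi-word names such as "United States" and "Luis Ara" stay intact. Unlike the freq_terms() output, the names are not lowercased.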