I have a dataset where each row contains a string of text like this:
1)list(text = \"incredible hosts\", relevance = 0.87518, count = 1), list(text = \"Japan\", relevance = 0.675236, count = 1), list(text = \"support\", relevance = 0.625663, count = 1), list(text = \"result\", relevance = 0.359757, count = 1)
2)list(text = \"British fleet\", relevance = 0.912888, count = 1), list(text = \"worst maritime disasters\", relevance = 0.904047, count = 1), list(text = \"British history\", relevance = 0.755491, count = 1), list(text = \"Scilly Isles\", relevance = 0.716508, count = 1), list(text = \"sailors\", relevance = 0.691141, count = 1), list(text = \"evening\", relevance = 0.597375, count = 1), list(text = \"Tragedy\", relevance = 0.577141, count = 1), list(text = \"prize\", relevance = 0.565035, count = 1), list(text = \"rocks\", relevance = 0.543257, count = 1), list(text = \"innovation\", relevance = 0.529463, count = 1), list(text = \"longitude\", relevance = 0.335207, count = 1)
Basically, I would like to extract just the strings of text contained between \" and \", and obtain something like this:
1) "incredible hosts, Japan, support, result"
2) "British fleet, worst maritime disasters, British history, Scilly Isles, sailors, evening, etc..."
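One possible way to get these comma-separated strings is a regular expression that captures whatever follows `text = ` between the quotes. The sketch below uses Python's standard `re` module purely as an illustration; the names `TEXT_PATTERN` and `extract_texts` are made up for this example, and the pattern assumes the pieces of text never contain a quote or a backslash themselves. The same pattern idea should carry over to other regex engines.

```python
import re

# Matches text = "..." and also the backslash-escaped form text = \"...\"
# Assumption: the captured text itself contains no quote or backslash.
TEXT_PATTERN = re.compile(r'text = \\?"([^"\\]*)\\?"')

def extract_texts(row):
    """Return the quoted text pieces of one row, in order of appearance."""
    return TEXT_PATTERN.findall(row)

# Shortened sample row copied from the data above (escaped quotes kept).
row1 = r'list(text = \"incredible hosts\", relevance = 0.87518, count = 1), list(text = \"Japan\", relevance = 0.675236, count = 1)'

joined = ", ".join(extract_texts(row1))
print(joined)  # incredible hosts, Japan
```

The optional `\\?` on each side of the quote is there only because it is unclear whether the backslashes are really part of the stored strings or just an artifact of how they were printed.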
Moreover, I would like to create a data frame that helps me keep track of the relevance score attached to each piece of text (considering that different rows might have a different number of pieces of text), so as to get something like this:
col1              col2            col3     col4    col5  col6  ...  colA1    colA2     ...
incredible hosts  Japan           support  result  NA    NA    ...  0.87518  0.675236  ...
British fleet     worst marit...
Basically, the number of text columns equals the maximum number of pieces of text in any row, and the same goes for the columns holding the scores (each relevance score refers to one piece of text, so there are equally many of them).
If I can find a way to first extract the pieces of text and separate them by commas, and then do the same with the relevance scores, I think I can easily merge the two into a data frame. So the problem is mainly extracting these two things from that text.
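To make the plan above concrete, here is a minimal sketch of the whole pipeline in plain Python (standard library only): capture each text together with its relevance score, then pad every row to the length of the longest one so that all rows have the same number of text and score columns. `ENTRY_PATTERN`, `parse_row` and `to_wide` are hypothetical names chosen for this example, and the pattern assumes every entry has the exact `text = ..., relevance = ...` layout shown in the sample rows.

```python
import re

# One match per list(...) entry; \\? tolerates backslash-escaped quotes.
# Assumption: "relevance" always follows "text" directly, as in the samples.
ENTRY_PATTERN = re.compile(r'text = \\?"([^"\\]*)\\?", relevance = ([0-9.]+)')

def parse_row(row):
    """Return (texts, scores) for one row, keeping each score next to its text."""
    pairs = ENTRY_PATTERN.findall(row)
    texts = [text for text, _ in pairs]
    scores = [float(score) for _, score in pairs]
    return texts, scores

def to_wide(rows):
    """Pad every row to the longest one: text columns first, then score columns."""
    parsed = [parse_row(row) for row in rows]
    width = max((len(texts) for texts, _ in parsed), default=0)
    wide = []
    for texts, scores in parsed:
        padding = [None] * (width - len(texts))  # missing cells
        wide.append(texts + padding + scores + padding)
    return wide

# Shortened sample rows copied from the data above (escaped quotes kept).
rows = [
    r'list(text = \"incredible hosts\", relevance = 0.87518, count = 1), list(text = \"Japan\", relevance = 0.675236, count = 1)',
    r'list(text = \"British fleet\", relevance = 0.912888, count = 1), list(text = \"worst maritime disasters\", relevance = 0.904047, count = 1), list(text = \"British history\", relevance = 0.755491, count = 1)',
]
table = to_wide(rows)
for line in table:
    print(line)
```

Each row of `table` is a plain list with `None` marking the missing cells, so it could be handed to whatever data-frame constructor is in use. Capturing text and score in one match, rather than in two separate passes, keeps each score aligned with its piece of text.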
Thank you in advance for your help,
Carlo