Channel: Active questions tagged r - Stack Overflow

I'm a complete beginner! How do I convert a .txt file (film script) to a table (characters and lines) in R or Python?


I'm a complete beginner, and for a college project I need to analyse film scripts. I want to create a table that matches each character to their lines. My files are all in .txt format, and I'd like to convert each one to a CSV file. I have a lot of scripts to go through, so I'd like to find code that can be easily adapted to the different files.

This is what I have:

                            THREEPIO
      Did you hear that?  They've shut 
      down the main reactor.  We'll be 
      destroyed for sure.  This is 
      madness!


                THREEPIO
      We're doomed!


                THREEPIO
      There'll be no escape for the 
      Princess this time.

                THREEPIO
      What's that?

And this is what I need to have:

"character" "dialogue"
"1" "THREEPIO" "Did you hear that? They've shut down the main reactor. We'll be destroyed for sure. This is madness!"
"2" "THREEPIO" "We're doomed!"
"3" "THREEPIO" "There'll be no escape for the Princess this time."
"4" "THREEPIO" "What's that?"

This is what I've tried:

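# read the script into a character vector, one element per line
# (the input file name is a placeholder, it is not given in the post)
sw = readLines("EpisodeVI_script.txt")
nlines = length(sw)

# blank prefixes used to recognise indentation (assumed from the
# names b30 and b15: 30 and 15 spaces respectively)
b30 = strrep(" ", 30)
b15 = strrep(" ", 15)

# the first 70 lines don't contain dialogues
# so we can start reading at line 70 (for instance)
i = 70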

# while loop to extract character and dialogues
# (probably there's a better way to parse the file instead of
# using my crazy nested if-then-elses, but this works for me)
while (i <= nlines)
{
  # if empty line, move on; otherwise it is a text line
  # (using else avoids testing sw[i] again after i has moved past the last line)
  if (sw[i] == "") {
    i = i + 1  # next line
  } else {
    # if uninteresting stuff
    if (substr(sw[i], 1, 1) != " ") {
      i = i + 1   # next line
    } else {
      if (nchar(sw[i]) < 10) {
        i = i + 1  # next line
      } else {
        if (substr(sw[i], 1, 5) != " " && substr(sw[i], 6, 6) != " ") {
          i = i + 1  # next line
        } else {
          # if character name
          if (substr(sw[i], 1, 30) == b30) 
          {
            if (substr(sw[i], 31, 31) != "")
            {
              tmp_name = substr(sw[i], 31, nchar(sw[i], "bytes"))
              cat("\n", file="EpisodeVI_dialogues.txt", append=TRUE)
              cat(tmp_name, "", file="EpisodeVI_dialogues.txt", sep="\t", append=TRUE)
              i = i + 1        
            } else {
              i = i + 1
            }
          } else {
            # if dialogue
            if (substr(sw[i], 1, 15) == b15)
            {
              if (substr(sw[i], 16, 16) != "")
              {
                tmp_diag = substr(sw[i], 16, nchar(sw[i], "bytes"))
                cat("", tmp_diag, file="EpisodeVI_dialogues.txt", append=TRUE)
                i = i + 1
              } else {
                i = i + 1
              }
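            } else {
              # neither a name nor a dialogue line: skip it,
              # otherwise i never advances and the loop never ends
              i = i + 1
            }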
          }
        }
      }
    }    
  }
}

Any help would be much appreciated! Thank you!!
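
For reference, here is a minimal Python sketch of the conversion described above. It is not a tested solution for every script, and it rests on some assumptions taken from the sample: character names are upper-case lines indented by at least 10 spaces, dialogue lines are indented by fewer spaces, and indentation uses spaces rather than tabs. The names `parse_script`, `write_csv` and `MIN_NAME_INDENT` are made up for this illustration.

```python
import csv
import io
import re

# Assumption: character cues are indented by at least this many spaces
# and dialogue lines by fewer. Adjust the threshold for each script.
MIN_NAME_INDENT = 10


def parse_script(text, min_name_indent=MIN_NAME_INDENT):
    """Return a list of (character, dialogue) tuples from an indented script."""
    rows = []
    name, parts = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        indent = len(line) - len(line.lstrip(" "))
        if indent >= min_name_indent and stripped.isupper():
            # A new character cue closes the previous speech, if any.
            if name and parts:
                rows.append((name, " ".join(parts)))
            name, parts = stripped, []
        elif indent > 0 and name:
            # Dialogue line: collapse runs of whitespace into one space.
            parts.append(re.sub(r"\s+", " ", stripped))
        else:
            # Unindented text (scene headings, action) ends the speech.
            if name and parts:
                rows.append((name, " ".join(parts)))
            name, parts = None, []
    if name and parts:
        rows.append((name, " ".join(parts)))
    return rows


def write_csv(rows, out):
    """Write the rows to an open text file object as fully quoted CSV."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(["character", "dialogue"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = (
        "                THREEPIO\n"
        "      We're doomed!\n"
        "\n"
        "                THREEPIO\n"
        "      What's that?\n"
    )
    buffer = io.StringIO()
    write_csv(parse_script(sample), buffer)
    print(buffer.getvalue(), end="")
```

To process many scripts, the same two functions could be called in a loop over the .txt files, changing only the indentation threshold where a script is laid out differently.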
