I have a dataframe with 41,000 rows of Flickr tags that contain non-English words. Example:
column1    column2                                   column3
amsterdam  het dag calamiteit bij doen gratis dag    2015
rotterdam  blijdorp groet gratis burp het ik ben     2016
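To make the example reproducible, here is a small sketch of the same two rows as a pandas DataFrame. It assumes that column1 holds the city, column2 the tags, and column3 the year; the real column names may differ.

```python
import pandas as pd

# Two sample rows from the example; the column split is an assumption.
df = pd.DataFrame(
    {
        "column1": ["amsterdam", "rotterdam"],
        "column2": [
            "het dag calamiteit bij doen gratis dag",
            "blijdorp groet gratis burp het ik ben",
        ],
        "column3": [2015, 2016],
    }
)

# Total number of characters that would be sent to the API for column2.
total_chars = int(df["column2"].str.len().sum())
```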
I want to translate all the non-English words in column2 to English using the Google Translate API. I tried, but I hit the API's request limit because the data is so large (41,000 rows).
Luckily, someone gave me an R script that translates this much text while staying within the Google Translate API's request limit. I tried to convert the R script to Python as best I could, but I failed.
R script:
library(googleLanguageR)
library(tidyverse)
## create a tibble in the required format
tibble <- tibble
translate <- function(tibble) {
  tibble <- tibble
  count <- data.frame(nchar = 0, cumsum = 0) # create count file to stay within API limits
  for (i in 1:nrow(tibble)) {
    des <- pull(tibble[i,2]) # extract description as single character string
    if (count$cumsum[nrow(count)] >= 80000) { # API limit check
      print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
      Sys.sleep(100)
      count <- count[1,] # reset count file
    }
    if (grepl("^\\s*$", des) == TRUE) { # if description is only whitespace then skip
      trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
    } else { # else request translation from API
      trns <- gl_translate(des, target='en', format='html') # request in html format to anticipate html descriptions
    }
    tibble[i,3:4] <- trns[,1:2] # add to tibble
    nchar = nchar(pull(tibble[i,2])) # count number of characters
    req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
    count <- rbind(count, req) # add to count file
    if (nchar > 20000) { # additional API request limit safeguard for large descriptions
      print("large description (>20,000), pausing to manage API limit")
      Sys.sleep(100)
      count <- count[1,] # reset count file
    }
  }
  return(tibble)
}
This is as far as I got converting the R script to Python:
def translate(text):
    tibble = []
    tibble = pd.DataFrame(tibble)
    tibble = testDataset
    count = []
    count = pd.DataFrame(count, columns=['nchar', 'cumsum'])
    count.loc[0] = 'asd'
    des = []
    des = pd.DataFrame(des)
    grepl = []
    trns = []
    trns = pd.DataFrame(trns)
    nchar = []
    nchar = pd.DataFrame(nchar)
    for i in tibble:
        des = tibble['keywords'].str.split(expand=True).stack()
        if len(count['cumsum']) >= 80000:
            print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
            sleep(100)
            count = count[0:]
I am confused, especially by grepl, gl_translate, pull(tibble), and rbind in the R script.
How do I translate them into Python?
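For reference, here is a minimal sketch of how those R constructs might map to Python. It is an outline under stated assumptions, not a tested port against the real API. The names `translate_column`, `translate_fn`, and `sleep_fn` are hypothetical. `translate_fn` stands in for `gl_translate` and is assumed to return a dict with the keys `translatedText` and `detectedSourceLanguage`. I believe `translate_v2.Client().translate(...)` from the google-cloud-translate package returns such a dict, but that wiring is not exercised here. The R `count` data frame and its `rbind` calls collapse into a single running integer, `pull(tibble[i,2])` becomes iteration over the column, and `grepl("^\\s*$", des)` becomes `re.fullmatch`.

```python
import re
import time

import pandas as pd

CHAR_BUDGET = 80_000   # pause threshold used by the R script
LARGE_TEXT = 20_000    # single-description safeguard used by the R script
PAUSE_SECONDS = 100


def translate_column(df, column, translate_fn, sleep_fn=time.sleep):
    """Translate df[column] row by row, pausing to stay under the quota.

    translate_fn is a hypothetical callable: str -> dict with the keys
    'translatedText' and 'detectedSourceLanguage'.
    """
    out = df.copy()
    translated, detected = [], []
    chars_sent = 0  # replaces the R `count` data frame and rbind()

    # Iterating over the column replaces pull(tibble[i, 2]).
    # Missing values are treated as empty strings (an assumption).
    for text in out[column].fillna("").astype(str):
        if chars_sent >= CHAR_BUDGET:  # API limit check
            sleep_fn(PAUSE_SECONDS)
            chars_sent = 0

        if re.fullmatch(r"\s*", text):  # replaces grepl("^\\s*$", des)
            result = {"translatedText": "", "detectedSourceLanguage": ""}
        else:
            result = translate_fn(text)  # replaces gl_translate(...)

        translated.append(result["translatedText"])
        detected.append(result["detectedSourceLanguage"])
        chars_sent += len(text)  # replaces nchar(...) plus the cumsum row

        if len(text) > LARGE_TEXT:  # extra safeguard for large texts
            sleep_fn(PAUSE_SECONDS)
            chars_sent = 0

    out["translatedText"] = translated
    out["detectedSourceLanguage"] = detected
    return out


# Assumed wiring for the real API (not run here):
#   from google.cloud import translate_v2
#   client = translate_v2.Client()
#   translate_fn = lambda t: client.translate(
#       t, target_language="en", format_="html")
```

Passing the translator and the sleep function in as arguments keeps the throttling logic testable without network access or real 100-second pauses. In real use, `sleep_fn` would stay at its default of `time.sleep`.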