Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201894

Working within the Google Translate API request limit


I have a dataframe with 41,000 rows of Flickr tags containing non-English words. Example:

column1     column2                                 column3
amsterdam   het dag calamiteit bij doen gratis dag  2015
rotterdam   blijdorp groet gratis burp het ik ben   2016

I want to translate all the non-English words in column2 into English using the Google Translate API. I tried, but I hit the API's request limit because there are 41,000 rows of text.

Luckily, someone gave me an R script that translates this volume of text while staying within the Google Translate API's request limit. I tried to convert the R script to Python as best I could, but I failed.

R script:

library(googleLanguageR)
library(tidyverse)
## create a tibble in the required format
tibble <- tibble
translate <- function(tibble) {
    count <- data.frame(nchar = 0, cumsum = 0) # create count file to stay within API limits
    for (i in seq_len(nrow(tibble))) {
        des <- pull(tibble[i,2]) # extract description as single character string
        if (count$cumsum[nrow(count)] >= 80000) { # API limit check
            print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
            Sys.sleep(100)
            count <- count[1,] # reset count file
        }
        if (grepl("^\\s*$", des)) { # if description is only whitespace then skip
            trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
        } else { # else request translation from API
            trns <- gl_translate(des, target = 'en', format = 'html') # request in html format to anticipate html descriptions
        }
        tibble[i,3:4] <- trns[,1:2] # add to tibble
        nchar <- nchar(pull(tibble[i,2])) # count number of characters
        req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
        count <- rbind(count, req) # add to count file
        if (nchar > 20000) { # additional API request limit safeguard for large descriptions
            print("large description (>20,000), pausing to manage API limit")
            Sys.sleep(100)
            count <- count[1,] # reset count file
        }
    }
    return(tibble)
}
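The R script's rate limiting comes down to simple bookkeeping: keep a running total of characters sent, pause once that total reaches 80,000, and also pause after any single description longer than 20,000 characters. Below is a minimal Python sketch of just that bookkeeping, with no API calls. The class name CharBudget is hypothetical, the thresholds are copied from the R script (its message refers to a 100,000 characters per 100 seconds quota; your project's actual quota may differ), and the sleep function is injectable so the logic can be checked without waiting.

```python
import time
from typing import Callable


class CharBudget:
    """Hypothetical helper mirroring the R script's `count` data frame.

    Before each request: if the running total has reached `soft_limit`,
    sleep for `pause` seconds and reset. After each request: add the
    characters sent, and pause/reset if that one request was very large.
    """

    def __init__(self, soft_limit: int = 80_000, pause: float = 100,
                 large_request: int = 20_000,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.soft_limit = soft_limit
        self.pause = pause
        self.large_request = large_request
        self.sleep = sleep  # injectable so tests need not actually wait
        self.total = 0      # plays the role of count$cumsum in the R code

    def before_request(self) -> None:
        # R: if (count$cumsum[nrow(count)] >= 80000) { Sys.sleep(100); reset }
        if self.total >= self.soft_limit:
            print(f"nearing quota, pausing for {self.pause} seconds")
            self.sleep(self.pause)
            self.total = 0

    def after_request(self, n_chars: int) -> None:
        # R: cumsum = nchar + sum(count$nchar), then rbind onto count
        self.total += n_chars
        # R: if (nchar > 20000) { Sys.sleep(100); reset }
        if n_chars > self.large_request:
            print(f"large request ({n_chars} chars), pausing for {self.pause} seconds")
            self.sleep(self.pause)
            self.total = 0
```

In a loop this would be used as budget.before_request(), then the API call, then budget.after_request(len(text)). A single integer replaces the growing count data frame because the R code only ever reads its last cumsum value.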

This is as far as I got converting the R script to Python:

import pandas as pd
from time import sleep

def translate(tibble):
    # running log of characters sent, starting from a single row of zeros
    count = pd.DataFrame({'nchar': [0], 'cumsum': [0]})

    for i in range(len(tibble)):
        # the description in row i, as one string
        des = tibble['keywords'].iloc[i]
        # API limit check on the most recent running total
        if count['cumsum'].iloc[-1] >= 80000:
            print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
            sleep(100)
            count = count.iloc[:1]  # reset count file to its first row

# usage: translate(testDataset)

I am confused, especially by grepl, gl_translate, pull(tibble[i,2]), and rbind in the R script.
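For grepl, pull and rbind, rough Python counterparts are sketched below on the example data from the question. This is an illustration, not a line-for-line port: is_blank is a hypothetical helper name, and note that R indexes from 1 while pandas .iloc indexes from 0, so R's column 2 is position 1.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "column1": ["amsterdam", "rotterdam"],
    "column2": ["het dag calamiteit bij doen gratis dag",
                "blijdorp groet gratis burp het ik ben"],
    "column3": [2015, 2016],
})


def is_blank(text: str) -> bool:
    """R: grepl("^\\s*$", des). True if text is empty or only whitespace."""
    return re.fullmatch(r"\s*", text) is not None


# R: pull(tibble[i, 2]) -> one value from row i, column 2 (position 1 in pandas)
i = 0
des = df.iloc[i, 1]  # 'het dag calamiteit bij doen gratis dag'

# R: nchar(des) -> len(des)
n = len(des)

# R: count <- rbind(count, req) -> pd.concat appends req as a new row
count = pd.DataFrame({"nchar": [0], "cumsum": [0]})
req = pd.DataFrame({"nchar": [n], "cumsum": [n + count["nchar"].sum()]})
count = pd.concat([count, req], ignore_index=True)
```

Calling pd.concat once per row in a 41,000-row loop is slow, so in practice a plain integer running total (or a Python list that is converted to a DataFrame once at the end) is usually the better fit.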

How do I translate them into Python code?
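A possible end-to-end port is sketched below, under several assumptions: gl_translate is replaced by the translate method of the google-cloud-translate package's v2 client (installed and authenticated separately), and make_google_translator and translate_column are hypothetical names. The translator is passed in as a function so the loop can be exercised with a fake one. Like the R script, it translates whole descriptions rather than individual words, and it uses the same 80,000 / 20,000 character thresholds and 100-second pause. Unlike the R script, it writes results to new named columns instead of positions 3:4, because in the example data column 3 holds the year.

```python
import re
import time
from typing import Callable

import pandas as pd


def make_google_translator() -> Callable[[str], dict]:
    # Assumption: `pip install google-cloud-translate` and credentials
    # configured (e.g. GOOGLE_APPLICATION_CREDENTIALS). Imported lazily so
    # the rest of this sketch works without the package installed.
    from google.cloud import translate_v2 as translate
    client = translate.Client()
    return lambda text: client.translate(text, target_language="en", format_="html")


def translate_column(df: pd.DataFrame, translate_fn: Callable[[str], dict],
                     text_col: str = "column2",
                     soft_limit: int = 80_000, large_request: int = 20_000,
                     pause: float = 100,
                     sleep: Callable[[float], None] = time.sleep) -> pd.DataFrame:
    out = df.copy()
    translated, detected = [], []
    total = 0  # running character count, like count$cumsum in the R script
    for des in out[text_col].fillna("").astype(str):
        if total >= soft_limit:  # API limit check before the next request
            print(f"nearing quota, pausing for {pause} seconds")
            sleep(pause)
            total = 0
        if re.fullmatch(r"\s*", des):  # blank description: skip the API call
            translated.append("")
            detected.append("")
        else:
            result = translate_fn(des)
            translated.append(result["translatedText"])
            detected.append(result.get("detectedSourceLanguage", ""))
        total += len(des)
        if len(des) > large_request:  # extra safeguard for very long descriptions
            print(f"large description ({len(des)} chars), pausing for {pause} seconds")
            sleep(pause)
            total = 0
    out["translatedText"] = translated
    out["detectedSourceLanguage"] = detected
    return out
```

With credentials in place, a call would look like translate_column(df, make_google_translator(), text_col="keywords") if the text column is named as in the Python attempt. Injecting translate_fn and sleep keeps the throttling logic testable without spending API quota.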

