[Python Web Scraping] Automatically Translating Text with Python

Latest recommended article published on 2023-10-31 17:31:39

weixin_39932762


[Python Web Scraping] Automatically Translating Text with Python

First, open the Google Translate website.

https://translate.google.cn/

Next, translate a few words and watch how the URL changes.

| Translation mode | Text to translate | Resulting URL |
| --- | --- | --- |
| Auto-detect -> Chinese | hello | https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello |
| Auto-detect -> English | 你好 | https://translate.google.cn/#view=home&op=translate&sl=auto&tl=en&text=%E4%BD%A0%E5%A5%BD |
| English -> Chinese | hello | https://translate.google.cn/#view=home&op=translate&sl=en&tl=zh-CN&text=hello |
| Chinese -> English | 你好 | https://translate.google.cn/#view=home&op=translate&sl=zh-CN&tl=en&text=%E4%BD%A0%E5%A5%BD |

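Comparing these URLs shows that sl is followed by the source language, tl by the target language, and text by the content to translate. Here %E4%BD%A0%E5%A5%BD is the percent-encoded (URL-encoded) UTF-8 form of “你好”. As an experiment, replace that string with the literal “你好” and request the page again.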

The page still returns the correct result when the Chinese characters are used directly.
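The URL pattern observed above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the helper name build_translate_url is hypothetical and is not part of the original article.

```python
from urllib.parse import quote, unquote


def build_translate_url(text, sl="auto", tl="zh-CN"):
    """Assemble a translate page URL following the pattern observed above.

    build_translate_url is a hypothetical helper for illustration only.
    """
    base = "https://translate.google.cn/#view=home&op=translate"
    # quote() percent-encodes the UTF-8 bytes of non-ASCII characters.
    return f"{base}&sl={sl}&tl={tl}&text={quote(text)}"


print(quote("你好"))                     # percent-encoded form of the text
print(unquote("%E4%BD%A0%E5%A5%BD"))     # decodes back to the original text
print(build_translate_url("你好", tl="en"))
```

Encoding the text explicitly keeps the URL valid even when the text contains spaces or characters such as & that would otherwise be read as part of the URL syntax.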

With that in mind, the next step is to fetch the page with a Python scraper and extract the translated text according to this pattern.

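First, translate hello with auto-detect -> Chinese; the URL is https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello . Open that URL, open the developer tools, press Ctrl+Shift+C, and click the “你好” text on the page. Then, in the developer tools, right-click the highlighted (blue) element and choose Copy -> Copy selector.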

Then start writing code:

from requests_html import HTMLSession
session = HTMLSession()
link = 'https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello'
r = session.get(link)
f = r.html.find('body > div.container > div.frame > div.page.tlid-homepage.homepage.translate-text > div.homepage-content-wrap > div.tlid-source-target.main-header > div.source-target-row > div.tlid-results-container.results-container > div.tlid-result.result-dict-wrapper > div.result.tlid-copy-target > div.text-wrap.tlid-copy-target > div > span.tlid-translation.translation > span', first=True)
print(f)

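The code runs without errors, but the returned value f is None. This suggests that the translated text is loaded asynchronously. Open the Network tab in the developer tools, translate “hello” again, and watch the requests: the result is indeed loaded asynchronously.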
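A related detail, offered here as a likely contributing factor rather than something stated in the original: in the page URL, all translation parameters sit after the # sign, which makes them part of the URL fragment. Fragments are handled on the client side and are not sent to the server, so a plain HTTP request for this URL only asks for the home page. The short sketch below splits the URL to make this visible.

```python
from urllib.parse import urlsplit

link = "https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello"
parts = urlsplit(link)

# Everything after '#' is the fragment; it stays on the client side and is
# not included in the HTTP request.
print(parts.path)      # the path that is actually requested
print(parts.query)     # empty: this URL has no '?' query string
print(parts.fragment)  # the translation parameters live here
```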

Clicking through the previews of the requests one by one shows that the request named single?client…… contains the translation result.

Right-click that request and copy its link:

Then rewrite the code:

from requests_html import HTMLSession
session = HTMLSession()
link = 'https://translate.google.cn/translate_a/single?client=webapp&sl=auto&tl=zh-CN&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=sos&dt=ss&dt=t&otf=1&ssel=0&tsel=0&xid=45626150&kc=13&tk=680344.843218&q=hello'
r = session.get(link)
print(r.text)

The output now contains the translation result:

[[["你好","hello",null,null,1]
,[null,null,"Nǐ hǎo","heˈlō,həˈlō"]
]
,[["感叹词",["你好!","喂!"]
,[["你好!",["Hello!","Hi!","Hallo!"]
,null,0.13117145]
,["喂!",["Hey!","Hello!"]
,null,0.020115795]
]
,"Hello!",9]
]
,"en",null,null,[["hello",null,[["你好",1000,true,false]
,["您好",1000,true,false]
]
,[[0,5]
]
,"hello",0,0]
]
,1.0,[]
,[["en"]
,null,[1.0]
,["en"]
]
,null,null,[["名词",[[["hullo","hi","how-do-you-do","howdy"]
,""]
]
,"hello"]
,["惊叹词",[[["hi","howdy","hey","hiya","ciao","aloha"]
,"m_en_us1254307.001"]
]
,"hello"]
]
,[["名词",[["an utterance of “hello”; a greeting.","m_en_us1254307.006","Colin Spencer still stood by the desk no one signed in at; and he still smiled and nodded his hellos and goodbyes to every oblivious face that passed him by as though he was host to this year's biggest A-list birthday bash."]
]
,"hello"]
,["惊叹词",[["used as a greeting or to begin a telephone conversation.","m_en_us1254307.001","But instead of a normal greeting like saying hello or something, they hugged."]
]
,"hello"]
,["动词",[["say or shout “hello”; greet someone.","m_en_us1254307.007","After all the helloing and such, he would sit down and talk to me in a gruff, military kind of way."]
]
,"hello"]
]
……
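The query string of the captured request can also be assembled from its individual parameters, which makes it easier to change the languages or the text. The sketch below uses only the standard library and reuses the values copied from the browser. Note that the tk value comes from that browser session; it may be tied to the specific query text, so treat reusing it for other text as an untested assumption.

```python
from urllib.parse import urlencode

# Parameters copied from the captured request. Repeated keys such as dt are
# written as (key, value) pairs so that each one appears in the URL.
params = [
    ("client", "webapp"), ("sl", "auto"), ("tl", "zh-CN"), ("hl", "zh-CN"),
    ("dt", "at"), ("dt", "bd"), ("dt", "ex"), ("dt", "ld"), ("dt", "md"),
    ("dt", "qca"), ("dt", "rw"), ("dt", "rm"), ("dt", "sos"), ("dt", "ss"),
    ("dt", "t"),
    ("otf", "1"), ("ssel", "0"), ("tsel", "0"),
    ("xid", "45626150"), ("kc", "13"),
    ("tk", "680344.843218"),  # copied from the browser; may depend on q
    ("q", "hello"),
]

endpoint = "https://translate.google.cn/translate_a/single"
link = endpoint + "?" + urlencode(params)
print(link)
```

Because urlencode() also percent-encodes non-ASCII values, Chinese text can be placed in q without encoding it by hand.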

Next, try to extract the translated text:

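content = r.text  # raw response body from the request above
# The response starts with [[[" so the translation begins at index 4;
# scan forward to find the closing quote.
for i in range(5, 100):
    if content[i] == '"':
        count = i
        break
print(content[4:count])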

The translated text is extracted successfully.
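Scanning for the first quotation mark works for this sample, but it breaks if the translation itself contains a quote. As an alternative sketch, the response can be parsed as JSON. The helper extract_translation is hypothetical, and the layout it relies on is inferred only from the sample output above, not from any documented format.

```python
import json

# Beginning of the response shown above, closed off so that it parses on
# its own; a real response contains many more fields.
sample = '''[[["你好","hello",null,null,1],[null,null,"Nǐ hǎo","heˈlō,həˈlō"]]]'''


def extract_translation(raw):
    """Join the translated segments found in the first element of the response.

    extract_translation is a hypothetical helper for illustration only.
    """
    data = json.loads(raw)
    # Entries of data[0] that start with a string hold a translated segment;
    # the romanisation entry starts with null and is skipped.
    return "".join(seg[0] for seg in data[0] if seg and seg[0])


print(extract_translation(sample))
```

With a live response, the same helper would be called as extract_translation(r.text), assuming the body is valid JSON as in the sample.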

The complete code is as follows: