[Python Web Scraping] Automatically Translating Text with Python

First, open the Google Translate website.

https://translate.google.cn/

Then translate a few words and watch how the URL changes.

| Translation mode | Text | Resulting URL |
| --- | --- | --- |
| Auto-detect -> Chinese | hello | https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello |
| Auto-detect -> English | 你好 | https://translate.google.cn/#view=home&op=translate&sl=auto&tl=en&text=%E4%BD%A0%E5%A5%BD |
| English -> Chinese | hello | https://translate.google.cn/#view=home&op=translate&sl=en&tl=zh-CN&text=hello |
| Chinese -> English | 你好 | https://translate.google.cn/#view=home&op=translate&sl=zh-CN&tl=en&text=%E4%BD%A0%E5%A5%BD |

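Comparing these URLs shows that sl is followed by the source language, tl by the target language, and text by the text to translate. %E4%BD%A0%E5%A5%BD is the percent-encoded UTF-8 form of "你好", so I tried replacing that string with "你好" directly and requested the site again.

Using the Chinese characters directly also returns the correct result.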
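The encoding step can be reproduced with the standard library. The sketch below is only an illustration: build_page_url is a hypothetical helper name, and the parameter set is copied from the URLs listed earlier rather than from any documented API.

```python
from urllib.parse import quote, unquote, urlencode

# Percent-encode the UTF-8 bytes of the text, as the browser does.
encoded = quote("你好")
print(encoded)           # %E4%BD%A0%E5%A5%BD
print(unquote(encoded))  # 你好


def build_page_url(text, sl="auto", tl="zh-CN"):
    """Hypothetical helper: rebuild the page URLs observed earlier."""
    params = {"view": "home", "op": "translate", "sl": sl, "tl": tl, "text": text}
    # quote_via=quote keeps the %XX style seen in the browser address bar.
    return "https://translate.google.cn/#" + urlencode(params, quote_via=quote)


print(build_page_url("你好", tl="en"))
```

Because the browser accepts both forms, encoding is optional when typing the URL by hand, but building it in code this way avoids surprises with spaces or reserved characters.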

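Next, try using a Python crawler to fetch the page and extract the translated text with a selector rule.

First, use auto-detect -> Chinese to translate hello; the URL is https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello . Open the URL, open the developer tools, press Ctrl+Shift+C, and click the "你好" text on the page. Then, in the developer tools, right-click the highlighted (blue) element and choose Copy -> Copy Selector.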

Then write the code:

from requests_html import HTMLSession

session = HTMLSession()
link = 'https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello'
r = session.get(link)
# Selector copied from the developer tools (Copy -> Copy Selector)
f = r.html.find('body > div.container > div.frame > div.page.tlid-homepage.homepage.translate-text > div.homepage-content-wrap > div.tlid-source-target.main-header > div.source-target-row > div.tlid-results-container.results-container > div.tlid-result.result-dict-wrapper > div.result.tlid-copy-target > div.text-wrap.tlid-copy-target > div > span.tlid-translation.translation > span', first=True)
print(f)

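The code runs without errors, but the returned value f is None. That suggests the translated text is loaded asynchronously. Opening the Network tab in the developer tools and translating "hello" again confirms that it is.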
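A related detail helps explain the None result: everything after # in the page URL is a fragment, which is handled on the client side and is not part of the HTTP request. A minimal sketch with the standard library shows which part of the link is actually requested (the variable names are only illustrative):

```python
from urllib.parse import urldefrag

link = "https://translate.google.cn/#view=home&op=translate&sl=auto&tl=zh-CN&text=hello"

# Split the URL into the part that goes over the network and the fragment.
requested_url, fragment = urldefrag(link)
print(requested_url)  # https://translate.google.cn/
print(fragment)       # view=home&op=translate&sl=auto&tl=zh-CN&text=hello
```

So the server only sees a request for the home page, and the translation has to be filled in afterwards by the page's own scripts.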

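Clicking through the Preview of each request shows that the one named single?client…… contains the translation result.

Right-click that request and copy its link:

Then write the code again: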

from requests_html import HTMLSession

session = HTMLSession()
# Link copied from the "single?client……" request in the Network tab
link = 'https://translate.google.cn/translate_a/single?client=webapp&sl=auto&tl=zh-CN&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=sos&dt=ss&dt=t&otf=1&ssel=0&tsel=0&xid=45626150&kc=13&tk=680344.843218&q=hello'
r = session.get(link)
print(r.text)
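The copied link carries many query parameters. The sketch below only lists them with the standard library; it does not send a request. What dt, xid, kc and tk mean is not established in this post, so treat any reading of them as an assumption, including whether the tk value stays valid when q is changed.

```python
from urllib.parse import urlsplit, parse_qs

link = ('https://translate.google.cn/translate_a/single?client=webapp&sl=auto'
        '&tl=zh-CN&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm'
        '&dt=sos&dt=ss&dt=t&otf=1&ssel=0&tsel=0&xid=45626150&kc=13'
        '&tk=680344.843218&q=hello')

# parse_qs groups repeated keys (such as dt) into one list per key.
query = parse_qs(urlsplit(link).query)
for name, values in query.items():
    print(name, values)
```

The same sl, tl and text-to-translate values seen in the page URL show up here as sl, tl and q.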

The response now contains the translation result:

[[["你好","hello",null,null,1]
,[null,null,"Nǐ hǎo","heˈlō,həˈlō"]
]
,[["感叹词",["你好!","喂!"]
,[["你好!",["Hello!","Hi!","Hallo!"]
,null,0.13117145]
,["喂!",["Hey!","Hello!"]
,null,0.020115795]
]
,"Hello!",9]
]
,"en",null,null,[["hello",null,[["你好",1000,true,false]
,["您好",1000,true,false]
]
,[[0,5]
]
,"hello",0,0]
]
,1.0,[]
,[["en"]
,null,[1.0]
,["en"]
]
,null,null,[["名词",[[["hullo","hi","how-do-you-do","howdy"]
,""]
]
,"hello"]
,["惊叹词",[[["hi","howdy","hey","hiya","ciao","aloha"]
,"m_en_us1254307.001"]
]
,"hello"]
]
,[["名词",[["an utterance of “hello”; a greeting.","m_en_us1254307.006","Colin Spencer still stood by the desk no one signed in at; and he still smiled and nodded his hellos and goodbyes to every oblivious face that passed him by as though he was host to this year's biggest A-list birthday bash."]
]
,"hello"]
,["惊叹词",[["used as a greeting or to begin a telephone conversation.","m_en_us1254307.001","But instead of a normal greeting like saying hello or something, they hugged."]
]
,"hello"]
,["动词",[["say or shout “hello”; greet someone.","m_en_us1254307.007","After all the helloing and such, he would sit down and talk to me in a gruff, military kind of way."]
]
,"hello"]
]
……

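Next, try extracting the translated text:

content = r.text  # raw response text from the request above
# The response starts with [[[" so the translation begins at index 4;
# scan forward for the closing quote.
for i in range(5, 100):
    if content[i] == '"':
        count = i
        break
print(content[4:count])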

The translated text is extracted successfully.
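The character scan depends on the exact position of the first quote and stops working if no quote appears within the first 100 characters. Since the response printed earlier looks like JSON, an alternative is to parse it. The sketch below runs on a shortened sample modeled on that output; extract_translation is a hypothetical helper, and it assumes the real response is valid JSON and that longer inputs come back as several segments in the first list.

```python
import json

# Shortened sample modeled on the response printed earlier (an assumption:
# the real response is valid JSON with the same leading structure).
sample = '[[["你好","hello",null,null,1],[null,null,"Nǐ hǎo","heˈlō,həˈlō"]],null,"en"]'


def extract_translation(response_text):
    """Hypothetical helper: join the translated text of every segment."""
    data = json.loads(response_text)
    segments = data[0]
    # Entries whose first item is null (e.g. the pronunciation row) are skipped.
    return "".join(seg[0] for seg in segments if seg[0])


print(extract_translation(sample))  # 你好
```

With the live request, the same function would be called as extract_translation(r.text), under the assumptions stated above.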

The complete code:

from requests_html import HTMLSession

session = HTMLSession()
# Link copied from the "single?client……" request in the Network tab
link = 'https://translate.google.cn/translate_a/single?client=webapp&sl=auto&tl=zh-CN&hl=zh-CN&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=sos&dt=ss&dt=t&otf=1&ssel=0&tsel=0&xid=45626150&kc=13&tk=680344.843218&q=hello'
r = session.get(link)
content = r.text
# The response starts with [[[" so the translation begins at index 4;
# scan forward for the closing quote.
for i in range(5, 100):
    if content[i] == '"':
        count = i
        break
print(content[4:count])
