Python Youdao Translation Tool

Clown_34

Last modified 2023-09-11 16:59:38

Tags: python, machine translation, selenium

First published 2023-09-11 16:58:18

Article link: https://blog.csdn.net/Clown_34/article/details/132812915

Python Youdao Translation Tool (selenium automation)

1. Preface

This post documents a small translation tool for Youdao Translate, written in Python and driven by the selenium automation module.

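Background: the data I scraped contained English text that had to be translated into Chinese, so I first tried the Youdao translation API: https://ai.youdao.com/gw.s. The API is a paid service, however. A free credit of 50 is granted, but that was not enough for the volume of data involved.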

2. Logic analysis

Although the API is limited, translation through the web page has no limit on the number of requests.

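Capturing the network traffic showed that Youdao encrypts the response data, which is tedious to deal with, so I chose a different approach.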

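We can use python-selenium, in the manner of an automated crawler, to simulate a person typing text into the translator and then read back the translation result.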

Youdao also offers a new AI translation feature to try out, and its results are reasonably good.

3. Implementation

Environment: Python 3+, plus the Chrome browser and the chromedriver that matches its version.
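A mismatch between the browser and the driver is a common reason the session fails to start, so it can help to compare the two before running the tool. The following is a minimal sketch, assuming you paste in the text printed by `chrome --version` and `chromedriver --version`. The helper names `major_version` and `versions_match` are hypothetical, and the sample strings are illustrative only.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 104.0.5112.79'."""
    # Look for the first four-part dotted version number in the text.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """Return True when the browser and the driver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

For example, `versions_match("Google Chrome 104.0.5112.102", "ChromeDriver 104.0.5112.79")` compares major version 104 on both sides.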

Code:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Build the selenium Chrome driver
def get_driver(proxies=None, headless=True, strategy=None):
    chrome_options = Options()
    # Optional page load strategy: 'none' or 'eager' (anything else keeps the default)
    if strategy in ('none', 'eager'):
        chrome_options.page_load_strategy = strategy
    # chrome_options.binary_location = "E:\\Chrome\\chrome.exe"
    chrome_options.add_argument("--start-maximized")
    # Headless mode (no visible browser window)
    if headless:
        chrome_options.add_argument("--headless")
    # Proxy settings (skipped when no "https" entry is given)
    if proxies and proxies.get("https"):
        chrome_options.add_argument("--proxy-server=" + proxies.get("https"))
    # Path to the chromedriver executable; adjust it for your own machine
    driver_service = Service('E:/桌面/ChromeDerive104/chromedriver')

    driver = webdriver.Chrome(options=chrome_options, service=driver_service)
    # driver = webdriver.Chrome(options=chrome_options, executable_path="E:/桌面/ChromeDerive104/chromedriver")
    # driver = webdriver.Chrome(options=chrome_options)
    # Wait up to 5 seconds for elements to appear before a lookup fails
    driver.implicitly_wait(5)
    return driver

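# Main Youdao translation function
# original_text: the text to translate
# ai_switch: toggle for AI translation, off by default
def get_translated_text(original_text, ai_switch=False):
    # Cookies captured after logging in; only used for AI translation.
    # Replace each "xxx" placeholder with the value from your own session.
    cookie = {
        "DICT_SESS": "xxx",
        "DICT_LOGIN": "xxx",
        "DICT_DOCTRANS_SESSION_ID": "xxx"
    }

    driver = get_driver(headless=True)
    try:
        driver.get('https://fanyi.youdao.com/indexLLM.html#/')
        if ai_switch:
            # Cookies can only be added for a domain that is already loaded,
            # so set them after the first visit and then load the page again.
            for name in cookie:
                cookie_dict = {"domain": ".youdao.com", "path": "/", "name": name, "value": cookie[name]}
                driver.add_cookie(cookie_dict=cookie_dict)
            driver.get('https://fanyi.youdao.com/indexLLM.html#/')

        input_box = driver.find_element(By.XPATH, '//*[@id="js_fanyi_input"]')

        input_box.send_keys(original_text)
        output_text = ''
        if ai_switch:
            # Click the control that starts the AI translation
            driver.find_element(By.XPATH, '//*[@id="bottom"]/div/div[2]/span[2]').click()
            while True:
                try:
                    # This element is present only while the AI result is still being generated
                    driver.find_element(By.XPATH,
                                        '//*[@class="menu-item disabled color_text_5 disabled generating color_text_5"]')
                except Exception:
                    # The "generating" marker is gone, so read the finished result
                    output_text = driver.find_element(By.XPATH, '//*[@class="origin-text color_text_1"]').get_attribute(
                        'innerHTML')
                    break
                else:
                    # Still generating: pause briefly instead of polling in a tight loop
                    time.sleep(0.5)
        else:
            # Give the page a moment to render the regular translation
            time.sleep(2)
            output_texts = driver.find_elements(By.XPATH, '//*[@id="js_fanyi_output_resultOutput"]/p/span')
            for i in output_texts:
                output_text += i.get_attribute('innerHTML')

        return output_text
    finally:
        # Always close the browser so no Chrome process is left running
        driver.quit()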

if __name__ == '__main__':
    text = get_translated_text(
        "[{'title': 'Acute toxicity', 'conclusion': 'LD50 Oral - Rat - 1.560 mg/kg|| Remarks: Behavioral:Coma.'}, {'title': 'Skin corrosion/irritation', 'conclusion': 'Skin - Rabbit|| Result: Severe skin irritation - 24 h (Draize Test)'}, {'title': 'Serious eye damage/eye irritation', 'conclusion': 'No data available'}, {'title': 'Respiratory or skin sensitisation', 'conclusion': 'No data available'}, {'title': 'Germ cell mutagenicity', 'conclusion': 'No data available'}, {'title': 'Carcinogenicity', 'conclusion': 'IARC: No component of this product present at levels greater than or equal to 0.1% is identified as probable, possible or confirmed human carcinogen by IARC.'}, {'title': 'Reproductive toxicity', 'conclusion': 'No data available'}, {'title': 'Specific target organ toxicity - single exposure', 'conclusion': 'Inhalation - May cause respiratory irritation.'}, {'title': 'Specific target organ toxicity - repeated exposure', 'conclusion': 'No data available'}, {'title': 'Aspiration hazard', 'conclusion': 'No data available'}, {'title': 'Additional Information', 'conclusion': 'RTECS: SL7875000|| To the best of our knowledge, the chemical, physical, and toxicological properties have not been thoroughly investigated.'}, {'title': 'Toxicity', 'conclusion': 'LD50 orally in rats:  1560 mg/kg (Jenner)'}]",ai_switch=True)
    print(text)
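
The AI branch above polls the page until the result is ready, but it has no upper bound on how long it waits. One way to add a bound is a small polling helper with a timeout. This is a sketch only, not part of the original tool: the name `wait_until` and its default values are assumptions, and it uses nothing beyond the standard library.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause between checks so the loop does not spin at full speed.
        time.sleep(interval)
```

With selenium, the predicate could be something like `lambda: not driver.find_elements(By.XPATH, generating_xpath)`, where `generating_xpath` stands for the "generating" XPath used above. `find_elements` returns an empty list when nothing matches, so the wait ends once that marker disappears. Selenium's own `WebDriverWait` covers the same ground if you prefer a built-in.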