This update adds support for translating short passages.
The process is described below.
First, pick a short passage:
Today, my mother has something to do, so she needs to meet her friends. I make a promise that I will clean the house when she returns. But I become lazy soon. I start to play computer games and forget my promise. When I remember, I start my work quickly. I feel so bad to almost let my mother down. Luckily, I finish my job and my mother feels happy.
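Paste it into the youdao translation site; the URL it generates is:
Comparing carefully against the URL the site generates shows that it replaces every space with "%20" and every ',' with "%2C".
So a simple for loop is enough to build a URL in the same form.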
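The replacement rule can be sketched in isolation as follows. `encode_by_hand` covers only the two characters observed above. `encode_with_quote` uses the standard library's `urllib.parse.quote`, which percent-encodes every reserved character; that the site treats other punctuation the same way is an assumption, not something checked here. Both function names and the sample string are made up for illustration.

```python
from urllib.parse import quote


def encode_by_hand(text):
    # Replace only the two characters observed in the site's URL.
    return text.replace(" ", "%20").replace(",", "%2C")


def encode_with_quote(text):
    # quote() percent-encodes all reserved characters, not just space and comma.
    # safe="" means that "/" is encoded as well.
    return quote(text, safe="")


sample = "Today, my mother has something to do"
print(encode_by_hand(sample))
print(encode_with_quote(sample))
```

For input that contains only letters, spaces, commas and periods, the two functions produce the same string; they diverge once other punctuation such as "?" or "&" appears.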
Now compare the generated URL with the site's URL
by running a C++ program that checks the two strings character by character:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // Read the two URLs; neither contains whitespace once spaces are encoded as %20.
    string a, b;
    cin >> a >> b;

    // Strings of different length can never be identical.
    bool same = (a.size() == b.size());

    // Print the index of the first differing character, if there is one.
    for (size_t i = 0; i < a.size() && i < b.size(); i++)
    {
        if (a[i] != b[i])
        {
            cout << i << "\n";
            same = false;
            break;
        }
    }
    cout << (same ? "YES" : "NO") << "\n";
    return 0;
}
It prints YES, which confirms the guess: the two URLs are identical.
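Since the crawler itself is written in Python, the same check can also be sketched there. `first_mismatch` is a hypothetical helper, not part of the crawler; it returns the index of the first difference, or -1 when the two strings are identical.

```python
def first_mismatch(a, b):
    """Return the index of the first difference, or -1 if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other: they differ where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return -1


print(first_mismatch("Today%2C%20my", "Today%2C%20my"))
print(first_mismatch("Today%2C%20my", "Today,%20my"))
```

Returning an index rather than a yes/no answer makes it easy to see which character the hand-built URL gets wrong.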
Next, read the page's HTML source.
The useful information is easy to find: it sits in the <p> tags under <div id = "fanyiToggle">, so we can use bs4 to filter it out and scrape it.
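The filtering step can be tried offline on a small hand-written snippet first. Everything in `SAMPLE_HTML` is a made-up stand-in for the real page, including the inner `class` name and the text; only the `fanyiToggle` id and the use of the second <p> come from this post. `extract_translation` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: only the id "fanyiToggle" is from the post.
SAMPLE_HTML = """
<div id="fanyiToggle">
  <div class="hypothetical-container">
    <p>Hello, world</p>
    <p>你好,世界</p>
  </div>
</div>
"""


def extract_translation(html):
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", id="fanyiToggle")
    if box is None:
        return None  # the block is missing, e.g. the request failed
    paragraphs = box.find_all("p")
    if len(paragraphs) < 2:
        return None  # the expected second <p> is not there
    return paragraphs[1].get_text(strip=True)


print(extract_translation(SAMPLE_HTML))
```

Returning `None` instead of indexing blindly means the caller can tell "no translation on the page" apart from a crash.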
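The source code follows, wrapped into functions. It tolerates errors fairly well overall, but it does not support input that contains blank lines.
That is about as far as I will take this first little crawler of mine, and I am quite happy with it. What could still be improved is translation between more languages, which I may get to later if I feel like it.
Thanks to youdao translation!!! For learning purposes only; commercial use is prohibited.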
import requests
from bs4 import BeautifulSoup
import bs4

url1 = "http://www.youdao.com/w/"
url2 = "/#keyfrom=dict2.top"
kv = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0'}


def getword(url):  # fetch the HTML text for a word
    try:
        r = requests.get(url, headers=kv)
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        print("Word translation failed")
        return ""


def printword(text):  # print the useful information for a word
    soup = BeautifulSoup(text, "html.parser")
    tab = soup.find('div', id="phrsListTab")
    if tab is None:  # the request failed or the page has no result block
        print("No translation found")
        return
    for word in tab.children:
        if isinstance(word, bs4.element.Tag):
            for td in word('li'):
                print(td.contents[0])


def getarticleurl(url):  # build a valid URL for a passage
    text = ""
    for ch in url:
        if ch == ' ':
            text = text + "%20"
        elif ch == ',':
            text = text + "%2C"
        else:
            text = text + ch
    return url1 + text + url2


def getarticle(url):  # fetch the HTML text for a passage
    try:
        r = requests.get(url, headers=kv)
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        print("Passage translation failed")
        return ""  # return a string, as getword does, so parsing does not crash


def printarticle(text):  # print the useful information for a passage
    soup = BeautifulSoup(text, "html.parser")
    toggle = soup.find('div', id="fanyiToggle")
    if toggle is None:  # the request failed or the page has no result block
        print("No translation found")
        return
    for word in toggle.children:
        if isinstance(word, bs4.element.Tag):
            tds = word('p')
            if len(tds) > 1:  # the code reads the second <p>, so make sure it exists
                print(tds[1].contents[0])


def main():  # main function
    print("\n")
    print("Press 1 to look up a word, 2 to look up a passage (enter the passage as one block, with no blank lines)")
    key = input().strip()  # compare as text so non-numeric input does not raise
    if key == "1":
        print("Enter the word to translate:")
        printword(getword(url1 + input() + url2))
    elif key == "2":
        print("Enter the passage to translate:")
        printarticle(getarticle(getarticleurl(input())))
    else:
        print("Please enter a valid command (1 or 2)!")


while True:
    main()
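
The blank-line limitation mentioned earlier could probably be worked around by flattening the passage before the URL is built. The sketch below shows only that flattening step. `collapse_to_one_line` is a hypothetical helper that the crawler does not have, and reading several lines from the user in the first place (`input()` stops at the first newline) is left out.

```python
def collapse_to_one_line(text):
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so blank lines disappear when the words are rejoined.
    return " ".join(text.split())


passage = "Today, my mother has something to do.\n\nI make a promise.\n"
print(collapse_to_one_line(passage))
```

The flattened string could then be passed to `getarticleurl` in place of the raw input.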