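Idea:
During development you constantly have to name things: variables, methods/functions, classes, packages, libraries and so on.
The usual routine goes like this: think of the name you want, say “非常帅气” ("very handsome");
then open a translation site such as Baidu Translate or Youdao Translate;
enter the Chinese text and have it translated into English, which yields the phrase “very handsome”;
finally, following camelCase naming, the string we actually need is “veryHandsome”.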
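Initial plan:
1. Build a GUI for input and output. I plan to use tkinter here (the interface is rather ugly; if you have the option, I recommend another GUI library).
2. Write a crawler that sends the input to an online translator in real time and scrapes the correct translation result.
3. Post-process the result and display it in the GUI.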
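Implementation:
First create a new project; let's call it nameMaster.
1. Leave the GUI aside for now; it can be done last. The first step is the crawler.
The crawler has the following parts:
1) Request the URL and read the response content ---> html downloader
2) Parse the content of the response ---> html parser
3) The crawler scheduler itself ---> spider man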
================================================================================================
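As the screenshots above show (taken with Chrome's built-in developer tools, F12), a few things matter:
Request URL: https://fanyi.baidu.com/v2transapi, which never changes
Request method: POST, which also never changes
Data:
from: normally Chinese, “zh”;
to: normally English, “en” (found by comparing against a request for Japanese, “jp”)
query: the user's input, i.e. the text to translate
transtype: for English this is translang, whereas the Japanese request does not carry this field; a Korean request also has translang, so sending it by default should be fine.
Response:
Of the JSON we get back, only one value matters to us: the dst value inside element 0 of data inside trans_result, which is the final translation.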
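The form fields listed above can be collected in one small function. This is only a sketch with no network access: build_form_data is a hypothetical helper name, and the sign and token arguments are placeholders here (the downloader in the next section is where the real values come from).

```python
def build_form_data(query, sign, token, source="zh", target="en"):
    """Assemble the POST form fields observed in the browser's network tab."""
    return {
        "from": source,
        "to": target,
        "query": query,
        "transtype": "translang",
        "simple_means_flag": 3,
        "sign": sign,
        "token": token,
    }


# Placeholder sign/token values, for illustration only
form = build_form_data("非常帅气", sign="<sign>", token="<token>")
print(form["from"], form["to"], form["transtype"])
```

Keeping the payload in one place makes it easy to switch the target language later by passing a different target value.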
1)HtmlDownloader
import re

import js2py
import requests


class HtmlDownloader(object):
    def download(self, input_query):
        # Headers copied from the browser; a User-Agent alone does not pass the anti-crawler check
        headers = {
            'Cookie': 'BIDUPSID=A360C8CE43082B9E2E23B5B111FC4363; PSTM=1485329732; to_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_PSSID=26524_1466_21124_18560_26350_22158; locale=zh; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1531663222,1531743493; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1531748592; from_lang_often=%5B%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%2C%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%5D; BAIDUID=96A2302A5D66AFBE539FFC68E881260F:FG=1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13076.400',
        }
        # Fetch the home page first and pull the token out of its source
        req_get = requests.get(url=r'http://fanyi.baidu.com', headers=headers)
        token = re.search(r"token: '(.*?)',", req_get.text, re.S).group(1)
        # Run sign.js through js2py to compute the sign for this query
        run_js = js2py.EvalJs({})
        run_js.execute(self.getJs())
        sign = run_js.e(input_query)
        data = {
            'from': 'zh',
            'to': 'en',
            'query': input_query,
            'transtype': 'translang',
            'simple_means_flag': 3,
            'sign': sign,
            'token': token
        }
        url = r"https://fanyi.baidu.com/v2transapi"
        response = requests.post(url, data=data, headers=headers)
        return response.text

    def getJs(self):
        # sign.js contains the JavaScript sign routine
        with open("sign.js", "r", encoding="utf-8") as f:
            return f.read()
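About the headers: sending only the User-Agent does not get past the anti-crawler check, so the header information copied from the browser (the Cookie included) is sent as well.
In addition, the sign value changes with the query text, so it has to be computed for every request.
Below is the JavaScript that computes sign; the Python script loads this file (sign.js) and runs it with the js2py module.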
// n: applies the shift/add/xor steps encoded in the pattern string o to r
function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
        a = "+" === o.charAt(t + 1) ? r >>> a: r << a,
        r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}
// e: computes the sign string for the query r
function e(r) {
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        // Queries longer than 30 characters are cut down to their first, middle and last 10 characters
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr( - 10, 10))
    } else {
        // split("") already yields an array, so its items can be pushed directly
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
            "" !== e[C] && f.push.apply(f, e[C].split("")),
            C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice( - 10).join(""))
    }
    var u = void 0, i = null;
    u = null !== i ? i: (i = "320305.131321201" || "") || "";
    // Encode the query as UTF-8 bytes into S
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A: (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)), S[c++] = A >> 18 | 240, S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224, S[c++] = A >> 6 & 63 | 128), S[c++] = 63 & A | 128)
    }
    // Mix every byte into the running value p
    for (var p = m,
        F = "+-a^+6",
        D = "+-3^+b+-f",
        b = 0; b < S.length; b++)
        p += S[b],
        p = n(p, F);
    return p = n(p, D),
        p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
        p %= 1e6,
        p.toString() + "." + (p ^ m)
}
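The first branch of e() shortens long queries before anything is hashed. Here is a sketch of that one step in Python, assuming the query contains no characters outside the Basic Multilingual Plane, so that Python string indices line up with JavaScript's UTF-16 indices. The name shorten_query is hypothetical, and the function does not compute the sign itself.

```python
def shorten_query(query):
    """Mirror the length check at the top of e() for BMP-only text.

    Queries longer than 30 characters are reduced to their first 10,
    middle 10 and last 10 characters; shorter ones pass through unchanged.
    """
    length = len(query)
    if length <= 30:
        return query
    middle = length // 2 - 5
    return query[:10] + query[middle:middle + 10] + query[-10:]


sample = "abcdefghijklmnopqrstuvwxyz0123456789"  # 36 characters
print(shorten_query(sample))  # abcdefghijnopqrstuvw0123456789
```

For typical naming queries, which are only a few characters long, this branch never fires and the whole query goes into the hash.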
2)HtmlParser
The returned response.text is a string; more precisely, a JSON string.
import json


class HtmlParser(object):
    def parse(self, string):
        # Decode the JSON response and pull out the translated text
        data = json.loads(string)
        result = data["trans_result"]["data"][0]["dst"]
        output = self.dataRefromHump(result)
        return output

    def dataRefromHump(self, string):
        # camelCase: first word lower-cased, every later word capitalized
        lis = string.split(" ")
        new_string = lis[0].lower()
        lis.pop(0)
        if len(lis):
            for i in lis:
                new_string += i.capitalize()
        return new_string
Since there is no web page to scrape and nothing that needs regular expressions, the parser itself is only a few lines of code.
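A translated phrase can contain punctuation, which would end up inside the identifier when splitting on spaces only. Here is a sketch of a stricter conversion, assuming it is acceptable to drop every character other than ASCII letters and digits. to_camel_case is a hypothetical name and not part of the parser above.

```python
import re


def to_camel_case(text):
    """Build a camelCase identifier from a translated phrase."""
    # Treat every run of non-alphanumeric characters as a word separator
    words = [w for w in re.split(r"[^A-Za-z0-9]+", text) if w]
    if not words:
        return ""
    head, *tail = words
    return head.lower() + "".join(w.capitalize() for w in tail)


print(to_camel_case("Very handsome"))               # veryHandsome
print(to_camel_case("Number one, very handsome."))  # numberOneVeryHandsome
```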
Let's look more closely at the line result = data["trans_result"]["data"][0]["dst"]:
First, data = json.loads(string) gives us the following big dictionary:
{'liju_result': {'tag': [], 'double': '[[[["\\u4ed6\\u4eec","w_0","w_0,w_21",0],["\\u4e4c\\u9ed1","w_1","w_1,w_25",0],["\\u95ea\\u4eae","w_2","w_2,w_27",0],["\\u7684","w_3","w_3",0],["\\u5934\\u53d1","w_4","w_4,w_28",0],["\\u3001","w_5","w_5,w_29",0],["\\u6a44\\u6984","w_6","w_6,w_7,w_30",0],["\\u8272","w_7","w_6,w_7,w_30",0],["\\u7684","w_8","w_8",0],["\\u76ae\\u80a4","w_9","w_9,w_31",0],["\\u548c","w_10","w_10,w_32",0],["\\u8ff7\\u4eba","w_11","w_11,w_33",0],["\\u7684","w_12","w_12",0],["\\u68d5\\u8272","w_13","w_13,w_34",0],["\\u773c\\u775b","w_14","w_14,w_15,w_35",0],["\\u4ee4","w_15","w_14,w_15,w_35",0],["\\u4ed6\\u4eec","w_16","w_16,w_21",0],["\\u770b\\u4e0a\\u53bb","w_17","w_17,w_18,w_19,w_22,w_23",0],["\\u975e\\u5e38","w_18","w_17,w_18,w_19,w_22,w_23",1],["\\u5e05\\u6c14","w_19","w_17,w_18,w_19,w_22,w_23",1],["\\u3002","w_20","w_20,w_36",0]],[["They","w_21","w_16,w_21",0," "],["are","w_22","w_17,w_18,w_19,w_22,w_23",0," "],["handsome","w_23","w_17,w_18,w_19,w_22,w_23",0," "],["with","w_24","",0," "],["dark","w_25","w_1,w_25",0],[",","w_26","",0," "],["shining","w_27","w_2,w_27",0," "],["hair","w_28","w_4,w_28",0],[",","w_29","w_5,w_29",0," "],["olive","w_30","w_6,w_7,w_30",0," "],["skin","w_31","w_9,w_31",0," "],["and","w_32","w_10,w_32",0," "],["fine","w_33","w_11,w_33",0," "],["brown","w_34","w_13,w_34",0," 
"],["eyes","w_35","w_14,w_15,w_35",0],[".","w_36","w_20,w_36",0]],"\\u300a\\u67ef\\u6797\\u65af\\u9ad8\\u9636\\u82f1\\u6c49\\u53cc\\u89e3\\u5b66\\u4e60\\u8bcd\\u5178\\u300b",42107],[[["\\u4ed6","w_74","w_74,w_86",0],["\\u8eab\\u6750","w_75","w_75,w_87,w_88",0],["\\u9ad8\\u5927","w_76","w_76,w_89,w_90",0],["\\uff0c","w_77","w_77,w_91",0],["\\u7a7f\\u7740","w_78","w_78,w_93,w_94",0],["\\u897f\\u670d","w_79","w_79,w_96",0],["\\u6253\\u7740","w_80","w_80,w_97",0],["\\u9886\\u5e26","w_81","w_81,w_98",0],["\\uff0c","w_82","w_82",0],["\\u975e\\u5e38","w_83","w_83,w_84,w_92",1],["\\u5e05\\u6c14","w_84","w_83,w_84,w_92",1],["\\u3002","w_85","w_85,w_99",0]],[["He","w_86","w_74,w_86",0," "],["was","w_87","w_75,w_87,w_88",0," "],["a","w_88","w_75,w_87,w_88",0," "],["big","w_89","w_76,w_89,w_90",0," "],["man","w_90","w_76,w_89,w_90",0],[",","w_91","w_77,w_91",0," "],["smartly","w_92","w_83,w_84,w_92",0," "],["dressed","w_93","w_78,w_93,w_94",0," "],["in","w_94","w_78,w_93,w_94",0," "],["a","w_95","",0," "],["suit","w_96","w_79,w_96",0," "],["and","w_97","w_80,w_97",0," "],["tie","w_98","w_81,w_98",0],[".","w_99","w_85,w_99",0]],"\\u300a\\u67ef\\u6797\\u65af\\u9ad8\\u9636\\u82f1\\u6c49\\u53cc\\u89e3\\u5b66\\u4e60\\u8bcd\\u5178\\u300b",30883],[[["\\u7b2c\\u4e00","w_126","w_126,w_132,w_133",0],["\\uff0c","w_127","w_127,w_134",0],["\\u4ed6","w_128","w_128,w_135",0],["\\u975e\\u5e38","w_129","w_129,w_136,w_137",1],["\\u5e05\\u6c14","w_130","w_130,w_138",1],["\\u3002","w_131","w_131,w_139",0]],[["Number","w_132","w_126,w_132,w_133",0," "],["one","w_133","w_126,w_132,w_133",0],["&","w_134","w_127,w_134",0," "],["he","w_135","w_128,w_135",0],["\'s","w_136","w_129,w_136,w_137",0," "],["very","w_137","w_129,w_136,w_137",0," 
"],["handsome","w_138","w_130,w_138",0],[".","w_139","w_131,w_139",0]],"http:\\\\/\\\\/www.kekenet.com\\\\/video\\\\/201110\\\\/158708.shtml",977241],[[["\\u4ed6\\u4eec","w_154","w_154,w_174",0],["\\u90fd","w_155","w_155,w_156,w_175",0],["\\u662f","w_156","w_155,w_156,w_175",0],["\\u5f88","w_157","w_157,w_158,w_176",0],["\\u597d","w_158","w_157,w_158,w_176",0],["\\u7684","w_159","w_159,w_190",0],["\\u4eba","w_160","w_160,w_177",0],["\\uff0c","w_161","w_161,w_178,w_179",0],["\\u5f88","w_162","w_162,w_163,w_164,w_180",0],["\\u6709","w_163","w_162,w_163,w_164,w_180",0],["\\u5929\\u8d4b","w_164","w_162,w_163,w_164,w_180",0],["\\u7684","w_165","w_165",0],["\\u6f14\\u5458","w_166","w_166,w_181",0],["\\uff0c","w_167","w_167,w_182",0],["\\u957f","w_168","w_168,w_185",0],["\\u5f97","w_169","w_169,w_187",0],["\\u4e5f","w_170","w_170,w_192",0],["\\u975e\\u5e38","w_171","w_171,w_175",1],["\\u5e05\\u6c14","w_172","w_172,w_191",1],["\\u3002","w_173","w_173,w_193",0]],[["They","w_174","w_154,w_174",0," "],["are","w_175","w_171,w_175",0," "],["great","w_176","w_157,w_158,w_176",0," "],["guys","w_177","w_160,w_177",0],[",","w_178","w_161,w_178,w_179",0," "],["and","w_179","w_161,w_178,w_179",0," "],["talented","w_180","w_162,w_163,w_164,w_180",0," "],["actors","w_181","w_166,w_181",0],[",","w_182","w_167,w_182",0," "],["and","w_183","",0," "],["they","w_184","",0],["\'re","w_185","w_168,w_185",0," "],["not","w_186","",0," "],["too","w_187","w_169,w_187",0," "],["bad","w_188","",0," "],["on","w_189","",0," "],["the","w_190","w_159,w_190",0," "],["eyes","w_191","w_172,w_191",0," 
"],["either","w_192","w_170,w_192",0],[".","w_193","w_173,w_193",0]],"http:\\\\/\\\\/www.hjenglish.com\\\\/new\\\\/p195520\\\\/",753456],[[["\\u8fd9","w_234","w_234,w_258",0],["\\u5957","w_235","w_235,w_236,w_237,w_238,w_260,w_261",0],["\\u897f\\u88c5","w_236","w_235,w_236,w_237,w_238,w_260,w_261",0],["\\u597d\\u770b","w_237","w_235,w_236,w_237,w_238,w_260,w_261",0],["\\u53c8","w_238","w_235,w_236,w_237,w_238,w_260,w_261",0],["\\u65f6\\u5c1a","w_239","w_239,w_262",0],["\\uff0c","w_240","w_240,w_263,w_264",0],["\\u4e0d\\u7ba1","w_241","w_241,w_242,w_243,w_265,w_266",0],["\\u4f60","w_242","w_241,w_242,w_243,w_265,w_266",0],["\\u76f8\\u4fe1","w_243","w_241,w_242,w_243,w_265,w_266",0],["\\u4e0e","w_244","w_244,w_261",0],["\\u5426","w_245","w_245,w_267,w_268",0],["\\uff0c","w_246","w_246,w_269",0],["\\u7a7f\\u7740","w_247","w_247,w_274",0],["\\u8fd9","w_248","w_248,w_275",0],["\\u5957","w_249","w_249,w_250,w_262",0],["\\u8863\\u670d","w_250","w_249,w_250,w_262",0],["\\u8ba9","w_251","w_251",0],["\\u6211","w_252","w_252,w_270",0],["\\u770b\\u8d77\\u6765","w_253","w_253,w_271",0],["\\u975e\\u5e38","w_254","w_254,w_255,w_272",1],["\\u5e05\\u6c14","w_255","w_254,w_255,w_272",1],["\\u6f47\\u6d12","w_256","w_256,w_273",0],["\\u3002","w_257","w_257,w_276",0]],[["It","w_258","w_234,w_258",0," "],["was","w_259","",0," "],["handsome","w_260","w_235,w_236,w_237,w_238,w_260,w_261",0," "],["and","w_261","w_244,w_261",0," "],["fashionable","w_262","w_249,w_250,w_262",0],[",","w_263","w_240,w_263,w_264",0," "],["and","w_264","w_240,w_263,w_264",0," "],["believe","w_265","w_241,w_242,w_243,w_265,w_266",0," "],["it","w_266","w_241,w_242,w_243,w_265,w_266",0," "],["or","w_267","w_245,w_267,w_268",0," "],["not","w_268","w_245,w_267,w_268",0],[",","w_269","w_246,w_269",0," "],["I","w_270","w_252,w_270",0," "],["looked","w_271","w_253,w_271",0," "],["pretty","w_272","w_254,w_255,w_272",0," "],["elegant","w_273","w_256,w_273",0," "],["in","w_274","w_247,w_274",0," 
"],["it","w_275","w_248,w_275",0],[".","w_276","w_257,w_276",0]],"http:\\\\/\\\\/www.joyen.net\\\\/article\\\\/listen\\\\/2\\\\/201008\\\\/3118.html",4449591],[[["\\u800c","w_320","w_320,w_330",0],["\\u6709\\u4e9b","w_321","w_321,w_331",0],["\\u662f","w_322","w_322,w_332",0],["\\u975e\\u5e38","w_323","w_323,w_333",1],["\\u5e05\\u6c14","w_324","w_324,w_334",1],["\\u7684","w_325","w_325",0],["\\u5192\\u9669","w_326","w_326,w_327,w_328,w_335",0],["\\u4e3b\\u4e49","w_327","w_326,w_327,w_328,w_335",0],["\\u8005","w_328","w_326,w_327,w_328,w_335",0],["\\uff01","w_329","w_329,w_336",0]],[["And","w_330","w_320,w_330",0," "],["some","w_331","w_321,w_331",0," "],["are","w_332","w_322,w_332",0," "],["remarkably","w_333","w_323,w_333",0," "],["handsome","w_334","w_324,w_334",0," "],["adventurers","w_335","w_326,w_327,w_328,w_335",0],[".","w_336","w_329,w_336",0]],"provided by jukuu",3091036],[[["\\u4ed6","w_354","w_354,w_372",0],["\\u957f","w_355","w_355,w_356,w_373",0],["\\u5f97","w_356","w_355,w_356,w_373",0],["\\u975e\\u5e38","w_357","w_357,w_374",0],["\\u3001","w_358","w_358,w_375",0],["\\u975e\\u5e38","w_359","w_359,w_376",1],["\\u5e05\\u6c14","w_360","w_360,w_377,w_378",1],["\\uff0c","w_361","w_361,w_379",0],["\\u4eba","w_362","w_362",0],["\\u4e5f","w_363","w_363",0],["\\u5f88","w_364","w_364,w_380",0],["\\u806a\\u660e","w_365","w_365,w_381",0],["\\uff0c","w_366","w_366,w_382",0],["\\u7eb3\\u4ec0","w_367","w_367,w_384",0],["\\u592b\\u4eba","w_368","w_368,w_383",0],["\\u544a\\u8bc9","w_369","w_369,w_385",0],["\\u5a1c\\u8428","w_370","w_370,w_386,w_387",0],["\\u3002","w_371","w_371,w_388",0]],[["He","w_372","w_354,w_372",0," "],["was","w_373","w_355,w_356,w_373",0," "],["very","w_374","w_357,w_374",0],[",","w_375","w_358,w_375",0," "],["very","w_376","w_359,w_376",0," "],["good","w_377","w_360,w_377,w_378",0," "],["looking","w_378","w_360,w_377,w_378",0],[",","w_379","w_361,w_379",0," "],["very","w_380","w_364,w_380",0," 
"],["intelligent","w_381","w_365,w_381",0],[",","w_382","w_366,w_382",0," "],["Mrs.","w_383","w_368,w_383",0," "],["Nash","w_384","w_367,w_384",0," "],["told","w_385","w_369,w_385",0," "],["Ms.","w_386","w_370,w_386,w_387",0," "],["Nasar","w_387","w_370,w_386,w_387",0],[".","w_388","w_371,w_388",0]],"http:\\\\/\\\\/www.okread.info\\\\/page\\\\/read.php?ys=3&d=201505&q=26942",859567],[[["\\u90a3","w_424","w_424,w_442",0],["\\u662f","w_425","w_425,w_443",0],["\\u4e00","w_426","w_426,w_446",0],["\\u4f4d","w_427","w_427,w_449",0],["\\u7a7f\\u7740","w_428","w_428,w_454",0],["\\u9ed1\\u8272","w_429","w_429,w_455",0],["\\u4e0a\\u8863","w_430","w_430,w_456",0],["\\u3001","w_431","w_431,w_457",0],["\\u6761\\u7eb9","w_432","w_432,w_458",0],["\\u897f\\u88e4","w_433","w_433,w_449",0],["\\uff0c","w_434","w_434,w_445",0],["\\u957f\\u76f8","w_435","w_435,w_436,w_449",0],["\\u975e\\u5e38","w_436","w_435,w_436,w_449",1],["\\u5e05\\u6c14","w_437","w_437,w_447",1],["\\u7684","w_438","w_438",0],["\\u65e5\\u672c","w_439","w_439,w_444",0],["\\u5c0f\\u4f19\\u5b50","w_440","w_440,w_450,w_451",0],["\\u3002","w_441","w_441,w_460",0]],[["He","w_442","w_424,w_442",0," "],["was","w_443","w_425,w_443",0," "],["Japanese","w_444","w_439,w_444",0],[",","w_445","w_434,w_445",0," "],["a","w_446","w_426,w_446",0," "],["pretty","w_447","w_437,w_447",0," "],["and","w_448","",0," "],["delicate-looking","w_449","w_435,w_436,w_449",0," "],["young","w_450","w_440,w_450,w_451",0," "],["man","w_451","w_440,w_450,w_451",0," "],["kitted","w_452","",0," "],["out","w_453","",0," "],["in","w_454","w_428,w_454",0," "],["black","w_455","w_429,w_455",0," "],["jacket","w_456","w_430,w_456",0," "],["and","w_457","w_431,w_457",0," "],["striped","w_458","w_432,w_458",0," 
"],["trousers","w_459","",0],[".","w_460","w_441,w_460",0]],"http:\\\\/\\\\/blog.sina.com.cn\\\\/s\\\\/blog_53b0ddf00100euxd.html",4633441],[[["\\u5c3d\\u7ba1","w_498","w_498,w_536",0],["\\u65f6\\u5c1a","w_499","w_499,w_538",0],["\\u5708","w_500","w_500,w_539,w_540",0],["\\u5bf9","w_501","w_501,w_541",0],["\\u4ed6","w_502","w_502,w_542",0],["\\u59bb\\u5b50","w_503","w_503,w_543",0],["\\u81ea\\u5df1","w_504","w_504,w_547,w_548",0],["\\u8bbe\\u8ba1","w_505","w_505,w_549",0],["\\u7684","w_506","w_506,w_544",0],["\\u88d9\\u5b50","w_507","w_507,w_545",0],["\\u5341\\u5206","w_508","w_508,w_509,w_551",0],["\\u63a8\\u5d07","w_509","w_508,w_509,w_551",0],["\\uff0c","w_510","w_510,w_550",0],["\\u4f46\\u662f","w_511","w_511,w_536",0],["\\u8d1d\\u683c","w_512","w_512,w_513,w_551",0],["\\u6c49\\u59c6","w_513","w_512,w_513,w_551",0],["\\u7684","w_514","w_514",0],["\\u98ce\\u5934","w_515","w_515,w_556",0],["\\u4e5f","w_516","w_516",0],["\\u5e76","w_517","w_517,w_552",0],["\\u6ca1\\u6709","w_518","w_518,w_553",0],["\\u88ab\\u76d6","w_519","w_519,w_557",0],["\\u8fc7","w_520","w_520,w_563",0],["\\uff0c","w_521","w_521,w_558",0],["\\u800c\\u662f","w_522","w_522,w_557",0],["\\u8eab\\u7a7f","w_523","w_523,w_564",0],["\\u4e00","w_524","w_524,w_554",0],["\\u8eab","w_525","w_525,w_567",0],["\\u767d\\u8272","w_526","w_526,w_566",0],["\\u71d5\\u5c3e","w_527","w_527,w_528,w_567,w_568",0],["\\u670d","w_528","w_527,w_528,w_567,w_568",0],["\\uff0c","w_529","w_529",0],["\\u4e00\\u5982\\u65e2\\u5f80","w_530","w_530,w_560",0],["\\u7684","w_531","w_531",0],["\\u770b\\u8d77\\u6765","w_532","w_532,w_559",0],["\\u975e\\u5e38","w_533","w_533,w_534,w_561",1],["\\u5e05\\u6c14","w_534","w_533,w_534,w_561",1],["\\u3002","w_535","w_535,w_569",0]],[["Although","w_536","w_511,w_536",0," "],["the","w_537","",0," "],["fashion","w_538","w_499,w_538",0," "],["world","w_539","w_500,w_539,w_540",0," "],["raved","w_540","w_500,w_539,w_540",0," "],["about","w_541","w_501,w_541",0," "],["his","w_542","w_502,w_542",0," 
"],["wife","w_543","w_503,w_543",0],["\'s","w_544","w_506,w_544",0," "],["dress","w_545","w_507,w_545",0," "],["of","w_546","",0," "],["her","w_547","w_504,w_547,w_548",0," "],["own","w_548","w_504,w_547,w_548",0," "],["design","w_549","w_505,w_549",0],[",","w_550","w_510,w_550",0," "],["Becks","w_551","w_512,w_513,w_551",0," "],["was","w_552","w_517,w_552",0," "],["not","w_553","w_518,w_553",0," "],["one","w_554","w_524,w_554",0," "],["to","w_555","",0," "],["be","w_556","w_515,w_556",0," "],["outdone","w_557","w_522,w_557",0," "],["and","w_558","w_521,w_558",0," "],["looked","w_559","w_532,w_559",0," "],["as","w_560","w_530,w_560",0," "],["handsome","w_561","w_533,w_534,w_561",0," "],["as","w_562","",0," "],["ever","w_563","w_520,w_563",0," "],["in","w_564","w_523,w_564",0," "],["a","w_565","",0," "],["white","w_566","w_526,w_566",0," "],["tuxedo","w_567","w_527,w_528,w_567,w_568",0," "],["jacket","w_568","w_527,w_528,w_567,w_568",0],[".","w_569","w_535,w_569",0]],"http:\\\\/\\\\/www.hjenglish.com\\\\/new\\\\/p611467\\\\/",721016],[[["\\u5b83","w_642","w_642,w_666",0],["\\u7684","w_643","w_643",0],["\\u5916\\u5f62","w_644","w_644,w_667",0],["\\u662f","w_645","w_645,w_668",0],["\\u6700","w_646","w_646,w_670",0],["\\u6d41\\u884c","w_647","w_647,w_671",0],["\\u7684","w_648","w_648",0],["\\u8d5b\\u8f66","w_649","w_649,w_672",0],["\\u578b","w_650","w_650,w_673",0],["\\uff0c","w_651","w_651,w_674",0],["\\u975e\\u5e38","w_652","w_652,w_675,w_676",1],["\\u5e05\\u6c14","w_653","w_653,w_677",1],["\\uff0c","w_654","w_654,w_678",0],["\\u662f","w_655","w_655,w_679",0],["\\u5f53\\u4ee3","w_656","w_656,w_680",0],["\\u65f6\\u5c1a","w_657","w_657,w_681",0],["\\u7684","w_658","w_658",0],["\\u5c0f","w_659","w_659,w_660,w_682",0],["\\u9752\\u5e74","w_660","w_659,w_660,w_682",0],["\\u65c5\\u6e38","w_661","w_661,w_683",0],["\\u7684","w_662","w_662",0],["\\u5fc5\\u5907","w_663","w_663,w_684",0],["\\u4f73\\u54c1","w_664","w_664,w_686",0],["\\u3002","w_665","w_665,w_687",0]],[["Its","w_66
6","w_642,w_666",0," "],["appearance","w_667","w_644,w_667",0," "],["is","w_668","w_645,w_668",0," "],["the","w_669","",0," "],["most","w_670","w_646,w_670",0," "],["popular","w_671","w_647,w_671",0," "],["game","w_672","w_649,w_672",0," "],["models","w_673","w_650,w_673",0],[",","w_674","w_651,w_674",0," "],["is","w_675","w_652,w_675,w_676",0," "],["very","w_676","w_652,w_675,w_676",0," "],["handsome","w_677","w_653,w_677",0],[",","w_678","w_654,w_678",0," "],["is","w_679","w_655,w_679",0," "],["contemporary","w_680","w_656,w_680",0," "],["fashionable","w_681","w_657,w_681",0," "],["young","w_682","w_659,w_660,w_682",0," "],["tourist","w_683","w_661,w_683",0," "],["necessary","w_684","w_663,w_684",0," "],["to","w_685","",0," "],["taste","w_686","w_664,w_686",0],[".","w_687","w_665,w_687",0]],"http:\\\\/\\\\/www.268r.com\\\\/yingyuzuowen\\\\/3174.html",3637381],[[["\\u6211","w_734","w_734,w_748",0],["\\u975e\\u5e38","w_735","w_735,w_751,w_752",1],["\\u559c\\u6b22","w_736","w_736,w_749",0],["\\u4ed6","w_737","w_737,w_750",0],["\\uff0c","w_738","w_738,w_753",0],["\\u4ed6","w_739","w_739,w_750",0],["\\u5904\\u5904","w_740","w_740,w_741,w_754,w_755",0],["\\u90fd","w_741","w_740,w_741,w_754,w_755",0],["\\u6563\\u53d1","w_742","w_742,w_757",0],["\\u7740","w_743","w_743",0],["\\u7075\\u6c14","w_744","w_744,w_757",0],["\\u548c","w_745","w_745,w_758",0],["\\u5e05\\u6c14","w_746","w_746,w_759",1],["\\u3002","w_747","w_747,w_760",0]],[["I","w_748","w_734,w_748",0," "],["like","w_749","w_736,w_749",0," "],["him","w_750","w_739,w_750",0," "],["so","w_751","w_735,w_751,w_752",0," "],["much","w_752","w_735,w_751,w_752",0],[",","w_753","w_738,w_753",0," "],["every","w_754","w_740,w_741,w_754,w_755",0," "],["bit","w_755","w_740,w_741,w_754,w_755",0," "],["as","w_756","",0," "],["clever","w_757","w_744,w_757",0," "],["and","w_758","w_745,w_758",0," "],["handsome","w_759","w_746,w_759",0],[".","w_760","w_747,w_760",0]],"provided by 
jukuu",2140233],[[["\\u8fd9\\u6837","w_788","w_788,w_823",0],["\\u7684","w_789","w_789",0],["\\u7537\\u4eba","w_790","w_790,w_825",0],["\\uff0c","w_791","w_791,w_826",0],["\\u4e0d","w_792","w_792,w_827",0],["\\u4e00\\u5b9a","w_793","w_793,w_828,w_829",0],["\\u975e\\u5e38","w_794","w_794,w_830",1],["\\u82f1\\u4fca","w_795","w_795,w_831,w_832",0],["\\uff0c","w_796","w_796,w_833",0],["\\u4e5f\\u8bb8","w_797","w_797,w_834",0],["\\u53ea\\u6709","w_798","w_798,w_835",0],["\\u51e0\\u5206","w_799","w_799,w_836,w_837",0],["\\u53ef\\u4ee5","w_800","w_800,w_801,w_841",0],["\\u8ba9","w_801","w_800,w_801,w_841",0],["\\u81ea\\u5df1","w_802","w_802,w_839",0],["\\u81ea\\u4fe1","w_803","w_803,w_840",0],["\\u7684","w_804","w_804,w_842",0],["\\u5e05\\u6c14","w_805","w_805,w_843",1],["\\u548c","w_806","w_806,w_844",0],["\\u9b45\\u529b","w_807","w_807,w_845",0],["\\uff0c","w_808","w_808,w_846",0],["\\u4f46\\u662f","w_809","w_809,w_847",0],["\\u4e00\\u5b9a","w_810","w_810,w_848,w_849",0],["\\u6709","w_811","w_811,w_850",0],["\\u7740","w_812","w_812,w_813,w_851",0],["\\u4e30\\u5bcc","w_813","w_812,w_813,w_851",0],["\\u7684","w_814","w_814",0],["\\u60c5\\u611f","w_815","w_815,w_853",0],["\\uff0c","w_816","w_816,w_854",0],["\\u6e29\\u67d4","w_817","w_817,w_855",0],["\\u800c","w_818","w_818,w_856",0],["\\u7ec6\\u817b","w_819","w_819,w_857",0],["\\u7684","w_820","w_820",0],["\\u5185\\u6db5","w_821","w_821,w_858",0],["\\u3002","w_822","w_822,w_859",0]],[["Such","w_823","w_788,w_823",0," "],["a","w_824","",0," "],["man","w_825","w_790,w_825",0],[",","w_826","w_791,w_826",0," "],["not","w_827","w_792,w_827",0," "],["necessarily","w_828","w_793,w_828,w_829",0," "],["a","w_829","w_793,w_828,w_829",0," "],["very","w_830","w_794,w_830",0," "],["handsome","w_831","w_795,w_831,w_832",0," "],["man","w_832","w_795,w_831,w_832",0],[",","w_833","w_796,w_833",0," "],["perhaps","w_834","w_797,w_834",0," "],["only","w_835","w_798,w_835",0," "],["a","w_836","w_799,w_836,w_837",0," 
"],["fraction","w_837","w_799,w_836,w_837",0," "],["of","w_838","",0," "],["their","w_839","w_802,w_839",0," "],["self-confidence","w_840","w_803,w_840",0," "],["allows","w_841","w_800,w_801,w_841",0," "],["the","w_842","w_804,w_842",0," "],["handsome","w_843","w_805,w_843",0," "],["and","w_844","w_806,w_844",0," "],["charming","w_845","w_807,w_845",0],[",","w_846","w_808,w_846",0," "],["but","w_847","w_809,w_847",0," "],["it","w_848","w_810,w_848,w_849",0," "],["certainly","w_849","w_810,w_848,w_849",0," "],["is","w_850","w_811,w_850",0," "],["rich","w_851","w_812,w_813,w_851",0," "],["in","w_852","",0," "],["emotion","w_853","w_815,w_853",0],[",","w_854","w_816,w_854",0," "],["tender","w_855","w_817,w_855",0," "],["and","w_856","w_818,w_856",0," "],["delicate","w_857","w_819,w_857",0," "],["content","w_858","w_821,w_858",0],[".","w_859","w_822,w_859",0]],"http:\\\\/\\\\/bodguard.blog.163.com\\\\/blog\\\\/static\\\\/9308491720092215285945",4917775]]', 'single': ''}, 'trans_result': {'type': 2, 'data': [{'prefixWrap': 0, 'src': '非常帅气', 'relation': [], 'dst': 'Very handsome', 'result': [[0, 'Very handsome', ['0|12'], [], ['0|12'], ['0|13']]]}], 'domain': 'all', 'status': 0, 'from': 'zh', 'keywords': [{'means': ['handsome', 'dashing'], 'word': '帅气'}], 'to': 'en'}, 'dict_result': [], 'logid': 2678879425}
This dictionary has four keys: 'logid', 'liju_result', 'dict_result' and 'trans_result'. Scrolling to the end shows that what we need is only in 'trans_result', so the rest can be ignored. Printing data['trans_result'] gives:
{'data': [{'src': '非常帅气', 'relation': [], 'prefixWrap': 0, 'result': [[0, 'Very handsome', ['0|12'], [], ['0|12'], ['0|13']]], 'dst': 'Very handsome'}], 'domain': 'all', 'from': 'zh', 'type': 2, 'to': 'en', 'status': 0, 'keywords': [{'means': ['handsome', 'dashing'], 'word': '帅气'}]}
The result is again a dictionary, with seven keys: 'status', 'from', 'data', 'domain', 'to', 'type' and 'keywords'. What we need is in “data”, so next comes print(data["trans_result"]["data"]):
[{'src': '非常帅气', 'dst': 'Very handsome', 'prefixWrap': 0, 'relation': [], 'result': [[0, 'Very handsome', ['0|12'], [], ['0|12'], ['0|13']]]}]
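This is a list with a single element; that element is a dictionary, and we want the value of its “dst” key.
Hence result = data["trans_result"]["data"][0]["dst"]. For the input “非常帅气” the result is: Very handsome, which the dataRefromHump method then converts into camelCase for output.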
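The chained lookup raises an exception whenever the response does not have the expected shape, for instance if the request was rejected. Here is a sketch of a more defensive lookup, run against a trimmed copy of the response shown above. extract_dst is a hypothetical helper, and the second sample is a made-up payload, not a real reply from the service.

```python
import json


def extract_dst(response_text):
    """Return the translated text, or None if the expected keys are missing."""
    try:
        payload = json.loads(response_text)
        return payload["trans_result"]["data"][0]["dst"]
    except (ValueError, KeyError, IndexError, TypeError):
        # ValueError covers malformed JSON; the others cover a missing or
        # differently shaped trans_result section
        return None


sample = json.dumps({
    "trans_result": {
        "data": [{"src": "非常帅气", "dst": "Very handsome"}],
        "from": "zh",
        "to": "en",
    },
    "logid": 2678879425,
})
print(extract_dst(sample))                  # Very handsome
print(extract_dst('{"unexpected": true}'))  # None
```

Returning None lets the caller decide how to report the failure instead of crashing the GUI event handler.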
3)SpiderMan
The last piece of the crawler is the scheduler. Until the GUI exists, input and print stand in for the GUI's input and output.
class SpiderMan(object):
    def __init__(self):
        self.downloader = HtmlDownloader()
        self.parser = HtmlParser()

    def translate(self):
        # Read the text to translate from the user
        string = input("Enter the text to translate: ")
        text = self.downloader.download(string)
        result = self.parser.parse(text)
        print(result)
A quick test:
if __name__ == "__main__":
    spider = SpiderMan()
    spider.translate()
Output:
Enter the text to translate: 非常帅气
veryHandsome
That is more or less it for the crawler; next it is time to deal with the GUI.
2. Input and output through the GUI
from tkinter import *
import datetime


class GUI(object):
    def __init__(self):
        # Create the main window and set its title
        self.window = Tk()
        self.window.title(string="NameMaster")
        self.window.maxsize(600, 500)
        self.window.minsize(600, 500)
        # Create frames that hold the widgets and divide the window into areas
        self.frame1 = Frame(self.window, relief=RAISED, borderwidth=2, width=500, height=250)
        self.frame1.place(x=0, y=0)
        self.frame2 = Frame(self.window, relief=RAISED, borderwidth=2, width=500, height=250)
        self.frame2.place(x=0, y=250)
        self.frame3 = Frame(self.window, relief=RAISED, borderwidth=2, width=200, height=500)
        self.frame3.place(x=500, y=0)
        # Place the labels
        self.label1 = Label(self.frame1, text="Enter the text to translate:")
        self.label1.place(x=10, y=0)
        self.label2 = Label(self.frame2, text="Translation result:")
        self.label2.place(x=10, y=0)
        self.label3 = Label(self.frame2, width=65, height=10, borderwidth=4, relief=RIDGE)
        self.label3.place(x=15, y=25)
        self.label4 = Label(self.frame2)
        self.label4.place(x=10, y=215)
        # Create the text input box
        self.textbox = Text(self.frame1, width=65, height=15, borderwidth=4, relief=RIDGE)
        self.textbox.place(x=15, y=25)
        # Create the button
        self.button = Button(self.frame3, text="Translate", width=9, height=2)
        self.button.place(x=10, y=210)
        # Bind the Enter key on the textbox
        self.textbox.bind("<Return>", self.handleEnterEvent)
        # Bind a left mouse click on the button
        self.button.bind("<ButtonPress-1>", self.handleLeftButtonPressEvent)
        # Create the SpiderMan object
        self.spider = SpiderMan()
        # Start the event loop
        self.window.mainloop()

    def handleEnterEvent(self, event):
        '''
        Handler for the Enter key being pressed inside the textbox
        :param event:
        :return:
        '''
        data = self.textbox.get("1.0", END)
        # Strip trailing newlines and spaces so the text looks tidy
        for i in range(len(data) - 1, -1, -1):
            if data[i] in ["\n", "\r", " "]:
                if i == 0:
                    return
                continue
            else:
                data = data[0:i + 1]
                break
        # translate() has to accept the text and return the result here,
        # rather than using input() and print() as in the console version
        content = self.spider.translate(data)
        self.label4["text"] = "Translated at: " + datetime.datetime.now().strftime("%H : %M : %S") + " ."
        self.label3["text"] = content

    def handleLeftButtonPressEvent(self, event):
        '''
        Handler for a left mouse click on the button
        :param event:
        :return:
        '''
        data = self.textbox.get("1.0", END)
        if len(data) == 0:
            return
        # Strip trailing newlines and spaces so the text looks tidy
        for i in range(len(data) - 1, -1, -1):
            if data[i] in ["\n", "\r", " "]:
                if i == 0:
                    return
                continue
            else:
                data = data[0:i + 1]
                break
        # Same requirement as above: translate() takes the text and returns the result
        content = self.spider.translate(data)
        self.label4["text"] = "Translated at: " + datetime.datetime.now().strftime("%H : %M : %S") + " ."
        self.label3["text"] = content
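So all it takes is:
if __name__ == "__main__":
    window = GUI()
to open the window. Type the text to translate into the upper text box, then press Enter or click the Translate button to translate it.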
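Both event handlers repeat the same loop for trimming trailing whitespace. Here is a sketch of a shared helper built on str.rstrip, which should behave the same for the three characters the loop checks. clean_input is a hypothetical name.

```python
def clean_input(raw):
    """Strip trailing newlines, carriage returns and spaces.

    Returns None when nothing is left, matching the handlers' early return.
    """
    text = raw.rstrip("\n\r ")
    return text or None


print(repr(clean_input("非常帅气\n")))  # '非常帅气'
print(repr(clean_input(" \n")))         # None
```

In a handler this would replace the loop with two lines: read the text box into data via clean_input, then return early if data is None.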
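【Extensions】
Extra features:
1. Copy the result to the clipboard as soon as the translation finishes
2. Add radio buttons offering several output formats for the translation (standard, camelCase, underscore, all lower case, all upper case, every word capitalized)
Implementing the extensions
1. Copy the result to the clipboard as soon as the translation finishes
Import the clipboard module win32clipboard (it ships with the pywin32 package and works on Windows only).
Add this method to the GUI class: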
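    def clipBoard(self, content):
        # Replace the current clipboard contents with the translated text
        win32clipboard.OpenClipboard()
        win32clipboard.EmptyClipboard()
        win32clipboard.SetClipboardText(content)
        win32clipboard.CloseClipboard()
Call it at the end of both the Enter-key handler and the button-click handler, passing in the translated content.
The result is then on the clipboard and can be pasted elsewhere with right-click > Paste or Ctrl+V.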
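tkinter widgets also provide clipboard_clear() and clipboard_append(), which avoids the Windows-only dependency. Here is a sketch of that alternative. copy_to_clipboard is a hypothetical helper, and FakeWidget exists only so the example can run without a display.

```python
def copy_to_clipboard(widget, content):
    """Put content on the system clipboard through any tkinter widget."""
    widget.clipboard_clear()          # drop whatever was there before
    widget.clipboard_append(content)  # store the new text
    widget.update()                   # process pending events


class FakeWidget:
    """Stand-in that records calls, so the sketch runs without a display."""

    def __init__(self):
        self.calls = []

    def clipboard_clear(self):
        self.calls.append(("clear",))

    def clipboard_append(self, text):
        self.calls.append(("append", text))

    def update(self):
        self.calls.append(("update",))


fake = FakeWidget()
copy_to_clipboard(fake, "veryHandsome")
print(fake.calls)
```

Inside the GUI class the real call would pass self.window as the widget.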
2. Add radio buttons offering several output formats for the translation
First, add the radio options in the GUI class constructor:
        # Create the radio buttons; each value matches an index in self.options
        self.options = ["hump", "standard", "under", "lower", "upper", "cap"]
        self.var = IntVar()
        Radiobutton(self.frame3, text="Standard", variable=self.var, value=1).place(x=10, y=300)
        Radiobutton(self.frame3, text="camelCase", variable=self.var, value=0).place(x=10, y=330)
        Radiobutton(self.frame3, text="Underscore", variable=self.var, value=2).place(x=10, y=360)
        Radiobutton(self.frame3, text="All lower case", variable=self.var, value=3).place(x=10, y=390)
        Radiobutton(self.frame3, text="All upper case", variable=self.var, value=4).place(x=10, y=420)
        Radiobutton(self.frame3, text="Capitalized words", variable=self.var, value=5).place(x=10, y=450)
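The integer stored in the IntVar can serve as an index into the options list to obtain the mode name. Here is a sketch of that lookup outside the GUI. mode_for is a hypothetical helper, and the assumption is that the handlers read the selection with self.var.get().

```python
OPTIONS = ["hump", "standard", "under", "lower", "upper", "cap"]


def mode_for(selected_value):
    """Map a radio button's integer value to the parser's mode name."""
    return OPTIONS[selected_value]


print(mode_for(0))  # hump
print(mode_for(1))  # standard
```

Because an IntVar starts at 0, camelCase ("hump") is the mode selected by default.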
HtmlParser, which is responsible for formatting the output, then has to be changed to this:
class HtmlParser(object):
    def parse(self, string, mode):
        # Decode the response, then format the translation according to mode
        data = json.loads(string)
        result = data["trans_result"]["data"][0]["dst"]
        if mode == "standard":
            return result
        elif mode == "hump":
            output = self.dataRefromHump(result)
            return output
        elif mode == "under":
            output = self.dataRefromUnder(result)
            return output
        elif mode == "lower":
            output = self.dataRefromLower(result)
            return output
        elif mode == "upper":
            output = self.dataRefromUpper(result)
            return output
        else:
            output = self.dataRefromCap(result)
            return output

    def dataRefromHump(self, string):
        # camelCase: first word lower-cased, later words capitalized
        lis = string.split(" ")
        new_string = lis[0].lower()
        lis.pop(0)
        if len(lis):
            for i in lis:
                new_string += i.capitalize()
        return new_string

    def dataRefromUnder(self, string):
        # Underscore: first word lower-cased, later words joined with "_"
        lis = string.split(" ")
        new_string = lis[0].lower()
        lis.pop(0)
        if len(lis):
            for i in lis:
                new_string += "_" + i
        return new_string

    def dataRefromLower(self, string):
        # All lower case, words joined without separators
        lis = string.split(" ")
        new_string = ""
        if len(lis):
            for i in lis:
                new_string += i.lower()
        return new_string

    def dataRefromUpper(self, string):
        # All upper case, words joined without separators
        lis = string.split(" ")
        new_string = ""
        if len(lis):
            for i in lis:
                new_string += i.upper()
        return new_string

    def dataRefromCap(self, string):
        # Every word capitalized, joined without separators
        lis = string.split(" ")
        new_string = ""
        if len(lis):
            for i in lis:
                new_string += i.capitalize()
        return new_string
This adds the various formatting methods, plus a mode parameter on parse that selects the output format.
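As more formats are added, the if/elif chain keeps growing. One alternative is a dictionary that maps each mode name to a formatting function. The sketch below is not the code above: FORMATTERS and format_result are hypothetical names, and the "under" variant here lower-cases every word, which differs slightly from dataRefromUnder.

```python
def _words(text):
    return text.split(" ")


def camel(text):
    head, *tail = _words(text)
    return head.lower() + "".join(w.capitalize() for w in tail)


FORMATTERS = {
    "standard": lambda text: text,
    "hump": camel,
    "under": lambda text: "_".join(w.lower() for w in _words(text)),
    "lower": lambda text: "".join(w.lower() for w in _words(text)),
    "upper": lambda text: "".join(w.upper() for w in _words(text)),
    "cap": lambda text: "".join(w.capitalize() for w in _words(text)),
}


def format_result(text, mode):
    """Look up the formatter for mode; unknown modes fall back to 'cap'."""
    return FORMATTERS.get(mode, FORMATTERS["cap"])(text)


for mode in FORMATTERS:
    print(mode, format_result("Very handsome", mode))
```

Adding a format then means adding one dictionary entry, and the fallback to "cap" mirrors the else branch of parse.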
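Accordingly, the translate method of the SpiderMan class, and the calls to the SpiderMan object in the Enter-key and button handlers, all need to pass this mode parameter along.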
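The post does not show the adapted SpiderMan, so the following is only a sketch of what it might look like. translate takes the text and the mode and returns the result instead of printing it. Passing the downloader and parser into the constructor is an assumption made here so the example runs offline, and StubDownloader and StubParser are hypothetical stand-ins for the real classes.

```python
import json


class SpiderMan(object):
    def __init__(self, downloader, parser):
        # Collaborators are passed in so the class can be exercised offline
        self.downloader = downloader
        self.parser = parser

    def translate(self, text, mode):
        """Download the translation of text and format it according to mode."""
        response_text = self.downloader.download(text)
        return self.parser.parse(response_text, mode)


class StubDownloader:
    """Hypothetical stand-in that returns a canned response, no network."""

    def download(self, input_query):
        return '{"trans_result": {"data": [{"dst": "Very handsome"}]}}'


class StubParser:
    """Hypothetical stand-in handling only 'standard' and a capitalized form."""

    def parse(self, string, mode):
        result = json.loads(string)["trans_result"]["data"][0]["dst"]
        if mode == "standard":
            return result
        return "".join(w.capitalize() for w in result.split(" "))


spider = SpiderMan(StubDownloader(), StubParser())
print(spider.translate("非常帅气", "standard"))  # Very handsome
```

In the GUI handlers the call would then take the form self.spider.translate(data, mode), with mode taken from the selected radio button.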
With that, the program is essentially done. Here is how it looks:
That is everything in the program.
【Suggestions and feedback are welcome in the comments】