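This post is mainly a companion to a Google Translate scraper I wrote in Python; here I share the MATLAB version. The main functions it relies on are urlread and regexp.
First, a look at the result:
The code is split into two files: a main file, Google_translate.m, and a function file, Translate_mean.m. I pass everything through global variables, so neither function takes any inputs. This is the content of the main function, Google_translate.m: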
function Google_translate
% Build the GUI: two language pop-up menus, a source edit box,
% a result edit box, and a Translate button.
% The handles are shared with Translate_mean.m through global variables.
global Source_first Translate_two Source_Content Translate_Content
global Langurage
fh = figure('name','Google Translate');
% Language codes offered in both pop-up menus
Langurage = {'zh-CN','en','ja','fr','de'};
% Pop-up menu: source language
Source_first = uicontrol('parent', fh, 'style', 'popupmenu',...
    'units','normalized',...
    'position',[0.4, 0.7, 0.1, 0.1],...
    'string',Langurage);
% Pop-up menu: target language
Translate_two = uicontrol('parent', fh, 'style', 'popupmenu',...
    'units','normalized',...
    'position',[0.55, 0.7, 0.1, 0.1],...
    'string',Langurage);
% Multi-line edit box for the text to translate ('Max',2 allows several lines)
Source_Content = uicontrol('parent', fh, 'style', 'edit',...
    'units','normalized',...
    'position',[0.1, 0.1, 0.4, 0.6],...
    'HorizontalAlignment','left',...
    'Max',2);
% Multi-line edit box that receives the translation
Translate_Content = uicontrol('parent', fh, 'style', 'edit',...
    'units','normalized',...
    'position',[0.55, 0.1, 0.4, 0.6],...
    'HorizontalAlignment','left',...
    'BackgroundColor',[1,1,1],...
    'Max',2);
% Button: runs Translate_mean.m when clicked
Translate = uicontrol('parent', fh, 'style', 'pushbutton',...
    'units','normalized',...
    'position',[0.1, 0.74, 0.2, 0.07],...
    'string','Translate',...
    'callback','Translate_mean');
% Title text at the top of the window
uicontrol('parent',fh, 'style','text','units','normalized',...
    'position',[0.23, 0.84, 0.6, 0.1],...
    'string','Google translation API using',...
    'fontsize',18)
This is the content of Translate_mean.m, the small function that the button calls:
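function Translate_mean
% Callback of the Translate button: reads the source text, sends one
% request per line, and writes the translations into the result box.
%{
Hard-coded request, handy for a quick check:
Source = 'en';
Totrans = 'zh-CN';
ANS_url = urlread(['http://translate.google.cn/translate_a/single?client=gtx&sl=',Source, '&tl=', Totrans, '&dt=t&q=googleq=how%20are%20you']);
%}
global Source_first Translate_two Source_Content Translate_Content
global Langurage
% Indices of the selected entries in the two pop-up menus
H1 = get(Source_first, 'value');
H2 = get(Translate_two, 'value');
Content_source = get(Source_Content,'string');
Source = Langurage{H1};
Totrans = Langurage{H2};
Content = Content_source;
% Replace the Chinese full stop with an ASCII period
Content(Content=='。') = '.';
% One row of Content per line of input
LEN = size(Content,1);
TransContent = cell(LEN,1);
for ihang = 1:LEN
    % string to UTF-8 bytes, two hex digits per byte (one row per byte)
    Str2 = dec2hex( unicode2native(Content(ihang,:), 'UTF-8'), 2 );
    % Percent-encode: %XX for every byte
    Allstr = [];
    for i = 1:size(Str2,1)
        Allstr = [ Allstr,'%', Str2(i,:) ];
    end
    % The 'googleq=' prefix gives the regexp below something to anchor on
    ANS_url = urlread(['http://translate.google.cn/translate_a/single?client=gtx&sl=',...
        Source, '&tl=', Totrans, '&dt=t&q=googleq=', Allstr, '&ie=UTF-8']);
    Tl = regexp( ANS_url, '[[["[gG]oogle.*u003d (.*)","google', 'tokens' );
    if isempty(Tl)
        TransContent{ihang} = '';   % no match: leave this line empty
    else
        TransContent{ihang} = Tl{1,1}{1,1};
    end
end
set(Translate_Content, 'string', TransContent)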
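As an aside, the byte-by-byte loop in Translate_mean.m can also be written as a single sprintf call. This is only a sketch under my own naming: pctEncode is a hypothetical helper, not part of the two files. It percent-encodes every UTF-8 byte of a non-empty char vector, which is what the loop does.

```matlab
% Hypothetical helper (not in the original files): percent-encode every
% UTF-8 byte of a non-empty char vector, e.g. for the q= parameter.
pctEncode = @(s) sprintf('%%%02X', double(unicode2native(s, 'UTF-8')));

% U+6C49 is built from its code point so the file encoding does not matter
han = char(hex2dec('6C49'));
encoded = pctEncode(han);
disp(encoded)   % prints %E6%B1%89
```

Because the width is fixed at two hex digits, bytes below 16 keep their leading zero.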
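That covers how the feature shown in the screenshot at the top is implemented. Below is a small script for a plain test; once it runs, you can move on to debugging the two functions above. (It has a few comments; make do with them.)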
clear;clc
Source = 'zh-CN';
% Source = 'en';
Totrans = 'en';
Str = '你是个沙雕额';
% string to UTF-8 bytes, two hex digits per byte (one row per byte)
Str2 = dec2hex( unicode2native(Str, 'UTF-8'), 2 );
% Percent-encode: %XX for every byte
Allstr = [];
for i = 1:size(Str2,1)
    Allstr = [ Allstr,'%', Str2(i,:) ];
end
% Str = '%E6%B1%89';
% sl = source language
% tl = target language
% ie = input encoding
% oe = output encoding
ANS_urlr = urlread(['http://translate.google.cn/translate_a/single?client=gtx&sl=',...
    Source, '&tl=', Totrans, '&dt=t&q=googleq=',Allstr,'&ie=UTF-8']);
Tl = regexp( ANS_urlr, '[[["[gG]oogle.*u003d (.*)","google', 'tokens' );
result = Tl{1,1}{1,1}
% Optional query parameters: &oe=UTF-8, &ie=UTF-8
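To try the regexp step without touching the network, something like the sketch below may help. The response string is made up by me to resemble the shape the extraction expects (it is not a captured reply), and the pattern is a simplified variant of my own, not the one used in the script.

```matlab
% Hypothetical response text, invented for illustration only.
% Single quotes keep \u003d as six literal characters.
sample = '[[["Googleq\u003d How are you","googleq=placeholder",null,null,1]],null,"zh-CN"]';

% Simplified pattern (an assumption, not the original one): capture lazily
% between 'u003d ' and the '","google' that starts the echoed source text.
tok = regexp(sample, 'u003d (.*?)","google', 'tokens', 'once');

if isempty(tok)
    result = '';          % nothing matched
else
    result = tok{1};      % first (and only) captured token
end
disp(result)
```

With the 'once' option, regexp returns the tokens of the first match directly, so one level of cell indexing is enough.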
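Scraping with MATLAB seems more convenient than with Python.
It does feel crude, though: you just fetch a single URL. But on reflection, simple tools generally come with fewer features.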
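One note for newer MATLAB releases: the MathWorks documentation lists urlread as not recommended and points to webread instead. Below is a rough sketch of the same kind of request with webread. I have not verified that the endpoint still answers, so the call is wrapped in try/catch; the variable names and the language pair are my own choices. As far as I know, webread encodes the query parameters itself, so the manual %XX step is not needed here.

```matlab
% Sketch only: same endpoint as in the post, requested through webread.
% 'ContentType','text' asks for the raw reply as a char vector.
opts = weboptions('ContentType', 'text', ...
                  'CharacterEncoding', 'UTF-8', ...
                  'Timeout', 10);
baseUrl = 'http://translate.google.cn/translate_a/single';
try
    % Query parameters are passed as name/value pairs, options go last
    raw = webread(baseUrl, 'client', 'gtx', 'sl', 'en', 'tl', 'fr', ...
                  'dt', 't', 'q', 'googleq=how are you', opts);
catch err
    raw = '';   % endpoint unreachable or request rejected
    fprintf('Request failed: %s\n', err.message);
end
```

If the request succeeds, raw can go through the same regexp step as before.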