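I finally have some free time, so I downloaded Ruby to play around with. Installing Ruby is simple: just grab the one-click installer from the official site. For Linux, a quick Google search turns up plenty of tutorials. As for an IDE, people online say NetBeans supports Ruby very well, but since I prefer Eclipse I would still recommend EasyEclipse for Ruby and Rails. Of course, you can also install just the RoR plugin instead of setting up a whole new Eclipse.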
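I used to write my image-scraping crawlers in Java, and wrapping HttpClient and wrangling regular expressions was exhausting. Even after I had built utility classes, I would sometimes forget where I had put them. Ruby's wrappers in this area are powerful, and a few dozen lines of code take care of all of it:
The methods for fetching a page and downloading a file.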
util.rb:
require 'net/http'
require 'open-uri'

# Fetch the page at url and return its body as a string.
def query_url(url)
  Net::HTTP.get(URI.parse(url))
end

# Download url into dir. When filename is nil or empty, the last path
# segment of the URL is used. dir is concatenated with filename as-is,
# so it must end with a path separator.
def save_url(url, dir, filename)
  filename = url[url.rindex('/') + 1, url.length - 1] if filename.nil? || filename.empty?
  Dir.mkdir(dir) if dir != nil && !dir.empty? && !FileTest.exist?(dir)
  # URI.open is the open-uri entry point on current Rubies;
  # a plain open(url) no longer works on Ruby 3.0 and later.
  URI.open(url) do |fin|
    File.open("#{dir}#{filename}", "wb") do |fout|
      # Copy the response in 1 KB chunks.
      while buf = fin.read(1024)
        fout.write buf
      end
    end
  end
end
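The default file name in save_url comes from a single slice of the URL, so it is easy to try out on its own. Below is a minimal sketch: `filename_from_url` and `basename_from_url` are hypothetical helper names of mine (not part of util.rb), and the example.com URLs are made up. The second helper is an alternative I am suggesting, not what the code above does; it drops any query string by going through `URI.parse`.

```ruby
require 'uri'

# Hypothetical helper: the same slice save_url uses for its default name.
# Everything after the last '/' in the URL is taken as the file name.
def filename_from_url(url)
  url[url.rindex('/') + 1, url.length - 1]
end

# Hypothetical alternative: take the basename of the URL path only,
# which leaves out a query string such as "?size=1".
def basename_from_url(url)
  File.basename(URI.parse(url).path)
end

puts filename_from_url('http://img01.example.com/pics/cat.jpg')     # cat.jpg
puts filename_from_url('http://img01.example.com/cat.jpg?size=1')   # cat.jpg?size=1
puts basename_from_url('http://img01.example.com/cat.jpg?size=1')   # cat.jpg
```

One thing to watch: a URL that ends in '/' gives an empty name with the slice approach, so save_url would then try to open the directory itself for writing.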
A concrete use, scraping images:
# util.rb sits next to this script; on Ruby 1.9.2 and later a plain
# require "util" no longer searches the current directory.
require_relative 'util'

start_url = 'http://list.mall.taobao.com/1424/g-d-----40-0--1424.htm'
while start_url != nil && !start_url.empty?
  print "Downloading #{start_url}\n"
  content = query_url(start_url)
  # The "下一页" (next page) link, if the page has one.
  next_page = content.scan(/<a href="(.*?)" class="next-page"><span>下一页<\/span><\/a>/)
  next_url = nil
  next_url = next_page[0][0] if next_page != nil && next_page.length > 0 && next_page[0].length > 0
  # Every image served from an img<digit>... host; \s* allows for
  # optional whitespace before the closing "/>".
  imgs = content.scan(/<img src="(http:\/\/img[\d].*?)"\s*\/>/)
  for img in imgs
    url = img[0]
    save_url(url, "d:\\mall\\", nil)
  end
  # Move on to the next page; the loop ends when there is none.
  start_url = next_url
  # break
end
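The `next_page[0][0]` and `img[0]` indexing is there because `String#scan` with a capture group returns an array of arrays, one inner array per match. Here is a small sketch that runs without touching the network: `SAMPLE_HTML` is a made-up fragment and `extract_links` is a hypothetical helper name, not part of the script. It uses `flatten` and `String#[]` with a capture index as a shorter way of getting the same values.

```ruby
# Made-up HTML fragment, for illustration only.
SAMPLE_HTML = <<~HTML
  <img src="http://img01.example.com/a.jpg" />
  <img src="http://img02.example.com/b.jpg" />
  <a href="http://example.com/page2.htm" class="next-page"><span>下一页</span></a>
HTML

# Hypothetical helper: returns [image_urls, next_url].
def extract_links(html)
  # scan yields [["url1"], ["url2"]]; flatten turns that into ["url1", "url2"].
  imgs = html.scan(/<img src="(http:\/\/img\d.*?)"\s*\/>/).flatten
  # str[regex, 1] returns the first capture group, or nil when nothing matches,
  # so no separate nil and length checks are needed.
  next_url = html[/<a href="(.*?)" class="next-page"><span>下一页<\/span><\/a>/, 1]
  [imgs, next_url]
end

imgs, next_url = extract_links(SAMPLE_HTML)
puts imgs      # the two image URLs, one per line
puts next_url  # http://example.com/page2.htm
```

On a page with no next-page link, `next_url` comes back as nil, which is exactly the condition that ends the `while` loop in the script.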
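After a day of using it, Ruby's syntax feels natural and easy to follow, it is fairly easy to pick up, and the relevant libraries are nicely wrapped. It really is a good fit for tinkering with small programs.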