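No technology is inherently better than another, and you can build the same thing in different languages. My point, though, is that compared with C++ and Java, some tools let us reach certain goals far more efficiently. Here are the ones that struck me most: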
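- Use HTML5 to do what used to require a native app. For example, Evan You (尤雨溪), a Chinese student in the US, built an HTML5 version of Clear in two days. What it does isn't complicated, but the payoff for his personal reputation was huge: a clever move with an excellent return on effort...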
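- Use scripts to do things that look very advanced, such as web crawlers. The convenience a language like Ruby offers is hard to match. I'm not sure how crawlers like this used to be written, but how little Ruby it took really surprised me. The script below uses the Mechanize gem to page through the search results on datamirror.csdb.cn and download each listed file: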
```ruby
require 'mechanize'
require 'pp'

# IDs of downloaded files are recorded here, comma-separated
file = File.open("test.txt", 'w')

@num = 0 # running count of downloaded files

# Parse one result page: read each row's ID and download the matching file
def parse_page(agent, page, file)
  raise "error: no page returned" unless page
  table = page.search('table#gdem_list')
  puts "==Start Parse"
  raise "empty: table#gdem_list not found" if table.empty?
  table = table.first

  rows = table.search('tbody.tbody').search('tr')
  rows.each do |row|
    # The second cell holds text like "prefix_ID"; keep the second underscore-separated field
    cell = row.search('td')[1]
    id = cell && cell.inner_text.strip.split('_')[1]
    next unless id # skip rows without an ID in the expected format
    puts "== saving #{id}"
    # save_as with no argument uses the file name Mechanize derives from the response
    agent.get("http://datamirror.csdb.cn/gdemDownload.dem?id=#{id}&fileType=gdem_utm&type=gdem_utm").save_as
    @num += 1
    puts "saved num:#{@num} id:#{id}"
    file.write(id + ",")
  end
end

def print_page(page)
  pp page
  puts page.body.length # length of the raw response body
end

puts "Starting Grab.rb"
# Search-results URL; the txtMin/txtMax parameters set the bounding box of the query
LIST_URL = "http://datamirror.csdb.cn/list.dem?opType=search&type=gdem&mode=zb&txtMaxX=140&txtMinX=70&txtMaxY=60&txtMinY=10"
agent = Mechanize.new
agent.user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
# agent.keep_alive = true

# Reuse an existing JSESSIONID session cookie
cookie = Mechanize::Cookie.new("JSESSIONID", "BA58528B76124698AD033EE6DF12B986:-1")
cookie.domain = "datamirror.csdb.cn"
cookie.path = "/"
agent.cookie_jar.add(cookie) # older Mechanize 1.x versions use add! instead

puts "Getting page for the first time"
page = agent.get LIST_URL
print_page(page)

puts "Configuring FORM"
form_sel = page.form(:name => "listForm")
raise "listForm not found" unless form_sel
puts form_sel

# Pagination-related form fields (name:value):
# maxRows:80
# maxRows:20
# gdem_list_tr_:true
# gdem_list_p_:1
# gdem_list_mr_:80
form_sel.maxRows = "80" # presumably rows per page
form_sel["gdem_list_tr_"] = "true"
form_sel["gdem_list_mr_"] = "80"

puts "==Start Set Value of Each Select"
(1..35).each do |page_num| # named page_num so it doesn't shadow the outer `page`
  form_sel["gdem_list_p_"] = page_num.to_s # index of the result page to request
  puts "=========Posting FORM========="
  new_page = form_sel.submit
  pp new_page
  parse_page(agent, new_page, file)
  puts "Posted FORM"
end

file.close
```
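The heart of the script is the per-row step: take the ID from the second table cell and request the matching download URL. That step can be pulled out and checked without touching the network. The helpers `extract_id` and `download_url` below are hypothetical names, not part of the original script; like the script, they assume the cell text has the form `prefix_ID`. Building the query with `URI.encode_www_form` from Ruby's standard library escapes the parameters instead of interpolating them into the string by hand.

```ruby
require 'uri'

# Hypothetical helper: mirrors the script's assumption that the ID cell holds
# text like "prefix_ID", and returns the second underscore-separated field.
def extract_id(cell_text)
  id = cell_text.strip.split('_')[1]
  raise ArgumentError, "unexpected cell text: #{cell_text.inspect}" if id.nil? || id.empty?
  id
end

# Hypothetical helper: builds the same download URL the script requests,
# with the query string escaped by URI.encode_www_form.
def download_url(id)
  query = URI.encode_www_form('id' => id, 'fileType' => 'gdem_utm', 'type' => 'gdem_utm')
  "http://datamirror.csdb.cn/gdemDownload.dem?#{query}"
end

# Usage with made-up cell texts (not real data from the site)
ids = ["  tile_12345 ", "tile_67890"].map { |text| extract_id(text) }
ids.each { |id| puts download_url(id) }
puts ids.join(",") # prints "12345,67890"
```

Joining the IDs with `join` avoids the trailing comma that writing `id + ","` per row leaves at the end of `test.txt`.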
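- Build browser extensions. For example, an extension that logs into the train-ticket booking site automatically is easy to write but can have a huge impact:
https://github.com/zzdhidden/12306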