Source code, using https://www.crowdfunder.com/deals as the example (this page does not use asynchronous loading):
# -*- coding: utf8 -*-
import requests
import re

# url = 'https://www.crowdfunder.com/deals'
# html = requests.get(url).text
# print html.encode("gb18030")

# For sites that load content asynchronously: just change the value of the page parameter
url = 'https://www.crowdfunder.com/deals&template=false'
data = {
    'entities_only': 'true',
    'page': '1'
}
html_post = requests.post(url, data=data)
# Capture the text after each "card-title"> up to the start of the next tag
title = re.findall('"card-title">(.*?)<', html_post.text, re.S)
for each in title:
    print each.encode("gb18030")

Run result: the script extracts part of the page's information (the deal titles):
D:\Python27\python.exe D:/pycharm/class2/company.py
Carlson Wireless Technologies Inc.
Outski Inc.
RUYO Inc
Equities.com, Inc.
Playground Sessions
P2P Cash
Effluent Free Desalination Corporation a/k/a EFD Corp.
AxCent Tuning Systems
Phoenix Financial Holdings, Inc.
SPORT-11
Bakersfield Investment Club
LDR [lee-der] Brands
FAB Financial Inc.
Gun Academy
Ascenergy
iPELLI design
Contour Foods, LLC
OneMarket
BUSINESSPACE, Marketing, Publicity and Communication Corp.
Mine Shaft Brewing
truBrain
Hamilton Investment Properties
Community Housing
BStriker
ClypCall
Digitzs
SU Labs Accelerator Seed Fund
SlideBatch
Tracemyfile
Banter
Freestar Energy Group
WineSimple, Inc.
PhotoSurvey

Process finished with exit code 0
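
The comment in the script says that only the page value needs to change to get further results. A minimal sketch of that idea follows, written for Python 3 rather than the Python 2 used above. The helper names build_payload, extract_titles and fetch_titles are hypothetical, the URL and form fields are copied from the script, and the pattern assumes that an HTML tag follows each title. Whether the site still answers this request the same way is not something this sketch can confirm.

```python
import re

# URL and form fields exactly as they appear in the script above
URL = 'https://www.crowdfunder.com/deals&template=false'

# Assumption: each title is followed by an HTML tag, so '<' ends the match
TITLE_RE = re.compile(r'"card-title">(.*?)<', re.S)


def build_payload(page):
    """Form fields for one page of results; only 'page' changes per request."""
    return {'entities_only': 'true', 'page': str(page)}


def extract_titles(html):
    """Return the whitespace-stripped text of every card-title match in html."""
    return [t.strip() for t in TITLE_RE.findall(html)]


def fetch_titles(page, post=None):
    """POST the payload for the given page and return the titles found.

    'post' is a hypothetical hook: any callable with the same shape as
    requests.post(url, data=...) can be passed in, which makes the function
    easy to try without network access.
    """
    if post is None:
        import requests  # imported lazily; only needed for real requests
        post = requests.post
    response = post(URL, data=build_payload(page))
    return extract_titles(response.text)
```

Calling fetch_titles(2) would request the second page, and looping over a range of page numbers would collect the titles page by page.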
Note: if the output fails because of an encoding mismatch, change the output encoding to gb18030. The code is: print each.encode("gb18030")
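
To illustrate why gb18030 helps here, the sketch below checks whether a string can be represented in a given encoding. The helper can_encode and the sample title are made up for illustration, and the code is written for Python 3. GB18030 can represent every Unicode character, so encoding to it does not raise UnicodeEncodeError, whereas a narrower encoding such as ASCII can.

```python
def can_encode(text, encoding):
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical title containing a trademark sign and two Chinese characters
sample = u'truBrain\u2122 \u4f17\u7b79'

print(can_encode(sample, 'ascii'))    # ASCII cannot represent these characters
print(can_encode(sample, 'gb18030'))  # GB18030 covers all of Unicode

# Encoding and decoding with the same codec gives back the original text
round_trip = sample.encode('gb18030').decode('gb18030')
print(round_trip == sample)
```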