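I recently wanted to try out Python's web-scraping libraries, so I picked a page that contains nothing but plain strings to scrape. The URL is as follows:
Opening it shows song titles, hashes, and some other fields. I save them to a file in hash|filename format. Here is the code first: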
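# coding=utf-8
# Python 2 script: urllib.urlopen became urllib.request.urlopen in Python 3.
import urllib
import re
import os

def getHtml(url):
    # Download the page and return its raw contents.
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getHash(html):
    # Capture the value of every "hash":"..." field, non-greedily.
    reg = r'"hash":"(.+?)",'
    has = re.compile(reg)
    hashlist = re.findall(has, html)
    with open('1.txt', 'w') as f: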