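The idea: read the URLs line by line, then use bs4 to extract the text inside the <a> tags, and build two lists. One list keeps collecting URLs via append(). The collected entries are then analyzed, and any that contain "http" are first written to a local file. Finally, the random module picks a few of them at random, and those are written out.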
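Here is a minimal sketch of the list-handling part of that plan. It assumes the hrefs have already been pulled out of the page. One list collects every entry, and the second keeps only the entries containing "http". Then `random.Random(seed).sample` picks a few of them to write out. The function names `split_links` and `save_random_pick` are hypothetical, and `io.StringIO` stands in for the local output file.

```python
import io
import random
from typing import Iterable, List, Optional, TextIO, Tuple


def split_links(hrefs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Append every link to one list and only those containing 'http' to the other."""
    all_links: List[str] = []
    http_links: List[str] = []
    for href in hrefs:
        href = href.strip()
        all_links.append(href)
        if "http" in href:  # also matches "https"
            http_links.append(href)
    return all_links, http_links


def save_random_pick(links: List[str], k: int, out: TextIO, seed: Optional[int] = None) -> List[str]:
    """Randomly pick up to k distinct links and write them, one per line, to `out`."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    chosen = rng.sample(links, min(k, len(links)))  # never ask for more than exist
    out.writelines(link + "\n" for link in chosen)
    return chosen


if __name__ == "__main__":
    hrefs = ["http://example.com/a", "/relative/path", "https://example.org/b", "#top"]
    everything, absolute = split_links(hrefs)
    buf = io.StringIO()  # hypothetical stand-in for the local output file
    picked = save_random_pick(absolute, 1, buf, seed=0)
    print(len(everything), absolute, picked)
```

To write to a real file, pass a handle opened in append mode, as the script below does with `url.txt`.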
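The basic usage of the random module is shown below. random.sample(seq, k) returns k distinct elements chosen at random from seq.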
import random
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # named nums to avoid shadowing the built-in list
sss = random.sample(nums, 6)  # pick 6 distinct elements at random
print(sss)
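The sketch below spells out a few properties of `random.sample` that matter when picking URLs. It samples without replacement, so no element is drawn twice. It raises `ValueError` if k exceeds the population size. `random.choices` is the with-replacement counterpart. The seed value 42 is arbitrary and only serves to make the run repeatable.

```python
import random

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

rng = random.Random(42)        # independent generator; the fixed seed makes runs repeatable
picked = rng.sample(nums, 6)   # 6 distinct elements: sampling WITHOUT replacement

# Requesting more elements than the population holds raises ValueError
too_many_failed = False
try:
    rng.sample(nums, 10)
except ValueError:
    too_many_failed = True

# random.choices samples WITH replacement, so the same element may appear more than once
with_repeats = rng.choices(nums, k=12)

print(picked, too_many_failed, with_repeats)
```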
The source code is as follows:
# coding: utf-8
import re
import requests
import time
from bs4 import BeautifulSoup as asp
import random

# Request header that identifies the client as Internet Explorer 7 on Windows 7
headeraa = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)',}
hansb = open('urllist.txt', 'r')  # urllist.txt holds the URLs to process, one per line
hanssb = hansb.readlines()  # note: every entry still ends with '\n'
hansb.close()
print(hanssb)
zhzhzh = open('url.txt', 'a+')  # output file, opened for appending
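The listing stops here. The sketch below illustrates the next step under stated assumptions. It pulls the href and text of every <a> tag out of a page and cleans up the lines returned by readlines(). It uses the standard library's `html.parser` in place of bs4 so that it runs without extra packages. With bs4, the usual route is `soup.find_all('a')` and then `a.get('href')` on each tag. The names `LinkCollector`, `extract_links`, and `read_url_list` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> tag; a stdlib stand-in for bs4's find_all('a')."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> currently open, if any
        self._text: List[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a missing href becomes ""
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Return (href, link text) for each <a> tag in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def read_url_list(lines: List[str]) -> List[str]:
    """readlines() keeps the trailing newline; strip it and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    page = '<p><a href="http://example.com/x">Example <b>X</b></a> <a href="/local">Local</a></p>'
    print(extract_links(page))
```

In the script above, each URL from `read_url_list(hanssb)` would then be fetched with `requests.get(url, headers=headeraa)`. The response's `.text` would be passed to the parser, and the resulting hrefs would go into the two lists described at the top.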