Python web scraping with requests_html: scraping ID card information (fake data)

Here is the full source code, ready to copy, paste, and run.


```python
from requests_html import HTMLSession

# One shared session for every request
session = HTMLSession()

# XPath of the i-th row in the second table of a detail page
ROW_XPATH = "//table[2][@class='table']/tbody/tr[{}]"


def listlink():
    """Return the absolute links found in the list on the index page."""
    r = session.get('http://sfzdq.uzuzuz.com/sfz/510000.html')
    a = r.html.xpath("//ul[@class='list-group']", first=True).absolute_links
    return list(a)


def huoqu():
    """Print name, ID number, age, sex and address for every table row."""
    for url in listlink():
        r = session.get(url)
        for i in range(1, 16):  # rows 1 to 15 of the table
            row = ROW_XPATH.format(i)
            cells = [r.html.xpath(row + "/td[{}]".format(j), first=True)
                     for j in range(1, 6)]
            # xpath(..., first=True) returns None when nothing matches,
            # so skip incomplete rows instead of crashing on .text
            if any(c is None for c in cells):
                continue
            name, id_number, age, sex, address = (c.text for c in cells)
            print(name, id_number, age, sex, address)


if __name__ == '__main__':
    huoqu()
```

Writing to a CSV file

```python
import csv

from requests_html import HTMLSession

# One shared session for every request
session = HTMLSession()

# XPath of the i-th row in the second table of a detail page
ROW_XPATH = "//table[2][@class='table']/tbody/tr[{}]"


def listlink():
    """Return the absolute links found in the list on the index page."""
    r = session.get('http://sfzdq.uzuzuz.com/sfz/510000.html')
    a = r.html.xpath("//ul[@class='list-group']", first=True).absolute_links
    return list(a)


def huoqu():
    """Write one CSV row per table row to 身份证信息.csv."""
    # newline='' keeps the csv module from adding blank lines on Windows
    with open('身份证信息.csv', 'w', encoding='utf-8', newline='') as f:
        csv_writer = csv.writer(f)
        # Header: name, ID number, age, sex, address (same order as the rows)
        csv_writer.writerow(["姓名", "身份证号", "年龄", "性别", "地址"])
        for url in listlink():
            r = session.get(url)
            for i in range(1, 16):  # rows 1 to 15 of the table
                row = ROW_XPATH.format(i)
                cells = [r.html.xpath(row + "/td[{}]".format(j), first=True)
                         for j in range(1, 6)]
                # Skip rows where a cell is missing
                if any(c is None for c in cells):
                    continue
                # Save the row to the local CSV file
                csv_writer.writerow([c.text for c in cells])


if __name__ == '__main__':
    huoqu()
```
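
The CSV-writing step can be tried on its own, without any network access. The sketch below is only an illustration: the helper name `rows_to_csv`, the English header, and the placeholder rows are assumptions and not part of the original script. It writes into an in-memory buffer, so nothing is created on disk.

```python
import csv
import io

# Hypothetical placeholder rows, in the same column order as the scraper:
# name, ID number, age, sex, address
rows = [
    ["name-1", "id-1", "30", "M", "address-1"],
    ["name-2", "id-2", "25", "F", "address, with a comma"],
]


def rows_to_csv(rows, header):
    """Serialize a header line plus data rows to CSV text with csv.writer."""
    # newline="" leaves line endings to the csv module
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)   # one header cell per data column
    writer.writerows(rows)    # fields containing commas are quoted automatically
    return buffer.getvalue()


text = rows_to_csv(rows, ["name", "id", "age", "sex", "address"])
print(text)
```

When writing to a real file instead of a buffer, pass `newline=''` to `open()` for the same reason, and keep the header the same length as each row so the columns line up.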

