如题,此次为urllib爬取淘宝网零食板块的代码,并将其写入txt文件中:
一、进入淘宝网-查看源代码
二、通过源码得出正则表达式
- title_pat = '"raw_title":"(.*?)",'
- price_pat = '"view_price":"(.*?)",'
- loc_pat = '"item_loc":"(.*?)",'
- sale_pat = '"view_sales":"(.*?)人付款",'
- comment_pat = '"comment_count":"(.*?)",'
三、书写代码进行爬取
- import re
- import urllib.request
- import urllib.error
- import pandas as pd