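Data collection is the first and most fundamental step in data visualization and analysis. The larger and cleaner the collected dataset, the more accurate the later analysis. Let's look at how to scrape data from Taobao.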
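Taobao loads its content dynamically. In the past we could scrape it in two ways: by analyzing its data API, or by driving a browser with the Selenium automation tool. Taobao now encrypts its API requests, which makes their pattern hard to work out, and it also detects and blocks Selenium. So we need a different approach.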
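Opening the browser's developer tools and inspecting the page shows that, surprisingly, the product data is stored right in the page source.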
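To make "the data is in the page source" concrete, here is a minimal sketch that pulls the five fields used later (shop name, product name, price, ship-from location, buyer count) out of page text with regular expressions. The sample HTML, the `pageData` variable and the JSON key names (`raw_title`, `view_price`, `item_loc`, `view_sales`, `nick`) are assumptions for illustration only. Check the real page source in the developer tools and adjust the patterns to the keys you actually see there.

```python
import re

# Hypothetical fragment of a search-result page; the real page embeds a much
# larger object and its key names may differ from these assumed ones.
SAMPLE_HTML = (
    '<script>var pageData = {"items":['
    '{"raw_title":"加厚棉袄","view_price":"129.00","item_loc":"浙江 杭州",'
    '"view_sales":"1000+人付款","nick":"店铺A"},'
    '{"raw_title":"短款棉服","view_price":"89.90","item_loc":"广东 广州",'
    '"view_sales":"56人付款","nick":"店铺B"}'
    ']};</script>'
)


def extract_fields(html):
    """Return five parallel lists of field values found in the page text."""
    # Non-greedy (.*?) stops at the closing quote of each value
    dianpumingcheng = re.findall(r'"nick":"(.*?)"', html)        # shop names
    shangpinming = re.findall(r'"raw_title":"(.*?)"', html)      # product names
    jiage = re.findall(r'"view_price":"(.*?)"', html)            # prices (strings)
    fahuodi = re.findall(r'"item_loc":"(.*?)"', html)            # ship-from locations
    fukuanrenshu = re.findall(r'"view_sales":"(.*?)"', html)     # buyer counts
    return dianpumingcheng, shangpinming, jiage, fahuodi, fukuanrenshu


if __name__ == '__main__':
    shops, titles, prices, locations, buyers = extract_fields(SAMPLE_HTML)
    print(titles, prices)
```

The variable names inside `extract_fields` mirror the pinyin names used by the storage code, so in the real scraper you would call it on `response.text`. Regular expressions are fragile if the page layout changes; if the embedded block turns out to be valid JSON, cutting it out and parsing it with `json.loads` is a sturdier alternative.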
With that established, we can start writing the scraper.
01 Import the libraries used by the scraper
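```python
import requests   # send HTTP requests
import re         # regular expressions for pulling data out of the page source
import time       # pause between requests
import random     # randomize the pause length
import openpyxl   # write the results to an Excel workbook
```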
02 Send the requests
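```python
# Create the workbook and a header row once, before the page loop
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(['Shop name', 'Product name', 'Price', 'Ship-from location', 'Buyers'])

# Assumed endpoint: the query parameters below are those of Taobao's search page
url = 'https://s.taobao.com/search'
# Taobao search usually requires a logged-in session: paste your own
# browser cookie here (placeholder value, not a real cookie)
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'YOUR_COOKIE_HERE',
}

for page in range(1, 101):  # result pages 1 to 100
    params = (
        ('q', '棉袄'),  # search keyword ("padded cotton jacket")
        ('imgfile', ''),
        ('commend', 'all'),
        ('ssid', 's5-e'),
        ('search_type', 'item'),
        ('sourceId', 'tb.index'),
        ('spm', 'a21bo.jianhua.201856-taobao-item.2'),
        ('ie', 'utf8'),
        ('initiative_id', 'tbindexz_20170306'),
        ('hintq', '1'),
        ('s', str((page - 1) * 44)),  # item offset: 44 per page, page 1 starts at 0
    )
    response = requests.get(url, headers=headers, params=params)
```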
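To reuse the request for other search terms, the fixed parameters can be separated from the two that change. This is a sketch of one possible refactor, not part of the original code: `BASE_PARAMS` and `build_params` are hypothetical names, the parameter values are copied from the request above, and it assumes that `s` is the zero-based offset of the first item and that each page holds 44 items.

```python
# Fixed query parameters, copied from the request above
BASE_PARAMS = (
    ('imgfile', ''),
    ('commend', 'all'),
    ('ssid', 's5-e'),
    ('search_type', 'item'),
    ('sourceId', 'tb.index'),
    ('spm', 'a21bo.jianhua.201856-taobao-item.2'),
    ('ie', 'utf8'),
    ('initiative_id', 'tbindexz_20170306'),
    ('hintq', '1'),
)


def build_params(keyword, page, per_page=44):
    """Query parameters for result page `page` (1-based) of a keyword search."""
    offset = (page - 1) * per_page  # assumed: `s` is the offset of the page's first item
    return (('q', keyword),) + BASE_PARAMS + (('s', str(offset)),)
```

Passing `build_params(keyword, page)` as the `params=` argument of `requests.get` replaces the hard-coded tuple, so changing the keyword no longer means editing the loop.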
03 Store the data
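```python
    # ...continuing inside the `for page in range(1, 101):` loop, after these
    # five lists have been extracted from response.text (step omitted here):
    # dianpumingcheng (shop names), shangpinming (product names), jiage (prices),
    # fahuodi (ship-from locations), fukuanrenshu (buyer counts)
    a = 0  # rows on this page that could not be written
    b = 0  # flag: 1 means the page looks blocked, so stop crawling
    for i in range(44):  # each result page lists 44 items
        try:
            sheet.append([dianpumingcheng[i], shangpinming[i], float(jiage[i]),
                          fahuodi[i], fukuanrenshu[i]])
        except (IndexError, ValueError):
            a += 1
            if a > 30:  # most rows missing: the page was probably blocked
                print(f"Page {page} was not scraped......")
                b = 1
                break
    if b == 1:
        break
    print(f"Finished scraping page {page}......")
    time.sleep(random.randint(3, 5))  # pause 3 to 5 seconds between pages

# Save once after the loop, whether it finished or stopped early;
# change the file name to whatever you like
wb.save('棉袄.xlsx')
print(f'Scraped {page} pages in total......')
```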
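The per-row error handling above can be pulled out into a pure function, which makes the "more than 30 bad rows means the page was blocked" rule testable without touching Excel. This is a sketch: `build_rows` is a hypothetical helper, and the defaults of 44 items per page and a threshold of 30 are simply the values the loop above uses.

```python
def build_rows(shops, titles, prices, locations, buyers, per_page=44, max_failures=30):
    """Turn the five parallel lists for one page into spreadsheet rows.

    Returns (rows, failed, blocked): the rows that parsed cleanly, how many of
    the per_page slots failed, and whether failures exceeded max_failures.
    """
    rows, failed = [], 0
    for i in range(per_page):
        try:
            rows.append([shops[i], titles[i], float(prices[i]), locations[i], buyers[i]])
        except (IndexError, ValueError):
            # IndexError: fewer than per_page items were extracted
            # ValueError: the price string is not a number
            failed += 1
    return rows, failed, failed > max_failures
```

Inside the page loop you would then write `for row in rows: sheet.append(row)` and stop crawling when `blocked` is true. Unlike the original loop, this version counts every failure instead of stopping at the 31st, which costs nothing for 44 items and gives a complete failure count for logging.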
04 Clean the data
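Once the data has been collected it must be cleaned: removing dirty records (duplicates, missing or malformed values) improves the accuracy of the analysis.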
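As a concrete example of cleaning, the sketch below assumes rows shaped like the spreadsheet rows above (shop, title, price, location, buyer-count text). It strips whitespace, drops rows without a positive price, drops duplicate shop/title pairs, and converts buyer-count strings into integers. The buyer-count formats it handles (`1000+人付款`, `1.5万+人付款`) are assumptions, so verify them against your own data; `parse_buyers` and `clean_rows` are hypothetical names.

```python
import re


def parse_buyers(text):
    """Convert text like '1000+人付款' or '1.5万+人付款' into an integer lower bound."""
    m = re.search(r'(\d+(?:\.\d+)?)(万?)', text or '')
    if not m:
        return None  # no number found
    value = float(m.group(1))
    if m.group(2) == '万':  # 万 = 10,000
        value *= 10000
    return int(value)


def clean_rows(rows):
    """Strip text fields, drop unusable prices and duplicate listings."""
    seen = set()
    cleaned = []
    for shop, title, price, location, buyers in rows:
        shop, title, location = shop.strip(), title.strip(), location.strip()
        if price is None or price <= 0:  # no usable price
            continue
        if (shop, title) in seen:  # same product from the same shop already kept
            continue
        seen.add((shop, title))
        cleaned.append([shop, title, price, location, parse_buyers(buyers)])
    return cleaned
```

Plain Python keeps the sketch dependency-free; for larger datasets pandas (`drop_duplicates`, `dropna`) is the more usual tool for the same steps.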
Sample data:
{
"item": {
"num_iid": "652874751412",
"title": "奶油风布艺沙发现代简约轻奢小户型客厅直排可拆洗沙发原木可定制",
"desc_short": "",
"price": 480,
"total_price": "",
"suggestive_price": "",
"orginal_price": 480,
"nick": "现代布艺沙发",
"num": 1515,
"detail_url": "https://item.taobao.com/item.htm?id=652874751412",
"pic_url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN01aYBriY1Jem9UDtt9e_!!2568161054.jpg",
"brand": "#0 工厂",
"brandId": "",
"rootCatId": "",
"cid": 50020632,
"desc": "<div > \n <div >\n <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01LFmSOU1Jem9QOjMPb_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN014vyOOT1Jem9DpHz3Y_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01B3PpsA1Jem9N8V7uf_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN015JbyeY1Jem9MZshUt_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HXSoxx1Jem9RvgzHN_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01IEultA1Jem9MdEx8R_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN0176K98O1Jem9QOjE69_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN013Pxp1O1Jem9RvgeTv_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01SfyZ8M1Jem9QOi1Gx_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01bb1POa1Jem9Sdgve2_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN018Eo9dV1Jem9KV0y79_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01vuEofr1Jem9Nzy9xY_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01qw9sAi1Jem8wkNKpy_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HeFhFw1Jem8rLnjBY_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01SNgjoi1Jem9QOil15_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01RXf3RA1Jem9DpHVwj_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01gZmZjt1Jem9ISThgm_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01YL0FHM1Jem9PQTjX9_!!2568161054.jpg\" />\n <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01UhsEhZ1Jem8yvJIhZ_!!2568161054.jpg\" />\n </div> \n </div><img src=\"https://www.o0b.cn/i.php?t.png&rid=gw-3.65ed52f0e2b1e&p=1778786415&k=i_key&t=1710052086\" style=\"display:none\" />",
"item_imgs": [
{
"url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN01aYBriY1Jem9UDtt9e_!!2568161054.jpg"
},
{
"url": "//img.alicdn.com/imgextra/i3/2568161054/O1CN01kjOfNb1Jem9DmWn8Y_!!2568161054.jpg"
},
{
"url": "//img.alicdn.com/imgextra/i1/2568161054/O1CN01HoB9ha1Jem9DmWn8r_!!2568161054.jpg"
},
{
"url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN011PjP2P1Jem9MXEUFT_!!2568161054.jpg"
},
{
"url": "//img.alicdn.com/imgextra/i3/2568161054/O1CN01KUfBFL1Jem9KTTMn1_!!2568161054.jpg"
}
],
"item_weight": "",
"post_fee": "0.00",
"freight": "",
"express_fee": "",
"ems_fee": "",
"shipping_to": "",
"video": {
"url": "https://cloud.video.taobao.com/play/u/2568161054/p/2/e/6/t/1/428224913062.mp4?appKey=38829"
},
"sample_id": "",
"props_name": "31480:32527954:几人坐:定制尺寸;31480:14306495909:几人坐:单人100*95*67cm;31480:21480914361:几人坐:四人位240*95*67cm;31480:21480914362:几人坐:大四人320*95*76cm;31480:14306495907:几人坐:双人165*95*67cm;31480:14306495908:几人坐:三人210*95*67cm;31480:14306495906:几人坐:脚踏90*60*48cm;31480:1387571900:几人坐:3米贵妃沙发;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"prop_imgs": {
"prop_img": [
{
"properties": "1627207:28321",
"url": "http://img.alicdn.com/imgextra/i1/2568161054/O1CN017GTZ4h1Jem9Qra1ap_!!2568161054.jpg"
}
]
},
"props_imgs": {
"prop_img": [
{
"properties": "1627207:28321",
"url": "http://img.alicdn.com/imgextra/i1/2568161054/O1CN017GTZ4h1Jem9Qra1ap_!!2568161054.jpg"
}
]
},
"property_alias": "",
"props": [
{
"name": "品牌",
"value": "#0 工厂"
},
{
"name": "型号",
"value": "520"
},
{
"name": "材质",
"value": "木"
},
{
"name": "木质材质",
"value": "松木"
},
{
"name": "面料",
"value": "绒布"
},
{
"name": "风格",
"value": "北欧"
},
{
"name": "几人坐",
"value": "脚踏90*60*48cm,双人165*95*67cm,三人210*95*67cm,单人100*95*67cm,四人位240*95*67cm,大四人320*95*76cm,3米贵妃沙发,定制尺寸"
},
{
"name": "颜色分类",
"value": "乳白色"
},
{
"name": "填充物",
"value": "海绵"
},
{
"name": "结构工艺",
"value": "木质工艺"
},
{
"name": "是否可定制",
"value": "是"
},
{
"name": "沙发组合形式",
"value": "U形"
},
{
"name": "是否可拆洗",
"value": "是"
},
{
"name": "适用对象",
"value": "成年人"
},
{
"name": "是否带储物空间",
"value": "否"
},
{
"name": "产地",
"value": "上海"
},
{
"name": "地市",
"value": "上海市"
},
{
"name": "区县",
"value": "奉贤区"
},
{
"name": "是否组装",
"value": "否"
},
{
"name": "出租车是否可运输",
"value": "否"
},
{
"name": "填充物硬度",
"value": "软"
},
{
"name": "款式定位",
"value": "经济型"
}
],
"total_sold": "-1",
"skus": {
"sku": [
{
"price": 3000,
"total_price": 0,
"orginal_price": 3000,
"properties": "31480:32527954;1627207:28321",
"properties_name": "31480:32527954:几人坐:定制尺寸;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "5039985183003"
},
{
"price": 968,
"total_price": 0,
"orginal_price": 968,
"properties": "31480:14306495909;1627207:28321",
"properties_name": "31480:14306495909:几人坐:单人100*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "4881047531346"
},
{
"price": 2388,
"total_price": 0,
"orginal_price": 2388,
"properties": "31480:21480914361;1627207:28321",
"properties_name": "31480:21480914361:几人坐:四人位240*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "5039985183001"
},
{
"price": 3188,
"total_price": 0,
"orginal_price": 3188,
"properties": "31480:21480914362;1627207:28321",
"properties_name": "31480:21480914362:几人坐:大四人320*95*76cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "5039985183002"
},
{
"price": 1688,
"total_price": 0,
"orginal_price": 1688,
"properties": "31480:14306495907;1627207:28321",
"properties_name": "31480:14306495907:几人坐:双人165*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 129,
"sku_id": "4881047531344"
},
{
"price": 2088,
"total_price": 0,
"orginal_price": 2088,
"properties": "31480:14306495908;1627207:28321",
"properties_name": "31480:14306495908:几人坐:三人210*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 186,
"sku_id": "4881047531345"
},
{
"price": 480,
"total_price": 0,
"orginal_price": 480,
"properties": "31480:14306495906;1627207:28321",
"properties_name": "31480:14306495906:几人坐:脚踏90*60*48cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "4881047531343"
},
{
"price": 3400,
"total_price": 0,
"orginal_price": 3400,
"properties": "31480:1387571900;1627207:28321",
"properties_name": "31480:1387571900:几人坐:3米贵妃沙发;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
"quantity": 200,
"sku_id": "5039984824000"
}
]
},
"seller_id": "2568161054",
"sales": 0,
"shop_id": "567158267.",
"props_list": {
"31480:14306495906": "几人坐:脚踏90*60*48cm",
"31480:14306495907": "几人坐:双人165*95*67cm",
"31480:14306495908": "几人坐:三人210*95*67cm",
"31480:14306495909": "几人坐:单人100*95*67cm",
"31480:21480914361": "几人坐:四人位240*95*67cm",
"31480:21480914362": "几人坐:大四人320*95*76cm",
"31480:1387571900": "几人坐:3米贵妃沙发",
"31480:32527954": "几人坐:定制尺寸",
"1627207:28321": "颜色分类:乳白色 尺寸颜色可定制"
},
"seller_info": {
"nick": "现代布艺沙发",
"item_score": null,
"score_p": null,
"delivery_score": null,
"shop_type": "",
"user_num_id": "2568161054",
"sid": null,
"title": "",
"zhuy": "https://shop567158267..taobao.com",
"cert": null,
"open_time": "",
"credit_score": null,
"shop_name": "现代布艺沙发"
},
"tmall": false,
"error": "",
"location": "江苏南通",
"data_from": "ha",
"has_discount": "false",
"is_promotion": "false",
"promo_type": null,
"props_img": {
"1627207:28321": "http://img.alicdn.com/imgextra/i1/2568161054/O1CN017GTZ4h1Jem9Qra1ap_!!2568161054.jpg"
},
"format_check": "ok",
"desc_img": [
"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01LFmSOU1Jem9QOjMPb_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i3/2568161054/O1CN014vyOOT1Jem9DpHz3Y_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01B3PpsA1Jem9N8V7uf_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i2/2568161054/O1CN015JbyeY1Jem9MZshUt_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HXSoxx1Jem9RvgzHN_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01IEultA1Jem9MdEx8R_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i3/2568161054/O1CN0176K98O1Jem9QOjE69_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i4/2568161054/O1CN013Pxp1O1Jem9RvgeTv_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01SfyZ8M1Jem9QOi1Gx_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01bb1POa1Jem9Sdgve2_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i3/2568161054/O1CN018Eo9dV1Jem9KV0y79_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01vuEofr1Jem9Nzy9xY_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01qw9sAi1Jem8wkNKpy_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HeFhFw1Jem8rLnjBY_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01SNgjoi1Jem9QOil15_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01RXf3RA1Jem9DpHVwj_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01gZmZjt1Jem9ISThgm_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01YL0FHM1Jem9PQTjX9_!!2568161054.jpg",
"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01UhsEhZ1Jem8yvJIhZ_!!2568161054.jpg"
],
"shop_item": [],
"relate_items": []
},
"error": "",
"secache": "4ad7ad2480af253fec9c2fd4daa266bb",
"secache_time": 1710052086,
"secache_date": "2024-03-10 14:28:06",
"translate_status": "",
"translate_time": 0,
"language": {
"default_lang": "cn",
"current_lang": "cn"
},