Working with Data: Methods of Data Collection in Python


Data collection

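"All numbers are data" and "images, letters, and text are all data" are common sayings, but they hold only under one condition: numbers, images, text, and audio count as data as long as they carry some information. A number that carries no information is not data. Understanding this distinction is the first step into big data.

Many beginners find data scraping technically difficult: their code throws errors easily, which slows down their work. So how should a newcomer write crawler code? This article walks through it in detail, and I hope it helps.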

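Python data collection through API interfaces

Application Programming Interface (API)

1. Extract the product ID through the API, then parse the data with regular expressions and save it to CSV

Retrieving product information through the API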

{
	"item": {
		"num_iid": "652874751412",
		"title": "奶油风布艺沙发现代简约轻奢小户型客厅直排可拆洗沙发原木可定制",
		"desc_short": "",
		"price": 480,
		"total_price": "",
		"suggestive_price": "",
		"orginal_price": 480,
		"nick": "惜情yqq1127",
		"num": 1515,
		"detail_url": "https://item.taobao.com/item.htm?id=652874751412",
		"pic_url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN01aYBriY1Jem9UDtt9e_!!2568161054.jpg",
		"brand": "#0 工厂",
		"brandId": "",
		"rootCatId": "",
		"cid": 50020632,
		"desc": "<div > \n   <div >\n    <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01LFmSOU1Jem9QOjMPb_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN014vyOOT1Jem9DpHz3Y_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01B3PpsA1Jem9N8V7uf_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN015JbyeY1Jem9MZshUt_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HXSoxx1Jem9RvgzHN_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN01IEultA1Jem9MdEx8R_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN0176K98O1Jem9QOjE69_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN013Pxp1O1Jem9RvgeTv_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01SfyZ8M1Jem9QOi1Gx_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01bb1POa1Jem9Sdgve2_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i3/2568161054/O1CN018Eo9dV1Jem9KV0y79_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01vuEofr1Jem9Nzy9xY_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01qw9sAi1Jem8wkNKpy_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i1/2568161054/O1CN01HeFhFw1Jem8rLnjBY_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01SNgjoi1Jem9QOil15_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01RXf3RA1Jem9DpHVwj_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01gZmZjt1Jem9ISThgm_!!2568161054.jpg\"  />\n    <img src=\"http://img.alicdn.com/imgextra/i2/2568161054/O1CN01YL0FHM1Jem9PQTjX9_!!2568161054.jpg\"  />\n    <img 
src=\"http://img.alicdn.com/imgextra/i4/2568161054/O1CN01UhsEhZ1Jem8yvJIhZ_!!2568161054.jpg\"  />\n   </div> \n  </div><img src=\"https://www.o0b.cn/i.php?t.png&rid=gw-1.65f69faa75a5a&p=1778787634&k=i_key&t=1710661549\" style=\"display:none\" />",
		"item_imgs": [
			{
				"url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN01aYBriY1Jem9UDtt9e_!!2568161054.jpg"
			},
			{
				"url": "//img.alicdn.com/imgextra/i3/2568161054/O1CN01kjOfNb1Jem9DmWn8Y_!!2568161054.jpg"
			},
			{
				"url": "//img.alicdn.com/imgextra/i1/2568161054/O1CN01HoB9ha1Jem9DmWn8r_!!2568161054.jpg"
			},
			{
				"url": "//img.alicdn.com/imgextra/i4/2568161054/O1CN011PjP2P1Jem9MXEUFT_!!2568161054.jpg"
			},
			{
				"url": "//img.alicdn.com/imgextra/i3/2568161054/O1CN01KUfBFL1Jem9KTTMn1_!!2568161054.jpg"
			}
		],
		"item_weight": "",
		"post_fee": 0,
		"freight": "",
		"express_fee": "",
		"ems_fee": "",
		"shipping_to": "",
		"video": {
			"url": "https://cloud.video.taobao.com/play/u/2568161054/p/2/e/6/t/1/428224913062.mp4?appKey=38829"
		},
		"sample_id": "",
		"props_name": "31480:14306495906:几人坐:脚踏90*60*48cm;31480:14306495907:几人坐:双人165*95*67cm;31480:14306495908:几人坐:三人210*95*67cm;31480:14306495909:几人坐:单人100*95*67cm;31480:21480914361:几人坐:四人位240*95*67cm;31480:21480914362:几人坐:大四人320*95*76cm;31480:1387571900:几人坐:3米贵妃沙发;31480:32527954:几人坐:定制尺寸;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
		"prop_imgs": {
			"prop_img": [
				{
					"properties": "1627207:28321",
					"url": "http://img.alicdn.com/imgextra/i1/2568161054/O1CN017GTZ4h1Jem9Qra1ap_!!2568161054.jpg"
				}
			]
		},
		"props_imgs": {
			"prop_img": [
				{
					"properties": "1627207:28321",
					"url": "http://img.alicdn.com/imgextra/i1/2568161054/O1CN017GTZ4h1Jem9Qra1ap_!!2568161054.jpg"
				}
			]
		},
		"property_alias": "",
		"props": [
			{
				"name": "品牌",
				"value": "#0 工厂"
			},
			{
				"name": "型号",
				"value": "520"
			},
			{
				"name": "材质",
				"value": "木"
			},
			{
				"name": "木质材质",
				"value": "松木"
			},
			{
				"name": "面料",
				"value": "绒布"
			},
			{
				"name": "风格",
				"value": "北欧"
			},
			{
				"name": "几人坐",
				"value": "脚踏90*60*48cm,双人165*95*67cm,三人210*95*67cm,单人100*95*67cm,四人位240*95*67cm,大四人320*95*76cm,3米贵妃沙发,定制尺寸"
			},
			{
				"name": "颜色分类",
				"value": "乳白色"
			},
			{
				"name": "填充物",
				"value": "海绵"
			},
			{
				"name": "结构工艺",
				"value": "木质工艺"
			},
			{
				"name": "是否可定制",
				"value": "是"
			},
			{
				"name": "沙发组合形式",
				"value": "U形"
			},
			{
				"name": "是否可拆洗",
				"value": "是"
			},
			{
				"name": "适用对象",
				"value": "成年人"
			},
			{
				"name": "是否带储物空间",
				"value": "否"
			},
			{
				"name": "产地",
				"value": "上海"
			},
			{
				"name": "地市",
				"value": "上海市"
			},
			{
				"name": "区县",
				"value": "奉贤区"
			},
			{
				"name": "是否组装",
				"value": "否"
			},
			{
				"name": "出租车是否可运输",
				"value": "否"
			},
			{
				"name": "填充物硬度",
				"value": "软"
			},
			{
				"name": "款式定位",
				"value": "经济型"
			}
		],
		"total_sold": "-1",
		"skus": {
			"sku": [
				{
					"price": 480,
					"total_price": 0,
					"orginal_price": 480,
					"properties": "31480:14306495906;1627207:28321",
					"properties_name": "31480:14306495906:几人坐:脚踏90*60*48cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
					"quantity": 200,
					"sku_id": "4881047531343"
				},
				{
					"price": 1688,
					"total_price": 0,
					"orginal_price": 1688,
					"properties": "31480:14306495907;1627207:28321",
					"properties_name": "31480:14306495907:几人坐:双人165*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
					"quantity": 129,
					"sku_id": "4881047531344"
				},
				{
					"price": 2088,
					"total_price": 0,
					"orginal_price": 2088,
					"properties": "31480:14306495908;1627207:28321",
					"properties_name": "31480:14306495908:几人坐:三人210*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
					"quantity": 186,
					"sku_id": "4881047531345"
				},
				{
					"price": 968,
					"total_price": 0,
					"orginal_price": 968,
					"properties": "31480:14306495909;1627207:28321",
					"properties_name": "31480:14306495909:几人坐:单人100*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
					"quantity": 200,
					"sku_id": "4881047531346"
				},
				{
					"price": 2388,
					"total_price": 0,
					"orginal_price": 2388,
					"properties": "31480:21480914361;1627207:28321",
					"properties_name": "31480:21480914361:几人坐:四人位240*95*67cm;1627207:28321:颜色分类:乳白色 尺寸颜色可定制",
					"quantity": 200,
					"sku_id": "5039985183001"
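
The `properties_name` strings in the response above appear to pack each SKU attribute as `property_id:value_id:name:value`, with `;` between attributes. A minimal sketch of splitting one such string into a dictionary, assuming that layout holds for every SKU (the helper name `parse_properties_name` is hypothetical and not part of any API):

```python
def parse_properties_name(text):
    """Split a 'pid:vid:name:value;...' string into a {name: value} dict."""
    result = {}
    for part in text.split(";"):
        if not part:
            continue
        # maxsplit=3 keeps any later ':' inside the value intact.
        fields = part.split(":", 3)
        if len(fields) != 4:
            continue  # skip fragments that do not match the assumed layout
        _pid, _vid, name, value = fields
        result[name] = value
    return result


# One properties_name value copied from the response above.
sample = (
    "31480:14306495907:几人坐:双人165*95*67cm;"
    "1627207:28321:颜色分类:乳白色 尺寸颜色可定制"
)
parsed = parse_properties_name(sample)
print(parsed)
```

Keying the result by attribute name makes it straightforward to turn each SKU into one flat row later, at the cost of dropping the numeric IDs.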

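APIs provide convenient, friendly interfaces for different applications. Developers can build software with different architectures, or even in different languages, without any problem, because an API is designed to act as a common language that lets different programs share information. Fetching data through an API is one approach to big data collection, and it is also the simplest part of crawler (spider) technology.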
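
Step 1 above (parse the response with regular expressions and save it to CSV) can be sketched as follows. This is only an illustration under stated assumptions: `raw` is a shortened copy of the response shown earlier, and the field list, the `extract_fields` helper, and the `items.csv` file name are hypothetical. For a complete, well-formed response, `json.loads` is usually more robust than regular expressions.

```python
import csv
import re

# Shortened copy of the API response shown earlier (assumption: real
# responses keep the same key names).
raw = '''
{
	"item": {
		"num_iid": "652874751412",
		"title": "奶油风布艺沙发现代简约轻奢小户型客厅直排可拆洗沙发原木可定制",
		"price": 480,
		"nick": "惜情yqq1127",
		"num": 1515,
		"detail_url": "https://item.taobao.com/item.htm?id=652874751412"
	}
}
'''

# Hypothetical choice of columns for the CSV file.
FIELDS = ["num_iid", "title", "price", "detail_url"]


def extract_fields(text, fields=FIELDS):
    """Return the first value found for each key, or '' if the key is absent."""
    row = {}
    for key in fields:
        # Match "key": "string" or "key": number; group 1 or 2 holds the value.
        pattern = r'"%s"\s*:\s*(?:"([^"]*)"|(-?\d+(?:\.\d+)?))' % re.escape(key)
        match = re.search(pattern, text)
        if match is None:
            row[key] = ""
        elif match.group(1) is not None:
            row[key] = match.group(1)
        else:
            row[key] = match.group(2)
    return row


row = extract_fields(raw)

# utf-8-sig writes a BOM so that Excel detects the encoding of Chinese text.
with open("items.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow(row)

print(row)
```

Note that the regular expression returns every value as a string, so `price` is written as `480` in text form; convert it explicitly if numeric handling is needed.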
