Scraping Tmall product details (and review data)

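Today I tried scraping Tmall. I didn't write anything systematic; I just looked at how to get the various pieces of data from each page.
The product detail page caused some trouble, so I'm writing it down here.
If you request the link taken from the listing page directly, the result has no price information. I didn't check whether anything else was missing.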

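After a lot of trial and error, I found that you have to pull a different JS link out of the page. The data that link returns does include the price information.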
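Pulling that link out of the page source could look roughly like the sketch below. It assumes the link appears in the HTML as an absolute or protocol-relative URL containing `initItemDetail.htm` (the endpoint used in the request that follows); the name `find_detail_url` and the pattern are hypothetical, and the real page may embed the link differently.

```python
import html
import re
from typing import Optional

# Assumption: the link sits in the page source as a quoted URL that
# contains "initItemDetail.htm". Adjust the pattern if the page differs.
DETAIL_URL_RE = re.compile(r'(?:https?:)?//[^"\'\s]*initItemDetail\.htm[^"\'\s]*')


def find_detail_url(page_source: str) -> Optional[str]:
    """Return the first initItemDetail link found in the HTML, or None."""
    match = DETAIL_URL_RE.search(page_source)
    if match is None:
        return None
    # Undo HTML escaping such as &amp; inside the attribute or script text.
    link = html.unescape(match.group(0))
    # Protocol-relative links need an explicit scheme before requesting them.
    if link.startswith("//"):
        link = "https:" + link
    return link
```

The function only locates the link. The cookies and headers still have to be sent with the follow-up request.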


import requests
url='https://mdskip.taobao.com/core/initItemDetail.htm?isUseInventoryCenter=false&cartEnable=true&service3C=false&isApparel=false&isSecKill=false&tmallBuySupport=true&isAreaSell=false&tryBeforeBuy=false&offlineShop=false&itemId=607257943493&showShopProm=false&isPurchaseMallPage=false&itemGmtModified=1576527683000&isRegionLevel=false&household=false&sellerPreview=false&queryMemberRight=true&addressLevel=2&isForbidBuyItem=false&callback=setMdskip&timestamp=1578195009673&isg=dBSdxmE7QgYv0fA0BOCChurza779RIRbSuPzaNbMi_5dO1Tss5QOoxN69e96cjWATq8B45113t2TuFbuJg1-vjhOwTvQb4M2B&isg2=BL6-w5IEbHgccrhADSRzDpUCD9TAV4JxIAO33WjHDoH8C1_l0IzfibWqg5diNnqR&areaId=510100&cat_id=2'
headers={
    'Referer':'https://mdskip.taobao.com//core/initItemDetail.htm/_____tmd_____/punish?x5secdata=5e0c8e1365474455070961b803bd560607b52cabf5960afff39b64ce58073f78005654c1c031882a4c6dbedc85c51a441e3b919afa3298cff90c7626668fde860480def1935cf7544236ad19f2057552faa04c5a4741d78a3444916b235ae29cba45bd36bb8e49de97f26cfdecdbf948396052f1caa3b074546afe1c63fda94f00013ede75fd2a9d8eb3665574184336b45fc8a83fb7899cb8ec1e17b434d60fc4f66162bb2f483ccf2b55d158c298559fdc7b6ce8d2a594959dc501c6600df14872d54e92099cf7195680d2ba3b88511f76a4dbb2b594f8c93b60b948d1702fc695fdfb4765ce3b35f862ccc49a7ddbc070bd41eaf21a1d470b225d2dd40c0cafb3f59c461c51b8d9da168f0e68f989878b25517da9db5e2a0f7f0a1b8e6130c9c58bbe9ca0d4667afa0e550cc8ca351677f0472a23701cb860d0d41b647a37c8248933146442ba6ff7f958e4788f6268332ab21102bb58aa52e29810b4b19c0c1df8c88bfca12a767e0118f8a56142989ab47c351d91b6c92135d43282b79e7761d10039ab8ebcc28bec399e580b54b24749603dfa93a95d2702cbc2acfc327f5f8a7739d78cacb54f2d194ad3c969af6c5a446d8cd63d7eaa38b14d6c7a048ddf8ade6b164b9479e80cc95859b3f7&x5step=2',
    'Sec-Fetch-Mode': 'no-cors',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
    'cookie':'t=6a61f92ccbd970e84526ac1f5f16b5ca; UM_distinctid=16dc4a60e3b604-066db812675b97-386a410b-100200-16dc4a60e3c2c6; miid=248548291090120761; tk_trace=oTRxOWSBNwn9dPyorMJE%2FoPdY8zfvmw%2Fq5hmCUfRWfRinUXBTBL%2BCO%2FNzRpSKWNvEzD5AocUKglf3lGWFQAqmEdNHvK6DuuBIN20rHgsBPCaU9DrSMoukYv0XWHaEJsmT0d7YdHvMiCLs6Jt7vIT%2BqnJvRzOAaMVzZJjOKEECr5uohFVYvvaBN7MUJjRsGU3g8mhPnv%2BHhlfUAnfGnZeMe2yT2M%2BWxPlTYMMlH0gytXsLZb%2BkDuGefoQG7caVJjdMuCN2Voc67K%2FWCfrEG25oRrsog%3D%3D; cookie2=120e0680b568ad50f0c72733499e7956; _tb_token_=fed3d4e460706; v=0; cna=8tEpFixmOiICAW64zk6EFHTi; unb=2044069096; uc1=existShop=false&pas=0&tag=8&lng=zh_CN&cookie14=UoTbldM7nmU%2Fvw%3D%3D&cookie16=VT5L2FSpNgq6fDudInPRgavC%2BQ%3D%3D&cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&cookie21=URm48syIZJwTkNGk7euL6g%3D%3D; uc3=id2=UUjViSNYfbKLAw%3D%3D&lg2=Vq8l%2BKCLz3%2F65A%3D%3D&vt3=F8dBxdgpCfFJT%2BAlFmA%3D&nk2=D85B8wMn1%2B%2BGl1pynPVn; csg=b2beedab; lgc=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; cookie17=UUjViSNYfbKLAw%3D%3D; dnk=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; skt=ed8770349f9c6948; existShop=MTU3ODE4ODE2NQ%3D%3D; uc4=nk4=0%40De3W3b9EisQm60ydmctSvpRH%2FFEV6sw56RY%3D&id4=0%40U2o3vUzmEscWShqVUst018sE7Tjc; tracknick=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; _cc_=U%2BGCWk%2F7og%3D%3D; tg=0; _l_g_=Ug%3D%3D; sg=%E8%85%B063; _nk_=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; cookie1=AnCBIq9bOXVNWWlLCosHrWeRXuCnIYtL8x9Jt2vZR%2Bo%3D; enc=zoiogTkjkBtYr7w9dRq32fT7A2QRUm%2BmihpGKejzoHJ5V7bpTmXohUbHW0hPAqmdvvJ5kjRvIiN3BOvq8xuTmg%3D%3D; isg=BLy8y4Bjzl7KS_oCe0icPF42jVquHWDX1pn1I5Y9w6eKYVzrvsFqbzPQRcm8LJg3; l=dBxkNf0qQLHcPYBTBOCaourza77TIIRYSuPzaNbMi_5Qg6Ts_-bOoxN1tF96cjWf9lTB45113tv9-etk2UMqWXSpXUJ6nxDc.; ucn=center; x5sec=7b226d616c6c64657461696c736b69703b32223a226665386164646134356164313163333737633735623463313535353734383134434a793078664146454f37653662762b322f436548786f4d4d6a41304e4441324f5441354e6a7331227d'
}
# The URL carries callback=setMdskip, so the body is JSONP wrapped in setMdskip(...)
rq = requests.get(url, headers=headers)
print(rq.text)
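Because the URL asks for a `callback`, the body is expected to be JSONP rather than plain JSON. The sketch below strips the wrapper before parsing. It assumes the response has the shape `name({...})`; the helper name `parse_jsonp` is made up for this example, and the fields inside the payload are not shown in this post, so none are assumed here.

```python
import json
import re

# Matches "callbackName( ...payload... )" with an optional trailing semicolon.
JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_jsonp(text: str):
    """Strip a JSONP wrapper and return the decoded JSON payload."""
    match = JSONP_RE.match(text)
    if match is None:
        # For example an HTML page came back instead of the expected data.
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))
```

With the request above this would be called as `data = parse_jsonp(rq.text)`. If the server returns something in a different shape, the function raises instead of returning partial data.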

Just a quick note. Once you know how to get the real link, the rest is easy.

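While looking for references online, I found that most posts are about scraping reviews. After lunch I'll try review scraping and see where the difficulty lies.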

Update, January 5, 2020

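I looked into it after lunch. The solution is the same as for the detail page above: there is a JS link inside the page.
The code is the same as above; just swap in the new link. Note that the URL contains a currentPage parameter; changing its value gives you pagination.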

import requests
from pprint import pprint
url='https://rate.tmall.com/list_detail_rate.htm?itemId=597087667969&spuId=1251197212&sellerId=1786613187&order=3&currentPage=1&append=0&content=1&tagId=&posi=&picture=&groupId=&ua=098%23E1hvcpvnvRgvUvCkvvvvvjiPRsLUQjYbRLshgjljPmP9tjrRRLsv1jlWP2FWAjnvRphvChCvvvmCvpvWzC16cQ2NznswOTB4dphvmZC233pIvhCgITwCvvBvpvpZRphvChCvvvvPvpvhvv2MMQhCvvXvovvvvvmtvpvIphvvvvvvphCvpCBXvvCv3yCvHHyvvhn2phvZ7pvvpiivpCBXvvCmeuyCvv3vpvoYRkknCgyCvvXmp99het%2BEvpCWpxtBv8RKNxGw4w2UVC%2Bw4cCmsWQKK5CmwBx1lB9XahOw4w2UeCQw4cCmsRvKK5CmwyB1lB9XwBvw4w2UVXIw4cCmsnpKK5Cmah11lOwCvvBvppvvdphvmZCm%2BCoEvhCTQ9%3D%3D&needFold=0&_ksTS=1578204466022_611&callback=jsonp612'
headers={
    'Referer':'https://mdskip.taobao.com//core/initItemDetail.htm/_____tmd_____/punish?x5secdata=5e0c8e1365474455070961b803bd560607b52cabf5960afff39b64ce58073f78005654c1c031882a4c6dbedc85c51a441e3b919afa3298cff90c7626668fde860480def1935cf7544236ad19f2057552faa04c5a4741d78a3444916b235ae29cba45bd36bb8e49de97f26cfdecdbf948396052f1caa3b074546afe1c63fda94f00013ede75fd2a9d8eb3665574184336b45fc8a83fb7899cb8ec1e17b434d60fc4f66162bb2f483ccf2b55d158c298559fdc7b6ce8d2a594959dc501c6600df14872d54e92099cf7195680d2ba3b88511f76a4dbb2b594f8c93b60b948d1702fc695fdfb4765ce3b35f862ccc49a7ddbc070bd41eaf21a1d470b225d2dd40c0cafb3f59c461c51b8d9da168f0e68f989878b25517da9db5e2a0f7f0a1b8e6130c9c58bbe9ca0d4667afa0e550cc8ca351677f0472a23701cb860d0d41b647a37c8248933146442ba6ff7f958e4788f6268332ab21102bb58aa52e29810b4b19c0c1df8c88bfca12a767e0118f8a56142989ab47c351d91b6c92135d43282b79e7761d10039ab8ebcc28bec399e580b54b24749603dfa93a95d2702cbc2acfc327f5f8a7739d78cacb54f2d194ad3c969af6c5a446d8cd63d7eaa38b14d6c7a048ddf8ade6b164b9479e80cc95859b3f7&x5step=2',
    'Sec-Fetch-Mode': 'no-cors',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
    'cookie':'t=6a61f92ccbd970e84526ac1f5f16b5ca; UM_distinctid=16dc4a60e3b604-066db812675b97-386a410b-100200-16dc4a60e3c2c6; miid=248548291090120761; tk_trace=oTRxOWSBNwn9dPyorMJE%2FoPdY8zfvmw%2Fq5hmCUfRWfRinUXBTBL%2BCO%2FNzRpSKWNvEzD5AocUKglf3lGWFQAqmEdNHvK6DuuBIN20rHgsBPCaU9DrSMoukYv0XWHaEJsmT0d7YdHvMiCLs6Jt7vIT%2BqnJvRzOAaMVzZJjOKEECr5uohFVYvvaBN7MUJjRsGU3g8mhPnv%2BHhlfUAnfGnZeMe2yT2M%2BWxPlTYMMlH0gytXsLZb%2BkDuGefoQG7caVJjdMuCN2Voc67K%2FWCfrEG25oRrsog%3D%3D; cookie2=120e0680b568ad50f0c72733499e7956; _tb_token_=fed3d4e460706; v=0; cna=8tEpFixmOiICAW64zk6EFHTi; unb=2044069096; uc1=existShop=false&pas=0&tag=8&lng=zh_CN&cookie14=UoTbldM7nmU%2Fvw%3D%3D&cookie16=VT5L2FSpNgq6fDudInPRgavC%2BQ%3D%3D&cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&cookie21=URm48syIZJwTkNGk7euL6g%3D%3D; uc3=id2=UUjViSNYfbKLAw%3D%3D&lg2=Vq8l%2BKCLz3%2F65A%3D%3D&vt3=F8dBxdgpCfFJT%2BAlFmA%3D&nk2=D85B8wMn1%2B%2BGl1pynPVn; csg=b2beedab; lgc=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; cookie17=UUjViSNYfbKLAw%3D%3D; dnk=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; skt=ed8770349f9c6948; existShop=MTU3ODE4ODE2NQ%3D%3D; uc4=nk4=0%40De3W3b9EisQm60ydmctSvpRH%2FFEV6sw56RY%3D&id4=0%40U2o3vUzmEscWShqVUst018sE7Tjc; tracknick=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; _cc_=U%2BGCWk%2F7og%3D%3D; tg=0; _l_g_=Ug%3D%3D; sg=%E8%85%B063; _nk_=lmd%5Cu5F85%5Cu4F60%5Cu957F%5Cu53D1%5Cu53CA%5Cu8170; cookie1=AnCBIq9bOXVNWWlLCosHrWeRXuCnIYtL8x9Jt2vZR%2Bo%3D; enc=zoiogTkjkBtYr7w9dRq32fT7A2QRUm%2BmihpGKejzoHJ5V7bpTmXohUbHW0hPAqmdvvJ5kjRvIiN3BOvq8xuTmg%3D%3D; isg=BLy8y4Bjzl7KS_oCe0icPF42jVquHWDX1pn1I5Y9w6eKYVzrvsFqbzPQRcm8LJg3; l=dBxkNf0qQLHcPYBTBOCaourza77TIIRYSuPzaNbMi_5Qg6Ts_-bOoxN1tF96cjWf9lTB45113tv9-etk2UMqWXSpXUJ6nxDc.; ucn=center; x5sec=7b226d616c6c64657461696c736b69703b32223a226665386164646134356164313163333737633735623463313535353734383134434a793078664146454f37653662762b322f436548786f4d4d6a41304e4441324f5441354e6a7331227d'
}
# The URL carries callback=jsonp612, so the body is JSONP wrapped in jsonp612(...)
rq = requests.get(url, headers=headers)
pprint(rq.text)
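Changing `currentPage` to turn pages can be sketched as a small helper that rewrites only that one query value and leaves the rest of the URL untouched. The name `with_page` is hypothetical, and the total number of pages is not stated in this post, so the caller has to decide when to stop.

```python
import re


def with_page(url: str, page: int) -> str:
    """Return url with its currentPage query value replaced by page."""
    new_url, count = re.subn(
        r'([?&]currentPage=)\d+',
        lambda m: m.group(1) + str(page),  # keep the "&currentPage=" prefix
        url,
        count=1,
    )
    if count == 0:
        raise ValueError("url has no currentPage parameter")
    return new_url
```

For example, `requests.get(with_page(url, 2), headers=headers)` would request the second page of reviews using the same headers as above. Pausing between requests is probably wise.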