Python Beginner Self-Study, Day 2: Fetching a Web Page and Parsing Its Data (Example Code)

Python_story

Category: Python self-study. Tags: python

Article link: https://blog.csdn.net/Python_story/article/details/115720283

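Code
import requests
from lxml import etree

URL = 'http://www.chinastor.com/si/hub/list_239_2.html'
res = requests.get(URL)
# Tell requests which encoding to use when it builds res.text
res.encoding = 'gb2312'
# Decode the raw response bytes into a str ourselves
html = res.content.decode('gb2312')
# Parse the HTML string into an element tree
e_html = etree.HTML(html)
# Text of every link inside the rows of the "hovertable" table that have an onmouseover attribute
title = e_html.xpath('//table[@class="hovertable"]/tr[@onmouseover]//a[@href]/text()')
print(title)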

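Code explanation
1. from lxml import etree
XPath is a language for locating information in XML documents. It can be used to traverse the elements and attributes of an XML document. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions. The etree module of lxml supports XPath queries, which is what the .xpath() call in the code relies on.
Import statement: from lxml import etree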
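To make the XPath expression easier to follow, here is a minimal sketch that runs the same query against a small inline HTML string instead of the live page. The sample markup, its link targets, and the `highlight(this)` handler are made up for illustration; only the XPath expression comes from the code above.

```python
from lxml import etree

# Hypothetical markup that imitates the structure the XPath expects.
SAMPLE_HTML = """
<html><body>
<table class="hovertable">
  <tr><th>Title</th></tr>
  <tr onmouseover="highlight(this)"><td><a href="/a/1.html">First article</a></td></tr>
  <tr onmouseover="highlight(this)"><td><a href="/a/2.html">Second article</a></td></tr>
  <tr onmouseover="highlight(this)"><td><a>No link here</a></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Same expression as in the article: link text inside rows that carry
# an onmouseover attribute, within the table whose class is "hovertable".
titles = tree.xpath('//table[@class="hovertable"]/tr[@onmouseover]//a[@href]/text()')

# Swapping text() for @href returns the link targets instead of the link text.
links = tree.xpath('//table[@class="hovertable"]/tr[@onmouseover]//a[@href]/@href')

print(titles)
print(links)
```

The header row is skipped because it has no onmouseover attribute, and the last row is skipped because its a element has no href.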

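2. res.encoding = 'gb2312' and res.content.decode('gb2312')
encoding names a character encoding; decode turns bytes into text.
res.encoding does not read or change the byte stream. It only tells requests which encoding to use when it decodes the response body into res.text. If the page is actually UTF-8 and we set res.encoding to GB2312, res.text comes out garbled; the fix is to set res.encoding to the page's real encoding.

res.content is the raw byte stream of the response, and .content.decode('gb2312') decodes those bytes into a Python str (Unicode text) using the named encoding. Because the code above builds html from res.content.decode(...) and never reads res.text, the res.encoding assignment has no effect on the result here.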
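The difference between bytes and text can be tried without any network access. The sketch below uses only the standard library; the sample string and the `guess_decode` helper are hypothetical and are not part of requests or of the original code.

```python
# Hypothetical sample text standing in for a page body.
text = "中文新闻"

gb_bytes = text.encode("gb2312")    # the bytes a GB2312 page would send
utf8_bytes = text.encode("utf-8")   # the bytes a UTF-8 page would send

# Decoding with the matching encoding gives back the original text.
right = gb_bytes.decode("gb2312")

# Decoding with the wrong encoding gives garbled text; errors="replace"
# substitutes a placeholder character instead of raising an exception.
wrong = gb_bytes.decode("utf-8", errors="replace")

print(right == text)
print(repr(wrong))


def guess_decode(raw, candidates=("utf-8", "gb2312")):
    """Try each candidate encoding in order and return (text, encoding).

    A simple heuristic for illustration only: UTF-8 goes first because
    it rejects most byte sequences that are not really UTF-8.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first candidate, lossy.
    return raw.decode(candidates[0], errors="replace"), candidates[0]


print(guess_decode(gb_bytes))
print(guess_decode(utf8_bytes))
```

In the scraper, gb_bytes plays the role of res.content, and the string returned by decode is what gets passed to etree.HTML.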

3. etree.
