jason3586596 - CSDN Blog

Original: Notes on using lxml with XPath

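1. To use XPath syntax, call the Element.xpath method to run the selection, for example:
   trs = html.xpath("//tr[position()>2]")
   For path expressions that select elements, attributes, or text, xpath always returns a list, even when there is only one match.
2. To get an attribute of a tag:
   href = html.xpath("//a/@href")
3. To get text, use XPath's text() function:
   address = tr.xpath("./td[4]/text()")[0]
4. When calling xpath on a particular tag to select its descendant elements, you should...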
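The points above can be tried out with a small sketch. The HTML table below is a hypothetical sample (the original post's page is not shown), and the variable names `doc_links`, `row_links`, and `row_count` are introduced here only for illustration. It also shows two behaviours of lxml worth knowing: a path starting with `//` searches the whole document even when called on a sub-element, whereas `.//` is relative to that element, and a function expression such as `count(...)` returns a number rather than a list.

```python
from lxml import etree

# Hypothetical sample page, used only to exercise the expressions from the notes.
SAMPLE = """
<html><body>
<table>
  <tr><td>Job listings</td></tr>
  <tr><td>Name</td><td>Type</td><td>Count</td><td>Address</td></tr>
  <tr><td><a href="/job/1">Engineer</a></td><td>Tech</td><td>2</td><td>Shenzhen</td></tr>
  <tr><td><a href="/job/2">Designer</a></td><td>Design</td><td>1</td><td>Beijing</td></tr>
</table>
</body></html>
"""

html = etree.HTML(SAMPLE)

# 1. Select every row after the first two (title row and header row).
trs = html.xpath("//tr[position()>2]")

# 2. Select an attribute: the result is a list of strings.
hrefs = html.xpath("//a/@href")

# 3. Select text relative to each row; [0] takes the first match from the list.
addresses = [tr.xpath("./td[4]/text()")[0] for tr in trs]

# "//a" called on a row still searches the whole document;
# ".//a" only searches below that row.
doc_links = trs[0].xpath("//a")
row_links = trs[0].xpath(".//a")

# Function expressions are not list-valued: count() yields a float.
row_count = html.xpath("count(//tr)")

print(hrefs, addresses, len(doc_links), len(row_links), row_count)
```

Keeping the leading dot in mind matters most inside loops over rows, where an absolute path would silently return the same document-wide result for every row.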

2020-07-19 13:48:47 231

Translated: P20 [Data Parsing] 1 - Introduction to XPath and tool installation

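XPath syntax and the lxml module
What is XPath? XPath (XML Path Language) is a language for finding information in XML and HTML documents. It can be used to traverse the elements and attributes of an XML or HTML document.
XPath development tools: the Chrome extension XPath Helper and the Firefox extension XPath Checker.
Selecting nodes: XPath uses path expressions to select nodes in an XML document. Nodes are selected by following a path, or steps. The most useful path expressions are listed below: Expression...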
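A few common path expressions can be illustrated with a minimal sketch. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (lxml, the module this post goes on to cover, supports the full XPath 1.0 language). The bookstore document is a hypothetical sample, not taken from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring(
    "<bookstore>"
    "<book lang='en'><title>Harry Potter</title><price>29.99</price></book>"
    "<book lang='zh'><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# "./book" selects direct children of the current node named book.
books = root.findall("./book")

# ".//title" selects title elements at any depth below the current node.
titles = [t.text for t in root.findall(".//title")]

# "[@lang='en']" is a predicate that filters on an attribute value.
en_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# "[2]" is a positional predicate; XPath positions start at 1.
second_title = root.find("./book[2]/title").text

print(len(books), titles, en_titles, second_title)
```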

2020-07-07 21:08:18 155

Reposted: The requests library

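The requests library
Although the urllib module in Python's standard library already covers most of the functionality we normally need, its API is unpleasant to use. Requests bills itself as "HTTP for Humans", which signals that it is simpler and more convenient.
Documentation:
It is easy to install with pip:
pip install requests
Sending a GET request: the simplest way to send a GET request is to call requests.get:
response = requests.get("http://www.baidu.com/").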

2020-07-07 20:58:54 127

Reposted: P18 [Network Requests] 15 - Handling cookies with requests

# encoding: utf-8
import requests

# response = requests.get('https://www.baidu.com/')
# print(response.cookies.get_dict())

url = "http://www.renren.com/PLogin.do"
data = {
    'email': "xxxxxxxxx@qq.com",
    'password': "xxxxxxxxx"
}
headers = { ...

2020-07-07 20:05:36 1216 1

Original: Web crawler - Lesson 21: Parsing pagination information

Online study notes
import requests
res = requests.get('http://api.roll.news.sina.com.cn/zt_list?channel=news&cat_1=gnxw&cat_2==gdxw1||=gatxw||=zs-pl||=mtjj&level==1||=2&show_ext=1&show_all=1...

2018-08-28 22:17:25 152

Original: Converting a phrase to its acronym

str1 = input("Enter English words: ")
str2 = str1.upper()
list1 = str2.split()
for word in list1:
    print(word[0], end='')
Note the end='' at the end; without it, print starts a new line after every letter.
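The same idea can be wrapped in a reusable function, which avoids the end='' concern by building the string first and printing it once. This is a minimal sketch; the function name `acronym` is introduced here for illustration and is not part of the original post.

```python
def acronym(phrase: str) -> str:
    """Return the upper-cased first letter of each whitespace-separated word."""
    # split() with no arguments drops empty strings, so word[0] is always safe.
    return "".join(word[0] for word in phrase.upper().split())


print(acronym("portable network graphics"))  # prints PNG
```

Returning a string instead of printing inside the loop also makes the result easy to test or reuse elsewhere.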

2018-08-28 22:15:49 1028

Original: Web crawler - Lesson 15: Fetching the news comment count

import requests
comments = requests.get('http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=gn&newsid=comos-fxvctcc8121090&group=&compress=0&ie=utf-8&o...

2018-07-01 11:49:09 478

Original: Web crawler - Lesson 18: Information extraction functions

From online study notes

2018-07-01 11:46:13 205

Original: Web crawler - Lesson 9: Fetching the news article page

import requests
from bs4 import BeautifulSoup
res = requests.get('http://news.sina.com.cn/c/nd/2016-08-20/doc-ifxvctcc8121090.shtml')
res.encoding = 'utf-8'
print(res.text)
soup = BeautifulSoup(res.te...

2018-06-30 10:18:52 386

Original: Web crawler - Lesson 5: Parsing page elements with BeautifulSoup

Open Run and enter jupyter notebook, then choose New > Python 3.
import requests
res = requests.get('http://news.sina.com.cn/')
res.encoding = 'utf-8'
#print (res.text)
from bs4 import BeautifulSoup
html_sample = ' \
<html> \
<b...

2018-06-28 21:14:40 198

Reposted: Python's Counter class

>>> from collections import Counter
>>> c = Counter()                            # create a new, empty counter
>>> c = Counter('abcasdf')                   # a counter built from an iterable
>>> c = Counter({'red': 4, 'yellow': 2})     # a...
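Once a Counter has been built, a few of its methods cover most day-to-day use. The following is a minimal sketch using only `collections.Counter` from the standard library; the sample values and variable names are illustrative and not taken from the reposted article.

```python
from collections import Counter

# Counting the characters of a string (any iterable of hashable items works).
letters = Counter("abcasdf")

# most_common(n) returns the n highest counts as (element, count) pairs.
top = letters.most_common(1)

# Looking up a missing key returns 0 instead of raising KeyError.
missing = letters["z"]

# A counter can also start from a mapping and be updated from an iterable.
colors = Counter({"red": 4, "yellow": 2})
colors.update(["red", "green"])

print(top, missing, colors["red"], colors["green"])
```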

2018-04-25 08:25:27 3849

jason's blog