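Brief summary
1. A single-line text value extracted with XPath can be cleaned by wrapping the expression: xpath('normalize-space(<XPath expression>)').
2. When XPath returns a list whose items carry leading and trailing newlines and spaces, normalize-space cannot be used, because it returns a single string instead of a list. Strip each item in a second pass to get a list free of newlines and spaces.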
Example:
xpath_list = ['\n 2700-3000元/月\n \n ', '\n 2800-3000元/月\n \n ', '\n 薪资面议\n \n ', '\n 3000元/月\n \n ', '\n 3000-4600元/月\n \n ', '\n 3200-5000元/月\n \n ', '\n 4000-8000元/月\n \n ', '\n 3000-3500元/月\n \n ', '\n 2500-4000元/月\n \n ', '\n 薪资面议\n \n ', '\n 2800-3000元/月\n \n ', '\n 薪资面议\n \n ', '\n 3000-4000元/月\n \n ', '\n 2700-3000元/月\n \n ', '\n 2900-3500元/月\n \n ', '\n 3500-4000元/月\n \n ', '\n 2500-3500元/月\n \n ', '\n 2800-3500元/月\n \n ', '\n 5500-7000元/月\n \n ', '\n 2200-4500元/月\n \n ']
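# Strip leading/trailing whitespace from every item (iterate over xpath_list, not the built-in list)
prt = [p.strip() for p in xpath_list]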
print(prt)
['2700-3000元/月', '2800-3000元/月', '薪资面议', '3000元/月', '3000-4600元/月', '3200-5000元/月', '4000-8000元/月', '3000-3500元/月', '2500-4000元/月', '薪资面议', '2800-3000元/月', '薪资面议', '3000-4000元/月', '2700-3000元/月', '2900-3500元/月', '3500-4000元/月', '2500-3500元/月', '2800-3500元/月', '5500-7000元/月', '2200-4500元/月']
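As a possible alternative to stripping afterwards, normalize-space can be evaluated once per element instead of once for the whole node-set. The sketch below is only an illustration: the HTML fragment and the class name "wage" are made up for the demo and are not taken from the real page.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for the real page.
html = """
<table>
  <tr><td class="wage">
      2700-3000元/月
  </td></tr>
  <tr><td class="wage">
      薪资面议
  </td></tr>
</table>
"""
tree = etree.HTML(html)

# Applied to a node-set, normalize-space() converts only the first node,
# so the result is one string rather than a list.
first_only = tree.xpath("normalize-space(//td[@class='wage']/text())")

# Evaluating normalize-space(.) relative to each <td> keeps one value per cell.
per_cell = [td.xpath("normalize-space(.)") for td in tree.xpath("//td[@class='wage']")]

print(first_only)
print(per_cell)
```

Unlike strip(), normalize-space also collapses runs of whitespace inside the text into single spaces, which may or may not be what you want.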
Complete code:
import requests
from lxml import etree
url = 'https://www.0737anhua.com/search/1?subarea_id=21'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.70',
}
req=requests.get(url=url,headers=headers)
text_str=req.text
xpath_str=etree.HTML(text_str)
# Company name
company_name = xpath_str.xpath("//td[contains(@class,'company')]/a/@title")
# Job title
job_name = xpath_str.xpath("//td[contains(@class,'job')]/a/@title")
# Wages
wages_data = xpath_str.xpath("//div/table/tbody//td[6]/text()")
# wages_data = xpath_str.xpath("normalize-space(//div/table/tbody//td[6]/text())")  # this would turn the list into a single-line text value
prt=[p.strip() for p in wages_data]
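# Open the file once in append mode; the with block closes it automatically,
# so no explicit f.close() is needed.
with open('安化人才网2.csv', 'a+') as f:
    for company, job, wages in zip(company_name, job_name, prt):
        f.write('{},{},{}\n'.format(company, job, wages))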
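The manual '{},{},{}' formatting above would produce a broken row if a company or job name ever contained a comma. A minimal sketch of the same write step using the standard csv module, which quotes such fields automatically, might look like this. The helper name write_rows and the UTF-8 encoding are assumptions for illustration, not part of the original script.

```python
import csv


def write_rows(path, companies, jobs, wages):
    """Append one CSV row per (company, job, wage) triple (hypothetical helper)."""
    # newline='' lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for company, job, wage in zip(companies, jobs, wages):
            # Strip here so callers can pass the raw XPath text() results.
            writer.writerow([company, job, wage.strip()])
```

It could be called as write_rows('安化人才网2.csv', company_name, job_name, wages_data). Note that zip stops at the shortest list, so rows are silently dropped if the three XPath queries return different numbers of items.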