GitHub book index directory

Latest recommended article published on 2023-03-16 13:33:28

xuexilangren1

Category: python

Article link: https://blog.csdn.net/xuexilangren1/article/details/89600946


import requests
import bs4

# Fetch the page and parse it with the lxml parser.
response = requests.get('http://www.69p11.xyz/')
content_response = response.content
soup = bs4.BeautifulSoup(content_response, "lxml")
# find() returns None when the page contains no <article> element.
title = soup.find("article")
print(type(title))
# Paths to the target element, as copied from the browser inspector:
# /html/body/div[5]/div/main/div[2]/div[1]/div[7]/div[2]/article
# /html/body/div[5]/div/main/div[2]/div[1]/div[7]/div[2]/article/h3[3]
# html body.logged-in.env-production div.application-main div main#js-repo-pjax-container div.container.new-discussion-timeline.experiment-repo-nav div.repository-content div#readme.Box.Box--condensed.instapaper_body.md.js-code-block-container div.Box-body article.markdown-body.entry-content.p-5 h3

<class 'NoneType'>
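
The `NoneType` result above means `soup.find("article")` found no matching element. As a minimal sketch of guarding against that case, the snippet below tries several tag names in order and returns the first one present. `SAMPLE_HTML` and `first_tag_name` are hypothetical names made up for illustration, and the built-in `html.parser` is used instead of `lxml` so that the example needs no network access or extra parser.

```python
import bs4

# Hypothetical stand-in for a downloaded page; it contains no <article> element.
SAMPLE_HTML = "<html><body><div class='w'><a href='/a'>A</a></div></body></html>"


def first_tag_name(html, *names):
    """Return the name of the first tag among `names` found in `html`, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    for name in names:
        tag = soup.find(name)
        # find() returns None when nothing matches, so check before using the tag.
        if tag is not None:
            return tag.name
    return None


print(first_tag_name(SAMPLE_HTML, "article"))
print(first_tag_name(SAMPLE_HTML, "article", "div"))
```

Checking for `None` before touching the result avoids an `AttributeError` or `TypeError` later on, when the missing element would otherwise be indexed or iterated.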

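import requests
import bs4
from lxml import etree  # only needed for the commented-out XPath experiment below

# Request headers prepared for github.com; they are not passed to the request below.
header = {
    'Host': 'github.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'
}
con = requests.get('http://www.69p11.xyz/').content
sou = bs4.BeautifulSoup(con, "lxml")
# Collect every <div class="w"> container on the page.
arti = sou.find_all('div', class_='w')
print(type(arti[1]))

# Write one "link text,href" line for each <a> tag that has plain string content.
with open('D:\\python_study\\BeautifulPicture\\c.txt', 'w', encoding='utf8') as f:
    for i in arti:
        for j in i.descendants:
            if j.name == 'a' and j.string is not None:
                # .get() avoids a KeyError for anchors without an href attribute.
                f.write(j.string + ',' + j.get('href', '') + '\n')
# xp = etree.HTML(arti.string)
# con = xp.xpath("//html/body/div[5]/div")
# print(con)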

 <class 'bs4.element.Tag'>
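
The loop above joins the link text and the `href` with a bare comma, so a title that itself contains a comma would produce an ambiguous line. One possible alternative, sketched here with the standard `csv` module, quotes such fields automatically. `write_links` and `sample_rows` are hypothetical names and data used only for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def write_links(rows, fileobj):
    """Write (text, href) pairs as CSV rows, quoting fields that contain commas."""
    writer = csv.writer(fileobj, lineterminator="\n")
    for text, href in rows:
        writer.writerow([text, href])


# Hypothetical sample data standing in for the (link text, href) pairs.
sample_rows = [("Python, 3rd edition", "/books/python"), ("Go", "/books/go")]
buffer = io.StringIO()
write_links(sample_rows, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, the `csv` documentation recommends opening it with `newline=''` so that line endings are not translated twice.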
