import requests
import urllib.request
from bs4 import BeautifulSoup
from lxml import etree
import json
def get_links(html):
    """Extract every href from the anchor tags in *html* and record each one.

    Each non-empty link is printed to stdout and appended (JSON-encoded)
    to the output file via write_to_file().

    Args:
        html: The HTML document (str) to parse.
    """
    global links_list
    # NOTE(review): links_list is never appended to anywhere in view;
    # kept only so existing readers of the global still see a list.
    links_list = []
    soup = BeautifulSoup(html, "lxml")
    for anchor in soup.find_all('a'):
        # Tag.get() returns None when the attribute is absent instead of
        # raising, so the original bare try/except was unnecessary.
        value = anchor.get('href')
        if value:
            print(value)
            write_to_file(value)
def links():
    """Fetch the IEEE S&P 2019 accepted-papers page and extract its links.

    Returns:
        Whatever get_links() returns (None in the current implementation).
    """
    # Use the response as a context manager so the HTTP connection is
    # closed deterministically instead of leaking the socket.
    url = 'http://www.ieee-security.org/TC/SP2019/program-papers.html'
    with urllib.request.urlopen(url) as response:
        return get_links(response.read().decode('utf-8'))
def write_to_file(content):
    """Append *content* as a single JSON-encoded line to '19allssplink.txt'.

    Args:
        content: Any JSON-serializable value (here, a link string).
    """
    with open('19allssplink.txt', 'a', encoding='utf-8') as f:
        # The original debug print JSON-encoded the content a second time
        # just to show its type; removed. ensure_ascii=False keeps any
        # non-ASCII characters human-readable in the file.
        f.write(json.dumps(content, ensure_ascii=False) + '\n')
if __name__ == '__main__':
    # Run the scrape only when executed as a script, not when imported.
    links()
# A Record | Scraping paper links
# Latest recommended article published 2021-03-23 12:24:50