How to Build a Keyword Word Cloud: Making a CVPR Hot-Word Cloud (and Scraping PDF Links and Paper Names)

This post shows how to scrape the CVPR open-access paper listing, extract each paper's title and PDF link, and tally the most frequent words in the titles for a hot-word cloud. Regular expressions pull the titles and PDF links from the page, which are then stored in a database; the words across all titles are counted, and the high-frequency words with their counts are saved to the database so the word cloud can be generated later.
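The extraction hinges on two regular expressions: one captures the PDF path between `href="` and the link text `">pdf`, the other captures the title after `2019_paper.html">`. A minimal sketch of how they behave on a made-up fragment shaped like the CVPR open-access listing (the author name, title, and paths below are invented for illustration):

```python
import re

# A made-up fragment mimicking one entry of the CVPR open-access listing
html = '''<dt class="ptitle">
<a href="html/Smith_Deep_CVPR_2019_paper.html">Deep Hypothetical Networks</a></dt>
<dd>
[<a href="papers/Smith_Deep_CVPR_2019_paper.pdf">pdf</a>]
</dd>'''

# (?<=href=\") : lookbehind, start matching right after href="
# .+?pdf       : non-greedy run of characters ending in literal "pdf"
# (?=\">pdf)   : lookahead, must be followed by ">pdf (the link text)
link_list = re.findall(r"(?<=href=\").+?pdf(?=\">pdf)", html)

# Title: everything between 2019_paper.html"> and the closing </a>
name_list = re.findall(r"(?<=2019_paper.html\">).+?(?=</a>)", html)

print(link_list)  # ['papers/Smith_Deep_CVPR_2019_paper.pdf']
print(name_list)  # ['Deep Hypothetical Networks']
```

Note that the title link (`...paper.html`) contains no "pdf" substring on its line, so the first pattern skips it and matches only the PDF href.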

#!/usr/bin/python
# -*- coding: utf-8 -*-

"""
@author: CuiXingYu
@contact: a15931829662@163.com
@software: PyCharm
@file: CVPR.py
@time: 2020/4/17 19:36
"""
import re
import requests
import pymysql


def get_context(url):
    """Fetch a page.

    :param url: link
    :return: web_context (page text)
    """
    web_context = requests.get(url)
    return web_context.text


def get_conn():
    """Open a database connection.

    :return: (conn, cursor)
    """
    conn = pymysql.connect(
        host='127.0.0.1',    # local IP address
        user='root',         # database user name
        password='101032',   # password
        db='db_database07',  # database to operate on
    )
    # a cursor object executes SQL statements and fetches results
    cursor = conn.cursor()
    return conn, cursor


def close_conn(conn, cursor):
    """Close the connection.

    :param conn: connection object
    :param cursor: cursor object
    :return:
    """
    if cursor:
        cursor.close()
    if conn:
        conn.close()


def get_name():
    """Scrape the paper names and URL addresses and store them.

    :return:
    """
    conn, cursor = get_conn()
    url = 'http://openaccess.thecvf.com//CVPR2019.py'
    web_context = get_context(url)
    # find paper files
    # (?<=href=\"): lookbehind, start matching right after href="
    # .+?: one or more characters (except newlines), non-greedy
    # pdf: literal "pdf"
    # (?=\">pdf): lookahead, must be followed by ">pdf
    # |: or (the same pattern for single-quoted hrefs)
    info = []
    # link pattern: href="***_CVPR_2019_paper.pdf">pdf
    link_list = re.findall(r"(?<=href=\").+?pdf(?=\">pdf)|(?<=href=\').+?pdf(?=\">pdf)", web_context)
    # name pattern: the title between 2019_paper.html"> and the closing </a>
    name_list = re.findall(r"(?<=2019_paper.html\">).+?(?=</a>)", web_context)
    for one, two in zip(name_list, link_list):
        info.append([one, two])
    # SQL statement run against the database
    sql = "insert into paperinfo(name,url) values(%s,%s)"
    try:
        # execute the SQL statement
        cursor.executemany(sql, info)
        conn.commit()
    except Exception:
        conn.rollback()
    close_conn(conn, cursor)


def saveContent_list(hotword, number):
    """Insert one hot word into the database.

    :param hotword: word
    :param number: count
    :return:
    """
    # open the database connection (ip / user / password / database name)
    conn, cursor = get_conn()
    sql = "insert into hotword (hotword,number) values (%s,%s)"
    val = (hotword, number)
    cursor.execute(sql, val)
    conn.commit()
    # close the database connection (don't forget)
    close_conn(conn, cursor)


def get_hotword():
    """Scrape the hot words and count them.

    :return:
    """
    url = 'http://openaccess.thecvf.com//CVPR2019.py'
    web_context = get_context(url)
    name_list = re.findall(r"(?<=2019_paper.html\">).+?(?=</a>)", web_context)
    text = ""
    for word in name_list:
        # add a space between titles so adjacent words do not merge
        text = text + " " + word
    words = text.split()
    word_dict = {}
    for w in words:
        if w not in word_dict:
            word_dict[w] = 1
        else:
            word_dict[w] = word_dict[w] + 1
    a = sorted(word_dict.items(), key=lambda item: item[1], reverse=True)
    # run the SQL inserts against the database
    for x in a:
        try:
            word = x[0]
            num = x[1]
            saveContent_list(word, num)
        except Exception:
            print("insert failed")


get_hotword()
get_name()
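The script stores the (word, count) pairs but stops short of drawing the cloud itself. The counting step in get_hotword can also be expressed with `collections.Counter`; a minimal sketch (the sample titles below are made up):

```python
from collections import Counter

# Made-up titles standing in for the scraped name_list
titles = [
    "Deep Learning for Image Segmentation",
    "Graph Neural Networks for Image Retrieval",
    "Deep Metric Learning",
]

# Join with spaces (a space between titles keeps words from merging), then count
words = " ".join(titles).split()
counts = Counter(words)

# Highest-frequency words first, like sorted(word_dict.items(), ...) above
print(counts.most_common(3))
```

The resulting frequency mapping can then be handed to a word-cloud library, for example the third-party `wordcloud` package's `WordCloud().generate_from_frequencies(dict(counts))`, instead of (or in addition to) writing the pairs to the hotword table.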
