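CNNVD is somewhat easier to crawl than CNVD; so far I have not run into any obvious anti-crawler mechanism.
Initial analysis
First, I am again using my favorite crawler framework, pyspider, with the first page of the vulnerability list as the starting page: http://www.cnnvd.org.cn/web/vulnerability/querylist.tag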
![](https://img-blog.csdnimg.cn/img_convert/c21bc5006f1e6c25baaf697428c89023.png)
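CNNVD's list pages only need a plain GET request, so the crawler can recursively follow each page to the next one.
Next, open a vulnerability detail page and scrape the information we need.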
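Because every list page is reachable with a plain GET, the page URLs can also be generated up front instead of followed one by one. The following is only a minimal sketch under stated assumptions: the query parameter name `pageno` and the helpers `page_url` / `page_urls` are hypothetical and not taken from the article, so check the actual "next page" link on the site before relying on them.

```python
from urllib.parse import urlencode

# Start page taken from the article.
LIST_URL = "http://www.cnnvd.org.cn/web/vulnerability/querylist.tag"


def page_url(page_number, param="pageno"):
    """Return the list-page URL for a given page number.

    The parameter name ("pageno") is an assumption for illustration only.
    """
    if page_number < 1:
        raise ValueError("page_number must be >= 1")
    return LIST_URL + "?" + urlencode({param: page_number})


def page_urls(first, last):
    """Yield list-page URLs for pages first..last (inclusive)."""
    for n in range(first, last + 1):
        yield page_url(n)
```

In a pyspider handler, each generated URL could then be handed to `self.crawl(url, callback=...)`, in the same way the start page is.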
![](https://img-blog.csdnimg.cn/img_convert/f865511bb6d615d5874421c5e07ccac1.png)
After inspecting the nodes in the page's HTML, CSS selectors are enough to locate each piece of vulnerability information:
cnnvd_level = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(2) > a').text()
cve_id = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(3) > a').text()
vulnerable_type = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(4) > a').text()
upload_time = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(5) > a').text()
threat_type = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(6) > a').text()
update = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div.detail_xq.w770 > ul > li:nth-child(7) > a').text()
vulnerable_detail = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div:nth-child(3)').text()
vulnerable_notice = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div:nth-child(4)').text()
reference_url = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div:nth-child(5)').text()
patch = response.doc('body > div.container.m_t_10 > div > div.fl.w770 > div:nth-child(9)').text()
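The selectors above share a long common prefix, so one possible tidy-up is to keep them in a single table and extract all fields in one pass. This is a sketch rather than the author's code: the names `BASE`, `SUMMARY`, `SELECTORS` and `extract_fields` are introduced here for illustration, and `doc` is assumed to behave like pyspider's `response.doc` (calling it with a selector returns an object with a `.text()` method).

```python
# Common prefixes copied from the selectors in the article.
BASE = "body > div.container.m_t_10 > div > div.fl.w770"
SUMMARY = BASE + " > div.detail_xq.w770 > ul"

# Field name -> CSS selector, mirroring the assignments above.
SELECTORS = {
    "cnnvd_level": SUMMARY + " > li:nth-child(2) > a",
    "cve_id": SUMMARY + " > li:nth-child(3) > a",
    "vulnerable_type": SUMMARY + " > li:nth-child(4) > a",
    "upload_time": SUMMARY + " > li:nth-child(5) > a",
    "threat_type": SUMMARY + " > li:nth-child(6) > a",
    "update": SUMMARY + " > li:nth-child(7) > a",
    "vulnerable_detail": BASE + " > div:nth-child(3)",
    "vulnerable_notice": BASE + " > div:nth-child(4)",
    "reference_url": BASE + " > div:nth-child(5)",
    "patch": BASE + " > div:nth-child(9)",
}


def extract_fields(doc):
    """Return {field name: text} for every selector in SELECTORS.

    `doc` is any callable where doc(selector).text() yields a string,
    which is how pyspider's response.doc is used in the article.
    """
    return {name: doc(selector).text() for name, selector in SELECTORS.items()}
```

Inside a pyspider detail-page callback this would be used as `return extract_fields(response.doc)`. Keeping the selectors in one table also means a layout change on the site only needs to be fixed in one place.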
Crawl results
![](https://img-blog.csdnimg.cn/img_convert/25172958b230bd03d5d736d4a5954ad9.png)