Trading Stocks with ChatGPT: Automatically Batch-Downloading Stock Announcements on a Specific Topic

Latest recommended article published on 2024-10-04 11:41:02

AIGCTribe

Tags: chatgpt

Article link: https://blog.csdn.net/AIGCTribe/article/details/131005221

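Many financial websites and stock exchange sites publish stock announcements. Sometimes we need to pull specific information out of a huge volume of them. For example, how can we automatically batch-download every announcement about companies' expected related-party transactions for 2023?

The walkthrough below uses stocks listed on the New Third Board (NEEQ, the National Equities Exchange and Quotations system) as an example to show how to write the download program with ChatGPT.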

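First, open the listed-company announcements page at https://www.neeq.com.cn/disclosure/announcement.html

With the browser's developer tools open on the Network panel, enter 2023年日常性关联交易 ("2023 routine related-party transactions") in the search box and click the query button.

The Request URL shown is https://www.neeq.com.cn/disclosureInfoController/infoResult_zh.do?callback=jQuery331_1685664278031

The Content-Type is application/x-www-form-urlencoded; charset=UTF-8

This shows that the page's data is loaded dynamically through a separate request, so the usual approach of fetching and parsing the static HTML will not retrieve it.

Click Response to see the dynamically generated content, which is JSON data wrapped in the jQuery callback named in the URL.

Click Payload to see that the query parameters are sent to the server as form data.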
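To make the Payload tab easier to read, the URL-encoded form body can be decoded back into named fields. The snippet below is a minimal sketch using Python's standard library; `raw_form` is a shortened copy of the form body captured in this walkthrough (only a few fields are kept for illustration), and the variable names are my own.

```python
from urllib.parse import parse_qs

# Shortened copy of the captured form body (a few fields only, for illustration).
raw_form = (
    "noticeType%5B%5D=5&disclosureType%5B%5D=5&page=&isNewThree=1"
    "&keyword=2023+%E5%B9%B4%E6%97%A5%E5%B8%B8%E6%80%A7"
    "%E5%85%B3%E8%81%94%E4%BA%A4%E6%98%93"
    "&needFields%5B%5D=companyCd&needFields%5B%5D=companyName"
)

# keep_blank_values=True keeps empty fields such as "page=" in the result.
fields = parse_qs(raw_form, keep_blank_values=True)

for name, values in fields.items():
    print(name, values)
```

Decoded this way, `%5B%5D` turns out to be the `[]` suffix on list-style parameters such as `needFields[]`, and the `keyword` field holds the search phrase typed into the page.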

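To fetch this content, give ChatGPT the Request URL, the request headers, and the form data, then have it send a POST request to retrieve the data. The following prompt can be entered into ChatGPT:

A dynamic web page has the following details: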

Request URL:

https://www.neeq.com.cn/disclosureInfoController/infoResult_zh.do?callback=jQuery331_16854

Request headers

Accept:

text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01

Accept-Encoding:

gzip, deflate, br

Accept-Language:

zh-CN,zh;q=0.9,en;q=0.8

Connection:

keep-alive

Content-Length:

538

Content-Type:

application/x-www-form-urlencoded; charset=UTF-8

Host:

www.neeq.com.cn

Origin:

https://www.neeq.com.cn

Referer:

https://www.neeq.com.cn/disclosure/announcement.html

Sec-Ch-Ua:

"Google Chrome";v="113", "Chromium";v="113", "Not-A.Brand";v="24"

Sec-Ch-Ua-Mobile:

Sec-Ch-Ua-Platform:

"Windows"

Sec-Fetch-Dest:

empty

Sec-Fetch-Mode:

cors

Sec-Fetch-Site:

same-origin

User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36

X-Requested-With:

XMLHttpRequest

formdata = 'noticeType%5B%5D=5&disclosureType%5B%5D=5&disclosureSubtype%5B%5D=&page=&companyCd=&isNewThree=1&keyword=2023+%E5%B9%B4%E6%97%A5%E5%B8%B8%E6%80%A7%E5%85%B3%E8%81%94%E4%BA%A4%E6%98%93&xxfcbj%5B%5D=3&hyType%5B%5D=&needFields%5B%5D=companyCd&needFields%5B%5D=companyName&needFields%5B%5D=disclosureTitle&needFields%5B%5D=disclosurePostTitle&needFields%5B%5D=destFilePath&needFields%5B%5D=publishDate&needFields%5B%5D=xxfcbj&needFields%5B%5D=destFilePath&needFields%5B%5D=fileExt&needFields%5B%5D=xxzrlx&siteId=1&sortfield=xxssdq&sorttype=asc'

Write a Python program that sends a POST request and retrieves this page's data.
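A program of the kind this prompt asks for might look like the sketch below. It is not the code ChatGPT returned for the article; it is a minimal version written with the standard library's `urllib` so that it has no third-party dependencies. The function names (`build_form`, `build_request`, `fetch_page`) are hypothetical, only a subset of the captured headers is sent, and the idea that the `page` field takes a zero-based page number is an assumption based on `"number":0` appearing in the first page of results.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://www.neeq.com.cn/disclosureInfoController/infoResult_zh.do"
CALLBACK = "jQuery331_1685664278031"  # callback name copied from the captured Request URL

# Subset of the captured request headers. Accept-Encoding is left out on
# purpose: urllib does not decompress gzip/br responses automatically.
HEADERS = {
    "Accept": "text/javascript, application/javascript, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Origin": "https://www.neeq.com.cn",
    "Referer": "https://www.neeq.com.cn/disclosure/announcement.html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    ),
    "X-Requested-With": "XMLHttpRequest",
}

# Same order as the captured form data, including the repeated destFilePath.
NEED_FIELDS = [
    "companyCd", "companyName", "disclosureTitle", "disclosurePostTitle",
    "destFilePath", "publishDate", "xxfcbj", "destFilePath", "fileExt", "xxzrlx",
]


def build_form(page, keyword="2023 年日常性关联交易"):
    """Rebuild the captured form body for a given page (assumed zero-based)."""
    fields = [
        ("noticeType[]", "5"),
        ("disclosureType[]", "5"),
        ("disclosureSubtype[]", ""),
        ("page", str(page)),
        ("companyCd", ""),
        ("isNewThree", "1"),
        ("keyword", keyword),
        ("xxfcbj[]", "3"),
        ("hyType[]", ""),
    ]
    fields += [("needFields[]", name) for name in NEED_FIELDS]
    fields += [("siteId", "1"), ("sortfield", "xxssdq"), ("sorttype", "asc")]
    # urlencode percent-encodes brackets and non-ASCII text, as the browser did.
    return urllib.parse.urlencode(fields).encode("ascii")


def build_request(page):
    """Create the POST request without sending it."""
    url = BASE_URL + "?callback=" + CALLBACK
    return urllib.request.Request(
        url, data=build_form(page), headers=HEADERS, method="POST"
    )


def fetch_page(page):
    """Send the request and return the raw response text (performs network I/O)."""
    with urllib.request.urlopen(build_request(page), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_page(0)` would then return the raw text of the first page of results, provided the site still accepts this request shape.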

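When run, the program returns a string that wraps a block of JSON data, and the download addresses of the announcements are inside that JSON.

To batch-download the PDFs, first strip the leading and trailing strings to get the JSON data, then extract destFilePath (the PDF download path) and disclosureTitle (the PDF title) from it, and finally have the program download each file.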
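The stripping and extraction steps can be sketched as follows. Instead of cutting off fixed leading and trailing strings, this version removes only the `callback( ... )` wrapper with a regular expression and parses everything inside as JSON, which is a swapped-in technique rather than the article's exact approach. The function names are hypothetical, and the `destFilePath` and title in `sample` are made-up placeholders shaped like the response described in this article.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback(...) and parse the JSON inside."""
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", text.strip(), re.DOTALL)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


def extract_announcements(text):
    """Return (destFilePath, disclosureTitle) pairs from one page of results."""
    payload = parse_jsonp(text)
    rows = payload[0]["listInfo"]["content"]
    return [(row["destFilePath"], row["disclosureTitle"]) for row in rows]


# Placeholder response with one row; the path and title are invented for illustration.
sample = (
    'jQuery331_1685491901352([{"listInfo":{"content":['
    '{"destFilePath":"/disclosure/2023/example.pdf",'
    '"disclosureTitle":"[临时公告]示例公司:关于预计2023年日常性关联交易的公告"}'
    '],"firstPage":true,"lastPage":false,"number":0,"numberOfElements":20,'
    '"size":20,"sort":null,"totalElements":796,"totalPages":40},"status":0}])'
)

for path, title in extract_announcements(sample):
    print(path, title)
```

Parsing the whole payload also keeps fields such as `totalPages` available, which is useful for deciding how many pages to request.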

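After several attempts, the ChatGPT prompt was built as follows:

The page data response.text was retrieved from a dynamic web page.

Remove the leading string: jQuery331_1685491901352([{"listInfo":{"content":[, and remove the trailing string: ,"firstPage":true,"lastPage":false,"number":0,"numberOfElements":20,"size":20,"sort":null,"totalElements":796,"totalPages":40},"status":0}]) , keeping only the JSON content in between;

Then extract every destFilePath and disclosureTitle value from the JSON data;

Prepend https://www.neeq.com.cn to each destFilePath value to build the PDF download URL, and use disclosureTitle as the PDF file name. Note: use a regular expression to replace special characters in the file name, such as [ ] and :, with underscores;

Download all the PDF files and save them to a folder named "关联交易" (related-party transactions) on the computer's D: drive.

Note: the code needs to include some measures against anti-scraping protections, such as adding request headers and delaying requests.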
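The download step described in this prompt could be sketched roughly as below. This is an illustrative version, not the code ChatGPT produced: the function names are hypothetical, the character class in `safe_filename` goes slightly beyond the prompt by also covering the other characters Windows forbids in file names, and the one-second delay is an arbitrary assumption. The `fetch` parameter exists only so the loop can be exercised without touching the network.

```python
import os
import re
import time
import urllib.request

BASE_URL = "https://www.neeq.com.cn"

# Browser-like headers, copied from the captured request.
HEADERS = {
    "Referer": "https://www.neeq.com.cn/disclosure/announcement.html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    ),
}


def safe_filename(title):
    """Replace [ ] : and other characters Windows forbids with underscores."""
    return re.sub(r'[\[\]\\/:*?"<>|]', "_", title).strip() + ".pdf"


def fetch_bytes(url):
    """Download one file and return its bytes (performs network I/O)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_all(items, out_dir, fetch=fetch_bytes, delay=1.0):
    """Save each (destFilePath, disclosureTitle) pair as a PDF in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for dest_path, title in items:
        target = os.path.join(out_dir, safe_filename(title))
        if not os.path.exists(target):  # skip files from an earlier run
            data = fetch(BASE_URL + dest_path)
            with open(target, "wb") as handle:
                handle.write(data)
            time.sleep(delay)  # pause between requests to stay polite
        saved.append(target)
    return saved
```

With a list of `(destFilePath, disclosureTitle)` pairs in `items`, a call such as `download_all(items, r"D:\关联交易")` would write the PDFs to the folder named in the prompt.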