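As with the other exchanges covered earlier, start by pressing F12 in the browser to open the developer tools, then find the API address the page calls and the format of the data it returns.
For the trading day 2021-08-27, the China Financial Futures Exchange (CFFEX) API request is: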
GET http://www.cffex.com.cn/sj/hqsj/rtj/202108/27/index.xml
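The URL encodes the trading day as a `YYYYMM` directory followed by a `DD` directory. Assuming every trading day follows the same pattern as the request observed above (the full code later in this post relies on the same assumption), building the URL for any date takes one line. The helper name `cffex_daily_url` is hypothetical.

```python
from datetime import date

# Base path taken from the request observed in the browser.
BASE = 'http://www.cffex.com.cn/sj/hqsj/rtj'


def cffex_daily_url(trade_date) -> str:
    """Build the daily-data URL for a trading day ('YYYYMMDD' string or date)."""
    if isinstance(trade_date, date):
        trade_date = trade_date.strftime('%Y%m%d')
    # The path is split into a YYYYMM directory and a DD directory.
    return f'{BASE}/{trade_date[0:6]}/{trade_date[6:8]}/index.xml'


print(cffex_daily_url('20210827'))
# http://www.cffex.com.cn/sj/hqsj/rtj/202108/27/index.xml
print(cffex_daily_url(date(2021, 8, 27)))
# http://www.cffex.com.cn/sj/hqsj/rtj/202108/27/index.xml
```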
The data comes back as XML. A fragment:
```xml
<dailydatas>
  <dailydata>
    <instrumentid>IC2109</instrumentid>
    <tradingday>20210827</tradingday>
    <openprice>7035.2</openprice>
    <highestprice>7124.2</highestprice>
    <lowestprice>7030.6</lowestprice>
    <closeprice>7106.8</closeprice>
    <preopeninterest>143208</preopeninterest>
    <openinterest>145668</openinterest>
    <presettlementprice>7062.2</presettlementprice>
    <settlementpriceif>7111.6</settlementpriceif>
    <settlementprice>7111.6</settlementprice>
    <volume>61973</volume>
    <turnover>87810915840</turnover>
    <productid>IC</productid>
    <delta></delta>
    <expiredate>20210917</expiredate>
  </dailydata>
</dailydatas>
```
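Each `<dailydata>` element under the root `<dailydatas>` is one row: one contract on one trading day. A quick way to inspect that structure is to turn each element's children into a dict. The sketch below parses an abbreviated, hand-trimmed copy of the fragment (`SAMPLE` keeps only a few fields). Note that every value comes back as a string, and an empty tag such as `<delta></delta>` has `.text` equal to `None`.

```python
from xml.etree.ElementTree import XML

# Hand-trimmed copy of the fragment above (only a few fields kept).
SAMPLE = b"""<dailydatas>
<dailydata>
<instrumentid>IC2109</instrumentid>
<tradingday>20210827</tradingday>
<closeprice>7106.8</closeprice>
<volume>61973</volume>
<delta></delta>
</dailydata>
</dailydatas>"""

root = XML(SAMPLE)  # the root element is <dailydatas>
records = []
for node in root:  # each child is one <dailydata> element
    # Map tag name -> text; all values are strings, empty tags give None.
    records.append({field.tag: field.text for field in node})

print(records)
# [{'instrumentid': 'IC2109', 'tradingday': '20210827', 'closeprice': '7106.8', 'volume': '61973', 'delta': None}]
```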
XML was once the most popular data-exchange format. JSON has largely replaced it, but plenty of systems still use it.
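Fetching the data programmatically works essentially the same way as before, so I won't repeat it here. This part focuses on parsing the XML and extracting the fields.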
Python's standard library is enough to parse the XML:
```python
from xml.etree.ElementTree import XML
```
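In testing, assembling the complete two-dimensional list first and then creating the DataFrame from it was much faster than appending each row to the DataFrame as soon as it was extracted (every append builds a new DataFrame and copies all existing rows). So the implementation works like this: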
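```python
# cffex_url, headers and columns are defined as in the full code below
r = requests.get(cffex_url, headers=headers, timeout=30)
r.raise_for_status()  # fail on HTTP errors instead of inside the XML parser
root = XML(r.content.decode())
data_list = []
for child in root:  # each child is one <dailydata> element
    # findtext() returns None instead of raising AttributeError when a tag is missing
    data_list.append([child.findtext(column) for column in columns])
df = pd.DataFrame(data=data_list, columns=columns)
```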
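The speed difference is easy to check with a small benchmark. This is a sketch on synthetic rows (the data and row count are made up). Since `DataFrame.append` was removed in pandas 2.0, row-by-row growth is simulated with `pd.concat`, which likewise copies the whole frame on every call. Absolute timings depend on the machine; the point is that growing the frame row by row does quadratic copying, while the list-first version builds the frame once.

```python
import time

import pandas as pd

columns = ['instrumentid', 'closeprice']
# Synthetic rows standing in for parsed <dailydata> records.
rows = [[f'IC{i:04d}', str(7000 + i)] for i in range(1000)]

# Approach 1: build the full 2-D list first, then create one DataFrame.
t0 = time.perf_counter()
df_list = pd.DataFrame(data=rows, columns=columns)
t_list = time.perf_counter() - t0

# Approach 2: grow the DataFrame one row at a time.
t0 = time.perf_counter()
df_grow = pd.DataFrame([rows[0]], columns=columns)
for row in rows[1:]:
    # Each concat allocates a new frame and copies every existing row.
    df_grow = pd.concat([df_grow, pd.DataFrame([row], columns=columns)], ignore_index=True)
t_grow = time.perf_counter() - t0

print(f'list first: {t_list:.4f}s, row by row: {t_grow:.4f}s')
```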
As before, the result is saved to a MySQL database. The complete code:
```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, as_completed
from xml.etree.ElementTree import XML

import pandas as pd
import requests

from common.utils import *  # author's own helpers (today() etc.)
from tushare_client.base import AbstractDataRetriever
from tushare_client.stock_calendar import StockCalendar


class CffexDaily(AbstractDataRetriever):
    def __init__(self):
        # target table name
        super().__init__('futures_cffex_daily')

    def _full(self, **kwargs):
        # full load: every trading day from 2011-01-01 up to today
        self._get_data_list('20110101', today())

    def _delta(self, **kwargs):
        # incremental load: resume after the latest trading day already stored
        df_origin = self.query(fields='max(tradingday)')
        if df_origin.empty or df_origin.iat[0, 0] is None:
            self._get_data_list('20110101', today())
        else:
            self._get_data_list(df_origin.iat[0, 0], today())

    def _get_data_list(self, start_date, end_date, max_worker=multiprocessing.cpu_count() * 2):
        # CFFEX trading days in (start_date, end_date]
        df_cal_date = StockCalendar().query(
            fields='cal_date',
            where=f"`exchange`='cffex' and is_open='1' and cal_date > '{start_date}' and cal_date <= '{end_date}'",
            order_by='cal_date')
        # the work is I/O-bound HTTP requests, so a thread pool is sufficient
        with ThreadPoolExecutor(max_worker) as executor:
            future_to_date = {
                executor.submit(self._get_daily_data, trade_date=row['cal_date']): row
                for _, row in df_cal_date.iterrows()}
            for future in as_completed(future_to_date):
                row = future_to_date[future]
                try:
                    future.result()  # re-raises any exception from the worker thread
                except Exception as ex:
                    # log and continue: one failed day does not stop the others
                    self.logger.error(f"failed to retrieve {row['cal_date']}")
                    self.logger.exception(ex)

    def _get_daily_data(self, trade_date):
        # URL pattern: .../rtj/YYYYMM/DD/index.xml
        cffex_url = f'http://www.cffex.com.cn/sj/hqsj/rtj/{trade_date[0:6]}/{trade_date[6:8]}/index.xml'
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0',
            'Accept': '*/*',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
        }
        r = requests.get(cffex_url, headers=headers, timeout=30)
        r.raise_for_status()  # surface HTTP errors instead of failing inside the XML parser
        root = XML(r.content.decode())
        columns = ['instrumentid', 'tradingday', 'openprice', 'highestprice', 'lowestprice', 'closeprice',
                   'preopeninterest', 'openinterest', 'presettlementprice', 'settlementprice', 'turnover', 'volume',
                   'productid', 'expiredate']
        # build the whole 2-D list first, then create the DataFrame in one go
        data_list = [[child.findtext(column) for column in columns] for child in root]
        df = pd.DataFrame(data=data_list, columns=columns)
        self._save(df)


if __name__ == '__main__':
    CffexDaily().retrieve()
```
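`_save` belongs to the author's `AbstractDataRetriever` base class and is not shown, so the following is only a sketch of what a save step could look like. Because the values come from XML text, numeric columns arrive as strings; converting them with `pd.to_numeric` before writing keeps the database columns numeric. An in-memory SQLite database stands in for MySQL so the example runs without a server; with MySQL, `to_sql` would instead be given a SQLAlchemy engine (for example one made with `create_engine`). The sample row and the column subset are taken from the XML fragment above.

```python
import sqlite3

import pandas as pd

# Sample shaped like the DataFrame built in _get_daily_data:
# every value is still a string because it came from XML text.
df = pd.DataFrame(
    [['IC2109', '20210827', '7106.8', '61973', '87810915840']],
    columns=['instrumentid', 'tradingday', 'closeprice', 'volume', 'turnover'])

# Convert numeric columns; errors='coerce' turns unparsable text into NaN.
numeric_cols = ['closeprice', 'volume', 'turnover']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# In-memory SQLite stands in for the MySQL database used in the article.
conn = sqlite3.connect(':memory:')
df.to_sql('futures_cffex_daily', conn, index=False, if_exists='append')

back = pd.read_sql('SELECT * FROM futures_cffex_daily', conn)
conn.close()
print(back.dtypes)
```

`tradingday` is deliberately left as text here, since the incremental load compares it as a `'YYYYMMDD'` string.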