Python 实现玻璃期货数据处理、入库与分析：从代码到应用

陈奕昆

于 2025-04-06 09:53:56 发布

阅读量961

点赞数 23

分类专栏：期货分析python技术分享文章标签： python 算法

本文链接：https://blog.csdn.net/kenter1983/article/details/147021431

版权

期货分析python技术分享专栏收录该内容

4 篇文章

订阅专栏

Python 实现期货数据处理与分析：从代码到应用

引言

在金融市场中，期货数据的处理和分析对于投资者和分析师来说至关重要。Python 凭借其丰富的库和简洁的语法，成为了处理和分析期货数据的强大工具。本文将详细解读一段用于处理期货持仓和行情数据的 Python 代码，帮助读者理解代码的实现逻辑和功能。

代码整体功能概述

这段代码的主要功能是处理期货持仓和行情数据，将数据存储到 SQLite 数据库中。具体来说，它会遍历指定目录下的 Excel 文件，根据文件名的不同格式分别处理持仓数据和行情数据，并将处理后的数据插入到相应的数据库表中。

代码详细解读

1. 导入必要的库

import sqlite3
import pandas as pd
import os
from datetime import datetime

sqlite3：用于连接和操作 SQLite 数据库。
pandas：用于数据处理和分析，特别是读取 Excel 文件和数据清洗。
os：用于文件和目录操作。
datetime：用于日期处理。

2. 主函数 `process_futures_data`

def process_futures_data(directory_path='czce_holdings', db_name='futures_data.db'):
    """处理期货持仓数据的主函数"""
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()

    # 创建行情数据表（新增）
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS futures_data (
            contract_code TEXT PRIMARY KEY,
            previous_settlement REAL,
            today_open REAL,
            highest_price REAL,
            lowest_price REAL,
            today_close REAL,
            today_settlement REAL,
            change1 REAL,
            change2 REAL,
            volume INTEGER,
            open_interest INTEGER,
            interest_change INTEGER,
            turnover REAL,
            delivery_settlement_price REAL,
            trade_date TEXT
        )''')

    # 创建持仓数据表（原结构）
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS holdings (
        category        TEXT NOT NULL,    -- 品种名称
        date            TEXT NOT NULL,    -- 日期（YYYY-MM-DD）
        member_name     TEXT NOT NULL,    -- 会员简称
        trade_volume    INTEGER DEFAULT 0,-- 成交量（手）
        volume_change   INTEGER DEFAULT 0,-- 成交量增减
        long_position   INTEGER DEFAULT 0,-- 持买仓量
        long_change     INTEGER DEFAULT 0,-- 持买增减
        short_position  INTEGER DEFAULT 0,-- 持卖仓量
        short_change    INTEGER DEFAULT 0,-- 持卖增减
        PRIMARY KEY (category, date, member_name)
    )
    ''')

    # 创建索引
    cursor.execute('CREATE INDEX IF NOT EXISTS idx_main ON holdings (category, date)')
    cursor.execute('CREATE INDEX IF NOT EXISTS idx_futures_date ON futures_data (trade_date)')

    # 遍历处理文件
    for filename in sorted(os.listdir(directory_path)):
        file_path = os.path.join(directory_path, filename)
        try:
            # 提取日期（支持两种文件名格式）
            if filename.startswith('Price_FutureDataHolding_'):
                file_date = extract_date(filename, prefix_len=24)
                process_price_data(conn, cursor, file_path, file_date)
            elif filename.startswith('FutureDataHolding_'):
                file_date = extract_date(filename, prefix_len=18)
                process_holding_data(cursor, file_path, file_date)
            else:
                continue

            conn.commit()

        except Exception as e:
            print(f"处理失败 {filename}: {str(e.__str__())}")
            continue
    conn.close()
    print("\n数据处理完成！")

连接到 SQLite 数据库，并创建两个表：futures_data 用于存储行情数据，holdings 用于存储持仓数据。
创建索引以提高查询效率。
遍历指定目录下的文件，根据文件名的前缀判断是行情数据文件还是持仓数据文件，并调用相应的处理函数。
处理完成后提交事务并关闭数据库连接。

3. 持仓数据处理函数 `process_holding_data`

def process_holding_data(cursor, file_path, file_date):
    """处理持仓数据（原逻辑优化）"""
    print(f"处理持仓文件: {os.path.basename(file_path)}")

    # 原持仓处理逻辑（保持原有结构）
    df = pd.read_excel(file_path, header=None, engine='xlrd')

    holdings = {}
    current_category = ""

    for _, row in df.iterrows():
        row_str = [str(cell).strip() for cell in row]

        if "品种：" in row_str[0]:
            current_category = row_str[0].split("：")[1].split()[0]
            continue

        if row_str[0].isdigit() and not row_str[0].startswith("合计"):
            parts = {
                'trade': parse_section(row_str, 1),
                'long': parse_section(row_str, 4),
                'short': parse_section(row_str, 7)
            }

            for data_type, data in parts.items():
                if data['member']:
                    key = (current_category, file_date, data['member'])

                    if key not in holdings:
                        holdings[key] = {
                            'trade_volume': 0,
                            'volume_change': 0,
                            'long_position': 0,
                            'long_change': 0,
                            'short_position': 0,
                            'short_change': 0
                        }

                    if data_type == 'trade':
                        holdings[key].update({
                            'trade_volume': data['value'],
                            'volume_change': data['change']
                        })
                    elif data_type == 'long':
                        holdings[key].update({
                            'long_position': data['value'],
                            'long_change': data['change']
                        })
                    elif data_type == 'short':
                        holdings[key].update({
                            'short_position': data['value'],
                            'short_change': data['change']
                        })

    # 批量插入
    for key, values in holdings.items():
        cursor.execute('''
            INSERT OR REPLACE INTO holdings
            VALUES (?,?,?,?,?,?,?,?,?)
        ''', (*key, values['trade_volume'], values['volume_change'],
              values['long_position'], values['long_change'],
              values['short_position'], values['short_change']))

读取 Excel 文件并将其存储为 DataFrame。
遍历 DataFrame 的每一行，根据行内容提取品种名称和持仓数据。
将提取的数据存储在字典 holdings 中。
批量插入数据到 holdings 表中。

4. 日期提取函数 `extract_date`

def extract_date(filename, prefix_len):
    """通用日期提取函数"""
    #prefix_len = prefix_len +1
    date_str = filename[prefix_len:prefix_len + 8]
    print(f"调试信息 -> 文件名: {filename}, 切割位置: [{prefix_len}:{prefix_len + 8}], 提取字符串: '{date_str}'")
    return datetime.strptime(date_str, '%Y%m%d').strftime('%Y-%m-%d')

从文件名中提取日期信息，并将其转换为 YYYY-MM-DD 格式。

5. 行情数据处理函数 `process_price_data`

def process_price_data(conn, cursor, file_path, file_date):
    """处理行情数据"""
    print(f"处理行情文件: {os.path.basename(file_path)}")

    # 1. 读取Excel（跳过前两行标题）
    df = pd.read_excel(file_path, header=None, engine='xlrd' ,sheet_name='sheet1', skiprows=2)

    # 2. 强制列名对齐（防止Excel格式变化）
    df.columns = [
        '合约代码', '昨结算', '今开盘', '最高价', '最低价',
        '今收盘', '今结算', '涨跌1', '涨跌2', '成交量(手)',
        '持仓量', '增减量', '成交额(万元)', '交割结算价'
    ]

    # 3. 数据类型处理
    df['合约代码'] = df['合约代码'].astype(str).str.strip()  # 转字符串并清理空格
    df['合约代码'] = df['合约代码'].fillna('')  # 填充空值

    # 4. 过滤汇总行（严格匹配）
    df = df[~df['合约代码'].str.contains('小计|总计', na=False)]

    # 5. 数据清洗（数值列处理）
    numeric_cols = ['昨结算', '今开盘', '最高价', '最低价', '今收盘', '今结算', '涨跌1', '涨跌2', '成交额(万元)']
    for col in numeric_cols:
        df[col] = df[col].apply(clean_number)

    # 6. 列名映射
    column_mapping = {
        '合约代码': 'contract_code',
        '昨结算': 'previous_settlement',
        '今开盘': 'today_open',
        '最高价': 'highest_price',
        '最低价': 'lowest_price',
        '今收盘': 'today_close',
        '今结算': 'today_settlement',
        '涨跌1': 'change1',
        '涨跌2': 'change2',
        '成交量(手)': 'volume',
        '持仓量': 'open_interest',
        '增减量': 'interest_change',
        '成交额(万元)': 'turnover',
        '交割结算价': 'delivery_settlement_price'
    }
    df = df.rename(columns=column_mapping)

    # 7. 添加日期列并入库
    df['trade_date'] = file_date
    df.to_sql('futures_data', conn, if_exists='append', index=False)
    print(f"成功入库 {len(df)} 条记录")

读取 Excel 文件并跳过前两行标题。
强制列名对齐，防止 Excel 格式变化。
处理数据类型，过滤汇总行，清洗数值列。
进行列名映射，将中文列名转换为英文列名。
添加日期列并将数据插入到 futures_data 表中。

6. 数据解析函数 `parse_section`

def parse_section(row, start_idx):
    """解析数据部分（每3列为一组）"""
    return {
        'member': row[start_idx] if start_idx < len(row) else "",
        'value': clean_number(row[start_idx + 1]) if start_idx + 1 < len(row) else 0,
        'change': clean_number(row[start_idx + 2]) if start_idx + 2 < len(row) else 0
    }

解析数据部分，每 3 列为一组，提取会员名称、数值和增减量。

7. 数据清洗函数 `clean_number`

def clean_number(value):
    """数据清洗：处理空值和千分位"""
    try:
        return int(str(value).replace(',', '')) if str(value).strip() not in ['', '-'] else 0
    except:
        return 0

处理空值和千分位，将数据转换为整数类型。

8. 主程序入口

if __name__ == "__main__":
    if not os.path.exists('czce_holdings'):
        os.makedirs('czce_holdings')
        print("请将数据文件放入 czce_holdings 目录后重新运行")
    else:
        process_futures_data()
        print("\n数据验证命令：")
        print("sqlite3 futures_data.db 'SELECT * FROM holdings WHERE member_name=\"中信期货（代客）\"'")

检查 czce_holdings 目录是否存在，如果不存在则创建该目录。
调用 process_futures_data 函数处理数据。
输出数据验证命令。

9. 运行结果

在这里插入图片描述

总结

通过以上代码，我们可以看到如何使用 Python 处理期货持仓和行情数据，并将其存储到 SQLite 数据库中。代码中使用了 pandas 进行数据处理和清洗，sqlite3 进行数据库操作，同时考虑了数据格式的变化和异常情况的处理。在实际应用中，我们可以根据需要对代码进行扩展，例如添加数据分析和可视化的功能。

希望本文能帮助读者更好地理解和应用 Python 进行期货数据处理和分析。如果你有任何问题或建议，欢迎在评论区留言。