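This code reads a list of tables from an ODPS table, collects table and column metadata for each one, and writes that information to two other ODPS tables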

Latest recommended article published on 2024-02-28 10:09:23

weixin_45988458

Tags: odps, big data, data warehouse

Article link: https://blog.csdn.net/weixin_45988458/article/details/132445846

import os
import sys
import json
import time
import odps
from odps.df import DataFrame

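# Python 2 only: force the default string encoding to UTF-8 (unnecessary on Python 3)
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')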

# Connect to ODPS remotely; not needed for a local connection, where `o` is already available
# o = odps.ODPS('AccessKeyId', 'AccessKeySecret', 'Project', endpoint='Endpoint')

# Load the source table from ODPS as a DataFrame
iris = DataFrame(o.get_table('table_name'))  # 'table_name' is a placeholder for the source table

partition_column = None   # the ODPS table has no partitions
partition_column_ds = 'ds'
table_list = []
column_list = []

# Iterate over every row of the iris DataFrame
for record in iris.execute():
    status = 0
    project_space = record['name_space']
    ename = record['table_name']

    # Check whether the table exists
    if o.exist_table(ename, project=project_space):
        status = 1
        t = o.get_table(ename, project=project_space)
        table_id = t.table_id  # table primary key
        data_size = t.size  # storage size
        desc = t.comment  # description
        create_time = t.creation_time  # creation time
        update_time = t.last_modified_time  # last modification time
        table_last_meta_modified_time = t.last_meta_modified_time  # last DDL time

        # Run a SQL query to count the table's records
        with o.execute_sql('SELECT COUNT(1) as cs FROM ' + ename, project=project_space).open_reader() as reader:
            count = reader[0]['cs']
            print('Count:', count)

        table_row = [table_id, ename, None, None, project_space, None, None, None, status, update_time, create_time, data_size, count, desc, None]
        table_list.append(table_row)
        print(table_row)

        table_columns = t.schema.names  # column names
        for col_ename in table_columns:
            col_name = t.schema[col_ename].comment  # column comment (the Chinese display name)
            col_type = str(t.schema[col_ename].type)  # data type
            column_list.append([None, table_id, ename, col_name, col_ename, None, None, col_type, ename, project_space])
    else:
        # The table does not exist, so there is no record count to store
        table_list.append([None, ename, None, None, project_space, None, None, None, status, None, None, None, None, None, None])

# Get the 'fct_cst_tables_df' and 'fct_cst_columns_df' table objects
_table = o.get_table('fct_cst_tables_df', project='project_space')
_column = o.get_table('fct_cst_columns_df', project='project_space')

# Write the results to the 'fct_cst_tables_df' and 'fct_cst_columns_df' tables
o.write_table('project_space.fct_cst_tables_df', table_list)
o.write_table('project_space.fct_cst_columns_df', column_list)
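
The body of the loop can also be factored into a function that receives the ODPS entry object as a parameter. This keeps the "table does not exist" branch explicit and lets the logic be exercised with a stand-in object instead of a live connection. The following is only a sketch: `collect_table_metadata` is a hypothetical helper name (not part of PyODPS), and it assumes the same attribute names and row layout as the listing above.

```python
def collect_table_metadata(o, project, name):
    """Return (table_row, column_rows) for one table.

    `o` is an ODPS entry object, or any object exposing exist_table,
    get_table and execute_sql in the same way (an assumption of this sketch).
    """
    if not o.exist_table(name, project=project):
        # Missing table: status 0 and no metadata, including no row count.
        row = [None, name, None, None, project, None, None, None,
               0, None, None, None, None, None, None]
        return row, []

    t = o.get_table(name, project=project)

    # Count the rows with the same SQL query as the listing above.
    sql = 'SELECT COUNT(1) AS cs FROM ' + name
    with o.execute_sql(sql, project=project).open_reader() as reader:
        count = reader[0]['cs']

    # Same 15-element layout as the rows appended to table_list.
    row = [t.table_id, name, None, None, project, None, None, None,
           1, t.last_modified_time, t.creation_time, t.size, count,
           t.comment, None]

    # One row per column, same layout as the rows appended to column_list.
    column_rows = []
    for col_name in t.schema.names:
        col = t.schema[col_name]
        column_rows.append([None, t.table_id, name, col.comment, col_name,
                            None, None, str(col.type), name, project])
    return row, column_rows
```

Inside the loop this would be called as `row, cols = collect_table_metadata(o, project_space, ename)`, followed by `table_list.append(row)` and `column_list.extend(cols)`.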

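Steps

Import the required modules and packages.
Create the ODPS connection using your access credentials and project configuration. (Only a remote connection needs this step; a local connection does not.)
Read the data of the fct_cst_mqsjzyb_df table through a DataFrame object.
Iterate over every row of this table, collecting and storing the relevant information.
- If the table exists in ODPS, get its ID, storage size, description, creation time, and so on, and store them in table_list.
  - Also get the table's column information, including each column's Chinese name and type, and store it in column_list.
- If the table does not exist in ODPS, store only the table name, its project, and similar information in table_list.
Get the target tables fct_cst_tables_df and fct_cst_columns_df.
Create the table partition.
Write the table information to the fct_cst_tables_df table.
Write the column information to the fct_cst_columns_df table.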
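
The partition-creation step is listed here but does not appear in the code. One way it might look is sketched below, assuming the target tables are partitioned by the `ds` column and that `ds` holds a `yyyymmdd` date string (the format is an assumption; the code only names the column). PyODPS's `write_table` accepts `partition` and `create_partition` arguments; `partition_spec` and `write_partitioned` are hypothetical helper names.

```python
from datetime import date


def partition_spec(day, column='ds'):
    """Build a partition spec such as 'ds=20240228' (yyyymmdd is an assumed format)."""
    return '{}={}'.format(column, day.strftime('%Y%m%d'))


def write_partitioned(o, table_name, rows, day, project=None):
    """Write rows into the partition for `day`, creating it if it is missing."""
    spec = partition_spec(day)
    table = o.get_table(table_name, project=project)
    # create_partition=True asks PyODPS to create the partition when it does not exist.
    o.write_table(table, rows, partition=spec, create_partition=True)
    return spec
```

With this helper, the two final writes could become `write_partitioned(o, 'fct_cst_tables_df', table_list, date.today(), project='project_space')` and the same call for `fct_cst_columns_df` with `column_list`. The rows should then contain only the non-partition columns.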