业务需求:在没有sqoop的基础上,将oracle归集库的数据迁移至hive作为ods数据源,mysql数据库中的指标表迁移至hive作为dw层。
在MySQL中生成hive建表语句:
#!/usr/bin/python
# -*- coding: utf-8 -*-
table_name = [
'zzlq'
,'z_zzjgdw_base_info'
,'z_zzjg_base_info'
,'z_za_base_info'
]
for i in range(len(table_name)):
sql1 = """SELECT
CONCAT('create table ','%s','(')
UNION ALL
SELECT
CONCAT(
COLUMN_NAME,
' ',
CASE
WHEN DATA_TYPE in ('varchar','longtext','char','datetime','timestamp','varbinary','bit','mediumtext','set','longblob','text','blob','time','date') THEN
'string'
WHEN DATA_TYPE = 'decimal' THEN
COLUMN_TYPE
WHEN DATA_TYPE = 'float' THEN
'double'
ELSE
DATA_TYPE
END -- 数据类型转换
,
' comment ',
'\\'',
CASE
WHEN COLUMN_COMMENT is NULL THEN
COLUMN_NAME
ELSE
replace(COLUMN_COMMENT,';',',')
END,
'\\','
)
FROM
information_schema. COLUMNS t1
WHERE
t1.table_schema = 'lqioc_ioc_yw'
and t1.TABLE_NAME = '%s'
UNION ALL
SELECT
concat(
')',
'COMMENT \\'',
COALESCE (t2.TABLE_COMMENT ,'%s'),
'\\'
-- PARTITIONED BY (DATE STRING COMMENT \\'日期分区\\') -- 分区表取消注释
ROW FORMAT DELIMITED FIELDS TERMINATED BY \\',\\' STORED AS TEXTFILE;'
)
FROM
information_schema. TABLES t2
WHERE
t2.table_schema = 'lqioc_ioc_yw'
and t2.table_name = '%s'
union all
""" % (table_name[i],table_name[i],table_name[i],table_name[i])
print(sql1)
在Oracle中生成建表语句:
#!/usr/bin/python
# -*- coding: utf-8 -*-
table_name = ['ZZXXXX'
,'ZYPXXXXX'
,'ZYJQLYJDQKB'
,'ZYAZ'
,'ZXYHXX'
,'ZSJH'
]
for i in range(len(table_name)):
sql1 = """SELECT 'create table lqioc_ioc_ods.' || '%s' || ' (' FROM dual UNION ALL
SELECT
col
FROM
(
SELECT
T .column_name || CASE
WHEN c.DATA_TYPE IN (
'CHAR',
'NCHAR',
'VARCHAR',
'VARCHAR2',
'NVARCHAR2',
'DATE',
'TIMESTAMP',
'TIMESTAMP WITH TIME ZONE',
'TIMESTAMP WITH LOCAL TIME ZONE',
'INTERVAL YEAR TO MOTH',
'INTERVAL DAY TO SECOND',
'BLOB',
'CLOB',
'NCLOB',
'BFILE',
'RAW',
'LONG RAW'
) THEN
' STRING '
WHEN C.DATA_TYPE = 'INTEGER' THEN
' BIGINT '
WHEN C.DATA_TYPE = 'NUMBER' THEN
(
CASE
WHEN C.DATA_SCALE IS NOT NULL
AND c.DATA_SCALE <> 0 THEN
' DECIMAL(' || C.DATA_PRECISION || ',' || C.DATA_SCALE || ') '
WHEN C.DATA_PRECISION < 3 THEN
' TINYINT '
WHEN C.DATA_PRECISION < 5 THEN
' SMALLINT '
WHEN C.DATA_PRECISION < 10 THEN
' INT '
ELSE
' BIGINT '
END
)
WHEN C.DATA_TYPE IN (
'BINARY_FLOAT',
'BINARY_DOUBLE',
'FLOAT'
) THEN
' DOUBLE '
ELSE
' STRING '
END || 'comment ''' || REGEXP_REPLACE (
T .comments,
'[' || CHR (10) || CHR (13) || CHR (9) || CHR (32) || ']',
''
) || ''',' col
FROM
all_COL_COMMENTS T,
all_TAB_COLUMNS c
WHERE
c.column_name = T .column_name
AND c. OWNER = T . OWNER
AND c.TABLE_NAME = T .TABLE_NAME
AND c. OWNER = 'SJGJ'
AND c.TABLE_NAME = '%s'
ORDER BY
c.COLUMN_ID
)
UNION ALL
SELECT
')ROW FORMAT DELIMITED FIELDS TERMINATED BY ''\001'' TBLPROPERTIES(''creator''=''sunruzi'',''create_at''=''' || TO_CHAR (
SYSDATE,
'yyyy-MM-dd hh24:mi:ss'
) || ''');'
FROM
all_tab_comments T
WHERE
OWNER = 'SJGJ'AND table_name = '%s'
union all """ % (table_name[i],table_name[i],table_name[i])
print(sql1)
看了这么多,写了这么多,后来发现一篇好文章有了灵感:
《使用pyspark模仿sqoop从oracle导数据到hive的主要功能(自动建表,分区导入,增量,解决数据换行符问题)》
迫不及待的去测试,提前说下是python2.x的,改之!我走了~~~~~~~~~~~~~~~~~~~~~~~~~~