我们要从MySQL当中导出数据到Greenplum当中,按照以下步骤就可以
1:将MySQL当中的表导出外部文件
以schema_name.table_name为例
selectproduct_id,number, name, english_name, purchase_name, system_name, bar_code, category_one, category_two, category_three,
parent_id, parent_number, brand_id, supplier_id, price, ad_word, give_integral, shelf_life, FROM_UNIXTIME(shelve_date), product_area, country,
sale_unit, specification, weight, length, width, height, storage_conditions, storage, model, refuse_notes, status, is_promote,
is_gift, is_book, is_outgoing, is_presale, is_fragile, is_have, is_cod, is_return, is_oos, is_seasonal, is_multicity, is_package, is_show, click,
favorite, min_purchase_unit, in_price, refer_in_price, mwaverage_price, is_unique_number, is_batch_number, qs_proportion, shelf_life_proportion, box_specification,
max_unsalable, advent_shelves, pro_warning, FROM_UNIXTIME(add_time), operator_id,FROM_UNIXTIME( audit_time), remark, price_type, new_tag, product_type, business_model, is_sell, return_policy,
package, inventory, merchant_number, modified_time ,now()from schema_name.table_name INTO OUTFILE ‘/tmp/table_name.txt‘;
导的时候需要注意,一些字符的转换,对于这张表来说,主要就是在MySQL当中一些时间格式存储的为INT类型,我们需要进行转化后然后导出,而且在Greenplum当中建表的时候会多一个时间字段,我们这里默认导出现在时间。按照以上格式进行导出。
2:将文件拷贝到Greenplum服务器上,并且创建外部表
先将文件拷贝到外部表的目录下,这个比较简单,什么方法都可以,然后创建外部表:
create external TABLE schema_name.table_name_ext( product_id int,number varchar(10),
namevarchar(100),
english_namevarchar(100),
purchase_namevarchar(100),
system_namevarchar(100),
bar_codevarchar(255),
category_oneint,
category_twoint,
category_threeint,
parent_idint,
parent_numberint,
brand_idint,
supplier_idint,
priceint,
ad_wordvarchar(100),
give_integralint,
shelf_lifeint,
shelve_datetimestampwithout time zone,
product_areaint,
countryint,
sale_unitvarchar(20),
specificationvarchar(255),
weightdecimal(10,2) ,
lengthint,
widthint,
heightint,
storage_conditionsvarchar(255),
storagesmallint,
modelvarchar(20),
refuse_notesvarchar(255),
statussmallint,
is_promotesmallint,
is_giftsmallint,
is_booksmallint,
is_outgoingsmallint,
is_presaleint,
is_fragilesmallint,
is_havesmallint,
is_codsmallint,
is_returnsmallint,
is_oossmallint,
is_seasonalsmallint,
is_multicitysmallint,
is_packagesmallint,
is_showsmallint,
clickint,
favoriteint,
min_purchase_unitint,
in_priceint,
refer_in_priceint,
mwaverage_priceint,
is_unique_numberint,
is_batch_numberint,
qs_proportionint,
shelf_life_proportionDOUBLE PRECISION,
box_specificationvarchar(50),
max_unsalableint,
advent_shelvesint,
pro_warningint,
add_timetimestampwithout time zone,
operator_idint,
audit_timetimestampwithout time zone,
remarkvarchar(255),
price_typesmallint,
new_tagint,
product_typeint,
business_modelsmallint,
is_sellsmallint,
return_policysmallint,
packagevarchar(200),
inventoryvarchar(200),
merchant_numberint,
modified_timetimestampwithout time zone,
dw_modified_timetimestampwithout time zone
) location(‘gpfdist://172.16.16.34:9888/table_name.txt‘)
FORMAT‘TEXT‘SEGMENT REJECT LIMIT1000000rows ;
这里我们要指定‘gpfdist://10.102.35.192:9888/table_name.txt‘,这个IP地址加上外部表就可以了,后面要把这个文件拷贝到 gpfdist 的目录当中,我们看下启动方式gpfdist -d /tmp -p 9888,也就是要把外部文件拷贝到/tmp目录下才可以。其他的注意列名对应就好
然后查询一下,一般情况列对上就不会有问题。
3:导入到Greenplum当中正式表
先创建一张正式表:
create table schema_name.table_name ( product_id int,number varchar(10),
namevarchar(100),
english_namevarchar(100),
purchase_namevarchar(100),
system_namevarchar(100),
bar_codevarchar(255),
category_oneint,
category_twoint,
category_threeint,
parent_idint,
parent_numberint,
brand_idint,
supplier_idint,
priceint,
ad_wordvarchar(100),
give_integralint,
shelf_lifeint,
shelve_datetimestampwithout time zone,
product_areaint,
countryint,
sale_unitvarchar(20),
specificationvarchar(255),
weightdecimal(10,2) ,
lengthint,
widthint,
heightint,
storage_conditionsvarchar(255),
storagesmallint,
modelvarchar(20),
refuse_notesvarchar(255),
statussmallint,
is_promotesmallint,
is_giftsmallint,
is_booksmallint,
is_outgoingsmallint,
is_presaleint,
is_fragilesmallint,
is_havesmallint,
is_codsmallint,
is_returnsmallint,
is_oossmallint,
is_seasonalsmallint,
is_multicitysmallint,
is_packagesmallint,
is_showsmallint,
clickint,
favoriteint,
min_purchase_unitint,
in_priceint,
refer_in_priceint,
mwaverage_priceint,
is_unique_numberint,
is_batch_numberint,
qs_proportionint,
shelf_life_proportionDOUBLE PRECISION,
box_specificationvarchar(50),
max_unsalableint,
advent_shelvesint,
pro_warningint,
add_timetimestampwithout time zone,
operator_idint,
audit_timetimestampwithout time zone,
remarkvarchar(255),
price_typesmallint,
new_tagint,
product_typeint,
business_modelsmallint,
is_sellsmallint,
return_policysmallint,
packagevarchar(200),
inventoryvarchar(200),
merchant_numberint,
modified_timetimestampwithout time zone,
dw_modified_timetimestampwithout time zone
)distributed by(product_id);
然后导入数据:
insert intoschema_name.table_name
select * fromschema_name.table_name_ext
这样就把外部表数据导出到了内部表,均匀分布在每个segment上。注意schema_name.table_name的结构要和schema_name.table_name_ext是一致的。
原文:https://www.cnblogs.com/shengdimaya/p/8762059.html