Generating hundreds of millions of random records with Python and loading them into Greenplum

In a recent project I needed to fabricate a large volume of test data to benchmark gptext, Greenplum's full-text search extension. Inserting the rows directly from Python was far too slow, only a few hundred thousand rows per hour. The workaround was to generate the data with Python, write it out to a text file, and then bulk-load the file with Greenplum's COPY command, which cut the load time dramatically:

1. Generate random data with Python:

The generator itself is described in detail here: https://blog.csdn.net/weixin_43315211/article/details/87929993
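For context, below is a minimal sketch of what such a mkitems() function might look like. Every field format here (ID number, name, phone, and so on) is an assumption chosen to match the columns loaded in step 3, not the original implementation from the linked post:

import random
import datetime

def mkitems():
    # Minimal sketch of a random-record generator; all field formats are illustrative.
    surnames = ['Zhang', 'Wang', 'Li', 'Zhao', 'Liu']
    provinces = ['Beijing', 'Shanghai', 'Guangdong', 'Sichuan', 'Hubei']
    birthday = datetime.date(1960, 1, 1) + datetime.timedelta(days=random.randint(0, 20000))
    return {
        'id_number': ''.join(random.choice('0123456789') for _ in range(18)),
        'name': random.choice(surnames) + str(random.randint(1, 9999)),
        'birthday': birthday.strftime('%Y-%m-%d'),
        'gender': random.choice(['M', 'F']),
        'phone': '1' + ''.join(random.choice('0123456789') for _ in range(10)),
        'birth_place': random.choice(provinces),
    }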

2. Write the records to a text file:

def write_to_csv():
    count = 0
    with open('C:\\Users\\Administrator\\Desktop\\people_info.csv', 'a') as f:
        for i in range(10000):
            count += 1
            items = mkitems()  # mkitems() generates one random record and returns a dict
            f.write(",".join(items.values()) + '\n')
            if count % 1000 == 0:
                print(count)
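One call to write_to_csv() appends 10,000 rows, so reaching hundreds of millions of rows is just a matter of calling it in a loop (or enlarging the range). The batch count below is purely illustrative:

if __name__ == '__main__':
    for batch in range(10000):  # 10,000 batches x 10,000 rows = 100 million rows
        write_to_csv()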

3. Load the data into Greenplum with COPY:

Syntax of the COPY command:

COPY table [(column [, ...])] FROM {'file' | STDIN}
     [ [WITH] 
       [OIDS]
       [HEADER]
       [DELIMITER [ AS ] 'delimiter']
       [NULL [ AS ] 'null string']
       [ESCAPE [ AS ] 'escape' | 'OFF']
       [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
       [CSV [QUOTE [ AS ] 'quote'] 
            [FORCE NOT NULL column [, ...]]
       [FILL MISSING FIELDS]
     [ [LOG ERRORS INTO error_table] [KEEP] 
       SEGMENT REJECT LIMIT count [ROWS | PERCENT] ]

COPY {table [(column [, ...])] | (query)} TO {'file' | STDOUT}
      [ [WITH] 
        [OIDS]
        [HEADER]
        [DELIMITER [ AS ] 'delimiter']
        [NULL [ AS ] 'null string']
        [ESCAPE [ AS ] 'escape' | 'OFF']
        [CSV [QUOTE [ AS ] 'quote'] 
             [FORCE QUOTE column [, ...]] ]
An example that loads the CSV generated in step 2:

copy people_info(id_number,name,birthday,gender,phone,birth_place) from '/home/people_info.csv' with header delimiter ',' csv;
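The COPY above runs on the Greenplum master, so /home/people_info.csv must be a path readable on the master host. If you would rather stream the file from the client machine where it was generated, psycopg2's copy_expert can send the same statement over STDIN. The sketch below is only an assumption of how that could look; host, database, and credentials are placeholders:

import psycopg2

conn = psycopg2.connect(host='gp-master', port=5432, dbname='testdb',
                        user='gpadmin', password='secret')  # placeholder connection settings
with conn, conn.cursor() as cur, open('people_info.csv') as f:
    cur.copy_expert(
        "COPY people_info(id_number,name,birthday,gender,phone,birth_place) "
        "FROM STDIN WITH HEADER DELIMITER ',' CSV",  # HEADER mirrors the command above; drop it if the CSV has no header row
        f,
    )
conn.close()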

 
