Bulk importing data into Oracle with Python: ~10,000 rows/s in practice

Original title: Importing large data volumes into Oracle with Python in practice (~10,000 rows/s)

For a project I needed to import an 800k+ row CSV (200+ MB) into an Oracle database. I started with Navicat for Oracle's import tool, but gave up after five or six minutes: it had only reached 2%, and at that pace it would not have finished in half a day.

Thinking back to my old trade, MySQL, I figured Oracle must also have an efficient way to write bulk inserts. So I dug into Oracle's official article "Mastering Oracle+Python, Part 1: Querying Best Practices", which says:

Many rows at a time

Large insert operations do not require many separate inserts, because Python fully supports inserting several rows at once through the cx_Oracle.Cursor.executemany method.

Limiting the number of execute operations dramatically improves program performance, so it should be the first thing to consider when writing applications that perform lots of inserts.

Exactly what I was looking for.
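In essence, the pattern is just "build a list of tuples, then hand the whole list to executemany". A minimal sketch of that pattern (the table, columns and connection string below are made-up placeholders, not the ones from this post):

import cx_Oracle

rows = [
    (1, 'Acme', 'Sales', 'Alice'),
    (2, 'Acme', 'IT', 'Bob'),
]

conn = cx_Oracle.connect('user', 'password', 'host:1521/SERVICE')
cursor = conn.cursor()
# A single executemany() call sends every row in the list in one round trip
cursor.executemany(
    "INSERT INTO DEMO_TABLE (ID, COMPANY, DEPARTMENT, NAME) VALUES (:1, :2, :3, :4)",
    rows)
conn.commit()
cursor.close()
conn.close()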

I quickly adapted the official demo into a simple import script and ran it: 138.7 s, a little over two minutes, to load the 200+ MB, roughly 880k rows. The performance log is below:

Total rows: 885640
2017-05-17 14:11:24 ===begin===
2017-05-17 14:11:24 prepare end
2017-05-17 14:11:26 = 1-10000 (9999) finish
2017-05-17 14:11:27 = 10001-20000 (9999) finish
2017-05-17 14:11:28 = 20001-30000 (9999) finish
2017-05-17 14:11:30 = 30001-40000 (9999) finish
2017-05-17 14:11:31 = 40001-50000 (9999) finish
2017-05-17 14:11:33 = 50001-60000 (9999) finish
2017-05-17 14:11:34 = 60001-70000 (9999) finish
2017-05-17 14:11:35 = 70001-80000 (9999) finish
2017-05-17 14:11:37 = 80001-90000 (9999) finish
2017-05-17 14:11:38 = 90001-100000 (9999) finish
2017-05-17 14:11:39 = 100001-110000 (9999) finish
2017-05-17 14:11:42 = 110001-120000 (9999) finish
2017-05-17 14:11:44 = 120001-130000 (9999) finish
2017-05-17 14:11:45 = 130001-140000 (9999) finish
2017-05-17 14:11:46 = 140001-150000 (9999) finish
2017-05-17 14:11:47 = 150001-160000 (9999) finish
2017-05-17 14:11:49 = 160001-170000 (9999) finish
2017-05-17 14:11:50 = 170001-180000 (9999) finish
2017-05-17 14:11:51 = 180001-190000 (9999) finish
2017-05-17 14:11:53 = 190001-200000 (9999) finish
2017-05-17 14:11:56 = 200001-210000 (9999) finish
2017-05-17 14:11:58 = 210001-220000 (9999) finish
2017-05-17 14:12:00 = 220001-230000 (9999) finish
2017-05-17 14:12:02 = 230001-240000 (9999) finish
2017-05-17 14:12:04 = 240001-250000 (9999) finish
2017-05-17 14:12:06 = 250001-260000 (9999) finish
2017-05-17 14:12:10 = 260001-270000 (9999) finish
2017-05-17 14:12:14 = 270001-280000 (9999) finish
2017-05-17 14:12:16 = 280001-290000 (9999) finish
2017-05-17 14:12:17 = 290001-300000 (9999) finish
2017-05-17 14:12:19 = 300001-310000 (9999) finish
2017-05-17 14:12:25 = 310001-320000 (9999) finish
2017-05-17 14:12:26 = 320001-330000 (9999) finish
2017-05-17 14:12:28 = 330001-340000 (9999) finish
2017-05-17 14:12:29 = 340001-350000 (9999) finish
2017-05-17 14:12:30 = 350001-360000 (9999) finish
2017-05-17 14:12:51 = 360001-370000 (9999) finish
2017-05-17 14:12:53 = 370001-380000 (9999) finish
2017-05-17 14:12:54 = 380001-390000 (9999) finish
2017-05-17 14:12:56 = 390001-400000 (9999) finish
2017-05-17 14:12:58 = 400001-410000 (9999) finish
2017-05-17 14:13:36 = 410001-420000 (9999) finish
2017-05-17 14:13:37 = 420001-430000 (9999) finish
2017-05-17 14:13:39 = 430001-440000 (9999) finish
2017-05-17 14:13:40 = 440001-450000 (9999) finish
2017-05-17 14:13:42 = 450001-460000 (9999) finish
2017-05-17 14:13:43 = 460001-470000 (9999) finish
2017-05-17 14:13:45 = 470001-480000 (9999) finish
2017-05-17 14:13:47 = 480001-490000 (9999) finish
2017-05-17 14:13:48 = 490001-500000 (9999) finish
2017-05-17 14:13:50 = 500001-510000 (9999) finish
2017-05-17 14:13:51 = 510001-520000 (9999) finish
2017-05-17 14:13:52 = 520001-530000 (9999) finish
2017-05-17 14:13:54 = 530001-540000 (9999) finish
2017-05-17 14:13:56 = 540001-550000 (9999) finish
2017-05-17 14:13:57 = 550001-560000 (9999) finish
2017-05-17 14:13:59 = 560001-570000 (9999) finish
2017-05-17 14:14:01 = 570001-580000 (9999) finish
2017-05-17 14:14:03 = 580001-590000 (9999) finish
2017-05-17 14:14:05 = 590001-600000 (9999) finish
2017-05-17 14:14:06 = 600001-610000 (9999) finish
2017-05-17 14:14:07 = 610001-620000 (9999) finish
2017-05-17 14:14:09 = 620001-630000 (9999) finish
2017-05-17 14:14:10 = 630001-640000 (9999) finish
2017-05-17 14:14:11 = 640001-650000 (9999) finish
2017-05-17 14:14:13 = 650001-660000 (9999) finish
2017-05-17 14:14:14 = 660001-670000 (9999) finish
2017-05-17 14:14:16 = 670001-680000 (9999) finish
2017-05-17 14:14:17 = 680001-690000 (9999) finish
2017-05-17 14:14:19 = 690001-700000 (9999) finish
2017-05-17 14:14:20 = 700001-710000 (9999) finish
2017-05-17 14:14:23 = 710001-720000 (9999) finish
2017-05-17 14:14:28 = 720001-730000 (9999) finish
2017-05-17 14:14:34 = 730001-740000 (9999) finish
2017-05-17 14:14:37 = 740001-750000 (9999) finish
2017-05-17 14:14:40 = 750001-760000 (9999) finish
2017-05-17 14:14:43 = 760001-770000 (9999) finish
2017-05-17 14:14:46 = 770001-780000 (9999) finish
2017-05-17 14:14:49 = 780001-790000 (9999) finish
2017-05-17 14:14:51 = 790001-800000 (9999) finish
2017-05-17 14:14:55 = 800001-810000 (9999) finish
2017-05-17 14:15:15 = 810001-820000 (9999) finish
2017-05-17 14:15:17 = 820001-830000 (9999) finish
2017-05-17 14:15:19 = 830001-840000 (9999) finish
2017-05-17 14:15:20 = 840001-850000 (9999) finish
2017-05-17 14:15:22 = 850001-860000 (9999) finish
2017-05-17 14:15:24 = 860001-870000 (9999) finish
2017-05-17 14:15:27 = 870001-880000 (9999) finish
2017-05-17 14:15:28 = 880001-890000 (5639) finish
2017-05-17 14:15:28 execute end

Press any key to continue . . .

[Finished in 138.7s]

As the log shows, each batch of 10,000 rows takes about a second, which is plenty fast. The rough Python import script is at the end of this post.

Problems encountered

In practice you may run into the following problems.

1. ORA-12899: value too large for column

Roughly, a value was too large for its column. The source database had no such problem, so I guessed it was a character-set issue: my script used UTF-8, which did not match the database character set, so a Chinese character ended up converted into something like \uxxxx and no longer fit the column.

Fix: add os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8' to the script so the database performs the character-set conversion itself.
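As far as I can tell, NLS_LANG is picked up when the Oracle client environment is initialized, so the variable has to be set before the first connection is opened. A minimal sketch (connection details are placeholders):

import os
# Set before cx_Oracle opens its first connection, otherwise it has no effect
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'

import cx_Oracle
conn = cx_Oracle.connect('user', 'password', 'host:1521/SERVICE')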

2. ORA-03135: connection lost contact

This means the connection to the database was lost. In my case the job ran for so long that it exceeded the maximum time allowed by the Oracle configuration, and the server dropped the connection.

Fix: locate sqlnet.ora under the Oracle installation's network/admin directory; on my server that is /app/oracle/product/11.2.0/dbhome_1/network/admin/sqlnet.ora. Append the following line to the file:

sqlnet.expire_time = 60

Unit: minutes. Valid range: greater than 0. Default: none.

With this set, the server sends a small keep-alive probe over the connection at that interval (dead connection detection), which also stops a long-running session from being dropped.

Below is the Python demo (Python 2, cx_Oracle):

# -*- coding: utf8 -*-
import cx_Oracle
import csv
import time
import os

# Let the database do the character-set conversion (see problem 1 above)
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'

# Read the whole CSV into memory as a list of 4-column tuples
M = []
csvfile = open(r'C:\Users\Administrator\Desktop\Table.csv', 'rb')
reader = csv.reader(csvfile)
for line in reader:
    try:
        M.append((line[0], line[1], line[2], line[3]))
    except AttributeError:
        pass
csvfile.close()

# Create the database connection
conn = cx_Oracle.connect('oracle', '123456', '192.168.102.126:1521/ORCL')
# Get a cursor
cursor = conn.cursor()

print len(M)
print time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
print '===begin==='

# Prepare the statement once; executemany(None, ...) below reuses it
cursor.prepare("INSERT INTO MY_TABLE (ID, COMPANY, DEPARTMENT, NAME) VALUES (:1, :2, :3, :4)")
print time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
print 'prepare end'

# Insert in batches of 30000 rows, one executemany() call per batch
for i in range(1, 31):
    begin = (i - 1) * 30000
    end = i * 30000
    cursor.executemany(None, M[begin:end])
    print time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())), '=', begin, '-', end, '(', len(M[begin:end]), ')', 'finish'

print time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
print 'execute end'

conn.commit()  # 885640
print time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
print 'end'

# Sanity check: count the rows in the target table
cursor.execute("SELECT COUNT(*) FROM MY_TABLE")
print cursor.fetchone()

# Close the connection and release resources
cursor.close()
conn.close()
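For anyone on Python 3 today, a rough equivalent using the python-oracledb driver (the successor to cx_Oracle) might look like the sketch below; the file path, table and connection details are placeholders, not the ones from this post.

import csv
import oracledb

# Load the CSV into memory as a list of 4-column tuples
with open(r'C:\path\to\Table.csv', newline='', encoding='utf-8') as f:
    rows = [(r[0], r[1], r[2], r[3]) for r in csv.reader(f)]

conn = oracledb.connect(user='user', password='password', dsn='host:1521/SERVICE')
cursor = conn.cursor()

sql = "INSERT INTO MY_TABLE (ID, COMPANY, DEPARTMENT, NAME) VALUES (:1, :2, :3, :4)"
batch_size = 30000
for start in range(0, len(rows), batch_size):
    # Each call inserts one batch of rows in a single round trip
    cursor.executemany(sql, rows[start:start + batch_size])

conn.commit()
cursor.close()
conn.close()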

I hear that importing and exporting with Data Pump is even more efficient; I'll give it a try when I get the chance.
