Python脚本访问Greenplum数据库安装指导

  • JDBC是什么?

    JDBC API是一个Java API,可以访问任何类型表列数据,特别是存储在关系数据库中的数据。JDBC代表Java数据库连接。

    JDBC库中所包含的API任务通常与数据库使用:

    • 连接到数据库

    • 创建SQL或MySQL语句

    • 在数据库中执行SQL或MySQL查询

    • 查看和修改记录

    安装前准备

    (1)操作系统(系统上面要安装一些必备的开发工具(比如gcc等))

    linux-82:/home/PyODBC # cat/etc/SuSE-release

    SUSE Linux EnterpriseServer 11 (x86_64)

    VERSION = 11

    PATCHLEVEL = 1

    (2)安装所需的软件包

    greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.zip

    --GP官网下载,GP的JDBC和ODBC驱动

    pyodbc-3.0.6-1.x86_64.rpm

    --Python连接GP需要pyodbc驱动包

    unixODBC-2.2.12-204.3.1.x86_64.rpm

    --unixODBC的驱动管理器

    (3)将上面的包上传到搭建环境的服务器上面,比如/home/PyODBC

    安装GP驱动包

    (1) 解压greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.zip

    unzipgreenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.zip

    (2) 执行解压后得到greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.bin可执行文件

    linux-82:/home/PyODBC # ./greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.bin

    ********************************************************************

    You must read and accept the EMCConnectivity license agreement

    before installing

    ********************************************************************

    SOFTWARE LICENSE ANDMAINTENANCE AGREEMENT

    *** IMPORTANT INFORMATION - PLEASE READ CAREFULLY ***

    ThisSoftware contains computer programs and other proprietary material and

    information,the use of which is subject to and expressly conditioned upon

    acceptanceof this Software License and Maintenance Agreement (the "Agreement").

    ThisAgreement is a legally binding document between you (meaning the individual

    person orthe entity that the individual represents that has obtained the

    Softwarefor its internal productive use and not for outright resale) (the

    "Customer")and EMC (which means (i) EMC Corporation, if Customer is located in

    theUnited States; (ii) the local EMC sales subsidiary, if Customer is located

    in acountry in which EMC Corporation has a local sales subsidiary; and (iii)

    EMCInformation Systems International ("EISI"), if Customer is locatedoutside

    theUnited States and in a country in which EMC Corporation does not have a

    localsales subsidiary). Unless EMC agrees otherwise in writing, this Agreement

    governsCustomer's use of the Software except to the extent all or any portion

    of theSoftware is: (a) the subject of a separate written agreement; or (b)

    governedby a third party licensor's terms and conditions. Capitalized terms

    havemeaning stated in the Agreement.

    IfCustomer does not have a currently enforceable, written and separately signed

    softwarelicense agreement directly with EMC or the Distributor from whom

    Customerobtained this Software, then by clicking on the "Agree" or"Accept" or

    similarbutton at the end of this Agreement, or proceeding with the

    ********************************************************************

    Do you accept the EMC Connectivity licenseagreement? [yes | no]

    ********************************************************************

    yes ---------同意许可

    ********************************************************************

    Providethe installation path for Greenplum Connectivity or press ENTER to

    acceptthe default installation path:/usr/local/greenplum-connectivity-4.3.0.0-build-2

    ********************************************************************

    ********************************************************************

    InstallGreenplum Connectivity into</usr/local/greenplum-connectivity-4.3.0.0-build-2>? [yes | no]

    ********************************************************************

    yes ----------------保持默认的安装路径,你也可以自由指定安装路径

    ********************************************************************

    /usr/local/greenplum-connectivity-4.3.0.0-build-2does not exist.

    Create/usr/local/greenplum-connectivity-4.3.0.0-build-2 ? [ yes | no ]

    (Selectingno will exit the installer)

    ********************************************************************

    yes ----------------创建安装目录

    Extractingproduct to /usr/local/greenplum-connectivity-4.3.0.0-build-2

    ********************************************************************

    Installationcomplete.

    GreenplumConnectivity is installed in /usr/local/greenplum-connectivity-4.3.0.0-build-2

    Greenplumdocumentation is available for download at http://powerlink.emc.com.

    ************************************************************************

    (3) 配置Greenplum DB数据库驱动

    查看安装目录时,如下:

    linux-82:/usr/local/greenplum-connectivity-4.3.0.0-build-2/drivers/odbc# ll

    total 24

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:49 psqlodbc-08.02.0400

    drwxr-xr-x 6 hadoop users 4096 2013-11-15 09:52 psqlodbc-08.02.0500

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:53 psqlodbc-08.03.0400

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:54 psqlodbc-08.04.0200

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:55 psqlodbc-09.00.0200

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:56psqlodbc-09.02.0100

    我们会看到有好几个版本的驱动,我们可以选择psqlodbc-08.02.0500版本的,再查看如下目录:

    linux-82:/usr/local/greenplum-connectivity-4.3.0.0-build-2/drivers/odbc/psqlodbc-08.02.0500# ll

    total 48

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:51 datadirect-51sp2_64

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:51 datadirect-52_64

    drwxr-xr-x 3 hadoop users 4096 2013-11-15 09:52 datadirect-53sp2_64

    -r--r--r-- 1 hadoop users 25746 2013-11-15 09:50 license.txt

    -r--r--r-- 1 hadoop users 1383 2013-11-15 09:50 readme.txt

    drwxr-xr-x 4hadoop users 4096 2013-11-15 09:50unixodbc-2.2.12

    同样我们可以看到驱动管理器。

    鉴于GP基于Postgresql8.2版本,我们这里面选择驱动为psqlodbc-08.02.0500,驱动管理器选择为datadirect-52_64。

    所以,我们修改greenplum_connectivity_path.sh文件中的内容:

    GP_ODBC_DRIVER=psqlodbc-08.02.0500 --值与实际目录名称相同

    GP_ODBC_DRIVER_MANAGER=datadirect-52_64 --值与实际目录名称相同

    注:该文件默认权限位444,是不允许编辑的,你可以手动修改文件的权限,也可以修改整个安装目录的权限位755,如下:

    chmod -R 755greenplum-connectivity-4.3.0.0-build-2

    保存greenplum_connectivity_path.sh后,要记得source,使环境变量生效,如下:

    source greenplum_connectivity_path.sh

    安装unixODBC驱动

    (1) 直接安装RPM包

    rpm -ivhunixODBC-2.2.12-204.3.1.x86_64.rpm

    (2) 查看RPM包安装路径

    linux-82:/home/PyODBC# rpm -ql unixODBC-2.2.12-204.3.1

    /etc/unixODBC

    /etc/unixODBC/ODBCDataSources

    /etc/unixODBC/odbc.ini

    /etc/unixODBC/odbcinst.ini

    /usr/bin/dltest

    /usr/bin/isql

    /usr/bin/iusql

    /usr/bin/odbc_config

    /usr/bin/odbcinst

    /usr/lib64/libboundparam.so.1

    /usr/lib64/libboundparam.so.1.0.0

    /usr/lib64/libgtrtst.so.1

    /usr/lib64/libgtrtst.so.1.0.0

    /usr/lib64/libodbc.so

    /usr/lib64/libodbc.so.1

    /usr/lib64/libodbc.so.1.0.0

    /usr/lib64/libodbccr.so.1

    /usr/lib64/libodbccr.so.1.0.0

    /usr/lib64/libodbcinst.so

    /usr/lib64/libodbcinst.so.1

    /usr/lib64/libodbcinst.so.1.0.0

    /usr/lib64/unixODBC

    /usr/lib64/unixODBC/libesoobS.so

    /usr/lib64/unixODBC/libesoobS.so.1

    /usr/lib64/unixODBC/libesoobS.so.1.0.0

    /usr/lib64/unixODBC/libmimerS.so

    /usr/lib64/unixODBC/libmimerS.so.1

    /usr/lib64/unixODBC/libmimerS.so.1.0.0

    /usr/lib64/unixODBC/libnn.so

    /usr/lib64/unixODBC/libnn.so.1

    /usr/lib64/unixODBC/libnn.so.1.0.0

    /usr/lib64/unixODBC/libodbcdrvcfg1S.so

    /usr/lib64/unixODBC/libodbcdrvcfg1S.so.1

    /usr/lib64/unixODBC/libodbcdrvcfg1S.so.1.0.0

    /usr/lib64/unixODBC/libodbcdrvcfg2S.so

    /usr/lib64/unixODBC/libodbcdrvcfg2S.so.1

    /usr/lib64/unixODBC/libodbcdrvcfg2S.so.1.0.0

    /usr/lib64/unixODBC/libodbcminiS.so

    /usr/lib64/unixODBC/libodbcminiS.so.1

    /usr/lib64/unixODBC/libodbcminiS.so.1.0.0

    /usr/lib64/unixODBC/libodbcmyS.so

    /usr/lib64/unixODBC/libodbcmyS.so.1

    /usr/lib64/unixODBC/libodbcmyS.so.1.0.0

    /usr/lib64/unixODBC/libodbcnnS.so

    /usr/lib64/unixODBC/libodbcnnS.so.1

    /usr/lib64/unixODBC/libodbcnnS.so.1.0.0

    /usr/lib64/unixODBC/libodbcpsql.so

    /usr/lib64/unixODBC/libodbcpsql.so.1

    /usr/lib64/unixODBC/libodbcpsql.so.1.0.0

    /usr/lib64/unixODBC/libodbcpsql.so.2

    /usr/lib64/unixODBC/libodbcpsql.so.2.0.0

    /usr/lib64/unixODBC/libodbcpsqlS.so

    /usr/lib64/unixODBC/libodbcpsqlS.so.1

    /usr/lib64/unixODBC/libodbcpsqlS.so.1.0.0

    /usr/lib64/unixODBC/libodbctxtS.so

    /usr/lib64/unixODBC/libodbctxtS.so.1

    /usr/lib64/unixODBC/libodbctxtS.so.1.0.0

    /usr/lib64/unixODBC/liboplodbcS.so

    /usr/lib64/unixODBC/liboplodbcS.so.1

    /usr/lib64/unixODBC/liboplodbcS.so.1.0.0

    /usr/lib64/unixODBC/liboraodbcS.so

    /usr/lib64/unixODBC/liboraodbcS.so.1

    /usr/lib64/unixODBC/liboraodbcS.so.1.0.0

    /usr/lib64/unixODBC/libsapdbS.so

    /usr/lib64/unixODBC/libsapdbS.so.1

    /usr/lib64/unixODBC/libsapdbS.so.1.0.0

    /usr/lib64/unixODBC/libtdsS.so

    /usr/lib64/unixODBC/libtdsS.so.1

    /usr/lib64/unixODBC/libtdsS.so.1.0.0

    /usr/lib64/unixODBC/libtemplate.so

    /usr/lib64/unixODBC/libtemplate.so.1

    /usr/lib64/unixODBC/libtemplate.so.1.0.0

    /usr/share/doc/packages/unixODBC

    /usr/share/doc/packages/unixODBC/AUTHORS

    /usr/share/doc/packages/unixODBC/COPYING

    /usr/share/doc/packages/unixODBC/ChangeLog

    /usr/share/doc/packages/unixODBC/NEWS

    /usr/share/doc/packages/unixODBC/README

    /usr/share/doc/packages/unixODBC/README.GTK

    /usr/share/doc/packages/unixODBC/README.SuSE

    /usr/share/doc/packages/unixODBC/index.html

    /usr/share/doc/packages/unixODBC/smallbook.gif

    /usr/share/doc/packages/unixODBC/unixODBC.gif

    可以可看到unixODBC配置文件位于/etc/unixODBC下面:

    linux-82:/etc/unixODBC # ll /etc/unixODBC

    total 4

    drwxr-xr-x 2 root root 4096 2011-06-15 00:32 ODBCDataSources

    -rw-r--r-- 1 root root 02011-06-15 00:32 odbc.ini

    -rw-r--r-- 1 root root 02011-06-15 00:32 odbcinst.ini

    (3) 编辑unixODBC的两个配置文件,如下:

    linux-82:/etc/unixODBC # cat /etc/unixODBC/odbc.ini

    [GreenplumDSN]

    Driver =Greenplum ----值要和/etc/unixODBC/odbcinst.ini中名字一致

    Trace = 1

    Debug=1

    Database = noas ----GP数据库名

    Servername = 10.41.24.102 ----GP的IP地址

    UserName = noas ----GP用户名

    Password = noas ----GP用户密码

    Port = 5432 ----GP访问端口号

    ReadOnly = No

    RowVersioning = No

    DisallowPremature = No

    ShowSystemTables = Yes

    ShowOidColumn = No

    FakeOidIndex = No

    useDeclareFetch = 1

    Fetch = 4096

    UpdatableCursors = Yes

    Protocol = 7.4-1

    linux-82:/etc/unixODBC # cat /etc/unixODBC/odbcinst.ini

    [Greenplum]

    Description = PostgreSQL driver forGreenplum

    Driver=/usr/local/greenplum-connectivity-4.3.0.0-build-2/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so ------GP的ODBC驱动

    UsageCount=1

    FileUsage= 1

    (4) 使用isql测试

    linux-82:/etc/unixODBC # isql GreenplumDSN

    +---------------------------------------+

    | Connected! |

    | |

    | sql-statement |

    | help [tablename] |

    | quit |

    | |

    +---------------------------------------+

    SQL> select user;

    +-----------------------------------------------------------------+

    | current_user |

    +-----------------------------------------------------------------+

    | noas |

    +-----------------------------------------------------------------+

    SQLRowCount returns -1

    1 rows fetched

    SQL>

    注:如果你在执行isql时,出现如下情况:

    linux-82:/etc/unixODBC # isql GreenplumDSN

    [ISQL]ERROR: Could not SQLConnect

    这个问题很大情况下是你没有source文件greenplum_connectivity_path.sh导致的,执行source greenplum_connectivity_path.sh文件后,再执行就OK了,最好的办法是将source该文件加入到系统环境变量中。

    安装pyodbc驱动

    (1) 直接安装RPM包

    linux-82:/home/PyODBC # rpm -ivh pyodbc-3.0.6-1.x86_64.rpm

    (2) 查看pyodbc安装目录

    linux-82:/home/PyODBC # rpm -qlpyodbc-3.0.6-1

    /usr/local/lib64/python2.6/site-packages/pyodbc-3.0.6-py2.6.egg-info

    /usr/local/lib64/python2.6/site-packages/pyodbc.so

    测试python脚本

    (1) 准备python测试脚本,如下:

    linux-82:/etc/unixODBC # cat hellokitty.py

    #!/usr/bin/python

    #-*- encoding: utf-8 -*-

    ####################################################################

    # name: hellokitty.py

    # describe: 测试python访问Greenplum数据库

    ########################################################################

    import pyodbc

    import sys

    reload(sys)

    sys.setdefaultencoding('utf8')

    class GreenplumTest:

    debug = 1

    def __init__(self,dbinfo):

    self.UID = dbinfo[1]

    self.PWD = dbinfo[2]

    odbcinfo ='DSN=%s;UID=%s;PWD=%s'%(dbinfo[0],dbinfo[1],dbinfo[2])

    self.cnxn =pyodbc.connect(odbcinfo,autocommit=True,ansi=True)

    self.cursor =self.cnxn.cursor()

    def __del__(self):

    if self.cursor:

    self.cursor.close()

    if self.cnxn:

    self.cnxn.close()

    def _printinfo(self,msg):

    print"%s"%(msg)

    print "\n"

    def testsql(self):

    # 获取表的所有字段

    sql = '''

    select * from hello1 h3 join hello2h4 on h3.id = h4.id;

    '''

    self.cursor.execute(sql.strip())

    row = self.cursor.fetchall()

    return row

    #Main

    defmain():

    # 检查传入参数个数

    if len(sys.argv) < 4 :

    print 'usage: python SybaseDSN usernamepassword\n'

    sys.exit(1)

    # 定义连接GP的信息

    dbinfo = []

    dbinfo.append(sys.argv[1])

    dbinfo.append(sys.argv[2])

    dbinfo.append(sys.argv[3])

    GPT= GreenplumTest(dbinfo)

    ret = GPT.testsql()

    return ret

    if__name__ == '__main__':

    sys.exit(main())

    (2) 测试过程:

    linux-82:/etc/unixODBC # python hellokitty.py GreenplumDSN noas noas

    [(3, 'hello3', 3, 'hello3')]

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
1、脚本运行环境python3.6 2、脚本忽略LOB字段数据 3、脚本生成以^为分割符的csv格式文件 4、脚本避免转义字符将‘\’替换为‘/’,数据中存在分割符:^转换为# 5、特殊字符处理:删除:'\u0000','"',oracle数据回车符:'\r','\n' 6、支持按分区导入 7、支持内存处理数据导入greenplum,导入失败生成csv格式文件,方便排错 8、支持生成csv格式文件导入greenplum。 9、传递不同配置文件,多次执行达到并行处理(注意内存溢出) 例: 执行脚本 生成日志目录 配置文件 python3 Main_Mem.py ProcessLog1 config1.ini & python3 Main_Mem.py ProcessLog2 config2.ini & python3 Main_Mem.py ProcessLog3 config3.ini & python3 Main_Mem.py ProcessLog4 config4.ini & 10、采用python 中的copy_from方法实现导入greenplum 11、各目录用途: conf :存放配置文件 etl_dat: 存放导出数据 log :存放输出日志 py_tool 存放编写工具脚本: DBconn: 数据库连接池配置及查询返回方法 DumpCsv:导出csv文件方法 Log:日志输出方法 DataBase:判断表是否存在、表或分区表是否有数据、是否为分区表、数据导出、表分区遍历、指定分区表导出等方法 py_main:主程序目录: Main_Mem.py:主程序目录: 运行:python3 Main_Mem.py ProcessLog1 config.ini 12、主程序需要修改目录参数: if __name__ == '__main__': #引用编写的包 sys.path.append('/home/oracle/PyETL2.0/py_tool') import Log,DataBase,DumpCsv #输出日志路径: path = '/home/oracle/PyETL2.0/log/'+sys.argv[1] isExists = os.path.exists(path) if not isExists: os.makedirs(path) #shutil.move(path,path+) logger = Log.log(path) #读取配置文件 config = Confile('/home/oracle/PyETL2.0/conf/', sys.argv[2]) 13、后期还会更新,欢迎提供宝贵意见。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值