I need to access Impala tables from the command line using Python, running on the same Cloudera server.
I tried the code below to establish the connection:
from impala.dbapi import connect  # impyla package

def query_impala(sql):
    # Run the query and return the rows together with the column names.
    cursor = query_impala_cursor(sql)
    result = cursor.fetchall()
    field_names = [f[0] for f in cursor.description]
    return result, field_names

def query_impala_cursor(sql, params=None):
    conn = connect(host='xx.xx.xx.xx', port=21050, database='am_playbook', user='xxxxxxxx', password='xxxxxxxx')
    cursor = conn.cursor()
    cursor.execute(sql.encode('utf-8'), params)
    return cursor
However, since I am on the same Cloudera server, I should not need to provide the host name. Could you please provide the correct code to access Impala/Hive tables that exist on the same server from Python?
Solution
You can use PyHive to connect to Hive and access your Hive tables.
from pyhive import hive
import pandas as pd
import datetime

# Open a single connection; the configuration dict sets Hive session properties.
# Drop the configuration argument if Tez is not available on your cluster.
conn = hive.Connection(host="hostname", port=10000, username="XXXX",
                       configuration={'hive.execution.engine': 'tez'})
query = "select col1,col2,col3,col4 from db.yourhiveTable"
start_time = datetime.datetime.now()
data = pd.read_sql(query, conn)
print(data)
end_time = datetime.datetime.now()
# Elapsed time is end minus start.
elapsed_minutes = (end_time - start_time).total_seconds() / 60.0
print('Finished reading from Hive table in', elapsed_minutes, 'minutes')
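If you would rather stay with Impala (impyla) as in the question, a minimal sketch could look like the one below. It assumes an Impala daemon is running on the same node and listening on port 21050 (the port used in the question), and that no authentication is required. The helper name impala_connect_kwargs is hypothetical and only there to keep the defaults in one place. On a Kerberos-secured cluster you would probably need the node's fully qualified host name and an auth_mechanism argument instead of "localhost".

```python
def impala_connect_kwargs(host="localhost", port=21050, database="am_playbook"):
    """Hypothetical helper: collect the arguments passed to impala.dbapi.connect.

    Assumption: the Impala daemon runs on this machine, so "localhost" is
    used unless another host is given.
    """
    return {"host": host, "port": port, "database": database}


def query_impala(sql, params=None, **overrides):
    """Run a query against Impala and return (rows, column_names)."""
    # Imported here so the helper above works even without impyla installed.
    from impala.dbapi import connect

    conn = connect(**impala_connect_kwargs(**overrides))
    try:
        cursor = conn.cursor()
        cursor.execute(sql, params)
        field_names = [f[0] for f in cursor.description]
        return cursor.fetchall(), field_names
    finally:
        # Close the connection once all rows have been fetched.
        conn.close()
```

Usage would be something like rows, cols = query_impala("select col1 from yourTable limit 10"), or query_impala(sql, host="your.node.fqdn") if "localhost" is not accepted by your setup.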