System environment:
- CentOS 7.8.2003
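First, a brief introduction to Superset. Superset was open-sourced by Airbnb and is currently incubating at Apache. It is a business intelligence (BI) application with a browser/server (B/S) architecture, written in Python.
A few features that stood out to me:
- It is written in Python on top of the Flask framework, which makes custom development much easier for Python developers;
- It can be deployed with Docker, which is very convenient for operations staff;
- It supports a large number of databases: anything SQLAlchemy supports will work.
For other features, see: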
https://zhuanlan.zhihu.com/p/100526555
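Installation
Superset can be installed either with Docker or in the conventional way. Since I prefer to avoid hassle, I naturally chose Docker.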
>>> git clone https://github.com/apache/incubator-superset/
>>> cd incubator-superset
>>> docker-compose up
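After that, all you have to do is wait. Because of network problems the process may take a long time and may be interrupted partway through; if that happens, just run it again a few times.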
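The "run it again a few times" advice can be automated. The following is only a minimal sketch, not part of the Superset repository: run_with_retries is a hypothetical helper name, and it is meant for a step that exits when it finishes (for example `docker-compose build` or `docker-compose pull`), not for `docker-compose up`, which stays in the foreground.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=5.0):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed "
              f"(exit code {result.returncode})", file=sys.stderr)
        if attempt < attempts:
            time.sleep(delay)  # give a flaky network a moment before retrying
    return False


# Hypothetical usage, run from the cloned repository directory:
#   run_with_retries(["docker-compose", "build"], attempts=5, delay=10.0)
```

Once the download-heavy step has succeeded, `docker-compose up` can be started as before.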
Creating a database
The default port is 8088.
Open http://<your-ip>:8088 in a browser.
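Since the containers can take a while to come up, it may help to wait until port 8088 actually accepts connections before opening the page. This is a small sketch using only the standard library; superset_url and wait_for_port are hypothetical helper names, and an open port only shows that something is listening, not that Superset has finished initializing.

```python
import socket
import time


def superset_url(host, port=8088):
    """Build the URL of the Superset web UI (8088 is the default port)."""
    return f"http://{host}:{port}"


def wait_for_port(host, port=8088, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage:
#   if wait_for_port("127.0.0.1"):
#       print("open", superset_url("127.0.0.1"))
```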
Adding an Impala database at this point failed, with the following in the container log:
superset_app | DEBUG:superset.models.core:Database.get_sqla_engine(). Masked URL: impala://10.6.207.5:21150/bqdb;AuthMech=3
superset_app | INFO:superset.views.core:Invalid driver Can't load plugin: sqlalchemy.dialects:impala
Modifying the source
The error above occurs because the impyla package is not installed.
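A quick way to confirm this from inside the container is to check whether the driver modules can be imported at all. The sketch below uses only the standard library; missing_modules is a hypothetical helper name, and the module names `impala` (installed by impyla) and `thrift_sasl` are the import names as far as I know them.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage inside the superset_app container:
#   print(missing_modules(["impala", "thrift_sasl"]))
# An empty list means both packages are importable.
```

If `impala` shows up in the result, SQLAlchemy has no package to load the impala dialect from, which matches the "Can't load plugin" message.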
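# Remove the images
>>> docker-compose down --rmi all
# Edit the files
>>> vim requirements.txt
...
impyla
# If the database uses Kerberos authentication, thrift_sasl==0.2.1 is also needed
thrift_sasl==0.2.1
# thrift_sasl==0.2.1 needs three dependencies installed with apt first, otherwise the install fails
>>> vim setup.py
# Add the following entries to the extras_require dict
"thrift_sasl": ["thrift_sasl==0.2.1"],
"impyla": ["impyla"],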
>>> vim Dockerfile-dev
FROM preset/superset:dev
COPY ./requirements* ./docker/requirements* /app/
USER root
RUN cd /app \
&& apt-get update -y \
&& apt-get install -y --no-install-recommends \
build-essential \
libsasl2-dev \
libsasl2-2 \
libsasl2-modules-gssapi-mit \
&& rm -rf /var/lib/apt/lists/* \
&& pip install -e . \
&& pip install --no-cache -r requirements.txt -r requirements-dev.txt \
&& pip install --no-cache -r requirements-extra.txt \
&& pip install --no-cache -r requirements-local.txt || true
USER superset
superset_app | ERROR:impala.hiveserver2:Failed to open transport (tries_left=1)
superset_app | Traceback (most recent call last):
superset_app | File "/usr/local/lib/python3.6/site-packages/impala/hiveserver2.py", line 1009, in _execute
superset_app | return func(request)
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/thrift.py", line 219, in _req
superset_app | return self._recv(_api)
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/thrift.py", line 231, in _recv
superset_app | fname, mtype, rseqid = self._iprot.read_message_begin()
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/protocol/binary.py", line 373, in read_message_begin
superset_app | self.trans, strict=self.strict_read)
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/protocol/binary.py", line 165, in read_message_begin
superset_app | sz = unpack_i32(inbuf.read(4))
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/transport/base.py", line 60, in read
superset_app | return readall(self._read, sz)
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/transport/base.py", line 12, in readall
superset_app | chunk = read_fn(sz - have)
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/transport/buffered/__init__.py", line 41, in _read
superset_app | buf = self._trans.read(max(rest_len, self._buf_size))
superset_app | File "/usr/local/lib/python3.6/site-packages/thriftpy2/transport/socket.py", line 132, in read
superset_app | message='TSocket read 0 bytes')
superset_app | thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
superset_app | DEBUG:impala.hiveserver2:Closing transport (tries_left=1)
superset_app | DEBUG:impala.hiveserver2:Closing HS2 connection
superset_app | DEBUG:impala.hiveserver2:close_service: client=<thriftpy2.thrift.TClient object at 0x7f6e5c971898>
superset_app | ERROR:superset.views.core:Unexpected error (impala.error.HiveServer2Error) Failed after retrying 3 times
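The error above is probably caused by the Kerberos authentication on our database, which makes the ordinary SQLAlchemy create_engine call fail. After a long search I found where Superset calls create_engine: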
>>> vim superset/models/core.py
# Add this import at the top of the file
from impala.dbapi import connect
# Add the following at the end of the get_sqla_engine function
        return create_engine('impala://', creator=self.conn)

    # conn is a new method on the same class; it opens the Impala connection directly
    def conn(self):
        return connect(host='xx.xx.xx.xx',
                       port=xxxxx,
                       database='xxxx',
                       user='xxxxxx', password='xxxxxx!',
                       auth_mechanism='PLAIN')
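The code above forces get_sqla_engine to return an Impala engine for one hard-coded address, so databases at any other address can no longer be added. It is therefore only a temporary workaround; a general solution needs further code changes, which I have put aside for now for lack of time.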
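One possible direction for a more general fix is to derive the connection arguments from the URI the user enters instead of hard-coding them. The sketch below is an assumption of mine, not code from Superset: impala_connect_kwargs is a hypothetical helper, and the `auth_mechanism` query parameter is a convention invented for this example. The returned keys match keyword arguments of impala.dbapi.connect (host, port, database, user, password, auth_mechanism).

```python
from urllib.parse import parse_qs, unquote, urlsplit


def impala_connect_kwargs(uri, default_port=21050):
    """Translate an impala:// URI into keyword arguments for impala.dbapi.connect.

    Hypothetical helper: reading auth_mechanism from the query string is a
    convention made up for this sketch, not something Superset defines.
    """
    parts = urlsplit(uri)
    if parts.scheme != "impala":
        raise ValueError(f"not an impala URI: {uri!r}")
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port or default_port,
        "database": parts.path.lstrip("/") or None,
        # Credentials may be percent-encoded in the URI, so decode them.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "auth_mechanism": query.get("auth_mechanism", ["NOSASL"])[0],
    }
```

Inside the creator callable, the hard-coded call could then become something like `connect(**impala_connect_kwargs(uri))`, where `uri` is the connection string stored for that database; wiring this into Superset's model is left open here.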
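Testing the result
At this point it does not matter what URI you enter, as long as it is well-formed, e.g. 'impala://' or 'mysql://', because the backend code now hard-codes the connection.
One last note: the Superset project currently appears to be maintained by a single person, so for individual developers working with Flask + Docker, its source code is well worth studying.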