Apache Superset Secondary Development

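This article walks through secondary development on Apache Superset: an introduction to underlying components such as Flask, Django, and PyDruid, followed by installation, running locally, setting up a development environment, and common problems. It also covers later-stage tuning topics such as the MySQL time zone issue met during development, errors during Superset upgrades, and parameter tuning. Solutions to specific problems such as abnormal Y-axis data and Druid cluster management are included, together with community resources and references.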

Basic Concepts

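 Superset is an open-source data exploration platform from Airbnb, designed to be visual, intuitive, and interactive (formerly named Panoramix and Caravel; it has since entered the Apache Incubator)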

Core Components

Flask

 One of the best-known Python web frameworks, noted for being lightweight and highly extensible

  • Jinja2
    Template engine

  • Werkzeug
    WSGI toolkit

Gunicorn

 Gunicorn is an open-source Python WSGI HTTP server, ported from Ruby's Unicorn project, that uses a pre-fork worker model

WSGI

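 WSGI, the Python **W**eb **S**erver **G**ateway **I**nterface, is an interface between Python applications or frameworks and web servers. It is a specification rather than a piece of software: any application that follows it can run on any WSGI-compliant server, and any compliant server can host any such application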
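 To make the interface concrete, here is a minimal sketch of a WSGI application and a helper that calls it the way a server would. The names `application` and `call_app` are illustrative only and are not taken from Superset.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI app: a callable taking environ and start_response."""
    body = "Hello from {}".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # an iterable of byte strings


def call_app(app, path="/"):
    """Invoke a WSGI app directly and capture the status line and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/health")
```

 Because the application is just a callable, the same object can be handed to any WSGI server, for example `wsgiref.simple_server.make_server` from the standard library or Gunicorn.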

Pre-Fork

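 Worker processes are created in advance: a pre-fork server spawns child processes ahead of time and uses them to handle requests, with each request served by one child and the processes isolated from one another
 Each process handles one request at a time. The 1024 figure often quoted alongside the select model is select's usual FD_SETSIZE limit on watched file descriptors, not a cap on the number of processes
 Because the workers already exist when a request arrives, no fork is needed at request time, which speeds up the response to some extent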
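 The pre-fork idea can be sketched in a few lines of Python. This is a toy illustration for Unix-like systems only (it relies on `os.fork`), not Gunicorn's implementation. The function name `prefork_demo` is hypothetical, and each worker answers a single connection and exits, whereas a real server's workers loop on `accept()`.

```python
import os
import socket


def prefork_demo(num_workers=3):
    """Fork workers up front, then send one connection per worker."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
    listener.listen(num_workers)
    address = listener.getsockname()

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: inherits the listening socket and blocks in accept().
            try:
                conn, _peer = listener.accept()
                conn.sendall(str(os.getpid()).encode("ascii"))
                conn.close()
            finally:
                os._exit(0)              # exit without running parent cleanup
        worker_pids.append(pid)

    # Parent: play the client. Each connection is answered by one worker.
    replies = []
    for _ in range(num_workers):
        client = socket.create_connection(address, timeout=5)
        chunks = []
        while True:
            data = client.recv(64)
            if not data:                 # worker closed the connection
                break
            chunks.append(data)
        client.close()
        replies.append(int(b"".join(chunks)))

    listener.close()
    for pid in worker_pids:
        os.waitpid(pid, 0)               # reap the finished workers
    return worker_pids, replies
```

 Every reply carries the PID of the worker that served it, so the set of replies matches the set of pre-forked children: the processes existed before any request was made.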

Django

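 Django is an open-source web application framework written in Python. It follows the MVC software design pattern and makes it simpler to build complex, database-driven websites
 Django emphasizes component reuse and "pluggability", agile development, and the DRY principle (Don't Repeat Yourself)

 Core components
* An object-relational mapper that mediates between data models (defined as Python classes) and a relational database
* A URL dispatcher based on regular expressions
* A view system for handling requests
* A template system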

PyDruid

 A Python connector for Druid
 Exposes a simple API to create, execute, and analyze Druid queries
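 For orientation, the queries that such a connector sends are Druid native JSON queries. The sketch below builds a timeseries query with the standard library only. It does not use PyDruid's API, and the datasource name and interval are made-up placeholders.

```python
import json


def build_timeseries_query(datasource, interval, granularity="day"):
    """Return a Druid native timeseries query that counts rows per bucket."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],  # ISO 8601 interval, "start/end"
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# "example_datasource" and the interval are hypothetical values.
payload = json.dumps(
    build_timeseries_query("example_datasource", "2016-12-01/2016-12-15"),
    sort_keys=True,
)
```

 A payload like this is sent as an HTTP POST with a JSON content type to the broker's `druid/v2` endpoint. PyDruid wraps that step and parses the result.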

Pandas

 Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive

SciPy

 SciPy is a Python module built on NumPy that bundles many mathematical algorithms and convenience functions

Scikit-learn

 Machine Learning in Python

D3.js

 D3.js is a JavaScript library for manipulating data

Installation

Base Environment

OS
$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013

# For Fedora and RHEL-derivatives
# [Doc]: Other System https://superset.apache.org/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y

Machines
# External network (http://192.168.1.10:9097/)
superset01                     192.168.1.10           Superset
druid01                        192.168.1.11           Druid
druid02                        192.168.1.12           MySQL

# Cluster configuration
Cluster                         druid cluster
Coordinator Host                192.168.1.11
Coordinator Port                8081
Coordinator Endpoint            druid/coordinator/v1/metadata
Broker Host                     192.168.1.13
Broker Port                     8082
Broker Endpoint                 druid/v2
Cache Timeout                   86400               # 1day: result_backend


# Production (http://192.168.2.10:9097)
druid-prd01                     192.168.2.10         Superset
druid-prd02                     192.168.2.11         Druid

# Cluster configuration
Cluster                         druid cluster
Coordinator Host                192.168.2.11
Coordinator Port                8081
Coordinator Endpoint            druid/coordinator/v1/metadata
Broker Host                     192.168.2.13
Broker Port                     8082
Broker Endpoint                 druid/v2
Cache Timeout                   86400                 # 1day: result_backend
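
The host, port, and endpoint fields in the cluster configuration combine into the URLs that are actually requested. A minimal sketch follows, assuming plain `http` and using a hypothetical helper named `endpoint_url`; Superset's own code may build these differently.

```python
def endpoint_url(host, port, endpoint):
    """Join host, port, and endpoint path into a URL (assumes plain http)."""
    return "http://{}:{}/{}".format(host, port, endpoint.strip("/"))


# Values copied from the external-network cluster configuration above.
coordinator_url = endpoint_url("192.168.1.11", 8081, "druid/coordinator/v1/metadata")
broker_url = endpoint_url("192.168.1.13", 8082, "druid/v2")
```

Checking these URLs with a browser or curl from the Superset host is a quick way to confirm that the coordinator and broker are reachable before refreshing Druid metadata.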

Python Environment

Python
$ python --version
  Python 2.7.8

[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version; Python 2.6 won't be supported.
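
The version rule in the note can be checked programmatically before installing. The sketch below encodes only what the note states (2.7, or 3.4 and later). The function name `is_supported` is illustrative, and treating any major version other than 2 or 3 as unsupported is an assumption.

```python
import sys


def is_supported(version_info):
    """True for Python 2.7 and Python 3.4+, the versions named in the note."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    # Assumption: anything other than 2.x or 3.x is treated as unsupported.
    return major == 3 and minor >= 4


if __name__ == "__main__":
    if not is_supported(sys.version_info):
        sys.exit("Unsupported Python version: {}.{}".format(*sys.version_info[:2]))
```

Running the script with the interpreter that will host Superset fails fast on an old system Python such as 2.6.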

## Upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/

# Download the desired Python version from the Python FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz

# Compile
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install

$ ls /usr/local/python27/ -al

  drwxr-xr-x.  6 root root 4096 Dec 15 14:22 .
  drwxr-xr-x. 13 root root 4096 Dec 15 14:20 ..
  drwxr-xr-x.  2 root root 4096 Dec 15 14:22 bin
  drwxr-xr-x.  3 root root 4096 Dec 15 14:21 include
  drwxr-xr-x.  4 root root 4096 Dec 15 14:22 lib
  drwxr-xr-x.  3 root root 4096 Dec 15 14:22 share


# Replace the original python
$ which python
  /usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
  Python 2.7.12

# Point yum back at the old Python 2.6 interpreter
$ vim /usr/bin/yum

  # Change the first line to python2.6
  #!/usr/bin/python2.6

$ yum --version | sed '2,$d'
  3.2.29

Pip
$ pip --version
  pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip

## Installing pip in an offline environment
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install

# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
  Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
  Processing dependencies for pip==9.0.1
  Finished processing dependencies for pip==9.0.1

$ pip --version
  pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

Virtualenv
$ pip install virtualenv

# Python 3 ships an equivalent in the standard library: the venv module (python3 -m venv), originally exposed as the pyvenv script
$ virtualenv venv
$ source venv/bin/activate

## Installing virtualenv in an offline environment
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
  15.1.0
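
On Python 3, the same kind of isolated environment can also be created from code with the standard-library `venv` module. A minimal sketch follows. The helper name `create_env` and the temporary target directory are illustrative, and pip is deliberately left out so that the example works offline.

```python
import os
import tempfile
import venv


def create_env(target_dir):
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # with_pip=False skips the ensurepip bootstrap, keeping this offline-friendly.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target_dir)
    return os.path.join(target_dir, "pyvenv.cfg")


# A throwaway location; a real environment would live somewhere permanent.
env_dir = os.path.join(tempfile.mkdtemp(), "venv")
cfg_path = create_env(env_dir)
```

The resulting directory is activated the same way as a virtualenv-created one, with `source <env>/bin/activate` on Linux.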

Superset

Superset Initialization
$ pip install superset

## Installing superset in an offline environment
# Download superset-0.15.0.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.0.tar.gz
$ cd superset-0.15.0
$ python setup.py install

# Create an admin user
$ fabmanager create-admin --app superset

  Username [admin]:        # login name
  User first name [admin]: # first name
  User last name [user]:   # lastname
  Email [admin@fab.org]:   # email, must be unique
  Password: 
  Repeat for confirmation: 
  Error: the two entered values do not match
  Password:             #superset
  Repeat for confirmation: #superset
  // ...
  Recognized Database Authentications.
  2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
  Admin User superset db upgrade created.

# Initialize the database
$ superset db upgrade

  // ...
  INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
  INFO  [alembic.runtime.migration] Will assume transactional DDL.


# Load some data to play with
$ superset load_examples

  Loading examples into <SQLA engine=u'sqlite:root/.superset/superset.db'>
  Creating default CSS templates
  Loading energy related dataset
  Creating table [wb_health_population] reference
  2016-12-14 17:58:09,568:INFO:root:Creating database reference
  2016-12-14 17:58:09,575:INFO:root:sqlite:root/.superset/superset.db
  Loading [World Bank's Health Nutrition and Population Stats]
  Creating table [wb_health_population] reference
  2016-12-14 17:58:30,840:INFO:root:Creating database reference
  2016-12-14 17:58:30,846:INFO:root:sqlite:root/.superset/superset.db


# Create default roles and permissions
$ superset init

  Loading examples into <SQLA engine=u'sqlite:root/.superset/superset.db'>
  Creating default CSS templates
  Loading energy related dataset
  Creating table [wb_health_population] reference
  2016-12-14 17:58:09,568:INFO:root:Creating database reference
  2016-12-14 17:58:09,575:INFO:root:sqlite:root/.superset/superset.db
  Loading [World Bank's Health Nutrition and Population Stats]
  Creating table [wb_health_population] reference
  2016-12-14 17:58:30,840:INFO:root:Creating database reference
  2016-12-14 17:58:30,846:INFO:root:sqlite:root/.superset/superset.db
  Creating slices
  Creating a World's Health Bank dashboard
  Loading [Birth names]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [birth_names] reference
  2016-12-14 17:58:52,276:INFO:root:Creating database reference
  2016-12-14 17:58:52,280:INFO:root:sqlite:root/.superset/superset.db
  Creating some slices
  Creating a dashboard
  Loading [Random time series data]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [random_time_series] reference
  2016-12-14 17:58:53,953:INFO:root:Creating database reference
  2016-12-14 17:58:53,957:INFO:root:sqlite:root/.superset/superset.db
  Creating a slice
  Loading [Random long/lat data]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table reference
  2016-12-14 17:59:09,732:INFO:root:Creating database reference
  2016-12-14 17:59:09,736:INFO:root:sqlite:root/.superset/superset.db
  Creating a slice
  Loading [Multiformat time series]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [multiformat_time_series] reference
  2016-12-14 17:59:10,421:INFO:root:Creating database reference
  2016-12-14 17:59:10,426:INFO:root:sqlite:root/.superset/superset.db
  Creating some slices
  Loading [Misc Charts] dashboard
  Creating the dashboard


# Start the web server on port 8088
$ superset runserver -p 8088

# To start a development web server, use the -d switch
# superset runserver -d

# Refresh Druid Datasource (after config it)
$ superset refresh_druid

Virtualenv Workspace
# superset01 192.168.1.10
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages --always-copy superset
$ source superset/bin/activate

# See the `Pitfalls` - `Installing superset requires downloading dependencies` section below
# pip install --download package -r requirements.txt
$ pip install -r /root/requirements.txt

$ superset runserver -a 0.0.0.0 -p 8088

# rsync is recommended; see the `Deployment` section
$ cd /root
$ tar zcvf virtualenv.tar.gz virtualenv/
$ scp virtualenv.tar.gz root@192.168.1.13:/root/

# 192.168.1.13
$ cd /root/virtualenv/superset
$ source bin/activate

VirtualenvWrapper
## [Extension]
# virtualenvwrapper is an extension to virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/virtualenv
$ vim ~/.bashrc
  # Add
  export WORKON_HOME=~/virtualenv
  source /usr/local/bin/virtualenvwrapper.sh

$ mkvirtualenv --python=/usr/bin/python superset
  Running virtualenv with interpreter /usr/bin/python
  New python executable in /root/virtualenv/superset/bin/python
  Installing setuptools, pip, wheel...done.
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]