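Basic Concepts
Superset is a data exploration platform open-sourced by Airbnb and designed to be visual, intuitive, and interactive (formerly named Panoramix and then Caravel; it has since entered the Apache Incubator)
Basic Components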
Flask
One of the best-known Python web frameworks, noted for being lightweight and highly extensible
Jinja2
A template engine
Werkzeug
A WSGI utility library
Gunicorn
Gunicorn is an open-source Python WSGI HTTP server, ported from Ruby's Unicorn project, that uses the pre-fork worker model
WSGI
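WSGI, the Python **W**eb **S**erver **G**ateway **I**nterface, is the standard interface between Python web applications or frameworks and web servers. It is a specification (PEP 3333) rather than a piece of software: any application that follows it can run on any WSGI-compliant server, and any compliant server can host any such application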
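To make the interface concrete, here is a minimal sketch of a WSGI application that uses only the standard library. The names `simple_app` and `call_app` are illustrative and are not part of Superset or Gunicorn; `call_app` simply plays the role a server would play.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """A WSGI application: a callable that takes environ and start_response."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}".format(path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # the response body is an iterable of byte strings


def call_app(app, path="/"):
    """Act as the server side: build an environ, call the app, collect the reply."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(simple_app, "/health"))
```

Because `simple_app` only depends on the calling convention, the same callable could be handed unchanged to any WSGI server.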
Pre-Fork
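In the pre-fork model the server forks its worker processes ahead of time, before any request arrives. Each worker handles one request at a time, and the workers are independent of one another
Because the workers already exist when a request comes in, the cost of creating a process is removed from the request path, which speeds up responses to some extent
Implementations built on select() inherit its FD_SETSIZE limit, typically 1024 file descriptors, which is presumably where the often-quoted ceiling of 1024 comes from; the limit applies to monitored descriptors rather than to the number of processes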
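The idea can be sketched with `os.fork` and a shared listening socket. This is a simplified illustration for Unix-like systems, not Gunicorn's implementation: the function name `prefork_demo` is hypothetical, each worker serves exactly one connection and then exits, and the supervision and restarting that a real master process performs are left out.

```python
import os
import socket


def prefork_demo(num_workers=3):
    """Fork workers before any request arrives, then send one request per worker."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(num_workers)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: it inherited the listening socket and waits in accept().
            try:
                conn, _addr = listener.accept()
                conn.recv(1024)
                conn.sendall(str(os.getpid()).encode("ascii"))
                conn.close()
            finally:
                os._exit(0)  # never fall through into the parent's code
        worker_pids.append(pid)

    listener.close()  # the parent only dispatches work; the workers keep the socket open

    replies = []
    for _ in range(num_workers):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"ping")
            replies.append(int(client.recv(1024).decode("ascii")))

    for pid in worker_pids:
        os.waitpid(pid, 0)  # reap the finished workers
    return worker_pids, replies


if __name__ == "__main__":
    print(prefork_demo())
```

Each reply carries the PID of the worker that served it, which shows that the requests were handled by processes created before the first connection was made.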
Django
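Django is an open-source web application framework written in Python. It follows the MVC design pattern and makes it easier to build complex, database-driven websites
Django emphasizes component reusability and "pluggability", agile development, and the DRY principle (Don't Repeat Yourself)
Core components
* An object-relational mapper (ORM) that mediates between data models (defined as Python classes) and a relational database
* A URL dispatcher based on regular expressions
* A view system for handling requests
* A template system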
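The regular-expression URL dispatcher in the list above can be illustrated in plain Python. This sketch does not use Django's API; `URL_PATTERNS`, `dispatch`, and the two view functions are hypothetical names chosen for the example.

```python
import re


def home(path):
    """View for the site root."""
    return "home page"


def article_detail(path, year, slug):
    """View that receives the named groups captured from the URL."""
    return "article {} from {}".format(slug, year)


# Ordered (pattern, view) pairs; the first pattern that matches wins.
URL_PATTERNS = [
    (re.compile(r"^/$"), home),
    (re.compile(r"^/articles/(?P<year>[0-9]{4})/(?P<slug>[\w-]+)/$"), article_detail),
]


def dispatch(path):
    """Find the first matching pattern and call its view with the captured groups."""
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return view(path, **match.groupdict())
    return "404 not found"


if __name__ == "__main__":
    print(dispatch("/articles/2016/superset-install/"))
```

Keeping the patterns in an ordered list mirrors the "first match wins" behaviour that regex-based routers generally rely on.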
PyDruid
A Python connector for Druid
Exposes a simple API to create, execute, and analyze Druid queries
Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive
SciPy
SciPy is a Python package built on NumPy that bundles a wide range of mathematical algorithms and convenience functions
Scikit-learn
Machine Learning in Python
D3.js
D3.js is a JavaScript library for manipulating documents based on data
Installation
Base Environment
OS
$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013
# For Fedora and RHEL-derivatives
# [Doc]: Other System https://superset.apache.org/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
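# Note: on RHEL/CentOS the SASL development headers are normally packaged as cyrus-sasl-devel; use that name if yum cannot find libsasl2-devel
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y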
Machines
# External network (http://192.168.1.10:9097/)
superset01 192.168.1.10 Superset
druid01 192.168.1.11 Druid
druid02 192.168.1.12 MySQL
# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.1.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.1.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend
# Production (http://192.168.2.10:9097)
druid-prd01 192.168.2.10 Superset
druid-prd02 192.168.2.11 Druid
# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.2.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.2.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend
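The host, port, and endpoint fields combine into the URLs a Druid client requests. The sketch below shows that composition for the production values listed above; the helper `build_druid_urls`, the dictionary keys, and the plain `http` scheme are assumptions made for illustration and are not Superset's internal API.

```python
def build_druid_urls(cluster):
    """Join host, port, and endpoint into coordinator and broker URLs."""

    def url(host, port, endpoint):
        # Assumes plain HTTP; adjust the scheme if the cluster is served over TLS.
        return "http://{}:{}/{}".format(host, port, endpoint.strip("/"))

    return {
        "coordinator": url(
            cluster["coordinator_host"],
            cluster["coordinator_port"],
            cluster["coordinator_endpoint"],
        ),
        "broker": url(
            cluster["broker_host"],
            cluster["broker_port"],
            cluster["broker_endpoint"],
        ),
    }


# Values copied from the production cluster configuration above.
PRODUCTION_CLUSTER = {
    "coordinator_host": "192.168.2.11",
    "coordinator_port": 8081,
    "coordinator_endpoint": "druid/coordinator/v1/metadata",
    "broker_host": "192.168.2.13",
    "broker_port": 8082,
    "broker_endpoint": "druid/v2",
    "cache_timeout": 86400,  # seconds, i.e. one day
}

if __name__ == "__main__":
    for name, value in sorted(build_druid_urls(PRODUCTION_CLUSTER).items()):
        print(name, value)
```

Requesting the two URLs by hand (for example with curl) is a quick way to confirm that the addresses entered in the cluster form are reachable from the Superset host.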
Python Setup
Python
$ python --version
Python 2.7.8
[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version; Python 2.6 won't be supported.
## Upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/
# Download the matching Python version from the python.org FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz
# Unpack and build
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install
$ ls /usr/local/python27/ -al
drwxr-xr-x. 6 root root 4096 12月 15 14:22 .
drwxr-xr-x. 13 root root 4096 12月 15 14:20 ..
drwxr-xr-x. 2 root root 4096 12月 15 14:22 bin
drwxr-xr-x. 3 root root 4096 12月 15 14:21 include
drwxr-xr-x. 4 root root 4096 12月 15 14:22 lib
drwxr-xr-x. 3 root root 4096 12月 15 14:22 share
# Replace the original Python 2.6 interpreter
$ which python
/usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
Python 2.7.12
# Point yum back at the old Python 2.6 interpreter, which it still depends on
$ vim /usr/bin/yum
# Change the first line to python2.6
#!/usr/bin/python2.6
$ yum --version | sed '2,$d'
3.2.29
Pip
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
# upgrade setup tools and pip
$ pip install --upgrade setuptools pip
## Installing pip in an offline environment
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install
# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip==9.0.1
Finished processing dependencies for pip==9.0.1
$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)
Virtualenv
$ pip install virtualenv
# Python 3 ships an equivalent as the built-in venv module (python3 -m venv); the older pyvenv script is deprecated
$ virtualenv venv
$ source venv/bin/activate
## Installing virtualenv in an offline environment
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install
$ virtualenv --version
15.1.0
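On Python 3 the same kind of isolated environment can be created programmatically with the standard-library `venv` module. The sketch below is an illustration that needs Python 3, not the Python 2.7 setup used in these notes; `create_env` is a hypothetical helper, and `with_pip=False` is chosen only to keep the example fast and offline-friendly.

```python
import os
import sys
import tempfile
import venv


def create_env(env_dir):
    """Create an isolated environment in env_dir and return its interpreter path."""
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_dir)
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_env(os.path.join(tmp, "superset-env")))
```

The `pyvenv.cfg` file that appears in the environment directory records which base interpreter the environment was created from.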
Superset Setup
Initializing Superset
$ pip install superset
## Installing superset in an offline environment
# Download superset-0.15.0.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.0.tar.gz
$ cd superset-0.15.0
$ python setup.py install
# Create an admin user
$ fabmanager create-admin --app superset
Username [admin]: # login name
User first name [admin]: # first name
User last name [user]: # last name
Email [admin@fab.org]: # email, must be unique
Password:
Repeat for confirmation:
Error: the two entered values do not match
Password: #superset
Repeat for confirmation: #superset
// ...
Recognized Database Authentications.
2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
Admin User superset db upgrade created.
# Initialize the database
$ superset db upgrade
// ...
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
# Load some data to play with
$ superset load_examples
Loading examples into <SQLA engine=u'sqlite:root/.superset/superset.db'>
Creating default CSS templates
Loading energy related dataset
Creating table [wb_health_population] reference
2016-12-14 17:58:09,568:INFO:root:Creating database reference
2016-12-14 17:58:09,575:INFO:root:sqlite:root/.superset/superset.db
Loading [World Bank's Health Nutrition and Population Stats]
Creating table [wb_health_population] reference
2016-12-14 17:58:30,840:INFO:root:Creating database reference
2016-12-14 17:58:30,846:INFO:root:sqlite:root/.superset/superset.db
Creating slices
Creating a World's Health Bank dashboard
Loading [Birth names]
Done loading table!
--------------------------------------------------------------------------------
Creating table [birth_names] reference
2016-12-14 17:58:52,276:INFO:root:Creating database reference
2016-12-14 17:58:52,280:INFO:root:sqlite:root/.superset/superset.db
Creating some slices
Creating a dashboard
Loading [Random time series data]
Done loading table!
--------------------------------------------------------------------------------
Creating table [random_time_series] reference
2016-12-14 17:58:53,953:INFO:root:Creating database reference
2016-12-14 17:58:53,957:INFO:root:sqlite:root/.superset/superset.db
Creating a slice
Loading [Random long/lat data]
Done loading table!
--------------------------------------------------------------------------------
Creating table reference
2016-12-14 17:59:09,732:INFO:root:Creating database reference
2016-12-14 17:59:09,736:INFO:root:sqlite:root/.superset/superset.db
Creating a slice
Loading [Multiformat time series]
Done loading table!
--------------------------------------------------------------------------------
Creating table [multiformat_time_series] reference
2016-12-14 17:59:10,421:INFO:root:Creating database reference
2016-12-14 17:59:10,426:INFO:root:sqlite:root/.superset/superset.db
Creating some slices
Loading [Misc Charts] dashboard
Creating the dashboard
# Create default roles and permissions
$ superset init
# Start the web server on port 8088
$ superset runserver -p 8088
# To start a development web server, use the -d switch
# superset runserver -d
# Refresh Druid Datasource (after config it)
$ superset refresh_druid
Virtualenv Workspace
# superset01 192.168.1.10
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages --always-copy superset
$ source superset/bin/activate
# See the `Pitfalls` - `Installing superset requires downloading dependencies` section below
# pip install --download package -r requirements.txt
$ pip install -r /root/requirements.txt
$ superset runserver -a 0.0.0.0 -p 8088
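# rsync is recommended; see the `Deployment` section
$ cd /root
$ tar zcvf virtualenv.tar.gz virtualenv/
$ scp virtualenv.tar.gz root@192.168.1.13:/root/
# On 192.168.1.13: unpack the archive before activating the environment
$ cd /root
$ tar zxvf virtualenv.tar.gz
$ cd /root/virtualenv/superset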
$ source bin/activate
VirtualenvWrapper
## [Extra]
# virtualenvwrapper is an extension to virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/workspaces
$ vim ~/.bashrc
# Add the following lines
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh
$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]