Druid + Superset

weixin_34419321

Original link: https://my.oschina.net/hblt147/blog/3006382

Divolte + Kafka + Druid + Superset

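In today's world you want to learn from your customers as quickly as possible. This post shows how to set up streaming analytics with open-source technology. We will use Divolte, Kafka, Superset and Druid to build a system that gives you immediate insight into your customers.

Analyzing data in a streaming fashion lets you continuously analyze customer behavior and act on it right away. For example:

* When running a new experiment with A/B testing, we want to monitor it and be able to stop variant A or B early if the results show that one performs significantly better than the other.
* Everything users do on your website tells you something about their intent. When we can process that data immediately, we can tailor content to each user.
* Gathering general information about how the application is used helps steer the next iteration of the application.

This stack can serve as an alternative to Google Analytics. It gives you direct access to all of the data in your own environment and keeps the data away from third-party vendors. Events are captured by Divolte, processed with Kafka and Druid, and visualized with Superset.

Before diving in, I would like to describe the components used. After that, we will look at how to set up and use the tools.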

Divolte
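Divolte Collector is a scalable, high-performance server for collecting clickstream data and publishing it to a sink such as Kafka, HDFS or S3. Divolte was developed by GoDataDriven and is available to the public under an open-source license.

Divolte can serve as the foundation for anything from a basic web analytics dashboard to a real-time recommender engine or a banner optimization system. Using a small piece of JavaScript and a pixel in the customer's browser, it collects data about their behavior on a website or in an application.

Kafka
Apache™ Kafka is a fast, scalable, durable and fault-tolerant publish-subscribe messaging system. Kafka is known for its high throughput, reliability and replication. Combined with Apache Flink or Apache Spark, Kafka enables real-time analysis and presentation of streaming data.

In this setup, Kafka collects and buffers the events, which are then ingested by Druid. We need Kafka to persist the data and to act as a buffer during bursts of events, which happen, for example, when a TV commercial airs.

Druid
Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low-latency real-time ingestion from Kafka, flexible data exploration and fast data aggregation.

Druid ingests the data and shapes it into the form we ask for. Existing Druid deployments have scaled to trillions of events and petabytes of data, so we do not have to worry about scale.

Superset
Apache™ Superset is a data exploration and visualization web application that provides an intuitive interface for exploring and visualizing datasets and for building interactive dashboards. It was originally developed at Airbnb and is now on its way to becoming an Apache™ project.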

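Getting started

What is more fun than running a proof of concept on your own machine? With Docker it is easy to set up a local instance of the stack, so we can try it out and explore the possibilities.

To set up the system, we start by cloning the git repository: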

git clone https://github.com/Fokko/divolte-kafka-druid-superset.git
cd divolte-kafka-druid-superset
git submodule update --init --recursive

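We need to initialize and update the submodules because we rely on a couple of images: Kris Geusenbroek's Divolte and Kafka images. These are excellent images, so why bother building our own when the community already maintains them?

Next we build the images locally and then start them. I personally like to explicitly remove old instances of the containers first, to make sure no stale state is left behind: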

docker-compose rm -f && docker-compose build && docker-compose up

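Because we build the images from scratch, this can take a while. After running docker-compose up the services start booting, and it may take some time before everything is up and running.

After a few seconds, we can open a browser and check the services:

Service	URL
Divolte	http://localhost:8290/
Druid console	http://localhost:8081/
Druid indexing	http://localhost:8081/console.html
Superset	http://localhost:8088/
Sample application	http://localhost:8090/

Next we need to tell Druid to listen on the right Kafka topic. This is done by posting a supervisor spec file, in JSON format, to the indexing service with curl: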

curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8081/druid/indexer/v1/supervisor
{"id":"divolte-clickstream"}

The supervisor-spec.json file contains the location of the Kafka cluster, the specification of the data, and how the data should be indexed.
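To give a feel for what such a spec holds, here is a minimal Python sketch that builds a Kafka supervisor spec and can post it to the same endpoint as the curl command. It is not the repository's supervisor-spec.json: the topic name, broker address, timestamp column and dimension list are assumptions made for illustration, and the sketch assumes JSON-encoded messages for brevity, whereas the Divolte setup described here uses Avro.

```python
import json
import urllib.request


def build_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Return a minimal Kafka supervisor spec as a plain dict."""
    return {
        "type": "kafka",
        # Data specification: how to parse each message and which columns to keep
        "dataSchema": {
            "dataSource": datasource,
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {"column": "timestamp", "format": "auto"},
                    "dimensionsSpec": {"dimensions": dimensions},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            # How the data should be indexed: segment and query granularity
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "NONE",
            },
        },
        # Location of the Kafka cluster and the topic to read
        "ioConfig": {
            "topic": topic,
            "consumerProperties": {"bootstrap.servers": bootstrap_servers},
        },
    }


def submit_supervisor(spec, url="http://localhost:8081/druid/indexer/v1/supervisor"):
    """POST the spec to the indexing service and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


spec = build_supervisor_spec(
    datasource="divolte-clickstream",
    topic="divolte",                 # hypothetical topic name
    bootstrap_servers="kafka:9092",  # hypothetical broker address
    dimensions=["technology"],
)
print(json.dumps(spec, indent=2))
```

Calling submit_supervisor(spec) against a running stack should return the supervisor id, in the same way as the curl command.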

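We use a slightly modified Divolte Avro schema. Divolte is fully customizable, but we only added one extra field, called technology, which we use in the sample application. Alongside the schema there needs to be a mapping function, written in Groovy, that populates the fields based on the input. Looking at our mapping.groovy, all fields are populated from metadata except technology, which is supplied explicitly using an EventParameter.

After running the curl command shown above, you should get {"id":"divolte-clickstream"} as the response. You can check the Druid indexing console to verify that the job listening to Kafka is listed. As you may have noticed, the Druid REST API is easy to script and integrates nicely with orchestration tools such as Apache Airflow.

On the indexing console we can see that a new job is kicked off roughly every five minutes. All new events arriving through Kafka are indexed directly in memory and kept on the heap. After a given timespan, the events are persisted to deep storage such as HDFS or S3. Before the data is stored it is chunked into segments, 500 MB by default, and bitmap indexes are computed and stored alongside the data.

When segments are persisted to deep storage is configurable and should be chosen to suit the situation: the higher the throughput, the shorter the timespan. You do not want to keep too many events in memory, but you also do not want to persist too often, because small files add overhead on the file system.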

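Let's look at our sample application, which is able to send events to Divolte. Divolte supports not only web applications but also desktop, mobile and even embedded applications, as long as they can fire an HTTP request at Divolte.

Visit the application and click on all the technologies you like. Each click generates an event that we will visualize later.

When one of the technology logos is clicked, JavaScript passes the action to Divolte in the background: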

// The first argument is the event type; the second argument is a JavaScript object
// containing arbitrary event parameters, which may be omitted
divolte.signal('myCustomEvent', { param: 'foo',  otherParam: 'bar' })

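For more information about configuring Divolte, refer to the excellent Divolte user guide.

Now that we have configured Divolte, Kafka and Druid and emitted some events, it is time to configure Superset. Go to http://localhost:8088/druidclustermodelview/add and fill in the fields with docker-druid, since that is the alias used in the supplied docker-compose file.

Next, we have to explicitly refresh the Druid data sources by opening http://localhost:8088/druid/refresh_datasources/, which can also be found in Superset's menu. This contacts the Druid coordinator and asks for the available data sources and their schemas.

Now that everything is loaded, we can start building our first slice. A slice in Superset is a chart or table that can be used in one or more dashboards.

Superset chart

After creating a simple donut chart, it is easy to visualize the data Divolte collected in Superset. For me, Scala is by far the most popular technology! Click some of the other logos and watch the dashboard change instantly.

This is of course a very simple example, but with Druid it is easy to plot the activity of each technology over time. When you instrument your Divolte events properly, complex process-mining activities are also easy to visualize with Superset.

This example shows the stack of Divolte, Kafka, Druid and Superset. If you want to move it to production, this set of Docker images will not help you: you need to set up proper Kafka and Druid clusters. Superset does not need many resources, since all the heavy grouping and filtering of data is done by Druid. Divolte is known to handle a large number of requests with a single instance, but it can also be placed behind a reverse proxy such as Nginx or HAProxy.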

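Streaming tweets with NiFi, Kafka, Tranquility, Druid and Superset

The concept of time is at the core of all big data processing technologies, but it is especially important in stream processing. In fact, it is fair to say that the way different systems handle time-based processing is what separates the wheat from the chaff, at least in the world of real-time stream processing.

The demand for stream processing is growing. A common need across Hadoop projects is to build up-to-date metrics from streaming data.

Social media analytics is a good use case for showing how to build a dashboard of streaming analytics with NiFi, Kafka, Tranquility, Druid and Superset.

The processing flow consists of these steps:

1) Tweet ingestion with Apache NiFi

2) Stream processing with Apache Kafka

3) Integrating the data with Tranquility

4) OLAP database storage with Druid

5) Visualization with Apache Superset

Before we start coding, take a look at each component:

NiFi: https://br.hortonworks.com/apache/nifi/

Kafka: https://br.hortonworks.com/apache/kafka/

Tranquility: https://github.com/druid-io/tranquility

Druid: https://br.hortonworks.com/apache/druid/

Superset: https://superset.incubator.apache.org/

We can install all the components manually or simply use HDF from Hortonworks: https://br.hortonworks.com/products/data-platforms/hdf/

Guide to planning an HDF deployment:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1/bk_planning-your-deployment/bk_planning-your-deployment.pdf

To build this HDF cluster we used 4 machines, each with 16 cores and 32 GB of RAM. I made each machine responsible for one component.

With this environment set up, we can start building our flow in NiFi: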

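Stage 1: NiFi

http://thiago-2:9090/nifi/

This flow has 3 steps:

Step 1 - Get the selected tweets

Processor: GetTwitter

Step 2 - Clean and transform the JSON tweets

Processor: EvaluateJsonPath

Extracts the key attributes from the tweet

Processor: RouteOnAttribute

Keeps only the tweets that are not empty

Processor: ReplaceText

Replaces text in the tweet

Step 3 - Send the JSON to a Kafka topic

Processor: PutKafka

This process should give us a stream of messages like this: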

{
  "tweet_id": 971824225936953344,
  "created_unixtime": 1520535925741,
  "created_time": "Thu Mar 08 19:05:25 +0000 2018",
  "lang": "en",
  "displayname": "thiagos25",
  "time_zone": "São Paulo - Brazil",
  "msg": "Hello world!"
}
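The cleaning step of the NiFi flow can be approximated outside NiFi. The Python sketch below is only an illustration of what the three processors in step 2 achieve, not the flow itself: the input field names (id, text, timestamp_ms, created_at, user.screen_name, user.time_zone) are assumed to follow the raw tweet JSON, and collapsing whitespace is an assumed stand-in for whatever the ReplaceText processor was configured to replace.

```python
import json


def flatten_tweet(raw):
    """Extract the key attributes of a raw tweet; return None for empty tweets."""
    text = (raw.get("text") or "").strip()
    if not text:
        return None  # mirrors the RouteOnAttribute "not empty" filter
    user = raw.get("user") or {}
    return {
        # mirrors EvaluateJsonPath: pull selected attributes out of the tweet
        "tweet_id": raw.get("id"),
        "created_unixtime": int(raw.get("timestamp_ms", 0)),
        "created_time": raw.get("created_at"),
        "lang": raw.get("lang"),
        "displayname": user.get("screen_name"),
        "time_zone": user.get("time_zone"),
        # stand-in for ReplaceText: collapse newlines and repeated spaces
        "msg": " ".join(text.split()),
    }


raw_tweet = {
    "id": 971824225936953344,
    "timestamp_ms": "1520535925741",
    "created_at": "Thu Mar 08 19:05:25 +0000 2018",
    "lang": "en",
    "text": "Hello\nworld!",
    "user": {"screen_name": "thiagos25", "time_zone": "Brasilia"},
}
message = flatten_tweet(raw_tweet)
print(json.dumps(message, ensure_ascii=False))
```

The resulting dict has the same keys as the sample message, so it could be serialized and published to the Kafka topic as-is.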

Stage 2: Kafka

Apache Kafka is a real-time stream processor that uses the publish-subscribe messaging pattern. We will use Kafka to receive the incoming messages and publish them to a specific topic-based queue (twitter_demo) that Druid will subscribe to. Tranquility (the Druid indexer) reads these messages and inserts them into the Druid database.

Create a Kafka topic named "twitter_demo" with the following commands:

==> on master-3

cd /usr/hdp/2.6.3.0-235/kafka
./kafka-topics.sh --create \
--zookeeper thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181 \
--replication-factor 1 \
--partitions 1 \
--topic twitter_demo

We can list the topics that have been created with:

./kafka-topics.sh --list --zookeeper thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181

We can consume the messages with:

./kafka-console-consumer.sh \
--zookeeper thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181 \
--topic twitter_demo \
--from-beginning

...and list the offsets:

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list thiago-2.field.hortonworks.com:2181 --topic twitter_demo --time -1

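Stage 3: Tranquility

Now it is time for some tranquility (pardon the pun)!

Tranquility is a friend of Druid that helps us send event streams to Druid in real time. It handles partitioning, replication, service discovery and schema rollover for us, seamlessly and without downtime. Tranquility is written in Scala and bundles idiomatic Java and Scala APIs that work nicely with Finagle, Samza, Spark, Storm and Trident.

Tranquility Kafka is an application that simplifies ingesting data from Kafka. It is scalable and highly available through the use of Kafka partitions and consumer groups, and it can be configured to push data from multiple Kafka topics into multiple Druid data sources.

https://github.com/druid-io/tranquility/blob/master/docs/kafka.md

First things first: to read from a Kafka stream, we define a configuration file that describes the data source name, the Kafka topic to read from, and some properties of the data we read. Save the following JSON configuration as kafka.json.

This instructs Tranquility to read from the topic "twitter_demo" and push the messages it receives into a Druid data source named "twitter_demo". In the messages it reads, Tranquility uses the created_unixtime column (or key) as the timestamp, as set in timestampSpec.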

{
  "dataSources": {
    "twitter_demo": {
      "spec": {
        "dataSchema": {
          "dataSource": "twitter_demo",
          "parser": {
            "type": "string",
            "parseSpec": {
              "timestampSpec": {
                "column": "created_unixtime",
                "format": "auto"
              },
              "dimensionsSpec": {
                "dimensions": [],
                "dimensionExclusions": [
                  "timestamp",
                  "value"
                ]
              },
              "format": "json"
            }
          },
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "six_hour",
            "queryGranularity": "none"
          },
          "metricsSpec": []
        },
        "ioConfig": {
          "type": "realtime"
        },
        "tuningConfig": {
          "type": "realtime",
          "maxRowsInMemory": "100000",
          "intermediatePersistPeriod": "PT10M",
          "windowPeriod": "PT720000M"
        }
      },
      "properties": {
        "task.partitions": "1",
        "task.replicants": "1",
        "topicPattern": "twitter_demo"
      }
    }
  },
  "properties": {
    "zookeeper.connect": "thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181",
    "druid.discovery.curator.path": "/druid/discovery",
    "druid.selectors.indexing.serviceName": "druid/overlord",
    "commit.periodMillis": "15000",
    "consumer.numThreads": "2",
    "kafka.zookeeper.connect": "thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181",
    "kafka.group.id": "tranquility-kafka"
  }
}
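To make the interplay of timestampSpec and segmentGranularity more concrete, the following Python sketch computes which six-hour interval a message falls into, based on its created_unixtime value in milliseconds. It is a worked example rather than Druid's own code, and it assumes that six_hour buckets are aligned to midnight UTC.

```python
from datetime import datetime, timedelta, timezone

SIX_HOURS_MS = 6 * 60 * 60 * 1000


def six_hour_bucket(created_unixtime_ms):
    """Return (start, end) of the six-hour UTC interval containing the timestamp."""
    # Round the millisecond timestamp down to a multiple of six hours
    start_ms = created_unixtime_ms - (created_unixtime_ms % SIX_HOURS_MS)
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    return start, start + timedelta(hours=6)


# created_unixtime taken from the sample tweet message
start, end = six_hour_bucket(1520535925741)
print(start.isoformat(), end.isoformat())
```

For the sample tweet, created at 19:05 UTC on 8 March 2018, the interval runs from 18:00 that day to 00:00 the next day.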

...and place it in the Druid directory:

==> on master-4

/usr/hdp/2.6.3.0-235/druid/conf-quickstart/tranquility/kafka.json

To manage the continuous process of indexing the Kafka data, we download Tranquility, change into its directory and run it. Download the release and unpack it with the following commands:

sudo curl -O http://static.druid.io/tranquility/releases/tranquility-distribution-0.8.0.tgz
sudo tar xzvf tranquility-distribution-0.8.0.tgz
cd tranquility-distribution-0.8.0
# in /usr/hdp/2.6.3.0-235/druid/conf-quickstart/tranquility/tranquility-distribution-0.8.0/
sudo bin/tranquility kafka -configFile ../kafka.json

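Stage 4: Druid

Druid is a rocking exploratory analytics data store that serves interactive queries on big data in real time.

In HDP/HDF, Druid is easy to use through Superset: we can build our Druid data source and manage all the Druid columns to fit the schema of our JSON tweets.

Stage 5: Superset dashboard

We can use Superset for exploratory analysis and to define the JSON queries that we run against the Druid API and use to build our dashboard.

Once the Druid data source is created, you can create slices and put them all into a dashboard.

A few pictures are worth a thousand words:

1) Creating our slices

2) Querying a slice

3) Saving all the slices in a dashboard

4) Presenting our dashboard

In the end we get a great real-time Twitter dashboard with information about location, maps and languages. We can even read each individual tweet to see what our customers' sentiment is... but that is a topic for the next article.

References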

http://druid.io/docs/latest/tutorials/tutorial-kafka.html

http://druid.io/blog/2013/08/30/loading-data.html

https://github.com/druid-io/tranquility

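The Druid cluster integrates the Superset tool. Superset is deeply integrated with Druid and also supports many relational databases. Because Druid supports SQL as well, Superset can access Druid in two ways: through Druid's native query language or through SQL.

Superset is installed on the emr-header-1 node by default and does not support HA yet. Before using it, make sure your host can reach emr-header-1. You can connect to it through an SSH tunnel.

Log in to Superset
Enter http://emr-header-1:18088 in the browser's address bar and press Enter to open the Superset login page. The default username/password is admin/admin; change the password promptly after logging in.
Add a Druid cluster
The interface is in English by default after login; click the flag icon in the upper-right corner to choose another language. Then, in the top menu bar, select Sources > Druid Clusters to add a Druid cluster.

Configure the addresses of the Coordinator and the Broker. Note that in E-MapReduce every default port is the corresponding open-source port prefixed with the digit 1; for example, the open-source Broker port is 8082 and the E-MapReduce port is 18082.
Refresh or add data sources
After the Druid cluster has been added, click Sources > Scan New Datasources, and the data sources on the Druid cluster are loaded automatically.

You can also click Sources > Druid Datasources in the UI to define a new data source by hand (equivalent to writing a data source ingestion JSON file). The steps are as follows:

Fill in the required information when defining the data source, then save.

After saving, click the second of the three small icons on the left to edit the data source and fill in the dimension columns, metric columns and other details.
Query Druid
Once the data source has been added, click its name to open the query page and run queries.
(Optional) Use Druid as a database
Superset uses SQLAlchemy, with its many dialects, to support a wide variety of databases.

Superset can access Druid this way as well. The SQLAlchemy URI for Druid is "druid://emr-header-1:18082/druid/v2/sql"; use it to add Druid as a database.

You can then query it with SQL in SQL Lab.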
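The port convention and the SQLAlchemy URI can be combined in a few lines of Python. This is a small sketch with hypothetical helper names (emr_port and druid_sqlalchemy_uri are not part of Superset or E-MapReduce); it only shows how the URI quoted in this section is put together.

```python
def emr_port(open_source_port):
    """Apply the E-MapReduce convention: prefix the open-source port with 1."""
    return int("1" + str(open_source_port))


def druid_sqlalchemy_uri(host, broker_port, path="druid/v2/sql"):
    """Build the SQLAlchemy URI that points at the Broker's SQL endpoint."""
    return "druid://{}:{}/{}".format(host, broker_port, path)


uri = druid_sqlalchemy_uri("emr-header-1", emr_port(8082))
print(uri)
```

On a Druid deployment outside E-MapReduce, the open-source Broker port would be passed unchanged.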

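Superset secondary development

Basic concepts
Superset is a data exploration platform open-sourced by Airbnb, designed to be visual, intuitive and interactive (formerly named Panoramix and Caravel; it has since entered the Apache Incubator)

Basic components
Flask
One of the best-known Python web frameworks, noted for being lightweight and highly extensible

Jinja2
Template engine

Werkzeug
WSGI toolkit

Gunicorn
Gunicorn is an open-source Python WSGI HTTP server that uses the pre-fork model, ported from Ruby's Unicorn project

WSGI
WSGI, the Python Web Server Gateway Interface, is an interface between Python applications or frameworks and web servers. It is closer to a protocol than to a piece of software: any application that follows it can run on any WSGI server, and vice versa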
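A minimal example helps to show how small the WSGI contract is. The sketch below defines one application callable and a helper that serves it with the standard library's reference server; the response text and the port number are arbitrary choices for illustration.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI app: reads the request environ, returns an iterable of bytes."""
    body = "Hello from {}".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Serve the application with the standard library's reference WSGI server."""
    with make_server("", port, application) as httpd:
        httpd.serve_forever()
```

Because the callable follows the protocol, the same code could be handed to Gunicorn instead, for example with gunicorn app:application, assuming the file is saved as app.py.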

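Pre-Fork
One process handles one request. It is based on the select model, so at most 1024 processes can be created at a time
Processes are created in advance: pre-fork spawns child processes ahead of time and uses them to handle different requests, one child process per request, with the processes independent of each other
This speeds up response time to some extent

Django
Django is an open-source web application framework written in Python. It follows the MVC software design pattern and makes it simple to develop complex, database-driven websites
Django emphasizes component reusability and "pluggability", agile development, and the DRY principle (Don't Repeat Yourself)

Core components
* An object-relational mapper that mediates between the data model (defined as Python classes) and the relational database
* A URL dispatcher based on regular expressions
* A view system for handling requests
* A template system

PyDruid
A Python connector for Druid
Exposes a simple API to create, execute, and analyze Druid queries

Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive

SciPy
SciPy is a Python module built on NumPy that bundles many mathematical algorithms and convenience functions

Scikit-learn
Machine Learning in Python

D3.js
D3.js is a JavaScript library for manipulating data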

Installation
Base environment
OS
$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013

# For Fedora and RHEL-derivatives
# [Doc]: Other System https://superset.apache.org/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y
Machines
# External network (http://192.168.1.10:9097/)
superset01 192.168.1.10 Superset
druid01 192.168.1.11 Druid
druid02 192.168.1.12 MySQL

# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.1.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.1.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend

# Production (http://192.168.2.10:9097)
druid-prd01 192.168.2.10 Superset
druid-prd02 192.168.2.11 Druid

# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.2.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.2.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend
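The host, port and endpoint fields of the cluster form presumably end up combined into the URLs that Superset calls. The Python sketch below shows that combination for the external-network values; the http scheme and the helper name endpoint_url are assumptions made for the example, not taken from Superset's source.

```python
def endpoint_url(host, port, endpoint):
    """Join host, port and endpoint into a URL (the http scheme is assumed)."""
    return "http://{}:{}/{}".format(host, port, endpoint.strip("/"))


cluster = {
    "coordinator": ("192.168.1.11", 8081, "druid/coordinator/v1/metadata"),
    "broker": ("192.168.1.13", 8082, "druid/v2"),
}
urls = {name: endpoint_url(*parts) for name, parts in cluster.items()}
for name, url in sorted(urls.items()):
    print(name, url)
```

Requesting these URLs by hand, for example with curl, is a quick way to check that the Superset host can reach both services before filling in the form.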
Python-related
Python
$ python --version
Python 2.7.8

[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version, Python 2.6 won't be supported.

## Upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/

# Download the matching Python version from the python.org FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz

# Build
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install

$ ls /usr/local/python27/ -al

drwxr-xr-x. 6 root root 4096 Dec 15 14:22 .
drwxr-xr-x. 13 root root 4096 Dec 15 14:20 ..
drwxr-xr-x. 2 root root 4096 Dec 15 14:22 bin
drwxr-xr-x. 3 root root 4096 Dec 15 14:21 include
drwxr-xr-x. 4 root root 4096 Dec 15 14:22 lib
drwxr-xr-x. 3 root root 4096 Dec 15 14:22 share

# Replace the original Python 2.6
$ which python
/usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
Python 2.7.12

# Point yum back at the old Python 2.6
$ vim /usr/bin/yum

# Change the first line to python2.6
#!/usr/bin/python2.6

$ yum --version | sed '2,$d'
3.2.29
Pip
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip

## Installing pip in an offline environment
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install

# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip==9.0.1
Finished processing dependencies for pip==9.0.1

$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)
Virtualenv
$ pip install virtualenv

# virtualenv is shipped in Python 3 as pyvenv
$ virtualenv venv
$ source venv/bin/activate

## Installing virtualenv in an offline environment
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
15.1.0
Superset-related
Superset initialization
$ pip install superset

## Installing superset in an offline environment
# Download superset-0.15.0.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.0.tar.gz
$ cd superset-0.15.0
$ python setup.py install

# Create an admin user
$ fabmanager create-admin --app superset

Username [admin]: # login name
User first name [admin]: # first name
User last name [user]: # lastname
Email [admin@fab.org]: # email, must unique
Password:
Repeat for confirmation:
Error: the two entered values do not match
Password: #superset
Repeat for confirmation: #superset
// ...
Recognized Database Authentications.
2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
Admin User superset db upgrade created.

# Initialize the database
$ superset db upgrade

// ...
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.

# Load some data to play with
$ superset load_examples

# Create default roles and permissions
$ superset init

Loading examples into <SQLA engine=u'sqlite:root/.superset/superset.db'>
Creating default CSS templates
Loading energy related dataset
Creating table [wb_health_population] reference
2016-12-14 17:58:09,568:INFO:root:Creating database reference
2016-12-14 17:58:09,575:INFO:root:sqlite:root/.superset/superset.db
Loading [World Bank's Health Nutrition and Population Stats]
Creating table [wb_health_population] reference
2016-12-14 17:58:30,840:INFO:root:Creating database reference
2016-12-14 17:58:30,846:INFO:root:sqlite:root/.superset/superset.db
Creating slices
Creating a World's Health Bank dashboard
Loading [Birth names]
Done loading table!
--------------------------------------------------------------------------------
Creating table [birth_names] reference
2016-12-14 17:58:52,276:INFO:root:Creating database reference
2016-12-14 17:58:52,280:INFO:root:sqlite:root/.superset/superset.db
Creating some slices
Creating a dashboard
Loading [Random time series data]
Done loading table!
--------------------------------------------------------------------------------
Creating table [random_time_series] reference
2016-12-14 17:58:53,953:INFO:root:Creating database reference
2016-12-14 17:58:53,957:INFO:root:sqlite:root/.superset/superset.db
Creating a slice
Loading [Random long/lat data]
Done loading table!
--------------------------------------------------------------------------------
Creating table reference
2016-12-14 17:59:09,732:INFO:root:Creating database reference
2016-12-14 17:59:09,736:INFO:root:sqlite:root/.superset/superset.db
Creating a slice
Loading [Multiformat time series]
Done loading table!
--------------------------------------------------------------------------------
Creating table [multiformat_time_series] reference
2016-12-14 17:59:10,421:INFO:root:Creating database reference
2016-12-14 17:59:10,426:INFO:root:sqlite:root/.superset/superset.db
Creating some slices
Loading [Misc Charts] dashboard
Creating the dashboard

# Start the web server on port 8088
$ superset runserver -p 8088

# To start a development web server, use the -d switch
# superset runserver -d

# Refresh Druid Datasource (after config it)
$ superset refresh_druid
Virtualenv workspace
# superset01 192.168.1.10
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages --always-copy superset
$ source superset/bin/activate

# See the `Pitfalls` - `Installing superset requires downloading dependencies` section below
# pip install --download package -r requirements.txt
$ pip install -r /root/requirements.txt

$ superset runserver -a 0.0.0.0 -p 8088

# rsync is recommended; see the `Deployment` section
$ cd /root
$ tar zcvf virtualenv.tar.gz virtualenv/
$ scp virtualenv.tar.gz root@192.168.1.13:/root/

# 192.168.1.13
$ cd /root/virtualenv/superset
$ source bin/activate
VirtualenvWrapper
## [Extra]
# virtualenvwrapper is an extension of virtualenv that makes it easy to create, delete, copy and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/workspaces
$ vim ~/.bashrc
# Add
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh

$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]#
(superset) [root@superset01 virtualenv]# deactivate

$ workon superset
(superset) [root@superset01 virtualenv]# lsvirtualenv -b
superset
Deployment
Copy
# Using rsync instead of scp ensures that symlinks are copied too
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4

//...
sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
total size is 359739823 speedup is 2.51

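# Check the file count in the superset directory on both the local and the target machine
$ find | wc -l
10113

# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4

# The Python that the virtualenv depends on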
$ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep / # necessary!
$ python -V
Python 2.7.12
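Dynamic link libraries
# Although the symlinks were rsynced over, the corresponding Python dynamic link libraries are missing from the relevant directory on the target machine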
$ file /root/superset/lib/python2.7/lib-dynload

/root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`

# Must match the global Python environment used when the VirtualEnv was created in the networked environment
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /

$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
User permissions
# Create the user
$ adduser superset
$ cd /home/superset
# If the directory name includes a version number, create a symlink
$ chown -R superset:superset superset-0.15.4
$ ln -s superset-0.15.4 superset

$ chown -h superset:superset superset
$ su - superset
Metadata storage
# Change the database
$ vim ./lib/python2.7/site-packages/superset/config.py

# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://user:password@mysql01:3306/superset1?charset=utf8'

$ mysql -hmysql01 -P3306 -uuser -ppassword
> use superset1;
> show tables;
+-------------------------+
| Tables_in_superset1 |
+-------------------------+
| ab_permission |
| ... |
| url |
+-------------------------+
28 rows in set (0.00 sec)

# mysqldump -hmysql01 -P3306 -uuser -ppassword superset1 > superset1.sql
$ mysqldump -hmysql01 -P3306 -uuser -ppassword --single-transaction superset1 > superset1.sql
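One detail that is easy to trip over with SQLALCHEMY_DATABASE_URI is a password containing characters such as @ or /, which must be percent-encoded inside the URI. The following Python sketch builds the URI with the standard library; the helper name mysql_uri and the sample password are made up for illustration.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, port, database, charset="utf8"):
    """Build a mysql+pymysql SQLAlchemy URI, percent-encoding the credentials."""
    return "mysql+pymysql://{}:{}@{}:{}/{}?charset={}".format(
        quote_plus(user), quote_plus(password), host, port, database, charset
    )


# Hypothetical password containing characters that need escaping
SQLALCHEMY_DATABASE_URI = mysql_uri("user", "p@ss/word", "mysql01", 3306, "superset1")
print(SQLALCHEMY_DATABASE_URI)
```

With a plain password such as the one in the config above, the result is identical to the hand-written URI.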
References
* mysqldump: 1044 Access denied when using LOCK TABLES
* Fixing mysqldump: Got error: 1044: Access denied for user

Startup
$ cd /home/superset/superset-0.15.4
$ source bin/activate
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &
Running locally
Dependencies
Windows
Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat)
Description
error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

Solution
# download vcredist_x64.exe from http://www.microsoft.com/en-us/download/details.aspx?id=2092
$ pip install wheel setuptools
# Download and install VCForPython27.msi
'openssl/opensslv.h': No such file or directory
Solution
# download openssl-0.9.8h-1-setup.exe from http://gnuwin32.sourceforge.net/packages/openssl.htm
References
Installing OpenSSL on Windows
Cannot open include file: 'stdint.h': No such file or directory
Solution
# Microsoft Visual C++ 2015 Redistributable Update 3
# download vc_redist.x64.exe from https://www.microsoft.com/zh-CN/download/details.aspx?id=53840
$ vim D:\apps\Python27\Lib\distutils\msvc9compiler.py

def get_build_version():
    return 9.0
def find_vcvarsall(version):
    return r'C:\Users\yuzhouwan\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\vcvarsall.bat'

$ cd superset-0.15.4
$ python setup.py install

# The VCForPython27.msi provided by Microsoft uses VC2008 by default, while stdint.h has only been supported since VC2012
# VCForPython27.msi has not been maintained since 2014, so I decided to try Ubuntu or remote debugging instead ...
References
Developer Network <cstdint>
The latest supported Visual C++ downloads
Building Python extensions on Windows with Microsoft Visual C++ Compiler for Python 2.7
Why Microsoft Visual Studio cannot find <stdint.h>? [duplicate]
Ubuntu
Install VMware

Python-related
Make sure that you use the correct version of 'pip'
Description
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'D:\apps\Python27\python.exe'
Solution
# Install pip: download the installer from https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py

$ pip --version
pip 8.1.1 from d:\apps\python27\lib\site-packages (python 2.7)
References
How do I install pip on Windows?
'Connection to pypi.python.org timed out. (connect timeout=15)'
Description
$ pip install --upgrade pip
'Connection to pypi.python.org timed out. (connect timeout=15)'
Solution
# Set a proxy
$ export https_proxy="http://10.10.10.10:8080"
$ pip install --upgrade pip
$ pip --version
pip 9.0.1 from d:\apps\python27\lib\site-packages (python 2.7)
References
Using applications behind a corporate proxy
setup.py failed with error code 1
Description
Command "d:\apps\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\yuzhouwan\\appdata\\local\\temp\\pip-build-zzbhrq\\sasl\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\yuzhouwan\appdata\local\temp\pip-erwavd-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\yuzhouwan\appdata\local\temp\pip-build-zzbhrq\sasl\
Solution
$ pip install --upgrade setuptools pip
$ pip install superset

# Download superset-0.15.4.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.4.tar.gz
$ cd superset-0.15.4
$ python setup.py install
References
superset 0.15.4 (A interactive data visualization platform build on SqlAlchemy and druid.io)
Setting up the development environment
Dependencies
$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12

$ mv /usr/local/bin/python /usr/local/bin/python_bak
$ ln -s /usr/local/python27/bin/python /usr/local/bin/python
Virtual environment
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages env
$ cd env
$ mkdir code
Code
# windows
$ cd E:\Github\super\env
$ git init
$ git remote add origin https://github.com/asdf2014/superset.git
$ git pull origin master

# SFTP
# Upload to /root/env/code
Installation
$ cd /root/env/code
$ source /root/env/bin/activate

$ cd /root/env/code/superset/static
$ mv assets assets_bak
$ ln -s ../assets assets

$ cd /root/env/code
$ python setup.py develop

Finished processing dependencies for superset==0.15.4

$ pip freeze | grep superset
superset==0.15.4

# Create an admin user
$ fabmanager create-admin --app superset

$ superset db upgrade
$ superset init
$ superset load_examples
Npm
# [Mac OS]
$ sudo yum group install "Development Tools" --setopt=group_package_types=mandatory,default,optional --skip-broken -y
$ sudo yum install curl git m4 ruby texinfo bzip2-devel curl-devel expat-devel ncurses-devel zlib-devel -y

# ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install)" # Do not run this as root!
$ wget https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install --no-check-certificate
$ mv install install.rb
$ vim install.rb

　# abort "Don't run this as root!" if Process.uid == 0

$ mkdir -p /root/.linuxbrew/bin
$ export PATH="/root/.linuxbrew/bin:$PATH"
$ ruby install.rb

$ vim ~/.bashrc

　export PATH="$HOME/.linuxbrew/bin:$PATH"
　export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
　export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"

# [CentOS]
$ yum install npm
$ cd /root/env/code/superset/assets # package.json
$ npm install

# if visit https://github.com/jquery/jquery.git return timeout
$ vim /etc/hosts

　192.30.253.112 github.com
　151.101.100.133 assets-cdn.github.com
　192.30.253.117 api.github.com
　192.30.253.121 codeload.github.com
Testing
$ cd /root/env/code
$ chmod 777 *sh
$ cd /root/env/code/superset/bin
$ chmod 777 superset

$ cd /root/env/code
$ bash run_tests.sh
Remote development in an IDE
Remote Debug
See the Remote Debug section of my other blog post, "Python"

References
Contributing
How to Install Node.js and NPM on Linux
Why yum groupinstall "<package group name>" is failing on RHEL 7 with error "There is no installed groups file"?
Secondary development
Others Category
Problem
Description
When aggregating at the HBase Region level, the group-by produces a large number of Regions; rendering them in DistributionPieViz is sluggish and does not look good

Solution
Adding row_limit excludes the data outside the top N
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

fieldsets = ({
    'label': None,
    'fields': (
        'metrics', 'groupby',
        'limit',
        'pie_label_type',
        ('donut', 'show_legend'),
        'labels_outside',
        'row_limit',
    )
},)
others_category aggregates the data outside the top N
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

fieldsets = ({
    'label': None,
    'fields': (
        'metrics', 'groupby',
        'limit',
        'pie_label_type',
        ('donut', 'show_legend'),
        'labels_outside',
        'row_limit',
        'others_category',
    )
},)

$ vim ./lib/python2.7/site-packages/superset/forms.py

'others_category': (BetterBooleanField, {
    "label": _("Others category"),
    "default": True,
    "description": _("Aggregate data outside of topN into a single category")
}),

# models.py
# The Others category was not placed last; the result was sorted once more
# The "others_category": "y" attribute was not passed down

self.status = None
self.error_message = None
self.others_category = form_data.get("others_category")

top_n = 10
if top_n > 0 and len(df) > top_n:
    df_head = df.iloc[:top_n]
    df_tail = df.iloc[top_n:]
    # Sum every metric over the rows outside the top N
    other_metrics_sum = [df_tail[metric].sum() for metric in metrics]
    # Assumes df holds one group-by column followed by the metric columns
    df_other = pd.DataFrame([['Others'] + other_metrics_sum], columns=df.columns)
    df = pd.concat([df_head, df_other], ignore_index=True)
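The idea behind others_category can be shown without pandas or Superset. The sketch below is a standalone illustration, not the code from the PR: it keeps the first top_n rows of an already sorted list and folds the rest into a single row that is appended last, which is the ordering the models.py comments ask for. The function name and sample data are made up.

```python
def top_n_with_others(rows, top_n, label="Others"):
    """Keep the first top_n (name, value) rows and sum the rest into one row.

    rows must already be sorted by value in descending order.
    """
    if top_n <= 0 or len(rows) <= top_n:
        return list(rows)
    head, tail = rows[:top_n], rows[top_n:]
    others_total = sum(value for _, value in tail)
    # Append last so the aggregate stays at the end instead of being re-sorted
    return head + [(label, others_total)]


regions = [("r1", 50), ("r2", 30), ("r3", 10), ("r4", 6), ("r5", 4)]
print(top_n_with_others(regions, top_n=2))
```

With top_n=2 the three smallest regions collapse into one slice, so a pie chart over thousands of HBase Regions would only have to draw top_n + 1 slices.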
Tips: submitted PR #2176 Aggregate data outside of topN into a single category

Y-axis data anomaly
Description
The Y axis should start at 0, but it starts at a negative value, -997m

Solution
Submitted PR #2307 Some problem in Y Axis

Later optimizations
MySQL time zone issues
Queries
Description
$ vim lib/python2.7/site-packages/superset/config.py

from dateutil import tz

# Druid query timezone
# tz.tzutc() : Using utc timezone
# tz.tzlocal() : Using local timezone
# other tz can be overridden by providing a local_config
DRUID_IS_ACTIVE = True
DRUID_TZ = tz.tzlocal() # +08:00

# DRUID_TZ = tz.gettz('Asia/Shanghai')
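What the time zone setting changes can be seen with a small conversion. The Python sketch below uses only the standard library and a fixed +08:00 offset as a stand-in for tz.gettz('Asia/Shanghai'); it illustrates the shift between UTC and local time and is not how Superset applies DRUID_TZ internally.

```python
from datetime import datetime, timedelta, timezone

# Fixed +08:00 offset used as a stand-in for the Asia/Shanghai zone
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def to_local(utc_naive, tz=SHANGHAI):
    """Interpret a naive datetime as UTC and convert it to the given time zone."""
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(tz)


local = to_local(datetime(2016, 12, 14, 17, 53, 40))
print(local.isoformat())
```

An event stored at 17:53 UTC lands on the next calendar day in +08:00, which is why querying with the wrong zone shifts daily aggregates by a third of a day.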
Solution
Submitted PR #2143 Using the time zone with specific name for querying Druid

Display
Description
dttm.tz_convert(dttm.tzinfo._filename.split('zoneinfo/')[1]) - pytz.timezone(dttm.tzinfo._filename.split('zoneinfo/')[1]).localize(EPOCH)
Solution
Submitted PR #2370 Fix timezone issues in slices

References
List of tz database time zones
Why those slices do not pay attention to time zone when they display?
Upgrading Superset
# Upgrade directly with pip install
$ pip freeze | grep superset
superset==0.13.2

$ pip install superset==-1
versions: 0.12.0, 0.13.0, 0.13.1, 0.13.2, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.3, 0.15.4

$ pip install superset==0.15.4

# The previous configuration data turned out to be gone, so some config adjustments are needed
$ vim ./lib/python2.7/site-packages/superset/config.py

# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:root@192.168.1.12:3306/superset?charset=utf8'

$ vim /root/superset-0.15.4/bin/activate

# VIRTUAL_ENV="/root/superset"
VIRTUAL_ENV="/root/superset-0.15.4"

# after that, simply run "superset runserver -a 0.0.0.0 -p 9097"
Unknown column ‘datasources.filter_select_enabled’ in ‘field list’
Description
InternalError: (pymysql.err.InternalError) (1054, u"Unknown column 'datasources.filter_select_enabled' in 'field list'") [SQL: u'SELECT datasources.created_on AS datasources_created_on, datasources.changed_on AS datasources_changed_on, datasources.id AS datasources_id, datasources.datasource_name AS datasources_datasource_name, datasources.is_featured AS datasources_is_featured, datasources.is_hidden AS datasources_is_hidden, datasources.filter_select_enabled AS datasources_filter_select_enabled, datasources.description AS datasources_description, datasources.default_endpoint AS datasources_default_endpoint, datasources.user_id AS datasources_user_id, datasources.cluster_name AS datasources_cluster_name, datasources.offset AS datasources_offset, datasources.cache_timeout AS datasources_cache_timeout, datasources.params AS datasources_params, datasources.perm AS datasources_perm, datasources.changed_by_fk AS datasources_changed_by_fk, datasources.created_by_fk AS datasources_created_by_fk \nFROM datasources \nWHERE datasources.datasource_name = %(datasource_name_1)s \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'datasource_name_1': u'bi-dfp-oms-detail'}]
Solution
$ superset db upgrade
$ superset refresh_druid
Issues with Druid timezones
Description
　The tzutc and tzlocal methods in tz used to work for me…
　However, they stopped working after I upgraded Superset from v0.13.2 to v0.15.4, even when I set DRUID_TZ = tz.gettz('Asia/Shanghai') :-(

　See: Issues with Druid timezones #1369

Solution
$ cd /root/superset-0.15.4
$ ./bin/python -m pip freeze | grep superset

superset==0.13.2

$ ./bin/python -m pip uninstall superset
$ ./bin/python -m pip install superset==0.15.4
$ ./bin/python -m pip freeze | grep superset

superset==0.15.4

$ ./bin/python ./bin/easy_install lib/pycharm-debug.egg
# config remote python

$ ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097
# nohup ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &

$ ./bin/python ./bin/superset db upgrade
$ ./bin/python ./bin/superset refresh_druid
pydevd cannot attach for remote debugging
Description
　After upgrading from 0.13.2 to 0.15.4, starting in debug mode launches two processes (which prevents pydevd from doing remote debugging)

$ ps -ef | grep superset | grep -v grep

root 22567 1632 19 12:05 pts/0 00:00:03 ./bin/python ./bin/superset runserver -d -p 9097
root 22578 22567 24 12:05 pts/0 00:00:03 /root/superset-0.15.4/bin/python ./bin/superset runserver -d -p 9097
Solution
Starting directly from cli.py (did not work)
$ vim ./lib/python2.7/site-packages/superset/config.py

# append
manager.run()

$ ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -a 0.0.0.0 -p 9097

$ ps -ef | grep superset | grep -v grep

root 25238 1632 35 13:07 pts/0 00:00:03 ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
root 25247 25238 55 13:07 pts/0 00:00:03 /root/superset-0.15.4/bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
Attempting to fix the "WARNING:werkzeug: * Debugger is active!" issue
$ vim lib/python2.7/site-packages/werkzeug/serving.py

class ThreadedWSGIServer(ThreadingMixIn, BaseWSGIServer):

    """A WSGI server that does threading."""
    multithread = True

$ vim lib/python2.7/site-packages/flask/app.py

options.setdefault('use_reloader', self.debug)

$ vim superset/__init__.py
　Submitted as PR#2136 Fix werkzeug instance was created twice in Debug Mode
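The two processes come from Werkzeug's reloader: the parent only watches files, while the child it spawns serves requests and has the environment variable WERKZEUG_RUN_MAIN set to "true". A common guard is to run one-time setup, such as attaching a remote debugger, only in the process that actually serves. The sketch below illustrates that check; `should_attach_debugger` is a hypothetical helper and is not the change made in the PR.

```python
import os


def should_attach_debugger(use_reloader, environ=None):
    """Return True only in the process that will actually serve requests."""
    environ = os.environ if environ is None else environ
    if not use_reloader:
        # Single process: there is no reloader parent to skip.
        return True
    # With the reloader on, only the spawned child has this variable set.
    return environ.get("WERKZEUG_RUN_MAIN") == "true"
```

A caller would wrap its debugger setup in `if should_attach_debugger(app.debug): ...`, where `app.debug` is assumed to mirror the `use_reloader` default shown above.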

References
Exception in thread Thread-1 from latest update to werkzeug 0.11 #813
Flask/Werkzeug debugger, process model, and initialization code
How to stop Flask from initialising twice in Debug Mode?
Switching from SQLite3 to MySQL
Trying SQLite's built-in dump command
# superset01 192.168.1.10 Superset
$ cd /root/.superset
$ ll -sail

1285 43256 -rw-r--r-- 1 root root 44288000 Jan 22 14:06 superset.db

$ sqlite3 superset.db
sqlite> .databases
seq name file
--- --------------- ----------------------------------------------------------
0 main /root/.superset/superset.db

sqlite> .tables
ab_permission columns multiformat_time_series
ab_permission_view css_templates query
ab_permission_view_role dashboard_slices random_time_series
ab_register_user dashboard_user slice_user
ab_role dashboards slices
ab_user datasources sql_metrics
ab_user_role dbs table_columns
ab_view_menu energy_usage tables
access_request favstar url
alembic_version logs wb_health_population
birth_names long_lat
clusters metrics

# the raw dump is not suitable for MySQL
# sqlite> .output superset.sql
# sqlite> .dump

$ vim dump_for_mysql.py

# https://github.com/EricHigdon/sqlite3tomysql

$ sqlite3 superset.db .dump | python dump_for_mysql.py > superset.sql

$ ls -sail

1285 43256 -rw-r--r-- 1 root root 44288000 Jan 22 14:06 superset.db
18631 76968 -rw-r--r-- 1 root root 78812197 Jan 22 14:35 superset.sql

$ vim superset.sql

id INTEGER NOT NULL,
# replace with an auto-increment primary key
id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,

$ scp superset.sql root@192.168.1.12:/home/mysql
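The kind of rewriting such a conversion script performs can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the script from the repository linked above: it drops SQLite-only statements, renames AUTOINCREMENT, and swaps double-quoted identifiers for backticks, assuming no double quote appears inside the data itself. The function and table names are made up for the example.

```python
import sqlite3

SKIP_PREFIXES = ("PRAGMA", "BEGIN TRANSACTION", "COMMIT")


def sqlite_dump_to_mysql(statements):
    """Yield MySQL-friendlier statements from SQLite dump statements."""
    for statement in statements:
        text = statement.strip()
        if not text or text.upper().startswith(SKIP_PREFIXES):
            continue  # SQLite-specific control statements
        if "sqlite_sequence" in text:
            continue  # SQLite bookkeeping table, meaningless in MySQL
        text = text.replace("AUTOINCREMENT", "AUTO_INCREMENT")
        # Naive: assumes double quotes only ever wrap identifiers.
        yield text.replace('"', "`")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slices (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    conn.execute("INSERT INTO slices (name) VALUES ('topN chart')")
    conn.commit()
    return list(sqlite_dump_to_mysql(conn.iterdump()))
```

Calling `demo()` returns the converted statements for a tiny in-memory database; a real migration still needs type and index adjustments that this sketch leaves out.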
Implementing our own sqlite3tomysql.py
# druid02 192.168.1.12 MySQL
$ ps -ef | grep mysql | grep -v druid | grep -v grep

mysql 11435 8530 0 14:13 pts/4 00:00:00 /bin/sh /home/mysql/bin/mysqld_safe --defaults-file=/home/mysql/my.cnf
mysql 12192 11435 0 14:13 pts/4 00:00:00 /home/mysql/bin/mysqld --defaults-file=/home/mysql/my.cnf --basedir=/home/mysql --datadir=/home/mysql/data --plugin-dir=/home/mysql/lib/mysql/plugin --log-error=/home/mysql/data/druid02.err --open-files-limit=8192 --pid-file=/home/mysql/data/druid02.pid --socket=/home/mysql/data/mysql.sock --port=3306
mysql 12223 8530 0 14:13 pts/4 00:00:00 mysql -uroot -p -S /home/mysql/data/mysql.sock

$ su - mysql
$ mysql -uroot -p -S /home/mysql/data/mysql.sock
mysql> show databases;
mysql> create database superset;
mysql> show databases;
mysql> use superset;

# Run sqlite3tomysql.py
mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock --default-character-set=utf8 < superset.sql.schema.sql
mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock --default-character-set=utf8 < superset.sql.data.sql

# To work around foreign-key dependencies between tables, run "source superset.sql.schema.sql" from the mysql prompt and repeat the batch import several times
Metadata storage
# superset01 192.168.1.10 Superset
$ cd /root/superset
$ find ./ -name config.py
./lib/python2.7/site-packages/caravel/config.py
./lib/python2.7/site-packages/sqlalchemy/testing/config.py
./lib/python2.7/site-packages/pandas/core/config.py
./lib/python2.7/site-packages/superset/config.py
./lib/python2.7/site-packages/setuptools/config.py
./lib/python2.7/site-packages/numpy/distutils/command/config.py
./lib/python2.7/site-packages/gunicorn/config.py
./lib/python2.7/site-packages/panoramix/config.py
./lib/python2.7/site-packages/flask/config.py
./lib/python2.7/site-packages/alembic/testing/config.py
./lib/python2.7/site-packages/alembic/config.py

$ vim ./lib/python2.7/site-packages/superset/config.py
# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:root@192.168.1.12:3306/superset?charset=utf8'
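When the URI is assembled from separate settings, credentials containing characters such as "@" or "/" need percent-encoding, otherwise the URI is parsed incorrectly. The sketch below shows one way to build it with the standard library; `mysql_uri` is a hypothetical helper, and the sample values mirror the ones used above.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, port, database, charset="utf8"):
    """Build a mysql+pymysql URI, percent-encoding the credentials."""
    return "mysql+pymysql://{}:{}@{}:{}/{}?charset={}".format(
        quote_plus(user), quote_plus(password), host, port, database, charset
    )


plain = mysql_uri("root", "root", "192.168.1.12", 3306, "superset")
escaped = mysql_uri("root", "p@ss/word", "192.168.1.12", 3306, "superset")
```

With a simple password the result matches the hand-written URI; with special characters the password is encoded so the host part stays intact.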
Startup
# Run the usual superset initialization steps first
$ nohup superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &
Tips: For the code and the steps, see: Convert SQLite into MySQL

References
Migrating the Caravel metadata database
Importing and exporting SQLite data
Exporting SQLite table schemas and data
MySQL auto-increment primary keys
Importing and exporting .sql files with MySQL from the command line
Quick easy way to migrate SQLite3 to MySQL?
Rails plugin for a database-independent dump format, data.yml
Translating Perl to Python
How to migrate data from sqlite3 to MySQL gracefully
Run this with sqlite3 sample.db .dump | python dump_for_mysql.py > dump.sql
This script parses the SQL files exported from sqlite3 .dump and makes them compatible with MySQL import.
Batch-executing SQL script files with MySQL on Windows
Parameter tuning
# Increase the number of gunicorn workers as appropriate (default: 2)
$ cd /root/superset
$ source bin/activate
$ mkdir logs
$ nohup ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &
Logs
ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
Description
(superset) [root@superset01 superset-0.15.4]# ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.script is deprecated, use flask_script instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.sqlalchemy is deprecated, use flask_sqlalchemy instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.sqlalchemy._compat is deprecated, use flask_sqlalchemy._compat instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask_cache/__init__.py:152: UserWarning: Flask-Cache: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn("Flask-Cache: CACHE_TYPE is set to null, "
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
.format(x=modname), ExtDeprecationWarning
Solution
$ vim ./bin/superset

+import warnings
+from flask.exthook import ExtDeprecationWarning
+warnings.simplefilter('ignore', ExtDeprecationWarning)
+
from superset.cli import manager
　Submitted as PR#2138 Fix ExtDeprecationWarning
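The effect of the filter can be checked in isolation. The sketch below uses a stand-in warning class, because `flask.exthook` only exists in old Flask releases; the class and function names here are illustrative assumptions.

```python
import warnings


class ExtDeprecationWarning(DeprecationWarning):
    """Stand-in for flask.exthook.ExtDeprecationWarning (local copy for the demo)."""


def legacy_import():
    warnings.warn("Importing flask.ext.cache is deprecated", ExtDeprecationWarning)
    return "imported"


def count_warnings(suppress):
    """Return how many warnings legacy_import() emits with or without the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known state
        if suppress:
            # Filters added later take precedence over earlier ones.
            warnings.simplefilter("ignore", ExtDeprecationWarning)
        legacy_import()
    return len(caught)
```

`count_warnings(False)` records the warning, while `count_warnings(True)` records none; other warning categories are unaffected by the filter.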

References
How can I disable ExtDeprecationWarning for external libs in flask
UserWarning: Flask-Cache: CACHE_TYPE is set to null, caching is effectively disabled. #2137
Pitfalls encountered
When creating a user, the email must be unique
Recognized Database Authentications.
2016-12-14 18:12:36,007:ERROR:flask_appbuilder.security.sqla.manager:Error adding new user to database. (sqlite3.IntegrityError) column email is not unique [SQL: u'INSERT INTO ab_user (first_name, last_name, username, password, active, email, last_login, login_count, fail_login_count, created_on, changed_on, created_by_fk, changed_by_fk) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (u'superset', u'yuzhouwan', u'superset', 'pbkdf2:sha1:1000$e3imUMx0$83b38fb2a0f628d1379379bb353fc80697c435a1', 1, u'yuzhouwan@gmail.com', None, None, None, '2016-12-14 18:12:36.004721', '2016-12-14 18:12:36.004773', None, None)]
No user created an error occured
　Log in as the admin / admin user and make the change there
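The failure is an ordinary UNIQUE constraint violation, which can be reproduced with an in-memory SQLite database. The table below is a heavily simplified assumption of Flask-AppBuilder's `ab_user` table, and `add_user` is a hypothetical helper.

```python
import sqlite3


def add_user(conn, username, email):
    """Insert a user; return False if the username or email already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ab_user (username, email) VALUES (?, ?)",
                (username, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ab_user ("
    "id INTEGER PRIMARY KEY, username TEXT UNIQUE, email TEXT UNIQUE)"
)
first = add_user(conn, "admin", "yuzhouwan@gmail.com")
second = add_user(conn, "superset", "yuzhouwan@gmail.com")
```

The second insert is rejected because the email is already taken, even though the username differs.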

Missing dependency packages
Description
RuntimeError: Compression requires the (missing) zlib module
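Solution
$ yum install zlib
$ yum install zlib-devel

# Go to the Python 2.7 source directory, then rebuild and reinstall; the symlinks do not need to be recreated
$ cd /root/software/Python-2.7.12
$ make
$ make install

# Go to the setuptools directory and reinstall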
$ cd /root/software/setuptools-32.0.0
$ python setup.py install
Python cannot load modules (Red Hat problem)
pip: command not found
# Invoke pip as a module
$ python -m pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

# Add a command alias
$ vim ~/.bashrc

# If the alias has not taken effect yet, run this directly
alias pip='python -m pip'

$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)
virtualenv: command not found
$ vim ~/.bashrc
alias virtualenv='python -m virtualenv'

$ virtualenv --version
15.1.0
Installing superset requires downloading dependency libraries
sasl/sasl.h: No such file or directory
Description
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
In file included from sasl/saslwrapper.cpp:254:
sasl/saslwrapper.h:22:23: error: sasl/sasl.h: No such file or directory
Solution
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

# Install g++
# g++ is the C++ compiler; once it is installed, gcc can find the toolchain it needs for C++ sources and the build succeeds
# wget ftp://rpmfind.net/linux/centos/6.8/os/x86_64/Packages/gcc-c++-4.4.7-17.el6.x86_64.rpm (the version must match gcc 4.4.7-4 exactly)
# http://rpm.pbone.net/index.php3/stat/4/idpl/25438297/dir/scientific_linux_6/com/gcc-c++-4.4.7-4.el6.x86_64.rpm.html
# http://rpm.pbone.net/index.php3/stat/4/idpl/25440518/dir/scientific_linux_6/com/libstdc++-devel-4.4.7-4.el6.x86_64.rpm.html
$ rpm -ivh libstdc++-devel-4.4.7-4.el6.x86_64.rpm
$ rpm -ivh gcc-c++-4.4.7-4.el6.x86_64.rpm

$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
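Command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
Description
　cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++

Solution
# The cmake version is too old (in this case it was not installed at all)
# https://cmake.org/ (stable: 3.6.3, latest: 3.7.1, date: 2016/12/16)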
# https://cmake.org/cmake/help/v3.6/
$ wget --no-check-certificate https://cmake.org/files/v3.6/cmake-3.6.3.tar.gz # To connect to cmake.org insecurely
$ tar zxvf cmake-3.6.3.tar.gz
$ cd cmake-3.6.3
$ ./bootstrap
$ make
$ gmake install

$ cmake --version
cmake version 3.6.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).

# reboot (should)

$ cd ~
$ mkdir virtualenv
$ cd virtualenv
$ virtualenv env1
$ virtualenv --python=/usr/bin/python env1

# new problem
# IOError: [Errno 40] Too many levels of symbolic links: '/root/virtualenv/env1/bin/python'
# Do not simply rm -rf env1; use rmvirtualenv instead
$ rmvirtualenv env1
$ cd env1
$ source bin/activate # leave the environment with deactivate
(env1) [root@edeppreapp01 env1]# python -V
Python 2.7.12
References
Installing OpenCV on 64-bit CentOS 6.5
mkvirtualenv: Too many levels of symbolic links
Could not find a version that satisfies the requirement pytz>dev
Description
# Installing the dependencies one at a time is tedious
Could not find a version that satisfies the requirement pytz>dev (from celery==3.1.23) (from versions: )
Could not find a version that satisfies the requirement billiard<3.4,>=3.3.0.23 (from celery==3.1.23) (from versions: )
No matching distribution found for amqp<2.0,>=1.4.9 (from kombu==3.0.35)
No matching distribution found for anyjson>=0.3.3 (from kombu==3.0.35)
No matching distribution found for kombu<3.1,>=3.0.34 (from celery==3.1.23)
No matching distribution found for celery==3.1.23 (from superset)
Could not find suitable distribution for Requirement.parse('werkzeug==0.11.10')
pip install thrift-0.9.3.tar.gz
No matching distribution found for six (from sasl==0.2.1)
No matching distribution found for sasl>=0.2.1 (from thrift-sasl==0.2.1)
No local packages or working download links found for thrift-sasl>=0.2.1
Solution
$ pip list
$ pip freeze > requirements.txt
$ mkdir packages
$ pip install --download packages -r requirements.txt

$ cd packages
$ scp celery-3.1.23-py2.py3-none-any.whl root@druid01:/root/software/packages

# With --find-links, pip looks up superset's dependencies in the given directory and installs them in turn
$ python -m pip install --no-index --find-links=packages superset # -r requirements.txt
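Before copying the directory to the offline machine, it helps to confirm that every pinned requirement has an archive in it. The sketch below does a rough filename match with the standard library only; `missing_from_wheelhouse` is a hypothetical helper, and the matching is a heuristic (it normalizes "-", "_" and "." the same way in names and versions), not pip's own resolution logic.

```python
import re


def _normalize(text):
    """Collapse runs of '-', '_' and '.' so wheel and sdist names compare equal."""
    return re.sub(r"[-_.]+", "-", text).lower()


def missing_from_wheelhouse(requirements, filenames):
    """Return pinned requirements (name==version) with no matching archive."""
    available = [_normalize(name) for name in filenames]
    missing = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # only exact pins can be checked this way
        name, version = line.split("==", 1)
        prefix = _normalize(name + "-" + version) + "-"
        if not any(f.startswith(prefix) for f in available):
            missing.append(line)
    return missing


requirements = ["celery==3.1.23", "kombu==3.0.35", "thrift-sasl==0.2.1"]
files = ["celery-3.1.23-py2.py3-none-any.whl", "thrift_sasl-0.2.1.tar.gz"]
missing = missing_from_wheelhouse(requirements, files)
```

In practice `requirements` would come from requirements.txt and `files` from listing the packages directory.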
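References
Installing offline Python packages with pip on a machine without network access
How to build an offline pip installation environment
ImportError: No module named _ssl
Solution
# Install ssl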
$ yum install yum-downloadonly -y

$ yum -y install ncurses ncurses-devel gcc-c++ libxml2-devel gd gd-devel libpng libpng-devel libjpeg libjpeg-devel libmcrypt libmcrypt-devel openldap-devel openldap-servers openldap-clients autoconf freetype-devel libtool-ltdl-devel openssl openssl-devel gcc automake autoconf libtool make --downloadonly --downloaddir=.

$ yum -y install GeoIP gmp libevent libmcrypt libtidy libXpm libxslt mhash mysql mysql-server nfs-utils nginx perl-DBD-MySQL perl-DBI php php-common php-fpm php-gd php-mbstring php-mcrypt php-mhash php-mysql php-pdo php-xml t1lib --downloadonly --downloaddir=.

$ rpm -Uvh --force --nodeps *.rpm

# Rebuild Python
$ cd /root/software/Python-2.7.12
$ vim Modules/Setup.dist

# Uncomment the following lines
SSL=/usr/local/ssl
_ssl _ssl.c \
-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
-L$(SSL)/lib -lssl -lcrypto

$ ./configure --enable-shared CFLAGS=-fPIC # --enable-shared builds the dynamic library libpython2.7.so.1.0
$ make && make install

# This did not work
$ python --version
Python 2.7.12

$ python
Python 2.7.12 (default, Dec 19 2016, 10:58:27)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/ssl.py", line 97, in <module>
    import _ssl # if we can't import it, let the error propagate
ImportError: No module named _ssl
>>> quit()

# Install the missing openssl-devel
$ rpm -aq | grep openssl
openssl-1.0.1e-42.el6_7.4.x86_64

$ yum install openssl-devel -y

$ rpm -aq | grep openssl
openssl-1.0.1e-42.el6_7.4.x86_64
openssl-devel-1.0.1e-42.el6_7.4.x86_64

# Edit the Setup file
$ vim /root/software/Python-2.7.12/Modules/Setup
# Socket module helper for socket(2)
_socket socketmodule.c timemodule.c

# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
#SSL=/usr/local/ssl
_ssl _ssl.c \
-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
-L$(SSL)/lib -lssl -lcrypto

# Rebuild
$ cd /root/software/Python-2.7.12
$ make && make install

$ python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>>

$ cd /root/virtualenv/superset/bin
[root@olap03-sit bin]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl

$ /root/virtualenv/superset/bin/python
Python 2.7.12 (default, Dec 16 2016, 16:23:17)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/ssl.py", line 97, in <module>
    import _ssl # if we can't import it, let the error propagate
ImportError: No module named _ssl

$ mv /root/virtualenv/superset/bin/python /root/virtualenv/superset/bin/python_old
$ ln -s /usr/local/bin/python /root/virtualenv/superset/bin/

$ ./python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()
[root@olap03-sit bin]# pwd
/root/virtualenv/superset/bin
[root@olap03-sit bin]# /root/virtualenv/superset/bin/python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()
[root@olap03-sit bin]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>>

$ source bin/activate
(superset) [root@olap03-sit superset]# which python
/root/virtualenv/superset/bin/python
(superset) [root@olap03-sit superset]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()

# ImportError: No module named gunicorn.app.base
import gunicorn.app.base
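Since the system Python and the virtualenv Python behaved differently above, it is worth running the same check with each interpreter. The sketch below is written for Python 3 (the session above uses 2.7, where `importlib.util.find_spec` is not available); `has_module` and `report` are hypothetical helper names.

```python
import importlib.util


def has_module(name):
    """True if the interpreter running this code can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


def report(names=("ssl", "_ssl", "zlib", "sqlite3")):
    """Map each module name to whether this interpreter can find it."""
    return {name: has_module(name) for name in names}
```

Running `report()` once per interpreter shows quickly which build is missing a compiled extension module.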
References
How to import _ssl in python 2.7.6?
ImportError: No module named _ssl
Yum offline installation (part 2): building CentOS offline packages
Order for installing openssl-devel offline
Python 2.7 ImportError: no module named _ssl (not ok)
Steps to fix the missing ssl module reported after installing Python
How to Solve: Virtualenv importError No Module Named XXX
python: error while loading shared libraries: libpython2.7.so.1.0
Description
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC # --enable-shared builds the dynamic library libpython2.7.so.1.0
$ make && make install
$ python -V
python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Solution
$ yum reinstall python-libs # did not work

$ ll /usr/local/python27/lib/libpython2.7.so.1.0 # did not work
$ vim /etc/ld.so.conf

include ld.so.conf.d/*.conf
include /usr/local/Python2.7/lib

$ /sbin/ldconfig -v | grep /
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12
References
Fixing error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No su
Python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file (not work)
ImportError: No module named pysqlite2
Solution
$ vim /root/superset/lib/python2.7/site-packages/sqlalchemy/dialects/sqlite/pysqlite.py

# Switch to sqlite3
@classmethod
def dbapi(cls):
    try:
        # change to: from sqlite3 import dbapi2 as sqlite
        from pysqlite2 import dbapi2 as sqlite
    except ImportError as e:
        try:
            from sqlite3 import dbapi2 as sqlite # try 2.5+ stdlib name.
        except ImportError:
            raise e
    return sqlite

# On Red Hat 5.3, sqlite3 has to be installed from source before Python is built; only then is the _sqlite3.so file produced
$ wget https://sqlite.org/snapshot/sqlite-snapshot-201612131847.tar.gz
$ sqlite3 --version
3.6.20
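The fallback-import pattern in that method can be tried on its own to confirm which driver an interpreter ends up with. This is a minimal sketch in the spirit of the SQLAlchemy code shown above, not a copy of it; `load_sqlite_dbapi` is a hypothetical name.

```python
def load_sqlite_dbapi():
    """Prefer the optional pysqlite2 build, fall back to the stdlib sqlite3."""
    try:
        from pysqlite2 import dbapi2 as sqlite  # optional third-party build
    except ImportError:
        from sqlite3 import dbapi2 as sqlite  # standard-library module
    return sqlite


sqlite = load_sqlite_dbapi()
conn = sqlite.connect(":memory:")
value = conn.execute("SELECT 1").fetchone()[0]
```

If both imports fail, the second ImportError propagates, which is the symptom of a Python build compiled without the SQLite headers.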
References
ImportError: No module named pysqlite2
Error when using Python: ImportError: No module named _sqlite3
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Solution
Method 1
# Remove all aliases and undo the python path changes made in the superset sources
$ which pip
$ alias pip='python -m pip'
$ /root/superset/bin/python

$ vim ~/.bashrc
# alias pip='python -m pip'
# alias virtualenv='python -m virtualenv'

# Source global definitions
# export WORKON_HOME=~/virtualenv
# source /usr/local/bin/virtualenvwrapper.sh

$ source ~/.bashrc
$ deactivate
$ yum install python-pip

$ unalias pip
$ which pip
/usr/bin/pip

$ superset runserver -a 0.0.0.0 -p 9999

$ cd /usr/local/lib/python2.7/site-packages

$ python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> ssl
<module 'ssl' from '/usr/local/lib/python2.7/ssl.pyc'>
>>> quit()

$ vim mypkpath.pth
/usr/local/lib/python2.7

$ vim ~/.bashrc
alias python=/usr/local/bin/python
alias pip=/usr/bin/pip

$ source ~/.bashrc # did not work (every superset Python script starts with #!/root/superset/bin/python)
$ vim /root/superset/bin/superset
#!/usr/local/bin/python
Method 2
# Use --prefix so that Python's libraries are installed under /usr/lib
$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12
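References
How to globally modify the default PYTHONPATH (sys.path)?
Adding a default module search path for Python
Multi-process fork
virtualenv using incorrect sys.path
Error while processing cluster 'druid cluster' (sqlite3.OperationalError) database is locked
Description
　[Web UI] Sources - Druid Clusters configuration - Refresh Druid Metadata

Cause
　The web UI cannot hold a long-lived connection, so the request times out

Solution
　superset refresh_druid

Tips: As of v0.22.1, the latest release at the time of writing, this problem has been fixed; clicking the "Sources - Refresh Druid Metadata" button on the page completes the operation (2017-12-12)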

参考
Superset installation
An unknown error occurred. (Status: 0) Maybe the request timed out?
Description
　Some charts are not displayed properly

Solution
# Enable debug mode and check the detailed logs to locate the problem
$ vim ./lib/python2.7/site-packages/superset/config.py

# DEBUG = False
DEBUG = True
ImportError: No module named pymysql
Solution
　pip install pymysql

Host 'druid01' is not allowed to connect to this MySQL server
Description
　nohup superset runserver -a 0.0.0.0 -p 8888 2>&1 &

2017-01-22 16:36:53,013:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (pymysql.err.InternalError) (1130, u"Host 'druid01' is not allowed to connect to this MySQL server")
Solution
GRANT ALL PRIVILEGES ON *.* TO 'root'@'druid01' IDENTIFIED BY 'root' WITH GRANT OPTION;
References
How to fix "is not allowed to connect to this MySQL server" when connecting to MySQL remotely
Permission for Druid
Solution
　After adding a new data source, run superset init to update the permission-related tables

References
A little question about permission #2206
Update Druid Cluster’s Name
Solution
alter table datasources drop FOREIGN KEY `datasources_ibfk_2`;
update clusters set cluster_name='Druid Cluster' where cluster_name='druid cluster';
update datasources set cluster_name ='Druid Cluster' where cluster_name ='druid cluster';
alter table datasources add constraint `datasources_ibfk_2` FOREIGN KEY (`cluster_name`) REFERENCES `clusters` (`cluster_name`);
# show create table datasources; # troubleshooting
References
IndexError: list index out of range #2245
An unexpected error occurred: “https://registry.yarnpkg.com/convert-source-map: ETIMEDOUT”
Description
$ yarn
yarn install v1.3.2
info No lockfile found.
[1/4] Resolving packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/@vx%2fbounds: ETIMEDOUT".
info If you think this is a bug, please open a bug report with the information provided in "/home/superset/software/incubator-superset-0.22.1/superset/assets/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
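Solution
# Owing to forces beyond our control, the original IP address has to be overridden first
$ vim /etc/hosts
104.16.59.173 registry.yarnpkg.com

# Limit network concurrency to reduce the chance of a TIMEOUT
$ yarn --network-concurrency 1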
References
ETIMEOUT 104.16.59.173:443 #944
Community follow-up
Issues
Pull Request

　See: "Open Source Community"

Resources
Doc
Superset's documentation
Gunicorn's documentation
Gunicorn
Flask Doc
Jinja Doc
Werkzeug Doc
Django Doc
Welcome to pydruid's documentation!
Python Data Analysis Library
The SciPy Stack specification
scikit-learn
Pandas Doc
Superset: A data exploration platform designed to be visual, intuitive and interactive
Help Doc
fabmanager --help

Usage: fabmanager [OPTIONS] COMMAND [ARGS]...

This is a set of commands to ease the creation and maintenance of your
flask-appbuilder applications.

All commands that import your app will assume by default that your running
on your projects directory just before the app directory. will assume also
that on the __init__.py your initializing AppBuilder like this (using a
var named appbuilder) just like the skeleton app::

appbuilder = AppBuilder(......)

If your using different namings use app and appbuilder parameters.

Options:
--help Show this message and exit.

Commands:
babel-compile Babel, Compiles all translations
babel-extract Babel, Extracts and updates all messages...
collect-static Copies flask-appbuilder static files to your...
create-addon Create a Skeleton AddOn (needs internet...
create-admin Creates an admin user
create-app Create a Skeleton application (needs internet...
create-db Create all your database objects (SQLAlchemy...
list-users List all users on the database
list-views List all registered views
reset-password Resets a user's password'
run Runs Flask dev web server.
security-cleanup Cleanup unused permissions from views and...
version Flask-AppBuilder package version
Blog
Flask
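Building a blog with Flask (part 1)
A summary of VirtualEnv and VirtualEnvWrapper
Gunicorn
Gunicorn quick start
Gunicorn: an open-source Python WSGI HTTP server
Deploying a Django project with Gunicorn, Nginx, and Supervisor
Book
Explore Flask (Chinese edition: 《Flask 之旅》)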
Source
pydruid (A Python connector for Druid)
rpmfind.net
rpm.pbone.net (better, download with mirror.switch.ch)