Redash's built-in data sources do not include Spark Thrift Server, even though Spark Thrift Server was introduced as a drop-in replacement for HiveServer2 for executing SQL directly.
This post modifies Redash's Hive-related logic so that it can talk to Spark Thrift Server.
1. Test environment:
CentOS 7
Spark 2.4.3
Hadoop 2.10
2. Install Redash
Create a new file: docker-compose.yaml
version: '2'
services:
  server:
    image: redash/redash:8.0.2.b37747
    command: server
    depends_on:
      - postgres
      - redis
    ports:
      - "5000:5000"
    environment:
      PYTHONUNBUFFERED: 0
      REDASH_LOG_LEVEL: "INFO"
      REDASH_REDIS_URL: "redis://redis:6379/0"
      REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
      REDASH_COOKIE_SECRET: veryverysecret
      REDASH_WEB_WORKERS: 4
  worker:
    image: redash/redash:8.0.2.b37747
    command: scheduler
    environment:
      PYTHONUNBUFFERED: 0
      REDASH_LOG_LEVEL: "INFO"
      REDASH_REDIS_URL: "redis://redis:6379/0"
      REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
      QUEUES: "queries,scheduled_queries,celery"
      REDASH_COOKIE_SECRET: veryverysecret
      WORKERS_COUNT: 2
  redis:
    image: redis:3.0-alpine
  postgres:
    image: postgres:9.5.6-alpine
    ports:
      - "5432:5432"
    volumes:
      - /data/test_data/pg_data/postgres-data:/var/lib/postgresql/data
  nginx:
    image: redash/nginx:latest
    ports:
      - "80:80"
    depends_on:
      - server
    links:
      - server:redash
Initialize the Postgres database (the postgres volumes entry /data/test_data/pg_data/postgres-data:/var/lib/postgresql/data can be changed to suit your setup):
docker-compose run --rm server create_db
Start the services:
docker-compose up -d
Browse to http://ip:5000 to verify the deployment.
3. Modify the code inside the Redash Docker container
Find the container ID: docker ps
Enter the container as root: docker exec -it -u root 63b6ab6c200a /bin/bash
Edit site-packages/pyhive/sqlalchemy_hive.py (you will need to install vim yourself first):
def get_table_names(self, connection, schema=None, **kw):
    query = 'SHOW TABLES'
    if schema:
        query += ' IN ' + self.identifier_preparer.quote_identifier(schema)
    # Spark returns (database, tableName, isTemporary) per row, so take the
    # second column instead of pyhive's original row[0].
    return [row[1] for row in connection.execute(query)]
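The reason for switching to row[1] is the different result shape of SHOW TABLES: HiveServer2 returns a single tab_name column, while Spark returns (database, tableName, isTemporary). A small sketch with illustrative sample rows (not actual server output):

```python
# Illustrative result shapes for SHOW TABLES; table names are made up.
hive_rows = [("orders",), ("users",)]                 # HiveServer2: one column
spark_rows = [("default", "orders", False),
              ("default", "users", False)]            # Spark: three columns

assert [r[0] for r in hive_rows] == ["orders", "users"]   # upstream pyhive: row[0]
assert [r[1] for r in spark_rows] == ["orders", "users"]  # patched version: row[1]
```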
Edit ../redash/redash/query_runner/hive_ds.py:
def _get_tables(self, schema):
    schemas_query = "show schemas"
    tables_query = "show tables in %s"
    columns_query = "show columns in %s.%s"
    schema_name = self.configuration.get("database", "default")
    for table_name in [
        a
        for a in [
            str(a["tableName"])
            for a in self._run_query_internal(tables_query % schema_name)
        ]
        if len(a) > 0
    ]:
        columns = [
            a
            for a in [
                str(a["col_name"])
                for a in self._run_query_internal(
                    columns_query % (schema_name, table_name)
                )
            ]
            if len(a) > 0
        ]
        if schema_name != "default":
            table_name = "{}.{}".format(schema_name, table_name)
        schema[table_name] = {"name": table_name, "columns": columns}
    return list(schema.values())
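The nested comprehensions above are hard to read, so here is a standalone sketch of the same logic with the query runner mocked out. The build_schema function and its inputs are hypothetical stand-ins for self._run_query_internal; the result keys ("tableName", "col_name") match Spark's column names used in the patch:

```python
def build_schema(schema_name, tables_result, columns_for):
    """Mimic the patched _get_tables: map each non-empty table name to its
    non-empty column names, prefixing the database unless it is 'default'."""
    schema = {}
    for row in tables_result:
        table_name = str(row["tableName"])
        if len(table_name) == 0:
            continue
        columns = [
            str(c["col_name"])
            for c in columns_for(table_name)
            if len(str(c["col_name"])) > 0
        ]
        if schema_name != "default":
            table_name = "{}.{}".format(schema_name, table_name)
        schema[table_name] = {"name": table_name, "columns": columns}
    return list(schema.values())

# Mocked query results standing in for _run_query_internal.
tables = [{"tableName": "orders"}]
cols = lambda table: [{"col_name": "id"}, {"col_name": "amount"}]

print(build_schema("sales", tables, cols))
# -> [{'name': 'sales.orders', 'columns': ['id', 'amount']}]
```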
4. Commit the modified container as a new image: docker commit -m "add vim & modify python support show tables" -a "user1" 63b6ab6c200a redash:debug1
5. Update docker-compose.yml to use the new image (redash:debug1), then restart Redash:
docker-compose down
docker-compose run --rm server create_db
docker-compose up -d
6. Start spark-thriftserver
sbin/start-thriftserver.sh \
--master local[*] \
--driver-memory=40G \
--executor-memory=20G \
--hiveconf hive.server2.authentication=NONE \
--hiveconf hive.server2.thrift.port=10001 \
--conf spark.hadoop.fs.s3a.endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--conf spark.local.dir=/data/spark-local \
--packages org.apache.spark:spark-hive_2.11:2.4.3,org.apache.spark:spark-hive-thriftserver_2.11:2.4.3,org.apache.hive:hive-shims:2.3.4,mysql:mysql-connector-java:5.1.48
7. Test
Create a new Hive data source in Redash and point it at the Spark Thrift Server host and port (10001, as configured above).
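Before testing through the Redash UI, the Thrift server can be smoke-tested directly with pyhive. This is a sketch under assumptions: the server from step 6 is reachable on localhost:10001 with authentication NONE, and pyhive is installed (pip install 'pyhive[hive]'); the username "test" is arbitrary:

```python
def list_tables(cursor):
    """Run SHOW TABLES through a HiveServer2-protocol cursor. Spark returns
    (database, tableName, isTemporary) rows, so take the second column."""
    cursor.execute("SHOW TABLES")
    return [row[1] for row in cursor.fetchall()]

if __name__ == "__main__":
    # Assumed host/port match the --hiveconf flags passed to start-thriftserver.sh.
    from pyhive import hive
    conn = hive.connect(host="localhost", port=10001, username="test", auth="NONE")
    print(list_tables(conn.cursor()))
```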