The big data field is dominated by Java, with Python and Scala as secondary languages. This article covers the Python side of big data:
The required Python version is Python 3.6.
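A minimal sketch of a start-up guard for the version requirement above. It assumes that "Python 3.6" means "3.6 or newer", which the text does not state explicitly; the names `MIN_VERSION` and `python_is_supported` are made up for illustration.

```python
import sys

# Assumption: the requirement is read as "3.6 or newer", not "exactly 3.6".
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Stop early with a readable message instead of failing later on an import.
    if not python_is_supported():
        raise SystemExit("Python %d.%d or newer is required" % MIN_VERSION)
```

Placing a check like this at the top of an entry script fails fast on an older interpreter.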
Data sources:
MySQL:
Oracle:
MS SQL Server:
PostgreSQL: pip install psycopg2
MongoDB:
Neo4J:
Redis:
Big data processing:
Hadoop: (HDFS, MapReduce, YARN)
pip install dask
pip install mrjob
pip install pydoop (the default version, pydoop 1.2, is unstable)
# pip install --pre pydoop
Hive:
pyhive impyla
HBase:
happybase
Presto:
pip install presto
pip install presto-python-client
ClickHouse:
Elasticsearch:
elasticsearch-py
pip install elasticsearch
pip install pysolr (pysolr is a client for Apache Solr, not Elasticsearch)
pip install elasticsearch-dsl
Kafka:
pip install kafka-python
kafka pykafka
Spark:
Flink:
Kylin:
kylinpy
Kudu:
kudu-python
Impala:
impyla
apache-beam
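Since the pip package name often differs from the module name that is imported, a small check of which of the packages listed so far are present in the current environment can be handy. The following is only a sketch: the helper name `installed_clients` is hypothetical, and the import names in the table are the ones these projects are commonly imported under, so they are worth verifying against the version actually installed.

```python
import importlib.util

# pip package name -> top-level module name (assumed; verify for your versions).
IMPORT_NAMES = {
    "psycopg2": "psycopg2",
    "dask": "dask",
    "mrjob": "mrjob",
    "pydoop": "pydoop",
    "pyhive": "pyhive",
    "impyla": "impala",
    "happybase": "happybase",
    "presto-python-client": "prestodb",
    "elasticsearch": "elasticsearch",
    "elasticsearch-dsl": "elasticsearch_dsl",
    "pysolr": "pysolr",
    "kafka-python": "kafka",
    "pykafka": "pykafka",
    "kylinpy": "kylinpy",
    "kudu-python": "kudu",
    "apache-beam": "apache_beam",
}


def installed_clients(import_names=IMPORT_NAMES):
    """Map each pip name to True if its module can be found, without importing it."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in import_names.items()
    }


if __name__ == "__main__":
    for pip_name, present in sorted(installed_clients().items()):
        print("%-22s %s" % (pip_name, "installed" if present else "missing"))
```

`find_spec` only locates the module, so the check is cheap and does not trigger any client library's import-time side effects.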
Big data visualization:
pyecharts
hue
superset
Scheduling systems:
luigi
airflow
Security:
Druid.io: official client
pip install pydruid
Website:
https://github.com/druid-io/pydruid
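Note:
Installing all of the packages above into the same environment leads to conflicts among their dependencies. Deploying them independently is recommended.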
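One way to follow the note is to give each component its own virtual environment. The sketch below only builds the commands and prints them rather than running them; the grouping in `COMPONENT_PACKAGES`, the `envs` directory, and the function name `build_commands` are assumptions chosen for illustration, not part of the original notes.

```python
import sys
from pathlib import Path

# Hypothetical grouping: one virtual environment per component.
COMPONENT_PACKAGES = {
    "hive": ["pyhive", "impyla"],
    "kafka": ["kafka-python"],
    "elasticsearch": ["elasticsearch", "elasticsearch-dsl"],
    "druid": ["pydruid"],
}


def build_commands(base_dir, components=COMPONENT_PACKAGES, python=sys.executable):
    """Return the commands that create one venv per component and install into it."""
    commands = []
    for name, packages in components.items():
        env_dir = Path(base_dir) / name
        # POSIX layout; on Windows the interpreter lives under Scripts\python.exe.
        env_python = env_dir / "bin" / "python"
        commands.append([python, "-m", "venv", str(env_dir)])
        commands.append([str(env_python), "-m", "pip", "install", *packages])
    return commands


if __name__ == "__main__":
    # Print the plan; pass each list to subprocess.run(..., check=True) to execute it.
    for command in build_commands("envs"):
        print(" ".join(command))
```

Keeping the plan separate from its execution makes it easy to review which packages end up together before anything is installed.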