Common Commands for Offline Data Warehouse Tools
Preface: While learning offline data warehousing for big data, I have worked with many data-processing applications and tools: Hadoop, Hive, Sqoop, Azkaban, Flume, DataX, Superset, Zookeeper, Kafka, Redis, and so on. Projects require bringing up nodes, starting services, and shutting them all down again afterwards, so I have collected my most-used commands here for quick reference, and I will keep updating this page as I pick up more tools.
Linux
Shut down the VM
shutdown -h now
Sync the system time
ntpdate -u ntp.api.bz
Copy a file between VMs
scp file remote_user@remote_host:target_path
Reload environment variables
source /etc/profile
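A reminder of why `source` is used here rather than executing the file directly: sourcing runs the script inside the current shell, so its exported variables persist afterwards. A minimal sketch with a throwaway file (the path and variable name are made up for illustration):

```shell
# Write a throwaway profile snippet, source it, and confirm the exported
# variable is visible in the current shell afterwards.
echo 'export DEMO_HOME=/opt/demo' > /tmp/demo_profile.sh
source /tmp/demo_profile.sh
echo "$DEMO_HOME"
# prints /opt/demo
```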
Hadoop
Start the cluster
start-all.sh
Stop the cluster
stop-all.sh
Hive
Start Hive
hive
Start the Hive metastore
hive --service metastore &
Start the server for remote (JDBC) connections
hiveserver2
HQL
List functions
show functions;
Show how to use a function
desc function xxx;
Enable local mode
set hive.exec.mode.local.auto=true;
Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Sqoop
Import data into HDFS
sqoop import --connect jdbc:mysql://192.168.109.101:3306/test_db \
--username root --password 123456 \
--table tname \
--target-dir hdfs://192.168.109.101:9820/sqoopdata/tname \
--delete-target-dir
Import query results into a Hive table
sqoop import \
--connect jdbc:mysql://192.168.109.101:3306/dbname \
--username root --password 123456 \
--hive-import --hive-database hdbname \
--hive-table tname \
--query "select * from t1 where \$CONDITIONS" \
--target-dir <HDFS path> \
--delete-target-dir -m 1
Azkaban
Start Azkaban
...exec-server]# ./bin/start-exec.sh # on all three nodes; start the executors first
...web-server]# ./bin/start-web.sh
Flume
Start a Flume agent
...flume]# flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/confname.conf -Dflume.root.logger=INFO,console
DataX
Run a job script
...datax]# ./bin/datax.py job/first.json
Superset
Start the environment
~]# source activate
(base)...~]# conda activate superset
(superset)...~]# gunicorn -w 1 -t 120 -b 192.168.109.101:8787 "superset.app:create_app()"
Stop Superset
Kill the gunicorn processes
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
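A note on that awk filter: `/gunicorn/` selects the matching lines, `!/awk/` keeps the pipeline from killing the awk process itself (whose ps entry also contains "gunicorn"), and `{print $2}` emits the PID column. A quick demo on fabricated `ps -ef`-style lines (the PIDs and commands are made up):

```shell
# Pipe fabricated `ps -ef`-style lines through the same filter used above;
# field $2 is the PID column, so only the gunicorn lines' PIDs come out.
printf '%s\n' \
  'root 101 1 0 10:00 ? 00:00:01 gunicorn: master [superset]' \
  'root 102 1 0 10:00 ? 00:00:01 gunicorn: worker [superset]' \
  'root 103 1 0 10:01 ? 00:00:00 awk /gunicorn/' \
  | awk '/gunicorn/ && !/awk/{print $2}'
# prints 101 and 102, one per line
```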
Exit the Superset environment
conda deactivate
Data source URI
mysql://root:12345@192.168.109.101/dbname?charset=utf8
Zookeeper
Start Zookeeper (all three nodes)
zkServer.sh start # start
zkServer.sh status # check status
Kafka
Start the Kafka service: make sure the Zookeeper cluster is already running
Start Kafka (all three nodes)
...kafka]# /usr/local/kafka-2.4.1/bin/kafka-server-start.sh -daemon config/server.properties
Verify the Kafka service
...kafka]# zkCli.sh
Topic operations
Create a topic (name, linked zk address, partition count, replication factor)
...kafka]# kafka-topics.sh --create \
--topic hadoop \
--zookeeper 192.168.109.101:2181,192.168.109.102:2181,192.168.109.103:2181/kafka \
--partitions 3 \
--replication-factor 3
List topics
...kafka]# kafka-topics.sh --list \
--zookeeper 192.168.109.101:2181,192.168.109.102:2181,192.168.109.103:2181/kafka
Describe a topic
...kafka]# kafka-topics.sh --describe \
--topic hadoop \
--zookeeper 192.168.109.101:2181,192.168.109.102:2181,192.168.109.103:2181/kafka
Alter a topic (increase its partition count)
...kafka]# kafka-topics.sh --alter \
--topic hadoop \
--zookeeper 192.168.109.101:2181,192.168.109.102:2181,192.168.109.103:2181/kafka \
--partitions 4
Delete a topic
...kafka]# kafka-topics.sh --delete \
--topic hadoop \
--zookeeper 192.168.109.101:2181,192.168.109.102:2181,192.168.109.103:2181/kafka
Produce data
...data]# kafka-console-producer.sh \
--topic hadoop \
--broker-list 192.168.109.101:9092,192.168.109.102:9092,192.168.109.103:9092
Consume data
...~]# kafka-console-consumer.sh \
--topic hadoop \
--bootstrap-server 192.168.109.101:9092,192.168.109.102:9092,192.168.109.103:9092 \
--from-beginning
Send data
Start the Flume agent first, then:
...~]# telnet 192.168.109.101 port ## port: the port configured in confname.conf
Inspect the Zookeeper tree
[zk: localhost:2181(CONNECTED) 1] ls /kafka/brokers/topics
Kafka Eagle
Kafka Eagle system commands
ke.sh start # start Kafka Eagle
ke.sh stop # stop Kafka Eagle
ke.sh restart # restart Kafka Eagle
ke.sh status # check Kafka Eagle status
ke.sh stats # resource statistics
ke.sh find [ClassName] # check whether a class exists in Kafka Eagle
Redis
Start in the foreground
...redis]# src/redis-server redis.conf
Start in the background
Set daemonize yes in redis.conf first, then:
...redis]# src/redis-server redis.conf
Check whether it is running
...redis]# ps -ef | grep redis
Stop Redis
kill -9 pid
Open the client
...redis]# src/redis-cli -h 192.168.109.101 -p 6379 -a 123
Open a specific database
...redis]# src/redis-cli -h 192.168.109.101 -p 6379 -a 123 -n 1
Switch databases
192.168.109.101:6379[1]> select 0
Redis集群
Start the instances
...cluster]# /usr/local/redis-3.0.6/src/redis-server 7001/redis.conf
Connect to the cluster
redis-cli -c -h 192.168.109.101 -p 7001
OpenResty
Start in the background
sudo openresty -p /opt/app/collect-app/
Produce data
/opt/soft/frp/frpc http --sd chlinrei -l 8802 -s frp.qfbigdata.com:7001 -u chlinrei
Tail the access log
tail -f /opt/app/collect-app/logs/collect-app.access.log
Supervisor
Start supervisord
systemctl start supervisord
Stop supervisord
systemctl stop supervisord
Check its status
systemctl status supervisord
Start a managed program
supervisorctl start xxx
Presto
Start the Hive metastore first
hive --service metastore &
Start/stop presto-server
/opt/soft/presto/presto-server/bin/launcher start # or: launcher stop
Connect Presto to the Hive metastore
presto --server 192.168.109.101:9080 --catalog hive --schema dbname
View the Presto startup log
...presto-server]# vi /data/presto/data/var/log/server.log
List databases
show schemas;
Quit the result pager
press q
Next page
press n or z
Next line
press Enter
ClickHouse
Standalone
Start the server in the background
clickhouse-server --config-file=/etc/clickhouse-server/config.xml &
or
...clickhouse-server]# clickhouse-server -C ./config.xml &
Check whether the server is running
~]# ps -ef | grep clickhouse
Connect with the client
...clickhouse-server]# clickhouse-client \
--host=localhost \
--port=9000 \
--user=default \
--password=123456
Cluster
Note: cluster mode keeps its metadata in Zookeeper, so start zk first; only then start ClickHouse.
Start the server (all three nodes)
...home]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Connect with the client
...clickhouse-server]# clickhouse-client \
--host=192.168.109.101 \
--port=9000 \
--user=default \
--password=123456
Restore standalone mode (wipe cluster data)
rm -rf /data/clickhouse
Prometheus
Start Prometheus
/opt/soft/prometheus/prom/prometheus --storage.tsdb.path="/data/prometheus/data/" --log.level=debug --web.enable-lifecycle --web.enable-admin-api --config.file=/opt/soft/prometheus/prom/prometheus.yml &
# port: 9090
Grafana
Start
# start from inside /opt/soft/grafana/graf
...graf]# ./bin/grafana-server -config conf/grafana.ini &
# port: 3000
# login: admin / 123456
Docker
Start Docker
systemctl start docker
Start the Milvus container
docker run -d --name milvus_cpu_0.10.0 \
-p 19530:19530 \
-p 19121:19121 \
-v /home/$USER/milvus/db:/var/lib/milvus/db \
-v /home/$USER/milvus/conf:/var/lib/milvus/conf \
-v /home/$USER/milvus/logs:/var/lib/milvus/logs \
-v /home/$USER/milvus/wal:/var/lib/milvus/wal \
milvusdb/milvus:0.10.0-cpu-d061620-5f3c00
Check container status
sudo docker ps
View container logs
docker logs <container_id>
Flink
Start the cluster
start-cluster.sh
Stop the cluster
stop-cluster.sh
Submit a batch job
flink run /usr/local/flink-1.14.3/examples/batch/WordCount.jar --input /home/wc.txt --output /home/out/01
View the result
cat /home/out/01
Write the output to HDFS
flink run /usr/local/flink-1.14.3/examples/batch/WordCount.jar --input /home/wc.txt --output hdfs://192.168.109.101:9820/output/01
Restart the jobmanager
...flink]# jobmanager.sh start
Start the history server
...flink]# historyserver.sh start
Session mode
Start a Flink session
Make sure YARN is running first.
yarn-session.sh -s 3 -jm 1024 -tm 1024
Run a streaming job
nc -lk 6666
flink run /usr/local/flink-1.14.3/examples/streaming/SocketWindowWordCount.jar --port 6666
List jobs to find the jobId
flink list
Cancel a given jobId
flink cancel jobID
List the YARN applications
yarn application -list
Kill a given applicationId
yarn application -kill applicationId
Per-Job mode
Just use flink run directly. Note: make sure the standalone cluster is shut down before running with yarn-cluster.
flink run -t yarn-per-job -ys 1 -ynm flinkwc -yjm 1024 -ytm 1024 /usr/local/flink-1.14.3/examples/streaming/SocketWindowWordCount.jar --port 6666
Find the submitted job
yarn application -list
Application mode
Application mode is almost the same as Per-Job: just submit the job directly. Note: submit with flink run-application, not flink run.
flink run-application -t yarn-application \
-Djobmanager.memory.process.size=1024m \
-Dtaskmanager.memory.process.size=1024m \
-Dtaskmanager.numberOfTaskSlots=1 \
-Dparallelism.default=1 \
-Dyarn.application.name="flink-wc" \
/usr/local/flink-1.14.3/examples/streaming/SocketWindowWordCount.jar --port 6666
Find the submitted job
yarn application -list
Kill the job
yarn application -kill application_1645070143470_0005
HBase
Start zk
zkServer.sh start
Start the HBase service
...hbase]# start-hbase.sh
Connect with the HBase client
hbase shell
Check HBase listening ports
netstat -nltp | grep <port>