Installed package versions (2019.4.17)
Name | Version |
zookeeper | 3.4.13 |
hadoop | 3.1.2 |
flume | 1.9.0 |
hbase | 1.2.9 |
hive | 3.1.1 |
kafka | 2.1.1 |
sqoop | 1.4.6_1 |
storm | 1.2.2 |
mysql | 8.0.15 |
JDK installation (omitted) (1.8)
Install ZooKeeper (3.4.13)
Environment variables
In the conf directory:
cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg
dataDir=/usr/local/Cellar/zookeeper/3.4.13/tmp
dataLogDir=/usr/local/Cellar/zookeeper/3.4.13/logs
Create the tmp and logs directories
In bin, run zkServer.sh start to start the server and zkServer.sh status to check that it is running
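The ZooKeeper steps above can be scripted. This is a minimal sketch: the function name prepare_zookeeper is hypothetical, and it assumes that conf sits directly under the directory passed in, which may not match every installation layout.

```shell
#!/bin/sh
# Usage: prepare_zookeeper <zookeeper-home>
# Creates zoo.cfg from the sample and points dataDir/dataLogDir at
# <home>/tmp and <home>/logs, creating both directories.
prepare_zookeeper() {
  zk_home=$1
  mkdir -p "$zk_home/tmp" "$zk_home/logs" || return 1
  # Drop any existing dataDir/dataLogDir lines, then append the new ones.
  grep -v -e '^dataDir=' -e '^dataLogDir=' "$zk_home/conf/zoo_sample.cfg" \
    > "$zk_home/conf/zoo.cfg"
  printf 'dataDir=%s\ndataLogDir=%s\n' "$zk_home/tmp" "$zk_home/logs" \
    >> "$zk_home/conf/zoo.cfg"
}
```

For the paths used in these notes the call would be prepare_zookeeper /usr/local/Cellar/zookeeper/3.4.13.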
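Hadoop pseudo-distributed installation (3.1.2)
Configure environment variables: HADOOP_HOME=…, with ${HADOOP_HOME}/bin added to PATH
source ~/.profile
Configure passwordless SSH login
ssh-keygen
ssh-copy-id localhost
Disable the firewall
Configure hadoop-env.sh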
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"
Configure yarn-env.sh
JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"
Configure core-site.xml
<configuration>
<!-- Address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- Directory where Hadoop stores the files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp/hadoop-${user.name}</value>
</property>
</configuration>
Configure hdfs-site.xml
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Configure yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>127.0.0.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>127.0.0.1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>127.0.0.1:8031</value>
</property>
Configure mapred-site.xml
<property>
<!-- Run MapReduce on YARN -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
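Run hadoop version to check that the installation succeeded
Run the following in the bin directory; a message such as "... has been successfully formatted" means that formatting succeeded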
hdfs namenode -format
Start the daemons from sbin
start-all.sh
jps should list five processes (SecondaryNameNode, DataNode, NodeManager, ResourceManager, NameNode); if it does, startup succeeded
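Checking the jps output by eye is easy to get wrong, so the comparison can be scripted. A minimal sketch: the function name missing_daemons is hypothetical, and it assumes jps prints one "<pid> <ClassName>" pair per line.

```shell
#!/bin/sh
# Read jps output on stdin and print every expected Hadoop daemon that is absent.
# Assumes each jps line has the form "<pid> <ClassName>".
missing_daemons() {
  seen=$(awk '{ print $2 }')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches the whole line, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$seen" | grep -qx "$d" || echo "$d"
  done
}
```

Running jps | missing_daemons prints nothing when all five daemons are up.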
Open http://localhost:9870 (older Hadoop 2.x releases use port 50070)
Install MySQL (8.0.15)
sudo apt-get install mysql-server
sudo apt-get install mysql-client
sudo apt-get install libmysqlclient-dev
Check that the installation succeeded:
Linux
sudo netstat -tap | grep mysql
macOS
lsof -i:<port>
# list the files and ports that mysql has open
sudo lsof | grep mysql
Verify by logging in:
mysql -u root -p
Start/stop/restart MySQL
service mysql start
service mysql stop
service mysql restart
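When these commands run, service looks up the matching mysql script under /etc/init.d and executes it to start or stop the server.
Install Hive (3.1.1)
Environment variables
In the conf directory: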
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
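Create three directories in HDFS to hold Hive data and give them 777 permissions
Note: HDFS must be running, otherwise these commands fail
The directories created below can be seen at http://localhost:9870 under Utilities > Browse the file system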
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod -R 777 /user/hive/warehouse
hadoop fs -chmod 777 /user/hive/tmp
hdfs dfs -chmod -R 777 /user/hive/tmp
hdfs dfs -chmod -R 777 /user/hive/log
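The mkdir and chmod commands above repeat the same two steps for each directory, so they can be written as a loop. A minimal sketch: make_hive_dirs and the HDFS_CMD override are hypothetical names, the latter added only so the loop can be dry-run with echo.

```shell
#!/bin/sh
# Create the three Hive directories in HDFS and open their permissions.
# HDFS_CMD is a hypothetical override so the loop can be dry-run with "echo".
make_hive_dirs() {
  hdfs_cmd=${HDFS_CMD:-hdfs}
  for d in warehouse tmp log; do
    "$hdfs_cmd" dfs -mkdir -p "/user/hive/$d" || return 1
    "$hdfs_cmd" dfs -chmod -R 777 "/user/hive/$d" || return 1
  done
}
```

HDFS_CMD=echo make_hive_dirs prints the six commands without touching HDFS; make_hive_dirs on its own runs them.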
Edit hive-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.2/libexec
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1/libexec
export HIVE_CONF_DIR=/usr/local/Cellar/hive/3.1.1/libexec/conf
Edit hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
</configuration>
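Values placed in hive-site.xml are XML text, so a literal &, < or > (for example the & separators in a JDBC URL) has to be written as an entity or the file will not parse. A minimal sketch of a helper that produces the escaped form; the name xml_escape is hypothetical.

```shell
#!/bin/sh
# Escape the characters that are special in XML text so a value can be pasted
# into a <value> element. The ampersand must be replaced first.
xml_escape() {
  printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

For example, xml_escape 'jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false' prints the URL with &amp; in place of &.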
Create a tmp directory under the Hive directory
Copy mysql-connector-java-5.1.46-bin.jar into the lib directory
Initialize Hive. From Hive 2.0 onward the initialization command is:
schematool -dbType mysql -initSchema
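Note: if MySQL was already installed, it is strongly recommended to uninstall it and reinstall it with brew; otherwise errors like the following appear:
failed to get schema version, or access denied for user
If you hit these errors, the permission changes suggested online not only fail to help but can also break MySQL, leaving a reinstall as the only option.
Alternatively,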
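run the following at the mysql prompt
# create the database
mysql> create database hive;
# grant access (MySQL 8.0 no longer accepts IDENTIFIED BY inside GRANT, so the password is set separately)
mysql> alter user 'root'@'localhost' identified by '<password>';
mysql> grant all privileges on hive.* to 'root'@'localhost' with grant option;
mysql> flush privileges;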
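Run hive from the bin directory to start it
Install Flume (1.9.0)
Configure environment variables
Simple example: in the conf directory, cp flume-conf.properties.template flume-conf.properties
Then vim flume-conf.properties, delete all of its contents and add the following: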
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/test
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 1000
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent --conf ../conf --conf-file /usr/local/Cellar/flume/1.9.0/libexec/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
Create a test.txt file with some content to use for testing
Test:
./flume-ng avro-client --conf /usr/local/Cellar/flume/1.9.0/libexec/conf --host 0.0.0.0 --port 44444 --filename ../../test.txt
Check the Flume process
Linux
ps -aux | grep flume
macOS
sudo lsof | grep flume
Configuration that tails a file and writes the data to both HDFS and Kafka
a1.sources=r1
a1.sinks=fs kfk
a1.channels=c1 c2
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/tellhow-iot2/doc/test.log
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=org.apache.flume.interceptor.TimestampInterceptor$Builder
a1.sources.r1.selector.type=replicating
a1.sinks.kfk.type=org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kfk.topic=test
a1.sinks.kfk.brokerList=localhost:9092
# acknowledgement level required from Kafka: 0, 1 or -1
a1.sinks.kfk.requiredAcks=1
#a1.sinks.kfk.batchSize = 2
a1.sinks.kfk.serializer.class=kafka.serializer.StringEncoder
a1.sinks.fs.type=hdfs
#%y-%m-%d/%H%M/
a1.sinks.fs.hdfs.path=hdfs://localhost:9000/source/%y-%m-%d/
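# file name prefix
#a1.sinks.fs.hdfs.filePrefix = events-
# file name suffix
a1.sinks.fs.hdfs.fileSuffix=.log
# inUsePrefix / inUseSuffix set the prefix and suffix of temporary file names
# switch to a new directory every 10 minutes: 2018-11-20/1010 2018-11-20/1020 2018-11-20/1030
#a1.sinks.fs.hdfs.round = true
#a1.sinks.fs.hdfs.roundValue = 10
#a1.sinks.fs.hdfs.roundUnit = minute
# compression codec: gzip, bzip2, lzo, lzop, snappy
#a1.sinks.fs.hdfs.codeC = gzip
# time-based rolling: seconds before a new file is started (e.g. 3 rolls every 3 s); 0 disables it
a1.sinks.fs.hdfs.rollInterval=0
# size-based rolling: file size in bytes that triggers a new file (e.g. 500); 0 disables it (Flume's default is 1024)
a1.sinks.fs.hdfs.rollSize=0
# count-based rolling: number of events written before a new file is started; 0 disables it
a1.sinks.fs.hdfs.rollCount=0
# number of events written to the file before it is flushed to HDFS
#a1.sinks.fs.hdfs.batchSize = 5
# use the local time instead of the event timestamp when formatting the directory name
a1.sinks.fs.hdfs.useLocalTimeStamp=false
# type of the output file: the default is SequenceFile; DataStream writes plain text
a1.sinks.fs.hdfs.fileType=DataStream
# maximum number of open HDFS files; when it is reached, the oldest open file is closed
a1.sinks.fs.hdfs.maxOpenFiles=5000
# minimum number of replicas per HDFS block; it affects file rolling and is usually set to 1 so that files roll as configured
#a1.sinks.fs.hdfs.minBlockReplicas = 1
# callTimeout: default 10000, timeout for HDFS operations in milliseconds
# threadsPoolSize: default 10, number of threads the HDFS sink uses for HDFS operations
# rollTimerPoolSize: default 1, number of threads the HDFS sink uses for time-based rolling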
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.sources.r1.channels=c1 c2
a1.sinks.fs.channel=c1
a1.sinks.kfk.channel=c2
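In a configuration like this one, a property written for a sink name that is not listed in a1.sinks does not apply to any declared sink, and the mistake is easy to miss in a long file. A minimal sketch of a check: the function name undeclared_sinks is hypothetical, and it only understands plain key=value lines.

```shell
#!/bin/sh
# Usage: undeclared_sinks <properties-file> <agent-name>
# Prints each sink name that has properties but is missing from "<agent>.sinks".
undeclared_sinks() {
  awk -v agent="$2" '
    /^[ \t]*#/ { next }                        # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                        # not a key=value line
      key = substr($0, 1, eq - 1)
      gsub(/[ \t]/, "", key)
      prefix = agent ".sinks"
      if (key == prefix) {
        # declaration line: remember every listed sink name
        n = split(substr($0, eq + 1), names, " ")
        for (i = 1; i <= n; i++) declared[names[i]] = 1
      } else if (index(key, prefix ".") == 1) {
        # per-sink property: the sink name is the next dotted component
        rest = substr(key, length(prefix) + 2)
        dot = index(rest, ".")
        used[dot ? substr(rest, 1, dot - 1) : rest] = 1
      }
    }
    END { for (s in used) if (!(s in declared)) print s }
  ' "$1"
}
```

Running undeclared_sinks flume-conf.properties a1 prints nothing when every configured sink is declared.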
Install Kafka (2.11-2.1.1)
Create a log directory under the Kafka directory
mkdir logs
Edit config/server.properties with vim, changing lines 21, 31, 36 and 60:
broker.id=1
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://localhost:9092
log.dirs=/usr/local/Cellar/kafka/2.1.1/libexec/logs
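Line numbers in server.properties can shift between releases, so the same four settings can also be applied by key. A minimal sketch: set_property is a hypothetical helper, and it assumes simple key=value lines in which a default may be commented out as "#key=...".

```shell
#!/bin/sh
# Usage: set_property <file> <key> <value>
# Replaces the first "key=..." line (also when commented out as "#key=...")
# or appends the setting when the key is absent.
set_property() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      sub(/^#[ \t]*/, "", line)               # look through a leading "#"
      if (!done && index(line, k "=") == 1) { print k "=" v; done = 1 }
      else print $0
    }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_property config/server.properties broker.id 1, followed by one call each for listeners, advertised.listeners and log.dirs.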
With ZooKeeper running, start Kafka to verify:
bin/kafka-server-start.sh config/server.properties
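Create a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
Producer
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
Consumer
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Describe the topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
The first line of the output summarizes all partitions; each additional line describes one partition. Since this topic has only one partition, there is only one such line.
“Leader”: the node responsible for all reads and writes for the given partition. Each node is the leader for a randomly selected share of the partitions.
“Replicas”: the list of nodes that replicate the log for this partition, regardless of whether they are the leader or even whether they are currently alive.
“Isr”: the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught up to the leader.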
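Since Isr is a subset of Replicas, a partition whose Isr list is shorter than its Replicas list has replicas that are lagging or down. A minimal sketch that flags such partitions: under_replicated is a hypothetical name, and the tab-separated "Partition: ", "Replicas: " and "Isr: " fields are an assumption about the --describe output of this Kafka version.

```shell
#!/bin/sh
# Read "kafka-topics.sh --describe" output on stdin and print the number of
# every partition whose Isr list is shorter than its Replicas list.
# Assumes tab-separated fields such as "Partition: 0" and "Isr: 1,2".
under_replicated() {
  awk -F'\t' '
    /Partition: / {
      part = ""; replicas = ""; isr = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Partition: /) part = substr($i, 12)
        if ($i ~ /^Replicas: /)  replicas = substr($i, 11)
        if ($i ~ /^Isr: /)       isr = substr($i, 6)
      }
      # split returns the number of comma-separated broker ids
      if (split(isr, a, ",") < split(replicas, b, ",")) print part
    }'
}
```

Usage: bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test | under_replicated. With a single broker and replication factor 1 it prints nothing.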
Start Kafka in the background:
bin/kafka-server-start.sh config/server.properties &
Install Storm (requires the JDK and ZooKeeper first) (1.2.2)
Edit storm.yaml
storm.zookeeper.servers:
- "127.0.0.1"
storm.zookeeper.port: 2181
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
storm.local.dir: "/usr/local/Cellar/storm/1.2.2/data"
nimbus.seeds: ["127.0.0.1"]
Start Storm:
Start Nimbus
./storm nimbus >> /usr/local/Cellar/storm/1.2.2/logs/nimbus.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/nimbus.log
Start the UI
./storm ui >> /usr/local/Cellar/storm/1.2.2/logs/ui.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/ui.log
Start the supervisor
./storm supervisor >> /usr/local/Cellar/storm/1.2.2/logs/supervisor.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/supervisor.log
Start the logviewer
./storm logviewer >> /usr/local/Cellar/storm/1.2.2/logs/logviewer.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/logviewer.log
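The four start commands above differ only in the daemon name, so they can be collapsed into a loop. A minimal sketch: start_storm_daemons, STORM_BIN and STORM_LOG_DIR are hypothetical names, with defaults taken from the paths used in these notes.

```shell
#!/bin/sh
# Start the four Storm daemons in the background, one .out file per daemon.
# STORM_BIN and STORM_LOG_DIR are hypothetical overrides; the defaults follow
# the paths used in these notes.
start_storm_daemons() {
  storm_bin=${STORM_BIN:-./storm}
  log_dir=${STORM_LOG_DIR:-/usr/local/Cellar/storm/1.2.2/logs}
  mkdir -p "$log_dir" || return 1
  for daemon in nimbus ui supervisor logviewer; do
    "$storm_bin" "$daemon" >> "$log_dir/$daemon.out" 2>&1 &
  done
}
```

Run it from the Storm bin directory as start_storm_daemons, then follow any of the .log files with tail -f as before.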
Verify: open the web UI in a browser at http://localhost:8080
* When running a real project, start the topology after the jar has been imported
./bin/storm jar /usr/local/Cellar/storm/1.2.2/libexec/examples/storm-starter/storm-starter-topologies-0.9.5.jar storm.starter.WordCountTopology wordcount
Install HBase
Create the following directories under "/usr/local/Cellar/hbase/1.2.9"
/hadoop/pids
/hbasetmp
/zookeepertmp
Edit hbase-env.sh in conf:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"
export HBASE_PID_DIR="/usr/local/Cellar/hbase/1.2.9/hadoop/pids"
Edit hbase-site.xml in conf:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>/usr/local/Cellar/hbase/1.2.9/hbasetmp</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/Cellar/hbase/1.2.9/zookeepertmp</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
Environment variables
Start:
start-hbase.sh
hbase shell
Install Sqoop (1.4.6_1)
Environment variables
In the conf directory, cp sqoop-env-template.sh sqoop-env.sh
Edit:
export HADOOP_COMMON_HOME=/usr/local/Cellar/hadoop/3.1.2
export HADOOP_MAPRED_HOME=/usr/local/Cellar/hadoop/3.1.2
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1
export HBASE_HOME=/usr/local/Cellar/hbase/1.2.9
Put mysql-connector-java-5.1.41.jar into lib
Copy sqoop-1.4.4.jar into the lib directory under /share/hadoop/mapreduce/ in Hadoop
Sqoop commands:
Suffix to use when creating the MySQL table
create table stu2 (`id` varchar(20),`name` varchar(20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# export Hive data to MySQL with sqoop
sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root --table stu1 --export-dir '/user/hive/warehouse/parkdb.db/stu1' --fields-terminated-by '\t';
# import from MySQL into Hive
sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table kafka --hive-import --create-hive-table --hive-table parkdb.stu2 -m 1
Individual start commands:
zookeeper/3.4.13/libexec/bin/zkServer.sh start
hadoop/3.1.2/libexec/sbin/start-all.sh
kafka/2.1.1/libexec/bin/kafka-server-start.sh kafka/2.1.1/libexec/config/server.properties &
kafka/2.1.1/libexec/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
flume-ng agent --conf /usr/local/Cellar/flume/1.9.0/libexec/conf --conf-file /usr/local/Cellar/flume/1.9.0/libexec/conf/bridge03.properties --name a1 -Dflume.root.logger=INFO,console
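Delete a Kafka topic
kafka/2.1.1/libexec/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
Shell loop that appends a line to the file every second
#!/bin/bash
# bash is required for the "function" keyword and the {1..100} brace expansion
# rand MIN MAX: print a pseudo-random integer in [MIN, MAX]
function rand(){
min=$1
max=$(($2-$min+1))
# nanosecond timestamp as the random source (%N needs GNU date)
num=$(date +%s%N)
echo $(($num%$max+$min))
}
for i in {1..100};
do
# no spaces are allowed around "=" in a shell assignment
random=$(rand 1 100)
echo "$random $random $random $random $random 2018-11-$i" >> /home/tellhow-iot2/doc/test.log;
sleep 1;
done;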
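The rand function in the script maps an arbitrary non-negative integer into the range [min, max] with a modulo. The sketch below isolates that arithmetic so it can be checked with a fixed input instead of a timestamp; map_range is a hypothetical name.

```shell
#!/bin/sh
# Usage: map_range <n> <min> <max>
# Maps a non-negative integer n into the inclusive range [min, max].
map_range() {
  span=$(( $3 - $2 + 1 ))          # number of values in the range
  echo $(( $1 % span + $2 ))
}
```

For example, map_range 250 1 100 gives 250 % 100 + 1 = 51.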
Fixes for installation errors:
1. ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
First check /usr/etc/my.cnf; you can optionally add the following
[client]
port = 3306
socket = /tmp/mysql.sock
default-character-set = utf8
[mysqld]
collation-server = utf8_unicode_ci
character-set-server = utf8
init-connect ='SET NAMES utf8'
max_allowed_packet = 64M
bind-address = 127.0.0.1
port = 3306
socket = /tmp/mysql.sock
innodb_file_per_table=1
[mysqld_safe]
timezone = '+0:00'
Then edit .bash_profile
export PATH=$PATH:/usr/local/Cellar/mysql/8.0.15/bin
Next go to /usr/local/var and set the permissions as follows (the key step)
sudo chmod -R 777 mysql
Finally, start the server from /usr/local/Cellar/mysql/8.0.15/bin
mysql.server start
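One common cause of this socket error is that the client looks for the socket at a different path from the one the server uses, so it can help to confirm that the [client] and [mysqld] sections of my.cnf name the same file. A minimal sketch: cnf_get is a hypothetical helper that only handles plain "key = value" lines.

```shell
#!/bin/sh
# Usage: cnf_get <file> <section> <key>
# Prints the value of <key> inside [<section>] of an INI-style my.cnf.
cnf_get() {
  awk -v section="[$2]" -v key="$3" '
    /^[ \t]*\[/ { gsub(/[ \t]/, ""); current = $0; next }   # section header
    current == section {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1); gsub(/[ \t]/, "", k)
      v = substr($0, eq + 1);    gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$1"
}
```

Comparing cnf_get /usr/etc/my.cnf client socket with cnf_get /usr/etc/my.cnf mysqld socket shows whether the two sections agree.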