Versions
- java: JDK 1.8.0_371
- hadoop: 3.2.1
- mysql: 8.0.31
- mysql-connector-java-8.0.30.jar
- flume: 1.9.0
- sqoop: 1.4.7
- spark-3.3.2-bin-hadoop3-scala2.13.tgz
OS
Install the Linux system (inside the VM)
- version: Ubuntu 20.04
- During installation, switch to a mirror source
https://mirrors.aliyun.com/ubuntu/
- Once "cancel update…" appears at the bottom of the installer, installation is done and the VM can simply be rebooted
Passwordless SSH login (on Ubuntu)
sudo apt update
sudo apt install net-tools
sudo apt install ssh
ssh-keygen -t rsa
Generate the key pair to enable passwordless login (this can also be done on the Windows side)
- Append the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Start the SSH service
sudo /etc/init.d/ssh start
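The key-generation flow above can be sketched end to end with a throwaway key pair; the paths here are a temp directory rather than the real ~/.ssh, so this is safe to try anywhere:

```shell
# Generate a throwaway RSA key pair and authorize it, mirroring the real flow.
# In the real setup the files live in ~/.ssh; a temp dir is used here instead.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$tmp/id_rsa" -q       # empty passphrase, quiet
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"   # authorize the public key
chmod 700 "$tmp" && chmod 600 "$tmp/authorized_keys"  # sshd insists on strict permissions
```

After doing the same in ~/.ssh, `ssh localhost` should no longer prompt for a password.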
MobaXterm
A remote-connection tool; seems fairly handy and can transfer files, but has no command completion
Usage is much the same as other such tools
JDK
- Installing via
sudo apt install openjdk-8-jre
does not include the compiler (details unclear to me; something is simply missing), so install from the tarball instead
- download
- Version 8 is the usual choice, since it is the most widely used
- In the JDK download list, pick the
x64 Compressed Archive
tar package
- Unpack the tarball into
/usr/lib/jvm
:
tar -zvxf jdk-8u371-linux-x64.tar.gz -C /usr/lib/jvm
- Configure the Java environment variables (run source ~/.bashrc afterwards)
vim ~/.bashrc
# set oracle jdk environment (adjust names and paths to match your install)
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_371
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
- Register the binaries with update-alternatives (managed symlinks), so that software which resolves Java through /usr/bin/java does not misbehave
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.8.0_371/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.8.0_371/bin/javac 300
sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/jdk1.8.0_371/bin/jar 300
sudo update-alternatives --install /usr/bin/javah javah /usr/lib/jvm/jdk1.8.0_371/bin/javah 300
sudo update-alternatives --install /usr/bin/javap javap /usr/lib/jvm/jdk1.8.0_371/bin/javap 300
sudo update-alternatives --config java
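What update-alternatives maintains is essentially a managed symlink from /usr/bin/java to the chosen JDK binary. A toy illustration in a scratch directory (every path below is made up, standing in for the real /usr/bin and /usr/lib/jvm):

```shell
# Simulate the alternatives mechanism: a symlink selecting one JDK binary.
d=$(mktemp -d)
mkdir -p "$d/jdk1.8.0_371/bin"
printf '#!/bin/sh\necho 1.8.0_371\n' > "$d/jdk1.8.0_371/bin/java"  # stand-in for the real binary
chmod +x "$d/jdk1.8.0_371/bin/java"
ln -s "$d/jdk1.8.0_371/bin/java" "$d/java"  # what --install creates under /usr/bin
"$d/java"   # resolves through the link; prints 1.8.0_371
```

`--config java` simply lets you repoint that link when several JDKs are registered.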
- test
java -version
mysql
- You can install online with
sudo apt install mysql-server
; the default version depends on the Ubuntu release. Below is the offline route (untested) - offline installs are supposedly easier to debug when things go wrong
- download
- Unpack and install in roughly this order (file names vary with version); you will be prompted to set a user and password partway through
sudo dpkg --install mysql-community-client-plugins_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-community-client-core_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-common_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-community-client_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install libmysqlclient21_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install libmysqlclient-dev_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-client_8.0.31-1ubuntu20.04_amd64.deb
sudo apt install libaio1
sudo apt-get install libmecab2
## If the following steps error out, the errors can be ignored
sudo dpkg --install mysql-community-server-core_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-community-server_8.0.31-1ubuntu20.04_amd64.deb
sudo dpkg --install mysql-server_8.0.31-1ubuntu20.04_amd64.deb
- Start, log in, and test:
sudo systemctl start mysql
mysql -u root -p
hadoop
Install
Mind the compatibility between the Hadoop and Java versions
- download
- Unpack:
sudo tar -zvxf hadoop-*.tar.gz -C /usr/local/hadoop
and fix the permissions
- Edit
~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
- source ~/.bashrc
- Verify in the shell:
hadoop version
Single node (still the better choice for development)
- Change ownership:
sudo chown -R king:king /usr/local/hadoop
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
and set export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_371 (the JDK path from the JDK section)
/usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
/usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
/usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
</configuration>
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
- Create the namenode and datanode directories
mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
hadoop namenode -format
- Verify:
start-all.sh
jps
In this pseudo-distributed setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
Multi-node (built on top of the single-node setup)
/usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
hdfs-site.xml
<property>
<name>dfs.http.address</name>
<value>master:50070</value>
</property>
- In the hdfs-site.xml above, comment out the datanode entry
- Edit
/etc/hosts
and add entries for master, slave1 and slave2
- Configure the network, for example:
sudo vim /etc/netplan/00-installer-config.yaml
(adjust names and addresses to your setup)
network:
ethernets:
ens33:
addresses: [192.168.22.100/24]
dhcp4: false
gateway4: 192.168.22.2
nameservers:
addresses: [192.168.22.1,114.114.114.114]
version: 2
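The /etc/hosts entries mentioned above might look like this; the master address matches the netplan example, while the slave addresses are assumptions on the same subnet:

```
192.168.22.100 master
192.168.22.101 slave1
192.168.22.102 slave2
```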
- Check whether the change works:
sudo netplan try
sudo netplan apply
to apply the configuration, then reboot
- Clone slave1 (and slave2) from master (JDK, MySQL, Hadoop and the environment variables come along)
- Configure slave1's network, and in its hdfs-site.xml keep the datanode entry (delete the namenode entry)
- Make sure master can SSH into slave1 and slave2
- On master, delete the current directory under the old namenode dir (HDFS startup cache) and delete the datanode dir
- On the slaves, delete the namenode dir and the current directory under datanode
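One step these notes skip: in Hadoop 3, start-all.sh only launches DataNodes on the hosts listed in the master's etc/hadoop/workers file, so that file should contain the slave hostnames (names assumed to match /etc/hosts):

```
slave1
slave2
```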
- Re-format the namenode:
hadoop namenode -format
- Verify:
start-all.sh
jps
YARN web UI
http://192.168.22.100:8088
HDFS web UI
http://192.168.22.100:50070
flume
Used to collect data
install
- download -> tar -> rename; change the owning user and group (not needed for the log-collector part)
- Version-specific fix: remove the bundled guava
sudo rm /usr/local/flume/lib/guava-*.jar
- and add
guava-27.0-jre.jar
(the version Hadoop 3.2 ships)
Port experiment with netcat
sudo apt install netcat-traditional
sudo update-alternatives --config nc
(you may need to run sudo apt update first)
- Server:
nc -l -p 9999 localhost
- Client:
nc localhost 9999
- Then send text from the client and watch it arrive on the server
Log collection
- Create
/usr/local/flume/conf/test.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
- run
/usr/local/flume/bin/flume-ng agent -n a1 -c conf -f conf/test.conf -Dflume.root.logger=INFO,console
A green "connected" message means the agent started successfully
- Test:
telnet localhost 44444
and send some text
Syncing to HDFS
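To actually land collected events in HDFS, the logger sink in test.conf above can be swapped for an HDFS sink along these lines; the path and roll settings are illustrative, not from these notes:

```
# HDFS sink variant of agent a1 (replaces the logger sink k1)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 60
```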
hive
install
- download -> tar -> rename; change the owning user and group
- Edit
~/.bashrc
export HIVE_HOME=/usr/local/hive
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_CONF=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml
vim /usr/local/hive/conf/hive-site.xml
<!-- Only the following properties need changing; the rest can stay -->
<property>
<!-- Database connection string -->
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&amp;createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<!-- Driver class name -->
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<!-- Database user -->
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<!-- Database password -->
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
<description>password to use against metastore database</description>
</property>
<!-- Allow anyone to connect -->
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
<!-- Disable impersonation (doAs) -->
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<!-- Switch HiveServer2's transport to HTTP so clients can reach the database over HTTP -->
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
<description>
Expects one of [binary, http].
Transport mode of HiveServer2.
</description>
</property>
After the edits above:
Delete the invisible special character on line 3215, otherwise it throws an error (version-specific fix)
Replace every ${system:java.io.tmpdir} in the file with /usr/local/hive/tmp
Replace every ${system:user.name} with ${user.name}
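The two bulk substitutions above can be done with sed; here is a sketch run against a scratch file (point it at the real conf/hive-site.xml after making a backup):

```shell
# Demonstrate the ${system:...} substitutions on a scratch copy.
f=$(mktemp)
printf '%s\n' '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > "$f"
sed -i -e 's#\${system:java\.io\.tmpdir}#/usr/local/hive/tmp#g' \
       -e 's#\${system:user\.name}#${user.name}#g' "$f"
cat "$f"   # <value>/usr/local/hive/tmp/${user.name}</value>
```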
- Copy mysql-connector-java-*.jar into the lib directory
- Version-specific fix:
sudo rm /usr/local/hive/lib/guava-*.jar
- and add
guava-27.0-jre.jar
- Create a user in MySQL
create user 'hive'@'%' identified by '12';
grant all privileges on *.* to 'hive'@'%';
flush privileges;
- Log in to MySQL as the new user to confirm it was created
- Initialize Hive
/usr/local/hive/bin/schematool -dbType mysql -initSchema
"completed" in the output means it worked
- Start HDFS first, then enter hive and check that Hive SQL commands run
sqoop
install
- download -> tar -> rename; change the owning user and group
- This version ships as two tarballs; the sqoop-*.jar inside the bin package should be unpacked into
/usr/local/sqoop
- Put
avro-*.jar
and
mysql-connector-java-*.jar
into
/usr/local/sqoop/lib/
(possibly version-specific)
cp /usr/local/sqoop/conf/sqoop-env-template.sh /usr/local/sqoop/conf/sqoop-env.sh
vim /usr/local/sqoop/conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
Exporting from Hive into MySQL
- Log in to MySQL as the hive user and create hive.result
create database hive;
create table result(id int(10) primary key, name varchar(10), count int(10));
- Export
/usr/local/sqoop/bin/sqoop export \
--connect jdbc:mysql://master:3306/hive \
--username hive \
--password 123 \
--table result \
--export-dir /user/hive/warehouse/hive.db/result \
--columns "id,name,count" \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n' \
--bindir /usr/local/sqoop/lib
- Check whether the MySQL table now contains the data
azkaban
Friendly to use, somewhat less friendly to install
An important gotcha: Azkaban's later job scheduling requires Hadoop to run with more than 3 GB of memory, so configure a heap above 3 GB in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"
spark
- install
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
vim /usr/local/spark/conf/spark-env.sh
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
- To use Hive, start the metastore afterwards (
hive --service metastore &
if jps shows RunJar, it started OK)
vim /usr/local/hive/conf/hive-site.xml
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
- Interactive shell: spark-shell
- Run a jar program:
spark-submit --class com.example.WordCount --master local *.jar
- Integrate with Hadoop (start the standalone master):
./sbin/start-master.sh
IDEA development setup
Change the run target from "local machine" to SSH to get remote debugging and development