Just follow the steps below:
(1)raini@biyuzhe:~$ gedit .bashrc
#java
export JAVA_HOME=/home/raini/app/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$JRE_HOME/bin:$PATH
#scala
export SCALA_HOME=/home/raini/app/scala-2.10.6
export PATH=${SCALA_HOME}/bin:$PATH
#spark
export SPARK_HOME=/home/raini/spark1
export PATH=$SPARK_HOME/bin:$PATH
# hadoop2.6
export HADOOP_PREFIX=/home/raini/hadoop2
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$PATH"
export HADOOP_PREFIX PATH CLASSPATH
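Export lines like these get appended on every edit and PATH easily accumulates duplicate entries. A small helper (purely illustrative, not part of the original .bashrc) that deduplicates a colon-separated list:

```shell
# Sketch: deduplicate colon-separated PATH entries, keeping the first occurrence.
dedup_path() {
  local old_ifs="$IFS" entry result=""
  IFS=':'
  for entry in $1; do
    case ":$result:" in
      *":$entry:"*) ;;                         # already seen, skip
      *) result="${result:+$result:}$entry" ;;
    esac
  done
  IFS="$old_ifs"
  printf '%s\n' "$result"
}
```

Usage: `PATH=$(dedup_path "$PATH")` at the end of .bashrc.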
(2)raini@biyuzhe:~$ sudo apt-get install rsync
(3)raini@biyuzhe:~$ sudo apt-get install openssh-server
cd ~/.ssh/                          # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa                   # press Enter at every prompt
cat id_rsa.pub >> authorized_keys   # authorize the key
Try ssh localhost to confirm passwordless login now works.
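The ssh-keygen prompts can be skipped entirely; a non-interactive sketch of the same setup (assumes the default key path and a fresh ~/.ssh):

```shell
# Non-interactive passwordless-SSH setup.
# -N "" sets an empty passphrase; -f fixes the key path so no prompts appear.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys   # sshd rejects group/world-writable auth files
```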
(4)raini@biyuzhe:~$ sudo gedit /etc/hosts
127.0.0.1 localhost
127.0.1.1 biyuzhe
#10.155.243.206 biyuzhe
# Some guides insist this mapping must be fixed here, otherwise connection-refused errors appear later.
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
(5) Edit the configuration file etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/home/raini/app/jdk
export HADOOP_COMMON_HOME=/home/raini/hadoop
(6)raini@biyuzhe:~$ gedit .bashrc
Add the Hadoop bin and sbin directories to PATH, e.g.:
export PATH="/home/raini/hadoop/bin:/home/raini/hadoop/sbin:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$PATH"
(7) Edit etc/hadoop/core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/raini/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.proxyuser.master.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.master.groups</name>
<value>*</value>
</property>
</configuration>
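A typo in any of these XML files only surfaces when the daemons start. A quick sanity check that parses the file and lists its properties (assumes python3 is available; a sample file stands in here for the real etc/hadoop/core-site.xml):

```shell
# Verify a Hadoop *-site.xml is well-formed and list its name/value pairs.
cat > /tmp/core-site-sample.xml <<'XML'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>
XML
python3 - /tmp/core-site-sample.xml <<'PY'
import sys
import xml.etree.ElementTree as ET
root = ET.parse(sys.argv[1]).getroot()   # raises ParseError if the XML is malformed
for prop in root.iter('property'):
    print(prop.findtext('name'), '=', prop.findtext('value'))
PY
```

Point it at etc/hadoop/core-site.xml instead of the sample to check the real file.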
(8) Edit etc/hadoop/hdfs-site.xml:
<configuration>
<!-- SecondaryNameNode HTTP address (host and port):
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9001</value>
</property>-->
<!-- Officially only fs.defaultFS and dfs.replication are needed to run,
but with hadoop.tmp.dir left at its default the temporary directory lives under /tmp,
which the system may clean on reboot; the NameNode process then fails to start and a
re-format becomes necessary. Hence the explicit directories below. -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/raini/hadoop/tmp/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/raini/hadoop/tmp/dfs/datanode</value>
</property>
<!-- replication factor -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Must be true, otherwise WebHDFS operations that list file status (LISTSTATUS, GETFILESTATUS, etc.) fail, since that information is served by the NameNode. -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
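The file:/ paths above are local directories and must exist before the NameNode is formatted. A sketch (BASE is a throwaway temp dir here so the snippet runs anywhere; in the real setup it would be /home/raini/hadoop/tmp/dfs):

```shell
# Create the local NameNode/DataNode storage dirs referenced in hdfs-site.xml.
BASE=$(mktemp -d)                             # stand-in for /home/raini/hadoop/tmp/dfs
mkdir -p "$BASE/namenode" "$BASE/datanode"
chmod 700 "$BASE/namenode" "$BASE/datanode"   # DFS dirs should not be world-readable
ls "$BASE"
```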
(9) Edit mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!--
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value> (old pre-YARN property)
</property>
-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>
</property>
</configuration>
(10) Edit yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<!-- ResourceManager scheduler port -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<!-- ResourceManager admin port -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>
</property>
<!-- ResourceManager web UI port, for monitoring job resource scheduling -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8088</value>
</property>
</configuration>
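Once the daemons are up (step 13 below), each listener configured above can be probed. A hedged sketch using bash's /dev/tcp redirection, so no extra tools are needed:

```shell
# Report whether each configured ResourceManager port is accepting connections.
check_port() {
  local host=$1 port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "open   $host:$port"
  else
    echo "closed $host:$port"
  fi
}
for p in 8032 8030 8031 8033 8088; do check_port localhost "$p"; done
```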
(11) raini@biyuzhe:~$ source .bashrc
raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:7Th7Qu6av5WOqmmVLemv3YN+52LAcHw4BuFBNwBt5DU.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
14242 Jps
14106 SecondaryNameNode
13922 DataNode ------------------ (no NameNode!)
(12) raini@biyuzhe:~/hadoop$ hdfs namenode -format
raini@biyuzhe:~/hadoop$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
localhost: no namenode to stop
biyuzhe: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
14919 NameNode ----------------------- (NameNode now running)
15407 Jps
15271 SecondaryNameNode
15073 DataNode
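Eyeballing jps output gets tedious; a small helper (illustrative, not part of the original setup) that flags missing daemons given jps-style input:

```shell
# Print any expected daemon that is missing from jps-style output on stdin.
# The expected set matches this single-node HDFS setup.
missing_daemons() {
  local out; out=$(cat)
  for d in NameNode DataNode SecondaryNameNode; do
    echo "$out" | grep -qw "$d" || echo "missing: $d"
  done
}
# Demo on captured output; on a live system use: jps | missing_daemons
printf '14919 NameNode\n15073 DataNode\n15271 SecondaryNameNode\n' | missing_daemons
```

No output means all three daemons are present.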
(13)raini@biyuzhe:~/hadoop$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/raini/hadoop/logs/yarn-raini-resourcemanager-biyuzhe.out
biyuzhe: starting nodemanager, logging to /home/raini/app/hadoop-2.7.2/logs/yarn-raini-nodemanager-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
15625 NodeManager
14919 NameNode
15271 SecondaryNameNode
15073 DataNode
15937 Jps
15501 ResourceManager
(14) Verify in the browser:
yarn:   http://localhost:8088/
hadoop: http://localhost:50070
Overview 'localhost:9000' (active)
Started:       Sat Apr 23 14:04:17 CST 2016
Version:       2.7.2, rb165c4fe8a74265c792ce23f546c64604acf0e41
Compiled:      2016-01-26T00:08Z by jenkins from (detached from b165c4f)
Cluster ID:    CID-b0ad8d51-6ea3-4bfc-a1d8-ee0cbc9a8ff6
Block Pool ID: BP-890697487-127.0.1.1-1461391390144
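The two web UIs can also be checked from the shell (assuming curl is installed):

```shell
# Probe the HDFS and YARN web UIs; -sf keeps curl quiet and fails on HTTP errors.
for url in http://localhost:50070 http://localhost:8088; do
  if curl -sf -o /dev/null "$url"; then echo "OK   $url"; else echo "DOWN $url"; fi
done
```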
-------------------------------- Spark installation
(2) Configure the Spark environment variables:
export SPARK_HOME=/home/raini/spark
export PATH=${SPARK_HOME}/bin:$PATH
(3) Configure spark-env.sh:
export JAVA_HOME=/home/raini/app/jdk
export SCALA_HOME=/home/raini/app/scala
export SPARK_WORKER_MEMORY=4g
export SPARK_MASTER_IP=biyuzhe
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=2
export HADOOP_CONF_DIR=/home/raini/hadoop/etc/hadoop
(4) cp slaves.template slaves, then edit slaves:
#localhost
biyuzhe
Start it with spark/sbin/start-all.sh.
------------------------ MySQL ---- Hive 2.0.0 installation
1) Install MySQL
$ sudo apt-get install mysql-server
Log in: $ mysql -u root -p
Create the hive database: mysql> create database hive;
mysql> show databases;   -- verify it was created
Be sure to change the hive database's character set to latin1, and do it before Hive's
first startup (otherwise delete operations will hang later):
mysql> alter database hive character set latin1;
Create the hive user and grant privileges: mysql> grant all on hive.* to hive@'%' identified by 'hive';
(Alternatively: mysql> DROP USER 'hive'@'%';
mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'%' with grant option;
)
Flush: mysql> flush privileges;
Check the MySQL version: mysql> select version();   -- 5.7.11-0ubuntu6 here
Download the MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/
(mysql-connector-java-5.1.38.tar.gz) and copy the driver jar into Hive's lib directory.
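Copying the driver can be scripted with a guard so that a wrong path fails loudly; a sketch with illustrative paths:

```shell
# Install the MySQL JDBC driver jar into Hive's lib directory.
install_jdbc() {    # install_jdbc <driver-jar> <hive-lib-dir>
  local jar=$1 lib=$2
  [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
  mkdir -p "$lib" && cp "$jar" "$lib"/ && echo "installed $(basename "$jar") -> $lib"
}
# In this guide's layout the call would be something like:
# install_jdbc ~/mysql-connector-java-5.1.38-bin.jar /home/raini/app/hive-2.0.0/lib
```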
2) Install Hive
Download apache-hive-2.0.0-bin.tar.gz from http://hive.apache.org/ and extract it under the home directory.
Environment configuration
Add the following to .bashrc:
#Hive
export HIVE_HOME=/home/raini/app/hive-2.0.0
export PATH=$PATH:${HIVE_HOME}/bin
export CLASSPATH=$CLASSPATH:${HIVE_HOME}/lib
Configure hive-env.sh:
Copy hive-env.sh.template to hive-env.sh and edit it,
setting HADOOP_HOME and HIVE_CONF_DIR as follows:
HADOOP_HOME=/home/。。/hadoop
export HIVE_CONF_DIR=/home/。。/hive/conf
# export HADOOP_HEAPSIZE=512
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/raini/app/hive-2.0.0/lib
4) Configure hive-site.xml
Hive uses Hadoop, so:
you must have Hadoop in your path OR
export HADOOP_HOME=<hadoop-install-dir>
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w in HDFS before you can create a table in Hive.
Commands to perform this setup (some guides use 755 permissions instead):
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/log
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/warehouse
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/log
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/warehouse
You may find it useful, though it's not necessary, to set HIVE_HOME:
$ export HIVE_HOME=<hive-install-dir>
export HIVE_HOME=/home/raini/app/hive
$ sudo /etc/init.d/mysql status
Hive's lib directory must contain mysql-connector-java-5.1.38-bin.jar.
(5) Configure Hive to keep its metadata in MySQL. Hive stores metadata in an RDBMS; by default it is configured to use Derby.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>Store metadata in a local MySQL server; this mode requires a MySQL server running on the same machine.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value> <!-- the hostname biyuzhe does not work here -->
<description>metastore database URL (append ?characterEncoding=UTF-8 if needed)</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>JDBC driver class to use</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>MySQL user name</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<!--
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>uri1,uri2,... When set, the Hive metastore runs in remote mode instead of local mode; required for JDBC/ODBC connections to Hive with a MySQL-backed metastore.</description>
</property>
-->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>where warehouse data lives; this directory was created in HDFS above</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
<description>Hive's scratch directory: the HDFS root scratch dir for Hive jobs, created with write-all (733) permission. For each connecting user, an HDFS scratch dir ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log</value>
<description>directory for Hive query logs</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
</configuration>
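Hand-editing these <property> blocks is typo-prone; a tiny helper (illustrative, not from the original guide) that emits one block with consistent structure:

```shell
# Emit a single Hadoop/Hive <property> element for a name/value pair.
hive_prop() {    # hive_prop <name> <value>
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}
hive_prop hive.metastore.warehouse.dir /user/hive/warehouse
```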
-------------------------------------------finish hive-site.xml
cp hive-log4j.properties.template hive-log4j.properties
vi hive-log4j.properties
hive.log.dir=
This controls where Hive writes its runtime logs
(mine: hive.log.dir=/usr/hive/log/${user.name}).
hive.log.file=hive.log
This is the Hive log file name; the default is fine.
Only one setting really needs changing, or Hive reports a warning:
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
Without the change you see:
WARNING: org.apache.hadoop.metrics.EventCounter is deprecated.
please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
(Change it exactly as the warning suggests.)
-------------------------------------------------------finish all
Command to start the Hive metastore server:
hive --service metastore -p <port_num>
raini@biyuzhe:~/app/hive/tmp$ hive --service metastore > /tmp/hive_metastore.log 2>&1 &
[1] 26856
Here the Hive metastore runs in local mode, not remote mode.
An error appears:
Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)
On the first run the schema must be initialized:
raini@biyuzhe:~$ schematool -initSchema -dbType mysql -userName=hive -passWord=hive
Check the result of the initialization: $ schematool -dbType mysql -info
Start the Hadoop services: $ sbin/start-dfs.sh and $ sbin/start-yarn.sh
Start Hive: raini@biyuzhe:~/app$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/raini/app/hive2.0.0/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
(the same SSL warning is printed several more times)
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive (default)> show databases;
OK
default
Time taken: 1.017 seconds, Fetched: 1 row(s)
hive (default)>
hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';
Error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
Start the metastore:
raini@biyuzhe:~/app$ hive --service metastore
Starting Hive Metastore Server
hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';
OK
Time taken: 1.613 seconds
The metadata can now be seen in MySQL:
raini@biyuzhe:~$ mysql -u hive -p
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| 41 | 1461469991 | 1 | 0 | raini | 0 | 41 | test | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)
Check the generated file in HDFS:
raini@biyuzhe:~$ hdfs dfs -ls /user/hive/warehouse/
Found 1 items
drwxrwxrwx - raini supergroup 0 2016-04-24 11:53 /user/hive/warehouse/test