Configuring Pseudo-Distributed Apache Hadoop with HBase, Hive, ZooKeeper, Flume, Mahout and Spark 2 on macOS

CDH ships with Spark 1.6, and the official instructions for adding Spark 2 appear to require Cloudera Manager, which cannot be used on macOS. Since I needed Spark 2 for study, I had no choice but to set up a pseudo-distributed cluster with vanilla Apache Hadoop; this post records the whole process in detail.

1. Environment: macOS Sierra 10.12.6

                    JDK   : java version "1.8.0_131"

                    Maven : Apache Maven 3.5.0

                    Scala : Scala 2.12.2

                    MySQL : Server version: 5.7.18 MySQL Community Server

                    Passwordless SSH login already configured


2. Packages to download:

              apache-flume-1.7.0-bin.tar

      apache-hive-2.3.0-bin.tar

      apache-mahout-distribution-0.13.0.tar

      hadoop-2.8.1.tar

      hbase-1.3.1-bin.tar

      kafka_2.12-0.11.0.0.tar

      pig-0.17.0.tar

      spark-2.2.0-bin-hadoop2.7.tar

      sqoop-1.99.7-bin-hadoop200.tar

      sqoop-1.99.7.tar

      zookeeper-3.4.9.tar


3. Pseudo-distributed Hadoop configuration

3.1 Create a Hadoop/ directory under the home directory and move the extracted hadoop-2.8.1 into it

    cd ~

    mkdir Hadoop/

    mv ~/Downloads/hadoop-2.8.1 ~/Hadoop/

3.2 Configure hdfs-site.xml

    cd ~/Hadoop/hadoop-2.8.1/etc/hadoop

    vi hdfs-site.xml

    Add the following:

    <configuration>

      <property>

        <name>dfs.replication</name>

        <value>1</value>

      </property>

      <property>

        <name>dfs.namenode.name.dir</name>

        <value>/Users/hwg/Hadoop/hadoop-2.8.1/dfs/name</value>     

      </property>

      <property>

        <name>dfs.datanode.data.dir</name>

        <value>/Users/hwg/Hadoop/hadoop-2.8.1/dfs/data</value>       

      </property>

      <property>

        <name>dfs.permissions</name>

        <value>true</value>

        <description>'true' enables HDFS permission checking, 'false' disables it</description>

      </property>

    </configuration>

3.3 Edit hadoop-env.sh

    cd ~/Hadoop/hadoop-2.8.1/etc/hadoop

    vi hadoop-env.sh

    Set the JDK location:

    # The java implementation to use.

    export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home"

3.4 Edit core-site.xml

    cd ~/Hadoop/hadoop-2.8.1/etc/hadoop

    vi core-site.xml

    Add the following:

    <configuration>

      <property>

        <name>hadoop.tmp.dir</name>

        <value>/Users/hwg/Documents/apache/hadoop-2.8.1/hadoop_tmp</value>

        <description>A base for other temporary directories.</description>

      </property>

      <property>

        <name>fs.default.name</name>

        <value>hdfs://localhost:9000</value>

      </property>

      <property>

        <name>mapred.job.tracker</name>

        <value>hdfs://localhost:9001</value>

      </property>

      <property>

        <name>dfs.replication</name>

        <value>1</value>

      </property>

    </configuration>

3.5 Configure mapred-site.xml

    cd ~/Hadoop/hadoop-2.8.1/etc/hadoop

    cp mapred-site.xml.template mapred-site.xml

    vi mapred-site.xml

    Add the following:

    <configuration>

      <property>

        <name>mapred.job.tracker</name>

        <value>localhost:9001</value>

      </property>

    </configuration> 
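
    A side note (my own addition, not in the original write-up): mapred.job.tracker is a Hadoop 1 (MRv1) property; on Hadoop 2.x, MapReduce jobs run in local mode by default unless the framework is pointed at YARN. To have jobs run on the YARN daemons started below, the usual extra settings are, in mapred-site.xml:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

    and in yarn-site.xml:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>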

3.6 Start Hadoop

    ~/Hadoop/hadoop-2.8.1/bin/hadoop namenode -format

    ~/Hadoop/hadoop-2.8.1/sbin/start-all.sh

    If there are no errors, check the running daemons with jps:

       

    13760 NodeManager

    13686 ResourceManager

    13591 SecondaryNameNode

    13435 NameNode

    13503 DataNode
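
    As an optional sanity check (my own addition, not part of the original steps), create a directory in HDFS, put a file into it and list it; the /user/hwg path simply mirrors the username used elsewhere in this post and should be adapted:

    ~/Hadoop/hadoop-2.8.1/bin/hdfs dfs -mkdir -p /user/hwg
    ~/Hadoop/hadoop-2.8.1/bin/hdfs dfs -put ~/Hadoop/hadoop-2.8.1/etc/hadoop/core-site.xml /user/hwg/
    ~/Hadoop/hadoop-2.8.1/bin/hdfs dfs -ls /user/hwg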


4. Hive configuration

4.1 Metastore database setup

  I use a local MySQL instance as the Hive metastore database.

  After installing MySQL, log in:         mysql -u root

  Create a user:                          mysql> create user 'hive' identified by '123456';

  Grant privileges:                       mysql> grant all on *.* to 'hive'@'%' identified by '123456';

                                          mysql> grant all on *.* to 'hive'@'localhost' identified by '123456';

  Create the metastore database:          mysql> create database metastore;

  Initialize the schema (run this after 4.2 and 4.4 below are done, since schematool reads the JDBC settings from hive-site.xml and needs the MySQL driver on the classpath):

                                          ~/Hadoop/apache-hive-2.3.0-bin/bin/schematool -dbType mysql -initSchema

4.2 Configure hive-site.xml

  cd ~/Hadoop/apache-hive-2.3.0-bin/conf

  cp hive-default.xml.template hive-site.xml

  vi hive-site.xml

  The following properties need to be changed:

  

  <property>

    <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://localhost:3306/metastore?characterEncoding=UTF-8</value>

    <description>

      JDBC connect string for a JDBC metastore.

      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.

      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.

    </description>

  </property>

  

  <property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

  </property>

  

  <property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

    <description>Username to use against metastore database</description>

  </property>

  

  <property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>123456</value>

    <description>password to use against metastore database</description>

  </property>


  <property>

    <name>datanucleus.schema.autoCreateAll</name>

    <value>true</value>

    <description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once.To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>

  </property>

  

  <property>

    <name>hive.exec.local.scratchdir</name>

    <value>/tmp/hive/iotmp</value>

    <description>Local scratch space for Hive jobs</description>

  </property>


  <property>

    <name>hive.downloaded.resources.dir</name>

    <value>/tmp/hive/iotmp</value>

    <description>Temporary local directory for added resources in the remote file system.</description>

  </property>

  

  <property>

    <name>hive.querylog.location</name>

    <value>/Users/hwg/hive/iotmp</value>

    <description>Location of Hive run time structured log file</description>

  </property>


4.3 Configure hive-env.sh

  Add the following:

   export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

   export HIVE_CONF_DIR=/Users/hwg/Hadoop/apache-hive-2.3.0-bin/conf

   export HADOOP_HOME=/Users/hwg/Hadoop/hadoop-2.8.1
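
   Note (my own addition): hive-env.sh is not present out of the box; it can be created from the template that ships with Hive before adding the lines above:

   cd ~/Hadoop/apache-hive-2.3.0-bin/conf
   cp hive-env.sh.template hive-env.sh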


4.4 Add the MySQL Connector/J jar

   cp ~/Downloads/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar ~/Hadoop/apache-hive-2.3.0-bin/lib/

       

4.5 Start Hive

   ~/Hadoop/apache-hive-2.3.0-bin/bin/hive
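
   As a quick smoke test (my own addition), the student table that is queried later in section 9.5 can be created from the Hive CLI; the column types are an assumption based on the output shown there:

   hive> create table student (name string, age int, score double);
   hive> show tables;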


5. HBase configuration

5.1 Configure hbase-env.sh

    cd ~/Hadoop/hbase-1.3.1/conf

    vi hbase-env.sh

    Add the following:

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

    export HBASE_CLASSPATH=/Users/hwg/Hadoop/hbase-1.3.1/conf

    export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

    export HBASE_MANAGES_ZK=true

5.2 Configure hbase-site.xml

    cd ~/Hadoop/hbase-1.3.1/conf

    vi hbase-site.xml

    Add the following:

    <configuration>

      <property>

        <name>hbase.rootdir</name>

        <value>hdfs://localhost:9000</value>

        <description>The HBase root directory, i.e. where HBase stores its data on HDFS</description>

      </property>

      <property>

        <name>dfs.replication</name>

        <value>1</value>

        <description>Number of replicas for HLogs and HFiles; it must not exceed the number of HDFS DataNodes. In pseudo-distributed mode there is only one DataNode, so set it to 1</description>

      </property>

      <property>

        <name>hbase.cluster.distributed</name>

        <value>true</value>

      </property>

    </configuration>

5.3 Start HBase

    HDFS must be running first.

    Then start HBase:

        cd ~/Hadoop/hbase-1.3.1/

        bin/start-hbase.sh

    Check the processes with jps:

     32161 ResourceManager

     31954 DataNode

     33458 HRegionServer

     33523 Jps

     31876 NameNode

     32052 SecondaryNameNode

     32244 NodeManager

     33303 HQuorumPeer

     33352 HMaster


    OK!
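
    As an extra check (my own addition, not in the original post), the HBase shell can create and list a table; the table and column family names are arbitrary examples:

        cd ~/Hadoop/hbase-1.3.1/
        bin/hbase shell
        hbase(main):001:0> create 'test', 'cf'
        hbase(main):002:0> list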


6. ZooKeeper configuration (of little practical use in pseudo-distributed mode)

6.1 Set the ZK_HOME environment variable

    vi ~/.profile

    Add the following:

    export ZK_HOME="/Users/hwg/Hadoop/zookeeper-3.4.9"

    export PATH=${ZK_HOME}/bin:${JAVA_HOME}/bin:${PATH}

    

    Run source ~/.profile to apply it immediately.

    Test:

    zkServer.sh status

    Output:

    ZooKeeper JMX enabled by default

    Using config: /Users/hwg/Hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg

    grep: /Users/hwg/Hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg: No such file or directory

    mkdir: : No such file or directory

    grep: /Users/hwg/Hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg: No such file or directory

    grep: /Users/hwg/Hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg: No such file or directory

    Error contacting service. It is probably not running.


    OK! This is expected, since zoo.cfg does not exist yet; continue with the configuration.

6.2 Configure zoo.cfg

   http://blog.csdn.net/u011523533/article/details/48626199

   That article is quite detailed and its steps work as written; a minimal standalone configuration is also sketched below.
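
   For reference, a minimal standalone setup looks roughly like this (my own sketch; the dataDir path is an assumption and can be any writable directory):

   cd ~/Hadoop/zookeeper-3.4.9/conf
   cp zoo_sample.cfg zoo.cfg
   vi zoo.cfg

   # typical standalone settings inside zoo.cfg:
   tickTime=2000
   dataDir=/Users/hwg/Hadoop/zookeeper-3.4.9/data
   clientPort=2181

   After this, zkServer.sh start should bring up a standalone ZooKeeper, and zkServer.sh status should report Mode: standalone.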


7. Mahout configuration

7.1 Add environment variables

    vi ~/.profile

    Add the following:

    export MAHOUT_HOME="/Users/hwg/Hadoop/apache-mahout-distribution-0.13.0"

    export MAHOUT_CONF_DIR="/Users/hwg/Hadoop/apache-mahout-distribution-0.13.0/conf"

    export PATH=${MAHOUT_HOME}/bin:${MAHOUT_CONF_DIR}:$PATH

    

    Run source ~/.profile to apply it immediately.

7.2 Run mahout

    mahout

    Output:

    

    MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

    Running on hadoop, using /Users/hwg/Hadoop/hadoop-2.8.1/bin/hadoop HADOOP_CONF_DIR=/Users/hwg/Hadoop/hadoop-2.8.1/etc/hadoop

MAHOUT-JOB: /Users/hwg/Hadoop/apache-mahout-distribution-0.13.0/mahout-examples-0.13.0-job.jar. 

An example program must be given as the first argument.

Valid program names are:


     No problems here; the "MAHOUT_LOCAL is not set" message is normal.


8. Flume installation

8.1 Add environment variables

    vi ~/.profile

    Add the following:

    export FLUME_HOME="/Users/hwg/Hadoop/apache-flume-1.7.0-bin"

    export PATH=${FLUME_HOME}/bin:${PATH}


    Run source ~/.profile to apply it immediately.

8.2 Configure flume-env.sh

    cd $FLUME_HOME/conf

    cp flume-env.sh.template flume-env.sh
    vi flume-env.sh

    Add the following:

    export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home"

    export HADOOP_HOME="/Users/hwg/Hadoop/hadoop-2.8.1"

8.3 Verify the version

     flume-ng version

        Here I ran into a problem:

          Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty

    Cause: this error appears once HBase has been installed.

    Fix: comment out the following line in $HBASE_HOME/conf/hbase-env.sh:

         export HBASE_CLASSPATH=/Users/hwg/Hadoop/hbase-1.3.1/conf

    Verify the version again:

    Output:

    Flume 1.7.0

    Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git

    Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707

    Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016

    From source with checksum 0d21b3ffdc55a07e1d08875872c00523

    

    OK!
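
    To actually run Flume (my own addition, not covered in the original post), an agent needs a configuration file. A minimal netcat-to-logger agent, along the lines of the example in the Flume user guide, can be set up as follows (the file name example.conf and the agent name a1 are arbitrary):

    cd $FLUME_HOME
    vi conf/example.conf

    # contents of example.conf: netcat source -> memory channel -> logger sink
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sinks.k1.type = logger
    a1.channels.c1.type = memory
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    Start the agent, then send it a line from another terminal with nc localhost 44444:

    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/example.conf --name a1 -Dflume.root.logger=INFO,console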


9. Spark installation and configuration

9.1 Add environment variables (to ~/.profile, as before)

    export SPARK_HOME="/Users/hwg/Hadoop/spark-2.2.0-bin-hadoop2.7"

    export PATH=${SPARK_HOME}/bin:${PATH}


9.2 Configure slaves

    cd $SPARK_HOME/conf

    cp slaves.template slaves

   

9.3 Configure spark-env.sh

    cd $SPARK_HOME/conf/

    cp spark-env.sh.template spark-env.sh

    vi spark-env.sh

    Add the following:

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

    export HADOOP_HOME=/Users/hwg/Hadoop/hadoop-2.8.1

    export SCALA_HOME=/Users/hwg/scala-2.12.2

    export HADOOP_CONF_DIR=/Users/hwg/Hadoop/hadoop-2.8.1/etc/hadoop

    export SPARK_MASTER_IP=localhost

    export SPARK_WORKER_MEMORY=512M


9.4 Start Spark

    Prerequisite: the pseudo-distributed Hadoop cluster is already running.

    $SPARK_HOME/sbin/start-all.sh

    $SPARK_HOME/bin/spark-shell

    

    Here I ran into a problem:

    org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x;

           Fix:

     hadoop fs -chmod 777 /tmp/hive

     Start spark-shell again:

     

     Setting default log level to "WARN".

     To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

     17/09/07 21:38:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your      platform... using builtin-java classes where applicable

     17/09/07 21:38:08 WARN metastore.ObjectStore: Failed to get database global_temp, returning      NoSuchObjectException

     Spark context Web UI available at http://127.0.0.1:4040

     Spark context available as 'sc' (master = local[*], app id = local-1504791481406).

     Spark session available as 'spark'.

     Welcome to

            ____              __

          / __/__  ___ _____/ /__

         _\ \/ _ \/ _ `/ __/  '_/

        /___/ .__/\_,_/_/ /_/\_\   version 2.2.0

           /_/

         

     Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)

     Type in expressions to have them evaluated.

     Type :help for more information.


     OK!
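
     As one more quick check of the SparkContext (my own addition), a trivial job can be run from the shell:

     scala> sc.parallelize(1 to 100).reduce(_ + _)
     res0: Int = 5050
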
9.5 Accessing Hive tables from Spark (related issues)
      To query Hive tables from Spark SQL, the Hive configuration file has to be copied into Spark's conf directory:
      cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf

       However, after the copy, running $SPARK_HOME/bin/spark-shell reports an error:
       javax.jdo.JDOFatalInternalException: Error creating transactional connection factory

at

   ... ...

   NestedThrowablesStackTrace:

   java.lang.reflect.InvocationTargetException

   ... ...

   Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin    to create a ConnectionPool gave an error : The specified datastore driver 

   ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH 

   specification, and the name of the driver.

   ... ... 

   Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The 

   specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please 

   check your CLASSPATH specification, and the name of the driver.

   ... ...

   

   Fix:

   vi $HIVE_HOME/conf/hive-site.xml

   Modify the hive.metastore.uris property:

       <name>hive.metastore.uris</name>
       <value>thrift://namenode1:9083</value>
 

   Copy the modified hive-site.xml to $SPARK_HOME/conf/ again:

   rm $SPARK_HOME/conf/hive-site.xml

   cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
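
   Note (my own addition): with hive.metastore.uris set, Spark connects to a standalone metastore service at that address, so the metastore service must be running; if it is not, it can be started in the background with:

   $HIVE_HOME/bin/hive --service metastore &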

   

   Run $SPARK_HOME/bin/spark-shell again;

   the error no longer appears.

   Test:

   scala> spark.sql("select * from student").show()     Note: student is a table I created in Hive

   Output:

   +-----+---+-----+

   | name|age|score| 

   +-----+---+-----+

   | John| 20| 88.0|

   |Marry| 21| 93.0|

   |  Pet| 22| 78.0|

   |  Tom| 22| 89.0|

   | Judy| 22| 90.0|

   | Andy| 24| 91.0|

   +-----+---+-----+

   OK!

10. Sqoop2 installation and configuration

10.1 Configure Hadoop proxy-user access

    vi $HADOOP_HOME/etc/hadoop/core-site.xml

    Add the following (the Sqoop documentation uses $SERVER_USER as a placeholder for the OS user that runs the Sqoop2 server; here that user is hwg, so it is substituted into the property names):

    

    <property>
      <name>hadoop.proxyuser.hwg.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hwg.groups</name>
      <value>*</value>
    </property>

10.2 Add environment variables

    vi ~/.profile

    Add the following:

    export SQOOP_HOME="/Users/hwg/Hadoop/sqoop-1.99.7-bin-hadoop200"

    export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra  # the MySQL driver jar needs to be copied into this directory

    

10.3 Sqoop server configuration

    Configure $SQOOP_HOME/conf/sqoop_bootstrap.properties:

    The default values are fine.

    

    Configure $SQOOP_HOME/conf/sqoop.properties:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/Users/hwg/Hadoop/hadoop-2.8.1/etc/hadoop

org.apache.sqoop.security.authentication.type=SIMPLE

org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler

org.apache.sqoop.security.authentication.anonymous=true


10.4 Verify the installation

    $SQOOP_HOME/bin/sqoop2-tool verify

    Output:

    

Sqoop home directory: /Users/hwg/Hadoop/sqoop-1.99.7-bin-hadoop200

Sqoop tool executor:

Version: 1.99.7

Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine

Running tool: class org.apache.sqoop.tools.tool.VerifyTool

0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.

9    [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/Users/hwg/Hadoop/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/Users/hwg/Hadoop/apache-hive-2.3.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Verification was successful.

Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.


OK!
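
To actually use Sqoop2 (my own addition, not covered in the original post), start the server and open the interactive shell; show version --all checks client/server connectivity:

    $SQOOP_HOME/bin/sqoop2-server start
    $SQOOP_HOME/bin/sqoop2-shell
    sqoop:000> show version --all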
