Setting up a pseudo-distributed Hadoop + Spark + MySQL + Hive learning environment on Ubuntu


Just follow the steps in order:

(1)raini@biyuzhe:~$ gedit .bashrc

#java
export JAVA_HOME=/home/raini/app/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$JRE_HOME/bin:$PATH

#scala
export SCALA_HOME=/home/raini/app/scala-2.10.6
export PATH=${SCALA_HOME}/bin:$PATH

#spark
export SPARK_HOME=/home/raini/spark1
export PATH=$PATH:$SPARK_HOME/bin

# hadoop2.6
export HADOOP_PREFIX=/home/raini/hadoop2
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/:$HADOOP_PREFIX/bin:$PATH"
export HADOOP_PREFIX PATH CLASSPATH
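After saving .bashrc, a quick sanity check (an extra step, not part of the original walkthrough) confirms the variables point at real installs:

raini@biyuzhe:~$ source .bashrc
raini@biyuzhe:~$ java -version         # should report 1.7.0_79
raini@biyuzhe:~$ echo $HADOOP_PREFIX   # should print /home/raini/hadoop2
raini@biyuzhe:~$ scala -version        # should report 2.10.6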


(2)raini@biyuzhe:~$ sudo apt-get install rsync

(3)raini@biyuzhe:~$ sudo apt-get install openssh-server

cd ~/.ssh/   # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa   # just press Enter at every prompt
cat id_rsa.pub >> authorized_keys  # authorize the key
Run ssh localhost to check that you can now log in without a password.
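If ssh localhost still asks for a password, directory permissions are the usual culprit; a common fix (not part of the original steps) is:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys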

(4)raini@biyuzhe:~$ sudo gedit /etc/hosts

127.0.0.1    localhost
127.0.1.1    biyuzhe
#10.155.243.206  biyuzhe
# some guides say this mapping must be changed here, otherwise connection-refused errors may appear later

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

(5) Edit the configuration file etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/home/raini/app/jdk
    export HADOOP_COMMON_HOME=/home/raini/hadoop

(6)raini@biyuzhe:~$ gedit .bashrc

    Add the Hadoop bin and sbin directories to PATH, e.g.:
    export PATH="/home/raini/hadoop/bin:/home/raini/hadoop/sbin:$JAVA_HOME/:$HADOOP_PREFIX/bin:$PATH"

(7) Edit etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/raini/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
  <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
   <property>
       <name>hadoop.proxyuser.master.hosts</name>
        <value>*</value>
   </property>
   <property>
       <name>hadoop.proxyuser.master.groups</name>
       <value>*</value>
   </property>
</configuration>
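Since hadoop.tmp.dir points at a local path, it does no harm to create that directory up front (a small extra step, not in the original):

raini@biyuzhe:~$ mkdir -p /home/raini/hadoop/tmp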


(8) Edit etc/hadoop/hdfs-site.xml:

<configuration>
  <!-- Secondary NameNode HTTP address and port (optional, left commented out here)
      <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>localhost:9001</value>
     </property>-->
  <!-- Officially only fs.defaultFS and dfs.replication are required to run,
       but if hadoop.tmp.dir is not set the default temporary directory /tmp/hadoop-${user.name} is used.
       That directory may be cleared on reboot, the NameNode process then disappears and a re-format is required, hence the settings below. -->
     <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home/raini/hadoop/tmp/dfs/namenode</value>
     </property>
     <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/raini/hadoop/tmp/dfs/datanode</value>
     </property>

  <!-- Number of replicas -->
     <property>
            <name>dfs.replication</name>
            <value>1</value>
     </property>
  <!-- Must be true, otherwise WebHDFS operations that list file status, such as LISTSTATUS and GETFILESTATUS, cannot be used, because that information is held by the NameNode. -->
     <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
     </property>
</configuration>


(9) Edit mapred-site.xml
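In Hadoop 2.x only mapred-site.xml.template ships by default, so copy it first (assuming the standard distribution layout), then fill it in as below:

raini@biyuzhe:~/hadoop$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml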

<configuration>

    <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
    </property>
<!-- mapred.job.tracker is the old MapReduce v1 setting and is not needed under YARN:
    <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9001</value>
    </property>
-->
    <property>
          <name>mapreduce.jobhistory.address</name>
          <value>localhost:10020</value>
    </property>

     <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>localhost:19888</value>
     </property>

</configuration>


(10) Edit yarn-site.xml

<configuration>  

<!-- Site specific YARN configuration properties-->

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>

    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
<!-- ResourceManager address -->
    <property>
       <name>yarn.resourcemanager.address</name>
       <value>localhost:8032</value>
    </property>
<!-- ResourceManager scheduler address -->
    <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>localhost:8030</value>
    </property>

    <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>localhost:8031</value>
    </property>
<!-- ResourceManager admin address -->
     <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
<!-- ResourceManager web UI port, used to monitor job resource scheduling -->
    <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>localhost:8088</value>
    </property>

</configuration>


(11)    raini@biyuzhe:~$ source .bashrc
    
    raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh

Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:7Th7Qu6av5WOqmmVLemv3YN+52LAcHw4BuFBNwBt5DU.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
14242 Jps
14106 SecondaryNameNode
13922 DataNode ------------------ (no NameNode)


(12) The NameNode is missing because HDFS has not been formatted yet, so format it and restart HDFS:

raini@biyuzhe:~/hadoop$ hdfs namenode -format

raini@biyuzhe:~/hadoop$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
localhost: no namenode to stop
biyuzhe: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out

raini@biyuzhe:~/hadoop$ jps
14919 NameNode ----------------------- (NameNode now present)
15407 Jps
15271 SecondaryNameNode
15073 DataNode


(13)raini@biyuzhe:~/hadoop$ sbin/start-yarn.sh  

starting yarn daemons
starting resourcemanager, logging to /home/raini/hadoop/logs/yarn-raini-resourcemanager-biyuzhe.out
biyuzhe: starting nodemanager, logging to /home/raini/app/hadoop-2.7.2/logs/yarn-raini-nodemanager-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
15625 NodeManager
14919 NameNode
15271 SecondaryNameNode
15073 DataNode
15937 Jps
15501 ResourceManager


(14) Verification: YARN web UI: http://localhost:8088/

    HDFS NameNode web UI: http://localhost:50070
        
 

Overview 'localhost:9000' (active)

Started: Sat Apr 23 14:04:17 CST 2016
Version: 2.7.2, rb165c4fe8a74265c792ce23f546c64604acf0e41
Compiled: 2016-01-26T00:08Z by jenkins from (detached from b165c4f)
Cluster ID: CID-b0ad8d51-6ea3-4bfc-a1d8-ee0cbc9a8ff6
Block Pool ID: BP-890697487-127.0.1.1-1461391390144
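As an optional smoke test (extra commands, not from the original post), copy a few files into HDFS and list them back:

raini@biyuzhe:~$ hdfs dfs -mkdir -p /user/raini/input
raini@biyuzhe:~$ hdfs dfs -put ~/hadoop/etc/hadoop/*.xml /user/raini/input
raini@biyuzhe:~$ hdfs dfs -ls /user/raini/input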

-------------------------------- Spark installation

(2) Configure the Spark environment variables in .bashrc (step (1), downloading and extracting Spark, is assumed)

export SPARK_HOME=/home/raini/spark
export PATH=${SPARK_HOME}/bin:$PATH

(3) Configure spark-env.sh
export JAVA_HOME=/home/raini/app/jdk
export SCALA_HOME=/home/raini/app/scala
export SPARK_WORKER_MEMORY=4g
export SPARK_MASTER_IP=biyuzhe            # Spark 1.x reads SPARK_MASTER_IP, not SPARK_MASTER
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099       # likewise SPARK_MASTER_WEBUI_PORT, not SPARK_MASTER_WEBUI
export SPARK_WORKER_CORES=2
export HADOOP_CONF_DIR=/home/raini/hadoop/etc/hadoop

(4) cp slaves.template slaves (in Spark's conf directory), then set the worker hostname in it:

#localhost
biyuzhe

 

Start Spark with spark/sbin/start-all.sh
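A quick way to confirm the master and worker are up (an extra check, not in the original; the master URL follows the settings above) is to run a trivial job from spark-shell:

raini@biyuzhe:~$ spark-shell --master spark://biyuzhe:7077
scala> sc.parallelize(1 to 100).sum()   // should return 5050.0
scala> :quit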

------------------------ MySQL and Hive 2.0.0 installation

1) MySQL installation

$sudo apt-get install mysql-server

Log in to MySQL: $ mysql -u root -p

Create the hive database: mysql> create database hive;

                mysql> show databases;   -- check that it was created

Be sure to change the hive database's character set to latin1, and do it before Hive starts for the first time (otherwise delete operations will hang later):
                 mysql> alter database hive character set latin1;

Create the hive user and grant privileges: mysql> grant all on hive.* to hive@'%' identified by 'hive';

(Alternative: mysql> DROP USER 'hive'@'%';

         mysql> create user 'hive'@'%' identified by 'hive';
       then grant privileges: grant all privileges on *.* to 'hive'@'%' with grant option;

     )

Reload the privilege tables: mysql> flush privileges;

Check the MySQL version: mysql> select version();   -- 5.7.11-0ubuntu6 here
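It is also worth confirming that the new hive account can actually log in (an extra check, not in the original):

raini@biyuzhe:~$ mysql -u hive -p -e "show databases;"
(should list the hive database; if access is denied from localhost, additionally grant privileges to 'hive'@'localhost')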

Download the MySQL JDBC driver from: http://dev.mysql.com/downloads/connector/j/

Download mysql-connector-java-5.1.38.tar.gz and copy the MySQL JDBC driver jar into Hive's lib directory.
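For example (the jar name matches the version mentioned above; adjust if a different release was downloaded):

raini@biyuzhe:~$ tar -zxvf mysql-connector-java-5.1.38.tar.gz
raini@biyuzhe:~$ cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar ~/app/hive-2.0.0/lib/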

2) Hive installation

Download apache-hive-2.0.0-bin.tar.gz from http://hive.apache.org/ and extract it (here under /home/raini/app/hive-2.0.0, matching HIVE_HOME below).

Environment configuration

Add the following to .bashrc:
#Hive
export HIVE_HOME=/home/raini/app/hive-2.0.0
export PATH=$PATH:${HIVE_HOME}/bin
export CLASSPATH=$CLASSPATH:.:${HIVE_HOME}/lib

3) Configure hive-env.sh

Copy hive-env.sh.template to hive-env.sh and edit it.

Set HADOOP_HOME and HIVE_CONF_DIR as follows:

HADOOP_HOME=/home/.../hadoop

export HIVE_CONF_DIR=/home/.../hive/conf

# export HADOOP_HEAPSIZE=512

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/raini/app/hive-2.0.0/lib


4) Configure hive-site.xml

      Hive uses Hadoop, so:
 
    you must have Hadoop in your path OR
    export HADOOP_HOME=<hadoop-install-dir>
 
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w in HDFS before you can create a table in Hive.
 
Commands to perform this setup (the directories need group write permission):

raini@biyuzhe:~$ hadoop fs -mkdir -p  /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/log
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/warehouse
raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/tmp

raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/log
raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/warehouse

You may find it useful, though it's not necessary, to set HIVE_HOME:
 
  $ export HIVE_HOME=<hive-install-dir>
    export HIVE_HOME=/home/raini/app/hive

$ sudo /etc/init.d/mysql status

hive/lib must contain mysql-connector-java-5.1.38-bin.jar

5) Hive configuration: store the metastore in MySQL. Hive keeps its metadata in an RDBMS; by default it is configured to use the embedded Derby database.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
   <name>hive.metastore.local</name>
   <value>true</value>
  <description>Store metadata using a MySQL server on this machine; this mode requires a local MySQL server to be running.</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value> <!-- do not use the hostname biyuzhe here -->
   <description>Metastore database connection URL; a ?characterEncoding=UTF-8 suffix can also be appended</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
   <description>JDBC driver class used for the connection</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>hive</value>
   <description>MySQL user name</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>hive</value>
</property>
<!--
<property>
     <name>hive.metastore.uris</name>
     <value>thrift://localhost:9083</value>
     <description>uri1,uri2,... With this parameter the Hive metastore runs in remote mode instead of local mode; required when connecting to Hive over JDBC/ODBC with a MySQL-backed metastore.</description>
</property>
-->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Where table data is stored in HDFS; this is the directory created above with hadoop fs -mkdir.</description>
</property>

<property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs, created with write-all (733) permission. For each connecting user an HDFS scratch dir ${hive.exec.scratchdir}/&lt;username&gt; is created with ${hive.scratch.dir.permission}. This is the /user/hive/tmp directory created above.</description>
  </property>

<property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
    <description>Directory used for Hive query logs.</description>
</property>

<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

</configuration>

-------------------------------------------finish hive-site.xml

cp hive-log4j.properties.template  hive-log4j.properties

vi hive-log4j.properties

hive.log.dir=

This controls where Hive writes its log files at runtime.

(mine: hive.log.dir=/usr/hive/log/${user.name})

hive.log.file=hive.log

This is the name of the Hive log file.

The default is fine, as long as you can recognize it as the log.

Only one setting really needs to be changed, otherwise a warning is reported:

log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Without this change you will see:

WARNING: org.apache.hadoop.metrics.EventCounter is deprecated.

please use org.apache.hadoop.log.metrics.EventCounter  in all the  log4j.properties files.

(Just change it as the warning suggests.)
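If you prefer, the same change can be made with a one-line sed (an equivalent shortcut, assuming the template contains the deprecated class name):

raini@biyuzhe:~/app/hive-2.0.0/conf$ sed -i 's/org.apache.hadoop.metrics.EventCounter/org.apache.hadoop.log.metrics.EventCounter/' hive-log4j.properties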

-------------------------------------------------------finish all

Command to start the Hive metastore server:
hive --service metastore -p <port_num>

raini@biyuzhe:~/app/hive/tmp$ hive --service metastore > /tmp/hive_metastore.log 2>&1 &
[1] 26856

Here the Hive metastore (metadata store) runs in local mode, not remote mode.

Error reported:
    Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)


The first time, the metastore schema must be initialized:
raini@biyuzhe:~$ schematool -initSchema -dbType mysql -userName=hive -passWord=hive



Check the schema info after initialization: $ schematool -dbType mysql -info

Start the Hadoop services: $ sbin/start-dfs.sh and $ sbin/start-yarn.sh

 

Start Hive: raini@biyuzhe:~/app$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/raini/app/hive2.0.0/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
(the same SSL warning is printed several more times)
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive (default)> show databases;
OK
default
Time taken: 1.017 seconds, Fetched: 1 row(s)
hive (default)>

hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';

Error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)


Start the metastore:

raini@biyuzhe:~/app$ hive --service metastore
Starting Hive Metastore Server


hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';
OK
Time taken: 1.613 seconds

The table's metadata can now be seen in MySQL:

raini@biyuzhe:~$ mysql -u hive -p

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
|     41 |  1461469991 |     1 |                0 | raini |         0 |    41 | test     | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)

 

View the files created in HDFS:

raini@biyuzhe:~$ hdfs dfs -ls /user/hive/warehouse/
Found 1 items
drwxrwxrwx   - raini supergroup          0 2016-04-24 11:53 /user/hive/warehouse/test
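As a final end-to-end check (the sample rows below are hypothetical, not from the original post), load a small CSV into the test table and query it back:

raini@biyuzhe:~$ printf "1,tom\n2,jerry\n" > /tmp/test.csv
hive (default)> LOAD DATA LOCAL INPATH '/tmp/test.csv' INTO TABLE test;
hive (default)> SELECT * FROM test;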

 
