I. Node deployment
1. CDH 4.2.1 HA installation nodes:
IP | Host | Processes | Notes |
---|---|---|---|
10.32.71.18 | master1.jnhadoop.com | NameNode JournalNode ZooKeeper HdfsZkfc | NN starts as active by default |
10.32.71.19 | master2.jnhadoop.com | JobTracker JournalNode ZooKeeper MrZkfc | JT starts as active by default |
10.32.71.20 | master3.jnhadoop.com | NameNode JobTracker JournalNode ZooKeeper HdfsZkfc MrZkfc | NN and JT start as standby by default |
10.32.71.21 | 71.21.jnhadoop.com | TaskTracker DataNode | |
10.32.71.22 | 71.22.jnhadoop.com | TaskTracker DataNode | |
10.32.71.23 | 71.23.jnhadoop.com | TaskTracker DataNode | |
10.32.71.24 | 71.24.jnhadoop.com | TaskTracker DataNode | |
10.32.71.25 | 71.25.jnhadoop.com | TaskTracker DataNode | |
10.32.71.26 | 71.26.jnhadoop.com | TaskTracker DataNode | |
10.32.71.27 | 71.27.jnhadoop.com | TaskTracker DataNode | |
10.32.71.28 | 71.28.jnhadoop.com | TaskTracker DataNode | |
10.32.71.29 | 71.29.jnhadoop.com | TaskTracker DataNode | |
10.32.71.31 | 71.31.jnhadoop.com | TaskTracker DataNode | Client(aalog/hljlog) |
2. Hive 0.10 installation nodes:
IP | Host | Processes | Notes |
---|---|---|---|
10.32.71.45 | | hive-mysql | |
10.32.71.18 | master1.jnhadoop.com | hive-MetaStore | |
10.32.71.31 | 71.31.jnhadoop.com | hive-client | |
3. Impala 1.0.1 installation nodes:
IP | Host | Processes | Notes |
---|---|---|---|
10.32.71.19 | master2.jnhadoop.com | state-store | Cluster state-store node |
10.32.71.21 | 71.21.jnhadoop.com | impalad | |
10.32.71.22 | 71.22.jnhadoop.com | impalad | |
10.32.71.23 | 71.23.jnhadoop.com | impalad | |
10.32.71.24 | 71.24.jnhadoop.com | impalad | |
10.32.71.25 | 71.25.jnhadoop.com | impalad | |
10.32.71.26 | 71.26.jnhadoop.com | impalad | |
10.32.71.27 | 71.27.jnhadoop.com | impalad | |
10.32.71.28 | 71.28.jnhadoop.com | impalad | |
10.32.71.29 | 71.29.jnhadoop.com | impalad | |
10.32.71.31 | 71.31.jnhadoop.com | impalad | impala-client |
II. CDH 4.2.1 HA installation
See the official installation guide:
http://www.cloudera.com/content/support/en/documentation/cdh4-documentation/cdh4-documentation-v4-2-1.html
III. MySQL installation
Install on node 10.32.71.45:
1. Install MySQL via yum
2. Start the service
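The commands for steps 1 and 2 are not given in the original. On a RHEL/CentOS-style node they would presumably look like the following (the package name mysql-server and service name mysqld are assumptions, not stated in this doc):

```shell
# Assumed package/service names for a RHEL/CentOS-era MySQL install
sudo yum -y install mysql-server
sudo service mysqld start
```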
4. Set the root password
Run /usr/bin/mysql_secure_installation; the interactive prompts look like:

```
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
```
5. Enable MySQL at boot:
Run:

```
/sbin/chkconfig mysqld on
```
6. Create the hive account: hive/******
Log in to MySQL as root and run:

```
mysql -u root -p
CREATE USER hive@localhost IDENTIFIED BY '******';
```
7. Flush privileges
8. Create the database

```
create database hive_impala;
```
9. Grant privileges to the hive account

```
grant all privileges on hive_impala.* to hive identified by '******' with grant option;
```
10. Flush privileges
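The two "flush privileges" steps give no command; presumably they mean reloading the grant tables so the new account and grants take effect, e.g. from the shell:

```shell
# Reload MySQL's grant tables (prompts for the root password set above)
mysql -u root -p -e "FLUSH PRIVILEGES;"
```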
IV. Hive installation
1. MetaStore installation
(1) On 10.32.71.18, run:

```
sudo yum install hive hive-metastore
```
Note: in earlier releases, the Hive MetaStore and client were installed together; it is now recommended to install the MetaStore and the client separately.
(2) Download mysql-connector-java.jar and copy it into Hive's lib directory.
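One hedged way to do step (2) (the package name and both paths below are typical CDH4-era locations, not taken from this doc):

```shell
# Install the connector from the distro repos, then copy it into Hive's lib dir
# (/usr/lib/hive/lib is the usual location for a CDH package install)
sudo yum -y install mysql-connector-java
sudo cp /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/
```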
(3) Edit the configuration file /etc/hive/conf/hive-site.xml:

```
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.32.71.45:3306/hive_impala?createDatabaseIfNotExist=true</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>******</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.autoCreateTables</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoCreateColumns</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>false</value>
</property>
```
(4) Start the metastore service:

```
sudo service hive-metastore start
```
2. Client installation
(1) On 10.32.71.31, install the Hive client.
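The install command is omitted here; with CDH4 packages the client ships in the hive package, so presumably:

```shell
# Hive CLI and client libraries (CDH4 package name, assumed)
sudo yum install hive
```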
(2) Edit the configuration file /etc/hive/conf/hive-site.xml:

```
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>3600</value>
  <description>MetaStore Client socket timeout in seconds</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.32.71.18:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
```
Notes:
(1) If you want to create a new database instead of using the default one, you must first create the directory /user/hive/warehouse on HDFS, matching the setting hive.metastore.warehouse.dir=/user/hive/warehouse.
(2) If you use the default database but do not specify a LOCATION when creating tables, the tables are created under /user/hive/warehouse automatically, so that directory must also exist on HDFS.
(3) If you use the default database and specify a LOCATION under a directory the current user can access, then /user/hive/warehouse does not need to be created.
(4) Not yet tested: what happens when hive.metastore.warehouse.dir points to a directory the current user can access, and either the default database is not used or no LOCATION is specified at table creation.
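For cases (1) and (2) above, the warehouse directory can be created on HDFS roughly as follows (the hive:hive ownership is an assumption, not from this doc):

```shell
# Create the warehouse directory as the HDFS superuser, then hand it to hive
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown -R hive:hive /user/hive/warehouse
```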
V. Impala installation
1. On each node where Impala is to be installed, run:

```
sudo yum install impala             # Binaries for daemons
sudo yum install impala-server      # Service start/stop script
sudo yum install impala-state-store # Service start/stop script
```
2. Edit the hdfs-site.xml configuration file on each DataNode
Enable dfs.client.read.shortcircuit:

```
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
```

Enable local block metadata tracking:

```
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
```
Notes:
(1) After editing hdfs-site.xml, distribute it to /etc/hadoop/conf/ on the other DataNodes.
(2) After distribution, restart all DataNodes.
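A sketch of steps (1) and (2); the node list is taken from the table above, while root SSH access and the service name hadoop-hdfs-datanode (CDH4 packaged HDFS) are assumptions:

```shell
# Push the edited hdfs-site.xml to every DataNode, then restart its service
for h in 71.21 71.22 71.23 71.24 71.25 71.26 71.27 71.28 71.29 71.31; do
  scp /etc/hadoop/conf/hdfs-site.xml root@${h}.jnhadoop.com:/etc/hadoop/conf/
  ssh root@${h}.jnhadoop.com 'service hadoop-hdfs-datanode restart'
done
```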
3. Edit the Impala startup parameters
On the state-store node, edit the startup configuration file /etc/default/impala: set the state-store IP address and the impalad memory limit.

```
IMPALA_STATE_STORE_HOST=10.32.71.19
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala

IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -mem_limit=70%"

ENABLE_CORE_DUMPS=false

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf
```
4. Distribute the configuration files to each Impala node under /etc/impala/conf/. The files are:
(1) hdfs-site.xml: with dfs.client.read.shortcircuit and dfs.datanode.hdfs-blocks-metadata.enabled enabled.
(2) core-site.xml: both hdfs-site.xml and core-site.xml are copied directly from the Hadoop configuration files.
(3) hive-site.xml: only the Hive client configuration file is needed, i.e. the one containing the metastore service address.
(4) impala: the startup parameter file with the state_store IP and the impalad memory limit set as above.
5. Set JAVA_HOME
Set JAVA_HOME in /etc/default/bigtop-utils on every Impala node:

```
export JAVA_HOME=/opt/apps/jdk
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH
```
6. Start the Impala cluster
On the state-store node (10.32.71.19), run:

```
service impala-state-store start
```

On each impalad node, run:

```
service impala-server start
```
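A quick liveness check after starting; the ports below are Impala's default debug web UI ports (25010 for the statestore, 25000 for impalad), which are not stated in this doc:

```shell
# Each daemon serves a debug web page; a successful fetch means it is up
curl -s -o /dev/null http://10.32.71.19:25010/ && echo "state-store up"
curl -s -o /dev/null http://10.32.71.21:25000/ && echo "impalad up"
```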
7. Install the Impala client
On 10.32.71.31, run:

```
yum -y install impala-shell
```
After installation, switching to the hljlog user and running impala-shell fails with:

```
[@71.31.jnhadoop.com ~]$ impala-shell
Traceback (most recent call last):
  File "/usr/lib/impala-shell/impala_shell.py", line 30, in ?
    from shell_output import OutputStream, DelimitedOutputFormatter, PrettyOutputFormatter
  File "/usr/lib/impala-shell/lib/shell_output.py", line 21, in ?
    csv.field_size_limit(sys.maxint)
AttributeError: 'module' object has no attribute 'field_size_limit'
```
The fix is described at:
https://issues.cloudera.org/browse/IMPALA-396

> Ishaan Joshi added a comment - 07/Jun/13 10:31 PM
> Turns out that field_size_limit only applies to the csv reader. The writer seemingly has no limit set. I've removed the line altogether. I ran some experiments to verify that we can write very large fields [upto 1GB], and it worked fine.
> commit 4daf1a2b878287aa0704eae5ee4b4b919a3d0099
> Author: ishaan <ishaan@cloudera.com>
> Date: Wed Jun 5 16:04:13 2013 -0700
>     Remove csv.field_size_limit from the shell.

That is, comment out the line csv.field_size_limit(sys.maxint) in /usr/lib/impala-shell/lib/shell_output.py.
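A hedged way to apply that fix with sed, demonstrated here on a scratch copy of the offending line; on the real node, point sed (with sudo) at /usr/lib/impala-shell/lib/shell_output.py instead:

```shell
# Write a minimal stand-in for the file, then comment out the offending line
printf 'import csv, sys\ncsv.field_size_limit(sys.maxint)\n' > /tmp/shell_output_demo.py
sed -i 's/^\(csv\.field_size_limit(sys\.maxint)\)/# \1/' /tmp/shell_output_demo.py
grep 'field_size_limit' /tmp/shell_output_demo.py
# -> # csv.field_size_limit(sys.maxint)
```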