Hive tuning: fetch-mode execution, converting null join keys to random values, and MapReduce dynamic partition settings


1. Fetch configuration

  [cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
  [cevent@hadoop207 hive-1.2.1]$ ll
  total 524
  drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
  drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
  -rw-rw-r--. 1 cevent cevent  20403 Jun  5 13:48 derby.log
  drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
  drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
  -rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
  drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
  -rw-rw-r--. 1 cevent cevent  24754 Apr 30 2015 LICENSE
  drwxrwxr-x. 2 cevent cevent   4096 Jun  7 13:22 logs
  drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
  -rw-rw-r--. 1 cevent cevent    397 Jun 19 2015 NOTICE
  -rw-rw-r--. 1 cevent cevent   4366 Jun 19 2015 README.txt
  -rw-rw-r--. 1 cevent cevent 421129 Jun 19 2015 RELEASE_NOTES.txt
  -rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
  drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
  -rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
  -rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
  -rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
  [cevent@hadoop207 hive-1.2.1]$ bin/beeline 
  Beeline version 1.2.1 by Apache Hive
  beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
  Connecting to jdbc:hive2://hadoop207.cevent.com:10000
  Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
  Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
  Connected to: Apache Hive (version 1.2.1)
  Driver: Hive JDBC (version 1.2.1)
  Transaction isolation:
  TRANSACTION_REPEATABLE_READ
  0: jdbc:hive2://hadoop207.cevent.com:10000> show databases;
  +----------------+--+
  | database_name  |
  +----------------+--+
  | cevent01       |
  | default        |
  +----------------+--+
  2 rows selected (1.695 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
  No rows affected (0.073 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.fetch.task.conversion=none;  (set fetch conversion to none: MapReduce mode is forced for every query)
  No rows affected (0.03 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp;
  INFO 
  : Number of reduce tasks is set to 0 since there's no reduce operator
  INFO 
  : number of splits:1
  INFO 
  : Submitting tokens for job: job_1591539871344_0001
  INFO 
  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591539871344_0001/
  INFO 
  : Starting Job = job_1591539871344_0001, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591539871344_0001/
  INFO 
  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1591539871344_0001
  INFO 
  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  INFO 
  : 2020-06-07 22:30:41,881 Stage-1 map = 0%,  reduce = 0%
  INFO 
  : 2020-06-07 22:30:53,412 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.11 sec
  INFO 
  : MapReduce Total cumulative CPU time: 1 seconds 110 msec
  INFO 
  : Ended Job = job_1591539871344_0001
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (30.546 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.fetch.task.conversion=more;  (set fetch to more: simple queries skip MapReduce entirely and return much faster)
  No rows affected (0.006 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp;
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (0.513 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.mode.local.auto=true;  (enable local mode: MapReduce still runs, but in-process, so small jobs finish faster than with fetch=none; use local mode with caution)
  No rows affected (0.005 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select count(*) from c_emp;
  INFO 
  : Number of reduce tasks determined at compile time: 1
  INFO 
  : In order to change the average load for a reducer (in bytes):
  INFO 
  :   set hive.exec.reducers.bytes.per.reducer=<number>
  INFO 
  : In order to limit the maximum number of reducers:
  INFO 
  :   set hive.exec.reducers.max=<number>
  INFO 
  : In order to set a constant number of reducers:
  INFO 
  :   set mapreduce.job.reduces=<number>
  INFO 
  : number of splits:1
  INFO 
  : Submitting tokens for job: job_local1010738823_0001
  INFO 
  : The url to track the job: http://localhost:8080/
  INFO 
  : Job running in-process (local Hadoop)
  INFO 
  : 2020-06-07 22:33:37,811 Stage-1 map = 100%,  reduce = 100%
  INFO 
  : Ended Job = job_local1010738823_0001
  +------+--+
  | _c0  |
  +------+--+
  | 15   |
  +------+--+
  1 row selected (1.613 seconds)
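The three values of hive.fetch.task.conversion exercised above can be summarized as a short sketch (the behavior described is the standard one for Hive 1.x; `more` became the default in Hive 0.14):

```sql
-- none: every query, even a bare select *, compiles to a MapReduce job
set hive.fetch.task.conversion=none;
-- minimal: only select *, filters on partition columns, and limit run as a direct fetch
set hive.fetch.task.conversion=minimal;
-- more (the default since Hive 0.14): simple selects, filters, and limit
-- all run as a direct fetch with no MapReduce job at all
set hive.fetch.task.conversion=more;
```

This is why the same `select * from c_emp` took about 30 seconds under `none` and half a second under `more`.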
  
 


2. Metadata preparation

(screenshots: table metadata)

3. Changing the default number of MapReduce reducers Hive starts

  
  [cevent@hadoop207 module]$ cd hive-1.2.1/
  [cevent@hadoop207 hive-1.2.1]$ cd conf/
  [cevent@hadoop207 conf]$ ll
  total 192
  -rw-rw-r--. 1 cevent cevent   1139 Apr 30 2015 beeline-log4j.properties.template
  -rw-rw-r--. 1 cevent cevent 168431 Jun 19 2015 hive-default.xml.template
  -rw-rw-r--. 1 cevent cevent   2464 Apr 30 16:06 hive-env.sh
  -rw-rw-r--. 1 cevent cevent   2662 Apr 30 2015 hive-exec-log4j.properties.template
  -rw-rw-r--. 1 cevent cevent   3093 May  9 18:12 hive-log4j.properties
  -rw-rw-r--. 1 cevent cevent   1354 May  9 18:40 hive-site.xml
  -rw-rw-r--. 1 cevent cevent   1593 Apr 30 2015 ivysettings.xml
  [cevent@hadoop207 conf]$ cat hive-site.xml  (view the Hive configuration)
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

  <configuration>

    <!-- MySQL database URL -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop207.cevent.com:3306/metastore?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <!-- MySQL username and password -->
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>cevent</value>
      <description>password to use against metastore database</description>
    </property>

    <!-- customize what the Hive CLI displays -->
    <property>
      <name>hive.cli.print.header</name>
      <value>true</value>
    </property>

    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>

    <!-- override of the default reducer count; to be removed -->
    <property>
      <name>mapreduce.job.reduce</name>
      <value>4</value>
    </property>

  </configuration>
   
   
  [cevent@hadoop207 conf]$ vim hive-site.xml  (edit the configuration: delete the mapreduce.job.reduce property)
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

  <configuration>

    <!-- MySQL database URL -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop207.cevent.com:3306/metastore?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <!-- MySQL username and password -->
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>cevent</value>
      <description>password to use against metastore database</description>
    </property>

    <!-- customize what the Hive CLI displays -->
    <property>
      <name>hive.cli.print.header</name>
      <value>true</value>
    </property>

    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>

    <!-- the default-reducer-count override has been removed -->

  </configuration>
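Rather than hard-coding a reducer count in hive-site.xml (which silently caps every query, which is why the property is removed above), the same knobs can be set per session. A sketch; note that the property name Hive's own INFO messages print is mapreduce.job.reduces, with an "s", and the numeric values here are illustrative:

```sql
-- force exactly 4 reducers for the queries that follow in this session
set mapreduce.job.reduces=4;
-- or let Hive estimate: bound the reducer count and the data volume per reducer
set hive.exec.reducers.max=10;
set hive.exec.reducers.bytes.per.reducer=256000000;
```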
  
 


4. Loading the data

  
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table big_join(
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.467 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table small_join(
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.1 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_table(
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.1 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/bigtable' into table big_join;
  INFO 
  : Loading data to table cevent01.big_join from file:/opt/module/datas/bigtable
  INFO 
  : Table cevent01.big_join stats: [numFiles=1, totalSize=120734753]
  No rows affected (14.891 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/smalltable' into table small_join;
  INFO 
  : Loading data to table cevent01.small_join from file:/opt/module/datas/smalltable
  INFO 
  : Table cevent01.small_join stats: [numFiles=1, totalSize=12018355]
  No rows affected (1.723 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select * from big_join limit 5;
  +--------------+-----------------+-----------------------------------+-------------------+--------------------+---------------------+----------------------------------------------------------------+--+
  | big_join.id  |  big_join.time  |           big_join.uid            | big_join.keyword  | big_join.url_rank  | big_join.click_num  |                       big_join.click_url                       |
  +--------------+-----------------+-----------------------------------+-------------------+--------------------+---------------------+----------------------------------------------------------------+--+
  | 0            | 20111230000005  | 57375476989eea12893c0c3811607bcf  | 奇艺高清              | 1                  | 1                   | http://www.qiyi.com/                                           |
  | 0            | 20111230000005  | 66c5bb7774e31d0a22278249b26bc83a  | 凡人修仙传             | 3                  | 1                   | http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1  |
  | 0            | 20111230000007  | b97920521c78de70ac38e3713f524b50  | 本本联盟              | 1                  | 1                   | http://www.bblianmeng.com/                                     |
  | 0            | 20111230000008  | 6961d0c97fe93701fc9c0d861d096cd9  | 华南师范大学图书馆         | 1                  | 1                   | http://lib.scnu.edu.cn/                                        |
  | 0            | 20111230000008  | f2f5a21c764aebde1e8afcc2871e086f  | 在线代理              | 2                  | 1                   | http://proxyie.cn/                                             |
  +--------------+-----------------+-----------------------------------+-------------------+--------------------+---------------------+----------------------------------------------------------------+--+
  5 rows selected (0.613 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select * from small_join limit 5;
  +----------------+------------------+-----------------------------------+---------------------+----------------------+-----------------------+----------------------------------------------------------------+--+
  | small_join.id  | small_join.time  |          small_join.uid           | small_join.keyword  | small_join.url_rank  | small_join.click_num  |                      small_join.click_url                      |
  +----------------+------------------+-----------------------------------+---------------------+----------------------+-----------------------+----------------------------------------------------------------+--+
  | 95048          | 20111230000005   | 57375476989eea12893c0c3811607bcf  | 奇艺高清                | 1                    | 1                     | http://www.qiyi.com/                                           |
  | 95096          | 20111230000005   | 66c5bb7774e31d0a22278249b26bc83a  | 凡人修仙传               | 3                    | 1                     | http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1  |
  | 95144          | 20111230000007   | b97920521c78de70ac38e3713f524b50  | 本本联盟                | 1                    | 1                     | http://www.bblianmeng.com/                                     |
  | 95192          | 20111230000008   | 6961d0c97fe93701fc9c0d861d096cd9  | 华南师范大学图书馆           | 1                    | 1                     | http://lib.scnu.edu.cn/                                        |
  | 95192          | 20111230000008   | f2f5a21c764aebde1e8afcc2871e086f  | 在线代理                | 2                    | 1                     | http://proxyie.cn/                                             |
  +----------------+------------------+-----------------------------------+---------------------+----------------------+-----------------------+----------------------------------------------------------------+--+
  5 rows selected (0.118 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.auto.convert.join=false;  (map join disabled for this test; leaving it at true, with map join on, is faster)
  No rows affected (0.017 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> insert overwrite table join_table
  0: jdbc:hive2://hadoop207.cevent.com:10000> select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
  0: jdbc:hive2://hadoop207.cevent.com:10000> from small_join s
  0: jdbc:hive2://hadoop207.cevent.com:10000> join big_join b
  0: jdbc:hive2://hadoop207.cevent.com:10000> on b.id=s.id;
  INFO 
  : Number of reduce tasks not specified. Estimated from input data
  size: 1
  INFO 
  : In order to change the average load for a reducer (in bytes):
  INFO 
  :   set hive.exec.reducers.bytes.per.reducer=<number>
  INFO 
  : In order to limit the maximum number of reducers:
  INFO 
  :   set hive.exec.reducers.max=<number>
  INFO 
  : In order to set a constant number of reducers:
  INFO 
  :   set mapreduce.job.reduces=<number>
  INFO 
  : number of splits:2
  INFO 
  : Submitting tokens for job: job_1591593753790_0001
  INFO 
  : The url to track the job:
  http://hadoop207.cevent.com:8088/proxy/application_1591593753790_0001/
  INFO 
  : Starting Job = job_1591593753790_0001, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591593753790_0001/
  INFO 
  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1591593753790_0001
  INFO 
  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
  INFO 
  : 2020-06-08 13:41:29,289 Stage-1 map = 0%,  reduce = 0%
  INFO 
  : 2020-06-08 13:42:12,682 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 13.84 sec
  INFO 
  : 2020-06-08 13:42:33,048 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17.94 sec
  INFO 
  : 2020-06-08 13:42:34,442 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 23.18 sec
  INFO 
  : MapReduce Total cumulative CPU time: 23 seconds 180 msec
  INFO 
  : Ended Job = job_1591593753790_0001
  INFO 
  : Loading data to table cevent01.join_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_table/.hive-staging_hive_2020-06-08_13-41-10_494_1851985601125007593-1/-ext-10000
  INFO 
  : Table cevent01.join_table stats: [numFiles=1, numRows=999990, totalSize=123817216, rawDataSize=122817226]
  No rows affected (95.171 seconds)
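The run above deliberately disabled map join, so both tables were shuffled to a reducer. To let Hive broadcast the small table to every mapper and skip the reduce phase entirely, the usual settings are as follows (a sketch; 25000000 bytes, about 25 MB, is Hive's default small-table threshold):

```sql
-- enable automatic conversion of common joins to map joins
set hive.auto.convert.join=true;
-- tables below this size in bytes are treated as the small, broadcast side
set hive.mapjoin.smalltable.filesize=25000000;
```

With these set, small_join (about 12 MB here) qualifies as the broadcast side, and the insert above runs as a map-only job.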
  
 


5. SQL reference

  
  -- Hive table optimization
  -- 1. join
  -- create the big table
  create table big_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string
  )
  row format delimited fields terminated by '\t';
   
  -- create the small table
  create table small_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string
  )
  row format delimited fields terminated by '\t';
   
  -- create the table that receives the join result
  create table join_table(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string
  )
  row format delimited fields terminated by '\t';
   
  -- load the data
  load data local inpath '/opt/module/datas/bigtable' into table big_join;
  load data local inpath '/opt/module/datas/smalltable' into table small_join;
   
  -- turn off map join: set hive.auto.convert.join=false;
  -- turn off local mode, which would distort the timings: set hive.exec.mode.local.auto=false;
  -- run the insert: small table joined to the big table
  insert overwrite table join_table
  select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
  from small_join s
  join big_join b
  on b.id=s.id;
  -- run the insert: big table joined to the small table
  insert overwrite table join_table
  select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
  from big_join b
  join small_join s
  on s.id=b.id;
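To confirm which plan Hive actually chose for either insert (a common join with a shuffle, or a map join), EXPLAIN on the same statement prints the stage graph. A sketch against the tables above:

```sql
-- shows Stage dependencies; a "Map Join Operator" appears when
-- hive.auto.convert.join=true and one side is under the size threshold
explain
select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
from small_join s
join big_join b
on b.id = s.id;
```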
   
  
 


6. Preparing the original data (ori)

(screenshots: ori sample data)

7. Running the inserts

  
  [cevent@hadoop207 hive-1.2.1]$ bin/beeline 
  Beeline version 1.2.1 by Apache Hive
  beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
  Connecting to jdbc:hive2://hadoop207.cevent.com:10000
  Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
  Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
  Connected to: Apache Hive (version 1.2.1)
  Driver: Hive JDBC (version 1.2.1)
  Transaction isolation:
  TRANSACTION_REPEATABLE_READ
  0: jdbc:hive2://hadoop207.cevent.com:10000> show databases;
  +----------------+--+
  | database_name  |
  +----------------+--+
  | cevent01       |
  | default        |
  +----------------+--+
  2 rows selected (1.77 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
  No rows affected (0.053 seconds)   (insert data into the join table)
  0: jdbc:hive2://hadoop207.cevent.com:10000> insert overwrite table join_table
  0: jdbc:hive2://hadoop207.cevent.com:10000> select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
  0: jdbc:hive2://hadoop207.cevent.com:10000> from small_join s
  0: jdbc:hive2://hadoop207.cevent.com:10000> left join big_join b
  0: jdbc:hive2://hadoop207.cevent.com:10000> on b.id=s.id;
  INFO  : Stage-1 is selected by condition resolver.
  INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
  INFO  : In order to change the average load for a reducer (in bytes):
  INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
  INFO  : In order to limit the maximum number of reducers:
  INFO  :   set hive.exec.reducers.max=<number>
  INFO  : In order to set a constant number of reducers:
  INFO  :   set mapreduce.job.reduces=<number>
  INFO  : number of splits:2
  INFO  : Submitting tokens for job: job_1591675698957_0001
  INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0001/
  INFO  : Starting Job = job_1591675698957_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0001/
  INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1591675698957_0001
  INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
  INFO  : 2020-06-09 12:13:16,419 Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 12:13:43,229 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 5.68 sec
  INFO  : 2020-06-09 12:13:57,087 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 16.96 sec
  INFO  : 2020-06-09 12:14:12,299 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 22.19 sec
  INFO  : 2020-06-09 12:14:15,774 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 25.13 sec
  INFO  : 2020-06-09 12:14:24,366 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 32.81 sec
  INFO  : MapReduce Total cumulative CPU time: 32 seconds 810 msec
  INFO  : Ended Job = job_1591675698957_0001
  INFO  : Loading data to table cevent01.join_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_table/.hive-staging_hive_2020-06-09_12-12-53_365_6693250539177613475-1/-ext-10000
  INFO  : Table cevent01.join_table stats: [numFiles=1, numRows=999990, totalSize=123817216, rawDataSize=122817226]
  No rows affected (96.166 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_original(  (create the original-data table)
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.608 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_original_null_id(  (create the null-key table)
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.135 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_original_table(  (create the join result table)
  0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> time bigint,
  0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,
  0: jdbc:hive2://hadoop207.cevent.com:10000> url_rank int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_num int,
  0: jdbc:hive2://hadoop207.cevent.com:10000> click_url string
  0: jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
  No rows affected (0.122 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/ori' into table join_original;
  INFO 
  : Loading data to table cevent01.join_original from file:/opt/module/datas/ori
  INFO 
  : Table cevent01.join_original stats: [numFiles=1, totalSize=121734744]
  No rows affected (6.715 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/nullid' into table join_original_null_id;
  INFO 
  : Loading data to table cevent01.join_original_null_id from file:/opt/module/datas/nullid
  INFO 
  : Table cevent01.join_original_null_id stats: [numFiles=1, totalSize=118645854]
  No rows affected (13.959 seconds)    (insert into the join table, keeping every row)
  0: jdbc:hive2://hadoop207.cevent.com:10000> insert overwrite table join_original_table
  0: jdbc:hive2://hadoop207.cevent.com:10000> select n.* from
  0: jdbc:hive2://hadoop207.cevent.com:10000> join_original_null_id n left join join_original o on n.id=o.id;
  INFO  : Stage-1 is selected by
  condition resolver.
  INFO  : Number of reduce tasks
  not specified. Estimated from input data size: 1
  INFO  : In order to change the
  average load for a reducer (in bytes):
  INFO  :   set
  hive.exec.reducers.bytes.per.reducer=<number>
  INFO  : In order to limit the
  maximum number of reducers:
  INFO  :   set hive.exec.reducers.max=<number>
  INFO  : In order to set a
  constant number of reducers:
  INFO  :   set mapreduce.job.reduces=<number>
  INFO  : number of splits:2
  INFO  : Submitting tokens for
  job: job_1591675698957_0002
  INFO  : The url to track the
  job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0002/
  INFO  : Starting Job =
  job_1591675698957_0002, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0002/
  INFO  : Kill Command =
  /opt/module/hadoop-2.7.2/bin/hadoop job 
  -kill job_1591675698957_0002
  INFO  : Hadoop job information
  for Stage-1: number of mappers: 2; number of reducers: 1
  INFO  : 2020-06-09 13:26:11,449
  Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 13:27:00,291
  Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 16.19 sec
  INFO  : 2020-06-09 13:27:21,791
  Stage-1 map = 100%,  reduce = 69%,
  Cumulative CPU 20.71 sec
  INFO  : 2020-06-09 13:27:25,027
  Stage-1 map = 100%,  reduce = 77%,
  Cumulative CPU 22.42 sec
  INFO  : 2020-06-09 13:27:27,828
  Stage-1 map = 100%,  reduce = 90%,
  Cumulative CPU 22.42 sec
  INFO  : 2020-06-09 13:27:33,724
  Stage-1 map = 100%,  reduce = 100%, Cumulative
  CPU 27.9 sec
  INFO  : MapReduce Total
  cumulative CPU time: 27 seconds 900 msec
  INFO  : Ended Job =
  job_1591675698957_0002
  INFO  : Loading data to table cevent01.join_original_table
  from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-26-01_145_7407125718177019072-1/-ext-10000
  INFO  : Table
  cevent01.join_original_table stats: [numFiles=1, numRows=1000000,
  totalSize=118645854, rawDataSize=117645854]
  No rows affected (97.176 seconds)
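For reference, the statement that was entered across several prompt lines above can be written in one piece. This is the baseline: null ids are kept, so every NULL key hashes to the same reducer, which is why the job takes 97 seconds here.

```sql
-- Baseline left join keeping null ids; all NULL keys land on a single reducer.
INSERT OVERWRITE TABLE join_original_table
SELECT n.*
FROM join_original_null_id n
LEFT JOIN join_original o ON n.id = o.id;
```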
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> insert
overwrite table join_original_table  -- keep only non-null ids
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> select
  n.* from
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> (select *
  from join_original_null_id where id is not null) 
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> n left
  join join_original o on n.id=o.id;
  INFO  : Stage-1 is selected by
  condition resolver.
  INFO  : Number of reduce tasks
  not specified. Estimated from input data size: 1
  INFO  : In order to change the
  average load for a reducer (in bytes):
  INFO  :   set
  hive.exec.reducers.bytes.per.reducer=<number>
  INFO  : In order to limit the
  maximum number of reducers:
  INFO  :   set hive.exec.reducers.max=<number>
  INFO  : In order to set a constant
  number of reducers:
  INFO  :   set mapreduce.job.reduces=<number>
  INFO  : number of splits:2
  INFO  : Submitting tokens for
  job: job_1591675698957_0003
  INFO  : The url to track the
  job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0003/
  INFO  : Starting Job = job_1591675698957_0003,
  Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0003/
  INFO  : Kill Command =
  /opt/module/hadoop-2.7.2/bin/hadoop job 
  -kill job_1591675698957_0003
  INFO  : Hadoop job information
  for Stage-1: number of mappers: 2; number of reducers: 1
  INFO  : 2020-06-09 13:29:56,277
  Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 13:30:25,427
  Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 12.49 sec
  INFO  : 2020-06-09 13:30:48,717
  Stage-1 map = 100%,  reduce = 100%, Cumulative
  CPU 19.14 sec
  INFO  : MapReduce Total
  cumulative CPU time: 19 seconds 140 msec
  INFO  : Ended Job =
  job_1591675698957_0003
  INFO  : Loading data to table
  cevent01.join_original_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-29-44_221_5080146984779335117-1/-ext-10000
  INFO  : Table
  cevent01.join_original_table stats: [numFiles=1, numRows=200001,
  totalSize=24409561, rawDataSize=24209560]
  No rows affected (67.163 seconds)
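The faster variant filters the empty keys out in a subquery before joining, so fewer rows are shuffled and there is no skewed NULL bucket (67 seconds here versus 97 seconds for the baseline):

```sql
-- Filter null keys first, then join.
INSERT OVERWRITE TABLE join_original_table
SELECT n.*
FROM (SELECT * FROM join_original_null_id WHERE id IS NOT NULL) n
LEFT JOIN join_original o ON n.id = o.id;
```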
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set
mapreduce.job.reduces=3;  -- set the number of reducers
  No rows affected (0.226 seconds)
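As the job log itself suggests, the reducer count can be influenced three ways. The first two values below are only illustrative defaults, not values used in this session; only `mapreduce.job.reduces=3` is set in this test.

```sql
-- Average input bytes per reducer; Hive estimates the count from this (example value).
SET hive.exec.reducers.bytes.per.reducer=256000000;
-- Upper bound on how many reducers Hive may choose (example value).
SET hive.exec.reducers.max=1009;
-- Fix the reducer count explicitly (the setting used in this test).
SET mapreduce.job.reduces=3;
```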
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> insert
overwrite table join_original_table  -- re-run the left join with 3 reducers
  0: jdbc:hive2://hadoop207.cevent.com:10000> select n.* from
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> join_original_null_id
  n left join join_original o on n.id=o.id;
  INFO  : Stage-1 is selected by
  condition resolver.
  INFO  : Number of reduce tasks
  not specified. Defaulting to jobconf value of: 3
  INFO  : In order to change the
  average load for a reducer (in bytes):
  INFO  :   set
  hive.exec.reducers.bytes.per.reducer=<number>
  INFO  : In order to limit the
  maximum number of reducers:
  INFO  :   set hive.exec.reducers.max=<number>
  INFO  : In order to set a
  constant number of reducers:
  INFO  :   set mapreduce.job.reduces=<number>
  INFO  : number of splits:2
  INFO  : Submitting tokens for
  job: job_1591675698957_0004
  INFO  : The url to track the job:
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0004/
  INFO  : Starting Job =
  job_1591675698957_0004, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0004/
  INFO  : Kill Command =
  /opt/module/hadoop-2.7.2/bin/hadoop job 
  -kill job_1591675698957_0004
  INFO  : Hadoop job information
  for Stage-1: number of mappers: 2; number of reducers: 3
  INFO  : 2020-06-09 13:37:54,461
  Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 13:38:20,067
  Stage-1 map = 50%,  reduce = 0%,
  Cumulative CPU 15.17 sec
  INFO  : 2020-06-09 13:38:25,554 Stage-1
  map = 100%,  reduce = 0%, Cumulative
  CPU 18.33 sec
  INFO  : 2020-06-09 13:38:46,507
  Stage-1 map = 100%,  reduce = 50%,
  Cumulative CPU 22.28 sec
  INFO  : 2020-06-09 13:39:12,015
  Stage-1 map = 100%,  reduce = 76%,
  Cumulative CPU 31.73 sec
  INFO  : 2020-06-09 13:39:16,761
  Stage-1 map = 100%,  reduce = 100%,
  Cumulative CPU 43.24 sec
  INFO  : MapReduce Total
  cumulative CPU time: 43 seconds 240 msec
  INFO  : Ended Job = job_1591675698957_0004
  INFO  : Loading data to table
  cevent01.join_original_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-37-42_754_4173942603345328108-1/-ext-10000
  INFO  : Table
  cevent01.join_original_table stats: [numFiles=3, numRows=1000000,
  totalSize=118645854, rawDataSize=117645854]
  No rows affected (110.936 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> insert
overwrite table join_original_table -- convert null keys to random values
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  select n.* from
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  join_original_null_id n full join join_original o on 
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> case when
  n.id is null
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  then concat('hive',rand())
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  else n.id end=o.id;
  INFO  : Number of reduce tasks
  not specified. Defaulting to jobconf value of: 3
  INFO  : In order to change the
  average load for a reducer (in bytes):
  INFO  :   set
  hive.exec.reducers.bytes.per.reducer=<number>
  INFO  : In order to limit the
  maximum number of reducers:
  INFO  :   set hive.exec.reducers.max=<number>
  INFO  : In order to set a
  constant number of reducers:
  INFO  :   set mapreduce.job.reduces=<number>
  INFO  : number of splits:2
  INFO  : Submitting tokens for
  job: job_1591675698957_0005
  INFO  : The url to track the
  job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0005/
  INFO  : Starting Job =
  job_1591675698957_0005, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0005/
  INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop
  job  -kill job_1591675698957_0005
  INFO  : Hadoop job information
  for Stage-1: number of mappers: 2; number of reducers: 3
  INFO  : 2020-06-09 13:43:57,701
  Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 13:44:18,174
  Stage-1 map = 33%,  reduce = 0%,
  Cumulative CPU 12.1 sec
  INFO  : 2020-06-09 13:44:19,621
  Stage-1 map = 50%,  reduce = 0%,
  Cumulative CPU 14.39 sec
  INFO  : 2020-06-09 13:44:26,173
  Stage-1 map = 84%,  reduce = 0%,
  Cumulative CPU 24.9 sec
  INFO  : 2020-06-09 13:44:30,664
  Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 26.35 sec
  INFO  : 2020-06-09 13:44:45,291
  Stage-1 map = 100%,  reduce = 28%,
  Cumulative CPU 28.45 sec
  INFO  : 2020-06-09 13:44:52,047
  Stage-1 map = 100%,  reduce = 56%,
  Cumulative CPU 31.54 sec
  INFO  : 2020-06-09 13:45:03,169
  Stage-1 map = 100%,  reduce = 76%,
  Cumulative CPU 37.31 sec
  INFO  : 2020-06-09 13:45:04,818
  Stage-1 map = 100%,  reduce = 89%,
  Cumulative CPU 42.86 sec
  INFO  : 2020-06-09 13:45:12,375
  Stage-1 map = 100%,  reduce = 99%,
  Cumulative CPU 53.09 sec
  INFO  : 2020-06-09 13:45:13,851
  Stage-1 map = 100%,  reduce = 100%,
  Cumulative CPU 53.26 sec
  INFO  : MapReduce Total
  cumulative CPU time: 53 seconds 260 msec
  INFO  : Ended Job =
  job_1591675698957_0005
  INFO  : Loading data to table
  cevent01.join_original_table from
  hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-43-42_102_6854456494005290452-1/-ext-10000
  INFO  : Table
  cevent01.join_original_table stats: [numFiles=3, numRows=1799999,
  totalSize=135445833, rawDataSize=133645834]
  No rows affected (97.97 seconds)
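When the null-key rows must be kept (as in this full join), turning each NULL into a distinct random string spreads them evenly across the reducers instead of piling them onto one. The pattern used above, in one statement:

```sql
-- Replace NULL ids with unique random keys so they distribute across reducers;
-- the 'hive' prefix keeps the synthetic keys from colliding with real ids.
INSERT OVERWRITE TABLE join_original_table
SELECT n.*
FROM join_original_null_id n
FULL JOIN join_original o
  ON CASE WHEN n.id IS NULL THEN CONCAT('hive', RAND()) ELSE n.id END = o.id;
```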
   
  
 


8. Cluster link test

Track the jobs in the YARN cluster UI: http://hadoop207.cevent.com:8088/cluster

9. Dynamic partitions




 
  
  0: jdbc:hive2://hadoop207.cevent.com:10000>
select b.id from big_join b  -- query rows with id<=10
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> join
  (select id from join_original where id<=10) o
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> on
  b.id=o.id;
  INFO 
  : Stage-1 is selected by condition resolver.
  INFO 
  : Number of reduce tasks not specified. Estimated from input data
  size: 1
  INFO 
  : In order to change the average load for a reducer (in bytes):
  INFO 
  :   set
  hive.exec.reducers.bytes.per.reducer=<number>
  INFO 
  : In order to limit the maximum number of reducers:
  INFO  :  
  set hive.exec.reducers.max=<number>
  INFO 
  : In order to set a constant number of reducers:
  INFO 
  :   set mapreduce.job.reduces=<number>
  INFO 
  : number of splits:2
  INFO 
  : Submitting tokens for job: job_1591675698957_0014
  INFO 
  : The url to track the job:
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0014/
  INFO 
  : Starting Job = job_1591675698957_0014, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0014/
  INFO 
  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1591675698957_0014
  INFO 
  : Hadoop job information for Stage-1: number of mappers: 2; number of
  reducers: 1
  INFO 
  : 2020-06-09 17:00:28,898 Stage-1 map = 0%,  reduce = 0%
  INFO 
  : 2020-06-09 17:00:49,450 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.22 sec
  INFO 
  : 2020-06-09 17:00:51,358 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 10.76 sec
  INFO 
  : 2020-06-09 17:01:04,748 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 14.9 sec
  INFO 
  : MapReduce Total cumulative CPU time: 14 seconds 900 msec
  INFO 
  : Ended Job = job_1591675698957_0014
  +-------+--+
  | b.id  |
  +-------+--+
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 1     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 2     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 3     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 4     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 5     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 6     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 7     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 8     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 9     |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  | 10    |
  +-------+--+
  100 rows selected (48.722 seconds)
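Filtering the small table down to the needed ids inside a subquery before joining the big table keeps the shuffle small. The query entered above, consolidated:

```sql
-- Shrink the filtered side to id<=10 before joining against the big table.
SELECT b.id
FROM big_join b
JOIN (SELECT id FROM join_original WHERE id <= 10) o
  ON b.id = o.id;
```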
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set
hive.exec.dynamic.partition=true; -- enable dynamic partitioning
  No rows affected (0.018 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000>
set hive.exec.dynamic.partition.mode=nonstrict;  -- set non-strict mode
  No rows affected (0.587 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set
hive.exec.max.dynamic.partitions=1000; 
-- maximum number of dynamic partitions that may be created across all MapReduce nodes: 1000
No rows affected (0.007 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set
hive.exec.max.dynamic.partitions.pernode=100; -- at most 100 dynamic partitions per MapReduce node
  No rows affected (0.009 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set
  hive.exec.max.created.files=100000;  
-- maximum number of HDFS files that may be created; default 100,000
  No rows affected (0.006 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> set hive.error.on.empty.partition=false;
-- whether to throw an exception when an empty partition is generated
No rows affected (0.007 seconds)   -- next: create the partitioned table
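The dynamic-partition settings used in this session, gathered in one place (values exactly as set above):

```sql
SET hive.exec.dynamic.partition=true;             -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;   -- allow all partition columns to be dynamic
SET hive.exec.max.dynamic.partitions=1000;        -- max dynamic partitions across all nodes
SET hive.exec.max.dynamic.partitions.pernode=100; -- max dynamic partitions per MR node
SET hive.exec.max.created.files=100000;           -- max HDFS files that may be created
SET hive.error.on.empty.partition=false;          -- do not fail when an empty partition is generated
```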
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  create table join_dep_partition(
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> id int,
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> name
  string
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> )
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  partitioned by (location int) row format delimited
  fields terminated by '\t';
  No rows affected (2.525 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> insert
into table join_dep_partition  -- insert the query result; the partition value is taken automatically from the last column
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> partition(location)
  
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  select deptno,dname,locations from c_dept;
  INFO  : Number of reduce tasks
  is set to 0 since there's no reduce operator
  INFO  : number of splits:1
  INFO  : Submitting tokens for
  job: job_1591675698957_0015
  INFO  : The url to track the
  job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0015/
  INFO  : Starting Job =
  job_1591675698957_0015, Tracking URL =
  http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0015/
  INFO  : Kill Command =
  /opt/module/hadoop-2.7.2/bin/hadoop job 
  -kill job_1591675698957_0015
  INFO  : Hadoop job information
  for Stage-1: number of mappers: 1; number of reducers: 0
  INFO  : 2020-06-09 18:18:43,415
  Stage-1 map = 0%,  reduce = 0%
  INFO  : 2020-06-09 18:18:53,201
  Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 1.95 sec
  INFO  : MapReduce Total
  cumulative CPU time: 1 seconds 950 msec
  INFO  : Ended Job =
  job_1591675698957_0015
  INFO  : Stage-4 is selected by
  condition resolver.
  INFO  : Stage-3 is filtered out
  by condition resolver.
  INFO  : Stage-5 is filtered out
  by condition resolver.
  INFO  : Moving data to:
  hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10000
  from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10002
  INFO  : Loading data to table
  cevent01.join_dep_partition partition (location=null) from
  hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10000
  INFO  :          Time taken for load dynamic
  partitions : 2375
  INFO  :         Loading partition {location=1800}
  INFO  :         Loading partition {location=1900}
  INFO  :         Loading partition {location=1700}
  INFO  :          Time taken for adding to write
  entity : 2
  INFO  : Partition
  cevent01.join_dep_partition{location=1700} stats: [numFiles=1, numRows=2,
  totalSize=28, rawDataSize=26]
  INFO  : Partition
  cevent01.join_dep_partition{location=1800} stats: [numFiles=1, numRows=1,
  totalSize=12, rawDataSize=11]
  INFO  : Partition
  cevent01.join_dep_partition{location=1900} stats: [numFiles=1, numRows=1,
  totalSize=9, rawDataSize=8]
  No rows affected (32.8 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> show
partitions join_dep_partition; -- view the partitions
  +----------------+--+
  |   partition    |
  +----------------+--+
  | location=1700  |
  | location=1800  |
  | location=1900  |
  +----------------+--+
  3 rows selected (0.502 seconds)
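The whole dynamic-partition flow above (DDL plus insert, table and column names from this session), condensed. Note that the partition column is not listed in the table body and the SELECT must put the partition value last:

```sql
CREATE TABLE join_dep_partition (
  id   INT,
  name STRING
)
PARTITIONED BY (location INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- The last SELECT column (locations) feeds the dynamic partition key.
INSERT INTO TABLE join_dep_partition PARTITION (location)
SELECT deptno, dname, locations FROM c_dept;

SHOW PARTITIONS join_dep_partition;  -- expect location=1700 / 1800 / 1900
```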
   
  
 


10. Link verification

Partition results in the HDFS browser: http://hadoop207.cevent.com:50070/explorer.html#/user/hive/warehouse/cevent01.db
