78. Hands-On Hive Data Warehouse Operations (Hands-On Test, Part 1)

MapReduce Total cumulative CPU time: 3 seconds 470 msec

Ended Job = job_1646528951444_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/cat_group2/.hive-staging_hive_2022-03-06_04-06-59_043_3456913091663343579-1/-ext-10000

Loading data to table db.cat_group2

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.47 sec HDFS Read: 13409 HDFS Write: 348 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 470 msec

OK

Time taken: 63.711 seconds

hive> insert overwrite table cat_group2 select * from cat_group1;

Query ID = root_20220306041024_bf920fd1-b42d-4ed7-ad7b-66955905fa19

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1646528951444_0004, Tracking URL = http://master:8088/proxy/application_1646528951444_0004/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646528951444_0004

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2022-03-06 04:10:47,981 Stage-1 map = 0%, reduce = 0%

2022-03-06 04:11:12,568 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.33 sec

2022-03-06 04:11:22,231 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.1 sec

MapReduce Total cumulative CPU time: 4 seconds 100 msec

Ended Job = job_1646528951444_0004

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/cat_group2/.hive-staging_hive_2022-03-06_04-10-24_167_6531779411761470258-1/-ext-10000

Loading data to table db.cat_group2

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.1 sec HDFS Read: 13494 HDFS Write: 348 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 100 msec

OK

Time taken: 60.895 seconds

Note: insert overwrite replaces the existing data in the target table, whereas insert into appends to it (see the sketch below).
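As a minimal sketch using the same tables as this walkthrough, the following pair of statements makes the append-versus-replace difference easy to verify; the count query is just an illustrative check:

hive> insert into table cat_group2 select * from cat_group1;      -- appends: cat_group2 now holds two copies of the rows
hive> insert overwrite table cat_group2 select * from cat_group1; -- replaces: back to a single copy
hive> select count(*) from cat_group2;                            -- check the row count after each statement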

3. Query the cat_group2 table

hive> select * from cat_group2 limit 5;

OK

101 孙悟空

102 唐僧

103 猪八戒

104 沙僧

105 托马斯

Time taken: 0.33 seconds, Fetched: 5 row(s)

4. Create a table and populate it with data queried from another table

Create the table cat_group3 in Hive and fill it directly with data from cat_group2 (a CTAS statement; a schema-only variant is sketched after the verification query below).

hive> create table cat_group3 as select * from cat_group2;

Query ID = root_20220306041630_3200b863-b9b3-4c2e-ac0d-c7caff9b6611

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1646528951444_0005, Tracking URL = http://master:8088/proxy/application_1646528951444_0005/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646528951444_0005

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2022-03-06 04:16:54,438 Stage-1 map = 0%, reduce = 0%

2022-03-06 04:17:02,430 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.58 sec

MapReduce Total cumulative CPU time: 1 seconds 580 msec

Ended Job = job_1646528951444_0005

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/.hive-staging_hive_2022-03-06_04-16-30_327_7813330832683742274-1/-ext-10002

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/cat_group3

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.58 sec HDFS Read: 4969 HDFS Write: 133 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 580 msec

OK

Time taken: 34.65 seconds

5. Query the cat_group3 table

hive> select * from cat_group3 limit 5;

OK

101 孙悟空

102 唐僧

103 猪八戒

104 沙僧

105 托马斯

Time taken: 0.229 seconds, Fetched: 5 row(s)
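A related pattern worth a hedged aside (it does not appear in the original transcript): create table ... like copies only the schema of an existing table, so no MapReduce job runs and no rows are copied. The table name below is hypothetical:

hive> create table cat_group3_schema like cat_group2;  -- copies column definitions and delimiters only
hive> select count(*) from cat_group3_schema;          -- returns 0: the new table starts empty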

V. Three Common Ways to Export Data

1. Export to the local file system

Create the directory /output/hive on the local file system, then export the Hive table cat_group into /output/hive/.

[root@master hive]# mkdir -p /output/hive/

hive> insert overwrite local directory '/output/hive/'

row format delimited fields terminated by '\t' select * from cat_group;

Query ID = root_20220306062829_b059a3f5-e4ad-4dd7-a000-e294c4ccbee2

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1646528951444_0006, Tracking URL = http://master:8088/proxy/application_1646528951444_0006/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646528951444_0006

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2022-03-06 06:28:51,743 Stage-1 map = 0%, reduce = 0%

2022-03-06 06:29:00,515 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.59 sec

MapReduce Total cumulative CPU time: 1 seconds 590 msec

Ended Job = job_1646528951444_0006

Moving data to local directory /output/hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.59 sec HDFS Read: 4738 HDFS Write: 64 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 590 msec

OK

Time taken: 32.116 seconds

[root@master out]# cd /output/hive/

[root@master hive]# ll

total 4

-rw-r--r--. 1 root root 64 Mar 6 06:29 000000_0

[root@master hive]# cat 000000_0

101 孙悟空

102 唐僧

103 猪八戒

104 沙僧

105 托马斯

Note: exporting works differently from importing data into Hive; insert into cannot be used to export data. For an alternative that avoids insert overwrite entirely, see the sketch below.
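For small result sets a common alternative (not part of the original walkthrough) is to run the query non-interactively with hive -e and redirect standard output to a local file; the output file name here is illustrative:

[root@master hive]# hive -e "select * from db.cat_group" > /output/hive/cat_group.txt
[root@master hive]# cat /output/hive/cat_group.txt   # columns arrive tab-separated by default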

2. Export Hive data to HDFS

Export the data in the Hive table cat_group to the /output/hive directory on HDFS.

hive> insert overwrite directory '/output/hive'

row format delimited fields terminated by '\t' select group_id,

group_name from cat_group;

Query ID = root_20220306063621_b359d338-77ee-4571-a425-5415f9c6fb03

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1646528951444_0007, Tracking URL = http://master:8088/proxy/application_1646528951444_0007/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646528951444_0007

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2022-03-06 06:36:41,866 Stage-1 map = 0%, reduce = 0%

2022-03-06 06:36:55,679 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.75 sec

MapReduce Total cumulative CPU time: 1 seconds 750 msec

Ended Job = job_1646528951444_0007

Stage-3 is selected by condition resolver.

Stage-2 is filtered out by condition resolver.

Stage-4 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/output/hive/.hive-staging_hive_2022-03-06_06-36-21_452_7432529204143275493-1/-ext-10000

Moving data to directory /output/hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.75 sec HDFS Read: 4772 HDFS Write: 64 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 750 msec

OK

Time taken: 36.494 seconds

View the result on HDFS:

[root@master hive]# hadoop fs -ls /output/hive

Found 1 items

-rw-r--r-- 2 root supergroup 64 2022-03-06 06:36 /output/hive/000000_0
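To check the exported contents without copying the file off HDFS, print it directly with the standard Hadoop shell:

[root@master hive]# hadoop fs -cat /output/hive/000000_0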

3. Export to another Hive table

Export the data from the Hive table cat_group into cat_group4 (the two tables have identical columns and column types).

First create a table cat_group4 in Hive with two columns, group_id and group_name, both of type string, using '\t' as the field delimiter.

hive> create table cat_group4(group_id string,group_name string)

row format delimited fields terminated by '\t' stored as textfile;

OK

Time taken: 0.195 seconds

Then insert the data from cat_group into cat_group4.

hive> insert into table cat_group4 select * from cat_group;

Query ID = root_20220306064421_722364dd-7475-4ae5-ba44-553f3df856e2

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1646528951444_0008, Tracking URL = http://master:8088/proxy/application_1646528951444_0008/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646528951444_0008

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2022-03-06 06:44:47,514 Stage-1 map = 0%, reduce = 0%

2022-03-06 06:44:58,359 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.74 sec

2022-03-06 06:45:11,880 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec

MapReduce Total cumulative CPU time: 3 seconds 400 msec

Ended Job = job_1646528951444_0008

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/cat_group4/.hive-staging_hive_2022-03-06_06-44-21_318_6696628966307745769-1/-ext-10000

Loading data to table db.cat_group4

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.4 sec HDFS Read: 13474 HDFS Write: 348 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 400 msec

OK

Time taken: 52.617 seconds

After the import completes, inspect the data in cat_group4.

hive> select * from cat_group4 limit 10;

OK

101 孙悟空

102 唐僧

103 猪八戒

104 沙僧

105 托马斯

Time taken: 0.249 seconds, Fetched: 5 row(s)
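Beyond these three approaches, Hive also provides export table ... to, which writes both the data and the table metadata to an HDFS path and pairs with import table for restoring it elsewhere (useful when moving tables between clusters). A minimal sketch; the target path and the new table name are illustrative:

hive> export table cat_group to '/output/hive_export/cat_group';
hive> import table cat_group_copy from '/output/hive_export/cat_group';  -- recreates the table under a new name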

VI. Operations on Hive Partitioned Tables

Create a partitioned table: create a table goods in Hive with two string columns, goods_id and goods_status, partitioned by a string column cat_id, using '\t' as the field delimiter.

hive> create table goods(goods_id string,goods_status string) partitioned by (cat_id string)

row format delimited fields terminated by '\t';

OK

Time taken: 0.107 seconds

View the structure of the goods table:

hive> desc goods;

OK

goods_id                string
goods_status            string
cat_id                  string

# Partition Information
# col_name              data_type               comment
cat_id                  string

Time taken: 0.108 seconds, Fetched: 7 row(s)

Insert data into the partitioned table: take the data from the local goods file (under /input/hive, shown below) and insert it into the partitioned table goods.

[root@master hive]# cat goods

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

1020405 6 52052

Create a non-partitioned table goods_1 in Hive to hold the data from the goods file under the local /input/hive/ directory.

hive> create table goods_1(goods_id string,goods_status string,cat_id string)

row format delimited fields terminated by '\t';

OK

Time taken: 0.179 seconds

Load the data from the local /input/hive/goods file into the goods_1 table in Hive.

hive> load data local inpath '/input/hive/goods' into table goods_1;

Loading data to table db.goods_1

OK

Time taken: 0.511 seconds

Then insert the data from goods_1 into the partitioned table goods (a dynamic-partition variant is sketched after the job output below).

hive> insert into table db.goods partition(cat_id = '52052') select goods_id, goods_status from db.goods_1 where cat_id = '52052';

Query ID = root_20220307041832_30f47fc3-629d-4eda-821a-5f0c3a9edb0d

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1646636256603_0002, Tracking URL = http://master:8088/proxy/application_1646636256603_0002/

Kill Command = /home/hadoop//bin/mapred job -kill job_1646636256603_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2022-03-07 04:19:05,274 Stage-1 map = 0%, reduce = 0%

2022-03-07 04:19:18,487 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.77 sec

2022-03-07 04:19:27,292 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.59 sec

MapReduce Total cumulative CPU time: 4 seconds 590 msec

Ended Job = job_1646636256603_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://master:9000/user/hive/warehouse/db.db/goods/cat_id=52052/.hive-staging_hive_2022-03-07_04-18-32_060_6446641423854979060-1/-ext-10000

Loading data to table db.goods partition (cat_id=52052)

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.59 sec HDFS Read: 14777 HDFS Write: 320 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 590 msec

OK

Time taken: 59.931 seconds
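The insert above hard-codes the partition value (a static partition). As a hedged aside not shown in the original transcript, Hive can also derive the partition value from the last column of the select once dynamic partitioning is enabled:

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table db.goods partition(cat_id)
      select goods_id, goods_status, cat_id from db.goods_1;  -- each distinct cat_id value becomes a partition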

View the data in the goods table:

hive> select goods_id, goods_status from goods;

OK

1624123 6

1020405 6

1020405 6

1020405 6

1020405 6

1020405 6

1020405 6

1020405 6

Time taken: 0.252 seconds, Fetched: 8 row(s)
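Filtering on the partition column is where partitioning pays off: Hive prunes to the matching partition directory instead of scanning the whole table. At this point in the walkthrough the data sits in partition cat_id=52052:

hive> select goods_id, goods_status from goods where cat_id = '52052';  -- reads only the cat_id=52052 directory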

Modify a table partition: rename the partition cat_id=52052 of the partitioned table goods to cat_id=52051, then view the renamed partition.

hive> alter table goods partition(cat_id=52052) rename to partition(cat_id=52051);

OK

Time taken: 0.678 seconds

hive> show partitions goods;

OK

cat_id=52051

Time taken: 0.139 seconds, Fetched: 1 row(s)
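For a managed table the rename also moves the partition's HDFS directory. Assuming the warehouse path shown earlier in this transcript, this can be confirmed from the Hadoop shell:

[root@master hive]# hadoop fs -ls /user/hive/warehouse/db.db/goods/   # expect a single subdirectory named cat_id=52051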

Delete a table partition

Before deleting a partition of the goods table, first back up goods into a goods_2 table.

hive> create table goods_2(goods_id string,goods_status string) partitioned by (cat_id string) row format delimited fields terminated by '\t';

OK

Time taken: 0.178 seconds

hive> insert into table goods_2 partition(cat_id='52052') select goods_id,goods_status from goods_1 where cat_id = '52052';
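The transcript is cut off at this point, so the following is a hedged sketch of the step this subsection sets up: dropping a partition with alter table ... drop partition. After the earlier rename, the remaining partition of goods is cat_id=52051:

hive> alter table goods drop if exists partition(cat_id='52051');
hive> show partitions goods;   -- should return no partitions once the drop succeeds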
