Small files are a perennial topic in big data. In Hive, INSERT statements also produce small files.
Test:
1. Create the table (note: use plain ASCII quotes, not curly quotes, or Hive will fail to parse the statement):
create table dept(
deptno string,
dname string,
location string
)row format delimited fields terminated by '\t';
2. Load the data:
load data local inpath '/home/hadoop/data/dept.txt' overwrite into table dept;
3. Insert a row with an INSERT statement:
insert into table dept values (40,'it','japan');
This launches a MapReduce job; once the job finishes, run a SELECT to verify:
hive (ruozeg6)> select * from dept;
OK
40 it japan
10 accouting newwork
20 restart china
30 sales japan
Time taken: 0.996 seconds, Fetched: 4 row(s)
Now inspect the files in the table's HDFS directory (find its location with `desc formatted 表名;`):
1、[hadoop@hadoop004 data]$ hdfs dfs -ls /user/hive/warehouse/ruozeg6.db/dept
Found 2 items
-rwxr-xr-x 1 hadoop supergroup 12 2019-07-01 20:11 /user/hive/warehouse/ruozeg6.db/dept/000000_0
-rwxr-xr-x 1 hadoop supergroup 53 2019-07-01 17:55 /user/hive/warehouse/ruozeg6.db/dept/dept.txt
2. View the contents of the new file:
[hadoop@hadoop004 data]$ hdfs dfs -text /user/hive/warehouse/ruozeg6.db/dept/000000_0
40 it japan
3. Run another INSERT statement:
insert into table dept values (50,'ERP','suzhou');
4. Check the HDFS directory again:
[hadoop@hadoop004 data]$ hdfs dfs -ls /user/hive/warehouse/ruozeg6.db/dept
Found 3 items
-rwxr-xr-x 1 hadoop supergroup 12 2019-07-01 20:11 /user/hive/warehouse/ruozeg6.db/dept/000000_0
-rwxr-xr-x 1 hadoop supergroup 14 2019-07-01 20:21 /user/hive/warehouse/ruozeg6.db/dept/000000_0_copy_1
-rwxr-xr-x 1 hadoop supergroup 53 2019-07-01 17:55 /user/hive/warehouse/ruozeg6.db/dept/dept.txt
In Hive, every INSERT statement produces a new small file (note the `000000_0_copy_1` above), which is a serious problem in production.
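Two common mitigations can be sketched here; this is not part of the original test, and it assumes Hive 0.14+ for the multi-row VALUES syntax. First, batch rows into a single INSERT so that one job writes one output file instead of one file per statement. Second, enable Hive's small-file merge settings so that undersized output files are merged after the job completes:

```sql
-- Batch several rows into one INSERT: a single job writes a single file,
-- rather than one file per statement (multi-row VALUES needs Hive 0.14+).
insert into table dept values
  (50, 'ERP', 'suzhou'),
  (60, 'HR',  'beijing');

-- Enable output-file merging so small files are combined after the job runs.
set hive.merge.mapfiles=true;                -- merge outputs of map-only jobs
set hive.merge.mapredfiles=true;             -- merge outputs of MapReduce jobs
set hive.merge.smallfiles.avgsize=16000000;  -- merge when avg output file size is below this (bytes)
set hive.merge.size.per.task=256000000;      -- target size of the merged files (bytes)
```

The exact thresholds are illustrative; tune them to your HDFS block size and workload.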