java读parquet文件乱码,对于parquet格式, 修改了字段类型又如何

20180314112830850

注:图来自info官网

[slave01:21000] > create table ml_123 (a int,b varchar(10)) STORED AS PARQUET ;

Query: create table ml_123 (a int,b varchar(10)) STORED AS PARQUET

Fetched 0 row(s) in 0.07s

[slave01:21000] > insert into ml_123(a,b) values(3,cast('孟梁' as varchar(10)));

Query: insert into ml_123(a,b) values(3,cast('孟梁' as varchar(10)))

Query submitted at: 2018-03-14 11:30:44 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=35482310a072c5c2:f2c507ea00000000

Modified 1 row(s) in 0.21s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:30:47 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=ea450028b742c082:25c024e100000000

+---+------+

| a | b |

+---+------+

| 3 | 孟梁 |

+---+------+

Fetched 1 row(s) in 0.13s

[slave01:21000] >

下面开始修改int为bigint,看看是什么情况

[slave01:21000] > alter table ml_123 change a a bigint;

Query: alter table ml_123 change a a bigint

Fetched 0 row(s) in 0.18s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:32:30 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=85412aa2816c7a71:dd93c58200000000

WARNINGS:

File 'hdfs://myha/user/hive/warehouse/telecom_xdr.db/ml_123/35482310a072c5c2-f2c507ea00000000_309295272_data.0.parq' has an incompatible Parquet schema for column 'telecom_xdr.ml_123.a'. Column type: BIGINT, Parquet schema:

optional int32 a [i:0 d:1 r:0]

File 'hdfs://myha/user/hive/warehouse/telecom_xdr.db/ml_123/35482310a072c5c2-f2c507ea00000000_309295272_data.0.parq' has an incompatible Parquet schema for column 'telecom_xdr.ml_123.a'. Column type: BIGINT, Parquet schema:

optional int32 a [i:0 d:1 r:0]

[slave01:21000] >

出现上面情况,就是parquert在存储是,做了加密,二级制存储压缩。如果改为bigint,那么占用的空间会大些,所以之前的压缩不可用, 如果生产环境出现误操作,怎么恢复呢?

看下面:

[slave01:21000] > alter table ml_123 change a a int;

Query: alter table ml_123 change a a int

Fetched 0 row(s) in 0.10s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:34:52 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=68401ff9b0414c60:2eb2c5e400000000

+---+------+

| a | b |

+---+------+

| 3 | 孟梁 |

+---+------+

Fetched 1 row(s) in 0.13s

[slave01:21000] >

对,把字段类型恢复到原样即可。是不是有点像外部表的感觉。 壳和躯体分开的。

我们在看看改了varchar长度会导致乱码吗?

[slave01:21000] > desc ml_123;

Query: describe ml_123

+------+-------------+---------+

| name | type | comment |

+------+-------------+---------+

| a | int | |

| b | varchar(10) | |

+------+-------------+---------+

Fetched 2 row(s) in 0.01s

[slave01:21000] > alter table ml_123 change b b varchar(30);

Query: alter table ml_123 change b b varchar(30)

Fetched 0 row(s) in 0.16s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:37:00 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=c74eda9401c455d7:1252cec600000000

+---+------+

| a | b |

+---+------+

| 3 | 孟梁 |

+---+------+

Fetched 1 row(s) in 0.13s

[slave01:21000] > desc ml_123;

Query: describe ml_123

+------+-------------+---------+

| name | type | comment |

+------+-------------+---------+

| a | int | |

| b | varchar(30) | |

+------+-------------+---------+

Fetched 2 row(s) in 0.01s

[slave01:21000] >那么改短了,我们试试乱码吗?

[slave01:21000] > desc ml_123;

Query: describe ml_123

+------+-------------+---------+

| name | type | comment |

+------+-------------+---------+

| a | int | |

| b | varchar(30) | |

+------+-------------+---------+

Fetched 2 row(s) in 0.01s

[slave01:21000] > alter table ml_123 change b b varchar(2);

Query: alter table ml_123 change b b varchar(2)

Fetched 0 row(s) in 0.09s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:38:17 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=af45bd6629b19d8d:c662468200000000

+---+---+

| a | b |

+---+---+

| 3 | � |

+---+---+

Fetched 1 row(s) in 0.13s

[slave01:21000] >

那么我们试试,生成环境出现这种异常,我们怎么恢复? 改回来能正常显示吗?

看:

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=af45bd6629b19d8d:c662468200000000

+---+---+

| a | b |

+---+---+

| 3 | � |

+---+---+

Fetched 1 row(s) in 0.13s

[slave01:21000] > alter table ml_123 change b b varchar(12);

Query: alter table ml_123 change b b varchar(12)

Fetched 0 row(s) in 0.09s

[slave01:21000] > select * from ml_123;

Query: select * from ml_123

Query submitted at: 2018-03-14 11:39:14 (Coordinator: http://slave01:25000)

Query progress can be monitored at: http://slave01:25000/query_plan?query_id=d5449e9748082434:b0fc657200000000

+---+------+

| a | b |

+---+------+

| 3 | 孟梁 |

+---+------+

Fetched 1 row(s) in 0.13s

[slave01:21000] >

可以的。 哈哈,一样的。都是壳和躯体分开。

如果生成环境遇到问题,不要惊慌。恢复即可。

--end ---

老农民

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值