统计大数据数仓hive下数据量:
统计hive中数据量大小:
hadoop fs -du -s -h /user/hive/warehouse/
附:
hdfs查询命令:
hadoop fs -du -s -h 此路径
hadoop fs -du -h 路径下子目录
如下如:
通过元数据统计hive中数据条数(去hive的metastore所用的mysql地址查询):
查询不含分区表的总条数
select FORMAT(sum(tb.PARAM_VALUE),0)
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join TABLE_PARAMS tb
on t.TBL_ID = tb.TBL_ID
where d.NAME='your_database_name' and tb.PARAM_KEY='numRows';
查询含有分区表的总条数
-- 查看分别是什么:
select d.NAME,t.TBL_NAME,t.TBL_ID,p.PART_ID,p.PART_NAME,a.PARAM_VALUE
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join PARTITIONS p
on t.TBL_ID = p.TBL_ID
left join PARTITION_PARAMS a
on p.PART_ID=a.PART_ID
where t.TBL_NAME='your_table_name' and d.NAME='your_database_name' and a.PARAM_KEY='numRows';
-- 统计数据:
select FORMAT(sum(a.PARAM_VALUE),0)
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join PARTITIONS p
on t.TBL_ID = p.TBL_ID
left join PARTITION_PARAMS a
on p.PART_ID=a.PART_ID
where t.TBL_NAME='your_table_name' and d.NAME='your_database_name' and a.PARAM_KEY='numRows';
统计mysql数据库下数据量:
USE information_schema;
# /1024/1024/1024 将数据单位处理成G
SELECT TABLE_SCHEMA, SUM(DATA_LENGTH)/1024/1024/1024 FROM TABLES GROUP BY TABLE_SCHEMA;