Hive在0.8.0提供了虚拟列的功能,虚拟列是一种特殊的函数,目前有2个可用的虚拟列:
- INPUT__FILE__NAME:显示mapper task的输入文件名
- BLOCK__OFFSET__INSIDE__FILE:显示当前全局文件的位置,或者当前block在文件的偏移位置(如果文件是压缩的)。
示例1
> SELECT INPUT__FILE__NAME,BLOCK__OFFSET__INSIDE__FILE as OFFSIDE from employee_partitioned;
+----------------------------------------------------+----------+
| input__file__name | offside |
+----------------------------------------------------+----------+
| hdfs://ns001/tmp/hive/employee_partitioned/year=2012/month=11/000000_0 | 0 |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0 | 0 |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0 | 63 |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0 | 116 |
| hdfs://ns001/tmp/hive/employee_partitioned/year=2018/month=9/000000_0 | 177 |
+----------------------------------------------------+----------+
示例2
> select * from employee_partitioned where BLOCK__OFFSET__INSIDE__FILE > 120;
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+
| employee_partitioned.name | employee_partitioned.work_place | employee_partitioned.gender_age | employee_partitioned.skills_score | employee_partitioned.depart_title | employee_partitioned.year | employee_partitioned.month |
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+
| Lucy | ["Vancouver"] | {"gender":"Female","age":57} | {"Sales":89,"HR":94} | {"Sales":["Lead"]} | 2018 | 9 |
+----------------------------+----------------------------------+----------------------------------+------------------------------------+------------------------------------+----------------------------+-----------------------------+
示例3
> select count(INPUT__FILE__NAME) from employee_partitioned;
+------+
| _c0 |
+------+
| 5 |
+------+
参考
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
书籍 Apache Hive Essentials Second Edition (by Dayong Du) Chapter 5