Hive (25): Example: Converting time formats in Hive

I. Time format conversion

The goal is to convert a timestamp field in the log into the required string format.

II. Implementation methods

1. UDF

(1) Goal: convert the date "31/Aug/2015:00:04:37 +0800" into "2015-08-31 00:04:37".

(2) Create the class TestDateUDF:

package com.ray.hive.mapreduce;

import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class TestDateUDF extends UDF {
    // Input format of the raw log field, e.g. "31/Aug/2015:00:04:37 +0800".
    // Locale.ENGLISH ensures "Aug" parses regardless of the server's default locale.
    private final SimpleDateFormat inputDate = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
    // Target format, e.g. "2015-08-31 00:04:37".
    private final SimpleDateFormat outputDate = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    public Text evaluate(Text time) {
        if (time == null || StringUtils.isBlank(time.toString())) {
            return null;
        }
        // Strip any surrounding double quotes from the log field.
        String parse = time.toString().replaceAll("\"", "");
        try {
            // Always wrap the parse in try/catch so a malformed row does not kill the task.
            Date parseDate = this.inputDate.parse(parse);
            return new Text(this.outputDate.format(parseDate));
        } catch (ParseException e) {
            // Log the bad value and return NULL instead of throwing.
            System.err.println("Unparseable log date: " + parse);
            return null;
        }
    }
}

(3) Package the code as a jar and upload it to the Linux server:

/opt/datas/hivedateudf.jar

(4) Register the jar with Hive:

add jar /opt/datas/hivedateudf.jar;

(5) Create a temporary function:

create temporary function pdate as 'com.ray.hive.mapreduce.TestDateUDF';

(6) Verify that the UDF works:

select pdate(time_local) from log_src limit 10;
Result:
2015-08-31 00:04:37
2015-08-31 00:04:37
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53
2015-08-31 00:04:53

(7) To make the function permanent, upload the jar to HDFS and reference it in CREATE FUNCTION (the statement below uses a different jar and class name than the one built above; substitute your own if you are following along):

create function pdate2 as 'com.ray.hive.HiveDateTest' using jar 'hdfs://hadoop:8020/date_udf.jar';

select pdate2(time_local) from log_src limit 10;
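Once the function is registered, the converted value can be written out for the whole log in one pass. A minimal sketch, not from the original post: the target table log_clean is a hypothetical name, while log_src and time_local are the table and column used in the verification query above.

-- Illustrative: materialize the converted timestamps into a new table.
-- Adjust the column list to match your actual log schema.
CREATE TABLE log_clean AS
SELECT pdate(time_local) AS access_time
FROM log_src;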

2. Preprocess the data with a Python script

Official example: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SimpleExampleUseCases

(1) Create the table:

CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

(2) Download the dataset and unzip it:

wget -P /opt/datas/movie http://files.grouplens.org/datasets/movielens/ml-100k.zip
cd /opt/datas/movie && unzip ml-100k.zip

(3) Load the data:

LOAD DATA LOCAL INPATH '/opt/datas/movie/ml-100k/u.data' OVERWRITE INTO TABLE u_data;
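A quick sanity check (not part of the original steps): the MovieLens 100k set contains 100,000 ratings, so the row count should be 100000.

SELECT COUNT(*) FROM u_data;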

(4) Write the Python script /opt/datas/movie/weekday_mapper.py:

import sys
import datetime

# Read tab-separated rows from stdin, replace the unix timestamp with the
# ISO weekday (1 = Monday ... 7 = Sunday), and write the rows back to stdout.
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    # print(...) with a single argument works under both Python 2 and Python 3.
    print('\t'.join([userid, movieid, rating, str(weekday)]))

(5) Create the new table:

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

(6) Register the script with Hive (analogous to add jar):

add FILE /opt/datas/movie/weekday_mapper.py;

(7) Run the transform:

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

(8) Check the result:

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;

Result:
weekday	_c1
1	12254
2	13579
3	14430
4	15114
5	14743
6	18229
7	11651
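With the weekday column in place, other aggregations follow the same pattern. For example (an illustrative query, not from the original post), the average rating per weekday:

SELECT weekday, AVG(rating) AS avg_rating
FROM u_data_new
GROUP BY weekday
ORDER BY weekday;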

3. Hive's built-in time conversion functions

3.1 unix_timestamp

(1) Get the Unix timestamp of the current time:

hive (temp)> select unix_timestamp();
OK
_c0
1542026782

(2) Get the timestamp of a specific date:

select unix_timestamp("2017-12-16 11:38:55");

Note: the input must match the default "yyyy-MM-dd HH:mm:ss" format; if it does not, NULL is returned.
Incorrect example: select unix_timestamp("20171216 11:38:55");
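Running the mismatched example makes the behaviour visible: the query succeeds but returns NULL instead of raising an error (output shown for illustration):

select unix_timestamp("20171216 11:38:55");
OK
_c0
NULL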
(3) Convert a date in a custom format to a timestamp:

select unix_timestamp("20171216 11:38:55","yyyyMMdd HH:mm:ss");
Result:
OK
_c0
1513395535

Note: this converts a date string in the "yyyyMMdd HH:mm:ss" format into a bigint Unix timestamp (seconds since the epoch).

3.2 from_unixtime

from_unixtime does the reverse: it converts a Unix timestamp (a bigint such as 1513395535) into a date string in the given format, in the server's time zone:

select from_unixtime(881250949,"yyyy-MM-dd HH:mm:ss"); 
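Combining the two built-ins gives a pure-SQL alternative to the custom UDF for reformatting a date string: parse with unix_timestamp using the source pattern, then format with from_unixtime. A sketch, not from the original post; note that whether month abbreviations such as "Aug" parse with an MMM pattern depends on the server's locale, which is why the Java UDF above sets Locale.ENGLISH explicitly.

-- Reformat "yyyyMMdd HH:mm:ss" into "yyyy-MM-dd HH:mm:ss" without a UDF.
select from_unixtime(unix_timestamp('20171216 11:38:55', 'yyyyMMdd HH:mm:ss'),
                     'yyyy-MM-dd HH:mm:ss');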
