Big Data Website Log Offline Analysis Project

Latest recommended article published on 2024-03-26 19:12:12

一鸣888


Tags: big data, log analysis, offline project, big data development

Article link: https://blog.csdn.net/HelloWowofei/article/details/132599650


Hive and HBase integration

HBaseIntegration - Apache Hive - Apache Software Foundation

Notes:

Version information

As of Hive 0.9.0 the HBase integration requires at least HBase 0.92, earlier versions of Hive were working with HBase 0.89/0.90

That is, Hive 0.9.0 works with HBase 0.92.

Version information

Hive 1.x will remain compatible with HBase 0.98.x and lower versions. Hive 2.x will be compatible with HBase 1.x and higher. (See HIVE-10990 for details.) Consumers wanting to work with HBase 1.x using Hive 1.x will need to compile Hive 1.x stream code themselves.

That is, Hive 1.x remains compatible with HBase 0.98.x.

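HIVE-705 introduced native integration between Hive and HBase. HBase tables can be accessed with HiveQL statements, including SELECT and INSERT, and Hive can even join or union Hive tables with HBase tables.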

Required jar (shipped with Hive):

hive-hbase-handler-x.y.z.jar

Example: connecting to a single-node HBase:

$HIVE_SRC/build/dist/bin/hive --auxpath $HIVE_SRC/build/dist/lib/hive-hbase-handler-0.9.0.jar,$HIVE_SRC/build/dist/lib/hbase-0.92.0.jar,$HIVE_SRC/build/dist/lib/zookeeper-3.3.4.jar,$HIVE_SRC/build/dist/lib/guava-r09.jar --hiveconf hbase.master=hbase.yoyodyne.com:60000

The setting passed with --hiveconf can also be written into hive-site.xml instead.

Example: connecting to an HBase cluster:

$HIVE_SRC/build/dist/bin/hive --auxpath $HIVE_SRC/build/dist/lib/hive-hbase-handler-0.9.0.jar,$HIVE_SRC/build/dist/lib/hbase-0.92.0.jar,$HIVE_SRC/build/dist/lib/zookeeper-3.3.4.jar,$HIVE_SRC/build/dist/lib/guava-r09.jar --hiveconf hbase.zookeeper.quorum=zk1.yoyodyne.com,zk2.yoyodyne.com,zk3.yoyodyne.com

Here too, the --hiveconf setting can be written into hive-site.xml instead.

On the Hive server side:

Start the metastore as usual: hive --service metastore

Start the CLI client: hive

To work with an HBase table from Hive, its columns must be mapped.

CREATE TABLE hbase_table_1(key int, value string)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")

TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");

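The hbase.columns.mapping property is required.

The hbase.table.name property is optional. It names the corresponding HBase table, which lets the Hive table use a different name. In the example above, the table is called hbase_table_1 in Hive and xyz in HBase. If it is not set, the HBase table has the same name as the Hive table.

The hbase.mapred.output.outputtable property is optional, but it is required for inserting data into the table. Its value is passed on to hbase.mapreduce.TableOutputFormat.

Of the mapping cf1:val in hbase.columns.mapping, HBase shows only the column family cf1 once the table is created, not the qualifier val. A column qualifier exists per row (per cell), whereas the column family is table-level metadata in HBase.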
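To make the positional pairing concrete, here is a minimal Java sketch. ColumnMappingSketch is a hypothetical helper, not part of Hive. It pairs the i-th Hive column with the i-th mapping entry, treats :key as the row key and family:qualifier as a cell, and rejects a count mismatch (Hive likewise rejects a mismatch when the table is created). Other entry forms Hive supports are not modeled; an entry such as cf: (a whole family as a Hive MAP) would simply get an empty qualifier here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Hive API): pairs Hive columns with hbase.columns.mapping entries. */
final class ColumnMappingSketch {

    /** One Hive column and where it lives in HBase. */
    static final class Mapping {
        final String hiveColumn;
        final boolean rowKey;   // true for ":key"
        final String family;    // column family, e.g. "cf1" (null for the row key)
        final String qualifier; // column qualifier, e.g. "val" (null for the row key)

        Mapping(String hiveColumn, boolean rowKey, String family, String qualifier) {
            this.hiveColumn = hiveColumn;
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return rowKey ? hiveColumn + " -> row key" : hiveColumn + " -> " + family + ":" + qualifier;
        }
    }

    private ColumnMappingSketch() {}

    /** Pairs the i-th Hive column with the i-th mapping entry. */
    static List<Mapping> pair(List<String> hiveColumns, String columnsMapping) {
        String[] entries = columnsMapping.split(",");
        if (entries.length != hiveColumns.size()) {
            throw new IllegalArgumentException("Hive table has " + hiveColumns.size()
                    + " columns but the mapping has " + entries.length + " entries");
        }
        List<Mapping> result = new ArrayList<>();
        for (int i = 0; i < entries.length; i++) {
            String entry = entries[i].trim();
            String column = hiveColumns.get(i);
            if (entry.equals(":key")) {
                result.add(new Mapping(column, true, null, null));
                continue;
            }
            int colon = entry.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected family:qualifier but got '" + entry + "'");
            }
            // Only the family is table-level schema in HBase; the qualifier lives in each row.
            result.add(new Mapping(column, false, entry.substring(0, colon), entry.substring(colon + 1)));
        }
        return result;
    }
}
```

For hbase_table_1, pair(Arrays.asList("key", "value"), ":key,cf1:val") yields [key -> row key, value -> cf1:val].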

Step by step:

In Hive:

CREATE TABLE hbase_table_1(key int, value string)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")

TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");

In HBase:

list

desc 'xyz'

In Hive:

insert into hbase_table_1 values(1,'zhangsan');

In HBase:

scan 'xyz'

Creating an external table requires that the corresponding table already exists in HBase.

In HBase:

create 'tb_user', 'info'

In Hive:

create external table hive_tb_user1 (

key int,

name string,

age int,

sex string,

likes array<string>

)

row format

delimited

collection items terminated by '-'

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties("hbase.columns.mapping"=":key,info:name,info:age,info:sex,info:likes")

tblproperties("hbase.table.name"="tb_user", "hbase.mapred.output.outputtable"="tb_user");

insert into table hive_tb_user1

select 1, 'zhangsan', 25, 'female', array('climbing', 'reading', 'shopping');

In HBase:

scan 'tb_user'

put 'tb_user', 1, 'info:likes', 'like1-like2-like3-like4'
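Because the table declares collection items terminated by '-', the whole likes array is expected to live in the single HBase cell info:likes as one '-'-joined string. That is why the raw put of 'like1-like2-like3-like4' should show up in Hive as a four-element array. The hypothetical Java sketch below models only that join and split, and assumes the cell is stored as a plain string (the default storage type).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of an array<string> column stored in one HBase cell with '-' between items. */
final class DelimitedArrayCell {
    private DelimitedArrayCell() {}

    /** Joins array elements with the collection delimiter, as a Hive insert would store them. */
    static String encode(List<String> items, char delimiter) {
        return String.join(String.valueOf(delimiter), items);
    }

    /** Splits a cell value back into elements; the -1 limit keeps trailing empty elements. */
    static List<String> decode(String cell, char delimiter) {
        return Arrays.asList(cell.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }
}
```

One consequence of this design: an element that itself contains '-' cannot round-trip, because it will be split into two elements on read.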

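Hive and HBase: summary

The Hive server side needs the HBase ZooKeeper quorum. Add it to hive-site.xml:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value><!-- comma-separated ZooKeeper hosts of the HBase cluster --></value>
</property>

Then start the metastore:

hive --service metastore

On the client side, just start hive: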

hive

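1. Creating a Hive managed (internal) table requires that the HBase table does NOT exist yet.
2. Creating a Hive external table requires that the HBase table already exists.
3. The column mapping is given by:
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:id,cf:username,cf:age")
4. STORED BY names the class that handles storage. When Hive writes data, this class stores it in HBase; when Hive reads, it translates between the HBase data and the Hive columns:
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
5. Specify which HBase table the Hive table corresponds to. hbase.mapred.output.outputtable tells Hive which HBase table to write to on INSERT:
   TBLPROPERTIES ("hbase.table.name" = "my_table", "hbase.mapred.output.outputtable" = "my_table");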

Create an external table (the HBase table must already exist):

CREATE external TABLE hbase_my_table(key int, value string)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")

TBLPROPERTIES ("hbase.table.name" = "my_table", "hbase.mapred.output.outputtable" = "my_table");

Column correspondence between HBase and Hive (HBase table my_table):

HBase column    Hive column
rowkey          key
cf:id           myid
cf:username     myname
cf:age          myage

Hive table:

create external table my_table_hbase (

key int,

myid int,

myname string,

myage int

)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:id,cf:username,cf:age")

TBLPROPERTIES ("hbase.table.name" = "my_table", "hbase.mapred.output.outputtable" = "my_table");

Create a Hive managed (internal) table: the HBase table must NOT exist yet.

CREATE TABLE hbase_table_1(key int, value string)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")

TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");

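Sqoop: introduction, installation, and data import

Sqoop is a tool for transferring data between relational databases (Oracle, MySQL, PostgreSQL, etc.) and Hadoop.

Official site: http://sqoop.apache.org/

Versions (the two lines are completely incompatible; Sqoop 1 is by far the most widely used):

sqoop1: 1.4.x

sqoop2: 1.99.x

Sqoop export

Installing and testing Sqoop:

1. Unpack the archive.
2. Set the environment variables SQOOP_HOME and PATH.
3. Add the JDBC driver jar for the database (e.g. the MySQL connector) to Sqoop's lib directory.
4. Configure sqoop-env.sh.
5. Comment out lines 134-147 of bin/configure-sqoop to silence unneeded warnings.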

Testing:

sqoop version

sqoop list-databases --connect jdbc:mysql://node4:3306/ --username root --password 123456

sqoop help

sqoop help command

Run directly on the command line:

sqoop list-databases --connect jdbc:mysql://node1:3306 --username hive --password hive123

Put the Sqoop arguments in a file:

sqoop1.txt

######################

list-databases

--connect

jdbc:mysql://node4:3306

--username

hive

--password

hive123

######################

Run it from the command line:

sqoop --options-file sqoop1.txt
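The Sqoop user guide describes an options file as one option or value per line, with lines starting with # treated as comments and blank lines ignored. The sketch below applies just those rules to turn a file like sqoop1.txt into an argument list. OptionsFileSketch is a made-up name, and Sqoop's own parser also handles line continuations and quoting, which are left out here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of how an options file such as sqoop1.txt becomes command-line arguments. */
final class OptionsFileSketch {
    private OptionsFileSketch() {}

    static List<String> toArgs(List<String> fileLines) {
        List<String> args = new ArrayList<>();
        for (String raw : fileLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comment lines (e.g. the ##### separators) are ignored
            }
            args.add(line); // the whole line is one argument, spaces included
        }
        return args;
    }
}
```

Because each line is one argument, a line such as select id, browser_name, browser_version from dimension_browser where $CONDITIONS (as in sqoop3.txt) stays a single argument and needs no shell quoting inside the file.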

[root@node4 sqoop-1.4.6]# sqoop help list-databases
usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect string
   --connection-manager <class-name>            Specify connection manager class name
   --connection-param-file <properties-file>    Specify connection parameters file
   --driver <class-name>                        Manually specify JDBC driver class to use
   --hadoop-home <hdir>                         Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication password
   --password-alias <password-alias>            Credential provider password alias
   --password-file <password-file>              Set authentication password file path
   --relaxed-isolation                          Use read-uncommitted isolation for imports
   --skip-dist-cache                            Skip copying jars to distributed cache
   --username <username>                        Set authentication username
   --verbose                                    Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|resourcemanager:port>                specify a ResourceManager
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

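To export from Hive to MySQL, Sqoop must be installed on the Hive host (for example, where the Hive client runs).

$CONDITIONS: in a free-form query import (--query / -e), the WHERE clause must contain the $CONDITIONS token, which Sqoop replaces with a different split condition for each map task.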

[root@server3 ~]# sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect string
   --connection-manager <class-name>            Specify connection manager class name
   --connection-param-file <properties-file>    Specify connection parameters file
   --driver <class-name>                        Manually specify JDBC driver class to use
   --hadoop-home <hdir>                         Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication password
   --password-alias <password-alias>            Credential provider password alias
   --password-file <password-file>              Set authentication password file path
   --relaxed-isolation                          Use read-uncommitted isolation for imports
   --skip-dist-cache                            Skip copying jars to distributed cache
   --username <username>                        Set authentication username
   --verbose                                    Print more information while working

Import control arguments:
   --append                                                   Imports data in append mode
   --as-avrodatafile                                          Imports data to Avro data files
   --as-parquetfile                                           Imports data to Parquet files
   --as-sequencefile                                          Imports data to SequenceFiles
   --as-textfile                                              Imports data as plain text (default)
   --autoreset-to-one-mapper                                  Reset the number of mappers to one mapper if no split key available
   --boundary-query <statement>                               Set boundary query for retrieving max and min value of the primary key
   --columns <col,col,col...>                                 Specify which columns of the database table to import
   --compression-codec <codec>                                Compression codec to use for import
   --delete-target-dir                                        Imports data in delete mode
   --direct                                                   Use direct import fast path
   --direct-split-size <n>                                    Split the input stream every 'n' bytes when importing in direct mode
-e,--query <statement>                                        Import results of SQL 'statement'
   --fetch-size <n>                                           Set number 'n' of rows to fetch from the database when more rows are needed
   --inline-lob-limit <n>                                     Set the maximum size for an inline LOB
-m,--num-mappers <n>                                          Use 'n' map tasks to import in parallel
   --mapreduce-job-name <name>                                Set name for generated mapreduce job
   --merge-key <column>                                       Key column to use to join results
   --split-by <column-name>                                   Column of the table used to split work units
   --table <table-name>                                       Table to read
   --target-dir <dir>                                         HDFS plain table destination
   --validate                                                 Validate the copy using the configured validator
   --validation-failurehandler <validation-failurehandler>    Fully qualified class name for ValidationFailureHandler
   --validation-threshold <validation-threshold>              Fully qualified class name for ValidationThreshold
   --validator <validator>                                    Fully qualified class name for the Validator
   --warehouse-dir <dir>                                      HDFS parent for table destination
   --where <where clause>                                     WHERE clause to use during import
-z,--compress                                                 Enable compression

Incremental import arguments:
   --check-column <column>        Source column to check for incremental change
   --incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental check column

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line char
   --input-optionally-enclosed-by <char>    Sets a field enclosing character

Code generation arguments:
   --bindir <dir>                        Output directory for compiled objects
   --class-name <name>                   Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
   --input-null-non-string <null-str>    Input null non-string representation
   --input-null-string <null-str>        Input null string representation
   --jar-file <file>                     Disable code generation; use specified jar
   --map-column-java <arg>               Override mapping for specific columns to java types
   --null-non-string <null-str>          Null non-string representation
   --null-string <null-str>              Null string representation
   --outdir <dir>                        Output directory for generated code
   --package-name <name>                 Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|resourcemanager:port>                specify a ResourceManager
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied
after a '--' on the command line.

Import from the command line:

Import data from MySQL into HDFS:

sqoop import --connect jdbc:mysql://node4/log_results --username hivehive --password hive --as-textfile --table dimension_browser --columns id,browser_name,browser_version --target-dir /sqoop/test1 --delete-target-dir -m 1

Write the options into a file and run it:

sqoop2.txt

import

--connect

jdbc:mysql://node4/log_results

--username

hivehive

--password

hive

--as-textfile

--table

dimension_browser

--columns

id,browser_name,browser_version

--target-dir

/sqoop/test1

--delete-target-dir

-m

1

Command line:

sqoop --options-file sqoop2.txt

The import can also be defined by an SQL query:

sqoop3.txt

import

--connect

jdbc:mysql://node4/log_results

--username

hivehive

--password

hive

--as-textfile

#--query is the same as -e

-e

select id, browser_name, browser_version from dimension_browser where $CONDITIONS

--target-dir

/sqoop/test2

--delete-target-dir

-m

1

Command line:

sqoop --options-file sqoop3.txt
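Sqoop requires $CONDITIONS in a free-form query because each map task runs the query with its own range condition on the --split-by column; with -m 1 the query runs once and no split column is needed. The sketch below is a hypothetical, simplified model of that idea, not Sqoop's code. Real Sqoop first runs a boundary query for the MIN and MAX of the split column, and the SQL text it generates differs from what is shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of how $CONDITIONS turns one query into per-mapper queries. */
final class ConditionsSketch {
    private ConditionsSketch() {}

    static final String TOKEN = "$CONDITIONS";

    /**
     * Splits the closed range [min, max] of the split column into roughly equal
     * half-open ranges, one per mapper, and substitutes each range for $CONDITIONS.
     */
    static List<String> perMapperQueries(String query, String splitBy, long min, long max, int mappers) {
        if (!query.contains(TOKEN)) {
            throw new IllegalArgumentException("free-form query must contain " + TOKEN);
        }
        List<String> queries = new ArrayList<>();
        if (mappers <= 1) {
            queries.add(query.replace(TOKEN, "(1=1)")); // one mapper: an always-true condition
            return queries;
        }
        long span = max - min + 1;
        long step = (span + mappers - 1) / mappers; // ceiling division
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step, max + 1); // exclusive upper bound
            String condition = "(" + splitBy + " >= " + lo + " AND " + splitBy + " < " + hi + ")";
            queries.add(query.replace(TOKEN, condition)); // literal replace; '$' is not special here
        }
        return queries;
    }
}
```

For ids 1 to 10 and three mappers, this produces the ranges [1, 5), [5, 9) and [9, 11). On a shell command line, wrap such a query in single quotes (or write \$CONDITIONS inside double quotes) so the shell does not expand it; in an options file no shell is involved.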

Specify the field delimiter of the output files:

sqoop4.txt

import

--connect

jdbc:mysql://node1/log_results

--username

hive

--password

hive123

--as-textfile

-e

select id, browser_name, browser_version from dimension_browser where $CONDITIONS

--target-dir

/sqoop/test2-1

--delete-target-dir

-m

1

--fields-terminated-by

Command line:

sqoop --options-file sqoop4.txt

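When importing into HDFS, the default field delimiter is a comma, so it does not have to be specified. Note that with --hive-import, if no delimiters are set, Sqoop switches to Hive's defaults (\001 between fields, \n between lines).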

sqoop5.txt

import

--connect

jdbc:mysql://node1/log_results

--username

hive

--password

hive123

--as-textfile

#--query is the same as -e

-e

select id, browser_name, browser_version from dimension_browser where $CONDITIONS

--hive-import

--create-hive-table

--hive-table

hive_browser_dim

--target-dir

/my/tmp

-m

1

--fields-terminated-by

Command line:

sqoop --options-file sqoop5.txt

Export:

hdfs://mycluster/sqoop/data/mydata.txt

1,zhangsan,hello world

2,lisi,are you ok

3,wangwu,fine thanks

4,zhaoliu,what are you doing

5,qunqi,just say hello

sqoop6.txt

export

--connect

jdbc:mysql://node4/log_results

--username

hivehive

--password

hive

--columns

id,myname,myversion

--export-dir

/user/hive/warehouse/hive_browser_dim/

-m

--table

mybrowserinfo

--input-fields-terminated-by

sqoop6-1.txt

export

--connect

jdbc:mysql://node4/log_results

--username

hivehive

--password

hive

--columns

id,myname,myversion

--export-dir

/user/hive/warehouse/hive_browser_dim/

-m

--table

mybrowserinfo1

Command line:

sqoop --options-file sqoop6-1.txt

Command line:

sqoop --options-file sqoop6.txt

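Comma-delimited data needs no delimiter option, since a comma is Sqoop's default input field delimiter.

Data written with Hive's default field delimiter (\001, Ctrl-A) needs \001 to be specified in the Sqoop options file: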

sqoop11.txt

export

--connect

jdbc:mysql://node1/log_results

--username

hive

--password

hive123

--columns

id,name,msg

--export-dir

/user/hive/warehouse/tb_log2

-m

--table

tb_loglog

--input-fields-terminated-by

\001
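Why the delimiter must match: Sqoop export splits each line of the files under --export-dir on the input field delimiter, which defaults to a comma. A table written with Hive's default delimiter has Ctrl-A between fields, so splitting on a comma yields the wrong fields. The small Java sketch below (DelimitedRowSketch is a made-up name) puts the two splits side by side.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: splitting one line of a text-format table on a single-character delimiter. */
final class DelimitedRowSketch {
    private DelimitedRowSketch() {}

    /** Hive's default field delimiter for text tables (Ctrl-A), written as \001 in the options file. */
    static final char HIVE_DEFAULT_FIELD_DELIM = '\u0001';

    /** Splits a line into fields; the -1 limit keeps trailing empty fields (e.g. an empty last column). */
    static List<String> fields(String line, char delimiter) {
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }
}
```

A Hive row such as 1, zhangsan, hello, world separated by Ctrl-A splits into the expected three fields on Ctrl-A. Split on a comma, it gives two fields that do not line up with the --columns list.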

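SQL analysis of user browsing depth

Four ways to pivot rows into columns:

join

union

DECODE (Oracle)

CASE WHEN

Requirement:

Group users by how many pages they viewed, and count how many users are in each group.

The statistic is computed from the user's perspective.

The results go to the stats_view_depth table in MySQL.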

1. Create a Hive table mapped to the HBase event log table

CREATE EXTERNAL TABLE event_logs(

key string, pl string, en string, s_time bigint, p_url string, u_ud string, u_sd string

) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties('hbase.columns.mapping'=':key,log:pl,log:en,log:s_time,log:p_url,log:u_ud,log:u_sd')

tblproperties('hbase.table.name'='eventlog');

2. Create the Hive counterpart of the MySQL table

The HQL analysis writes its results into this Hive table, which is then exported to MySQL with Sqoop.

CREATE TABLE `stats_view_depth` (

`platform_dimension_id` bigint ,

`data_dimension_id` bigint ,

`kpi_dimension_id` bigint ,

`pv1` bigint ,

`pv2` bigint ,

`pv3` bigint ,

`pv4` bigint ,

`pv5_10` bigint ,

`pv10_30` bigint ,

`pv30_60` bigint ,

`pv60_plus` bigint ,

`created` string

) row format delimited fields terminated by '\t';

3. Create a temporary table in Hive

It holds the intermediate results of the HQL analysis.

CREATE TABLE `stats_view_depth_tmp`(`pl` string, `date` string, `col` string, `ct` bigint);

pl: platform

date: date

col: column name; its values are the pv-prefixed columns of the MySQL table (pv1, pv2, pv3, ...)

ct: the value for that column

4. Write the UDFs

One UDF per dimension, for the platform and date dimensions (PlatformDimension and DateDimension):

package com.sxt.transformer.hive;

import java.io.IOException;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import com.sxt.transformer.model.dim.base.PlatformDimension;
import com.sxt.transformer.service.IDimensionConverter;
import com.sxt.transformer.service.impl.DimensionConverterImpl;

/**
* UDF for the platform dimension: maps a platform name to its dimension id.
*
* @author root
*
*/
public class PlatformDimensionUDF extends UDF {
    // Project service that converts dimension values to their ids
    private IDimensionConverter converter = new DimensionConverterImpl();

    /**
     * Returns the id for the given platform name.
     *
     * @param platform platform name (the pl column)
     * @return the platform dimension id, or null if the input is null
     */
    public IntWritable evaluate(Text platform) {
        if (platform == null) {
            return null; // NULL in, NULL out
        }
        PlatformDimension dimension = new PlatformDimension(platform.toString());
        try {
            int id = this.converter.getDimensionIdByValue(dimension);
            return new IntWritable(id);
        } catch (IOException e) {
            // Keep the cause so the failure can be diagnosed from the task logs
            throw new RuntimeException("Failed to get the platform dimension id for " + platform, e);
        }
    }
}

package com.sxt.transformer.hive;

import java.io.IOException;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import com.sxt.common.DateEnum;
import com.sxt.transformer.model.dim.base.DateDimension;
import com.sxt.transformer.service.IDimensionConverter;
import com.sxt.transformer.service.impl.DimensionConverterImpl;
import com.sxt.util.TimeUtil;

/**
* UDF for the date dimension: maps a day to its dimension id.
*
* @author root
*
*/
public class DateDimensionUDF extends UDF {
    // Project service that converts dimension values to their ids
    private IDimensionConverter converter = new DimensionConverterImpl();

    /**
     * Returns the id for the given day (format: yyyy-MM-dd).
     *
     * @param day the day, e.g. 2017-08-23
     * @return the date dimension id, or null if the input is null
     */
    public IntWritable evaluate(Text day) {
        if (day == null) {
            return null; // NULL in, NULL out
        }
        // Build a DAY-level date dimension from the parsed timestamp
        DateDimension dimension = DateDimension.buildDate(TimeUtil.parseString2Long(day.toString()), DateEnum.DAY);
        try {
            int id = this.converter.getDimensionIdByValue(dimension);
            return new IntWritable(id);
        } catch (IOException e) {
            throw new RuntimeException("Failed to get the date dimension id for " + day, e);
        }
    }
}

5. Upload

Package the project as bds3.jar and upload it to the /sxt/transformer directory on HDFS.

6. Create the Hive functions

create function platformFunc as 'com.sxt.transformer.hive.PlatformDimensionUDF' using jar 'hdfs://mycluster/sxt/transformer/bds3.jar';

create function dateFunc as 'com.sxt.transformer.hive.DateDimensionUDF' using jar 'hdfs://mycluster/sxt/transformer/bds3.jar';

7. Write the HQL (browsing depth from the user's perspective). Note: the date range is supplied from outside; here it is hard-coded.

from (

select

pl, from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd') as day, u_ud,

(case when count(p_url) = 1 then "pv1"

when count(p_url) = 2 then "pv2"

when count(p_url) = 3 then "pv3"

when count(p_url) = 4 then "pv4"

when count(p_url) >= 5 and count(p_url) <10 then "pv5_10"

when count(p_url) >= 10 and count(p_url) <30 then "pv10_30"

when count(p_url) >=30 and count(p_url) <60 then "pv30_60"

else 'pv60_plus' end) as pv

from event_logs

where

en='e_pv'

and p_url is not null

and pl is not null

and s_time >= unix_timestamp('2017-08-23','yyyy-MM-dd')*1000

and s_time < unix_timestamp('2017-08-24','yyyy-MM-dd')*1000

group by

pl, from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd'), u_ud

) as tmp

insert overwrite table stats_view_depth_tmp

select pl,day,pv,count(distinct u_ud) as ct where u_ud is not null group by pl,day,pv;
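The WHERE clause above turns one calendar day into a half-open range of epoch milliseconds, because s_time is stored in milliseconds while unix_timestamp returns seconds. unix_timestamp(string, pattern) interprets the date in the default time zone of the Hive process, so the boundaries shift with that zone. The hypothetical helper below reproduces the same arithmetic with java.time and takes the zone as an explicit parameter.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical helper mirroring the day filter and day label used in the step 7 HQL. */
final class DayRangeSketch {
    private DayRangeSketch() {}

    /** [start, end) in epoch milliseconds for one yyyy-MM-dd day in the given zone. */
    static long[] dayRangeMillis(String yyyyMMdd, ZoneId zone) {
        LocalDate day = LocalDate.parse(yyyyMMdd); // ISO format yyyy-MM-dd
        long start = day.atStartOfDay(zone).toInstant().toEpochMilli();
        long end = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return new long[] {start, end};
    }

    /** Like from_unixtime(cast(s_time/1000 as bigint), 'yyyy-MM-dd'): millis to a local date string. */
    static String dayOf(long sTimeMillis, ZoneId zone) {
        return Instant.ofEpochMilli(sTimeMillis).atZone(zone).toLocalDate().toString();
    }
}
```

For 2017-08-23 in UTC the range starts at 1503446400000; in Asia/Shanghai (UTC+8) the same day starts eight hours earlier in epoch terms. That is why the filter and the from_unixtime label should be evaluated in the same zone.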

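How do we know that a visitor belongs to, say, the pv10_30 bucket?

Aggregate:

Query the event data from the HBase-backed table, group by u_ud, and count each visitor's pv events.

CASE WHEN then assigns each visitor to a bucket, for example: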

89155407 pv3

62439313 pv5_10

41469129 pv10_30

37005838 pv30_60

08257218 pv3

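This gives the bucket of every visitor.

Then aggregate per bucket: counting the distinct u_ud values in a bucket such as pv10_30 gives how many people are in it.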
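The two aggregation steps can be mirrored in plain Java to check the bucket boundaries. ViewDepthSketch is hypothetical and assumes all events already belong to one platform and one day (the HQL additionally groups by pl and day). Each list element stands for the u_ud of one e_pv event whose p_url is not null.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical check of the browsing-depth buckets from step 7 (one platform, one day). */
final class ViewDepthSketch {
    private ViewDepthSketch() {}

    /** Same boundaries as the CASE WHEN in the HQL, including its ELSE branch. */
    static String bucket(long pvCount) {
        if (pvCount == 1) return "pv1";
        if (pvCount == 2) return "pv2";
        if (pvCount == 3) return "pv3";
        if (pvCount == 4) return "pv4";
        if (pvCount >= 5 && pvCount < 10) return "pv5_10";
        if (pvCount >= 10 && pvCount < 30) return "pv10_30";
        if (pvCount >= 30 && pvCount < 60) return "pv30_60";
        return "pv60_plus";
    }

    /** visitorIds holds the u_ud of each page-view event; returns bucket -> number of visitors. */
    static Map<String, Long> visitorsPerBucket(List<String> visitorIds) {
        // Inner query: GROUP BY u_ud, count(p_url)
        Map<String, Long> pvPerVisitor = new HashMap<>();
        for (String uud : visitorIds) {
            if (uud != null) { // the outer query keeps only u_ud is not null
                pvPerVisitor.merge(uud, 1L, Long::sum);
            }
        }
        // Outer query: count(distinct u_ud) GROUP BY pv; each key above is already one distinct visitor
        Map<String, Long> visitors = new TreeMap<>();
        for (long count : pvPerVisitor.values()) {
            visitors.merge(bucket(count), 1L, Long::sum);
        }
        return visitors;
    }
}
```

Because the first map is keyed by u_ud, counting its entries per bucket is the same as count(distinct u_ud) in the outer query.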

`pl` string, `date` string, `col` string, `ct` bigint

website 2019-11-18 pv10 300

website 2019-11-18 pv10 400

website 2019-11-18 pv10 500

website 2019-11-18 pv10 300

website 2019-11-18 pv5_10 20

website 2019-11-18 pv10_30 40

website 2019-11-18 pv30_60 10

website 2019-11-18 pv60_plus 120

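Overall: how many people are in each pv bucket?

For example, how many people are in pv1?

Aggregate.

Pivot rows into columns → final result

-- Convert the multiple rows of the temporary table into a single row

Pivoting rows into columns, a simple example: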

Input (my_score):

std prj score
S1  M   100
S1  E   98
S1  Z   80
S2  M   87
S2  E   88
S2  Z   89

Desired output:

std M   E   Z
S1  100 98  80
S2  87  88  89

select std, score from my_score where prj='M';

select std, score from my_score where prj='E';

select std, score from my_score where prj='Z';

select t1.std, t1.score, t2.score, t3.score from t1 join t2 on t1.std=t2.std

join t3 on t1.std=t3.std;

SELECT t1.std, t1.score, t2.score, t3.score

from

(select std, score from my_score where prj='M') t1

join

(select std, score from my_score where prj='E') t2

on t1.std=t2.std

join (select std, score from my_score where prj='Z') t3

on t1.std=t3.std;

Using UNION ALL:

select tmp.std, sum(tmp.M) as M, sum(tmp.E) as E, sum(tmp.Z) as Z from (

select std, score as M, 0 as E, 0 as Z from my_score where prj='M' UNION ALL

select std, 0 as M, score as E, 0 as Z from my_score where prj='E' UNION ALL

select std, 0 as M, 0 as E, score as Z from my_score where prj='Z'

) tmp group by tmp.std;
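The UNION ALL plus SUM pivot can be traced in memory to see why it works. Each input row becomes a "branch" row with its score in its own subject column and 0 elsewhere; grouping by std and summing collapses those rows into one. PivotSketch is a hypothetical name, and the subject list M, E, Z is taken from the example above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version of the UNION ALL + SUM pivot of my_score. */
final class PivotSketch {
    private PivotSketch() {}

    /** One row of my_score: (std, prj, score). */
    static final class ScoreRow {
        final String std;
        final String prj;
        final int score;

        ScoreRow(String std, String prj, int score) {
            this.std = std;
            this.prj = prj;
            this.score = score;
        }
    }

    /** Output column order: M, E, Z. */
    static final List<String> SUBJECTS = Arrays.asList("M", "E", "Z");

    static Map<String, int[]> pivot(List<ScoreRow> rows) {
        Map<String, int[]> byStudent = new LinkedHashMap<>(); // keeps first-seen student order
        for (ScoreRow row : rows) {
            int slot = SUBJECTS.indexOf(row.prj);
            if (slot < 0) {
                continue; // no UNION ALL branch selects this subject
            }
            // UNION ALL branch: the score in its own column, 0 in the others
            int[] branch = new int[SUBJECTS.size()];
            branch[slot] = row.score;
            // GROUP BY std with SUM over each column
            int[] sums = byStudent.computeIfAbsent(row.std, k -> new int[SUBJECTS.size()]);
            for (int i = 0; i < branch.length; i++) {
                sums[i] += branch[i];
            }
        }
        return byStudent;
    }
}
```

Compared with the JOIN version, this approach keeps a student who is missing a subject (that column stays 0, where an inner join would drop the student). Two rows for the same student and subject are added together, exactly as SUM does in the SQL.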

with tmp as

(

select pl,`date` as date1,ct as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv1' union all

select pl,`date` as date1,0 as pv1,ct as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv2' union all

select pl,`date` as date1,0 as pv1,0 as pv2,ct as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv3' union all

select pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,ct as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv4' union all

select pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,ct as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv5_10' union all

select pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,ct as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv10_30' union all

select pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,ct as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv30_60' union all

select pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,ct as pv60_plus from stats_view_depth_tmp where col='pv60_plus' union all

select 'all' as pl,`date` as date1,ct as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv1' union all

select 'all' as pl,`date` as date1,0 as pv1,ct as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv2' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,ct as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv3' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,ct as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv4' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,ct as pv5_10,0 as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv5_10' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,ct as pv10_30,0 as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv10_30' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,ct as pv30_60,0 as pv60_plus from stats_view_depth_tmp where col='pv30_60' union all

select 'all' as pl,`date` as date1,0 as pv1,0 as pv2,0 as pv3,0 as pv4,0 as pv5_10,0 as pv10_30,0 as pv30_60,ct as pv60_plus from stats_view_depth_tmp where col='pv60_plus'

)

from tmp

insert overwrite table stats_view_depth

select 2,3,6,sum(pv1),sum(pv2),sum(pv3),sum(pv4),sum(pv5_10),sum(pv10_30),sum(pv30_60),sum(pv60_plus),'2017-01-10' group by pl,date1;

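The values 2, 3 and 6 in the final select are fake placeholders for platform_dimension_id, data_dimension_id and kpi_dimension_id. In the real job they should be obtained with UDFs.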