Lesson 71: Spark SQL Window Functions Explained and Practiced — Study Notes

This post walks through window functions in Spark SQL, covering window aggregates, ranking, analytic and aggregate functions. It shows by example how to perform grouped sorting with window functions, how to achieve the same effect with the DataFrame API, and it provides complete code that uses window functions to implement a complex Top N scenario.


Topics in this lesson:

1. Spark SQL window functions explained

2. Spark SQL window functions in practice

 

Window functions are among the most valuable of Spark's built-in functions, because a great deal of group-based statistics is computed with them.

Window Aggregates (Windows)

Window Aggregates (aka Windows) operate on a group of rows (a row set) called a window to apply aggregation on. They calculate a value for every input row for its window.

Note

Window-based framework is available as an experimental feature since Spark 1.4.0.

Spark SQL supports three kinds of window aggregate functions: ranking functions, analytic functions, and aggregate functions.

A window specification defines the partitioning, ordering, and frame boundaries.
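
A minimal sketch of such a window specification in the Scala DataFrame API (the column names "name" and "score" are assumptions for illustration, not taken from the quoted text):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.col

// Partitioning, ordering and frame boundaries of one window specification
// ("name" and "score" are assumed column names, for illustration only)
val byNameOrderedByScore = Window
  .partitionBy("name")             // partitioning: one window per distinct name
  .orderBy(col("score").desc)      // ordering within each partition
  .rowsBetween(Long.MinValue, 0)   // frame: from the partition start up to the current row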

 

Window Aggregate Functions

A window aggregate function calculates a return value over a set of rows, called a window, that are somehow related to the current row.

Note

Window functions are also called over functions due to how they are applied using Column’s over function.

Although similar to aggregate functions, a window function does not group rows into a single output row; the rows retain their separate identities. A window function can access rows that are linked to the current row.

Tip

See the examples later in this post.

Spark SQL supports three kinds of window functions:

· ranking functions

· analytic functions

· aggregate functions

Table 1. Window functions in Spark SQL (see Introducing Window Functions in Spark SQL)

 

                      SQL             DataFrame API
Ranking functions     RANK            rank
                      DENSE_RANK      dense_rank
                      PERCENT_RANK    percent_rank
                      NTILE           ntile
                      ROW_NUMBER      row_number
Analytic functions    CUME_DIST       cume_dist
                      LAG             lag
                      LEAD            lead
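
As a hedged sketch of the analytic functions in the table (scoresDF and its columns name and score are assumed, illustrative names), lag and lead read neighbouring rows inside each window:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, lead}

// scoresDF is assumed to have columns name and score (illustrative names)
def withNeighbourScores(scoresDF: DataFrame): DataFrame = {
  val w = Window.partitionBy("name").orderBy(col("score").desc)
  scoresDF
    .withColumn("prev_score", lag("score", 1).over(w))   // score of the previous row in the same name partition
    .withColumn("next_score", lead("score", 1).over(w))  // score of the next row in the same name partition
}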

For aggregate functions, you can use the existing aggregate functions as window functions, e.g. sum, avg, min, max and count.

You can mark a function as a window function with an OVER clause after the function in SQL, e.g. avg(revenue) OVER (…), or with the over method on a function in the Dataset API, e.g. rank().over(…).
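
A sketch of how the two styles line up (salesDF, category and revenue are assumed, illustrative names):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col, rank}

// salesDF is assumed to have columns category and revenue (illustrative names)
def withRevenueWindow(salesDF: DataFrame): DataFrame = {
  val byCategory = Window.partitionBy("category").orderBy(col("revenue").desc)
  // SQL style:     avg(revenue) OVER (PARTITION BY category ORDER BY revenue DESC)
  // Dataset style: avg(col("revenue")).over(byCategory), rank().over(byCategory)
  salesDF.select(col("category"), col("revenue"),
    avg(col("revenue")).over(byCategory).as("running_avg_revenue"),
    rank().over(byCategory).as("rank"))
}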

When executed, a window function computes a value for each row in a window.

Note

Window functions belong to the Window functions group in Spark's Scala API.

 

 

The most important window function is row_number. row_number performs grouped sorting, which means sorting carried out on top of a grouping.
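
A minimal DataFrame-API sketch of this grouped sorting (Spark 1.6+, where row_number is available in org.apache.spark.sql.functions; scoresDF and its columns are assumed names, and the SQL form appears in the Hive example further down):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// scoresDF is assumed to have columns name and score (illustrative names)
def rankWithinGroups(scoresDF: DataFrame): DataFrame = {
  // sort within each name group by descending score and number the rows 1, 2, 3, ...
  val byNameDesc = Window.partitionBy("name").orderBy(col("score").desc)
  scoresDF.withColumn("rank", row_number().over(byNameDesc))
}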

 

First, look at the code for the grouped Top N implemented with the RDD API:

package SparkSQLByScala

import org.apache.spark.{SparkContext, SparkConf}

/**
  * Complex Top N case study
  *
  * @author DT大数据梦工厂
  * 新浪微博:http://weibo.com/ilovepains/
  */
object TopNGroup {

  def main(args: Array[String]) {

    /**
      * Step 1: Create the SparkConf object that holds the runtime configuration of the Spark
      * program. For example, setMaster specifies the URL of the Master of the Spark cluster
      * the program connects to; setting it to "local" runs the program locally, which is
      * especially suitable for beginners on machines with very limited resources
      * (e.g. only 1 GB of RAM).
      */
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("Top N Basically!") // application name shown in the monitoring UI
    //  conf.setMaster("spark://Master:7077") // in this case the program runs on a Spark cluster
    conf.setMaster("local")

    /**
      * Step 2: Create the SparkContext object.
      * SparkContext is the sole entry point to all Spark functionality, whether the program is
      * written in Scala, Java, Python or R.
      * Its core job is to initialize the components a Spark application needs to run, including
      * DAGScheduler, TaskScheduler and SchedulerBackend, and to register the program with the Master.
      * SparkContext is the single most important object in a Spark application.
      */
    val sc = new SparkContext(conf) // pass in SparkConf to customize the runtime configuration

    sc.setLogLevel("OFF")

    /**
      * Step 3: Create an RDD from a concrete data source (HDFS, HBase, local FS, DB, S3, etc.)
      * through SparkContext. RDDs can be created in three basic ways: from an external data
      * source (e.g. HDFS), from a Scala collection, or from another RDD.
      * The data is split into a series of partitions; the data assigned to each partition is
      * processed by one task.
      */
    val lines = sc.textFile("D://DT-IMF//testdata//topNGroup.txt") // read the local file as one partition

    val groupRDD = lines.map(line => (line.split(" ")(0), line.split(" ")(1).toInt)).groupByKey()

    val top5 = groupRDD.map(pair => (pair._1, pair._2.toList.sortWith(_ > _).take(5))).sortByKey()
    top5.collect().foreach(pair => {
      println(pair._1 + ":")
      pair._2.foreach(println)
      println("*********************")
    })
  }
}

 

 

 

Run output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

16/04/12 00:39:20 INFO SparkContext: Running Spark version 1.6.0

16/04/12 00:39:32 INFO SecurityManager: Changing view acls to: think

16/04/12 00:39:32 INFO SecurityManager: Changing modify acls to: think

16/04/12 00:39:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(think); users with modify permissions: Set(think)

16/04/12 00:39:34 INFO Utils: Successfully started service 'sparkDriver' on port 63433.

16/04/12 00:39:35 INFO Slf4jLogger: Slf4jLogger started

16/04/12 00:39:35 INFO Remoting: Starting remoting

16/04/12 00:39:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.56.1:63448]

16/04/12 00:39:36 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 63448.

16/04/12 00:39:36 INFO SparkEnv: Registering MapOutputTracker

16/04/12 00:39:36 INFO SparkEnv: Registering BlockManagerMaster

16/04/12 00:39:36 INFO DiskBlockManager: Created local directory at C:\Users\think\AppData\Local\Temp\blockmgr-c37d158d-2af8-4630-afb4-63d7093546e5

16/04/12 00:39:36 INFO MemoryStore: MemoryStore started with capacity 1773.8 MB

16/04/12 00:39:37 INFO SparkEnv: Registering OutputCommitCoordinator

16/04/12 00:39:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.

16/04/12 00:39:37 INFO SparkUI: Started SparkUI at http://192.168.56.1:4040

16/04/12 00:39:38 INFO Executor: Starting executor ID driver on host localhost

16/04/12 00:39:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63460.

16/04/12 00:39:38 INFO NettyBlockTransferService: Server created on 63460

16/04/12 00:39:38 INFO BlockManagerMaster: Trying to register BlockManager

16/04/12 00:39:38 INFO BlockManagerMasterEndpoint: Registering block manager localhost:63460 with 1773.8 MB RAM, BlockManagerId(driver, localhost, 63460)

16/04/12 00:39:38 INFO BlockManagerMaster: Registered BlockManager

Hadoop:

99

98

97

96

69

*********************

Spark:

195

100

99

98

91

*********************

 

 

 

Now rewrite and run the same logic with Spark SQL:

package SparkSQLByScala

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object SparkSQLWindowFunctionOps {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("SparkSQLWindowFunctionOps") // application name shown in the monitoring UI
    conf.setMaster("spark://slq1:7077") // in this case the program runs on the Spark cluster

    val sc = new SparkContext(conf) // pass in SparkConf to customize the runtime configuration

    /**
     * First: in today's enterprise-level Spark development, Hive is used as the data warehouse
     * in the vast majority of cases. Spark provides Hive support: through HiveContext we can
     * operate directly on the data in Hive, and with the sql/hql methods we can write SQL
     * statements against Hive, including creating tables, dropping tables, loading data into
     * tables, and running all kinds of CRUD operations on the data in those tables.
     * Second: we can also save the data of a DataFrame directly into the Hive warehouse via saveAsTable.
     * Third: we can load a Hive table directly into a DataFrame via HiveContext.table.
     */
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("use hive") // use the database named hive; all subsequent table operations happen in this database

    /**
     * Drop the table if it already exists, then create the table we will load data into.
     */
    hiveContext.sql("DROP TABLE IF EXISTS scores")
    hiveContext.sql("CREATE TABLE IF NOT EXISTS scores(name STRING,score INT) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\\n'")
    // load the data to be processed into the Hive table
    hiveContext.sql("LOAD DATA LOCAL INPATH '/home/richard/slq/spark/160330/topNGroup.txt' INTO TABLE scores")

    /**
     * Use a subquery to extract the target data, and inside it use the window function
     * row_number to do the grouped sorting:
     * PARTITION BY: the key the window function groups by;
     * ORDER BY: the ordering within each group.
     */
    val result = hiveContext.sql("SELECT name,score "
        + "FROM ("
          + "SELECT "
            + "name,"
            + "score,"
            + "row_number() OVER (PARTITION BY name ORDER BY score DESC) rank"
            + " FROM scores "
        + ") sub_scores "
        + "WHERE rank <= 4")

    result.show() // print the result on the Driver's console

    // save the result into the Hive data warehouse
    hiveContext.sql("DROP TABLE IF EXISTS sortedResultScores")
    result.saveAsTable("sortedResultScores")
  }
}
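
As a side note before the run output: the same Top N could also be expressed with the DataFrame API on top of the Hive table, roughly as follows (a sketch for comparison only, not part of the lecture code):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val byNameDesc = Window.partitionBy("name").orderBy(col("score").desc)

// hiveContext and the scores table come from the code above
val topN = hiveContext.table("scores")
  .withColumn("rank", row_number().over(byNameDesc))   // grouped sorting, as in the SQL above
  .where(col("rank") <= 4)                             // keep the top 4 scores per name
  .select("name", "score")

topN.show()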

 

 

 

Run output:

[richard@slq1 160330]$ ./SparkAppsScala.sh

16/04/13 00:09:13 INFO spark.SparkContext: Running Spark version 1.6.0

16/04/13 00:09:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

16/04/13 00:09:20 INFO spark.SecurityManager: Changing view acls to: richard

16/04/13 00:09:20 INFO spark.SecurityManager: Changing modify acls to: richard

16/04/13 00:09:20 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(richard); users with modify permissions: Set(richard)

16/04/13 00:09:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 49808.

16/04/13 00:09:31 INFO slf4j.Slf4jLogger: Slf4jLogger started

16/04/13 00:09:32 INFO Remoting: Starting remoting

16/04/13 00:09:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.121:36186]

16/04/13 00:09:35 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 36186.

16/04/13 00:09:35 INFO spark.SparkEnv: Registering MapOutputTracker

16/04/13 00:09:36 INFO spark.SparkEnv: Registering BlockManagerMaster

16/04/13 00:09:36 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-65c08292-a1ea-4c14-98b3-4fb355b2dd25

16/04/13 00:09:37 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB

16/04/13 00:09:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator

16/04/13 00:09:40 INFO server.Server: jetty-8.y.z-SNAPSHOT

16/04/13 00:09:41 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

16/04/13 00:09:41 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

16/04/13 00:09:41 INFO ui.SparkUI: Started SparkUI at http://192.168.1.121:4040

16/04/13 00:09:41 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-103237c0-b116-4a08-b441-8ea4b3de3e99/httpd-dc709bf3-3487-4c46-9a7e-d2cb3ded5212

16/04/13 00:09:41 INFO spark.HttpServer: Starting HTTP Server

16/04/13 00:09:41 INFO server.Server: jetty-8.y.z-SNAPSHOT

16/04/13 00:09:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38484

16/04/13 00:09:41 INFO util.Utils: Successfully started service 'HTTP file server' on port 38484.

16/04/13 00:09:42 INFO spark.SparkContext: Added JAR file:/home/richard/slq/spark/160330/SparkSQLWindowFunctionOps.jar at http://192.168.1.121:38484/jars/SparkSQLWindowFunctionOps.jar with timestamp 1460477382113

16/04/13 00:09:43 INFO client.AppClient$ClientEndpoint: Connecting to master spark://slq1:7077...

16/04/13 00:09:45 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160413000945-0000

16/04/13 00:09:45 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52549.

16/04/13 00:09:45 INFO netty.NettyBlockTransferService: Server created on 52549

16/04/13 00:09:45 INFO storage.BlockManagerMaster: Trying to register BlockManager

16/04/13 00:09:45 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.121:52549 with 517.4 MB RAM, BlockManagerId(driver, 192.168.1.121, 52549)

16/04/13 00:09:45 INFO storage.BlockManagerMaster: Registered BlockManager

16/04/13 00:09:46 INFO client.AppClient$ClientEndpoint: Executor added: app-20160413000945-0000/0 on worker-20160412234000-192.168.1.121-35019 (192.168.1.121:35019) with 1 cores

16/04/13 00:09:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160413000945-0000/0 on hostPort 192.168.1.121:35019 with 1 cores, 1024.0 MB RAM

16/04/13 00:09:46 INFO client.AppClient$ClientEndpoint: Executor added: app-20160413000945-0000/1 on worker-20160412233939-192.168.1.123-37351 (192.168.1.123:37351) with 1 cores

16/04/13 00:09:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160413000945-0000/1 on hostPort 192.168.1.123:37351 with 1 cores, 1024.0 MB RAM

16/04/13 00:09:46 INFO client.AppClient$ClientEndpoint: Executor added: app-20160413000945-0000/2 on worker-20160412233939-192.168.1.122-42079 (192.168.1.122:42079) with 1 cores

16/04/13 00:09:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160413000945-0000/2 on hostPort 192.168.1.122:42079 with 1 cores, 1024.0 MB RAM

16/04/13 00:09:48 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160413000945-0000/2 is now RUNNING

16/04/13 00:09:48 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160413000945-0000/1 is now RUNNING

16/04/13 00:09:49 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160413000945-0000/0 is now RUNNING

16/04/13 00:09:53 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

16/04/13 00:10:34 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq2:39600) with ID 2

16/04/13 00:10:35 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq3:53352) with ID 1

16/04/13 00:10:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq2:34090 with 517.4 MB RAM, BlockManagerId(2, slq2, 34090)

16/04/13 00:10:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq3:43697 with 517.4 MB RAM, BlockManagerId(1, slq3, 43697)

16/04/13 00:10:41 INFO hive.HiveContext: Initializing execution hive, version 1.2.1

16/04/13 00:10:44 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0

16/04/13 00:10:44 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0

16/04/13 00:10:51 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore

16/04/13 00:10:52 INFO metastore.ObjectStore: ObjectStore, initialize called

16/04/13 00:10:55 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored

16/04/13 00:10:55 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored

16/04/13 00:10:59 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

16/04/13 00:11:09 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

16/04/13 00:11:23 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq1:41641) with ID 0

16/04/13 00:11:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq1:44779 with 517.4 MB RAM, BlockManagerId(0, slq1, 44779)

16/04/13 00:11:43 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"

16/04/13 00:11:58 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

16/04/13 00:11:58 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

16/04/13 00:12:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

16/04/13 00:12:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

16/04/13 00:12:24 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

16/04/13 00:12:24 INFO metastore.ObjectStore: Initialized ObjectStore

16/04/13 00:12:29 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

16/04/13 00:12:32 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

16/04/13 00:12:36 INFO metastore.HiveMetaStore: Added admin role in metastore

16/04/13 00:12:36 INFO metastore.HiveMetaStore: Added public role in metastore

16/04/13 00:12:38 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty

16/04/13 00:12:42 INFO metastore.HiveMetaStore: 0: get_all_databases

16/04/13 00:12:43 INFO HiveMetaStore.audit: ugi=richard ip=unknown-ip-addr cmd=get_all_databases

16/04/13 00:12:43 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*

16/04/13 00:12:43 INFO HiveMetaStore.audit: ugi=richard ip=unknown-ip-addr cmd=get_functions: db=default pat=*

16/04/13 00:12:43 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.

16/04/13 00:12:52 INFO session.SessionState: Created local directory: /tmp/1ecbd29a-b42c-49c9-bbbb-af5a4226101f_resources

16/04/13 00:12:52 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/1ecbd29a-b42c-49c9-bbbb-af5a4226101f

16/04/13 00:12:53 INFO session.SessionState: Created local directory: /tmp/richard/1ecbd29a-b42c-49c9-bbbb-af5a4226101f

16/04/13 00:12:53 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/1ecbd29a-b42c-49c9-bbbb-af5a4226101f/_tmp_space.db

16/04/13 00:12:54 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse

16/04/13 00:12:54 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.

16/04/13 00:12:55 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0

16/04/13 00:12:55 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0

16/04/13 00:13:02 INFO hive.metastore: Trying to connect to metastore with URI thrift://slq1:9083

16/04/13 00:13:03 INFO hive.metastore: Connected to metastore.

16/04/13 00:13:08 INFO session.SessionState: Created local directory: /tmp/110134cb-6db5-4fa2-805f-d2381f07f293_resources

16/04/13 00:13:08 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/110134cb-6db5-4fa2-805f-d2381f07f293

16/04/13 00:13:08 INFO session.SessionState: Created local directory: /tmp/richard/110134cb-6db5-4fa2-805f-d2381f07f293

16/04/13 00:13:08 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/110134cb-6db5-4fa2-805f-d2381f07f293/_tmp_space.db

16/04/13 00:13:13 INFO parse.ParseDriver: Parsing command: use hive

16/04/13 00:13:24 INFO parse.ParseDriver: Parse Completed

16/04/13 00:13:34 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:34 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:34 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:35 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:35 INFO parse.ParseDriver: Parsing command: use hive

16/04/13 00:13:44 INFO parse.ParseDriver: Parse Completed

16/04/13 00:13:44 INFO log.PerfLogger: </PERFLOG method=parse start=1460477615061 end=1460477624818 duration=9757 from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:44 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:46 INFO ql.Driver: Semantic Analysis Completed

16/04/13 00:13:46 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1460477624964 end=1460477626437 duration=1473 from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)

16/04/13 00:13:46 INFO log.PerfLogger: </PERFLOG method=compile start=1460477614250 end=1460477626656 duration=12406 from=org.apache.hadoop.hive.ql.Driver>

16/04/13 00:13:46 INFO metadata.Hive: Dumping metastore api call timing information for : compilation phase

16/04/13 00:13:46 INFO metadata.Hive: Total time spent in this metastore function was greater than 1000ms : getAllDatabases_()=1180

16/04/13 00:13:46 INFO metadata.Hive: Total time spent in this metastore function was greater than 1000ms : getFunctions_(String, String, )=2031

16/04/13 00:13:46 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager

16/04/13 00:13:46 INFO log.PerfLogger: <PERFLO
