Spark SQL
zhixingheyi_tian
Intel Big Data. Spark
Big Data Troubleshooting Notes
To make Hive write output in a compressed format, the following settings are needed. Original · 2024-04-30 21:02:19 · 293 views · 3 comments
Anatomy of a Spark SQL query
Generating the unresolved LogicalPlan. // abstract class AbstractSqlParser /** Creates LogicalPlan for a given SQL string. */ override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser => ... Original · 2018-11-13 16:58:07 · 454 views · 1 comment
Spark: Shuffle & AQE
Shuffle. Original · 2022-06-03 15:06:17 · 2324 views · 0 comments
Spark: DataFrame
DataFrame. Original · 2022-06-02 10:25:10 · 121 views · 0 comments
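Spark SQL: join types
Shuffle Hash Join. Enabling Shuffle Hash Join requires all of the following conditions: only equi-joins are supported, and the join keys do not need to be sortable; spark.sql.join.preferSortMergeJoin must be set to false (the parameter was introduced in Spark 2.0.0 and defaults to true, so Sort Merge Join is chosen by default); the size of the small table (plan.stats.sizeInBytes) must be less than spark.sql.autoBroadcastJoi... Original · 2022-05-31 19:34:41 · 651 views · 3 comments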
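The mechanics behind a shuffle hash join can be sketched in plain Python. This is only an illustration of the general technique (partition both sides by key hash, build a hash table on the small side, probe with the big side), not Spark's implementation; the function name, the partition count, and the sample rows are all made up for the example.

```python
from collections import defaultdict


def shuffle_hash_join(big, small, num_partitions=4):
    """Equi-join two lists of (key, value) pairs; returns (key, big_value, small_value)."""

    def shuffle(rows):
        # "Shuffle" step: rows with the same key land in the same partition.
        parts = [[] for _ in range(num_partitions)]
        for key, value in rows:
            parts[hash(key) % num_partitions].append((key, value))
        return parts

    joined = []
    for big_part, small_part in zip(shuffle(big), shuffle(small)):
        # Build step: hash table over the small side of this partition.
        table = defaultdict(list)
        for key, value in small_part:
            table[key].append(value)
        # Probe step: stream the big side and look up matches by equality only,
        # which is why the keys never need to be sorted.
        for key, value in big_part:
            for small_value in table.get(key, ()):
                joined.append((key, value, small_value))
    return joined


# Hypothetical sample data: orders (big side) and users (small side).
orders = [(1, "o1"), (2, "o2"), (1, "o3"), (4, "o4")]
users = [(1, "alice"), (2, "bob"), (3, "carol")]
joined = sorted(shuffle_hash_join(orders, users))
print(joined)
```

Because matching relies only on hashing and equality, the sketch also shows why the technique is limited to equi-joins.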
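Spark SQL odds and ends
WITH AS. The WITH AS clause, also known as subquery factoring, defines a SQL fragment that the rest of the SQL statement can then use. WITH AS is sometimes used simply to improve the readability of a SQL statement and to reduce redundant nesting. with A as (select * from user) select * from A, customer where customer.userid = A.id. Conceptually, select * from user runs first and its result is placed in a temporary table A that the whole statement can use. WITH AS takes frequently executed s... Original · 2022-05-18 14:55:09 · 1083 views · 0 comments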
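A minimal runnable sketch of the WITH AS pattern, assuming SQLite (through Python's sqlite3 module) as a stand-in for Spark SQL. The table contents and the name column are invented for illustration; note that the outer query refers to the fragment by its name A, so the join condition is written against A.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (name TEXT, userid INTEGER);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO customer VALUES ('acme', 1), ('globex', 2), ('initech', 3);
""")

# The fragment named A is defined once and then used like a table.
query = """
    WITH A AS (SELECT * FROM user)
    SELECT customer.name, A.name
    FROM A, customer
    WHERE customer.userid = A.id
    ORDER BY customer.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Whether the engine actually materializes A or inlines it into the outer query is an implementation detail that varies between engines; the sketch only shows the syntax and the result.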
Spark SQL -- Tungsten
Concept. Spark uses two engines to optimize and run queries, Catalyst and Tungsten, in that order. Catalyst basically generates an optimized physical query plan from the logical query plan by applying a series of transformations like predicate pushdown... Original · 2021-10-24 20:09:08 · 142 views · 0 comments
Solving common Spark issues
Issue loading metastore_db. Database Class Loader started - derby.database.classpath='' 21/01/20 06:30:33 ERROR PoolWatchThread: Error in trying to obtain a connection. Retrying in 7000ms java.sql.SQLException: A read-only user or a user in a read-only database... Original · 2021-01-20 15:35:42 · 398 views · 0 comments
Semi-join
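Semi-join concept. A semi-join here means a semi-join subquery. When a row of one table finds matching records in another table, the semi-join returns the row from the first table. Unlike a conditional join, even if several matching records are found on the right side, each matching left-side row is returned only once. In addition, no records from the right-side table are returned. A semi-join usually uses IN or EXISTS as the join condition. Such a subquery has the following structure: SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tabl... Original · 2020-08-23 10:24:10 · 312 views · 0 comments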
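The difference between a semi-join and an ordinary join can be seen in a small sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine. The customer and orders tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT);
    CREATE TABLE orders (cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (1), (1), (1), (3);
""")

# Semi-join written with IN: each matching customer appears once.
semi_in = conn.execute("""
    SELECT name FROM customer
    WHERE id IN (SELECT cust_id FROM orders)
    ORDER BY id
""").fetchall()

# The same semi-join written with EXISTS.
semi_exists = conn.execute("""
    SELECT c.name FROM customer c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.id)
    ORDER BY c.id
""").fetchall()

# An ordinary inner join repeats a customer once per matching order.
inner = conn.execute("""
    SELECT c.name FROM customer c
    JOIN orders o ON o.cust_id = c.id
    ORDER BY c.id
""").fetchall()

print(semi_in, semi_exists, inner)
```

Here alice has three orders, so the inner join lists her three times while both semi-join forms list her once, and no column of orders appears in the semi-join output.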
CreateTempViewUsing
logicRelation. // ddl.scala // viewDefinition is the logicRelation. case class CreateTempViewUsing(tableIdent: TableIdentifier, userSpecifiedSchema: Option[StructType], replace: Boolean, global... Original · 2019-08-21 15:42:24 · 480 views · 0 comments
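SQL odds and ends (pure SQL)
Having. The HAVING clause is usually used together with GROUP BY to filter the groups that GROUP BY returns. HAVING exists to make up for the fact that the WHERE keyword cannot be used with aggregate functions. Syntax: SELECT column1, column2, … column_n, aggregate_function (expression) FROM tables WHERE predicates GROU... Original · 2019-06-23 16:36:42 · 179 views · 1 comment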
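A small sketch of how WHERE and HAVING divide the work, assuming SQLite (through Python's sqlite3 module) as a stand-in engine; the sales table and its values are invented for the example. WHERE filters individual rows before grouping, and HAVING filters the groups by an aggregate afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 50),
        ('west', 30), ('west', 20),
        ('north', 200), ('north', 5);
""")

totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 10          -- row-level filter, applied before grouping
    GROUP BY region
    HAVING SUM(amount) > 100   -- group-level filter on the aggregate
    ORDER BY region
""").fetchall()
print(totals)
```

The west group survives the WHERE filter but its total of 50 fails the HAVING condition, so only east and north remain.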
OAP FileFormat
OAP File. // OAP Data File V1 Meta Part // .. // Field Length In Byte // Meta // Magic and Version 4 // Row Count In Each Row Group 4 // ... Original · 2019-06-18 10:38:36 · 370 views · 0 comments
Spark SQL examples on Kubernetes
Submit SQL to thriftserver by beeline. Run thriftserver in a pod: sh sbin/start-thriftserver.sh \ --master k8s://https://kubernetes.default.svc.cluster.local:443 \ --name spark-thriftserver \ ... Original · 2019-05-07 20:51:37 · 645 views · 0 comments
Implement a spark-sql case of separating computation and storage using Kubernetes
Prerequisites: Set up a single-node Kubernetes (minikube) with --cpus 8 --memory 8192. Build and push the Spark 2.4.1 image. Put hive-site.xml in the conf dir. Run: bin/spark-sql \ --master k8s://https:... Original · 2019-04-12 16:57:47 · 134 views · 0 comments
Spark: InternalRow
InternalRow: Abstract Binary Row Format. InternalRow is also called Catalyst row or Spark SQL row. abstract class InternalRow extends SpecializedGetters with Serializable {} UnsafeRow. UnsafeRow is a... Original · 2019-04-01 14:32:18 · 878 views · 0 comments
Physical Query Operator
BinaryExecNode: Binary physical operator with two child physical operators, left and right. LeafExecNode: Leaf physical operator with no children. By default, the set of all attributes that are produce... Original · 2019-01-12 15:31:56 · 247 views · 0 comments