
1.When Apache Spark meets TiDB

=> TiSpark

maxiaoyu@pingcap.com

2.Who am I

● Shawn Ma@PingCAP

● Tech Lead of OLAP Team

● Working on OLAP related products and features

● Previously tech lead of the Big Data infra team @ Netease

● Focused on SQL-on-Hadoop and other Big Data related work

3.Agenda

● A little bit about TiDB / TiKV

● What is TiSpark

● Architecture

● Benefits

● What’s Next

4.What’s TiDB

● Open source distributed RDBMS

● Inspired by Google Spanner

● Horizontal Scalability

● ACID Transaction

● High Availability

● Auto-Failover

● SQL at scale

● Widely used across industries, including Internet, gaming, banking, finance, manufacturing and more (200+ users)

5.A little bit about TiDB and TiKV

[Architecture diagram: a stateless SQL layer of TiDB nodes on top of a distributed storage layer of TiKV nodes. TiKV replicates data via Raft; the Placement Driver (PD) serves metadata / timestamp requests and drives control flow such as balance and failover. All components communicate over gRPC.]

6.TiKV: The whole picture

[Diagram: a client and a three-node Placement Driver (PD) cluster in front of four TiKV nodes (Store 1-4), reached over RPC. Each node holds replicas of several Regions (Region 1 through Region 5); the replicas of one Region across nodes form a Raft group.]

TiKV is powered by RocksDB

7.What is TiSpark

● TiSpark = Spark SQL on TiKV

○ Spark SQL directly on top of distributed database storage

● Hybrid Transactional/Analytical Processing (HTAP) rocks

○ Provides strong OLAP capability together with TiDB

8.What is TiSpark

● Complex Calculation Pushdown

● Key Range pruning

● Index support

○ Clustered index / Non-Clustered index

○ Index Only Query

● Cost-Based Optimization

○ Histograms

○ Picking the right access path

9.Architecture

[Architecture diagram: the TiSpark driver, embedded in the Spark driver, retrieves data locations from the Placement Driver (PD) over gRPC; TiSpark code inside each Spark executor then retrieves the actual data from the TiKV nodes of the distributed storage layer, also over gRPC.]

10.Architecture

● On the Spark Driver

○ Translate metadata from TiDB into Spark meta info

○ Transform the Spark SQL logical plan, picking out the pieces that can be leveraged by the storage layer (TiKV) and rewriting the plan

○ Locate data based on Region info from the Placement Driver and split partitions accordingly

● On the Spark Executors

○ Encode the Spark SQL plan into TiKV coprocessor requests

○ Decode the TiKV / coprocessor results and transform them into Spark SQL rows

11.How everything is made possible

● Extension points for Spark SQL internals

● Extra strategies allow us to inject our own physical executors; that's what we leveraged for TiSpark (see the sketch below)

● We try our best to keep Spark internals untouched to avoid compatibility issues
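A minimal sketch of that extension point (NoopTiStrategy below is a made-up placeholder, not the real TiStrategy, which actually matches pushable plan fragments):

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A do-nothing strategy: a real connector matches plan fragments it can
// handle (scans, filters, aggregates) and emits its own physical operators;
// returning Nil lets the planner fall through to the built-in strategies.
object NoopTiStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

val spark = SparkSession.builder()
  .appName("extra-strategies-demo").master("local[*]").getOrCreate()

// The injection point Spark SQL exposes for custom physical planning:
spark.experimental.extraStrategies = Seq(NoopTiStrategy)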

12.How everything is made possible

● A fat Java client module, paying the price of bypassing TiDB

○ Schema parsing, the type system, encoding / decoding, the coprocessor

○ An almost full-featured TiKV client (without write support for now)

○ Predicate / index key-range related logic

○ Aggregate pushdown related logic

○ Limit, order and stats related logic

● A thin layer inside Spark SQL

○ TiStrategy for Spark SQL plan transformation

○ Other utilities for mapping things from Spark SQL to the TiKV client library

○ Physical operators like IndexScan

○ Thin enough not to cause much compatibility trouble with Spark SQL

13.Too Abstract? Let’s get concrete

select class, avg(score) from student

WHERE school = 'engineering' and lottery(name) = 'picked'

and studentId >= 8000 and studentId < 10100

group by class;

● The above query runs against a table on TiDB named student

● There is a clustered index on studentId and a secondary index on the school column

● lottery is a Spark SQL UDF which takes a name and outputs 'picked' if the RNG decides so (see the sketch below)
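In code, the example could be set up like this (a sketch assuming the student table is already mapped into Spark SQL as on the "Ease of Use" slide; the one-in-ten pick rate is arbitrary):

// lottery is a plain Spark SQL UDF, so TiKV knows nothing about it: this
// predicate must be evaluated in Spark, while the others can be pushed down.
spark.udf.register("lottery", (name: String) =>
  if (scala.util.Random.nextInt(10) == 0) "picked" else "skipped")

spark.sql("""
  select class, avg(score) from student
  where school = 'engineering' and lottery(name) = 'picked'
    and studentId >= 8000 and studentId < 10100
  group by class
""").show()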

14.Predicates Processing

WHERE school = 'engineering' and lottery(name) = 'picked'

and studentId >= 8000 and studentId < 10100

Predicates are converted into key ranges based on indexes:

● studentId >= 8000 and studentId < 10100 → key range [8000, 10100)

● school = 'engineering' → matched against the index on school

● lottery(name) = 'picked' → a Spark UDF, so it cannot be converted

Constructing tasks:

1. Append the remaining predicates if they are supported by the coprocessor

2. Push back whatever needs to be computed by Spark SQL, e.g. UDFs and prefix index predicates

3. Cut the ranges into tasks according to Region / key range: here Spark task 1 covers [8000, 10000) in Region 2 and Spark task 2 covers [10000, 10100) in Region 3 (see the sketch after this list)

4. Encode each task into a coprocessor request, sent over gRPC via the Spark workers

[Diagram: Regions 1-3 hold studentId ranges [0, 5000), [5000, 10000) and [10000, 15000).]
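The range arithmetic behind steps 3 and 4 fits in a few lines (a self-contained sketch; real TiKV keys are comparable byte strings rather than Longs, and the Region boundaries mirror the diagram):

// Intersect the predicate-derived key range with each Region's range,
// yielding one coprocessor task per overlapping Region.
case class KeyRange(start: Long, end: Long) // half-open: [start, end)

def intersect(a: KeyRange, b: KeyRange): Option[KeyRange] = {
  val lo = math.max(a.start, b.start)
  val hi = math.min(a.end, b.end)
  if (lo < hi) Some(KeyRange(lo, hi)) else None
}

val predicateRange = KeyRange(8000, 10100) // studentId >= 8000 and studentId < 10100
val regions = Seq(KeyRange(0, 5000), KeyRange(5000, 10000), KeyRange(10000, 15000))

val tasks = regions.flatMap(r => intersect(predicateRange, r))
println(tasks) // List(KeyRange(8000,10000), KeyRange(10000,10100)) -> two tasks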

15.Index Scan

WHERE school = 'engineering' and lottery(name) = 'picked'

and (studentId >= 8000 and studentId < 10100)

● A secondary index is encoded as a key-value pair

○ The key is the comparable bytes format of all index columns, in their defined order

○ The value is the row ID pointing to the table row data

● Reading data via a secondary index usually requires a double read

○ First, read the secondary index in range, just like reading primary keys on the previous slide

○ Shuffle the returned row IDs according to Region

○ Sort all row IDs retrieved and combine them into ranges where possible (the diagram's example: row IDs 1, 2, 3, 4, 5, 7, 8, 10, 88 come back from the index scan and are cut into ranges per Region; see the sketch after this list)

○ Encode the row IDs into row keys for the table

○ Send those mini requests in batches, concurrently

● The second read can be optimized away

○ If all required columns are already covered by the index itself

[Diagram: executors batch-scan the index data for student_school by predicate range, then fetch the row data for student from the TiKV Regions that hold it.]
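The sort-and-combine step is easy to sketch with the row IDs from the diagram (real row keys are encoded bytes, but the merging logic is the same idea):

// Merge sorted row IDs into half-open ranges so that runs of adjacent rows
// can be fetched with one range scan instead of many point gets.
def toRanges(rowIds: Seq[Long]): List[(Long, Long)] =
  rowIds.distinct.sorted.foldLeft(List.empty[(Long, Long)]) {
    case ((s, e) :: tail, id) if id == e => (s, e + 1) :: tail // extend current run
    case (acc, id)                       => (id, id + 1) :: acc // start a new run
  }.reverse

println(toRanges(Seq(1, 2, 3, 4, 5, 7, 8, 10, 88)))
// List((1,6), (7,9), (10,11), (88,89)): four batched requests instead of nine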

16.Index Selection

WHERE school = 'engineering' and lottery(name) = 'picked'

and (studentId >= 8000 and studentId < 10100) or studentId in (10323, 10327)

Using histograms, the optimizer estimates how many rows each candidate index would touch and compares access costs (see the sketch below):

● Clustered index on studentId: the matching predicates select about 1K rows → cost ≈ 1K × clustered index access cost

● Secondary index on school: the matching predicates select about 800 rows → cost ≈ 800 × secondary index access cost

In the diagram's example the clustered index side comes out cheaper, so it wins.

● If the columns referred to are all covered by an index, then instead of retrieving actual rows we apply an index only query, and the cost function differs accordingly

● If no histogram exists, TiSpark falls back to pseudo selection logic
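In miniature, the selection rule is "estimated rows times per-row access cost, take the minimum" (all numbers below are illustrative; the real cost model also covers index only queries and the pseudo fallback):

// Pick the access path whose histogram-based row estimate times per-row
// cost is lowest; the double read makes each secondary-index row pricier.
case class AccessPath(name: String, estimatedRows: Long, perRowCost: Double) {
  def cost: Double = estimatedRows * perRowCost
}

val candidates = Seq(
  AccessPath("clustered index on studentId", estimatedRows = 1000, perRowCost = 1.0),
  AccessPath("secondary index on school", estimatedRows = 800, perRowCost = 1.5))

val chosen = candidates.minBy(_.cost)
println(s"picked: ${chosen.name} (cost ${chosen.cost})") // clustered wins: 1000.0 < 1200.0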

17.Aggregates Processing

select class, avg(score) from student

…….

group by class;

When TiStrategy receives the Spark SQL plan, AVG is rewritten into SUM and COUNT:

AVG(score) GROUP BY class → SUM(score) / COUNT(score) GROUP BY class

Schema transformation rules are constructed, since TiDB has a totally different type system and its own type inference rules:

● TiKV schema: [groupBy keys as bytes, SUM as Decimal, COUNT as BigInt]

● Spark schema (by Spark's own type inference rules): [SUM, COUNT, class]

● After the coprocessor preprocessing, TiSpark still relies on the normal Spark aggregation strategy: map tasks send coprocessor requests over gRPC to the Regions (studentId ranges [0, 5000), [5000, 10000), [10000, 15000)), and reduce tasks merge the partial results (see the sketch below)
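The map/reduce shape of the rewrite can be shown directly (the partial values are invented for illustration; in reality they arrive as TiKV-encoded chunks):

// Each Region's coprocessor returns a partial (SUM, COUNT) per group; Spark
// merges the partials and only then computes AVG as SUM / COUNT.
case class Partial(clazz: String, sum: Double, count: Long)

val fromRegions = Seq( // pretend results from three coprocessor tasks
  Partial("class-a", 450.0, 5), Partial("class-a", 180.0, 2),
  Partial("class-b", 300.0, 4))

val avgByClass = fromRegions
  .groupBy(_.clazz)
  .map { case (c, ps) => c -> ps.map(_.sum).sum / ps.map(_.count).sum }

println(avgByClass) // Map(class-a -> 90.0, class-b -> 75.0)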

18.Benefits

● Analytical / transactional support all on one platform

○ No need for ETL; query data in real time

○ High-throughput, consistent snapshot reads from the database

○ Simplify your platform and reduce maintenance cost

● Embrace Apache Spark and its ecosystem

○ Support for complex transformation and analytics beyond SQL

○ Cooperation with other projects in the ecosystem (like Apache Zeppelin)

○ Apache Spark bridges your data sources

19.Ease of Use

● Works on your existing Spark cluster

○ Just a single jar, like other Spark connectors

● Workable as a standalone application, in spark-shell, thrift-server, pyspark and R

● Works just like another data source

val ti = new org.apache.spark.sql.TiContext(spark)

// Map all TiDB tables from the database sampleDB as Spark SQL tables

ti.tidbMapDatabase("sampleDB")

spark.sql("select count(*) from sampleTable").show
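Since mapped tables are ordinary Spark SQL tables, the DataFrame API works as well (sampleTable and its id column are placeholders carried over from the snippet above):

// explain() prints the physical plan, including the TiSpark scan and
// whatever predicates were pushed down to TiKV.
val df = spark.table("sampleTable").where("id > 100")
df.explain()
df.show()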

20.What’s Next

● Batch write support (writing directly in TiKV's native format)

● JSON type support (TiDB already supports it)

● Partition table support (both range and hash)

● Join optimization based on ranges and partition tables

● (Maybe) join reorder using TiDB's own histograms

● A separate columnar storage project using Spark as its execution engine (not released yet)

21.Thanks!

Contact me:

maxiaoyu@pingcap.com

www.pingcap.com

https://github.com/pingcap/tispark

https://github.com/pingcap/tidb

https://github.com/pingcap/tikv
