并行执行的学习与测试

最新推荐文章于 2021-04-03 16:26:41 发布

cprkdx1264

最新推荐文章于 2021-04-03 16:26:41 发布

阅读量186

点赞数

参考文档<>

#################################################################
1 Introduction to Parallel Execution <1>
#################################################################
1.1 Oracle的并行执行不适合于(事务多，时间短)的OLTP系统，在这类系统中，从并行执行得到的好处，远小于并行执行带来的额外开销．

1.2 The most common example of using parallel execution is for DSS,Data warehouses，Complex queries, 批处理的动作，如数据的迁移．

1.3 Parallelism is the idea of breaking down a task so that, instead of on process doing all of the work in a query,
many processes do part of the work at the same time.

#################################################################
2 When (not) to implement parallel execution
#################################################################
2.1 业务时间不合适做并行动作。

2.2 Parallel execution improves performance for:
1) Queries
2) Creating of Large indexes
3) Bulk inserts, updates and deletes,
4) Aggregations and copying

and Parallel execution benefits systems that have all of the following characteristics:
1) SMP, Clusters, multiple CPUS.(在单CPU的机器中不能从并行技术中得到任何好处,其实也可以得到一些好处的，充分利用CPU资源.)
2) Sufficient I/O bandwidth (足够的I/O带宽)
3) Under-utilized or intermittently used CPUS(less than 30%,CPU相对空闲)
4) Sufficient Memory to support additional memory-intensive processes such as sorts, hashing, and I/O bufffers

2.3 使用并行执行的两个前提：
1）工作量大，如扫描一个50G的大表
2）系统资源(CPU,IO, Memory)充足

#################################################################
3 How Parallel Execution Works<1>
#################################################################
when parallel execution is not used, a single server process performs all necessary processing for the sequential
execution of a SQL statement.For example, to perform. a full table scan(select * from employees), one process performs
the entire operation.(一个Client Connection,对应一个服务端的进程，这个进程在没用使用并行机制的情况下，只由一个CPU来处理，
是这样吗？是的，那小机中的这么多CPU不是没发挥到作用???多个进程的并发)

Parallel execution performs these operations in parallel using multiple parallel processes. One process, known as the parallel
execution coordinator(协调者),dispatches the execution of a statement to several parallel execution servers and coordinates the
result from all of the server processes to send the results back to the user.(由一个协调者来分配任务和综合结果，返回给最终用户)

Take Parallel Table Scan employees for example, The table is divided dynamically(dynamic partitioning) into load units called granules
and each granule is read by a single parallel execution server. The granules are generated by the coordinator. Each granules is a
range of physical blocks of the table. The mapping of granules to execution servers is not static, but is determined at execution
time. When an execution server finishes reading the rows of the table employees corresponding to a granule,it gets another granule
from the coordinator if there are any granules remaining. This continues till all granules are exhausted, in other words, the entire
table employees has been read. The parallel execution servers send results back to the parallel execution coordinator, which assembles
the pieces into the desired full table scan.
(按表的物理块分成几个Granule,每一个服务进程取一个，一旦完成，则再在剩余的Granule中取一个，直到所有的Granule都完成．)

Given a query plan for a SQL query, the parallel execution coordinator breaks down each operator in a SQL query into parallel pieces.
runs them in the right order as specified in the query, and then integrates the partial results produced by the parallel execution
servers executing the operators. The number of parallel execution servers assigned to a single operation is the degree of parallelism(DOP)
for an operation. Multiple operations within the same SQL statement all have the same degree of parallelism.

#################################################################
4 Parallelized SQL Statement <1>
#################################################################

4.1 Tips
1) Each SQL statement undergoes an optimization and parallelization process when it is parsed.
(Oracle会动态地选择适合的并行操作)

2) After the optimizer determines the execution plan of a statement, the parallel execution coordinator determines the parallelization method
for each operation in the execution plan.
在优化器确定执行计划后，并行协调者必须确定并行的方法，并行度(parallelism)。

3) Parallelism Between Operations

Intra-operation parallelism: 操作内部的并行
Parallelization of an individual operation where the same operation is performed on smaller sets of rows by parallel execution servers.

Inter-operation parallelism: 操作间的并行
When two operations run concurrently on different sets of parallel execution servers with data flowing from one operation into the other,
we achieve what is termed inter-operation parallelism.

Consider the following statement:

Select * from employees order by employee_id.

这个语句的执行计划分为两步：
1) Full table scan the table employees
2) Order by employee_id
假定这个查询语句的并行度设为4.

则
Each of the two operations(scan and sort) performed concurrently is given its own set of parallel execution servers.
两个操作内部的并行度都设为4,所以，对于这个查询的执行并行度为4+4=8; This is because a parent and child operator can be performed at the same time.
(inter-operation parallelism) 排序操作不必等到所有的扫描操作完成后，才开始。但inter-operation不能超过2,No more than two sets of parallel execution servers
can run simultaneously.

示例图参考<>

#################################################################
5 Degree of Parallelism <1>
#################################################################

1) The number of parallel execution servers associated with a single operation is known as the degree of parallelism.

2) Note that the degree of parallelism applies directly only to intra-operation parallelism. If inter-operation parallelism is possible, the total number of
parallel execution servers for a statement can be twice the specified degree of parallelism.But, No more than two sets of parallel execution servers can run simultaneously.Only two sets of parallel execution servers need to be active to guarantee optimal inter-operation parallelism.
(并行执行度针对于Intra-operation,对于一个语句的操作，并行度可能是两倍于Intra-operation parallelism)

3) Server ways to manage resource utilizations
限制资源的几种方法：

1> PARALLEL_ADAPTIVE_MULTI_USER
2> User resource limits and profiles, which allow you to set limits on resource to each user
3> Database Resource Manager,which lets you allocate resource to different groups of users.

#################################################################
6 SQL Operations That Can Be Parallelized
#################################################################

6.1 Parallel Query
You can parallelize queries and subqueries in SELECT statement, as well as the query portions of DDL statements and DML statements.
However, you cannot parallelize the query portion of a DDL or DML statement if it references a remote object.
(Select,或DDL中的查询部分，或DDL中的查询部分都可并行，但如果DDL，DML中的查询部分引用了远程对象，则会自动变成串行. 如果select引用了远程对象呢？应该可以的)

示例：
select /*+ parallel(big_table, 4) parallel(big_table_bak, 4) */
max(a.col1), avg(a.col1)
from big_table a, big_table_bak b
where a.col1 = b.col1
group by a.col2;

6.2 Parallel DDL
You can normally use parallel DDL when you use regular DDL. However, cannot be used on tables with object or LOB columns
(DDL不能用于带对象或LOB列的表)

DDL Statements that can be parallized:

1) Create table as select ...

2) Create table IOT can be parallelized either with or without an AS Select Clause.
(索引组织表可以不带AS Select ...)

3) Create index ..

4) Alter index rebuild ..

if table is partitioned:
5) Alter table move or [split or coalesce]

6) Alter index rebuild or [split] partitioned index.

Different parallelism is used for different operations. Parallel Create (partitioned) table as select and parallel create (partitioned) index
run with a degree of parallelism equal to the number of partitions.
(不同的操作有不同的并行度，对于分区表，分区索引，并行度一般等于分区数)

Parallel operations require accurate statistics to perform. optimally.
并行DDL需要准确的统计数据

示例：
Create table big_table_bak parallel as select * from big_table;

6.3 Parallel DML
You can normally use parallel DML where you use regular DML.

并行DML，除了并行对象(Table,index)本身需要设置并行属性外，还必须显式打开会话的并行选项

示例：
ALTER SESSION ENABLE PARALLEL DML;

Alter table big_table_bak parallel 4;

insert into big_table_bak select * from big_table;

insert into big_table_bak select /*+ parallel(big_table 4) */ * from big_table;

6.4 SQL*Loader

SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE

An important point to remember is that indexes are not maintained during a parallel load.
不维护索引

具体的语法要参考<>

#################################################################
7 How to Make a Statement Run in Parallel
#################################################################

7.1 Parallel Query
To achieve parallelism for SQL query statements, one or more of the tables being scanned should have a parallel attribute.
(添加表的并行属性，并行Hint select /*+ parallel(employee 4) */ from employee;)

7.2 Parallel DDL
To achieve parallelism for SQL DDL statements, the parallel clause should be specified.

7.3 Parallel DML

PDML: first
"alter session enable parallel DML;"
then any DML issued against a table with a parallel attribute will occur in parallel, if no PDML restrictions are violated.
alter table xxx paralle;
insert into xxx select * from xxx;

#################################################################
8 并行相关的视图
#################################################################

SQL> show parameter parallel_max_servers

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
parallel_max_servers integer 7

SQL> select index_name, degree,instances from user_indexes;

INDEX_NAME DEGREE INSTANCES
------------------------------ ---------------------------------------- ----------------------------------------
IND_PK 10 1

执行
select col1,col2,col3 from big_table where col1 in (select distinct col1 from big_table);
oracle会启动并行进程来扫描索引．

1) v$px_process;

SQL> select * from v$px_process;

SERVER_NAME STATUS           PID SPID                SID    SERIAL#
----------- --------- ---------- ------------ ---------- ----------
P000        IN USE            21 13395                24          6
P001        IN USE            23 13399                25          1
P002        IN USE            24 13401                29          1
P003        IN USE            25 13403                26          1
P004        IN USE            26 13405                27          1
P005        IN USE            27 13407                28          1
P006        IN USE            28 13409                30          1
P007        AVAILABLE         29 13411

8 rows selected

起了7个并行进程．

2) v$px_session
SQL> select * from v$px_session;

SADDR                   SID    SERIAL#      QCSID QCSERIAL# QCINST_ID SERVER_GROUP SERVER_SET    SERVER#     DEGREE REQ_DEGREE
---------------- ---------- ---------- ---------- ---------- ---------- ------------ ---------- ---------- ---------- ----------
000000005B2D73B0         24          6         23         14          1            1          1          1         14         20
000000005B2D6920         23         14         23
000000005B2D7E40         25          1         23         14          1            1          1          2         14         20
000000005B2DA880         29          1         23         14          1            1          1          3         14         20
000000005B2D88D0         26          1         23         14          1            1          1          4         14         20
000000005B2D9360         27          1         23         14          1            1          1          5         14         20
000000005B2D9DF0         28          1         23         14          1            1          1          6         14         20
000000005B2DB310         30          1         23         14          1            1          1          7         14         20

8 rows selected

起了7个并行进程．
000000005B2D6920 23 14 23
这是Coordinator Process.

#################################################################
9 并行执行步骤的执行计划
#################################################################

这些都是什么意思呢？

PARALLEL_TO_SERIAL: Parallel execution; output of step is returned to serial "query coordinator" process.
什么意思呢？

blank Serial execution.

SERIAL_FROM_REMOTE (S -> R)
Serial from remote Serial execution at a remote site.

SERIAL_TO_PARALLEL (S -> P)
Serial to parallel Serial execution; output of step is partitioned or broadcast to parallel execution servers.

PARALLEL_TO_PARALLEL (P -> P)
Parallel to parallel Parallel execution; output of step is repartitioned to second set of parallel execution servers.

PARALLEL_TO_SERIAL (P -> S)
Parallel to serial Parallel execution; output of step is returned to serial "query coordinator" process.

PARALLEL_COMBINED_WITH_PARENT (PWP)
Parallel combined with parent Parallel execution; output of step goes to next step in same parallel process. No interprocess communication to parent.

PARALLEL_COMBINED_WITH_CHILD (PWC)
Parallel combined with child Parallel execution; input of step comes from prior step in same parallel process. No interprocess communication from child.

#################################################################
10 Hints for Parallel Execution 并行执行的Hint
#################################################################

1) PARALLEL
Select /*+ PARALLEL(table_alias, degree,rac_option) */ xxx from xxx;

示例：
SELECT /*+ FULL(hr_emp) PARALLEL(hr_emp, 5) */ last_name FROM hr.employees hr_emp;

SELECT /*+ FULL(hr_emp) PARALLEL(hr_emp, DEFAULT,DEFAULT) */ last_name FROM hr.employees hr_emp;

第二个Default Specifies that the table should be split among all of the available instance, with the of parallelism on each instance.
(RAC选项)

2) NOPARALLEL

示例：
SELECT /*+ NOPARALLEL(hr_emp) */ last_name FROM hr.employees hr_emp;

3) PQ_DISTRIBUTE
The PQ_DISTRIBUTE hint improves the performance of parallel join operations.

4) PARALLEL_INDEX
The PARALLEL_INDEX hint specifies the desired number of concurrent servers that can be used to parallelize index range scans for partitioned indexes.

SELECT /*+ PARALLEL_INDEX(table1, index1, 3, 2) +/ * from xxxx;

table1 -- table name or alias name of table
index1 -- index name which an index scan is to be performed.
3 -- Degree of parallelism
2 -- Two nodes of RAC

5) NOPARALLEL_INDEX

The NOPARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.

#################################################################
问题
#################################################################

1 一旦启动的并行进程，什么时候会关闭？
Session退出后？
不会killed.仍会保留供其它并行使用。

2 收集ITPub.net上的并行相关的帖子,看有没其它的注意事项？

3 建议使用Parallel Hint而不是修改对象的Parallel属性来使用并行执行。

参考文档
1 <> Chapter 20

2 <>for more information about parallel

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/10248702/viewspace-669510/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/10248702/viewspace-669510/

cprkdx1264

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
并行执行的学习与测试

参考文档<>#################################################################1 Introduction to Parallel Executio...
复制链接

扫一扫