2022-06-01 The key flow of INSERT statement execution in Postgres

Abstract:

Statement execution in Postgres can be divided into a client connection layer, a query analysis layer, an execution plan layer, and a storage layer.

This article traces how an INSERT statement is executed in the source code.

Flow overview:

Here is a short overview of the stages a query passes through in order to obtain a result.

1. A connection from an application program to the PostgreSQL server has to be established. The application program transmits a query to the server and waits to receive the results sent back by the server.

2. The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree.

3. The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to it, performing the transformations given in the rule bodies. One application of the rewrite system is the realization of views: whenever a query against a view (i.e., a virtual table) is made, the rewrite system rewrites the user's query into a query that instead accesses the base tables given in the view definition.

4. The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor. It first creates all possible paths that lead to the same result. For example, if there is an index on a relation to be scanned, there are two paths for the scan: one is a simple sequential scan, the other is to use the index. The cost of executing each path is then estimated, and the cheapest path is chosen and expanded into a complete plan that the executor can use.

5. The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications, and finally hands back the rows derived.

Executor:

The executor takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows. This is essentially a demand-pull pipeline mechanism: each time a plan node is called, it must deliver one more row, or report that it is done delivering rows.

To provide a concrete example, assume that the top node is a MergeJoin node. Before any merge can be done, two rows have to be fetched (one from each subplan). So the executor recursively calls itself to process the subplans (starting with the subplan attached to lefttree). The new top node (the top node of the left subplan) is, let's say, a Sort node, and again recursion is needed to obtain an input row. The child node of the Sort might be a SeqScan node, representing actual reading of a table. Execution of this node causes the executor to fetch a row from the table and return it up to the calling node. The Sort node will repeatedly call its child to obtain all the rows to be sorted. When the input is exhausted (indicated by the child node returning NULL instead of a row), the Sort code performs the sort and is finally able to return its first output row, namely the first one in sorted order. It keeps the remaining rows stored so that it can deliver them in sorted order in response to later demands.

The MergeJoin node similarly demands the first row from its right subplan. Then it compares the two rows to see if they can be joined; if so, it returns a join row to its caller. On the next call, or immediately if it cannot join the current pair of inputs, it advances to the next row of one table or the other (depending on how the comparison came out) and again checks for a match. Eventually, one subplan or the other is exhausted, and the MergeJoin node returns NULL to indicate that no more join rows can be formed.

Complex queries can involve many levels of plan nodes, but the general approach is the same: each node computes and returns its next output row each time it is called. Each node is also responsible for applying any selection or projection expressions that were assigned to it by the planner.

The executor mechanism is used to evaluate all four basic SQL query types: SELECT, INSERT, UPDATE, and DELETE. For SELECT, the top-level executor code only needs to send each row returned by the query plan tree to the client. For INSERT, each returned row is inserted into the target table specified for the INSERT. This is done in a special top-level plan node called ModifyTable. (A simple INSERT ... VALUES command creates a trivial plan tree consisting of a single Result node, which computes just one result row, and the ModifyTable node above it performs the insertion. But INSERT ... SELECT can demand the full power of the executor mechanism.) For UPDATE, the planner arranges for each computed row to include all the updated column values plus the TID (tuple ID, or row ID) of the original target row; this data is fed into a ModifyTable node, which uses the information to create a new updated row and mark the old row deleted. For DELETE, the only column returned by the plan is the TID, and the ModifyTable node simply uses the TID to visit each target row and mark it deleted.
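
The demand-pull protocol is easy to model outside the server. The following self-contained C sketch (toy types and names of my own; PostgreSQL's actual per-node entry point is ExecProcNode, shown later in this article) implements a Sort node on top of a Scan node: every call to next() returns one row, or NULL when exhausted, and the Sort node drains its child completely before it can return its first sorted row, exactly as described above.

#include <stdio.h>
#include <stdlib.h>

/* Minimal demand-pull executor: every node has next(), which returns a
 * pointer to its next output row, or NULL when exhausted.  Invented
 * types; not PostgreSQL's real executor interface. */

typedef struct Node Node;
struct Node
{
	const int *(*next)(Node *self);
	/* Scan state */
	const int  *rows; int nrows; int pos;
	/* Sort state */
	Node       *child; int *buf; int nbuf; int emitted; int sorted;
};

static const int *scan_next(Node *n)        /* like a SeqScan: one row per call */
{
	return (n->pos < n->nrows) ? &n->rows[n->pos++] : NULL;
}

static int cmp_int(const void *a, const void *b)
{
	return *(const int *) a - *(const int *) b;
}

static const int *sort_next(Node *n)        /* like a Sort node */
{
	if (!n->sorted)                          /* first call: drain the child */
	{
		const int *row;

		while ((row = n->child->next(n->child)) != NULL)
			n->buf[n->nbuf++] = *row;        /* child returns NULL at end */
		qsort(n->buf, n->nbuf, sizeof(int), cmp_int);
		n->sorted = 1;
	}
	return (n->emitted < n->nbuf) ? &n->buf[n->emitted++] : NULL;
}

int main(void)
{
	static const int data[] = {3, 1, 2};
	int		buf[3];
	Node	scan = { scan_next, data, 3, 0, NULL, NULL, 0, 0, 0 };
	Node	sort = { sort_next, NULL, 0, 0, &scan, buf, 0, 0, 0 };
	const int *row;

	while ((row = sort.next(&sort)) != NULL) /* top-level pull loop */
		printf("%d\n", *row);                /* prints 1 2 3 */
	return 0;
}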

Tuple storage (page layout):

The comment below, from src/include/storage/bufpage.h, describes the slotted-page format of the heap pages into which the new tuple will ultimately be written.

/*
 * A postgres disk page is an abstraction layered on top of a postgres
 * disk block (which is simply a unit of i/o, see block.h).
 *
 * specifically, while a disk block can be unformatted, a postgres
 * disk page is always a slotted page of the form:
 *
 * +----------------+---------------------------------+
 * | PageHeaderData | linp1 linp2 linp3 ...           |
 * +-----------+----+---------------------------------+
 * | ... linpN |                                      |
 * +-----------+--------------------------------------+
 * |           ^ pd_lower                             |
 * |                                                  |
 * |             v pd_upper                           |
 * +-------------+------------------------------------+
 * |             | tupleN ...                         |
 * +-------------+------------------+-----------------+
 * |       ... tuple3 tuple2 tuple1 | "special space" |
 * +--------------------------------+-----------------+
 *                                  ^ pd_special
 *
 * a page is full when nothing can be added between pd_lower and
 * pd_upper.
 *
 * all blocks written out by an access method must be disk pages.
 *
 * EXCEPTIONS:
 *
 * obviously, a page is not formatted before it is initialized by
 * a call to PageInit.
 *
 * NOTES:
 *
 * linp1..N form an ItemId (line pointer) array.  ItemPointers point
 * to a physical block number and a logical offset (line pointer
 * number) within that block/page.  Note that OffsetNumbers
 * conventionally start at 1, not 0.
 *
 * tuple1..N are added "backwards" on the page.  Since an ItemPointer
 * offset is used to access an ItemId entry rather than an actual
 * byte-offset position, tuples can be physically shuffled on a page
 * whenever the need arises.  This indirection also keeps crash recovery
 * relatively simple, because the low-level details of page space
 * management can be controlled by standard buffer page code during
 * logging, and during recovery.
 *
 * AM-generic per-page information is kept in PageHeaderData.
 *
 * AM-specific per-page data (if any) is kept in the area marked "special
 * space"; each AM has an "opaque" structure defined somewhere that is
 * stored as the page trailer.  an access method should always
 * initialize its pages with PageInit and then set its own opaque
 * fields.
 */
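
As a minimal sketch of the bookkeeping this comment describes (toy structs of my own; the real PageHeaderData in bufpage.h has more fields), line pointers grow forward from pd_lower, tuple data grows backward from pd_upper, and the page is full when the two would meet:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define PAGE_SIZE 8192

/* Toy slotted page.  Field names follow the comment above. */
typedef struct
{
	uint16_t	pd_lower;    /* end of line pointer array */
	uint16_t	pd_upper;    /* start of tuple data */
	uint16_t	pd_special;  /* start of AM-specific "special space" */
} ToyPageHeader;

typedef struct { uint16_t off; uint16_t len; } ToyItemId; /* a "linp" */

static void page_init(char *page, uint16_t special_size)
{
	ToyPageHeader *hdr = (ToyPageHeader *) page;

	hdr->pd_lower = sizeof(ToyPageHeader);
	hdr->pd_special = PAGE_SIZE - special_size;
	hdr->pd_upper = hdr->pd_special;
}

/* Add a tuple: line pointer appended at pd_lower, data placed just
 * below pd_upper.  Returns the 1-based offset number, or 0 if full. */
static int page_add_item(char *page, const void *tuple, uint16_t len)
{
	ToyPageHeader *hdr = (ToyPageHeader *) page;
	ToyItemId  *linp;

	if ((uint32_t) hdr->pd_lower + sizeof(ToyItemId) + len > hdr->pd_upper)
		return 0;                             /* page is full */
	hdr->pd_upper -= len;
	memcpy(page + hdr->pd_upper, tuple, len);
	linp = (ToyItemId *) (page + hdr->pd_lower);
	linp->off = hdr->pd_upper;
	linp->len = len;
	hdr->pd_lower += sizeof(ToyItemId);
	/* offset numbers conventionally start at 1, not 0 */
	return (hdr->pd_lower - sizeof(ToyPageHeader)) / sizeof(ToyItemId);
}

int main(void)
{
	char		page[PAGE_SIZE];
	ToyPageHeader *hdr = (ToyPageHeader *) page;
	int			off;

	page_init(page, 16);
	off = page_add_item(page, "hello", 6);
	printf("item %d stored; free space = %d bytes\n",
		   off, hdr->pd_upper - hdr->pd_lower);
	return 0;
}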

Core flow:

The gdb backtrace below is captured at a breakpoint in ExecInsert while running a simple INSERT. Reading it bottom-up: PostmasterMain accepts the connection and forks a backend (BackendStartup/BackendRun), PostgresMain receives the query string, exec_simple_query parses and plans it and runs the resulting portal (PortalRun -> ProcessQuery -> ExecutorRun), and the executor walks the plan tree (ExecProcNode -> ExecModifyTable) until ExecInsert is reached. The prints that follow inspect the planSlot, the resultRelInfo, and the ModifyTableState node at that breakpoint.

(gdb) bt
#0  ExecInsert (mtstate=0x2087a38, resultRelInfo=0x2087c48, slot=0x2088f78, planSlot=0x20884e8, estate=0x20877d8, canSetTag=true) at nodeModifyTable.c:602
#1  0x000000000072498e in ExecModifyTable (pstate=0x2087a38) at nodeModifyTable.c:2561
#2  0x00000000006edb08 in ExecProcNodeFirst (node=0x2087a38) at execProcnode.c:463
#3  0x00000000006e3279 in ExecProcNode (node=0x2087a38) at ../../../src/include/executor/executor.h:257
#4  0x00000000006e56a9 in ExecutePlan (estate=0x20877d8, planstate=0x2087a38, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, 
    direction=ForwardScanDirection, dest=0x206da58, execute_once=true) at execMain.c:1551
#5  0x00000000006e37b7 in standard_ExecutorRun (queryDesc=0x20c5048, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#6  0x00000000006e3654 in ExecutorRun (queryDesc=0x20c5048, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305
#7  0x00000000008fabfd in ProcessQuery (plan=0x206d978, 
    sourceText=0x1fb2a28 "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE) VALUES (17, 'Paul', 32, 'California', 20000.00,'2001-07-13');", params=0x0, queryEnv=0x0, 
    dest=0x206da58, qc=0x7ffcf6b62bb0) at pquery.c:160
#8  0x00000000008fc34a in PortalRunMulti (portal=0x2015898, isTopLevel=true, setHoldSnapshot=false, dest=0x206da58, altdest=0x206da58, qc=0x7ffcf6b62bb0) at pquery.c:1274
#9  0x00000000008fb9ac in PortalRun (portal=0x2015898, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x206da58, altdest=0x206da58, qc=0x7ffcf6b62bb0) at pquery.c:788
#10 0x00000000008f5a77 in exec_simple_query (
    query_string=0x1fb2a28 "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE) VALUES (17, 'Paul', 32, 'California', 20000.00,'2001-07-13');") at postgres.c:1214
#11 0x00000000008f9d66 in PostgresMain (argc=1, argv=0x7ffcf6b62e40, dbname=0x1fdf898 "test", username=0x1fadff8 "kevin") at postgres.c:4496
#12 0x000000000084abfe in BackendRun (port=0x1fd6ed0) at postmaster.c:4530
#13 0x000000000084a57a in BackendStartup (port=0x1fd6ed0) at postmaster.c:4252
#14 0x0000000000846d3b in ServerLoop () at postmaster.c:1745
#15 0x0000000000846612 in PostmasterMain (argc=1, argv=0x1fabfb0) at postmaster.c:1417
#16 0x0000000000756f66 in main (argc=1, argv=0x1fabfb0) at main.c:209
(gdb) p *planSlot
$37 = {
  type = T_TupleTableSlot, 
  tts_flags = 16, 
  tts_nvalid = 6, 
  tts_ops = 0xbcee80 <TTSOpsVirtual>, 
  tts_tupleDescriptor = 0x20880d8, 
  tts_values = 0x2088530, 
  tts_isnull = 0x2088560, 
  tts_mcxt = 0x20876c0, 
  tts_tid = {
    ip_blkid = {
      bi_hi = 65535, 
      bi_lo = 65535
    }, 
    ip_posid = 0
  }, 
  tts_tableOid = 0
}
(gdb) p *resultRelInfo
$39 = {
  type = T_ResultRelInfo, 
  ri_RangeTableIndex = 1, 
  ri_RelationDesc = 0x7f849a8415a8, 
  ri_NumIndices = 0, 
  ri_IndexRelationDescs = 0x0, 
  ri_IndexRelationInfo = 0x0, 
  ri_RowIdAttNo = 0, 
  ri_projectNew = 0x0, 
  ri_newTupleSlot = 0x0, 
  ri_oldTupleSlot = 0x0, 
  ri_projectNewInfoValid = false, 
  ri_TrigDesc = 0x0, 
  ri_TrigFunctions = 0x0, 
  ri_TrigWhenExprs = 0x0, 
  ri_TrigInstrument = 0x0, 
  ri_ReturningSlot = 0x0, 
  ri_TrigOldSlot = 0x0, 
  ri_TrigNewSlot = 0x0, 
  ri_FdwRoutine = 0x0, 
  ri_FdwState = 0x0, 
  ri_usesFdwDirectModify = false, 
  ri_NumSlots = 0, 
  ri_NumSlotsInitialized = 0, 
  ri_BatchSize = 1, 
  ri_Slots = 0x0, 
  ri_PlanSlots = 0x0, 
  ri_WithCheckOptions = 0x0, 
  ri_WithCheckOptionExprs = 0x0, 
  ri_ConstraintExprs = 0x0, 
  ri_GeneratedExprs = 0x0, 
  ri_NumGeneratedNeeded = 0, 
  ri_returningList = 0x0, 
  ri_projectReturning = 0x0, 
  ri_onConflictArbiterIndexes = 0x0, 
  ri_onConflict = 0x0, 
  ri_PartitionCheckExpr = 0x0, 
  ri_RootResultRelInfo = 0x0, 
  ri_RootToPartitionMap = 0x0, 
  ri_PartitionTupleSlot = 0x0, 
  ri_ChildToRootMap = 0x0, 
  ri_ChildToRootMapValid = false, 
  ri_CopyMultiInsertBuffer = 0x0
}

(gdb) p *slot
$43 = {
  type = 4139132976, 
  tts_flags = 32764, 
  tts_nvalid = 0, 
  tts_ops = 0x6edb08 <ExecProcNodeFirst+77>, 
  tts_tupleDescriptor = 0x110, 
  tts_values = 0x2087a38, 
  tts_isnull = 0x7ffcf6b62850, 
  tts_mcxt = 0x6e3279 <ExecProcNode+54>, 
  tts_tid = {
    ip_blkid = {
      bi_hi = 24, 
      bi_lo = 0
    }, 
    ip_posid = 0
  }, 
  tts_tableOid = 34110008
}

(gdb) p *node
$48 = {
  ps = {
    type = T_ModifyTableState, 
    plan = 0x206d4e8, 
    state = 0x20877d8, 
    ExecProcNode = 0x7244b5 <ExecModifyTable>, 
    ExecProcNodeReal = 0x7244b5 <ExecModifyTable>, 
    instrument = 0x0, 
    worker_instrument = 0x0, 
    worker_jit_instrument = 0x0, 
    qual = 0x0, 
    lefttree = 0x2087ee8, 
    righttree = 0x0, 
    initPlan = 0x0, 
    subPlan = 0x0, 
    chgParam = 0x0, 
    ps_ResultTupleDesc = 0x2088e68, 
    ps_ResultTupleSlot = 0x0, 
    ps_ExprContext = 0x0, 
    ps_ProjInfo = 0x0, 
    async_capable = false, 
    scandesc = 0x0, 
    scanops = 0x0, 
    outerops = 0x0, 
    innerops = 0x0, 
    resultops = 0x0, 
    scanopsfixed = false, 
    outeropsfixed = false, 
    inneropsfixed = false, 
    resultopsfixed = false, 
    scanopsset = false, 
    outeropsset = false, 
    inneropsset = false, 
    resultopsset = false
  }, 
  operation = CMD_INSERT, 
  canSetTag = true, 
  mt_done = false, 
  mt_nrels = 1, 
  resultRelInfo = 0x2087c48, 
  rootResultRelInfo = 0x2087c48, 
  mt_epqstate = {
    parentestate = 0x20877d8, 
    epqParam = 0, 
    tuple_table = 0x0, 
    relsubs_slot = 0x2087ec8, 
    plan = 0x205c940, 
    arowMarks = 0x0, 
    origslot = 0x0, 
    recheckestate = 0x0, 
    relsubs_rowmark = 0x0, 
    relsubs_done = 0x0, 
    recheckplanstate = 0x0
  }, 
  fireBSTriggers = false, 
  mt_resultOidAttno = 0, 
  mt_lastResultOid = 0, 
  mt_lastResultIndex = 0, 
  mt_resultOidHash = 0x0, 
  mt_root_tuple_slot = 0x0, 
  mt_partition_tuple_routing = 0x0, 
  mt_transition_capture = 0x0, 
  mt_oc_transition_capture = 0x0
}

Core functions:

ExecInsert (src/backend/executor/nodeModifyTable.c)


/* ----------------------------------------------------------------
 *		ExecInsert
 *
 *		For INSERT, we have to insert the tuple into the target relation
 *		(or partition thereof) and insert appropriate tuples into the index
 *		relations.
 *
 *		slot contains the new tuple value to be stored.
 *		planSlot is the output of the ModifyTable's subplan; we use it
 *		to access "junk" columns that are not going to be stored.
 *
 *		Returns RETURNING result if any, otherwise NULL.
 *
 *		This may change the currently active tuple conversion map in
 *		mtstate->mt_transition_capture, so the callers must take care to
 *		save the previous value to avoid losing track of it.
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
ExecInsert(ModifyTableState *mtstate,
		   ResultRelInfo *resultRelInfo,
		   TupleTableSlot *slot,
		   TupleTableSlot *planSlot,
		   EState *estate,
		   bool canSetTag)
{
	Relation	resultRelationDesc;
	List	   *recheckIndexes = NIL;
	TupleTableSlot *result = NULL;
	TransitionCaptureState *ar_insert_trig_tcs;
	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
	OnConflictAction onconflict = node->onConflictAction;
	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
	MemoryContext oldContext;

	/*
	 * If the input result relation is a partitioned table, find the leaf
	 * partition to insert the tuple into.
	 */
	if (proute)
	{
		ResultRelInfo *partRelInfo;

		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
									   resultRelInfo, slot,
									   &partRelInfo);
		resultRelInfo = partRelInfo;
	}

	ExecMaterializeSlot(slot);

	resultRelationDesc = resultRelInfo->ri_RelationDesc;

	/*
	 * Open the table's indexes, if we have not done so already, so that we
	 * can add new index entries for the inserted tuple.
	 */
	if (resultRelationDesc->rd_rel->relhasindex &&
		resultRelInfo->ri_IndexRelationDescs == NULL)
		ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);

	/*
	 * BEFORE ROW INSERT Triggers.
	 *
	 * Note: We fire BEFORE ROW TRIGGERS for every attempted insertion in an
	 * INSERT ... ON CONFLICT statement.  We cannot check for constraint
	 * violations before firing these triggers, because they can change the
	 * values to insert.  Also, they can run arbitrary user-defined code with
	 * side-effects that we can't cancel by just not inserting the tuple.
	 */
	if (resultRelInfo->ri_TrigDesc &&
		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
	{
		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
			return NULL;		/* "do nothing" */
	}

	/* INSTEAD OF ROW INSERT Triggers */
	if (resultRelInfo->ri_TrigDesc &&
		resultRelInfo->ri_TrigDesc->trig_insert_instead_row)
	{
		if (!ExecIRInsertTriggers(estate, resultRelInfo, slot))
			return NULL;		/* "do nothing" */
	}
	else if (resultRelInfo->ri_FdwRoutine)
	{
		/*
		 * GENERATED expressions might reference the tableoid column, so
		 * (re-)initialize tts_tableOid before evaluating them.
		 */
		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);

		/*
		 * Compute stored generated columns
		 */
		if (resultRelationDesc->rd_att->constr &&
			resultRelationDesc->rd_att->constr->has_generated_stored)
			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
									   CMD_INSERT);

		/*
		 * If the FDW supports batching, and batching is requested, accumulate
		 * rows and insert them in batches. Otherwise use the per-row inserts.
		 */
		if (resultRelInfo->ri_BatchSize > 1)
		{
			/*
			 * If a certain number of tuples have already been accumulated, or
			 * a tuple has come for a different relation than that for the
			 * accumulated tuples, perform the batch insert
			 */
			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
			{
				ExecBatchInsert(mtstate, resultRelInfo,
								resultRelInfo->ri_Slots,
								resultRelInfo->ri_PlanSlots,
								resultRelInfo->ri_NumSlots,
								estate, canSetTag);
				resultRelInfo->ri_NumSlots = 0;
			}

			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);

			if (resultRelInfo->ri_Slots == NULL)
			{
				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
												 resultRelInfo->ri_BatchSize);
				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
													 resultRelInfo->ri_BatchSize);
			}

			/*
			 * Initialize the batch slots. We don't know how many slots will
			 * be needed, so we initialize them as the batch grows, and we
			 * keep them across batches. To mitigate an inefficiency in how
			 * resource owner handles objects with many references (as with
			 * many slots all referencing the same tuple descriptor) we copy
			 * the appropriate tuple descriptor for each slot.
			 */
			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_NumSlotsInitialized)
			{
				TupleDesc	tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
				TupleDesc	plan_tdesc =
					CreateTupleDescCopy(planSlot->tts_tupleDescriptor);

				resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
					MakeSingleTupleTableSlot(tdesc, slot->tts_ops);

				resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
					MakeSingleTupleTableSlot(plan_tdesc, planSlot->tts_ops);

				/* remember how many batch slots we initialized */
				resultRelInfo->ri_NumSlotsInitialized++;
			}

			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
						 slot);

			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
						 planSlot);

			resultRelInfo->ri_NumSlots++;

			MemoryContextSwitchTo(oldContext);

			return NULL;
		}

		/*
		 * insert into foreign table: let the FDW do it
		 */
		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
															   resultRelInfo,
															   slot,
															   planSlot);

		if (slot == NULL)		/* "do nothing" */
			return NULL;

		/*
		 * AFTER ROW Triggers or RETURNING expressions might reference the
		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
		 * them.  (This covers the case where the FDW replaced the slot.)
		 */
		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
	}
	else
	{
		WCOKind		wco_kind;

		/*
		 * Constraints and GENERATED expressions might reference the tableoid
		 * column, so (re-)initialize tts_tableOid before evaluating them.
		 */
		slot->tts_tableOid = RelationGetRelid(resultRelationDesc);

		/*
		 * Compute stored generated columns
		 */
		if (resultRelationDesc->rd_att->constr &&
			resultRelationDesc->rd_att->constr->has_generated_stored)
			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
									   CMD_INSERT);

		/*
		 * Check any RLS WITH CHECK policies.
		 *
		 * Normally we should check INSERT policies. But if the insert is the
		 * result of a partition key update that moved the tuple to a new
		 * partition, we should instead check UPDATE policies, because we are
		 * executing policies defined on the target table, and not those
		 * defined on the child partitions.
		 */
		wco_kind = (mtstate->operation == CMD_UPDATE) ?
			WCO_RLS_UPDATE_CHECK : WCO_RLS_INSERT_CHECK;

		/*
		 * ExecWithCheckOptions() will skip any WCOs which are not of the kind
		 * we are looking for at this point.
		 */
		if (resultRelInfo->ri_WithCheckOptions != NIL)
			ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);

		/*
		 * Check the constraints of the tuple.
		 */
		if (resultRelationDesc->rd_att->constr)
			ExecConstraints(resultRelInfo, slot, estate);

		/*
		 * Also check the tuple against the partition constraint, if there is
		 * one; except that if we got here via tuple-routing, we don't need to
		 * if there's no BR trigger defined on the partition.
		 */
		if (resultRelationDesc->rd_rel->relispartition &&
			(resultRelInfo->ri_RootResultRelInfo == NULL ||
			 (resultRelInfo->ri_TrigDesc &&
			  resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
			ExecPartitionCheck(resultRelInfo, slot, estate, true);

		if (onconflict != ONCONFLICT_NONE && resultRelInfo->ri_NumIndices > 0)
		{
			/* Perform a speculative insertion. */
			uint32		specToken;
			ItemPointerData conflictTid;
			bool		specConflict;
			List	   *arbiterIndexes;

			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;

			/*
			 * Do a non-conclusive check for conflicts first.
			 *
			 * We're not holding any locks yet, so this doesn't guarantee that
			 * the later insert won't conflict.  But it avoids leaving behind
			 * a lot of canceled speculative insertions, if you run a lot of
			 * INSERT ON CONFLICT statements that do conflict.
			 *
			 * We loop back here if we find a conflict below, either during
			 * the pre-check, or when we re-check after inserting the tuple
			 * speculatively.
			 */
	vlock:
			specConflict = false;
			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
										   &conflictTid, arbiterIndexes))
			{
				/* committed conflict tuple found */
				if (onconflict == ONCONFLICT_UPDATE)
				{
					/*
					 * In case of ON CONFLICT DO UPDATE, execute the UPDATE
					 * part.  Be prepared to retry if the UPDATE fails because
					 * of another concurrent UPDATE/DELETE to the conflict
					 * tuple.
					 */
					TupleTableSlot *returning = NULL;

					if (ExecOnConflictUpdate(mtstate, resultRelInfo,
											 &conflictTid, planSlot, slot,
											 estate, canSetTag, &returning))
					{
						InstrCountTuples2(&mtstate->ps, 1);
						return returning;
					}
					else
						goto vlock;
				}
				else
				{
					/*
					 * In case of ON CONFLICT DO NOTHING, do nothing. However,
					 * verify that the tuple is visible to the executor's MVCC
					 * snapshot at higher isolation levels.
					 *
					 * Using ExecGetReturningSlot() to store the tuple for the
					 * recheck isn't that pretty, but we can't trivially use
					 * the input slot, because it might not be of a compatible
					 * type. As there's no conflicting usage of
					 * ExecGetReturningSlot() in the DO NOTHING case...
					 */
					Assert(onconflict == ONCONFLICT_NOTHING);
					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid,
										ExecGetReturningSlot(estate, resultRelInfo));
					InstrCountTuples2(&mtstate->ps, 1);
					return NULL;
				}
			}

			/*
			 * Before we start insertion proper, acquire our "speculative
			 * insertion lock".  Others can use that to wait for us to decide
			 * if we're going to go ahead with the insertion, instead of
			 * waiting for the whole transaction to complete.
			 */
			specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());

			/* insert the tuple, with the speculative token */
			table_tuple_insert_speculative(resultRelationDesc, slot,
										   estate->es_output_cid,
										   0,
										   NULL,
										   specToken);

			/* insert index entries for tuple */
			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
												   slot, estate, false, true,
												   &specConflict,
												   arbiterIndexes);

			/* adjust the tuple's state accordingly */
			table_tuple_complete_speculative(resultRelationDesc, slot,
											 specToken, !specConflict);

			/*
			 * Wake up anyone waiting for our decision.  They will re-check
			 * the tuple, see that it's no longer speculative, and wait on our
			 * XID as if this was a regularly inserted tuple all along.  Or if
			 * we killed the tuple, they will see it's dead, and proceed as if
			 * the tuple never existed.
			 */
			SpeculativeInsertionLockRelease(GetCurrentTransactionId());

			/*
			 * If there was a conflict, start from the beginning.  We'll do
			 * the pre-check again, which will now find the conflicting tuple
			 * (unless it aborts before we get there).
			 */
			if (specConflict)
			{
				list_free(recheckIndexes);
				goto vlock;
			}

			/* Since there was no insertion conflict, we're done */
		}
		else
		{
			/* insert the tuple normally */
			table_tuple_insert(resultRelationDesc, slot,
							   estate->es_output_cid,
							   0, NULL);

			/* insert index entries for tuple */
			if (resultRelInfo->ri_NumIndices > 0)
				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
													   slot, estate, false,
													   false, NULL, NIL);
		}
	}

	if (canSetTag)
		(estate->es_processed)++;

	/*
	 * If this insert is the result of a partition key update that moved the
	 * tuple to a new partition, put this row into the transition NEW TABLE,
	 * if there is one. We need to do this separately for DELETE and INSERT
	 * because they happen on different tables.
	 */
	ar_insert_trig_tcs = mtstate->mt_transition_capture;
	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
		&& mtstate->mt_transition_capture->tcs_update_new_table)
	{
		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
							 NULL,
							 slot,
							 NULL,
							 mtstate->mt_transition_capture);

		/*
		 * We've already captured the NEW TABLE row, so make sure any AR
		 * INSERT trigger fired below doesn't capture it again.
		 */
		ar_insert_trig_tcs = NULL;
	}

	/* AFTER ROW INSERT Triggers */
	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
						 ar_insert_trig_tcs);

	list_free(recheckIndexes);

	/*
	 * Check any WITH CHECK OPTION constraints from parent views.  We are
	 * required to do this after testing all constraints and uniqueness
	 * violations per the SQL spec, so we do it after actually inserting the
	 * record into the heap and all indexes.
	 *
	 * ExecWithCheckOptions will elog(ERROR) if a violation is found, so the
	 * tuple will never be seen, if it violates the WITH CHECK OPTION.
	 *
	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind we
	 * are looking for at this point.
	 */
	if (resultRelInfo->ri_WithCheckOptions != NIL)
		ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);

	/* Process RETURNING if present */
	if (resultRelInfo->ri_projectReturning)
		result = ExecProcessReturning(resultRelInfo, slot, planSlot);

	return result;
}
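
The ON CONFLICT branch above is essentially a retry loop: pre-check the arbiter indexes, insert the tuple with a speculative token, re-check while inserting index entries, then either complete the insertion or kill the tuple and jump back to vlock. The toy program below (all names invented; the real work happens in ExecCheckIndexConstraints, table_tuple_insert_speculative and ExecInsertIndexTuples) renders just that control flow for INSERT ... ON CONFLICT DO NOTHING:

#include <stdio.h>
#include <stdbool.h>

/* Toy rendering of the speculative-insertion loop above.
 * index_entry[] stands in for the arbiter index, and
 * concurrent_insert() simulates another backend winning the race
 * between our pre-check and our insertion. */

#define NKEYS 16
static bool index_entry[NKEYS];         /* committed unique-index entries */

static bool concurrent_insert(int key, int attempt)
{
	/* simulate another session inserting key 7 right after our pre-check */
	if (key == 7 && attempt == 0)
	{
		index_entry[key] = true;
		return true;                    /* we detect a speculative conflict */
	}
	return false;
}

static void insert_on_conflict_do_nothing(int key)
{
	int			attempt = 0;
	bool		spec_conflict;

vlock:                                  /* same role as the vlock label above */
	/* non-conclusive pre-check for an existing conflicting tuple */
	if (index_entry[key])
	{
		printf("key %d: conflict found, DO NOTHING\n", key);
		return;
	}

	/* "insert the tuple, with the speculative token", then re-check
	 * while inserting the index entries */
	spec_conflict = concurrent_insert(key, attempt++);
	if (spec_conflict)
	{
		/* kill the speculative tuple and start from the beginning */
		printf("key %d: speculative conflict, retrying\n", key);
		goto vlock;
	}

	index_entry[key] = true;            /* complete the speculative insertion */
	printf("key %d: inserted\n", key);
}

int main(void)
{
	insert_on_conflict_do_nothing(7);   /* loses the race once, then DO NOTHING */
	insert_on_conflict_do_nothing(3);   /* no race: inserted normally */
	return 0;
}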

ExecModifyTable (src/backend/executor/nodeModifyTable.c)


/* ----------------------------------------------------------------
 *	   ExecModifyTable
 *
 *		Perform table modifications as required, and return RETURNING results
 *		if needed.
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
ExecModifyTable(PlanState *pstate)
{
	ModifyTableState *node = castNode(ModifyTableState, pstate);
	EState	   *estate = node->ps.state;
	CmdType		operation = node->operation;
	ResultRelInfo *resultRelInfo;
	PlanState  *subplanstate;
	TupleTableSlot *slot;
	TupleTableSlot *planSlot;
	TupleTableSlot *oldSlot;
	ItemPointer tupleid;
	ItemPointerData tuple_ctid;
	HeapTupleData oldtupdata;
	HeapTuple	oldtuple;
	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
	List	   *relinfos = NIL;
	ListCell   *lc;

	CHECK_FOR_INTERRUPTS();

	/*
	 * This should NOT get called during EvalPlanQual; we should have passed a
	 * subplan tree to EvalPlanQual, instead.  Use a runtime test not just
	 * Assert because this condition is easy to miss in testing.  (Note:
	 * although ModifyTable should not get executed within an EvalPlanQual
	 * operation, we do have to allow it to be initialized and shut down in
	 * case it is within a CTE subplan.  Hence this test must be here, not in
	 * ExecInitModifyTable.)
	 */
	if (estate->es_epq_active != NULL)
		elog(ERROR, "ModifyTable should not be called during EvalPlanQual");

	/*
	 * If we've already completed processing, don't try to do more.  We need
	 * this test because ExecPostprocessPlan might call us an extra time, and
	 * our subplan's nodes aren't necessarily robust against being called
	 * extra times.
	 */
	if (node->mt_done)
		return NULL;

	/*
	 * On first call, fire BEFORE STATEMENT triggers before proceeding.
	 */
	if (node->fireBSTriggers)
	{
		fireBSTriggers(node);
		node->fireBSTriggers = false;
	}

	/* Preload local variables */
	resultRelInfo = node->resultRelInfo + node->mt_lastResultIndex;
	subplanstate = outerPlanState(node);

	/*
	 * Fetch rows from subplan, and execute the required table modification
	 * for each row.
	 */
	for (;;)
	{
		/*
		 * Reset the per-output-tuple exprcontext.  This is needed because
		 * triggers expect to use that context as workspace.  It's a bit ugly
		 * to do this below the top level of the plan, however.  We might need
		 * to rethink this later.
		 */
		ResetPerTupleExprContext(estate);

		/*
		 * Reset per-tuple memory context used for processing on conflict and
		 * returning clauses, to free any expression evaluation storage
		 * allocated in the previous cycle.
		 */
		if (pstate->ps_ExprContext)
			ResetExprContext(pstate->ps_ExprContext);

		planSlot = ExecProcNode(subplanstate);

		/* No more tuples to process? */
		if (TupIsNull(planSlot))
			break;

		/*
		 * When there are multiple result relations, each tuple contains a
		 * junk column that gives the OID of the rel from which it came.
		 * Extract it and select the correct result relation.
		 */
		if (AttributeNumberIsValid(node->mt_resultOidAttno))
		{
			Datum		datum;
			bool		isNull;
			Oid			resultoid;

			datum = ExecGetJunkAttribute(planSlot, node->mt_resultOidAttno,
										 &isNull);
			if (isNull)
				elog(ERROR, "tableoid is NULL");
			resultoid = DatumGetObjectId(datum);

			/* If it's not the same as last time, we need to locate the rel */
			if (resultoid != node->mt_lastResultOid)
				resultRelInfo = ExecLookupResultRelByOid(node, resultoid,
														 false, true);
		}

		/*
		 * If resultRelInfo->ri_usesFdwDirectModify is true, all we need to do
		 * here is compute the RETURNING expressions.
		 */
		if (resultRelInfo->ri_usesFdwDirectModify)
		{
			Assert(resultRelInfo->ri_projectReturning);

			/*
			 * A scan slot containing the data that was actually inserted,
			 * updated or deleted has already been made available to
			 * ExecProcessReturning by IterateDirectModify, so no need to
			 * provide it here.
			 */
			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);

			return slot;
		}

		EvalPlanQualSetSlot(&node->mt_epqstate, planSlot);
		slot = planSlot;

		tupleid = NULL;
		oldtuple = NULL;

		/*
		 * For UPDATE/DELETE, fetch the row identity info for the tuple to be
		 * updated/deleted.  For a heap relation, that's a TID; otherwise we
		 * may have a wholerow junk attr that carries the old tuple in toto.
		 * Keep this in step with the part of ExecInitModifyTable that sets up
		 * ri_RowIdAttNo.
		 */
		if (operation == CMD_UPDATE || operation == CMD_DELETE)
		{
			char		relkind;
			Datum		datum;
			bool		isNull;

			relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
			if (relkind == RELKIND_RELATION ||
				relkind == RELKIND_MATVIEW ||
				relkind == RELKIND_PARTITIONED_TABLE)
			{
				/* ri_RowIdAttNo refers to a ctid attribute */
				Assert(AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo));
				datum = ExecGetJunkAttribute(slot,
											 resultRelInfo->ri_RowIdAttNo,
											 &isNull);
				/* shouldn't ever get a null result... */
				if (isNull)
					elog(ERROR, "ctid is NULL");

				tupleid = (ItemPointer) DatumGetPointer(datum);
				tuple_ctid = *tupleid;	/* be sure we don't free ctid!! */
				tupleid = &tuple_ctid;
			}

			/*
			 * Use the wholerow attribute, when available, to reconstruct the
			 * old relation tuple.  The old tuple serves one or both of two
			 * purposes: 1) it serves as the OLD tuple for row triggers, 2) it
			 * provides values for any unchanged columns for the NEW tuple of
			 * an UPDATE, because the subplan does not produce all the columns
			 * of the target table.
			 *
			 * Note that the wholerow attribute does not carry system columns,
			 * so foreign table triggers miss seeing those, except that we
			 * know enough here to set t_tableOid.  Quite separately from
			 * this, the FDW may fetch its own junk attrs to identify the row.
			 *
			 * Other relevant relkinds, currently limited to views, always
			 * have a wholerow attribute.
			 */
			else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
			{
				datum = ExecGetJunkAttribute(slot,
											 resultRelInfo->ri_RowIdAttNo,
											 &isNull);
				/* shouldn't ever get a null result... */
				if (isNull)
					elog(ERROR, "wholerow is NULL");

				oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
				oldtupdata.t_len =
					HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
				ItemPointerSetInvalid(&(oldtupdata.t_self));
				/* Historically, view triggers see invalid t_tableOid. */
				oldtupdata.t_tableOid =
					(relkind == RELKIND_VIEW) ? InvalidOid :
					RelationGetRelid(resultRelInfo->ri_RelationDesc);

				oldtuple = &oldtupdata;
			}
			else
			{
				/* Only foreign tables are allowed to omit a row-ID attr */
				Assert(relkind == RELKIND_FOREIGN_TABLE);
			}
		}

		switch (operation)
		{
			case CMD_INSERT:
				/* Initialize projection info if first time for this table */
				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
					ExecInitInsertProjection(node, resultRelInfo);
				slot = ExecGetInsertNewTuple(resultRelInfo, planSlot);
				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
								  estate, node->canSetTag);
				break;
			case CMD_UPDATE:
				/* Initialize projection info if first time for this table */
				if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
					ExecInitUpdateProjection(node, resultRelInfo);

				/*
				 * Make the new tuple by combining plan's output tuple with
				 * the old tuple being updated.
				 */
				oldSlot = resultRelInfo->ri_oldTupleSlot;
				if (oldtuple != NULL)
				{
					/* Use the wholerow junk attr as the old tuple. */
					ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
				}
				else
				{
					/* Fetch the most recent version of old tuple. */
					Relation	relation = resultRelInfo->ri_RelationDesc;

					Assert(tupleid != NULL);
					if (!table_tuple_fetch_row_version(relation, tupleid,
													   SnapshotAny,
													   oldSlot))
						elog(ERROR, "failed to fetch tuple being updated");
				}
				slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
											 oldSlot);

				/* Now apply the update. */
				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
								  planSlot, &node->mt_epqstate, estate,
								  node->canSetTag);
				break;
			case CMD_DELETE:
				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
								  planSlot, &node->mt_epqstate, estate,
								  true, /* processReturning */
								  node->canSetTag,
								  false,	/* changingPart */
								  NULL, NULL);
				break;
			default:
				elog(ERROR, "unknown operation");
				break;
		}

		/*
		 * If we got a RETURNING result, return it to caller.  We'll continue
		 * the work on next call.
		 */
		if (slot)
			return slot;
	}

	/*
	 * Insert remaining tuples for batch insert.
	 */
	if (proute)
		relinfos = estate->es_tuple_routing_result_relations;
	else
		relinfos = estate->es_opened_result_relations;

	foreach(lc, relinfos)
	{
		resultRelInfo = lfirst(lc);
		if (resultRelInfo->ri_NumSlots > 0)
			ExecBatchInsert(node, resultRelInfo,
							resultRelInfo->ri_Slots,
							resultRelInfo->ri_PlanSlots,
							resultRelInfo->ri_NumSlots,
							estate, node->canSetTag);
	}

	/*
	 * We're done, but fire AFTER STATEMENT triggers before exiting.
	 */
	fireASTriggers(node);

	node->mt_done = true;

	return NULL;
}

exec_simple_query (src/backend/tcop/postgres.c)


/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	CommandDest dest = whereToSendOutput;
	MemoryContext oldcontext;
	List	   *parsetree_list;
	ListCell   *parsetree_item;
	bool		save_log_statement_stats = log_statement_stats;
	bool		was_logged = false;
	bool		use_implicit_block;
	char		msec_str[32];

	/*
	 * Report query to various monitoring facilities.
	 */
	debug_query_string = query_string;

	pgstat_report_activity(STATE_RUNNING, query_string);

	TRACE_POSTGRESQL_QUERY_START(query_string);

	/*
	 * We use save_log_statement_stats so ShowUsage doesn't report incorrect
	 * results because ResetUsage wasn't called.
	 */
	if (save_log_statement_stats)
		ResetUsage();

	/*
	 * Start up a transaction command.  All queries generated by the
	 * query_string will be in this same command block, *unless* we find a
	 * BEGIN/COMMIT/ABORT statement; we have to force a new xact command after
	 * one of those, else bad things will happen in xact.c. (Note that this
	 * will normally change current memory context.)
	 */
	start_xact_command();

	/*
	 * Zap any pre-existing unnamed statement.  (While not strictly necessary,
	 * it seems best to define simple-Query mode as if it used the unnamed
	 * statement and portal; this ensures we recover any storage used by prior
	 * unnamed operations.)
	 */
	drop_unnamed_stmt();

	/*
	 * Switch to appropriate context for constructing parsetrees.
	 */
	oldcontext = MemoryContextSwitchTo(MessageContext);

	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	/* Log immediately if dictated by log_statement */
	if (check_log_statement(parsetree_list))
	{
		ereport(LOG,
				(errmsg("statement: %s", query_string),
				 errhidestmt(true),
				 errdetail_execute(parsetree_list)));
		was_logged = true;
	}

	/*
	 * Switch back to transaction context to enter the loop.
	 */
	MemoryContextSwitchTo(oldcontext);

	/*
	 * For historical reasons, if multiple SQL statements are given in a
	 * single "simple Query" message, we execute them as a single transaction,
	 * unless explicit transaction control commands are included to make
	 * portions of the list be separate transactions.  To represent this
	 * behavior properly in the transaction machinery, we use an "implicit"
	 * transaction block.
	 */
	use_implicit_block = (list_length(parsetree_list) > 1);

	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		RawStmt    *parsetree = lfirst_node(RawStmt, parsetree_item);
		bool		snapshot_set = false;
		CommandTag	commandTag;
		QueryCompletion qc;
		MemoryContext per_parsetree_context = NULL;
		List	   *querytree_list,
				   *plantree_list;
		Portal		portal;
		DestReceiver *receiver;
		int16		format;

		pgstat_report_query_id(0, true);

		/*
		 * Get the command name for use in status display (it also becomes the
		 * default completion tag, down inside PortalRun).  Set ps_status and
		 * do any special start-of-SQL-command processing needed by the
		 * destination.
		 */
		commandTag = CreateCommandTag(parsetree->stmt);

		set_ps_display(GetCommandTagName(commandTag));

		BeginCommand(commandTag, dest);

		/*
		 * If we are in an aborted transaction, reject all commands except
		 * COMMIT/ABORT.  It is important that this test occur before we try
		 * to do parse analysis, rewrite, or planning, since all those phases
		 * try to do database accesses, which may fail in abort state. (It
		 * might be safe to allow some additional utility commands in this
		 * state, but not many...)
		 */
		if (IsAbortedTransactionBlockState() &&
			!IsTransactionExitStmt(parsetree->stmt))
			ereport(ERROR,
					(errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
					 errmsg("current transaction is aborted, "
							"commands ignored until end of transaction block"),
					 errdetail_abort()));

		/* Make sure we are in a transaction command */
		start_xact_command();

		/*
		 * If using an implicit transaction block, and we're not already in a
		 * transaction block, start an implicit block to force this statement
		 * to be grouped together with any following ones.  (We must do this
		 * each time through the loop; otherwise, a COMMIT/ROLLBACK in the
		 * list would cause later statements to not be grouped.)
		 */
		if (use_implicit_block)
			BeginImplicitTransactionBlock();

		/* If we got a cancel signal in parsing or prior command, quit */
		CHECK_FOR_INTERRUPTS();

		/*
		 * Set up a snapshot if parse analysis/planning will need one.
		 */
		if (analyze_requires_snapshot(parsetree))
		{
			PushActiveSnapshot(GetTransactionSnapshot());
			snapshot_set = true;
		}

		/*
		 * OK to analyze, rewrite, and plan this query.
		 *
		 * Switch to appropriate context for constructing query and plan trees
		 * (these can't be in the transaction context, as that will get reset
		 * when the command is COMMIT/ROLLBACK).  If we have multiple
		 * parsetrees, we use a separate context for each one, so that we can
		 * free that memory before moving on to the next one.  But for the
		 * last (or only) parsetree, just use MessageContext, which will be
		 * reset shortly after completion anyway.  In event of an error, the
		 * per_parsetree_context will be deleted when MessageContext is reset.
		 */
		if (lnext(parsetree_list, parsetree_item) != NULL)
		{
			per_parsetree_context =
				AllocSetContextCreate(MessageContext,
									  "per-parsetree message context",
									  ALLOCSET_DEFAULT_SIZES);
			oldcontext = MemoryContextSwitchTo(per_parsetree_context);
		}
		else
			oldcontext = MemoryContextSwitchTo(MessageContext);

		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);

		plantree_list = pg_plan_queries(querytree_list, query_string,
										CURSOR_OPT_PARALLEL_OK, NULL);

		/*
		 * Done with the snapshot used for parsing/planning.
		 *
		 * While it looks promising to reuse the same snapshot for query
		 * execution (at least for simple protocol), unfortunately it causes
		 * execution to use a snapshot that has been acquired before locking
		 * any of the tables mentioned in the query.  This creates user-
		 * visible anomalies, so refrain.  Refer to
		 * https://postgr.es/m/flat/5075D8DF.6050500@fuzzy.cz for details.
		 */
		if (snapshot_set)
			PopActiveSnapshot();

		/* If we got a cancel signal in analysis or planning, quit */
		CHECK_FOR_INTERRUPTS();

		/*
		 * Create unnamed portal to run the query or queries in. If there
		 * already is one, silently drop it.
		 */
		portal = CreatePortal("", true, true);
		/* Don't display the portal in pg_cursors */
		portal->visible = false;

		/*
		 * We don't have to copy anything into the portal, because everything
		 * we are passing here is in MessageContext or the
		 * per_parsetree_context, and so will outlive the portal anyway.
		 */
		PortalDefineQuery(portal,
						  NULL,
						  query_string,
						  commandTag,
						  plantree_list,
						  NULL);

		/*
		 * Start the portal.  No parameters here.
		 */
		PortalStart(portal, NULL, 0, InvalidSnapshot);

		/*
		 * Select the appropriate output format: text unless we are doing a
		 * FETCH from a binary cursor.  (Pretty grotty to have to do this here
		 * --- but it avoids grottiness in other places.  Ah, the joys of
		 * backward compatibility...)
		 */
		format = 0;				/* TEXT is default */
		if (IsA(parsetree->stmt, FetchStmt))
		{
			FetchStmt  *stmt = (FetchStmt *) parsetree->stmt;

			if (!stmt->ismove)
			{
				Portal		fportal = GetPortalByName(stmt->portalname);

				if (PortalIsValid(fportal) &&
					(fportal->cursorOptions & CURSOR_OPT_BINARY))
					format = 1; /* BINARY */
			}
		}
		PortalSetResultFormat(portal, 1, &format);

		/*
		 * Now we can create the destination receiver object.
		 */
		receiver = CreateDestReceiver(dest);
		if (dest == DestRemote)
			SetRemoteDestReceiverParams(receiver, portal);

		/*
		 * Switch back to transaction context for execution.
		 */
		MemoryContextSwitchTo(oldcontext);

		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 &qc);

		receiver->rDestroy(receiver);

		PortalDrop(portal, false);

		if (lnext(parsetree_list, parsetree_item) == NULL)
		{
			/*
			 * If this is the last parsetree of the query string, close down
			 * transaction statement before reporting command-complete.  This
			 * is so that any end-of-transaction errors are reported before
			 * the command-complete message is issued, to avoid confusing
			 * clients who will expect either a command-complete message or an
			 * error, not one and then the other.  Also, if we're using an
			 * implicit transaction block, we must close that out first.
			 */
			if (use_implicit_block)
				EndImplicitTransactionBlock();
			finish_xact_command();
		}
		else if (IsA(parsetree->stmt, TransactionStmt))
		{
			/*
			 * If this was a transaction control statement, commit it. We will
			 * start a new xact command for the next command.
			 */
			finish_xact_command();
		}
		else
		{
			/*
			 * We need a CommandCounterIncrement after every query, except
			 * those that start or end a transaction block.
			 */
			CommandCounterIncrement();

			/*
			 * Disable statement timeout between queries of a multi-query
			 * string, so that the timeout applies separately to each query.
			 * (Our next loop iteration will start a fresh timeout.)
			 */
			disable_statement_timeout();
		}

		/*
		 * Tell client that we're done with this query.  Note we emit exactly
		 * one EndCommand report for each raw parsetree, thus one for each SQL
		 * command the client sent, regardless of rewriting. (But a command
		 * aborted by error will not send an EndCommand report at all.)
		 */
		EndCommand(&qc, dest, false);

		/* Now we may drop the per-parsetree context, if one was created. */
		if (per_parsetree_context)
			MemoryContextDelete(per_parsetree_context);
	}							/* end loop over parsetrees */

	/*
	 * Close down transaction statement, if one is open.  (This will only do
	 * something if the parsetree list was empty; otherwise the last loop
	 * iteration already did it.)
	 */
	finish_xact_command();

	/*
	 * If there were no parsetrees, return EmptyQueryResponse message.
	 */
	if (!parsetree_list)
		NullCommand(dest);

	/*
	 * Emit duration logging if appropriate.
	 */
	switch (check_log_duration(msec_str, was_logged))
	{
		case 1:
			ereport(LOG,
					(errmsg("duration: %s ms", msec_str),
					 errhidestmt(true)));
			break;
		case 2:
			ereport(LOG,
					(errmsg("duration: %s ms  statement: %s",
							msec_str, query_string),
					 errhidestmt(true),
					 errdetail_execute(parsetree_list)));
			break;
	}

	if (save_log_statement_stats)
		ShowUsage("QUERY STATISTICS");

	TRACE_POSTGRESQL_QUERY_DONE(query_string);

	debug_query_string = NULL;
}

ExecProcNode (src/include/executor/executor.h)

/* ----------------------------------------------------------------
 *		ExecProcNode
 *
 *		Execute the given node to return a(nother) tuple.
 * ----------------------------------------------------------------
 */
#ifndef FRONTEND
static inline TupleTableSlot *
ExecProcNode(PlanState *node)
{
	if (node->chgParam != NULL) /* something changed? */
		ExecReScan(node);		/* let ReScan handle this */

	return node->ExecProcNode(node);
}
#endif

raw_parser (src/backend/parser/parser.c)



/*
 * raw_parser
 *		Given a query in string form, do lexical and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.  The contents of the
 * list have the form required by the specified RawParseMode.
 */
List *
raw_parser(const char *str, RawParseMode mode)
{
	core_yyscan_t yyscanner;
	base_yy_extra_type yyextra;
	int			yyresult;

	/* initialize the flex scanner */
	yyscanner = scanner_init(str, &yyextra.core_yy_extra,
							 &ScanKeywords, ScanKeywordTokens);

	/* base_yylex() only needs us to initialize the lookahead token, if any */
	if (mode == RAW_PARSE_DEFAULT)
		yyextra.have_lookahead = false;
	else
	{
		/* this array is indexed by RawParseMode enum */
		static const int mode_token[] = {
			0,					/* RAW_PARSE_DEFAULT */
			MODE_TYPE_NAME,		/* RAW_PARSE_TYPE_NAME */
			MODE_PLPGSQL_EXPR,	/* RAW_PARSE_PLPGSQL_EXPR */
			MODE_PLPGSQL_ASSIGN1,	/* RAW_PARSE_PLPGSQL_ASSIGN1 */
			MODE_PLPGSQL_ASSIGN2,	/* RAW_PARSE_PLPGSQL_ASSIGN2 */
			MODE_PLPGSQL_ASSIGN3	/* RAW_PARSE_PLPGSQL_ASSIGN3 */
		};

		yyextra.have_lookahead = true;
		yyextra.lookahead_token = mode_token[mode];
		yyextra.lookahead_yylloc = 0;
		yyextra.lookahead_end = NULL;
	}

	/* initialize the bison parser */
	parser_init(&yyextra);

	/* Parse! */
	yyresult = base_yyparse(yyscanner);

	/* Clean up (release memory) */
	scanner_finish(yyscanner);

	if (yyresult)				/* error */
		return NIL;

	return yyextra.parsetree;
}

pg_parse_query (src/backend/tcop/postgres.c)


/*
 * Do raw parsing (only).
 *
 * A list of parsetrees (RawStmt nodes) is returned, since there might be
 * multiple commands in the given string.
 *
 * NOTE: for interactive queries, it is important to keep this routine
 * separate from the analysis & rewrite stages.  Analysis and rewriting
 * cannot be done in an aborted transaction, since they require access to
 * database tables.  So, we rely on the raw parser to determine whether
 * we've seen a COMMIT or ABORT command; when we are in abort state, other
 * commands are not processed any further than the raw parse stage.
 */
List *
pg_parse_query(const char *query_string)
{
	List	   *raw_parsetree_list;

	TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);

	if (log_parser_stats)
		ResetUsage();

	raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);

	if (log_parser_stats)
		ShowUsage("PARSER STATISTICS");

#ifdef COPY_PARSE_PLAN_TREES
	/* Optional debugging check: pass raw parsetrees through copyObject() */
	{
		List	   *new_list = copyObject(raw_parsetree_list);

		/* This checks both copyObject() and the equal() routines... */
		if (!equal(new_list, raw_parsetree_list))
			elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
		else
			raw_parsetree_list = new_list;
	}
#endif

	/*
	 * Currently, outfuncs/readfuncs support is missing for many raw parse
	 * tree nodes, so we don't try to implement WRITE_READ_PARSE_PLAN_TREES
	 * here.
	 */

	TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);

	return raw_parsetree_list;
}
