Parallel bulk copy and IDENTITY columns

Parallel bulk copy and IDENTITY columns

When you areusing parallel bulk copy, IDENTITY columns can cause a bottleneck. As bcp reads in thedata, the utility both generates the values of the IDENTITY column and updates the IDENTITY column’smaximum value for each row. This extra work may adversely affect theperformance improvement that you expected to receive from using parallel bulkcopy.

To avoid thisbottleneck, you can explicitly specify the IDENTITY starting pointfor each session.

 

Retaining sort order

If you copysorted data into the table without explicitly specifying the IDENTITY starting point, bcp might not generate the IDENTITY column values in sorted order. Parallel bulk copy reads theinformation into all the partitions simultaneously and updates the values ofthe IDENTITY column as it reads in the data.

A bcp statement with no explicit starting point wouldproduce IDENTITY column numbers similar to those shown in Figure 4-2:

Figure 4-2: Producing IDENTITYcolumns in sorted order

The table has amaximum IDENTITY column number of 119, but the order is no longermeaningful.

If you wantAdaptive Server to enforce unique IDENTITY column values,you must run bcp with either the -g or -E parameter.

 

Specifying the starting point from the command line

Use the -g id_start_value flag to specify an IDENTITY starting pointfor a session in the command line.

The -gparameter instructs Adaptive Server to generate a sequence of IDENTITY column values for the bcp session without checkingand updating the maximum value of the table’s IDENTITY column foreach row. Instead of checking, Adaptive Server updates the maximum value at theend of each batch.

WARNING! Be cautious about creating duplicateidentity values inadvertently when you specify identity value ranges thatoverlap.

To specify astarting IDENTITY value, enter:

bcp[-gid_start_value]

For example, tocopy in four files, each of which has 100 rows, enter:

bcpmydb..bigtable in file1 -g100

bcpmydb..bigtable in file2 -g200

bcpmydb..bigtable in file3 -g300

bcpmydb..bigtable in file4 -g400

 

Using the -gparameter does not guarantee that the IDENTITY column valuesare unique. To ensure uniqueness, you must:

·        Know how many rows arein the input files and what the highest existing value is. Use this informationto set the starting values with the -g parameter and generate rangesthat do not overlap.

In the example above, if any file contains more than 100 rows, theidentity values overlap into the next 100 rows of data, creating duplicateidentity values.

·        Make sure that no oneelse is inserting data that can produce conflicting IDENTITY values.

 

Specifying the starting point using the data file

Use the -Eparameter to set the IDENTITY starting point explicitly from the data file.

The -E parameter instructs bcp to prompt youto enter an explicit IDENTITY column value for each row. If the number of inserted rowsexceeds the maximum possible IDENTITY column value, Adaptive Server returns an error.

 


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值