Custom Auto-Generated Sequences with SQL Server

 http://www.sqlteam.com/article/custom-auto-generated-sequences-with-sql-server

By Jeff Smith

This article by Jeff Smith covers different ways to create sequence numbers. It starts with the basic identity and GUIDs and quickly moves to much more complicated sequence numbers including those with auto-incrementing mixed letters and numbers such as "A0001", etc. Jeff shows one of the easiest ways I've ever seen to efficiently handle very odd sequence numbers.

Introduction

Sometimes you'd like the SQL Server itself to automatically generate a sequence for entities in your table as they are created. For example, assigning each new Customer added to your table a unique "CustomerNumber". There are effectively two ways to do using the built-in features that T-SQL provides:

  • Identity Columns - An identity is a common "auto generated" primary key to use in a SQL Server database these days. An identity is simply an integer value that "auto increments" as each new row is added to your table. You can specify when it should start and how it should increment when you add the column to your table:
    alter   table  YourTable  add  ID  int   identity (start, increment) 
  • GUID Columns - A "GUID" is a Globally Unique Identifier that can be assigned a unique yet random long string of characters like "B3FA6F0A-523F-4931-B3F8-0CF41E2A48EE". You can either use the NEWID() function when inserting into your table or set a default like this to implement a GUID column in your tables:
    alter   table  YourTable  add  ID  uniqueidentifier   default   newid () 
    However, we often see questions in the forums regarding how to create other types of auto-generated sequences in tables. For example, you might want your customers to automatically be assigned "Customer Numbers" as formatted like this:
    C0001
    C0002
    ... 
    C9998
    C9999 
    In other cases, people would like to use incrementing letters instead of numbers, some combination of both, or for the digits to be "reset" on some specific condition, and so on.

The most important and crucial part of implementing this is not writing the code! It is clearly defining your specification and ensuring that it is logical and works for you. Before you can write code that will automatically generate sequences for you, you must consider:

  • How many numbers will you ever need? Does your specification handle this?
  • What happens when values are deleted? Are they re-used?
  • Are these sequences dependant on data that might ever change? What happens to these values when the data does change? Does it make sense, then, to incorporate this data into your sequence algorithm?
  • If you have a complicated rule (i.e., "AA-00" through "ZZ-99"), is every step and possibility clearly defined? In this example, what comes after AA-99? Is it "BA-00", "AB-00", "BB-00", or something else? What comes after "ZZ-99"?

So, the very first step is to clearly, accurately, and completely define how your sequence values will be generated. You must explicitly map out how to handle all possible situations and you must do some research to ensure that your specification will work for the data you are handling. A primary key of "A0" through "Z9" will only work for 26*10 = 260 values -- is this really what you want?

There are a few different approaches you can take in order to facilitate this behavior in SQL Server, so let's take a look.

Option 1: Determine the next value by querying a table

This first approach is usually the most common, and in my opinion also the worst. What people try to do here is to query the existing table to retrieve the last sequence value created, and then use that value to create the next one.

For example, you might create a User-Defined Function that you could use in a trigger like this:

create   function  NextCustomerNumber() 
returns   char ( 5
as  
begin  
    
declare   @lastval   char ( 5
    
set   @lastval   =  ( select   max (customerNumber)  from  Customers) 
    
if   @lastval   is   null   set   @lastval   =   ' C0001 '  
    
declare   @i   int  
    
set   @i   =   right ( @lastval , 4 +   1  
    
return   ' C '   +   right ( ' 000 '   +   convert ( varchar ( 10 ), @i ), 4
end

This can cause some issues, however:

  • What if two processes attempt to add a row to the table at the exact same time? Can you ensure that the same value is not generated for both processes?
  • There can be overhead querying the existing data each time you'd like to insert new data
  • Unless this is implemented as a trigger, this means that all inserts to your data must always go through the same stored procedure that calculates these sequences. This means that bulk imports, or moving data from production to testing and so on, might not be possible or might be very inefficient.
  • If it is implemented as a trigger, will it work for a set-based multi-row INSERT statement? If so, how efficient will it be? This function wouldn't work if called for each row in a single set-based INSERT -- each NextCustomerNumber() returned would be the same value.

Overall, if this approach is absolutely required, then it's what you've got to do, but be sure that you consider the next two options first, which are much easier to implement and will generally work in most cases.

Option 2: Make it a presentation issue

The most common rules seem to be the simple ones, such as previous example ("C0000" to "C9999"), or something similar. Looking at this closely, we see that it is really just simply a number from 1-x, formatted with leading zeroes to be 4 digits, and then prefixed with a "C". Which means that all the database needs to do is generate a number from 1-x, which means ... why not just use an identity and let your front-end format the sequence value? Do you really need to store it in the database as a VARCHAR? What does this gain? If you simply use an identity and return an integer to the front-end, it is trivial to format it with a "C" in the front and with the necessary leading zeroes. By doing this, you have all of the advantages of a built-in SQL Server generated identity value with none of the headaches -- no worries about concurrency, performance, set-based triggers, and so on.

So, really carefully think to yourself: "Do I really need to format a simple integer in T-SQL and store that formatted value in my tables? Or can I simply use that integer internally throughout the database and format it any way I want at my presentation layer?"

The advantage of this approach is simplicity, but the disadvantage is that you must rely on your front-end applications and reports to understand how to format your codes.

Option 3: Let an Identity be your guide

If you really need to store the actual, auto-generated sequence in your tables, or if a simple incrementing integer formatted a certain way isn't enough, there is another simple option: Use a regular identity column internally as your table's primary key, but then use some math to calculate your external "auto-generated sequence" in another column using the identity value generated. This column can be stored in your table via a trigger, added as a computed column, or calculated using a View. You can implement it any way you wish.

The advantage of this approach is that we are using the database's built-in feature that guarantees that no two identities in a table will be alike, and that every row will get one; therefore, if we map each integer to a unique value in our sequence, we are guaranteed that all of our sequence values will also be unique. And this will work for set-based operations as well.

Let's start with the simple "C0000" - "C9999" example. First, let's create our Customers table like this:

create   table  Customers
(
    dbID 
int   identity   not   null   primary   key ,
    CustomerName 
varchar ( 100 )
)

Note that the dbID column is standard, database-generated identity which will be our physical primary key of the table. However, we will add a CustomerNumber column which will be what we expose to the outside world in the "C0000" format, as described.

Let's create a function accepts an integer, and uses that integer to return our CustomerNumber:

create   function  CustomerNumber ( @id   int
returns   char ( 5
as  
begin  
return   ' C '   +   right ( ' 0000 '   +   convert ( varchar ( 10 ),  @id ),  4
end

Using that function, we can simply add a computed column to our table like this:

alter   table  Customers  add  CustomerNumber  as  dbo.CustomerNumber(dbID)

Or, we could also create a column in our table to store the Customer Number, and use a trigger to populate it:

alter  Customers  add  CustomerNumber  varchar ( 10
create   trigger  Customers_insert  on  Customers 
after 
insert   as  
update  
    Customers 
set  
    Customers.customerNumber 
=  dbo.CustomerNumber(Customers.dbID) 
from  
    Customers 
inner   join  
    inserted 
on  Customers.dbID =  inserted.dbID

Using either method, once they are in place, we can simply insert into our table, and for each Row added a unique "Customer Number" is assigned:

insert   into  Customers (CustomerName)  values  ( ' jeff '
select   *   from  Customers 
/*
returns: 
(1 row(s) affected) 
dbID        CustomerName CustomerNumber 
----------- ------------ -------------- 
1           jeff         C0001 
(1 row(s) affected)
*/

The key to making this work is the formula that maps a 1:1 relation between integers and your sequence values. That example was very trivial to implement; let's try some more complicated ones. For example, suppose that the CustomerNumber will be in this format:

/**
AAAA 
AAAB 
... 
AAAZ 
AABA 
... 
ZZZZ
*
*/

How can we take an integer and map it to this? In this case, it is the same way you convert digits from decimal to hexadecimal or any other "base" -- we are converting from a base 10 (decimal) to base 26 (A-Z). Since we are working with a 4 "digit" value in base 26, this gives us 26 to the power of 4 different possibilities -- 456,976 different Customer Numbers can generated. (Remember the most important step before writing any code: is this acceptable?)

A simple algorithm to convert our integers to this base 26 number could be like this:

Starting with an integer identity value x:

  1. x mod 26 corresponds to the RIGHTMOST letter (0=A, 25=Z)
  2. x divided by 26 (26 to the power of 1) mod 26 corresponds to the NEXT letter from right to left
  3. x divided by 26*26 (26 to the power of 2) mod 26 corresponds to the NEXT letter, from right to left.
  4. x divided by 26*26*26 (26 to the power of 3) mod 26 corresponds to the LEFTMOST (first) letter.

To convert a number from 0-25 to a letter from A-Z, we add 65 to it and use the CHAR() function, since the ASCII value of "A" is 65 and the rest of the letters all follow in sequence (i.e., "B" is 66, "C" is 67, etc).

Thus, to implement this method, our CustomerNumber function becomes:

create   function  CustomerNumber ( @id   int
returns   char ( 5
as  
begin  
  
return   char ( @id   /   power ( 26 , 3 %   26   +   65 +  
     
char ( @id   /   power ( 26 , 2 %   26   +   65 +  
     
char ( @id   /   26   %   26   +   65 +  
     
char ( @id   %   26   +   65
end

Equally as important as writing the algorithm out on paper is then testing your implementation. We can do some simple testing like this to help us ensure that our function is working:

select  x, dbo.CustomerNumber2(x) 
from  
  ( 
    
select   1   as  x  union   all  
    
select   25   union   all  
    
select   26   union   all  
    
select   27   union   all  
    
select   51   union   all  
    
select   52  
  ) x 
/*
which returns: 


----------- ----- 
1           AAAB 
25          AAAZ 
26          AABA 
27          AABB 
51          AABZ 
52          AACA 

(6 row(s) affected)
*/

And that appears to do what we need. You should of course test the higher values as well.

The key is that we have mapped our integer values using our function to Customer Numbers from AAAA-ZZZZ, all guaranteed to be unique, and we don't need to worry about set-based inserts or lookups on our existing table.

Let's try one more. How about

/*
AA000 
AA001 
... 
AA999 
AB000 
AB001 
... 
AZ999 
BA000 
BA001 
... 
ZZ999 
*/

In this one, the left 2 digits are base 26; the right 3 are base 10. This gives us 26*26*10*10*10 = 676,000 possible values.

We use the same basic algorithm, starting from an integer x and working right to left:

  1. x mod 10 equals that is the rightmost digit
  2. x / 10 mod 10 equals the next digit
  3. x / 10*10 mod 10 equals the next digit
  4. x / 10*10*10 mod 26 equals the next letter
  5. x / 26*10*10*10 mod 26 equals the first letter.

Let's put this logic into a User Defined Function, which makes testing easier:

create   function  CustomerNumber( @i   int
returns   char ( 5
as  
begin  
  
return  ( char ( @i   /   26000   %   26   +   65 +  
    
char ( @i   /   1000   %   26   +   65 +  
    
char ( @i   /   100   %   10   +   48 +  
    
char ( @i   /   10   %   10   +   48 +  
    
char ( @i   %   10   +   48 )) 
  
end

And let's test this function to ensure that it works:

select  x, dbo.customerNumber(x) 
from  
 (
    
select   9   as  x  union   all  
    
select   99   union   all  
    
select   100   union   all  
    
select   999   union   all  
    
select   1000   union   all  
    
select   25999   union   all  
    
select   26000   union   all  
    
select   51999   union   all  
    
select   52000
 ) x 
/*

----------- ----- 
9           AA009 
99          AA099 
100         AA100 
999         AA999 
1000        AB000 
25999       AZ999 
26000       BA000 
51999       BZ999 
52000       CA000 

(9 row(s) affected) 
*/

And, again, that is just a guideline, but you should do as much testing as you can before implementing any of these features to ensure that they accurately do what you need. The key is to identify and test the "boundary" numbers where the values need to reset or change in a more complicated manner than simply incrementing a single digit.

If you need to add dashes, or a constant prefix or suffix or anything else, you can easily do it as well, all in your UDF. The key is to make your logic entirely dependant on an integer value, and to map each integer value to a unique value in your designated sequence. By doing this, you are letting SQL Server do the hard part -- ensure that your keys are consistent and unique -- but now you have the flexibility of creating your "CustomerNumbers" or other values in the exact format that you choose.

Summary

So, if you do find that you need to custom sequences of some sort at the database layer, here's my recommendation:

  1. Be sure that your code generation algorithm is well defined and handles your needs
  2. Be sure that it isn't just a presentation issue
  3. Write a UDF to map integers to your sequence values and let SQL Server generate those integers via an Identity column
  4. Test your UDF on its own thoroughly to ensure that it produces unique values in the format that you want
  5. Decide if a computed column, trigger, or View is the best way to incorporate these codes into your database layer
  6. Use identities internally as physical primary keys to your tables and for relations. (Note that this is optional; you may wish to use your sequence column, but be sure that it is indexed and constrained properly in your database.)

It can sometimes be tricky to come up with an algorithm and/or write the code to convert integers into your format, but if you can do it, I have found that in general it is the best approach to take.

CCF大数据与计算智能大赛-面向电信行业存量用户的智能套餐个性化匹配模型联通赛-复赛第二名-【多分类,embedding】.zip项目工程资源经过严格测试可直接运行成功且功能正常的情况才上传,可轻松复刻,拿到资料包后可轻松复现出一样的项目,本人系统开发经验充足(全领域),有任何使用问题欢迎随时与我联系,我会及时为您解惑,提供帮助。 【资源内容】:包含完整源码+工程文件+说明(如有)等。答辩评审平均分达到96分,放心下载使用!可轻松复现,设计报告也可借鉴此项目,该资源内项目代码都经过测试运行成功,功能ok的情况下才上传的。 【提供帮助】:有任何使用问题欢迎随时与我联系,我会及时解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 下载后请首先打开README文件(如有),项目工程可直接复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值