SQL Server截取字符串和处理中文技巧

原创 2014年04月26日 10:03:32

一 环境介绍

SQL  Server

PRINT @@VERSION
MicrosoftSQLServer2012-11.0.2100.60(X64)
Feb10201219:39:15
Copyright(c)MicrosoftCorporation
EnterpriseEdition:Core-basedLicensing(64-bit)onWindowsNT6.1(Build7601:ServicePack1)

操作系统
------------------
System Information
------------------
Operating System: Windows 7 Ultimate 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130828-1532)
System Model: Aspire E1-471G
Processor: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz (4 CPUs), ~2.6GHz
Memory: 4096MB RAM



二 实现功能

从一大堆有包含中文字符和编号的字符串中过滤出编号。


三 实现模拟


首先,我们准备测试数据,注意,这里的数据全部都是模拟数据,无实际含义。语句如下:
CREATE TABLE #temp
(
   name VARCHAR(80)
);

INSERT INTO #temp
VALUES     ('五道口店3059');

INSERT INTO #temp
VALUES     ('五羊邨店3060');

INSERT INTO #temp
VALUES     ('杨家屯店3061');

INSERT INTO #temp
VALUES     ('十里堤店3062');

INSERT INTO #temp
VALUES     ('中关村店3063');

INSERT INTO #temp
VALUES     ('丽秀店3064');

INSERT INTO #temp
VALUES     ('石门店3065');

INSERT INTO #temp
VALUES     ('黄村店3066');

INSERT INTO #temp
VALUES     ('东圃店3067');

INSERT INTO #temp
VALUES     ('天河店3068');

INSERT INTO #temp
VALUES     ('人民路广场3069');

INSERT INTO #temp
VALUES     ('社区中心3070');

INSERT INTO #temp
VALUES     ('珠海市3071');

INSERT INTO #temp
VALUES     ('丽都3072');

INSERT INTO #temp
VALUES     ('晓月3073');

INSERT INTO #temp
VALUES     ('旧区3074');

INSERT INTO #temp
VALUES     ('新城3075');

INSERT INTO #temp
VALUES     ('水井沟3076'); 


然后,我们观察数据,发现这些数据都有规律,编号是数字,占4个字符。数字前面包含店、场、心、市、都、月、区、城、沟共9个字符。
我们试着采用SQL Server内置的函数Substring、Charindex、Rtrim、Ltrim过滤掉出现次数最多(店)的字符串。
语句如下:

SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS name
INTO   #t1
FROM   #temp 

以下是这几个函数的使用说明:


Substring

Returns the part of a character expression that starts at the specified position and has the specified length. The position parameter and the length parameter must evaluate to integers.


Syntax

SUBSTRING(character_expression, position, length)


Arguments

character_expression

Is a character expression from which to extract characters.

position

Is an integer that specifies where the substring begins.

length

Is an integer that specifies the length of the substring as number of characters.


Result Types

DT_WSTR


Charindex
Searches an expression for another expression and returns its starting position if found.

Syntax

CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] ) 


Arguments
expressionToFind
Is a character expression that contains the sequence to be found. expressionToFind is limited to 8000 characters.
expressionToSearch
Is a character expression to be searched.
start_location
Is an integer or bigint expression at which the search starts. If start_location is not specified, is a negative number, or is 0, the search starts at the beginning of expressionToSearch.

Return Types
bigint if expressionToSearch is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, int.

Rtrim
Returns a character expression after removing trailing spaces.

RTRIM does not remove white space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET RTrim method in a script run from the Script component.

Syntax
RTRIM(character expression)
              
Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR

Ltrim
Returns a character expression after removing leading spaces.

LTRIM does not remove white-space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET LTrim method in a script run from the Script component.

Syntax
LTRIM(character expression)
              
Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR


好了,我们查看处理完后的结果,可以看到包含店的字符串已经全部过滤出编号。
SELECT * FROM #t1


3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
人民路广场3069
社区中心3070
珠海市3071
丽都3072
晓月3073
旧区3074
新城3075
水井沟3076


接着我们依次处理包含场、心、市、都、月、区、城、沟的字符串,语句和处理结果如下:
SELECT *
FROM   #t1
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


人民路广场3069
社区中心3070
珠海市3071
丽都3072
晓月3073
旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('场', name) + 1, Len(name)))) AS name
INTO   #t2
FROM   #t1


SELECT *
FROM   #t2
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


社区中心3070
珠海市3071
丽都3072
晓月3073
旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('心', name) + 1, Len(name)))) AS name
INTO   #t3
FROM   #t2


SELECT *
FROM   #t3
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


珠海市3071
丽都3072
晓月3073
旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('市', name) + 1, Len(name)))) AS name
INTO   #t4
FROM   #t3


SELECT *
FROM   #t4
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


丽都3072
晓月3073
旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('都', name) + 1, Len(name)))) AS name
INTO   #t5
FROM   #t4


SELECT *
FROM   #t5
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


晓月3073
旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('月', name) + 1, Len(name)))) AS name
INTO   #t6
FROM   #t5


SELECT *
FROM   #t6
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


旧区3074
新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('区', name) + 1, Len(name)))) AS name
INTO   #t7
FROM   #t6


SELECT *
FROM   #t7
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


新城3075
水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('城', name) + 1, Len(name)))) AS name
INTO   #t8
FROM   #t7


SELECT *
FROM   #t8
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


水井沟3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('沟', name) + 1, Len(name)))) AS name
INTO   #t9
FROM   #t8


SELECT *
FROM   #t9
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


--无记录


这是最终的处理结果,过滤出编号后,我就可以利用这些编号和数据库表进行关联,获得想要的数据。
SELECT *
INTO   #result
FROM   #t9


SELECT *
FROM   #result 


name
3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
3069
3070
3071
3072
3073
3074
3075
3076


SELECT s.xxx,
       s.xxx
FROM   xx s
       JOIN #result r
         ON s.xxx = r.name
WHERE  s.xxx = 0; 


四 总结

本文过滤编号实际上核心代码就两个,第一个是利用SQL Server的内置函数过滤出指定编号,语句如下:
SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS name
INTO   #t1
FROM   #temp 


第二个是判断是否包含中文,语句如下:
SELECT *
FROM   #t1
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


在工作中,发现和总结这些小技巧会让你的工作事半功倍。

Good Luck!


版权声明:本文为博主原创文章,未经博主允许不得转载。 举报

相关文章推荐

sql server 提取汉字/数字/字母的方法

sql server 提取汉字/数字/字母的方法--提取数字IF OBJECT_ID('DBO.GET_NUMBER2') IS NOT NULLDROP FUNCTION DBO.GET_NUMBE...

SQL Server中截取字符串常用函数

SQL Server 中截取字符串常用的函数: 1.LEFT ( character_expression , integer_expression ) 函数说明:LEFT ( '源字符串' , '...

精选:深入理解 Docker 内部原理及网络配置

网络绝对是任何系统的核心,对于容器而言也是如此。Docker 作为目前最火的轻量级容器技术,有很多令人称道的功能,如 Docker 的镜像管理。然而,Docker的网络一直以来都比较薄弱,所以我们有必要深入了解Docker的网络知识,以满足更高的网络需求。

SQL截取字符串(substring与patindex的使用)

SQL截取字符串(substring与patindex的使用) 首先学习两个函数 1.substring  返回字符、binary、text 或 image 表达式的一部分。  ...

SQL 截取字符串,中文算两个字符的方法

--SQL 截取字符串方法,中文算两个字符的方法。--方法跟水哥学习的,支持一下水哥的博客--http://hi.csdn.net/link.php?url=http://blog.csdn.net%...
  • Beirut
  • Beirut
  • 2009-11-17 16:05
  • 3483

用SQL 截取汉字英文混合的字符串

做网站的时候经常会出现因为输入的内容太多产生换行导致页面排版出现问题,这时候做的最多的就是对输出的内容进行切割,只输出前面部分内容,后面部分用“……”代替。传统的方法多数都是先从数据库里取出对应的数据...

sqlserver提取中文_提取英文_提取数字(函数)

--【提取中文字符】 IF OBJECT_ID('dbo.fun_getCN') IS NOT NULL DROP FUNCTION dbo.fun_getCN GO create function...

SQLServer中文处理

首先插句题外话:创建一个自然数表Nums。这是《SQL Server 2005技术内幕:T-SQL查询》一书的建议。在SQL Server 2005中,可以借用ROW_NUMBER排名函数轻松生成我们...

SQLServer中使用SUBSTRING截取字符串

SUBSTRING  返回字符、binary、text      或      image      表达式的一部分。有关可与该函数一起使...

SQL截取中文字符

--截取所有中文字符,原理是中文字符字节长度为2,核心判断语句引自@fcuandy CREATE FUNCTION chstr(@s NVARCHAR(255)) RETURNs NVARCHAR(...
返回顶部
收藏助手
不良信息举报
您举报文章:深度学习:神经网络中的前向传播和反向传播算法推导
举报原因:
原因补充:

(最多只允许输入30个字)