string


单引号两种转义
select 'Aren''t you glad you''re learning PL/SQL?'
from dual

select q'!Aren't you glad you're learning PL/SQL?!'
from dual--! @ 等


select q'!Aren''t you glad you're learning PL/SQL?!'
from dual --原样输出


& 的麻烦

solution1: SET DEFINE OFF to disable SQL*Plus' variable substitution feature



Using Nonprintable Characters
--use chr(10) to start a new line

The location in the standard ASCII collating sequence for the linefeed character is 10, so you can code:

    SELECT name || chr(10)
           || address1 || chr(10)
           || address2 || chr(10)
           || address3 || chr(10)
           || address4 || chr(10)
           || city || ', ' || state || ' ' || zipcode
           AS company_address
    FROM company;

Traditional Searching
DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';
       comma_location NUMBER := 0;
    BEGIN
       LOOP
          comma_location := INSTR(names,',',comma_location+1);
          EXIT WHEN comma_location = 0;
          DBMS_OUTPUT.PUT_LINE(comma_location);
       END LOOP;
    END;

--use replace to avoid loop

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(
          REPLACE(names, ',', chr(10))
       );
    END;

--trim
    DECLARE
       x VARCHAR2(30) := '.....Hi there!.....';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   TRIM(LEADING '.' FROM x)   );
       DBMS_OUTPUT.PUT_LINE(   TRIM(TRAILING '.' FROM x)   );
       DBMS_OUTPUT.PUT_LINE(   TRIM(BOTH '.' FROM x)   );

       --The default is to trim from both sides
       DBMS_OUTPUT.PUT_LINE(   TRIM('.' FROM x)   );

       --The default trim character is the space:
       DBMS_OUTPUT.PUT_LINE(   TRIM(x)   );
    END;


The output is:

    Hi there!.....
    .....Hi there!
    Hi there!
    Hi there!
    .....Hi there!.....

TRIM不能指定多个符号,只有RTRIM和LTRIM可以
RTRIM(LTRIM(x,',.;'),',.;')



Mixing CHAR and VARCHAR2 Values
When you SELECT or FETCH data from a CHAR database column into a VARCHAR2 variable, the trailing spaces are retained. If you

SELECT or FETCH from a VARCHAR2 database column into a CHAR variable, PL/SQL automatically pads the value with spaces out to

the maximum length. In other words, the type of the variable, not the column, determines the variable's resulting value.


8.2. Working with Strings
Working with strings is largely a matter of manipulating those strings using Oracle's rich library of built-in string

functions. To that end, we recommend that you become broadly familiar with the functions Oracle has to offer. In the

subsections that follow, we'll begin by showing you how to write string constants, and then we'll introduce you to the

string manipulation functions that we've come to find most important in our own work.

8.2.1. Specifying String Constants
One way to get strings into your PL/SQL programs is to issue a SELECT statement that brings back character string values.

Another way is to place string constants directly into your code. You write such constants by enclosing them within single

quotes:

    'Brighten the corner where you are.'



If you want to embed a single quote within a string constant, you can do so by typing the single quote twice:

    'Aren''t you glad you''re learning PL/SQL?'



Another way to embed single quotes within string constants is to specify your own delimiter. Do this using the q prefix

(uppercase Q may also be specified). For example:

    q'!Aren't you glad you're learning PL/SQL?!'



or:

    q'{Aren't you glad you're learning PL/SQL?}'



When you use the q prefix, you still must enclose the entire string within single quotes. The character immediately

following the first quotation markan exclamation-point (!) in the first of our two examplesthen becomes the true quote

character for the string. Thus, the first of our q-prefixed strings consists of all characters between the two exclamation

points.

 Special rule: If your quote character is one of [, {, , or )

respectively.
 
 




Normally, string constants are represented using the database character set. If such a string constant is assigned to an

NCHAR or NVARCHAR2 variable, the constant will be implicitly converted to the national character set (see Chapter 24).

Oracle performs such conversions when necessary, and you rarely need to worry about them. Occasionally, however, you may

find yourself faced with the need to explicitly specify a string constant to be represented in the national character set.

You can do so using the n prefix:

    n'This string will be represented using the national character set.'



If you need a string in the national character set, and you also want to specify some characters by their Unicode code

point, you can use the u prefix:

    u'Thi\0073 \0073tring will be repre\0073ented u\0073ing ...'



0073 is the code point for the letter "s," and all references to "s" are via the escape sequence \0073. The resulting string

constant is the same as for the preceding n-prefixed example.

Using the assignment operator, you can store the value of a string constant within a variable:

    DECLARE
       jonathans_motto VARCHAR2(50);
    BEGIN
       jonathans_motto := 'Brighten the corner, where you are.';
    END;


You can also pass string constants to built-in functions. For example, to find out the number of characters in Jonathan's

motto, you can use the LENGTH function:

    BEGIN
       DBMS_OUTPUT.PUT_LINE(
          LENGTH('Brighten the corner, where you are.')
       );
    END;


Run this code, and you'll find that the number of characters is 35.

While not strictly a PL/SQL issue, you'll often find that ampersand (&) characters cause problems if you're executing PL/SQL

code via SQL*Plus. SQL*Plus uses ampersands to prefix SQL*Plus variables. When an ampersand is encountered, SQL*Plus "sees"

the next word as a variable and prompts you to supply a value:

    SQL> BEGIN
      2     DBMS_OUTPUT.PUT_LINE ('Generating & saving test data.');
      3  END;
      4  /
    Enter value for saving:


There are several solutions to this problem. One that works well is to issue the SQL*Plus command SET DEFINE OFF to disable

SQL*Plus' variable substitution feature. Other solutions can be found in Jonathan Gennick's book Oracle SQL*Plus: The

Definitive Guide (O'Reilly).

8.2.2. Using Nonprintable Characters
The built-in CHR function is especially valuable when you need to make reference to a nonprintable character in your code.

Suppose you have to build a report that displays the address of a company. A company can have up to four address strings (in

addition to city, state, and zip code). Your boss wants each address string on a new line. You can do that by concatenating

all the address lines together into one, long text value, and using CHR to insert linefeeds where desired. The location in

the standard ASCII collating sequence for the linefeed character is 10, so you can code:

    SELECT name || chr(10)
           || address1 || chr(10)
           || address2 || chr(10)
           || address3 || chr(10)
           || address4 || chr(10)
           || city || ', ' || state || ' ' || zipcode
           AS company_address
    FROM company;


And the results will end up looking like:

    COMPANY_ADDRESS
    --------------------
    Harold Henderson
    22 BUNKER COURT
    SUITE 100


    WYANDANCH, MN 66557


 Linefeed is the newline character for Linux and Unix systems. In other environments, you may need to use some other

character.
 
 




What? You say your boss doesn't want to see any blank lines? No problem. You can eliminate those with a bit of cleverness

involving the NVL2 function :

    SELECT name
           || NVL2(address1, chr(10) || address1, '')
           || NVL2(address2, chr(10) || address2, '')
           || NVL2(address3, chr(10) || address3, '')
           || NVL2(address4, chr(10) || address4, '')
           || chr(10) || city || ', ' || state || ' ' || zipcode
           AS company_address
    FROM company;


Now the query returns a single formatted column per company. The NVL2 function returns the third argument when the first is

NULL, and otherwise returns the second argument. In this example, when address1 is NULL, the empty string ('') is returned,

and likewise for the other address columns. In this way, blank address lines are ignored, and the address will be scrunched

down to:

    COMPANY_ADDRESS
    --------------------
    Harold Henderson
    22 BUNKER COURT
    SUITE 100
    WYANDANCH, MN 66557


A related function is the ASCII function , which, in essence, does the reverse of CHR: it returns the location of a given

character in the database character set. For example, execute the following code to display the character code for the

letter 'J':

    BEGIN
       DBMS_OUTPUT.PUT_LINE(ASCII('J'));
    END;


And you'll find that, in UTF-8 at least, the underlying representation of 'J' is the value 74.

 Watch for an interesting use of CHR later in the section "Traditional Searching, Extracting, and Replacing."
 
 




8.2.3. Concatenating Strings
CONCAT takes wo arguments as follows:

    CONCAT (string1, string2)



CONCAT always appends string2 to the end of string1. If either string is NULL, CONCAT returns the non-NULL argument all by

its lonesome. If both strings are NULL, CONCAT returns NULL. Here are some examples of uses of CONCAT (where > means that

the function returns the value shown):

    CONCAT ('abc', 'defg') --&gt 'abcdefg'
    CONCAT (NULL, 'def') --&gt 'def'
    CONCAT ('ab', NULL) --&gt 'ab'
    CONCAT (NULL, NULL) --&gt NULL


I have a confession to make about CONCAT: I have never used it in all my years of PL/SQL coding. In fact, I never even

noticed it was available until I did the research for this book. How can this be? I certainly have performed many acts of

concatenation in my time. However, PL/SQL (and the Oracle RDBMS) offers a second concatenation operatorthe double vertical

bars (||). For example:

    DECLARE
       x VARCHAR2(100);
    BEGIN
       x := 'abc' || 'def' || 'ghi';
       DBMS_OUTPUT.PUT_LINE(x);
    END;


The output is:

    abcdefghi



To perform. the identical concatenation using CONCAT, we'd need to nest one call to CONCAT inside another:

    x := CONCAT(CONCAT('abc','def'),'ghi');



You can see that the || operator not only is much easier to use than CONCAT, but also results in much more readable code.

8.2.4. Dealing with Case
Letter case is often an issue when working with strings . For example, you might want to compare two strings regardless of

case. There are different approaches you can take to dealing with this problem depending partly on the release of Oracle you

are running and partly on the scope that you want your actions to have.

8.2.4.1 Forcing a string to all upper- or lowercase
One way to deal with case issues is to use the built-in UPPER and LOWER functions. These are available in all current

versions of Oracle, and they let you upper- or lowercase a string for just a single comparison. For example:

    DECLARE
       name1 VARCHAR2(30) := 'Andrew Sears';
       name2 VARCHAR2(30) := 'ANDREW SEARS';
    BEGIN
       IF LOWER(name1) = LOWER(name2) THEN
           dbms_output.put_line('The names are the same.');
       END IF;
    END;


In this example, both strings are passed through LOWER so the comparison ends up being between 'andrew sears' and 'andrew

sears'.

8.2.4.2 Making comparisons case-insensitive
Oracle Database 10g Release 2 provides a mechanism to render all string operations case-insensitive within a given session.

This approach involves setting two parameters: NLS_COMP and NLS_SORT. To begin, here's a simple, SQL-based example that

illustrates the kind of problem you can now solve easily using NLS_COMP. The problem is to take a list of names and

determine which should come first:

    SELECT LEAST ('Jonathan','andrew') from dual;



On our system, which uses UTF-8 for the character set, the call to LEAST that you see here returns 'Jonathan'. That's

because the character code for an uppercase 'J' precedes that for a lowercase 'a'. By default, NLS_COMP is set to BINARY,

meaning that string comparisons performed by functions such as LEAST are based on the underlying character code values.

Leaving aside for a moment the fact that 'andrew' should probably capitalize the first letter in his name, you might like to

see LEAST ignore case and return 'andrew' instead of 'Jonathan'. To that end, you can change NLS_COMP to specify that a

linguistic sort (sensitive to the language settings) be performed:

    ALTER SESSION SET NLS_COMP=LINGUISTIC;



Next, you must change NLS_SORT to specify the sorting rules that you want. The default NLS_SORT value is often BINARY, but

it may be otherwise depending on how your system is configured. For this example, use the sort XWEST_EUROPEAN_CI. The _CI

suffix specifies a case-insensitive sort:

    ALTER SESSION SET NLS_SORT=XWEST_EUROPEAN_CI;



Now, try that call to LEAST one more time:

    SELECT LEAST ('Jonathan','andrew') from dual;



This time, the result is 'andrew'. This seems like a simple exercise, but this result is not so easy to achieve without the

linguistic sorting we've just described.

And it's not just functions that are affected by linguistic sorting. Simple string comparisons are affected as well. For

example:

    BEGIN
       IF 'Andrew' = 'ANDREW' THEN
          DBMS_OUTPUT.PUT_LINE('It is true!');
      END IF;
    END;


With NLS_COMP and NLS_SORT set as we've described, the expression 'Andrew' = 'ANDREW' in this example evaluates to TRUE.

 NLS_COMP and NLS_SORT settings affect all string manipulation that you do. The settings "stick" until you change them, or

until you terminate your session.
 
 




Oracle also supports accent-insensitive sorting, which you can get by appending _AI (rather than _CI) to a sort name. To

find a complete list of linguistic sort names , refer to the Oracle Database Globalization Support Guide. That guide also

explains the operation of NLS_COMP and NLS_SORT in detail. Also refer to Chapter 24 of this book, which presents more

information on the various NLS parameters at your disposal.

8.2.4.3 Capitalizing each word in a string
A third case-related function, after UPPER and LOWER, is INITCAP. This function forces the initial letter of each word in a

string to uppercase, and all remaining letters to lowercase. For example:

    DECLARE
       name VARCHAR2(30) := 'MATT williams';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(INITCAP(name));
    END;


The output will be:

    Matt Williams



It's wonderfully tempting to use INITCAP to properly format names, and all will be fine until you run into a case like:

    DECLARE
       name VARCHAR2(30) := 'JOE mcwilliams';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(INITCAP(name));
    END;


which generates this output:

    Joe Mcwilliams



Joe McWilliams may not be so happy to see his last name written as "Mcwilliams," with a lowercase "w." INITCAP is handy at

times but do remember that it doesn't yield correct results for words or names having more than just an initial, capital

letter.

8.2.5. Traditional Searching, Extracting, and Replacing
Frequently, you'll find yourself wanting to search a string for a bit of text. One approach is to use the INSTR function ,

which returns the character position of a substring within a larger string. The following code finds the locations of all

the commas in a list of names:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';
       comma_location NUMBER := 0;
    BEGIN
       LOOP
          comma_location := INSTR(names,',',comma_location+1);
          EXIT WHEN comma_location = 0;
          DBMS_OUTPUT.PUT_LINE(comma_location);
       END LOOP;
    END;


The output is:

    5
    10
    14
    21
    28
    34


The first argument to INSTR is the string to search. The second is the substring to look for, in this case a comma. The

third argument specifies the character position at which to begin looking. After each comma is found, the loop begins

looking again one character further down the string. When no match is found, INSTR returns zero, and the loop ends.

Having found the location of some text in a string, a natural next step is to extract it. We don't care about those commas.

Let's extract the names instead. For that, we'll use the SUBSTR function :

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';
       names_adjusted VARCHAR2(61);
       comma_location NUMBER := 0;
       prev_location NUMBER := 0;
    BEGIN
       --Stick a comma after the final name
       names_adjusted := names || ',';
       LOOP
          comma_location := INSTR(names_adjusted,',',comma_location+1);
          EXIT WHEN comma_location = 0;
          DBMS_OUTPUT.PUT_LINE(
             SUBSTR(names_adjusted,
                    prev_location+1,
                    comma_location-prev_location-1));
          prev_location := comma_location;
       END LOOP;
    END;


The list of names that we get is:

    Anna
    Matt
    Joe
    Nathan
    Andrew
    Aaron
    Jeff


The keys to the preceding bit of code are two-fold. First, a comma is appended to the end of the string to make the loop's

logic easier to write. Every name in names_adjusted is followed by a comma. That simplifies life. Then, each time the loop

iterates to DBMS_OUTPUT.PUT_LINE, the two variables named prev_location and comma_location point to the character positions

on either side of the name to print. It's then just a matter of some simple math and the SUBSTR function. Three arguments

are passed:



names_adjusted

The string from which to extract a name.



prev_location+1

The character position of the first letter in the name. Remember that prev_location will point to just before the name to

display, usually to a comma preceding the name. That's why I add 1 to the value.



comma_location-prev_location-1

The number of characters to extract. I subtract the extra 1 to avoid displaying the trailing comma.

All this searching and extracting is fairly tedious. I'd sure like to display that list of names without writing so much

code in order to do it. Let's try the REPLACE function . Let's replace those commas with newlines:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(
          REPLACE(names, ',', chr(10))
       );
    END;


And the output is (!):

    Anna
    Matt
    Joe
    Nathan
    Andrew
    Aaron
    Jeff


Well, that was certainly fun. What I did here was to avoid all that looping by simply replacing all commas with newlines. Of

course, you won't always be able to avoid loop processing by using REPLACE, but it's good to know about the possibilities!

8.2.6. Padding
Occasionally it's helpful to force strings to be a certain size. You can use LPAD and RPAD to add spaces (or some other

character) to either end of a string in order to make the string a specific length. The following example uses the two

functions to display a list of names two-up in a column, with the leftmost name being flush-left and the rightmost name

appearing flush right:

    DECLARE
       a VARCHAR2(30) := 'Jeff';
       b VARCHAR2(30) := 'Eric';
       c VARCHAR2(30) := 'Andrew';
       d VARCHAR2(30) := 'Aaron';
       e VARCHAR2(30) := 'Matt';
       f VARCHAR2(30) := 'Joe';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   RPAD(a,10) || LPAD(b,10)   );



Negative String Positioning
Some of Oracle's built-in string functions, notably SUBSTR and INSTR, allow you to determine the position from which to

begin extracting or searching by counting backwards from the right end of a string. For example, to extract the final 10

characters of a string:

    SUBSTR('Brighten the corner where you are',-10)



This function call returns "re you are". The key is the use of a -10 to specify the starting position. By making the

starting position negative, you instruct SUBSTR to count backwards from the end of the string.

INSTR adds an interesting twist to all of this. Specify a negative starting index, and INSTR will:

Count back from the end of the string to determine from whence to begin searching,

Then search backwards from that point towards the beginning of the string.

Step 1 is the same as for SUBSTR, but Step 2 proceeds in quite the opposite direction. For example, to find the occurrence

of "re" that is second from the end:

    INSTR('Brighten the corner where you are','re',-1,2)



The result is 24. The fourth parameter, a 2, requests the second occurrence of "re" The third parameter is -1, so the search

begins at the last character of the string (first character prior to the closing quote). The search progresses backwards

towards the beginning, past the "re" at the end of "are" (the first occurrence) until reaching the occurrence of "re" at the

end of "where".

There is one, subtle case in which INSTR with a negative position will search forward. Here's an example:

    INSTR('Brighten the corner where you are','re',-2,1)



The -2 starting position means that the search begins with the "r" in "are". The result is 32. Beginning from the "r" in

"are", INSTR looks forward to see whether it is pointing at an occurrence of "re". And it is, so INSTR returns the its

current position in the string, which happens to be the 32nd character. Thus, the "re" in "are" is found even though it

extends past the point at which INSTR began searching.
 




       DBMS_OUTPUT.PUT_LINE(   RPAD(c,10) || LPAD(d,10)   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(e,10) || LPAD(f,10)   );
    END;


The output is:

    Jeff            Eric
    Andrew         Aaron
    Matt             Joe


The default padding character is the space. If you like, you can specify a fill character as third argument. Change the

lines of code to read:

       DBMS_OUTPUT.PUT_LINE(   RPAD(a,10,'.') || LPAD(b,10,'.')   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(c,10,'.') || LPAD(d,10,'.')   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(e,10,'.') || LPAD(f,10,'.')   );


And the output changes to:

    Jeff............Eric
    Andrew.........Aaron
    Matt.............Joe


Your fill "character" can even be a string of characters:

       DBMS_OUTPUT.PUT_LINE(   RPAD(a,10,'-~-') || LPAD(b,10,'-~-')   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(c,10,'-~-') || LPAD(d,10,'-~-')   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(e,10,'-~-') || LPAD(f,10,'-~-')   );


And now the output looks like:

    Jeff-~--~--~--~-Eric
    Andrew-~---~--~Aaron
    Matt-~--~--~--~--Joe


Fill characters, or strings are laid down from left to right, always, even when RPAD is used. You can see that that's the

case if you study carefully the 10-character "column" containing Joe's name.

One possible problem to think about when using LPAD and RPAD is the possibility that some of your input strings may already

be longer than (or equal to) the width that you desire. For example, change the column width to four characters:

       DBMS_OUTPUT.PUT_LINE(   RPAD(a,4) || LPAD(b,4)   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(c,4) || LPAD(d,4)   );
       DBMS_OUTPUT.PUT_LINE(   RPAD(e,4) || LPAD(f,4)   );


Now the output looks like:

    JeffEric
    AndrAaro
    Matt Joe


Notice particularly the second row: both "Andrew" and "Aaron" were truncated to just four characters.

8.2.7. Trimming
What LPAD and RPAD giveth, LTRIM and RTRIM taketh away. For example:

    DECLARE
       a VARCHAR2(40) := 'This sentence has too many periods......';
       b VARCHAR2(40) := 'The number 1';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   RTRIM(a,'.')   );
       DBMS_OUTPUT.PUT_LINE(
          LTRIM(b, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz')
       );
    END;


And the output is:

    This sentence has too many periods
    1


As you can see, RTRIM removed all the periods. The second argument to that function was a period and specifies the

character(s) to trim. Our use of LTRIM is a bit absurd, but it demonstrates that you can specify an entire set of characters

to trim. I asked that all letters and spaces be trimmed from the beginning of the string b, and I got what I asked for.

The default is to trim spaces from the beginning or end of the string. Specifying RTRIM(a) is the same as asking for RTRIM

(a,' '). The same goes for LTRIM(a) and LTRIM(a,' ');.

There's one other trimming function you should know about, and it's called just plain TRIM. Oracle added TRIM when Oracle8i

Database was released in order to make the RDBMS more compliant with the ISO SQL standard. TRIM works a bit differently from

LTRIM and RTRIM, as you can see:

    DECLARE
       x VARCHAR2(30) := '.....Hi there!.....';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   TRIM(LEADING '.' FROM x)   );
       DBMS_OUTPUT.PUT_LINE(   TRIM(TRAILING '.' FROM x)   );
       DBMS_OUTPUT.PUT_LINE(   TRIM(BOTH '.' FROM x)   );

       --The default is to trim from both sides
       DBMS_OUTPUT.PUT_LINE(   TRIM('.' FROM x)   );

       --The default trim character is the space:
       DBMS_OUTPUT.PUT_LINE(   TRIM(x)   );
    END;


The output is:

    Hi there!.....
    .....Hi there!
    Hi there!
    Hi there!
    .....Hi there!.....


One function, yet you can trim from either side, or from both sides. However, you can specify only a single character to

remove. You cannot, for example, write:

    TRIM(BOTH ',.;' FROM x)



Instead, to solve this particular problem, you can use a combination of RTRIM and LTRIM:

    RTRIM(LTRIM(x,',.;'),',.;')



If you want to trim a set of characters, your options are RTRIM and LTRIM.

8.2.8. Regular Expression Searching, Extracting, and Replacing
Oracle Database 10g brought the most exciting change to string-manipulation in Oracle that we can remember: support for

regular expressions . And we're not talking the mundane, regular expression support involving the LIKE predicate that you

find in other database management systems. Oracle has given us a well-thought-out and powerful feature setjust what PL/SQL

needed.

Regular expressions form. a sort of pattern language for describing and manipulating text. Those of you familiar with Perl

doubtless know a bit about the topic already, as Perl has done more to spread the use of regular expressions than perhaps

any other language. Regular expression support in Oracle Database 10g Release 1 follows closely the Portable Operating

System Interface (POSIX) regular expression standard . Release 2 adds support for many nonstandard, but quite useful

operators from the world of Perl.

8.2.8.1 Detecting a pattern
Regular expressions give you a pattern language you can use to describe text that you want to find and manipulate. To

illustrate, let's revisit the example we used throughout the earlier section on "Traditional Searching, Extracting, and

Replacing":

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Aaron,Jeff';


Let's assign ourselves the task of determining programmatically whether names represents a list of comma-delimited elements.

We can do that using the REGEXP_LIKE function , which detects the presence of a pattern in a string:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
       names_adjusted VARCHAR2(61);
       comma_delimited BOOLEAN;
    BEGIN
       --Look for the pattern
       comma_delimited := REGEXP_LIKE(names,'^([a-z A-Z]*,)+([a-z A-Z]*){1}$');

       --Display the result
       DBMS_OUTPUT.PUT_LINE(
          CASE comma_delimited
             WHEN true THEN 'We have a delimited list!'
             ELSE 'The pattern does not match.'
          END);
    END;


The result is:

    We have a delimited list!



To understand what's going on here, you must begin with the expression defining the pattern you seek. That expression is the

second argument to REGEXP_LIKE. A recap of our thought process as we put it together is next.



[a-z A-Z]

Each entry in our list of names must consist of only letters and spaces. Square-brackets define a set of characters on which

to match. We use a-z to gives us all lowercase letters, and we use A-Z to give all uppercase letters. The space sits in

between those two parts of the expression.



[a-z A-Z]*

The asterisk is a quantifier, specifying that we want to see zero or more characters in each list item.



[a-z A-Z]*,

Each list item must terminate with a comma. An exception is the final item, but we can safely ignore that nuance for now.



([a-z A-Z]*,)

We use parentheses to define a subexpression that matches some number of characters terminated by a comma. We define this

subexpression because we want to specify that the entire thing repeats.



([a-z A-Z]*,)+

The plus sign is another quantifier, and applies to the preceding element, which happens to be the subexpression. In

contrast to the *, the + requires "one or more." A comma-delimited list consists of one or more of our subexpressions.



([a-z A-Z]*,)+([a-z A-Z]*)

We add another subexpression: ([a-z A-Z]*). This is almost a duplicate of the first, but it doesn't include the comma. The

final list item is not terminated by a comma.



([a-z A-Z]*,)+([a-z A-Z]*){1}

We add the quantifier {1} to allow for exactly one list element with no trailing comma.



^([a-z A-Z]*,)+([a-z A-Z]*){1}$

Finally, we use ^ and $ to anchor our expression to the beginning and end, respectively of the target string. We do this to

require that the entire string, rather than some subset of the string, match our pattern.

Using REGEXP_LIKE, we examine the names string to see whether it matches the pattern. And it does:

    We have a delimited list!



REGEXP_LIKE is a Boolean function. It's optimized to detect the mere presence of a pattern within a string. Other functions

let you do even more. Keep reading!

8.2.8.2 Locating a pattern
You can use REGEXP_INSTR to locate occurrences of a pattern within a string. For example, to find the first occurrence of a

name beginning with the letter A and ending with a consonant:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
       names_adjusted VARCHAR2(61);
       comma_delimited BOOLEAN;
       j_location NUMBER;
    BEGIN
       --Look for the pattern
       comma_delimited := REGEXP_LIKE(names,'^([a-z A-Z]*,)+([a-z A-Z]*){1}$');

       --Only do more if we do, in fact, have a comma-delimited list.
       IF comma_delimited THEN
          j_location := REGEXP_INSTR(names, 'A[^,]*[^aeiou],|A[^,]*[^aeiou]$');
          DBMS_OUTPUT.PUT_LINE(j_location);
       END IF;
    END;


Execute this code and you'll find that the first A name ending with a consonant, which happens to be Andrew, begins at

position 22. Here's how we worked out the pattern:



A

We begin with the letter A. No need to worry about commas, because we already know at this point that we are working with a

delimited list.



A[^,]*

We follow that A with some number of characters that are not commas. The caret in [^,] creates a matching set that will

match anything but a comma. The * allows for zero or more such noncomma characters following the A.



A[^,]*[^aeiou]

We add [^aeiou] because we want our name to end with anything but a vowel. Because we specify no quantifier, exactly one

such nonvowel is required.



A[^,]*[^aeiou],

We require a comma to end the pattern. Otherwise, we'd have a match on the "An" of "Anna." While adding the comma solves

that problem, it introduces another, because our pattern now will never match Aaron at the end of the string. Uh, oh...

 Writing regular expressions is not easy! As a beginner, you'll discover subtleties to regular expression evaluation that

will trip you up. We spent quite a bit of time working out just this one example, and we went down several dead-end paths

before getting it right. Don't despair though. Writing regular expressions does become easier with practice.
 
 






A[^,]*[^aeiou],|A[^,]*[^aeiou]$

Here we've introduced a vertical-bar (|) into the mix. The | indicates alternation: we are now looking for one of two

patterns. The first pattern ends with a comma, whereas the second does not. The second pattern accommodates the possibility

that the name we're looking for is the final name in the list. It (the pattern) is thus anchored to the end of the string by

the dollar sign ($).

While REGEXP_INSTR has its uses, we are often more interested in returning the text matching a pattern than we are in simply

locating it.

8.2.8.3 Extracting text matching a pattern
Knowing that we have names beginning with A and ending with consonants isn't enough. Let's use REGEXP_SUBSTR to extract

those names:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
       names_adjusted VARCHAR2(61);
       comma_delimited BOOLEAN;
       extracted_name VARCHAR2(60);
       name_counter NUMBER;
    BEGIN
       --Look for the pattern
       comma_delimited := REGEXP_LIKE(names,'^([a-z A-Z]*,)+([a-z A-Z]*){1}$');

       --Only do more if we do, in fact, have a comma-delimited list.
       IF comma_delimited THEN
          --Begin by extracting the very first, matching name.
          name_counter := 1;

          --Extract and display all matching names
          LOOP
             extracted_name :=
                REGEXP_SUBSTR(names,
                              'A[^,]*[^aeiou],|A[^,]*[^aeiou]$',
                              1, name_counter);
             EXIT WHEN extracted_name IS NULL;
             DBMS_OUTPUT.PUT_LINE(extracted_name);
             name_counter := name_counter + 1;
          END LOOP;
       END IF;
    END;


The four arguments we've passed to REGEXP_SUBSTR are:



names

We are extracting from the names string.



'A[^,]*[^aeiou],|A[^,]*[^aeiou]$'

This is the pattern we are searching for. The preceding section on "Locating a pattern" describes it in detail.



1

We begin our search always from the first character position of names.



name_counter

We increment this name counter, asking for the first, second, third, and subsequent occurrences of our pattern.

REGEXP_SUBSTR returns NULL when no occurrences remain, and that's how we know when to quit.

Here's the output from the preceding code block:

    Andrew,
    Aaron


Uh, oh! What's that comma doing after Andrew's name? It's there to illustrate one of the pitfalls of working with patterns.

Sometimes you are forced into writing a pattern that includes characters that you do not want in the text that you extract.

That's our problem here. We cannot avoid the trailing comma in the first of our two alternations. Yet we do not want that

comma in our output. One solution is to invoke REGEXP_SUBSTR twice:

    extracted_name :=
       REGEXP_SUBSTR(
         REGEXP_SUBSTR(names,
                        'A[^,]*[^aeiou],|A[^,]*[^aeiou]$',
                        1, name_counter),
          'A[^,]*[^aeiou,]');


Look carefully at what we've done here. The innermost call to REGEXP_SUBSTR will extract, potentially, strings such as

"Andrew," that have trailing commas. However, the outermost call to REGEXP_SUBSTR looks for a different pattern: the set

[^aeiou,] matches all characters up to, but not including, any trailing comma.

 Yet another way to eliminate that pesky, trailing comma, is to use the RTRIM function :

    extracted_name :=
       RTRIM(
          REGEXP_SUBSTR(names,
             'A[^,]*[^aeiou],|A[^,]*[^aeiou]$',
              1, name_counter),
       ',');


We'd expect RTRIM to be the more efficient approach, in this particular case. Our double-nested REGEXP_SUBSTR approach comes

in handy, however, when we're dealing with a trailing pattern that is more than just a single character.
 
 




8.2.8.4 Replacing text (it's magic!)
Regular expression search and replace is one of my favorite new features. Your replacement text can refer to portions of

your source text, enabling you to manipulate text in very powerful ways. Imagine that you're faced with the problem of

displaying our list of names two to a line. One way to do that is to replace every second comma with a newline character.

Hard to do? It is hard to do with standard REPLACE, but you won't even break a sweat doing the job using the new

REGEXP_REPLACE:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
       names_adjusted VARCHAR2(61);
       comma_delimited BOOLEAN;
       extracted_name VARCHAR2(60);
       name_counter NUMBER;
    BEGIN
       --Look for the pattern
       comma_delimited := REGEXP_LIKE(names,'^([a-z A-Z]*,)+([a-z A-Z]*){1}$');

       --Only do more if we do, in fact, have a comma-delimited list.
       IF comma_delimited THEN
          names := REGEXP_REPLACE(
                      names,
                      '([a-z A-Z]*),([a-z A-Z]*),',
                      '\1,\2' || chr(10)   );
       END IF;

       DBMS_OUTPUT.PUT_LINE(names);
    END;


The output from this bit of code is:

    Anna,Matt
    Joe,Nathan
    Andrew,Jeff
    Aaron


We begin our explanation of this bit of magic by pointing out that we passed three arguments to REGEXP_REPLACE:



names

Our source string



'([a-z A-Z]*),([a-z A-Z]*),'

An expression specifying the text that we want to replace. More on this in just a bit.



'\1,\2' || chr(10)

Our replacement text. The \1 and \2 are backreferences and are what makes our solution work. We'll talk more about these in

just a bit too.

The expression we're searching for consists of two subexpressions enclosed within parentheses, plus two commas. Here's an

explanation of how that expression works:



([a-z A-Z]*)

We want to begin by matching a name.



,

We want that name to be terminated by a comma.



([a-z A-Z]*)

Then we want to match another name.



,

And we again want to match the terminating comma.

Remember that our goal is to replace every second comma with a newline. That's why we wrote our expression to match two

names and two commas. There's a reason, too, why we kept the commas out of the subexpressions.

Following is the first match that will be found for our expression upon invoking REGEXP_REPLACE:

    Anna,Matt,



The two subexpressions will correspond to "Anna" and "Matt" respectively. The key to our solution is that you can reference

the text matching a given subexpression via something called a backreference. The two backreferences in our replacement text

are \1 and \2, and they refer to the text matched by the first and second subexpressions. Here's how that plays out:

    '\1,\2' || chr(10)       --our replacement text
    'Anna,\2' || chr(10)     --fill in the value matched
                               by the first subexpression
    'Anna,Matt' || chr(10)   --fill in the value matched
                               by the second subexpression


We hope you can begin to see the power at your disposal here. We don't even use the commas from the original text. We use

only the text matching the two subexpressions, the names "Anna" and "Matt", and we insert those into a new string formatted

with one comma and one newline.

We can do even more! We can easily change our replacement text to use a tab (an ASCII 9) rather than a comma:

          names := REGEXP_REPLACE(
                      names,
                      '([a-z A-Z]*),([a-z A-Z]*),',
                      '\1' || chr(9) || '\2' || chr(10)   );


And now we get our results in two, nice, neat columns:

    Anna    Matt
    Joe     Nathan
    Andrew  Jeff
    Aaron


We think regular expression search and replace is a wonderful thing. It's fun. It's powerful. You can do a lot with it. No

other RDBMS on the market, that we're aware of at least, has anything like it.

8.2.8.5 Groking greediness
Greediness is an important concept to understand when writing regular expressions. Consider the problem of extracting just

the first name, and its trailing comma, from our comma-delimited list of names. Recall that our list looks like this:

       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';



One solution that you might think of is to look for a series of characters ending in a comma:

    .*,



Let's try this solution to see how it works:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   REGEXP_SUBSTR(names, '.*,')   );
    END;


Our output is:

    Anna,Matt,Joe,Nathan,Andrew,Jeff,



Well! This is certainly not what we were after. What happened? We were a victim of greediness. Not the sort of greediness

your mother chastised you about, but rather a greediness of the regular-expression sort: each element of a regular

expression will match as many characters as it possibly can. When you and I see:

    .*,



our natural tendency often is to think in terms of stopping at the first comma and returning "Anna,". However, the database

looks for the longest run of characters it can find that terminate with a comma; the database stops not at the first comma,

but at the last.

In Oracle Database 10g Release 1, you have limited options for dealing with greediness problems. You may be able to

reformulate an expression to avoid the problem. For example, you can use '[^,]*,' to return the first name and its trailing

comma from our delimited string. Sometimes though, you are forced to change your whole approach to solving a problem, often

to the point of using a completely different combination of functions than you first intended.

Oracle Database 10g Release 2 brings with it some welcome relief from greed, in the form. of nongreedy quantifiers inspired

by those found in Perl. By adding a question-mark (?) to the quantifier for the period (.), changing that quantifier from an

* to *?, we can request the shortest run of characters that precedes a comma:

    DECLARE
       names VARCHAR2(60) := 'Anna,Matt,Joe,Nathan,Andrew,Jeff,Aaron';
    BEGIN
       DBMS_OUTPUT.PUT_LINE(   REGEXP_SUBSTR(names, '(.*?,)')   );
    END;


The output now is:

    Anna,



The nongreedy quantifiers match as soon as they can, not as much as they can.

8.2.8.6 Learning more about regular expressions
Regular expressions can seem deceptively simple, but end up being a surprisingly deep topic. They are simple enough that

you'll be able to use them after just reading this chapter (we hope!), and yet there's so much more to learn. We'd like to

recommend the following sources from Oracle and O'Reilly:



Oracle Database Application Developer's Guide-Fundamentals

Chapter 4 of this Oracle manual is the definitive source of information on regular expression support in Oracle.



Oracle Regular Expression Pocket Reference

A fine introduction to regular expressions written by Jonathan Gennick and Peter Linsley. Peter is one of the developers for

Oracle's regular expression implementation.



Mastering Oracle SQL

Contains an excellent chapter introducing regular expressions in the context of Oracle SQL. Aside from regular expressions,

this book by Sanjay Mishra and Alan Beaulieu is an excellent read if you want to hone your SQL skills.



Mastering Regular Expressions

Jeffrey Friedl's book stands tall as the definitive fount of wisdom on using regular expressions. To really delve deeply

into the topic, this is the book to read.

Finally, you'll find in Appendix A a table describing each of the regular expression metacharacters supported in Oracle's

implementation of regular expressions.

8.2.9. Working with Empty Strings
One issue that often causes great consternation, especially to people who come to Oracle after working with other databases,

is that Oracle treats empty strings as NULLs. This is contrary to the ISO SQL standard, which recognizes the difference

between an empty string and a string variable that is NULL.

The following code demonstrates Oracle's behavior.:

    /* File on web: empty_is_null.tst */
    DECLARE
       empty_varchar2 VARCHAR2(10) := '';
       empty_char CHAR(10) := '';
    BEGIN
       IF empty_varchar2 IS NULL THEN
          DBMS_OUTPUT.PUT_LINE('empty_varchar2 is NULL');
       END IF;

       IF '' IS NULL THEN
          DBMS_OUTPUT.PUT_LINE(''''' is NULL');
       END IF;

       IF empty_char IS NULL THEN
          DBMS_OUTPUT.PUT_LINE('empty_char is NULL');
       END IF;
    END;


The output is:

    empty_varchar2 is NULL
    '' is NULL


You'll notice in this example that the CHAR variable is not considered NULL. That's because CHAR variables, as fixed-length

character strings, are never truly empty. The CHAR variable in this example is padded with blanks until it is exactly 10

characters in length. The VARCHAR2 variable, however, is NULL, as is the zero-length string literal.

You have to really watch for this behavior. in IF statements that compare two VARCHAR2 values. Consider a program that

queries the user for a name, and then compares that name to a value read in from the database:

    DECLARE
       user_entered_name VARCHAR2(30);
       name_from_database VARCHAR2(30);
       ...
    BEGIN
    ...
    IF user_entered_name <> name_from_database THEN
    ...


If the user entered an empty string instead of a name, the IF condition shown in this example would never be TRUE. That's

because a NULL is never not-equal, nor equal, to any other value. One alternative approach to this IF statement is the

following:

    IF (user_entered_name <> name_from_database)
       OR (user_entered_name IS NULL) THEN


This is just one way of dealing with the "empty string is NULL" issue; it's impossible to provide a solution that works in

all cases. You must think through what you are trying to accomplish, recognize that any empty strings will be treated as

NULLs, and code appropriately.

8.2.10. Mixing CHAR and VARCHAR2 Values
If you use both fixed-length (CHAR) and variable-length (VARCHAR2) strings in your PL/SQL code, you should be aware of how

Oracle handles the interactions between these two datatypes, as described in the following sections.

8.2.10.1 Database-to-variable conversion
When you SELECT or FETCH data from a CHAR database column into a VARCHAR2 variable, the trailing spaces are retained. If you

SELECT or FETCH from a VARCHAR2 database column into a CHAR variable, PL/SQL automatically pads the value with spaces out to

the maximum length. In other words, the type of the variable, not the column, determines the variable's resulting value.

8.2.10.2 Variable-to-database conversion
When you INSERT or UPDATE a CHAR variable into a VARCHAR2 database column, the SQL kernel does not trim the trailing blanks

before performing the change. When the following PL/SQL is executed, the company_name in the new database record is set to

"ACME SHOWERS········" (where · indicates a space). It is, in other words, padded out to 20 characters, even though

the default value was a string of only 12 characters.

    DECLARE
       comp_id#   NUMBER;
       comp_name  CHAR(20) := 'ACME SHOWERS';
    BEGIN
       SELECT company_id_seq.NEXTVAL
          INTO comp_id#
          FROM dual;
       INSERT INTO company (company_id, company_name)
          VALUES (comp_id#, comp_name);
    END;


On the other hand, when you INSERT or UPDATE a VARCHAR2 variable into a CHAR database column, the SQL kernel automatically

pads the variable-length string with spaces out to the maximum (fixed) length specified when the table was created, and

places that expanded value into the database.

8.2.10.3 String comparisons
Suppose your code contains a string comparison such as the following:

    IF company_name = parent_company_name ...



PL/SQL must compare company_name to parent_company_name. It performs the comparison in one of two ways, depending on the

types of the two variables:

If a comparison is made between two CHAR variables, then PL/SQL uses blank-padding comparison.

If at least one of the strings involved in the comparison is variable-length, then PL/SQL performs non-blank-padding

comparison.

The following code snippet illustrates the difference between these two comparison methods:

    DECLARE
       company_name CHAR(30)
          := 'Feuerstein and Friends';
       char_parent_company_name CHAR(35)
          := 'Feuerstein and Friends';
       varchar2_parent_company_name VARCHAR2(35)
          := 'Feuerstein and Friends';
    BEGIN
       --Compare two CHARs, so blank-padding is used
       IF company_name = char_parent_company_name THEN
          DBMS_OUTPUT.PUT_LINE ('first comparison is TRUE');
       ELSE
          DBMS_OUTPUT.PUT_LINE ('first comparison is FALSE');
       END IF;

       --Compare a CHAR and a VARCHAR2, so nonblank-padding is used
       IF company_name = varchar2_parent_company_name THEN
          DBMS_OUTPUT.PUT_LINE ('second comparison is TRUE');
       ELSE
          DBMS_OUTPUT.PUT_LINE ('second comparison is FALSE');
       END IF;
    END;


The output is:

    first comparison is TRUE
    second comparison is FALSE
 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/23895263/viewspace-677179/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/23895263/viewspace-677179/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值