Common String Operations in Common Lisp

Latest recommended article published on 2023-09-18 15:30:03

darksun9972

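Reposted from http://rannger.blog.165.com/blog/static/2015672232012917462967/
Accessing substrings with subseq. It takes three arguments: the source string, the start index of the substring, and an optional end index (exclusive; it defaults to the end of the string).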

* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (subseq *my-string* 8)
"Marx"
* (subseq *my-string* 0 7)
"Groucho"
* (subseq *my-string* 1 5)
"rouc"

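subseq can also be used as a place with setf. In that case the assignment modifies the corresponding part of the source string in place.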
* (defparameter *my-string* (string "Harpo Marx"))
*MY-STRING*
* (subseq *my-string* 0 5)
"Harpo"
* (setf (subseq *my-string* 0 5) "Chico")
"Chico"
* *my-string*
"Chico Marx"
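Note, however, that strings in Common Lisp are not "stretchable". The HyperSpec says: "If the subsequence and the new sequence are not of equal length, the shorter length determines the number of elements that are replaced."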
* (defparameter *my-string* (string "Karl Marx"))
*MY-STRING*
* (subseq *my-string* 0 4)
"Karl"
* (setf (subseq *my-string* 0 4) "Harpo")
"Harpo"
* *my-string*
"Harp Marx"
* (subseq *my-string* 4)
" Marx"
* (setf (subseq *my-string* 4) "o Marx")
"o Marx"
* *my-string*
"Harpo Mar"
Accessing individual characters
You can use the function CHAR to access individual characters of a string. CHAR can also be used as a place with SETF.
* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (char *my-string* 11)
#\x
* (char *my-string* 7)
#\Space
* (char *my-string* 6)
#\o
* (setf (char *my-string* 6) #\y)
#\y
* *my-string*
"Grouchy Marx"
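If efficiency matters, note that SCHAR, which only works on simple strings, can be faster where it is applicable.
Because strings are also arrays and sequences, you can also use the more generic functions AREF and ELT (although CHAR can generally be implemented more efficiently).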
* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (aref *my-string* 3)
#\u
* (elt *my-string* 8)
#\M
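Each character in a string has an integer code. The range of recognized codes and Lisp's ability to print the characters correctly depend on the character set support of your Lisp implementation, for example ISO-8859-1 or Unicode. Here are some examples in SBCL in a UTF-8 environment, where each character is encoded as 1 to 4 8-bit bytes.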
* (stream-external-format *standard-output*)
:UTF-8
* (code-char 200)
#\LATIN_CAPITAL_LETTER_E_WITH_GRAVE
* (char-code #\LATIN_CAPITAL_LETTER_E_WITH_GRAVE)
200
* (code-char 1488)
#\HEBREW_LETTER_ALEF
* (char-code #\HEBREW_LETTER_ALEF)
1488
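As a small sketch of how char-code and code-char combine with the sequence functions, the two hypothetical helpers below convert a whole string to a list of codes and back. The concrete code values depend on the implementation's character set, so only the round trip is shown.

```lisp
;; CHAR-CODES and CODES-TO-STRING are hypothetical helper names.
(defun char-codes (string)
  "Return the list of character codes of STRING."
  (map 'list #'char-code string))

(defun codes-to-string (codes)
  "Return a string built from a list of character codes."
  (map 'string #'code-char codes))

;; Converting to codes and back yields an equal string.
(codes-to-string (char-codes "Marx"))  ; => "Marx"
```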
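Manipulating the contents of a string
A number of sequence functions can be used to manipulate strings. Here are some examples. Note that remove and substitute return new strings, while replace modifies its first argument.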
* (remove #\o "Harpo Marx")
"Harp Marx"
* (remove #\a "Harpo Marx")
"Hrpo Mrx"
* (remove #\a "Harpo Marx" :start 2)
"Harpo Mrx"
* (remove-if #'upper-case-p "Harpo Marx")
"arpo arx"
* (substitute #\u #\o "Groucho Marx")
"Gruuchu Marx"
* (substitute-if #\_ #'upper-case-p "Groucho Marx")
"_roucho _arx"
* (defparameter *my-string* (string "Zeppo Marx"))
*MY-STRING*
* (replace *my-string* "Harpo" :end1 5)
"Harpo Marx"
* *my-string*
"Harpo Marx"
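Another frequently used function is replace-all, although it is not part of the ANSI standard. It performs a simple search-and-replace on a string and returns a new string in which every occurrence of the "part" has been replaced.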
* (replace-all "Groucho Marx Groucho" "Groucho" "ReplacementForGroucho")
"ReplacementForGroucho Marx ReplacementForGroucho"
Here is one implementation of replace-all:
(defun replace-all (string part replacement &key (test #'char=))
  "Returns a new string in which all the occurrences of the part
is replaced with replacement."
  ;; Assumes PART is non-empty; with an empty PART the loop never advances.
  (with-output-to-string (out)
    (loop with part-length = (length part)
          for old-pos = 0 then (+ pos part-length)
          for pos = (search part string
                            :start2 old-pos
                            :test test)
          do (write-string string out
                           :start old-pos
                           :end (or pos (length string)))
          when pos do (write-string replacement out)
          while pos)))
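Note, however, that the code above is not optimized for long strings. If you plan to process long strings or file contents, consider using the cl-ppcre regular expression package.
Concatenating strings
In one sentence: concatenate is your friend. Note that it is a generic sequence function, and its first argument specifies the type of the result.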
* (concatenate 'string "Karl" " " "Marx")
"Karl Marx"
* (concatenate 'list "Karl" " " "Marx")
(#\K #\a #\r #\l #\Space #\M #\a #\r #\x)
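If you build a string from many parts, repeated calls to concatenate are a rather poor choice. There are at least three better ways to build a string, and which one to use depends on your data. If you build the string one character at a time, first create an adjustable vector with element type character and a fill pointer of 0, then use vector-push-extend to append characters. With this approach you can also tell the system the expected length of the string in advance.
* (defparameter *my-string* (make-array 0
                                        :element-type 'character
                                        :fill-pointer 0
                                        :adjustable t))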
*MY-STRING*
* *my-string*
""
* (dolist (char '(#\Z #\a #\p #\p #\a))
    (vector-push-extend char *my-string*))
NIL
* *my-string*
"Zappa"
If the string is built from arbitrary objects (symbols, numbers, characters, strings), you can call format with NIL as its stream argument. format then returns the output as a string.
* (format nil "This is a string with a list ~A in it"
          '(1 2 3))
"This is a string with a list (1 2 3) in it"
The iteration directives of format can also be used to build a string from a list:
* (format nil "The Marx brothers are:~{ ~A~}."
          '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))
"The Marx brothers are: Groucho Harpo Chico Zeppo Karl."
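format can do a great deal of string processing, but the syntax of its directives is rather arcane. After this last example, which uses ~^ to put commas only between the items, you can find the details in the CLHS section about formatted output.
* (format nil "The Marx brothers are:~{ ~A~^,~}."
          '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))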
"The Marx brothers are: Groucho, Harpo, Chico, Zeppo, Karl."
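A third way to build a string is to print into it with with-output-to-string. This handy macro returns, as a string, everything that was written to the string stream it binds. This means you can build the string with any output function you like, including the powerful format.
* (with-output-to-string (stream)
    (dolist (char '(#\Z #\a #\p #\p #\a #\, #\Space))
      (princ char stream))
    (format stream "~S - ~S" 1940 1993))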
"Zappa, 1940 - 1993"
Processing a string one character at a time
You can use the MAP function to process a string one character at a time.
* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (map 'string #'(lambda (c) (print c)) *my-string*)
#\G
#\r
#\o
#\u
#\c
#\h
#\o
#\Space
#\M
#\a
#\r
#\x
"Groucho Marx"
Or you can use the loop macro:
* (loop for char across "Zeppo"
        collect char)
(#\Z #\e #\p #\p #\o)
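The first argument of MAP decides what is collected. With 'string the results of the function form a new string, and with NIL nothing is collected, which suits pure side effects. A short sketch (count-vowels is a hypothetical helper name):

```lisp
;; Transform every character and collect the results into a new string.
(map 'string #'char-upcase "Groucho")  ; => "GROUCHO"

;; COUNT-VOWELS is a hypothetical helper: MAP with NIL as result type
;; calls the function for its side effect only.
(defun count-vowels (string)
  (let ((count 0))
    (map nil
         (lambda (c)
           (when (find c "aeiou" :test #'char-equal)
             (incf count)))
         string)
    count))

(count-vowels "Groucho Marx")  ; => 4
```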
Reversing a string by character or by word
To reverse a string by character, use the built-in reverse function (or its destructive counterpart nreverse).
* (defparameter *my-string* (string "DSL"))
*MY-STRING*
* (reverse *my-string*)
"LSD"
CL has no one-line way to reverse a string by word (as you would do with split and join in Perl). You can either use functions from the split-sequence package or roll your own solution.
Here is one attempt:
* (defun split-by-one-space (string)
    "Returns a list of substrings of string
divided by ONE space each.
Note: Two consecutive spaces will be seen as
if there were an empty string between them."
    (loop for i = 0 then (1+ j)
          as j = (position #\Space string :start i)
          collect (subseq string i j)
          while j))
SPLIT-BY-ONE-SPACE
* (split-by-one-space "Singing in the rain")
("Singing" "in" "the" "rain")
* (split-by-one-space "Singing in the  rain")
("Singing" "in" "the" "" "rain")
* (split-by-one-space "Cool")
("Cool")
* (split-by-one-space " Cool ")
("" "Cool" "")
* (defun join-string-list (string-list)
    "Concatenates a list of strings
and puts spaces between the elements."
    (format nil "~{~A~^ ~}" string-list))
JOIN-STRING-LIST
* (join-string-list '("We" "want" "better" "examples"))
"We want better examples"
* (join-string-list '("Really"))
"Really"
* (join-string-list '())
""
* (join-string-list
   (nreverse
    (split-by-one-space
     "Reverse this sentence by word")))
"word by sentence this Reverse"
Controlling case
Common Lisp has a number of functions for controlling the case of a string. Here are just a few examples.
* (string-upcase "cool")
"COOL"
* (string-upcase "Cool")
"COOL"
* (string-downcase "COOL")
"cool"
* (string-downcase "Cool")
"cool"
* (string-capitalize "cool")
"Cool"
* (string-capitalize "cool example")
"Cool Example"
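All of these functions accept :start and :end keyword arguments that restrict which part of the string is converted. They also have destructive versions whose names start with "N".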
* (string-capitalize "cool example" :start 5)
"cool Example"
* (string-capitalize "cool example" :end 5)
"Cool example"
* (defparameter *my-string* (string "BIG"))
*MY-STRING*
* (defparameter *my-downcase-string* (nstring-downcase *my-string*))
*MY-DOWNCASE-STRING*
* *my-downcase-string*
"big"
* *my-string*
"big"
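Be aware of a subtlety of these functions. According to a note in the HyperSpec: "for string-upcase, string-downcase, and string-capitalize, string is not modified. However, if no characters in string require conversion, the result may be either string or a copy of it, at the implementation's discretion." This means that the last result in the following example is either "BIG" or "BUG", depending on your Common Lisp implementation. If you want to be sure, use copy-seq.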
* (defparameter *my-string* (string "BIG"))
*MY-STRING*
* (defparameter *my-upcase-string* (string-upcase *my-string*))
*MY-UPCASE-STRING*
* (setf (char *my-string* 1) #\U)
#\U
* *my-string*
"BUG"
* *my-upcase-string*
"BIG"
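Trimming blanks from the ends of a string
In fact you can trim not only blanks but any characters you specify. string-trim, string-left-trim and string-right-trim return a substring with the given characters removed from the ends. The characters to remove are given by the first argument, and the source string by the second.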
* (string-trim " " " trim me ")
"trim me"
* (string-trim " et" " trim me ")
"rim m"
* (string-left-trim " et" " trim me ")
"rim me "
* (string-right-trim " et" " trim me ")
" trim m"
* (string-right-trim '(#\Space #\e #\t) " trim me ")
" trim m"
* (string-right-trim '(#\Space #\e #\t #\m) " trim me ")
" tri"
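A common use of string-trim is stripping all kinds of whitespace from both ends. The sketch below assumes that the characters named Space, Tab, Newline and Return cover the whitespace you care about; trim-whitespace and *whitespace-chars* are hypothetical names.

```lisp
;; Hypothetical names; extend the list if your data needs more characters.
(defparameter *whitespace-chars* '(#\Space #\Tab #\Newline #\Return))

(defun trim-whitespace (string)
  "Return STRING without leading and trailing whitespace."
  (string-trim *whitespace-chars* string))

(trim-whitespace (format nil "  trim me ~%"))  ; => "trim me"
```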
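Converting between symbols and strings
The function intern turns a string into a symbol. More precisely, it checks whether a symbol with that name is already accessible in the package and returns it; if not, it creates the symbol and registers it in the package. The second return value tells you whether the symbol already existed. The details of packages are beyond the scope of this article.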
* (in-package "COMMON-LISP-USER")
#<The COMMON-LISP-USER package, 35/44 internal, 0/9 external>
* (intern "MY-SYMBOL")
MY-SYMBOL
NIL
* (intern "MY-SYMBOL")
MY-SYMBOL
:INTERNAL
* (export 'MY-SYMBOL)
T
* (intern "MY-SYMBOL")
MY-SYMBOL
:EXTERNAL
* (intern "My-Symbol")
|My-Symbol|
NIL
* (intern "MY-SYMBOL" "KEYWORD")
:MY-SYMBOL
NIL
* (intern "MY-SYMBOL" "KEYWORD")
:MY-SYMBOL
:EXTERNAL
To do the opposite and convert a symbol to a string, use symbol-name or string.
* (symbol-name 'MY-SYMBOL)
"MY-SYMBOL"
* (symbol-name 'my-symbol)
"MY-SYMBOL"
* (symbol-name '|my-symbol|)
"my-symbol"
* (string 'howdy)
"HOWDY"
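Converting between characters and strings
You can use coerce to convert a string of length 1 to a character. You can also use coerce to convert any sequence of characters into a string. You cannot, however, use coerce to convert a single character to a string; use string for that instead.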
* (coerce "a" 'character)
#\a
* (coerce (subseq "cool" 2 3) 'character)
#\o
* (coerce "cool" 'list)
(#\c #\o #\o #\l)
* (coerce '(#\h #\e #\y) 'string)
"hey"
* (coerce (nth 2 '(#\h #\e #\y)) 'character)
#\y
* (defparameter *my-array* (make-array 5 :initial-element #\x))
*MY-ARRAY*
* *my-array*
#(#\x #\x #\x #\x #\x)
* (coerce *my-array* 'string)
"xxxxx"
* (string 'howdy)
"HOWDY"
* (string #\y)
"y"
* (coerce #\y 'string)
#\y can't be converted to type STRING.
[Condition of type SIMPLE-TYPE-ERROR]
Finding an element of a string
Use find, position, and their -if counterparts to find characters in a string.
* (find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
#\t
* (find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
#\T
* (find #\z "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
NIL
* (find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
#\1
* (find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
#\0
* (position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
17
* (position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
0
* (position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
37
* (position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
43
Or use count and its -if counterpart to count characters in a string.
* (count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
2
* (count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
3
* (count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
6
* (count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :start 38)
5
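For characters, char-equal is a more specific case-insensitive test than equalp, and the index returned by position combines naturally with subseq. A short sketch on the same sentence (*sentence* is just an illustrative variable name):

```lisp
(defparameter *sentence*
  "The Hyperspec contains approximately 110,000 hyperlinks.")

;; Case-insensitive search and count using CHAR-EQUAL.
(position #\t *sentence* :test #'char-equal)  ; => 0
(count #\t *sentence* :test #'char-equal)     ; => 3

;; Everything before the first digit.
(subseq *sentence* 0 (position-if #'digit-char-p *sentence*))
;; => "The Hyperspec contains approximately "
```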
Finding a substring
The function search finds a substring within a string.
* (search "we" "If we can't be free we can at least be cheap")
3
* (search "we" "If we can't be free we can at least be cheap" :from-end t)
20
* (search "we" "If we can't be free we can at least be cheap" :start2 4)
20
* (search "we" "If we can't be free we can at least be cheap" :end2 5 :from-end t)
3
* (search "FREE" "If we can't be free we can at least be cheap")
NIL
* (search "FREE" "If we can't be free we can at least be cheap" :test #'char-equal)
15
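Calling search repeatedly with an advancing :start2 visits every match. The sketch below counts non-overlapping occurrences; count-occurrences is a hypothetical name, and the function assumes a non-empty search string.

```lisp
;; COUNT-OCCURRENCES is a hypothetical helper.
(defun count-occurrences (part string)
  "Count the non-overlapping occurrences of PART in STRING."
  ;; An empty PART would match everywhere, so it is rejected.
  (assert (plusp (length part)))
  (loop with step = (length part)
        for pos = (search part string)
                then (search part string :start2 (+ pos step))
        while pos
        count t))

(count-occurrences "we" "If we can't be free we can at least be cheap")  ; => 2
```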
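Converting a string to a number
Common Lisp provides the function parse-integer to convert the string representation of an integer to the corresponding number. The second return value is the index at which parsing stopped.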
* (parse-integer "42")
42
2
* (parse-integer "42" :start 1)
2
2
* (parse-integer "42" :end 1)
4
1
* (parse-integer "42" :radix 8)
34
2
* (parse-integer " 42 ")
42
3
* (parse-integer " 42 is forty-two" :junk-allowed t)
42
3
* (parse-integer " 42 is forty-two")
Error in function PARSE-INTEGER:
There's junk in this string: " 42 is forty-two".
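parse-integer does not understand radix specifiers such as #X, and there is no built-in function for parsing other numeric types. You can use read-from-string instead, but be aware that the full Lisp reader is then in effect: it reads any Lisp object, not just numbers, and with the default setting of *read-eval* it even evaluates #. forms, as the last example shows.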
* (read-from-string "#X23")
35
4
* (read-from-string "4.5")
4.5
3
* (read-from-string "6/8")
3/4
3
* (read-from-string "#C(6/8 1)")
#C(3/4 1)
9
* (read-from-string "1.2e2")
120.00001
5
* (read-from-string "symbol")
SYMBOL
6
* (defparameter *foo* 42)
*FOO*
* (read-from-string "#.(setq *foo* \"gotcha\")")
"gotcha"
23
* *foo*
"gotcha"
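Converting a number to a string
The general function write-to-string, or one of its simpler variants prin1-to-string and princ-to-string, converts a number to a string. With write-to-string, the :base keyword argument changes the output base; if it is not given, the base defaults to 10. Note that in Lisp, rational numbers are represented as the quotient of two integers, even when converted to strings.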
* (write-to-string 250)
"250"
* (write-to-string 250.02)
"250.02"
* (write-to-string 250 :base 5)
"2000"
* (write-to-string (/ 1 3))
"1/3"
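When you need control over padding or the number of decimals, format with NIL as the stream is an alternative to write-to-string. A few examples using standard directives:

```lisp
;; Base 2 through the :BASE keyword of WRITE-TO-STRING.
(write-to-string 5 :base 2)  ; => "101"

;; ~D with a minimum width of 5, padded with zeros.
(format nil "~5,'0D" 42)     ; => "00042"

;; ~B prints in binary, here padded to 8 digits.
(format nil "~8,'0B" 5)      ; => "00000101"

;; ~F with two digits after the decimal point.
(format nil "~,2F" 3.14159)  ; => "3.14"
```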
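Comparing strings
The general functions equal and equalp can be used to test whether two strings are equal. The strings are compared element by element; equal is case-sensitive, equalp is not. There is also a set of functions intended only for comparing strings (STRING=, STRING/=, STRING<, STRING>, STRING<=, STRING>=, STRING-EQUAL, STRING-NOT-EQUAL, STRING-LESSP, STRING-GREATERP, STRING-NOT-GREATERP, STRING-NOT-LESSP).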
* (string= "Marx" "Marx")
T
* (string= "Marx" "marx")
NIL
* (string-equal "Marx" "marx")
T
* (string< "Groucho" "Zeppo")
0
* (string< "groucho" "Zeppo")
NIL
* (string-lessp "groucho" "Zeppo")
0
* (mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
3
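The string comparison functions can serve directly as sort predicates, and they accept :start1, :end1, :start2 and :end2 to compare only parts of the strings. A short sketch:

```lisp
;; Case-insensitive sort. SORT is destructive, so a fresh list is used.
(sort (list "groucho" "Zeppo" "Harpo") #'string-lessp)
;; => ("groucho" "Harpo" "Zeppo")

;; Compare only the surnames: both substrings are "Marx".
(string= "Groucho Marx" "Harpo Marx" :start1 8 :start2 6)  ; => T
```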