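As the screenshot below shows, the folder “公共的” contains files such as a.c, A.c and D.c. When I ran ls [A-Z].c, it listed A.c and D.c, as expected. But when I ran ls [a-z].c, it listed a.c, A.c and D.c, which baffled me. I then tried ls [[:lower:]].c, which listed only a.c, and ls [[:upper:]].c, which listed A.c and D.c, so those finally behaved normally. What I could not work out was why [a-z].c matched more than the lowercase names and pulled in uppercase ones as well.
Reference: http://unix.stackexchange.com/questions/227070/why-does-a-z-match-lowercase-letters-in-bash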
Note that when using range expressions like [a-z], letters of the other case may be included, depending on the setting of LC_COLLATE.
LC_COLLATE is a variable which determines the collation order used when sorting the results of pathname expansion, and determines the behavior of range expressions, equivalence classes, and collating sequences within pathname expansion and pattern matching.
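LC_COLLATE is only one of several locale variables, so it can help to check which one actually takes effect. The following is a minimal sketch, assuming the usual POSIX precedence (LC_ALL overrides LC_COLLATE, which overrides LANG). The function name effective_collate is hypothetical, and the fallback to "POSIX" when nothing is set is an assumption, since that default is implementation-defined.

```shell
# effective_collate: hypothetical helper that prints the locale name
# governing collation, following the POSIX precedence
# LC_ALL > LC_COLLATE > LANG.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: with nothing set, the default is implementation-defined;
        # "POSIX" is printed here only as a typical value.
        printf '%s\n' "POSIX"
    fi
}

# Show the setting that applies to the current shell.
effective_collate
```

If this prints the value of LC_ALL, changing LC_COLLATE alone will have no visible effect on range expressions.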
Consider the following:
$ touch a A b B c C x X y Y z Z
$ ls
a A b B c C x X y Y z Z
$ echo [a-z] # Note the missing uppercase "Z"
a A b B c C x X y Y z
$ echo [A-Z] # Note the missing lowercase "a"
A b B c C x X y Y z Z
Notice that when the command echo [a-z] is called, the expected output would be all files with lowercase names. Likewise, with echo [A-Z], files with uppercase names would be expected.
Standard collations with locales such as en_US have the following order:
aAbBcC…xXyYzZ
- Between a and z (in [a-z]) are ALL uppercase letters, except for Z.
- Between A and Z (in [A-Z]) are ALL lowercase letters, except for a.
See:
     aAbBcC[...]xXyYzZ
     |              |
from a      to      z

     aAbBcC[...]xXyYzZ
      |              |
from  A     to       Z
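The diagram can be reproduced with a small model. This is only a sketch: it treats the collation order as a fixed string (the interleaved order quoted above, written out in full as an assumption about the elided middle) and slices it between two endpoints. The names order and range_members are hypothetical, and real locales are more involved than a single string.

```shell
# Simplified model of an en_US-style collation order (assumed, not read
# from the system): each lowercase letter is followed by its uppercase form.
order=aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ

# range_members START END: print every character of $order from START
# through END inclusive, i.e. what a range [START-END] would cover.
range_members() {
    rest=${order#*"$1"}                 # drop everything up to and including START
    printf '%s%s%s\n' "$1" "${rest%%"$2"*}" "$2"   # keep up to END, re-add endpoints
}

range_members a z   # aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYz
range_members A Z   # AbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
```

The first result has no Z and the second has no a, which matches the two bullet points.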
If you change the LC_COLLATE variable to C, it looks as expected:
$ export LC_COLLATE=C
$ echo [a-z]
a b c x y z
$ echo [A-Z]
A B C X Y Z
So, it’s not a bug, it’s a collation issue.
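If exporting the setting for the whole session is too broad, the change can be confined to a subshell. A minimal sketch, assuming a case-sensitive filesystem and a throwaway directory from mktemp; demo_dir, lower_c and upper_c are illustrative names. LC_ALL is used instead of LC_COLLATE here because it takes precedence if it is already set.

```shell
# Create the same twelve files in a scratch directory.
demo_dir=$(mktemp -d)
( cd "$demo_dir" && touch a A b B c C x X y Y z Z )

# The command substitution runs in a subshell, so the locale assignment
# disappears when it ends. The assignment must be a separate command that
# runs before the glob is expanded by the shell.
lower_c=$(cd "$demo_dir" && { LC_ALL=C; echo [a-z]; })
upper_c=$(cd "$demo_dir" && { LC_ALL=C; echo [A-Z]; })

echo "$lower_c"   # a b c x y z
echo "$upper_c"   # A B C X Y Z

rm -rf "$demo_dir"
```

Writing the assignment as a prefix, as in LC_ALL=C echo [a-z], would not be enough in bash: the shell expands the pattern itself, using its own locale, before the command receives the variable.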
Instead of range expressions you can use POSIX-defined character classes, such as upper or lower. They also work with different LC_COLLATE configurations and even with accented characters:
$ echo [[:lower:]]
a b c x y z à è é
$ echo [[:upper:]]
A B C X Y Z
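Applied to the three files from the question, the character classes can be checked like this. It is a sketch only, assuming a case-sensitive filesystem; the scratch directory and the variable names are illustrative.

```shell
# Recreate the files from the question in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.c" "$demo_dir/A.c" "$demo_dir/D.c"

# Character classes select by letter case, not by collation position,
# so no locale change is needed here.
lower_files=$(cd "$demo_dir" && echo [[:lower:]].c)
upper_files=$(cd "$demo_dir" && echo [[:upper:]].c)

echo "$lower_files"   # a.c
echo "$upper_files"   # A.c D.c

rm -rf "$demo_dir"
```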