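These represent the most common code points for printable text, plus newlines, spaces, carriage returns and the like. ASCII covers everything up to 0x7F, and standards such as Latin-1 or Windows code page 1251 use the remaining 128 byte values for accented characters and so on.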
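You'd expect text to use only those code points. Binary data, by contrast, can use any byte value in the range 0x00-0xFF; a text file, for example, would probably not contain \x00 (NUL) or \x1F (the Unit Separator in the ASCII standard).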
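A minimal Python sketch of that idea, assuming the code you found keeps the seven named control codes (BEL, BS, TAB, LF, FF, CR, ESC, i.e. 7, 8, 9, 10, 12, 13, 27) plus every byte from 0x20 to 0xFF. The names `TEXTCHARS` and `is_binary_string` are illustrative only, not part of any library:

```python
# Assumed reconstruction: seven allowed C0 controls plus every byte 0x20-0xFF.
TEXTCHARS = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)))


def is_binary_string(data: bytes) -> bool:
    """Return True if data contains any byte outside TEXTCHARS."""
    # bytes.translate(None, delete) removes every byte listed in `delete`;
    # whatever survives is a byte the heuristic does not expect in text.
    return bool(data.translate(None, TEXTCHARS))


print(is_binary_string(b"hello\r\n\tworld"))  # False
print(is_binary_string(b"\x00\x01\x02"))      # True
```

With this set, 256 - 231 = 25 byte values (the other C0 controls) count as evidence of binary data.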
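This is a heuristic at best, however. Some text files may still use C0 control codes other than the seven explicitly named ones, and I'm sure binary data exists that simply happens not to contain any of the 25 byte values left out of the textchars string.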
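Both failure modes are easy to reproduce with the same assumed set. The sample inputs below are made up for illustration: a line of text using a vertical tab (0x0B), and a byte string standing in for binary data that avoids all 25 excluded values:

```python
# Same assumed set as before: seven C0 controls plus 0x20-0xFF.
TEXTCHARS = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)))


def is_binary_string(data: bytes) -> bool:
    return bool(data.translate(None, TEXTCHARS))


# False positive: text using a C0 control outside the allowed seven.
text_with_vt = b"column one\x0bcolumn two\n"

# False negative: "binary" data that never uses any of the 25 excluded bytes.
binary_without_controls = bytes(range(0x20, 0x100))

print(is_binary_string(text_with_vt))             # True
print(is_binary_string(binary_without_controls))  # False
```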
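The author of the range probably based it on a table in the source of the Unix file command. That table marks each byte as non-text, ASCII, Latin-1 or non-ISO extended ASCII, and it includes documentation on why those code points were chosen:
/*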
* This table reflects a particular philosophy about what constitutes
* "text," and there is room for disagreement about it.
*
* [....]
*
* The table below considers a file to be ASCII if all of its characters
* are either ASCII printing characters (again, according to the X3.4
* standard, not isascii()) or any of the following controls: bell,
* backspace, tab, line feed, form feed, carriage return, esc, nextline.
*
* I include bell because some programs (particularly shell scripts)
* use it literally, even though it is rare in normal text. I exclude
* vertical tab because it never seems to be used in real text. I also
* include, with hesitation, the X3.64/ECMA-43 control nextline (0x85),
* because that's what the dd EBCDIC->ASCII table maps the EBCDIC newline
* character to. It might be more appropriate to include it in the 8859
* set instead of the ASCII set, but it's got to be included in *something*
* we recognize or EBCDIC files aren't going to be considered textual.
*
* [.....]
*/
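The rule the comment states for the ASCII category can be sketched in Python as follows. This is only an illustration of the stated rule, not file's actual code: it treats space (0x20) through tilde (0x7E) as the X3.4 printing characters, adds the eight listed controls including nextline (0x85), and ignores the Latin-1 and extended-ASCII categories entirely. `looks_ascii` is a hypothetical name:

```python
# Controls the comment accepts in "ASCII" text:
# bell, backspace, tab, line feed, form feed, carriage return, esc, nextline.
ALLOWED_CONTROLS = {0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B, 0x85}

# Assumed X3.4 printing characters: space (0x20) through tilde (0x7E).
PRINTABLE_ASCII = set(range(0x20, 0x7F))

ASCII_TEXT_BYTES = frozenset(PRINTABLE_ASCII | ALLOWED_CONTROLS)


def looks_ascii(data: bytes) -> bool:
    """True if every byte is a printing character or one of the listed controls."""
    return all(b in ASCII_TEXT_BYTES for b in data)


print(looks_ascii(b"echo done\x07\n"))  # True: bell is allowed
print(looks_ascii(b"a\x0bb"))           # False: vertical tab is excluded
print(looks_ascii(b"caf\xe9"))          # False: 0xE9 is Latin-1, not ASCII
```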
Interestingly enough, that table excludes 0x7F, whereas the code you found does not.
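If you want the Python set to agree with file on DEL, dropping that one byte is enough. A small sketch, again assuming the seven-control construction used earlier, showing that the two variants differ only in 0x7F:

```python
CONTROLS = {7, 8, 9, 10, 12, 13, 27}

# Variant that keeps DEL (0x7F), like the code you found.
with_del = bytes(CONTROLS | set(range(0x20, 0x100)))

# Variant that drops DEL, matching the file command's table on this byte.
without_del = bytes((CONTROLS | set(range(0x20, 0x100))) - {0x7F})

print(sorted(set(with_del) - set(without_del)))  # [127]
```

The parentheses are there for clarity: in Python, `-` binds more tightly than `|`, so without them the subtraction would apply only to the `range` set. Here that gives the same result because `CONTROLS` contains no 0x7F, but the explicit grouping states the intent.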