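While running some statistics on the Chinese Wikipedia data recently, I found that Traditional and Simplified Chinese are mixed together, so the entries need to be normalized to Simplified Chinese. This post is a note on the cconv tool.
cconv is built on top of iconv and adds phrase-level conversion; an analysis of the results follows later.
Ubuntu users can install it with the command sudo apt-get install cconv.
The command's help output is as follows: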
Chinese-Convert Tool. Version 0.6.2 (inside libcconv version 0.6.2).
Copyright (c) 2008-2009, China. xiaoyjy@gmail.com
Usage: cconv [OPTION...] [FILE]
Convert encoding of given files from one encoding to another.
Input/Output format specification:
-f [NAME] encoding of original text! at this you can use GBK, BIG5 or UTF8
-t [NAME] encoding for output! at this you can use GBK, BIG5, UTF8-CN, UTF8-HK or UTF-TW
Information:
-l list coded character sets can used
Output control:
-o [FILE] output file
-?, -h, -v Show this help page.
Report bugs to <xiaoyjy@gmail.com>
For example, to convert Traditional Chinese to Simplified Chinese, run
cconv -f utf8 -t utf8-cn zh.txt -o zh_cn.txt
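When many files need the same conversion, the command above can be wrapped in a small script. The following is only a minimal sketch, assuming cconv is installed and on the PATH; the function names build_cconv_command and convert_file are hypothetical helpers made up for this illustration, and the argument order simply mirrors the example command.

```python
import subprocess


def build_cconv_command(src, dst, from_enc="utf8", to_enc="utf8-cn"):
    """Build the argument list for one cconv run (hypothetical helper).

    Uses only the -f, -t and -o options listed in the help text, in the
    same order as the example command: input file first, then -o output.
    """
    return ["cconv", "-f", from_enc, "-t", to_enc, src, "-o", dst]


def convert_file(src, dst, from_enc="utf8", to_enc="utf8-cn"):
    """Run cconv on src and write the result to dst (hypothetical helper).

    check=True makes subprocess.run raise CalledProcessError if the
    process exits with a non-zero status.
    """
    subprocess.run(build_cconv_command(src, dst, from_enc, to_enc), check=True)


if __name__ == "__main__":
    # Print the command that would be executed for the example files.
    print(" ".join(build_cconv_command("zh.txt", "zh_cn.txt")))
```

Calling convert_file("zh.txt", "zh_cn.txt") should then behave like the command line shown above, provided cconv is available on the machine.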