Counting Chinese characters in Java: counting the Chinese characters in a txt file


weixin_39928686


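Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39928686/article/details/114506091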


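One thing to note here:

If the txt file is not encoded in UTF-8, reading it as UTF-8 produces garbled text, so the charset used to read the file must match the file's actual encoding.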
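A wrong charset does not just garble the printed text; it also changes the count. The minimal sketch below (the class `CharsetDemo` and its helpers `countHanzi` and `roundTrip` are hypothetical names, not from the original) encodes a short string as UTF-8, the way a UTF-8 file stores it, then decodes those bytes once as UTF-8 and once as ISO-8859-1. ISO-8859-1 is used here only because every JVM is guaranteed to support it; the same kind of mismatch occurs when a GBK file is read as UTF-8 or vice versa.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetDemo {

    // Same character class as the article: U+4E00..U+9FA5.
    private static final Pattern HANZI = Pattern.compile("[\\u4e00-\\u9fa5]");

    // Counts regex matches, i.e. Chinese characters in the range above.
    static int countHanzi(String s) {
        Matcher m = HANZI.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Simulates saving text to a UTF-8 file and reading it back with readAs.
    static String roundTrip(String text, Charset readAs) {
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(fileBytes, readAs);
    }

    public static void main(String[] args) {
        // Four Chinese characters (U+7EDF U+8BA1 U+6C49 U+5B57) followed by "abc",
        // written as Unicode escapes so the source compiles under any javac encoding.
        String text = "\u7edf\u8ba1\u6c49\u5b57abc";
        System.out.println(countHanzi(roundTrip(text, StandardCharsets.UTF_8)));      // prints 4
        System.out.println(countHanzi(roundTrip(text, StandardCharsets.ISO_8859_1))); // prints 0
    }
}
```

With the matching charset all four characters are found; with the wrong one each UTF-8 byte becomes a separate Latin-1 symbol below U+0100, so the regex matches nothing.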

package com.java.hanzi.utf;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TxtCount {

    // Matches one Chinese character in the CJK Unified Ideographs range U+4E00..U+9FA5.
    // Compiled once and reused, instead of recompiling on every call.
    private static final Pattern HANZI = Pattern.compile("[\\u4e00-\\u9fa5]");

    public static void main(String[] args) {
        File file = new File("D:\\2012-0414.txt");
        int count = 0;
        // FileReader uses the platform default charset (e.g. GBK on Chinese-locale
        // Windows before Java 18), so wrap a FileInputStream in an InputStreamReader
        // with an explicit charset instead. Change UTF_8 if the file uses another encoding.
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String str;
            while ((str = br.readLine()) != null) {
                count += calculator(str);
            }
            System.out.println("Total: " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Counts the Chinese characters in one line, printing the line and its count.
    public static int calculator(String str) {
        int count = 0;
        Matcher m = HANZI.matcher(str);
        // Each successful find() is one match, i.e. one Chinese character.
        while (m.find()) {
            count++;
        }
        System.out.println(str);
        System.out.println("Chinese characters in this line: " + count);
        return count;
    }
}
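The regex above only covers U+4E00..U+9FA5, the original block of CJK Unified Ideographs, so characters from later additions (for example those in U+9FA6..U+9FFF, or CJK Extension B, which lies outside the Basic Multilingual Plane) are not counted. As an alternative technique, not the author's, the sketch below counts code points whose `Character.UnicodeScript` is `HAN`. Note that the Han script is somewhat broader than "Chinese characters": it also includes, for example, CJK radicals and the iteration mark 々. The class name `HanScriptCount` and its methods are hypothetical, and the file is assumed to be UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class HanScriptCount {

    // Counts the code points in s whose Unicode script is HAN.
    // codePoints() keeps surrogate pairs together, so a character outside the
    // Basic Multilingual Plane (e.g. U+20000) counts as one.
    static long countHan(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    // Sums countHan over every line of a file decoded as UTF-8 (an assumption;
    // change the charset to match the file). try-with-resources closes the file.
    static long countHanInFile(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.mapToLong(HanScriptCount::countHan).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java HanScriptCount D:\2012-0414.txt
        Path path = Paths.get(args.length > 0 ? args[0] : "D:\\2012-0414.txt");
        System.out.println("Han characters: " + countHanInFile(path));
    }
}
```

Iterating over `codePoints()` rather than individual `char`s is what lets a supplementary character such as U+20000 be counted once instead of being split into two surrogate halves that neither the script check nor the original regex would recognize.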
