正则表达式的小应用(按空白分割文本但是保留"\n")

正则表达式从一段文字中提取想要的东西:

比如说从一篇英语阅读中提取除了空格以外的东西包括单词换行符标点下滑线:

Small Schools Rising
  This year's list of the top 100 high schools shows that today, those with fewer students are flourishing.
  Fifty years ago, they were the latest thing in educational reform: big, modern, suburban high schools with students counted in the thousands. As baby boomers(二战后婴儿潮时期出生的人) came of high-school age, big schools promised economic efficiency. A greater choice of courses, and, of course, better football teams. Only years later did we understand the trade-offs this involved: the creation of excessive bureaucracies(官僚机构),the difficulty of forging personal connections between teachers and students.SAT scores began dropping in 1963;today,on average,30% of students do not complete high school in four years, a figure that rises to 50% in poor urban neighborhoods. While the emphasis on teaching to higher, test-driven standards as set in No Child Left Behind resulted in significantly better performance in elementary(and some middle)schools, high schools for a variety of reasons seemed to have made little progress.
注意:此部分试题请在答卡1上作答.
  1. Fifty years ago. big. Modern. Suburban high schools were established in the hope of __________.
  A) ensuring no child is left behind
  B) increasing economic efficiency
  C) improving students' performance on SAT
  D)providing good education for baby boomers

代码:

String regex = "\\n+|[\\w\\']+|[\\W&&\\S]+";//定义正则式
FileReader fileReader=new FileReader("D:\\学习\\四级历年真解听力(2015.12)\\Small Schools Rising.txt");//提取文件就是上面的文本
String content ="";
int len=0;
char[] buf=new char[1024];
while((len=fileReader.read(buf))!=-1){//提取文件内容创建文本
	content+=new String(buf,0,len);
}
ArrayList<String>  strArray = new ArrayList<String> ();
Matcher matcher = Pattern.compile(regex).matcher(content);//编译正则并匹配文本
while(matcher.find()){
	strArray.add(matcher.group(0));//把匹配结果一个一个给字符串ArrayList
}
String regex = "\\n+|[\\w\\']+|[\\W&&\\S]+";//定义正则式

"\\n+|[\\w\\']+|[\\W&&\\S]+";

这个正则的意思是匹配所有的换行("\\n+"实现)

和单词([\\w\\']+实现, 其中的\\'是为了匹配的时候包括类似与"It's"这种中间有"’"的单词)

还有除了除了单词和空格的所有字符

该代码的结果是把每个单词还有下划线“_”和换行符"\n"从文本中分离出来分成一个一个

这样做就1可以保证把文本拆分,重组后还能保证它的排版(主要是换行)。




 


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值