Trying My Hand
nino_summer
Words containing function words
    use Encode;
    use utf8;
    open(a,"xuci1.txt");
    while($l1=<a>){
        chomp($l1);
        @arr=("$l1");
        $w1=$arr[0];
        # $W1=encode("gbk",$w1);
        $hash1{$w1}=1;
    }
    foreach $u1(keys %hash1){
        #$u2=encode("gbk",$u1);
        open(…
Original · 2017-06-30 10:13:37 · 247 views · 0 comments
Words containing function words, with output files named after each function word
    use Encode;
    use utf8;
    open(a,"xuci1.txt");
    while($l1=<a>){
        chomp($l1);
        @arr=("$l1");
        $w1=$arr[0];
        # $W1=encode("gbk",$w1);
        $hash1{$w1}=1;
    }
    foreach $u1(keys %hash1){
        #$u2=encode("gbk",$u1);
        open(…
Original · 2017-06-30 10:24:07 · 315 views · 0 comments
Compiling a word list
    open(In,"corpus.txt");
    open(out,">cibiao.txt");
    while(<In>){
        chomp;
        @Words=$_=~/(\S+)\/\S+/g;
        @POSs=$_=~/\S+\/(\S+)/g;
        for($i=0;$i<@Words;$i++){
            # if(defined $hash{$Word[$i]}){
            # $refPOS=$hash{$Wo…
Original · 2017-07-02 10:11:12 · 188 views · 0 comments
Removing English letters and digits from a document?
    open(In,"noun.txt");
    open(out,">noun_2.txt");
    while(<In>){
        chomp;
        @arr=split(" ",$_);
        $m=$arr[0];
        $n=$arr[1];
        if($m=~/[u4e00-u9fa5A-Za-z]{2,}/){
        }
        else{
            $m=~/^[^a-z]/i;
            $m=~/^[\u4e00-\u9fa5]{…
Original · 2017-07-02 14:22:03 · 201 views · 0 comments
Merging files
    ReadDict("202_112_194_59_601_n{count_10}.txt");
    ReadDict("202_112_194_59_602_n{count_10}.txt");
    ReadDict("202_112_194_59_603_n{count_10}.txt");
    ReadDict("202_112_194_59_604_n{count_10}.txt");
    ReadDict…
Original · 2017-07-02 14:29:13 · 273 views · 0 comments