Trying My Hand
nino_summer
Words containing function words
use Encode;use utf8;open(a,"xuci1.txt");while($l1=<a>){ chomp($l1); @arr=("$l1"); $w1=$arr[0]; # $W1=encode("gbk",$w1); $hash1{$w1}=1;}foreach $u1(keys %hash1){ #$u2=encode("gbk",$u1); open( …
Original · 2017-06-30 10:13:37 · 237 views · 0 comments
Words containing function words, with each output file named after its function word
use Encode;use utf8;open(a,"xuci1.txt");while($l1=<a>){ chomp($l1); @arr=("$l1"); $w1=$arr[0]; # $W1=encode("gbk",$w1); $hash1{$w1}=1;}foreach $u1(keys %hash1){ #$u2=encode("gbk",$u1); open( …
Original · 2017-06-30 10:24:07 · 285 views · 0 comments
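The preview above is cut off before the matching step, so here is a minimal sketch of the idea under some assumptions: the function words are already loaded into a list (the post reads them from xuci1.txt), "contains" means a plain substring match, and the sample words are invented for illustration. The sub name `group_by_function_word` is hypothetical and not taken from the post.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 literals

# Group words by the function words they contain.
# Takes two array refs; returns a hash ref: function word => array ref of words.
sub group_by_function_word {
    my ($function_words, $words) = @_;
    my %groups;
    for my $fw (@$function_words) {
        # index() returns -1 when $fw does not occur in the word
        $groups{$fw} = [ grep { index($_, $fw) >= 0 } @$words ];
    }
    return \%groups;
}

# Hypothetical sample data; the original reads the function words from xuci1.txt.
my $groups = group_by_function_word(
    [ '的', '了' ],
    [ '目的', '的确', '了解', '苹果' ],
);

binmode(STDOUT, ':encoding(UTF-8)');
for my $fw (sort keys %$groups) {
    print "${fw}: @{ $groups->{$fw} }\n";
}
```

To write each group to a file named after its function word, the file name may need to be encoded for the local file system first, which is probably what the commented-out `encode("gbk", ...)` calls in the post were for.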
Building a word list
open(In,"corpus.txt");open(out,">cibiao.txt");while(<In>){ chomp; @Words=$_=~/(\S+)\/\S+/g; @POSs=$_=~/\S+\/(\S+)/g; for($i=0;$i<@Words;$i++){ # if(defined $hash{$Word[$i]}){ # $refPOS=$hash{$Wo …
Original · 2017-07-02 10:11:12 · 176 views · 0 comments
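The two regexes in the preview pull words and POS tags out of whitespace-separated `word/POS` tokens. A minimal sketch of one way to turn that into a word list follows. It assumes the goal is to record every tag seen for each word, which the truncated preview hints at but does not confirm, and `build_word_list` is a hypothetical name.

```perl
use strict;
use warnings;

# Collect, for every word, the set of POS tags it appears with.
# Input: array ref of lines made of whitespace-separated "word/POS" tokens.
sub build_word_list {
    my ($lines) = @_;
    my %tags_for;
    for my $line (@$lines) {
        # Capture word and tag in one pass so each pair stays aligned.
        while ($line =~ m{(\S+)/(\S+)}g) {
            $tags_for{$1}{$2} = 1;
        }
    }
    return \%tags_for;
}

# Invented sample lines standing in for corpus.txt.
my $tags_for = build_word_list([
    'the/DT dog/NN runs/VBZ',
    'two/CD runs/NNS',
]);

for my $word (sort keys %$tags_for) {
    print join("\t", $word, join(' ', sort keys %{ $tags_for->{$word} })), "\n";
}
```

Capturing both parts with a single regex avoids keeping two parallel arrays in step, at the cost of differing slightly from the post's approach.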
Removing English letters, digits, and ? from a document
open(In,"noun.txt"); open(out,">noun_2.txt");while(<In>){ chomp; @arr=split(" ",$_); $m=$arr[0]; $n=$arr[1]; if($m=~/[\x{4e00}-\x{9fa5}A-Za-z]{2,}/){ } else{ $m=~/^[^a-z]/i; $m=~/^[\x{4e00}-\x{9fa5}]{ …
Original · 2017-07-02 14:22:03 · 193 views · 0 comments
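One detail worth spelling out: Perl regexes write a code point as `\x{4e00}`, not `\u4e00` (in Perl, `\u` uppercases the next character), and the input has to be decoded to characters for the range to match. Below is a minimal sketch of the filter, assuming the aim is to keep only entries whose first field is made up entirely of two or more Chinese characters. The helper name `is_chinese_word` and the sample lines are invented.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 literals

# True when $word consists solely of two or more CJK ideographs.
sub is_chinese_word {
    my ($word) = @_;
    return $word =~ /\A[\x{4e00}-\x{9fa5}]{2,}\z/ ? 1 : 0;
}

# Invented "word count" lines standing in for noun.txt.
my @lines = ('苹果 12', 'abc 3', '苹果2 5', '? 1');

# Keep a line only if its first field passes the check.
my @kept = grep { is_chinese_word((split ' ', $_)[0]) } @lines;

binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @kept;
```

When reading a real file, opening it with an encoding layer such as `open(my $in, '<:encoding(UTF-8)', 'noun.txt')` gives character strings that this regex can match.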
Merging files
ReadDict("202_112_194_59_601_n{count_10}.txt");ReadDict("202_112_194_59_602_n{count_10}.txt");ReadDict("202_112_194_59_603_n{count_10}.txt");ReadDict("202_112_194_59_604_n{count_10}.txt");ReadDict …
Original · 2017-07-02 14:29:13 · 239 views · 0 comments
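The body of `ReadDict` is not shown in the preview, so the following is only a guess at what such a routine might do. It assumes each file holds one whitespace-separated `word count` pair per line and that merging means summing the counts per word. The sub `read_dict` is a hypothetical stand-in, and in-memory strings replace the real files so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ReadDict: add the counts read from $fh to %$totals.
# Assumes one "word count" pair per line, separated by whitespace.
sub read_dict {
    my ($fh, $totals) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($word, $count) = split ' ', $line;
        next unless defined $count;    # skip blank or malformed lines
        $totals->{$word} += $count;
    }
    return;
}

# Invented file contents; the post reads the 202_112_194_59_*.txt files instead.
my @samples = ("apple 3\npear 1\n", "apple 2\nplum 5\n");

my %totals;
for my $text (@samples) {
    # Opening a reference to a scalar gives an in-memory filehandle.
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    read_dict($fh, \%totals);
    close($fh);
}

print "$_ $totals{$_}\n" for sort keys %totals;
```

Passing a filehandle instead of a file name keeps the routine easy to test. A wrapper that opens each named file and calls it would recover the one-call-per-file style of the post.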