酒店评论预处理程序

def corpus_content(corpus_path, seg_path):
    catelist = os.listdir(corpus_path)  # 获取corpus_path下所有的子目录
    print("catelist", catelist)
    with open(seg_path, 'w', encoding="UTF-8") as fw:
        # 获取每个目录下的所有子文件
        for mydir in catelist:
            print(mydir)
            if mydir == "neg":
                label = -1
            else:
                label = 1
            class_path = corpus_path + mydir + "/"  # 拼出分类子目录的路径如:6000/neg/

            file_list = os.listdir(class_path)  # 获取未分词语料库中某一类别中的所有文本

            for file_path in file_list:  # 遍历类别目录下的所有文件
                filedir = class_path + file_path  # # 拼出文件名全路径如:train_corpus/art/21.txt
                content = readfile(filedir).replace(' ','').replace('\n', '')
                label_content = str(label)+" "+content+"\n"
                fw.write(label_content)

评论提供了有关酒店的大量信息。这个数据可用于许多nlp项目:推荐系统,情绪分析,同类酒店的图网,基于评论酒店细分。该数据集包含25个城市的酒店列表和评论。 file/opensearch/documents/92885/hotelReviewsInAustin__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInBali__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInBangkok__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInBarcelona__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInBombay__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInChicago__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInDubai__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInHong Kong__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInIstanbul__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInLondon__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInMiami__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInMilan__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInNew York__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInOsaka__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInParis__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInPhuket__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInPrague__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInRome__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInSan Francisco__en2019100120191005.csv file/opensearch/documents/92885/hotelReviewsInSantorini__en2019100120191005.csv file/opense
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值