PySpark: jieba.analyse fails with IOError: [Errno 20] Not a directory

Problem:

After packaging jieba as a zip and uploading it to Spark, running TF-IDF from the jieba.analyse package fails with:

IOError: [Errno 20] Not a directory: 'XXXX/jieba.zip/jieba/analyse/idf.txt'
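
The error occurs because jieba.zip is a regular file, not a directory, so a plain open() on a path that passes through it cannot succeed. The sketch below reproduces this with a throwaway archive and shows that the same member can still be read through the zipfile module. The names demo_pkg.zip, read_from_zip and demo are hypothetical and used only for illustration; this is not jieba's own code.

```python
import os
import tempfile
import zipfile


def read_from_zip(zip_path, inner_name):
    """Read one member of a zip archive as bytes (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(inner_name)


def demo():
    # Build a small archive that mimics the layout of a zipped package.
    tmp_dir = tempfile.mkdtemp()
    zip_path = os.path.join(tmp_dir, "demo_pkg.zip")
    inner_name = "demo_pkg/analyse/idf.txt"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(inner_name, "word 1.5\n")

    # Treating the archive as a directory fails: the zip is a plain file.
    # On Linux this is errno 20 (ENOTDIR), the error reported above.
    try:
        open(os.path.join(zip_path, inner_name), "rb").close()
        plain_open_failed = False
    except OSError:
        plain_open_failed = True

    # Reading the member through the archive API works.
    data = read_from_zip(zip_path, inner_name)
    return plain_open_failed, data


if __name__ == "__main__":
    print(demo())
```

Any fix therefore has to load idf.txt through a mechanism that understands archives instead of building a filesystem path into the zip.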

Solution:

Modify tfidf.py in the analyse package as follows (code adapted from https://github.com/fxsjy/jieba/pull/539/files):

# encoding=utf-8
from __future__ import absolute_import
import os
import jieba
import jieba.posseg
from operator import itemgetter
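# get_module_res opens a packaged resource as a stream, so it can also
# read idf.txt when jieba is imported from a zip archive.
from .._compat import get_module_res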

_get_abs_path = jieba._get_abs_path

DEFAULT_IDF = "analyse/idf.txt"


class KeywordExtractor(object):

    STOP_WORDS = set((
        "the", "of", "is", "and", "to", "in", "that", "we", "for", "an", "are",
        "by", "be", "as", "on", "with", "can", "if", "from"