Original: A Python implementation of simhash
    import hashlib
    def hash_str(s):
        md5 = hashlib.md5()
        md5.update(s)
        res = int(md5.hexdigest()[:16], base=16)
        return bin(res)[2:].zfill(64)
    def simhash(words, weights):
        words = ma
2017-03-23 23:04:33 · 1582 views
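The preview above is cut off, so the body of `simhash` is not recoverable from this page. As a rough illustration of the general technique only, here is a minimal Python 3 sketch of weighted simhash. It reuses the `hash_str` idea from the excerpt (first 64 bits of an MD5 digest); the voting loop and the `hamming_distance` helper are assumptions, not the author's code.

```python
import hashlib


def hash_str(s):
    """Return the first 64 bits of the MD5 digest of s as a 64-character bit string."""
    digest = hashlib.md5(s.encode("utf-8")).hexdigest()
    return bin(int(digest[:16], 16))[2:].zfill(64)


def simhash(words, weights):
    """Weighted simhash: each word votes +weight or -weight on every bit position."""
    totals = [0] * 64
    for word, weight in zip(words, weights):
        for i, bit in enumerate(hash_str(word)):
            totals[i] += weight if bit == "1" else -weight
    # A bit is set when the weighted votes for it are positive.
    return "".join("1" if t > 0 else "0" for t in totals)


def hamming_distance(a, b):
    """Number of differing bit positions (hypothetical helper for comparing fingerprints)."""
    return sum(x != y for x, y in zip(a, b))
```

Two documents are then usually treated as near-duplicates when the Hamming distance between their fingerprints falls below a small threshold chosen for the data at hand.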
Original: Automatic summary extraction in Python with TextRank
    # encoding=utf-8
    import jieba
    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer, TfidfTransformer
    def cut_sentence(sentence):
        """
        Split text into sentences
        :param sentence:
2017-03-10 22:29:27 · 7711 views · 9 comments
Original: Chinese sentence splitting in Python
    # -*-coding=UTF-8-*-
    def cut_sentences(sentence):
        if not isinstance(sentence, unicode):
            sentence = unicode(sentence)
        puns = frozenset(u'。!?')
        tmp = []
        for ch in sentence:
2017-03-10 18:46:32 · 5850 views · 2 comments
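The excerpt stops inside the loop, so what follows is only a sketch of the same character-scanning idea in Python 3, not the original function. The default terminator set (full-width and ASCII sentence-ending punctuation) and the handling of a trailing fragment are assumptions.

```python
def cut_sentences(text, terminators=u"。！？!?"):
    """Split text into sentences, keeping the terminating punctuation on each one."""
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in terminators:
            sentences.append("".join(current))
            current = []
    # Keep any trailing text that has no closing punctuation.
    if current:
        sentences.append("".join(current))
    return sentences
```

For example, `cut_sentences(u"今天天气好。你呢？我很好")` yields three pieces, the last of which has no terminator.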
Original: The TextRank algorithm
    # -*-coding=UTF-8-*-
    import networkx
    from nltk.tokenize.punkt import PunktSentenceTokenizer
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    document = """To Sherlock Hol
2017-03-10 15:56:43 · 1211 views
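The post's imports suggest it builds a sentence graph with networkx, nltk, and scikit-learn, but the preview ends before any of that logic. The sketch below illustrates the TextRank idea using the standard library only, which is a deliberate substitution: bag-of-words cosine similarity stands in for the TF-IDF features, and a hand-written power iteration stands in for a graph library's PageRank. All function names here are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to the edge weight.
            rank = sum(sim[j][i] / out_weight[j] * scores[j]
                       for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores
```

A summary can then be formed by taking the highest-scoring sentences and presenting them in their original order.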
Original: Building an index in bulk with Python
    # encoding=utf-8
    import elasticsearch.helpers
    from elasticsearch import Elasticsearch
    path = '/home/fhqplzj/data/orion/news.json'
    es = Elasticsearch('localhost:9200')
    my_index, my_type = 'test_index'
2017-03-04 17:17:50 · 1298 views
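The preview is truncated before any indexing happens, and the layout of `news.json` is not shown. As a hedged sketch, the code below assumes one JSON document per line and feeds a generator of actions to `elasticsearch.helpers.bulk`. The function names, the `host` default, and the file layout are all assumptions, and no document type is set because the original value is cut off.

```python
import io
import json


def iter_actions(lines, index_name):
    """Yield one bulk action per non-empty line, assuming each line is a JSON document."""
    for line in lines:
        line = line.strip()
        if line:
            yield {"_index": index_name, "_source": json.loads(line)}


def bulk_index(path, index_name, host="http://localhost:9200"):
    """Stream a JSON-lines file into Elasticsearch using the bulk helper."""
    # Imported lazily so iter_actions can be used without the client installed.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(host)
    with io.open(path, encoding="utf-8") as f:
        # bulk() consumes the generator and sends the actions in chunks.
        return bulk(es, iter_actions(f, index_name))
```

Using a generator keeps memory use flat for large files, since documents are read and sent in chunks instead of being loaded all at once.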
Original: simhash implementation
    import com.clearspring.analytics.hash.MurmurHash
    /**
     * Created by fhqplzj on 17-3-1 at 6:07 PM.
     */
    object Sim {
      def simHash(features: Array[String], weights: Array[Int]): Long = {
        val hist =
2017-03-01 18:19:42 · 1121 views