nlp
sinat_24395003
First learn to use the wheel, then learn how wheels are made, then build your own.
Crude filtering of similar texts using the TF in TF-IDF, part 2 (computational performance optimization)
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import linear_kernel
    import numpy as np
    all_list = ['大雨預報1:16pm:大雨正影響台北東部,市民應提高警覺', '大雨預報1:02pm:大雨正影響台北東部,市民應提高警覺', '大雨預報12:35pm:大雨正影響台北東部,市民應提高警覺', '大雨預報3:46pm:未.
Original · 2020-09-18 13:11:08 · 272 views · 0 comments
Crude filtering of similar texts using the TF in TF-IDF
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import linear_kernel
    import numpy as np
    all_list = ['大雨預報1:16pm:大雨正影響台北東部,市民應提高警覺', '大雨預報1:02pm:大雨正影響台北東部,市民應提高警覺', '大雨預報12:35pm:大雨正影響台北東部,市民應提高警覺', '大雨預報3:46pm:未.
Original · 2020-09-17 14:55:07 · 183 views · 0 comments
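The preview above is cut off before the filtering step, so the author's actual procedure is not visible here. As a rough illustration of the general idea only (near-duplicate texts have nearly identical term-frequency vectors), the following sketch uses just the standard library instead of `TfidfVectorizer` and `linear_kernel`. The function names, the per-character tokenization, and the `0.9` threshold are all assumptions made for this example.

```python
import math
from collections import Counter


def char_tf(text):
    """Term-frequency vector over single characters (assumed tokenization)."""
    return Counter(text)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_similar(texts, threshold=0.9):
    """Keep a text only if it is below the threshold against every kept text."""
    kept, kept_vecs = [], []
    for text in texts:
        vec = char_tf(text)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```

Calling `filter_similar(all_list)` on a list like the one in the preview would keep the first of each group of near-identical warnings; the greedy pass compares each text only against texts already kept, which is one simple way to avoid a full pairwise matrix.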
Building a sampler for imbalanced samples: Imbalanced Dataset Sampler
    from fastNLP.io import SST2Pipe
    from fastNLP import DataSetIter
    from torchsampler import ImbalancedDatasetSampler
    pipe = SST2Pipe()
    databundle = pipe.process_from_file()
    vocab = databundle.vocabs['words']
    print(databundle)
    print(databundle.datasets['train.
Original · 2020-08-12 10:06:51 · 1392 views · 4 comments
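The preview does not show how the sampler is wired into the data loader. A common way to rebalance classes, shown here as a library-free sketch rather than as the `torchsampler` implementation, is to draw each sample with probability inversely proportional to the frequency of its label. The function names and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement so every class is equally likely overall."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)
```

With these weights, the total weight of every class is the same, so a rare class is drawn about as often as a frequent one.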
Understanding the pad mechanism in gluonnlp.data.batchify
    import math
    import mxnet as mx
    import numpy as np
    import warnings
    def _pad_arrs_to_max_length(arrs, pad_axis, pad_val, use_shared_mem, dtype, round_to=None):
        """Inner Implementation of the Pad batchify
        Pads each array in the [arr, arr] list so that its size along the given dimension equals the maximum size along that dimension
        Parameters
        .
Original · 2020-06-09 21:38:06 · 373 views · 0 comments
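The docstring above describes padding every array in a batch up to the largest size along one axis. The following is a minimal pure-Python sketch of that idea for 1-D sequences only; it is not the gluonnlp code. The function name is hypothetical, and the treatment of `round_to` (rounding the target length up to a multiple) is an assumption based on the parameter name in the signature.

```python
def pad_to_max_length(seqs, pad_val=0, round_to=None):
    """Pad each 1-D sequence with pad_val up to the longest length in the batch.

    If round_to is given, the target length is rounded up to a multiple of it
    (assumed semantics). Returns the padded batch and the original lengths.
    """
    lengths = [len(s) for s in seqs]
    max_len = max(lengths, default=0)
    if round_to is not None:
        # Ceiling division, then scale back up to a multiple of round_to.
        max_len = round_to * -(-max_len // round_to)
    padded = [list(s) + [pad_val] * (max_len - len(s)) for s in seqs]
    return padded, lengths
```

Returning the original lengths alongside the padded batch lets downstream code mask out the padding positions.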
Implementing CBOW in PyTorch
Code source, with a few Chinese annotations added, purely for study: https://github.com/joosthub/PyTorchNLPBook/blob/master/chapters/chapter_5/5_2_CBOW/5_2_Continuous_Bag_of_Words_CBOW.ipynb
    import json
    import os
    from argparse import Namespace
    from tq...
Original · 2019-12-19 18:58:00 · 1236 views · 0 comments
Generating Chapter 40 of The Legend of the Condor Heroes with a 3-layer LSTM + word2vec + RNN API + beam search
Reference: https://github.com/PacktPublishing/Natural-Language-Processing-with-TensorFlow/blob/master/ch8/lstm_word2vec_rnn_api.ipynb
embedding.py
    import re
    import os
    import numpy as np
    import collections
    im...
Original · 2019-07-04 17:02:14 · 701 views · 0 comments
Hand-writing a minimal LSTM that uses beam search to generate Chapter 40 of The Legend of the Condor Heroes
    # Reference URL: https://github.com/PacktPublishing/Natural-Language-Processing-with-TensorFlow/blob/master/ch8/lstms_for_text_generation.ipynb
    import re
    import collections
    import numpy as np
    import tens...
Original · 2019-06-29 14:31:04 · 611 views · 0 comments
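The two text-generation posts above name beam search but the previews stop before any decoding code. As a generic, model-agnostic sketch (not the code from either post or from the referenced notebooks), beam search keeps the `beam_width` highest-scoring partial sequences at each step instead of only the single best one. The `next_log_probs` callback and the toy probability table are hypothetical stand-ins for an LSTM's output distribution.

```python
import math


def beam_search(next_log_probs, start, steps, beam_width=3):
    """Keep the beam_width best sequences by cumulative log-probability.

    next_log_probs(seq) must return a dict {token: log_probability}.
    Returns a list of (sequence, score) pairs, best first.
    """
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        if not candidates:
            break  # no beam can be extended any further
        # Highest cumulative log-probability first.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Toy next-token table (made up): greedy decoding would commit to "a" first.
TOY_TABLE = {
    "s": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}


def toy_next_log_probs(seq):
    """Look up the distribution for the last token and convert to log space."""
    return {tok: math.log(p) for tok, p in TOY_TABLE.get(seq[-1], {}).items()}
```

In the toy table, a width-1 beam (greedy decoding) commits to the locally best first token, while a width-2 beam can recover a sequence with a higher overall probability; summing log-probabilities rather than multiplying probabilities avoids underflow on long sequences.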