Coursera NLP Specialization Notes
Text Summarization with Transformers
1. Data preparation

import sys
import os
import numpy as np
import textwrap
wrapper = textwrap.TextWrapper(width=70)
import trax
from trax import layers as tl
from trax.fastmath import numpy as jnp
# to print the entire np array
np.set_printoptions(threshold=sys.…

Original post · 2020-10-27 22:37:40 · 1442 views · 0 comments
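The excerpt above sets up a textwrap.TextWrapper so long printed strings stay readable; a minimal standalone illustration of what that wrapper does:

```python
import textwrap

# wrap long output at 70 characters per line, as in the excerpt above
wrapper = textwrap.TextWrapper(width=70)
sentence = ("word " * 30).strip()          # a long single-line string
lines = wrapper.wrap(sentence)             # list of lines, each <= 70 chars
print(len(lines), max(len(line) for line in lines))
```

wrap returns a list of lines; fill would return them already joined with newlines.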
BLEU and ROUGE
1. BLEU

reference = "The NASA Opportunity rover is battling a massive dust storm on planet Mars."
candidate_1 = "The Opportunity rover is combating a big sandstorm on planet Mars."
candidate_2 = "A NASA rover is fighting a massive storm on planet Mars."
…

Original post · 2020-10-27 17:23:59 · 379 views · 0 comments
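As a rough sketch of what BLEU measures, here is the clipped unigram-precision part, run on the example sentences above (real BLEU also uses higher-order n-grams and a brevity penalty; this shows only the unigram term):

```python
from collections import Counter

def unigram_precision(reference, candidate):
    """Clipped unigram precision: each candidate word is credited at most
    as many times as it appears in the reference."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(count, ref_counts[word])
                  for word, count in cand_counts.items())
    return overlap / sum(cand_counts.values())

reference = "The NASA Opportunity rover is battling a massive dust storm on planet Mars."
candidate_2 = "A NASA rover is fighting a massive storm on planet Mars."
score = unigram_precision(reference, candidate_2)
```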
NMT with Attention
1. Data preparation

termcolor.colored dyes the output to make it stand out, e.g. colored(f"tokenize('hello'): ", 'green')

from termcolor import colored
import random
import numpy as np
import trax
from trax import layers as tl
from trax.fastmath import numpy as fastnp
from trax.supervised impor…

Original post · 2020-10-27 16:54:00 · 1094 views · 1 comment
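The core of the attention mechanism this post builds can be sketched in plain numpy (a hypothetical minimal version for illustration; the post itself uses trax layers):

```python
import numpy as np

def dot_product_attention(query, key, value):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)                    # query-key similarity
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights

q = np.array([[1.0, 0.0]])                 # one query vector
k = np.array([[1.0, 0.0], [0.0, 1.0]])     # two keys
v = np.array([[10.0, 0.0], [0.0, 10.0]])   # two values
out, w = dot_product_attention(q, k, v)
```

Because the query matches the first key more closely, the first value gets the larger weight.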
POS Tagging with an HMM
1. Data preparation

from utils_pos import get_word_tag, preprocess
import pandas as pd
from collections import defaultdict
import math
import numpy as np
with open("WSJ_02-21.pos", 'r') as f:
    training_corpus = f.readlines()
with open("hmm_vocab.txt", 'r') as…

Original post · 2020-10-15 23:12:19 · 843 views · 0 comments
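A hedged sketch of the counting step an HMM tagger builds on: estimating transition probabilities P(tag_i | tag_{i-1}) from a tag sequence (the tags below are made up for illustration, not read from the WSJ corpus):

```python
from collections import defaultdict

def transition_probs(tag_sequence):
    """Count tag bigrams, then divide each count by the previous tag's
    total to estimate P(current tag | previous tag)."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for prev, curr in zip(tag_sequence, tag_sequence[1:]):
        counts[(prev, curr)] += 1
        totals[prev] += 1
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

tags = ["DT", "NN", "VB", "DT", "JJ"]
probs = transition_probs(tags)
```

The real assignment also smooths these counts and pairs them with an emission matrix before running Viterbi.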
Autocomplete
1. Data preparation

import re
from collections import Counter
import numpy as np
import pandas as pd

def process_data(file_name):
    """
    Input:
        A file_name which is found in your current directory. You just have to read it in.
    Output:
        wor…

Original post · 2020-10-15 22:27:47 · 285 views · 0 comments
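This assignment builds a word-frequency model and generates candidate words one edit away; a minimal sketch of two of the edit operations (deletes and inserts only — the full assignment also covers switches and replaces):

```python
def edit_one_deletion(word):
    """All strings obtainable by deleting exactly one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def edit_one_insertion(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings obtainable by inserting exactly one character."""
    return {word[:i] + c + word[i:]
            for i in range(len(word) + 1) for c in alphabet}

deletes = edit_one_deletion("cat")
inserts = edit_one_insertion("cat")
```

Candidates are then ranked by their frequency in the corpus that process_data reads in.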
Machine Translation (English to French) & LSH
1. Preprocessing and data acquisition

import pdb
import pickle
import string
import time
import gensim
import matplotlib.pyplot as plt
import nltk
import numpy as np
import scipy
import sklearn
from gensim.models import KeyedVectors
from nltk.corpus import stopwords, twitter_sam…

Original post · 2020-09-28 23:25:07 · 699 views · 0 comments
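A sketch of the locality-sensitive hashing idea this post applies to word embeddings: hash a vector by which side of several random hyperplanes it falls on, so nearby vectors tend to share a bucket (the dimensions and plane count below are illustrative):

```python
import numpy as np

def hash_vector(v, planes):
    """Each random hyperplane contributes one bit: 1 if the vector lies on
    its positive side. The bits combine into a single bucket id."""
    signs = (planes @ v >= 0).astype(int)     # one sign bit per plane
    return int(sum(bit * (2 ** i) for i, bit in enumerate(signs)))

rng = np.random.default_rng(0)
planes = rng.standard_normal((5, 3))          # 5 hyperplanes in 3-d space
v = rng.standard_normal(3)
bucket = hash_vector(v, planes)
```

Scaling a vector by a positive constant does not change which side of each plane it is on, so it lands in the same bucket.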
Hash Tables
1. Single-value hash table

def basic_hash_table(value_l, n_buckets):
    def hash_function(value, n_buckets):
        return int(value) % n_buckets

    hash_table = {i: [] for i in range(n_buckets)}  # Initialize all the buckets in the hash table as empty l…

Original post · 2020-09-28 19:11:28 · 121 views · 0 comments
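The excerpt is cut off mid-line; a complete, runnable version of the same single-value hash table sketch:

```python
def basic_hash_table(value_l, n_buckets):
    """Place each value into the bucket given by value mod n_buckets."""
    def hash_function(value, n_buckets):
        return int(value) % n_buckets

    # initialize all the buckets in the hash table as empty lists
    hash_table = {i: [] for i in range(n_buckets)}
    for value in value_l:
        hash_table[hash_function(value, n_buckets)].append(value)
    return hash_table

table = basic_hash_table([100, 10, 14, 17, 97], 10)
```

With 10 buckets the hash is just the last digit, so 100 and 10 collide in bucket 0 and 17 and 97 collide in bucket 7.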
Dimensionality Reduction with PCA
def compute_pca(X, n_components=2):
    """
    Input:
        X: of dimension (m,n) where each row corresponds to a word vector
        n_components: Number of components you want to keep.
    Output:
        X_reduced: data transformed in 2 dims/columns.…

Original post · 2020-09-28 18:34:34 · 133 views · 0 comments
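The excerpt shows only the docstring; a hedged completion of compute_pca via eigendecomposition of the covariance matrix (one standard way to implement it — the original post may differ in detail):

```python
import numpy as np

def compute_pca(X, n_components=2):
    """Project the rows of X onto the top n_components principal directions."""
    X_centered = X - np.mean(X, axis=0)        # de-mean each column
    cov = np.cov(X_centered, rowvar=False)     # (n, n) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    return X_centered @ eigvecs[:, order]

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
X_reduced = compute_pca(X, n_components=2)
```

The projected components are uncorrelated, and the first carries at least as much variance as the second.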
Naive Bayes for Sentiment Classification
1. Dataset acquisition and preprocessing, using the same approach as in the logistic regression post

from utils import process_tweet, lookup
import pdb
from nltk.corpus import stopwords, twitter_samples
import numpy as np
import pandas as pd
import nltk
import string
from nltk.tokenize import TweetTokenizer
from os import getc…

Original post · 2020-09-28 17:05:51 · 359 views · 0 comments
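The heart of the Naive Bayes classifier this post builds is a per-word log-likelihood ratio with Laplace (add-one) smoothing; a small self-contained sketch (the tiny corpus below is made up for illustration):

```python
import math
from collections import Counter

def train_naive_bayes(pos_docs, neg_docs):
    """Return log P(w|pos) - log P(w|neg) per word, with add-one smoothing."""
    pos_counts = Counter(w for d in pos_docs for w in d.split())
    neg_counts = Counter(w for d in neg_docs for w in d.split())
    vocab = set(pos_counts) | set(neg_counts)
    n_pos, n_neg, v = sum(pos_counts.values()), sum(neg_counts.values()), len(vocab)
    return {w: math.log((pos_counts[w] + 1) / (n_pos + v))
             - math.log((neg_counts[w] + 1) / (n_neg + v))
            for w in vocab}

loglik = train_naive_bayes(["great movie", "great fun"],
                           ["bad movie", "boring plot"])
```

A tweet is scored by summing the log-ratios of its words; positive words get positive scores, words seen equally in both classes score zero.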
Logistic Regression for Sentiment Classification
1. Dataset preparation
2. Text preprocessing
- extract the text with regular expressions
- tokenize with nltk.tokenize
- remove punctuation and stopwords
- stemming to reduce each word to its stem
3. Feature extraction
- build a word-frequency dictionary
- use it to build a feature vector x = [bias, pos, neg] for each tweet
4. Logistic regression model

def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
    …

Original post · 2020-09-28 16:10:16 · 533 views · 0 comments
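The excerpt cuts off inside gradientDescent; a hedged completion of the usual vectorized version, with toy data matching the x = [bias, pos, neg] feature layout described above (the original post may return the cost as well):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradientDescent(x, y, theta, alpha, num_iters):
    """Batch gradient descent for logistic regression.
    x: (m, n) features, y: (m, 1) labels, theta: (n, 1) weights."""
    m = x.shape[0]
    for _ in range(num_iters):
        h = sigmoid(x @ theta)                      # predicted probabilities
        theta = theta - (alpha / m) * (x.T @ (h - y))  # gradient step
    return theta

# toy tweets as [bias, positive-word count, negative-word count]
x = np.array([[1.0, 3.0, 0.0], [1.0, 0.0, 3.0],
              [1.0, 2.0, 0.0], [1.0, 0.0, 2.0]])
y = np.array([[1.0], [0.0], [1.0], [0.0]])
theta = gradientDescent(x, y, np.zeros((3, 1)), alpha=0.1, num_iters=2000)
preds = sigmoid(x @ theta) > 0.5
```

After training, the weight on the positive-count feature is positive and the weight on the negative-count feature is negative, as expected.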