I've Thought About a Lot of Things - CSDN Blog

Original: Transformer Source Code Analysis

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import math, copy, time
from torch.autograd import Variable
import matplotlib.pyplot as plt
import seaborn
seabor...

2020-01-05 10:05:26 323

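Autoregressive language model (Autoregressive LM): A language model predicts the next likely word from the preceding context, which is the familiar left-to-right language modeling task. It can also run the other way, predicting earlier words from the context that follows. This type of LM is called an autoregressive language model. GPT is a typical autoregressive language model. ELMo appears to use both the preceding and the following context, but it is still essentially an autoregressive LM, which comes down to how the model is implemented. ELMo runs two directions (left to right and from right...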

2019-10-13 12:46:25 325
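
The left-to-right prediction described in the excerpt above can be sketched with a toy bigram model. This is only a minimal illustration and not how GPT or ELMo is implemented: the function names, the add-one smoothing, the tiny corpus, and the `<s>`/`</s>` boundary tokens are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_prob(counts, prev, cur, vocab_size):
    """P(cur | prev) with add-one smoothing (an assumption of this sketch)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + 1) / (total + vocab_size)


def sentence_log_prob(counts, sent, vocab_size):
    """Autoregressive factorization: sum of log P(w_t | w_{t-1}), left to right."""
    tokens = ["<s>"] + sent + ["</s>"]
    return sum(
        math.log(next_word_prob(counts, prev, cur, vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = {w for sent in corpus for w in sent} | {"</s>"}
counts = train_bigram(corpus)

seen = sentence_log_prob(counts, ["the", "cat", "sat"], len(vocab))
shuffled = sentence_log_prob(counts, ["sat", "cat", "the"], len(vocab))
```

A word order that matches the training data scores higher than a shuffled one, because every factor conditions only on what came before it. A real autoregressive model would condition on the whole prefix rather than on a single previous word.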

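Original: Chinese NER Using Lattice LSTM

This paper studies a lattice-structured LSTM model for Chinese NER, obtained by improving on character-based sequence labeling models. For each character cell vector in the character-based model, a word-vector input gate brings in the word cell vectors of all lexicon words that end at the current character, producing a new character cell vector; the new charact...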

2019-08-18 20:12:04 280

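Original: Using a Dictionary for Named Entity Recognition

1. This paper carries out the NER task when only a dictionary is available, and runs comparison experiments against labeled data. 2. It is a method for performing NER with unlabeled data and a named entity dictionary. The authors formulate the task as a positive-unlabeled (PU) learning problem and propose a PU learning algorithm for it. A key feature of the method is that it does not require the dictionary to label every entity in a sentence, or even every word that makes up an entity, which greatly lowers the demands on dictionary quality. Finally, the paper, on four...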

2019-08-11 09:23:17 1380 2

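Original: QA Problems

1. The paper uses a Document Retriever plus a Document Reader. The Document Retriever's job is to pull relevant documents or passages from Wikipedia, and the Document Reader then performs reading comprehension on them. 2. The Document Retriever uses TF-IDF and the Document Reader uses a Bi-LSTM. 3. The Document Retriever retrieves...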

2019-07-28 11:30:48 527

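Original: Reading Wikipedia to Answer Open-Domain Questions

This is a paper published at ACL 2017. (1) Document Retriever: a search component based on bigram hashing and TF-IDF matching that efficiently returns relevant documents for a given question. (2) Document Reader: a multi-layer RNN machine reading comprehension model that locates the answer within the documents returned by (1). (3) The Document Retriever combines TF-IDF-weighted bag-of-words vectors...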

2019-07-19 22:47:30 233
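
The retriever idea in the excerpt above (bigram hashing plus TF-IDF matching) can be sketched in a few lines of standard-library Python. This is a rough sketch under stated assumptions and not the paper's implementation: the bucket count `NUM_BUCKETS`, the use of `zlib.crc32` as the hash, the smoothed IDF formula, and all function names are choices made for this example.

```python
import math
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 16  # hypothetical size of the hashed feature space


def hashed_ngrams(text):
    """Map unigrams and bigrams to hash buckets and count them."""
    tokens = text.lower().split()
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams)


def build_index(docs):
    """Build one sparse TF-IDF vector per document."""
    doc_counts = [hashed_ngrams(d) for d in docs]
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())
    n = len(docs)
    # Smoothed IDF (an assumed weighting): always positive.
    idf = {b: math.log((n + 1) / (c + 1)) + 1.0 for b, c in df.items()}
    vectors = [
        {b: math.log1p(tf) * idf[b] for b, tf in counts.items()}
        for counts in doc_counts
    ]
    return vectors, idf


def retrieve(query, vectors, idf, k=1):
    """Return the indices of the k documents with the highest dot product."""
    q = {b: math.log1p(tf) * idf.get(b, 0.0) for b, tf in hashed_ngrams(query).items()}
    scores = [sum(w * vec.get(b, 0.0) for b, w in q.items()) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical mini-corpus, for illustration only.
docs = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
]
vectors, idf = build_index(docs)
top = retrieve("capital of france", vectors, idf, k=1)
```

Hashing the n-grams keeps the feature space at a fixed size no matter how many distinct bigrams the corpus contains, at the cost of occasional collisions between unrelated n-grams.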

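Original: Summary of QA Problems

I have been reading about QA tasks recently; these are my takeaways from "LSTM-based Deep Learning Models for Non-factoid Answer Selection". 1. The paper applies a general deep learning framework to the answer selection task, one that does not depend on manually defined features or linguistic tools. The basic framework builds embeddings of questions and answers with a BiLSTM model and measures how similar they are by cosine similarity. 2. The paper makes several improvements on the general framework: by ... convolutional neural...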

2019-07-13 09:57:37 2313
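
The scoring step in the excerpt above, comparing a question embedding with answer embeddings by cosine similarity, might look like the sketch below. The vectors here are hand-written stand-ins for what a BiLSTM encoder would produce, and the names `cosine_similarity` and `rank_answers` are assumptions of this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_answers(question_vec, answer_vecs):
    """Sort candidate answer ids from most to least similar to the question."""
    return sorted(
        answer_vecs,
        key=lambda name: cosine_similarity(question_vec, answer_vecs[name]),
        reverse=True,
    )


# Hypothetical embeddings; a real system would get these from the encoder.
question = [1.0, 0.0]
candidates = {
    "a_same_direction": [2.0, 0.0],
    "a_orthogonal": [0.0, 3.0],
    "a_opposite": [-1.0, 0.0],
}
ranking = rank_answers(question, candidates)
```

Because cosine similarity ignores vector length, `a_same_direction` scores 1.0 even though it is twice as long as the question vector.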

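Original: BERT Summary

1. BERT is a pretrained model meant for use in downstream tasks. To explain what a pretrained model is: suppose we already have training set A. We first pretrain the network on A, learn the network parameters on task A, and save them for later use. When a new task B arrives, we use the same network structure; at initialization the network can load the parameters learned on A, while the other higher-layer parameters are randomly initialized, and the network is then trained on task B's training data. When the loaded parameters stay unchanged, this is called "frozen"; when the loaded parameters keep changing as training on task B proceeds...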

2019-06-29 11:12:34 520
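
The difference between keeping the loaded parameters fixed ("frozen") and letting them keep updating during training on task B (the case commonly called fine-tuning) can be shown with a toy update step. This is a sketch with made-up numbers, not BERT code: the parameter names, the gradients, and the learning rate are hypothetical, and the new `head` parameter starts at 0.0 instead of a random value only to keep the example deterministic.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD update, leaving parameters marked as frozen untouched."""
    return {
        name: value - lr * grads[name] if trainable[name] else value
        for name, value in params.items()
    }


# "encoder" stands for parameters loaded from pretraining on task A;
# "head" stands for a new higher layer added for task B.
pretrained = {"encoder": 0.8}
params = {**pretrained, "head": 0.0}
grads = {"encoder": 0.5, "head": -1.0}  # hypothetical gradients from task B

# Frozen: only the new head is updated.
frozen = sgd_step(params, grads, {"encoder": False, "head": True})

# Fine-tuning: the loaded parameters are updated as well.
fine_tuned = sgd_step(params, grads, {"encoder": True, "head": True})
```

In a deep learning framework the same switch is usually a per-parameter flag that tells the optimizer whether to update that parameter.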

Original: Computing with keras_bert

import numpy as np
from keras_bert import load_trained_model_from_checkpoint, Tokenizer
import codecs
import pandas as pd
from keras.layers import *
from keras.models import Model
import keras.backend...

2019-06-27 16:58:16 1383

Original: An ELMo Question

Can the way ELMo concatenates its representations be changed?

2019-06-25 21:48:08 420

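Original: ELMo Summary

Looking back at ELMo, it was a big step forward from word2vec, and now that BERT and XLNet have burst onto the scene, ELMo also serves as a link between what came before and what came after, so here is a summary. 1. ELMo is a new kind of deep contextualized word representation that models complex characteristics of words (such as syntax and semantics) and how words vary across linguistic contexts; it makes use of the hidden states Ht. 2. It is modeled with a bidirectional language model, although simply concatenating the forward and backward directions now looks a bit crude. 3. The forward and backward language models are built from LSTMs...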

2019-06-25 21:45:03 438
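
The two operations mentioned in the summary above, concatenating the forward and backward hidden states and then combining layers into one vector, can be sketched as follows. The ELMo paper combines layers with softmax-normalized weights and a scalar scale; the function names, the tiny vectors, and the equal layer weights below are assumptions made for this illustration.

```python
import math


def concat_directions(forward_state, backward_state):
    """Concatenate the forward and backward hidden states of one token."""
    return forward_state + backward_state


def softmax(logits):
    """Normalize raw layer scores so that they sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_combine(layers, layer_logits, gamma=1.0):
    """Weighted sum over layers: gamma * sum_j softmax(s)_j * h_j."""
    weights = softmax(layer_logits)
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]


# Hypothetical hidden states for one token in a two-layer biLM.
layer1 = concat_directions([1.0], [2.0])  # forward + backward, layer 1
layer2 = concat_directions([3.0], [4.0])  # forward + backward, layer 2
token_vector = elmo_combine([layer1, layer2], layer_logits=[0.0, 0.0])
```

With equal layer scores the result is the plain average of the layers. In a downstream task the layer scores and `gamma` would be learned together with the task model.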

Original: ELMo Debugging Practice

import tensorflow_hub as hub
import tensorflow as tf
import re
import numpy as np
import pickle
import pandas as pd
from nltk import WordNetLemmatizer, word_tokenize
from nltk.corpus import stopwords...

2019-06-25 20:46:13 581 1

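Original: ELMo

I recently revisited the ELMo model. The main points are: 1. Compared with word2vec and similar models, it adds an understanding of context. 2. The basic unit is a two-layer network built on character convolutions. 3. Combinations of the internal states form the new word vector representation. 4. ELMo uses a bidirectional LSTM (bi-LSTM) model and is trained as a language model. From the ELMo formula we can see that the leftward and rightward LSTMs are different, that is, there are two LSTM units. [...] denotes the input. The input...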

2019-06-22 20:53:05 242

Original: NER

import codecs
import random
import numpy as np
from gensim import corpora
from keras.layers import Dense, GRU, Bidirectional, SpatialDropout1D, Embedding
from keras import preprocessing
from keras.models...

2019-06-09 22:48:38 1819

Original: BERT Text Classification: What run_classifier Contains

2019-05-30 21:49:26 1348

Original: Converting PDF to txt

# -*- coding: utf-8 -*-
import sys
#reload(sys)
#sys.setdefaultencoding('utf-8')
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import ...

2019-05-23 12:03:53 352

Original: Converting Excel to txt

import xlrd
import os
import sys

def getListFiles(path):
    ret = []
    for root, dirs, files in os.walk(path):
        for filespath in files:
            if filespath.endswith(".xls"): ...

2019-05-23 12:03:00 774

Original: Cleaning Word Files, docx

import docx
from win32com import client as wc
import re
import os
import os.path

def getListFiles(path):
    ret = []
    for root, dirs, files in os.walk(path):
        for filespath in files: ...

2019-05-23 12:01:47 198

Original: Processing Word Documents

# Example code for reading the text in a docx file
import docx
from win32com import client as wc
import re
import os
import os.path

def getListFiles(path):
    ret = []
    for root, dirs, files in os.walk(path):
        for filespath...

2019-05-23 11:54:26 488

Original: gensim fasttext

from nltk import word_tokenize, WordNetLemmatizer
import pandas as pd
from nltk.corpus import stopwords
import re
from gensim import corpora
import gensim
from gensim.models import word2vec, fasttext
f...

2019-05-22 21:36:53 1237

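Original: Dynamic Programming in Python: the Knapsack Problem

import numpy as np

def bag(weight, values, weight_cont):
    num = len(weight)
    weight.insert(0, 0)
    values.insert(0, 0)
    # np.int was removed in NumPy 1.24; the builtin int is the replacement
    bag = np.zeros((num+1, weight_cont+1), dtype=int)
    for i in range(1,...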

2019-05-21 16:47:55 568
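
Since the excerpt above is cut off, here is a complete 0/1 knapsack sketch in plain Python for reference. It is not the post's code: the function name `knapsack`, the list-of-lists table, and the sample items are assumptions of this example, and it avoids mutating the input lists.

```python
def knapsack(weights, values, capacity):
    """Return the best total value of items that fit within capacity (0/1 knapsack)."""
    n = len(weights)
    # best[i][c] = best value using only the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            # Option 1: skip item i.
            best[i][c] = best[i - 1][c]
            # Option 2: take item i if it fits.
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)
    return best[n][capacity]


# Hypothetical items: weights 1, 3, 4, 5 with values 1, 4, 5, 7.
result = knapsack([1, 3, 4, 5], [1, 4, 5, 7], capacity=7)
```

With capacity 7 the best choice is the items of weight 3 and 4, for a total value of 9. The table has (n + 1) x (capacity + 1) entries, so time and memory both grow with the product of item count and capacity.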

Original: Selection Sort in Python

def select_sort(list):
    len_list = len(list)
    for i in range(len_list):
        min = i
        for j in range(i+1, len_list):
            if list[j] < list[min]:
                min = j ...

2019-05-21 16:46:51 249

Original: Quick Sort in Python

def quick_sort(list):
    list_3 = []
    list_1 = []
    list_2 = []
    if len(list) <= 1:
        return list
    else:
        min = list[0]
        for i in list:
            if i > min: ...

2019-05-21 16:45:55 119

Original: Merge Sort in Python

def merge_sort(list):
    left_p = 0
    right_p = 0
    result = []
    len_list = len(list)
    if len_list <= 1:
        return list
    mid = (len_list//2)
    left_list = merge_sort(list[:mid...

2019-05-21 16:44:42 215

Original: Heap Sort in Python

import heapq
import random

def heapsort(li):
    h = []
    for v in li:
        heapq.heappush(h, v)
    return [heapq.heappop(h) for i in range(len(h))]

if __name__ == '__main__':
    li = [1,2,3,4,...

2019-05-21 16:43:31 203

Original: Bubble Sort in Python

def bubble_sort(list):
    len_list = len(list)
    for i in range(len_list):
        for j in range(i+1, len_list):
            if list[i] > list[j]:
                list[i], list[j] = list[j], list...

2019-05-21 16:42:41 149

Repost: Binary Trees and Traversal Methods in Python

class Node:
    def __init__(self, elem=-1, lchild=None, rchild=None):
        self.elem = elem
        self.lchild = lchild
        self.rchild = rchild

class Tree1:
    def __init__(self, root=None): ...

2019-05-21 16:40:45 134

Original: Naive Bayes and SVM Classification

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import svm
import jieba
from sklearn.feature_extraction.text import CountVectorizer, Tfi...

2019-05-20 14:45:35 1050

Original: word2vec and fasttext in gensim

from nltk import word_tokenize, WordNetLemmatizer
import pandas as pd
from nltk.corpus import stopwords
import re
from gensim import corpora
import gensim
from gensim.models import word2vec, fasttext
f...

2019-05-18 21:23:42 228

Original: New Word Discovery Based on Mutual Information and Left/Right Information Entropy

import re
from collections import Counter
import numpy as np

def ngram_words(file, ngram_cont):
    words = []
    for i in range(1, ngram_cont):
        words.extend([file[j:j+i] for j in range(len(fi...

2019-05-16 17:00:24 2000

Original: Using TF-IDF with gensim and sklearn

from nltk import word_tokenize, WordNetLemmatizer
import pandas as pd
from nltk.corpus import stopwords
import re
from gensim import corpora
from gensim import models
from sklearn.feature_extraction.t...

2019-05-15 22:48:37 936 1

Original: Basic jieba Operations

import jieba
import re
import jieba.analyse
import jieba.posseg as pseg
from collections import Counter

def token(file):
    f = open(r'E:\BaiduNetdiskDownload\cnews\stop_word.txt','r',encoding='utf8...

2019-05-14 21:20:01 1032

Original: Chinese Data Preprocessing

from keras.layers import Dense, BatchNormalization, Bidirectional, CuDNNLSTM, Conv1D, MaxPooling1D, SpatialDropout1D, Dropout, Embedding
import numpy as np
from keras import preprocessing
import jieba
impor...

2019-05-12 21:27:03 827

Original: Kaggle Sentiment Classification, Attention Version

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's seve...

2019-04-24 15:38:04 722

Original: Finding a Cycle in a Linked List

def hasCycle(self, head):
    fast = slow = head
    while slow and fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast: ...

2019-04-23 21:40:22 186
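
The excerpt above uses the fast and slow pointer technique (Floyd's cycle detection) but is cut off, so here is a self-contained sketch of the same idea. The `ListNode` class, the standalone `has_cycle` function, and the sample nodes are assumptions of this example and not the post's code.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def has_cycle(head):
    """Return True if following .next from head ever revisits a node."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next          # moves one step
        fast = fast.next.next     # moves two steps
        if slow is fast:          # the pointers can only meet inside a cycle
            return True
    return False                  # fast reached the end, so there is no cycle


# Build a -> b -> c and then close the loop c -> a.
a, b, c = ListNode(1), ListNode(2), ListNode(3)
a.next, b.next = b, c
straight = has_cycle(a)
c.next = a
looped = has_cycle(a)
```

The check needs no extra memory beyond the two pointers, unlike the alternative of storing every visited node in a set.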

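Original: Reversing a Linked List in Python

I was recently left dumbfounded by interview questions at big tech companies, so I really do need to review and practice data structures more.

def reverseList(self, head):
    cur, prev = head, None
    while cur:
        cur.next, prev, cur = prev, cur, cur.next
    return prev...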

2019-04-23 21:39:43 159

Original: Swapping Adjacent Nodes in a Linked List in Python

def swapPairs(self, head):
    pre, pre.next = self, head
    while pre.next and pre.next.next:
        a = pre.next
        b = a.next
        pre.next, b.next, a.next = b, a, b.next
        pre = a
    ret...

2019-04-21 21:36:43 711 1

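Original: Basic Linked List Operations

I have not reviewed data structures in a long time, have forgotten a lot, and am out of practice, so I need to review more these days.

# Linked list
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

# Singly linked list
class SingleLinkList:
    def __init__(self, node = None):
        s...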

2019-04-14 22:02:55 115

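Original: Kaggle Movie Review Sentiment Classification Competition, Tuned Version

I have been working on the Kaggle sentiment classification competition for a long time, from the initial brute-force version through repeated updates, and the accuracy is finally halfway decent, so I am recording the process here and will try another round of tuning later:

import pandas as pd
from keras.layers import Dense, LSTM, Bidirectional, Embedding, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dropout, Spatia...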

2019-04-05 22:12:21 504

Original: LeetCode Day 2

# Given a 32-bit signed integer, reverse the digits of the integer.
class Solution(object):
    def reverse(self, x):
        """
        :type x: int
        :rtype: int
        """
        list = []
        if x == 0: ...

2019-04-05 22:03:05 105
