赤醒醒 - CSDN Blog

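[Original] Simple English question-answering system (glove.6B word vectors)

1. Read the JSON file into a dict.
# doc = open('train-v2.0.json').readlines() returns a list, which would still have to be converted to a dict
import json
f = open('train1-v2.0.json')
doc = json.load(f)  # json.load() parses the file directly into a dict
2. Put all questions and answers into question...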

2020-05-03 17:03:29 2214 1
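A minimal sketch of step 1 from the preview above: `readlines()` yields a list of raw text lines, while `json.load()` parses the file into nested dicts and lists. The payload below is a tiny hypothetical stand-in for the SQuAD-style file, written to a temporary file so the example is self-contained; the real file's layout is assumed, not taken from the post.

```python
import json
import os
import tempfile

# Hypothetical, tiny SQuAD-style payload (a stand-in for train-v2.0.json).
sample = {
    "version": "v2.0",
    "data": [
        {"paragraphs": [
            {"qas": [
                {"question": "Who wrote Hamlet?",
                 "answers": [{"text": "Shakespeare"}]}
            ]}
        ]}
    ],
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump(sample, tmp)
    path = tmp.name

# readlines() gives a list of raw text lines, not a parsed structure.
with open(path, encoding="utf-8") as f:
    raw_lines = f.readlines()

# json.load() parses the file into nested dicts and lists.
with open(path, encoding="utf-8") as f:
    doc = json.load(f)

os.remove(path)

# Collect every question and its first answer into parallel lists.
questions, answers = [], []
for article in doc["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            if qa["answers"]:
                questions.append(qa["question"])
                answers.append(qa["answers"][0]["text"])

print(type(raw_lines).__name__, type(doc).__name__)
print(questions, answers)
```

Using `with` blocks also closes the file handles, which the one-line `open(...)` calls in the preview leave open.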

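[Original] Simple English question-answering system (tf-idf vectors)

1. Read the JSON file into a dict.
# doc = open('train-v2.0.json').readlines() returns a list, which would still have to be converted to a dict
import json
f = open('train1-v2.0.json')
doc = json.load(f)  # json.load() parses the file directly into a dict
2. Put all questions and answers into question_...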

2020-04-30 23:31:06 533

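[Original] Spelling correction in Python

from nltk import *
from nltk.corpus import brown  # the data must be on the NLTK data path before it can be accessed
corpus = brown.sents()  # .sents() returns the sentences of the whole corpus; sents(fileids=[f1, f2, ...], categories=[c1, c2, ...])
import numpy as np
# Read in the dictionary
# set() creates an unordered...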

2020-04-19 16:13:37 1021 1

import types
import math
from torch._six import inf
from functools import wraps
import warnings
import weakref
from collections import Counter
from bisect import bisect_right
from .optimizer import Optimizer
EPOCH_DEPRECATION_WARNING = ("In scheduler.step()

2020-08-07 01:12:43 1700

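[Original] Face verification based on Joint Bayesian

1. Feature representation. A face is represented as the sum of two Gaussian variables: $x = \mu + \varepsilon$, where $x$ is the face, $\mu$ is the intrinsic identity, and $\varepsilon$ is the facial variation (lighting, pose, expression, and so on). The latent variable $\mu$ represents the intrinsic identity, and $\varepsilon$ follows the Gaussian distribution $N(0, \Sigma_\varepsilon)$ (also written $N(0, S_\varepsilon)$). Taking photos $x_i$ and $x_j$...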

2020-07-02 00:27:00 773
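A sketch of where this representation usually leads, under assumptions that the preview above does not state: $\mu \sim N(0, S_\mu)$ is independent of $\varepsilon \sim N(0, S_\varepsilon)$, as in the standard Joint Bayesian formulation. The symbol $S_\mu$ and the hypotheses $H_I$ (same person) and $H_E$ (different people) are introduced here for illustration only.

```latex
% Assumption: \mu \sim N(0, S_\mu) and \varepsilon \sim N(0, S_\varepsilon), independent.
\begin{aligned}
x_i &= \mu_i + \varepsilon_i, \qquad x_j = \mu_j + \varepsilon_j \\
\operatorname{cov}(x_i, x_j) &= \operatorname{cov}(\mu_i, \mu_j) + \operatorname{cov}(\varepsilon_i, \varepsilon_j) \\
H_I\ (\mu_i = \mu_j):\quad
\Sigma_I &= \begin{bmatrix} S_\mu + S_\varepsilon & S_\mu \\ S_\mu & S_\mu + S_\varepsilon \end{bmatrix} \\
H_E\ (\mu_i,\ \mu_j \text{ independent}):\quad
\Sigma_E &= \begin{bmatrix} S_\mu + S_\varepsilon & 0 \\ 0 & S_\mu + S_\varepsilon \end{bmatrix} \\
r(x_i, x_j) &= \log \frac{P(x_i, x_j \mid H_I)}{P(x_i, x_j \mid H_E)}
\end{aligned}
```

Verification then amounts to thresholding the log-likelihood ratio $r(x_i, x_j)$.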

[Original] Reading an Excel file and plotting

pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in c:\users\zcc\appdata\local\programs\python\python37\lib\site-packag

2020-05-21 22:55:59 929 1

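[Original] Reading CSV files with pandas

read_csv: loads delimited data from a file, URL, or file-like object; the default delimiter is a comma.
read_table: loads delimited data from a file, URL, or file-like object; the default delimiter is a tab ('\t').
read_fwf: reads fixed-width data.
read_clipboard: reads data from the clipboard; it can be seen as the clipboard version of read_table and is useful for turning data copied from a web page into a table.
pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simpl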

2020-05-21 20:28:34 853
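To illustrate the default delimiters listed above, here is a small sketch that parses in-memory text instead of files. The column names and values are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data: the same table as comma- and tab-separated text.
csv_text = "name,score\nalice,90\nbob,85\n"
tsv_text = "name\tscore\nalice\t90\nbob\t85\n"

df_csv = pd.read_csv(io.StringIO(csv_text))             # default sep=","
df_tsv = pd.read_table(io.StringIO(tsv_text))           # default sep="\t"
df_same = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # read_csv with an explicit tab

print(df_csv)
print(df_csv.equals(df_tsv), df_tsv.equals(df_same))
```

Both functions accept a path, a URL, or any file-like object, which is why `io.StringIO` works here.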

[Original] Hierarchical-index access and summary operations on a DataFrame

pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Note: you may need to restart the kernel to use updated packages.
WARNING: You are using pip version 20.0.2; however, version 20.1.1 is available.
You should consider upgrad

2020-05-21 15:43:17 735

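[Original] Hierarchical indexing in pandas

Hierarchical indexing is an important pandas feature that lets a single axis carry multiple index levels. Put another way, it lets you work with higher-dimensional data in a lower-dimensional form.
pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Note: you may need to restart the kernel to use updated packages.
WARNING: You are using pip version 20.0.2; ho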

2020-05-21 14:22:02 301
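As a rough illustration of "multiple index levels on one axis", the sketch below builds a Series with a two-level index and then unstacks it into a two-dimensional table. The labels and values are invented for the example.

```python
import pandas as pd

# A Series whose single axis carries two index levels (outer letter, inner number).
s = pd.Series(
    [1, 2, 3, 4, 5],
    index=[["a", "a", "b", "b", "c"], [1, 2, 1, 2, 1]],
)

inner = s["a"]                      # select by outer level; result is indexed by the inner level
wide = s.unstack()                  # inner level becomes columns; missing (c, 2) is NaN
totals = s.groupby(level=0).sum()   # aggregate over the outer level

print(s)
print(wide)
print(totals)
```

`unstack()` makes the "higher-dimensional data in a lower-dimensional form" idea concrete: the same five values can be viewed as a 1-D Series or as a 3 x 2 table.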

[Original] Operations between DataFrame and Series, and sorting

Operations between DataFrame and Series
pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in c:\users\zcc\appdata\local\programs\python\py

2020-05-21 12:35:02 415

[Original] Reindexing, arithmetic, and data alignment in pandas

pip install pandas
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in c:\users\zcc\appdata\local\programs\python\python37\lib\site-package

2020-05-21 11:09:36 270

[Original] An introduction to pandas data types

pip install pandas
Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
Requirement already satisfied: pandas in c:\users\zcc\appdata\local\programs\python\python37\lib\site-packag

2020-05-20 17:44:35 983

[Original] Part-of-speech tagging with the Viterbi algorithm

word2id, id2word = {}, {}
tag2id, id2tag = {}, {}
with open('word_tagging_traindata.txt') as training_data:
    for line in training_data.readlines():
        line = line.rstrip().split('/')
        word, tag = line[0], line[1]
        for letters in word.rstrip(

2020-05-20 15:06:28 393
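The preview above only shows the data-loading step. As a hedged sketch of the decoding step itself, here is a small log-space Viterbi for a bigram HMM tagger. The function name, the two-tag set, and all probabilities are made-up illustrations, not the post's code or data.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a bigram HMM."""
    def lp(p):
        # Log-probability with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # score[i][t]: best log-probability of any tag path ending in tag t at position i.
    score = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: score[i - 1][p] + lp(trans_p[p].get(t, 0)))
            score[i][t] = (score[i - 1][best_prev]
                           + lp(trans_p[best_prev].get(t, 0))
                           + lp(emit_p[t].get(words[i], 0)))
            back[i][t] = best_prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with invented probabilities (N = noun, V = verb).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1, "cats": 0.3},
          "V": {"bark": 0.7, "chase": 0.3}}

print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))
print(viterbi(["dogs", "chase", "cats"], tags, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long sentences; in a real tagger the three probability tables would be estimated from counts over the training file.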

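[Original] Simple English question-answering system (glove2word2vec)

Use glove2word2vec from the gensim package to convert the GloVe word vectors into word2vec format and save them to a separate file for later use.
from gensim.test.utils import datapath, get_tmpfile
from gensim.scripts.glove2word2vec import glove2word2vec
glove_file = datapath(...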

2020-05-03 23:05:04 1028

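[Original] How to use GloVe word vectors

The GloVe vector file is UTF-8 encoded; reading it in Python 3 with GBK encoding raises an error:
glove = open('glove.6B.100d.txt', 'r')
word = list()
word_vector = list()
line = glove.readline()  # read one line at a time; returns a str
while line:
    line = list(line.split())...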

2020-05-03 21:24:36 2827
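One way to avoid the encoding error described above is to pass `encoding="utf-8"` to `open()` explicitly rather than relying on the platform default. The sketch below parses GloVe-style text (a token followed by floats on each line); the helper name `load_glove` and the two sample lines are hypothetical, and an in-memory buffer stands in for the real file.

```python
import io

# Hypothetical two-line excerpt in GloVe text format: token, then float components.
sample = "the 0.1 0.2 0.3\ncafé 0.4 0.5 0.6\n"


def load_glove(handle):
    """Parse GloVe text lines into parallel lists of words and vectors."""
    words, vectors = [], []
    for line in handle:                      # iterate one line (str) at a time
        parts = line.rstrip().split(" ")
        if len(parts) < 2:                   # skip blank or malformed lines
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


# For the real file, state the encoding explicitly:
#   with open("glove.6B.100d.txt", "r", encoding="utf-8") as f:
#       words, vectors = load_glove(f)
words, vectors = load_glove(io.StringIO(sample))
print(words)
print(vectors)
```

Iterating over the handle keeps memory use low in the same way as the `readline()` loop in the preview.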

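[Original] The difference between read(), readline(), and readlines() in Python

1. read([size]): reads size bytes from the current file position (size characters in text mode); with no size argument it reads to the end of the file. It returns a string.
2. readline(): as the name suggests, reads one line per call, so it uses little memory and suits large files. It returns a string.
3. readlines(): reads all lines of the file into a list, with each line as an...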

2020-05-02 14:16:30 282
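The three methods can be compared side by side on an in-memory text buffer, which behaves like a file opened in text mode. The sample text is made up for the example.

```python
import io

text = "first line\nsecond line\nthird line\n"
f = io.StringIO(text)

head = f.read(5)               # at most 5 characters from the current position
rest_of_line = f.readline()    # the remainder of the current line, newline included
remaining = f.readlines()      # every remaining line, as a list of strings
at_end = f.read()              # nothing left: returns an empty string

print(repr(head))
print(repr(rest_of_line))
print(remaining)
print(repr(at_end))
```

All three calls advance the same file position, so each one picks up where the previous call stopped.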

[Original] Using PriorityQueue() in Python

from queue import PriorityQueue  # priority queue
q = PriorityQueue()
# Format: q.put((number, value))
# The smaller the number, the higher the priority
c = [6, 3, 7, 5, 0, 2]
for i in c:
    q.put([i, 10])
b = q.get()
print(b)  # [0, 10]
b = q.get()
print(b)  # [2, 10]...

2020-04-30 21:55:46 6681
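A short sketch of the same idea using `(priority, value)` tuples, draining the queue until it is empty. The task names are invented for the example.

```python
from queue import PriorityQueue

q = PriorityQueue()
for priority, task in [(6, "write report"), (0, "fix outage"), (3, "review PR")]:
    q.put((priority, task))    # entries are compared by priority first

order = []
while not q.empty():
    order.append(q.get())      # always returns the smallest remaining entry

print(order)
# [(0, 'fix outage'), (3, 'review PR'), (6, 'write report')]
```

When two entries share a priority, the comparison falls through to the second element, so the values must themselves be comparable.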

[Original] Error: ValueError: shapes (4,4) and (1,4) not aligned: 4 (dim 1) != 1 (dim 0)

This error often occurs when multiplying a NumPy matrix by an array:
import numpy as np
d = np.array([[5, 6, 7, 8]])
c = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [4, 5, 6, 7], [2, 3, 4, 6]])
a = c.dot(d)
print(a)
Error: ValueError: shapes (4,4) and (1,4...

2020-04-30 19:42:53 12301 1
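The message says that the inner dimensions do not match: a (4, 4) array can only be multiplied on the right by something with 4 rows. A sketch of two possible fixes, using the arrays from the preview above (which fix is right depends on the result you intend, so treat both as illustrations):

```python
import numpy as np

d = np.array([[5, 6, 7, 8]])        # shape (1, 4)
c = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6],
              [4, 5, 6, 7],
              [2, 3, 4, 6]])        # shape (4, 4)

try:
    c.dot(d)                        # (4, 4) x (1, 4): inner dimensions 4 and 1 differ
except ValueError as err:
    message = str(err)
    print(message)

col = c.dot(d.T)                    # (4, 4) x (4, 1) -> (4, 1)
row = d.dot(c)                      # (1, 4) x (4, 4) -> (1, 4)

print(col.shape, row.shape)
```

Transposing `d` treats it as a column vector, while swapping the operands keeps it as a row vector; the two results are different products.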

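[Original] getaddrinfo failed error when running nltk.download()

Problem: nltk.download() fails with getaddrinfo failed.
Solution: when the NLTK downloader starts, its server index changes to http://nltk.org/nltk_data/. You can change this setting in the NLTK Downloader window via File -> Change Server Index....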

2020-04-29 13:12:51 11324 8

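[Repost] Normalization in NLP models

Original article: (https://www.cnblogs.com/jielongAI/p/10189907.html). The performance of an NLP model depends heavily on the vocabulary we have (well... it is honestly hard to give a quantitative analysis; that is at least my impression so far while learning. How large should the vocabulary be, concretely? I hope experts who read this post will leave a comment on whether there is an effective way to gauge how much vocabulary we actually need...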

2020-04-21 15:13:36 2556

[Original] NumPy array transpose

NumPy array transpose, Python programming
# Transposing NumPy arrays and swapping axes
import numpy as np
a = np.arange(24).reshape((2, 3, 4))
print(a)
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]
#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]]
print(a[...

2020-04-15 17:55:11 304
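A brief sketch of what transposing and axis swapping do to the same (2, 3, 4) array: the data is unchanged, only the order of the axes is. The particular axis orders chosen here are illustrative.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

b = a.transpose((1, 0, 2))   # new axis order: old axis 1, old axis 0, old axis 2
s = a.swapaxes(1, 2)         # exchange only axes 1 and 2
t = a.T                      # reverse all axes

# An element keeps its value but its indices are permuted: b[i, j, k] == a[j, i, k].
print(b.shape, s.shape, t.shape)
print(b[2, 1, 3], a[1, 2, 3])
```

For a 2-D array `a.T` is the familiar matrix transpose; for higher dimensions it reverses the full axis order, which is why `transpose` with an explicit tuple is often clearer.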

[Original] 2020-01-29

2020-01-29 22:07:45 441

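[Original] 2020-01-27

Tuning models on a validation set is standard practice in the machine learning community. In theory, once a model has been tuned on the validation set it should not be adjusted further, whatever its test-set results; in practice, hyperparameter choices are almost always influenced to some degree by test-set performance. As a result, the research community may end up designing models that perform well only on specific test sets and fail to generalize to new data. Training-set overfitting: one notion of overfitting is the gap between training accuracy and test accuracy. Note that the deep neural networks in this study's experiments typically reach 100% training accuracy, so this notion of overfitting...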

2020-01-27 21:35:13 117
