AI小龘 - CSDN Blog

[Original] A super-comprehensive 6,500-word concise Python tutorial, essential for finals

Available through the WeChat Official Account; practical material that many college students need.

2021-12-20 22:17:28 · 268 views

[Original] (Python sklearn+KMeans) Iris classification with clustering

# Import the required modules
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
X = iris.data[:]
# print(X)
print(X.shape)
# Elbow method to choose k
d = []
for i in range(1, 11):  # k takes the values 1-10; fit km...

2021-10-22 23:05:50 · 4268 views
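
The elbow loop above is cut off before the model is fitted. The idea is to fit k-means for each k, record the within-cluster sum of squared distances (SSE), and look for the k where the curve stops dropping sharply. Below is a from-scratch sketch in plain Python, not sklearn's implementation: the toy 2-D points are hypothetical stand-ins for the iris features, and the deterministic farthest-point initialization is a simplification (sklearn's KMeans uses k-means++ by default and reports the SSE as `inertia_`).

```python
import math

def init_centroids(points, k):
    """Deterministic farthest-point initialization (a simplified stand-in for k-means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest already-chosen centroid
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    return [list(c) for c in centroids]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new_centroids
    return labels, centroids

def sse(points, labels, centroids):
    """Within-cluster sum of squared distances (what sklearn calls inertia_)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical toy data: three well-separated groups
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]

d = []
for k in range(1, 7):
    labels, centroids = kmeans(data, k)
    d.append(sse(data, labels, centroids))
print([round(v, 2) for v in d])
```

Plotting `d` against k (for example with `plt.plot(range(1, 7), d)`) should show the elbow at k = 3 for this toy data, where the SSE collapses.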

[Original] (Python gensim+Word2Vec) Computing text similarity

# -*-encoding=utf-8-*-
import jieba
from gensim.models.word2vec import Word2Vec
# Segment a sentence with jieba and return a list of words
def jieba_cut(sent):
    sent1 = jieba.lcut(sent)
    return sent1
# Train a gensim Word2Vec model
def word2vec1(sent1, sent2):
    sent1 = jieba_cut(sent1)
    sent2 = jie...

2021-10-22 22:54:05 · 2448 views
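
The excerpt stops right after segmenting both sentences. One common way to turn word vectors into a sentence-level score is to average the vectors of each sentence's words and take the cosine similarity of the two averages. A minimal sketch of that idea, assuming a hypothetical hand-made 3-dimensional embedding table in place of a trained model (in gensim the vectors would come from `model.wv[word]`), and token lists typed by hand rather than produced by jieba:

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model's vectors
EMBEDDINGS = {
    "我": [0.1, 0.3, 0.5],
    "喜欢": [0.7, 0.2, 0.1],
    "爱": [0.6, 0.3, 0.1],
    "北京": [0.2, 0.8, 0.4],
    "上海": [0.3, 0.7, 0.5],
}

def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding; out-of-vocabulary tokens are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token of the sentence is in the vocabulary")
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sentence_similarity(tokens1, tokens2, embeddings=EMBEDDINGS):
    return cosine(sentence_vector(tokens1, embeddings), sentence_vector(tokens2, embeddings))

# Token lists as a segmenter such as jieba.lcut might return them (assumed, jieba is not called)
print(round(sentence_similarity(["我", "喜欢", "北京"], ["我", "爱", "上海"]), 3))
```

With a real gensim 4 model, `model.wv.n_similarity(tokens1, tokens2)` computes the cosine similarity between the mean vectors of two word lists, which is the same idea without the hand-written helpers.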

[Original] (Python jieba+BoW) Comparing text similarity

# -*- encoding=utf-8 -*-
import jieba.posseg
import jieba.analyse
import math
import re
# Chinese word segmentation with jieba
def jieba_function(input1):
    input1 = re.sub(r'\W*', '', input1)
    # jieba.load_userdict("dic.txt")
    jieba.analyse.set_stop_words("3.txt")
    # Word...

2021-10-12 23:46:28 · 589 views
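
The excerpt cleans the text and sets up jieba before the bag-of-words (BoW) comparison, which is cut off. A minimal sketch of a BoW comparison, assuming the sentences are already segmented (token lists typed by hand, jieba not called), assuming cosine similarity over raw word counts as the measure, and using a hypothetical one-word stop-word set:

```python
import math
from collections import Counter

def bow_cosine(tokens1, tokens2, stop_words=frozenset()):
    """Cosine similarity between the bag-of-words count vectors of two token lists."""
    c1 = Counter(t for t in tokens1 if t not in stop_words)
    c2 = Counter(t for t in tokens2 if t not in stop_words)
    vocab = sorted(set(c1) | set(c2))  # the shared vocabulary fixes the vector dimensions
    v1 = [c1[w] for w in vocab]
    v2 = [c2[w] for w in vocab]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0  # an empty sentence has no direction: report 0.0

# Pre-segmented tokens (assumed output of a segmenter); the stop-word set is hypothetical
stop = {"的"}
a = ["我", "喜欢", "北京", "的", "天安门"]
b = ["我", "喜欢", "上海", "的", "外滩"]
print(round(bow_cosine(a, b, stop), 3))  # prints 0.5
```

After removing 的, the two sentences share two of their four words each, so the dot product is 2 and each vector has length 2, giving 0.5.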

[Original] (Python re+collections) A Bayesian spelling checker

# -*-encoding:utf-8-*-
import re, collections
# Extract every word from the corpus in lowercase; only runs of the letters a-z are kept, so any other character acts as a separator
def words(text):
    return re.findall('[a-z]+', text.lower())
def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    ...

2021-10-11 23:37:39 · 166 views
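
The excerpt's `words` and `train` functions match the structure of Peter Norvig's well-known spelling corrector, where `defaultdict(lambda: 1)` gives every word a starting count of 1 so unseen words never get probability zero. The Bayesian idea is to pick the candidate c that maximizes P(c) · P(w | c); in this simplified form the error model just prefers the word itself, then words one edit away, then words two edits away, and P(c) is the corpus count. A minimal sketch of the remaining steps in that style, with a hypothetical tiny corpus:

```python
import re
import collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(features):
    # Every word starts at count 1 (add-one smoothing)
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    """All strings one edit away: deletion, transposition, replacement, insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word, nwords):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in nwords)

def known(candidates, nwords):
    return set(w for w in candidates if w in nwords)  # membership tests do not add keys

def correct(word, nwords):
    # Closest edit distance first; among those candidates, the most frequent word wins
    candidates = (known([word], nwords) or known(edits1(word), nwords)
                  or known_edits2(word, nwords) or [word])
    return max(candidates, key=lambda w: nwords.get(w, 1))  # .get avoids inserting unknown words

NWORDS = train(words("The quick brown fox jumps over the lazy dog. The fox!"))  # hypothetical corpus
print(correct("teh", NWORDS))  # prints: the
```

A real corpus (the original approach trains on a large text file) is what makes the frequency term useful; with this toy corpus most corrections are decided by edit distance alone.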

[Original] (Python sklearn+KNeighborsClassifier) Iris classification

# -*-encoding:utf-8-*-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
# Read the data
iris_...

2021-10-10 18:28:38 · 350 views
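
The excerpt sets up pandas and sklearn's KNeighborsClassifier but stops at reading the data. The core of k-nearest neighbors is small enough to write out: find the k training points closest to a new sample and take a majority vote of their labels. A from-scratch sketch, not sklearn's implementation, using hypothetical toy (petal length, petal width) values rather than the real iris file:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy measurements (petal length, petal width) in cm
train_X = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3),
           (4.5, 1.5), (4.1, 1.3), (4.7, 1.4),
           (6.0, 2.5), (5.8, 2.2), (6.1, 2.3)]
train_y = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
test_X = [(1.6, 0.2), (4.4, 1.4), (5.9, 2.1)]
test_y = ["setosa", "versicolor", "virginica"]

pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(pred, accuracy(test_y, pred))
```

sklearn's `KNeighborsClassifier` performs the same kind of vote behind `fit`/`predict` (its `n_neighbors` defaults to 5), and `train_test_split` plus `classification_report` from the excerpt's imports replace the hand-made split and accuracy helper.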

[Original] (Python sklearn+LogisticRegression) Breast cancer prediction

# -*- coding: utf-8 -*-
# Import the pandas and numpy toolkits
import numpy as np
import pandas as pd
# Build the list of feature (column) names
column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare ...

2021-10-10 18:25:13 · 1343 views
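
The excerpt only shows the column setup. What logistic regression does underneath can be sketched with batch gradient descent on the log-loss: the model outputs sigmoid(w · x + b) as the probability of the positive class. A minimal from-scratch sketch, not sklearn's solver, with hypothetical toy scores on two 1-10 features (label 1 standing for malignant) instead of the real dataset; features are standardized first, which keeps plain gradient descent well-behaved:

```python
import math

def sigmoid(z):
    # Numerically safe logistic function: exp is only ever called on a non-positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(X, means=None, stds=None):
    """Scale each column to zero mean and unit variance; statistics come from X unless given."""
    cols = list(zip(*X))
    if means is None:
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict(w, b, x, threshold=0.5):
    return int(predict_proba(w, b, x) >= threshold)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

# Hypothetical toy scores on two 1-10 features (label 1 = malignant)
X_raw = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3), (8, 7), (9, 9), (7, 8), (10, 9), (8, 10)]
y = [0] * 5 + [1] * 5
X, means, stds = standardize(X_raw)
w, b = fit_logistic(X, y)
X_new, _, _ = standardize([(2, 2), (9, 8)], means, stds)  # reuse the training statistics
print([predict(w, b, x) for x in X_new])
```

sklearn's `LogisticRegression` differs in detail: it uses dedicated solvers and applies L2 regularization by default (`C=1.0`), so its coefficients will not match this sketch exactly.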

[Original] (Python TF-IDF, TextRank) Extracting article keywords

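TF-IDF leans toward frequent words. With n_{w,d} the number of times word w occurs in article d, N the total number of articles, and df(w) the number of articles that contain w, the score is:

$$\mathrm{TF\text{-}IDF}(w,d)=\frac{n_{w,d}}{\sum_{k} n_{k,d}}\times\log\frac{N}{\mathrm{df}(w)+1}$$

# -*- coding:utf-8 -*-
import jieba.analyse
# Sample text (a Chinese news item on 18.76 billion yuan of central-government grassland-ecology subsidies)
str_1 = "中央财政187.6亿保护草原生态，7月8日记者从财政部" \
        "农业司获悉：2018年，中央财政安排新一轮草原生态保护" \
        "补助奖励187.6亿元，支持实施禁牧面积12.06亿亩，草畜" \
        "平衡面积26.05亿亩，并对工作突出、成效显著地区给予奖励"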

2021-10-04 00:07:45 · 257 views
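
The formula above can be computed directly over a small collection of segmented articles. A minimal sketch that follows it term for term (term count over article length, times log of N over document frequency plus one), assuming the natural logarithm (the base only rescales all scores, so the ranking is unchanged) and hypothetical pre-segmented documents:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score every word of every document as
    (count in doc / words in doc) * log(N / (docs containing the word + 1))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts at most once per word
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n_docs / (df[w] + 1))
                       for w, c in counts.items()})
    return scores

def top_keywords(scores, k=3):
    # Highest score first; ties broken alphabetically so the output is deterministic
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

# Hypothetical pre-segmented documents
docs = [
    ["草原", "生态", "保护", "草原", "补助"],
    ["财政", "补助", "资金"],
    ["财政", "农业", "生态"],
    ["农业", "资金", "财政"],
]
scores = tf_idf(docs)
print(top_keywords(scores[0]))  # prints ['草原', '保护', '生态']
```

Because of the +1 in the denominator, a word found in every article gets a slightly negative weight. Note also that, as far as I know, `jieba.analyse.extract_tags` scores words with IDF values from a table bundled with jieba rather than from your own articles, so its numbers will differ from this hand computation.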

[Original] (Creating an Anaconda virtual environment and adding TensorFlow and Keras)

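OS: Windows 10, 64-bit
Python version: 3.8.8
1. Download Anaconda.
2. Install it, ticking the option that adds Anaconda to the PATH environment variable (the installer leaves it unticked by default).
Mirror sources (set in .condarc; the default path on my system is C:\Users\86156\.condarc):
channels:
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/free/
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
  - https://mirrors....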

2021-09-24 11:59:55 · 192 views
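
After creating and activating the environment, it helps to confirm which interpreter is actually running and whether TensorFlow and Keras are installed in it. A small sketch using the standard library's `importlib.metadata` (available from Python 3.8, matching the 3.8.8 above); the helper name `package_versions` is hypothetical:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def package_versions(names):
    """Map each distribution name to its installed version, or None if it is not
    installed for the interpreter running this script."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    print(sys.executable)          # path of the active environment's python
    print(sys.version.split()[0])  # interpreter version, e.g. 3.8.x
    print(package_versions(["tensorflow", "keras"]))
```

If `sys.executable` does not point inside the new environment's folder, the environment was not activated in that terminal or IDE.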

[Original] (Python jieba.posseg.cut) Chinese part-of-speech tagging: 我爱北京天安门 ("I love Beijing's Tiananmen")

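1.txt contains 我爱北京天安门; the part-of-speech tagging result is written to 2.txt.
# -*- encoding:utf-8 -*-
import jieba.posseg
# Read the document
with open("1.txt", 'r', encoding='utf-8') as f:
    words_2 = jieba.posseg.cut(f.read())  # Perform part-of-speech tagging
# Write the tagged result to a file
with open("2.txt", 'w', encoding='utf-8') as f:
    for i in words_2:
        ...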

2021-09-16 23:11:09 · 1100 views
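
The write loop after `for i in words_2:` is truncated. `jieba.posseg.cut` yields pair objects with `.word` and `.flag` attributes, so one way to finish is to format each pair and write the line out. A minimal sketch, assuming a space-separated "word/flag" output format (my choice, not necessarily the post's) and using illustrative tags rather than the output of an actual jieba run:

```python
import io

def write_tagged(pairs, out):
    """Write (word, flag) pairs as space-separated 'word/flag' tokens, then a newline."""
    out.write(" ".join(f"{word}/{flag}" for word, flag in pairs) + "\n")

# Illustrative tagged pairs; with jieba they could come from
# ((p.word, p.flag) for p in jieba.posseg.cut(text))
pairs = [("我", "r"), ("爱", "v"), ("北京", "ns"), ("天安门", "ns")]
buf = io.StringIO()
write_tagged(pairs, buf)
print(buf.getvalue(), end="")  # prints 我/r 爱/v 北京/ns 天安门/ns
```

To write to 2.txt instead of the in-memory buffer, pass the file opened with `open("2.txt", "w", encoding="utf-8")` as `out`.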

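[Original] (Python implementation of the maximum matching algorithm for Chinese word segmentation) 研究生命的起源 ("study the origin of life")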

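Forward maximum matching for Chinese word segmentation:
# -*- coding: utf-8 -*-
# Sentence to segment
str_1 = '研究生命的起源'
# Maximum word length
M = 3
# Dictionary list
list_1 = ['研究', '研究生', '生命', '命', '的', '起源']
# Character sequence
list_2 = ['研', '究', '生', '命', '的', '起', '源']
# Find the anchor positions
list_3 = []
for i in range(len(str_1) // M + 1):
    list_3.append(0 + i * M)
# Take the slices that need matching and match them
for...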

2021-09-13 21:15:19 · 462 views
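
The matching loop above is cut off. A compact sketch of forward maximum matching with the same sentence, dictionary, and maximum length M = 3: at each position, try the longest slice first and shrink it until it is a dictionary word, falling back to a single character. It uses a plain while-loop instead of the post's precomputed anchor list, and it adds backward maximum matching (a standard companion method, not from the post) for contrast:

```python
def forward_max_match(text, dictionary, max_len=3):
    """Forward maximum matching: at each position take the longest dictionary word
    (up to max_len characters); an unmatched single character becomes its own token."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def backward_max_match(text, dictionary, max_len=3):
    """Backward maximum matching: the same idea, scanning from the end of the sentence."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.insert(0, piece)
                j -= size
                break
    return tokens

dictionary = {'研究', '研究生', '生命', '命', '的', '起源'}
print(forward_max_match('研究生命的起源', dictionary))   # prints ['研究生', '命', '的', '起源']
print(backward_max_match('研究生命的起源', dictionary))  # prints ['研究', '生命', '的', '起源']
```

Forward matching greedily takes 研究生 ("graduate student") and is left with the stray 命, while backward matching recovers 研究/生命/的/起源. That contrast is why this sentence is a common demonstration of segmentation ambiguity.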

[Original] (Python requests + regex re) Scraping the Maoyan Top 100 into a CSV file

# -*- coding: utf-8 -*-
import requests
import re
import time
import csv
import random
# Fetch a page by its URL
def get_one_page(url):
    try:
        agent_1 = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
        agent_2 = 'Mozilla/5.0 (Win...

2021-09-11 16:36:16 · 164 views
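
The excerpt stops inside `get_one_page`, after defining several User-Agent strings. The remaining steps (pick a User-Agent, extract fields with a regular expression, write rows with the csv module) can be sketched without touching the network. The HTML below is simplified, hypothetical markup, not the real Maoyan page, and the regex would have to be adapted to the page as seen in the browser; only the first user agent is copied because the second is truncated:

```python
import csv
import io
import random
import re

# User-agent pool; extend it with further strings as the post does with agent_2, ...
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0',
]

def build_headers():
    """Pick a random User-Agent, to be passed as requests.get(url, headers=build_headers())."""
    return {'User-Agent': random.choice(USER_AGENTS)}

# Regex for simplified, hypothetical markup; re.S lets .*? span line breaks
ITEM_RE = re.compile(
    r'<dd>.*?<i class="board-index">(?P<rank>\d+)</i>'
    r'.*?<p class="name">(?P<title>.*?)</p>'
    r'.*?<p class="star">(?P<star>.*?)</p>.*?</dd>',
    re.S,
)

def parse_one_page(html):
    """Yield one dict per matched <dd> item."""
    for m in ITEM_RE.finditer(html):
        yield {'rank': m.group('rank'),
               'title': m.group('title').strip(),
               'star': m.group('star').strip()}

def write_csv(items, out):
    writer = csv.DictWriter(out, fieldnames=['rank', 'title', 'star'])
    writer.writeheader()
    writer.writerows(items)

sample = '''
<dd><i class="board-index">1</i><p class="name">Movie A</p><p class="star"> Actor X </p></dd>
<dd><i class="board-index">2</i><p class="name">Movie B</p><p class="star"> Actor Y </p></dd>
'''
buf = io.StringIO()
write_csv(parse_one_page(sample), buf)
print(buf.getvalue())
```

When writing to a real file, open it with `open("top100.csv", "w", newline="", encoding="utf-8")`; the csv module documentation recommends `newline=""` so rows are not separated by blank lines on Windows.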

[Original] (Python+NLP) Regular expressions:

Looking up and learning regular expressions

2021-09-11 15:42:16 · 48 views
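
Several posts above lean on Python's `re` module, for example `re.findall('[a-z]+', ...)` in the spelling checker and `re.sub(r'\W*', '', ...)` in the similarity post. A short sketch of the functions most often used in NLP preprocessing, on made-up strings; the patterns are deliberately simple and are not robust e-mail or phone-number validators:

```python
import re

text = "Contact: alice@example.com, bob@example.org; call 010-12345678."

# findall: every non-overlapping match (if the pattern has groups, the group text is returned instead)
emails = re.findall(r'[\w.]+@[\w.]+\.\w+', text)

# search + named groups: the first match, with its parts addressable by name
m = re.search(r'(?P<area>\d{3})-(?P<number>\d{8})', text)

# sub: replace every match; here the user part of each address is masked
masked = re.sub(r'[\w.]+@', '***@', text)

# A character range from the CJK Unified Ideographs block keeps only runs of Chinese characters
chinese = re.findall(r'[\u4e00-\u9fff]+', 'NLP自然语言处理2021')

print(emails)                               # prints ['alice@example.com', 'bob@example.org']
print(m.group('area'), m.group('number'))   # prints 010 12345678
print(masked)
print(chinese)                              # prints ['自然语言处理']
```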

m0_51277974's blog