September 2020: qq_40707462


Original: NLTK sentiment analysis, text similarity, and text classification

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.classify import NaiveBayesClassifier
text1 = 'I like the movie so much!'
text2 = 'That is a good movie.'
text3 = 'This is a great one.'
text4 = 'That is a real

2020-09-26 13:37:50 2126

Original: English tokenization and text cleaning with NLTK

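import nltk
nltk.download('punkt')  # the default method; other download methods also work
After downloading, a message about "unzipped" may appear; it can be ignored. Running the command again shows that the package is already satisfied.
import nltk
sentence="python is a widely use high-level programing language"
tokens=nltk.word_tokenize(sentence)
print(tokens)
Result: ['python', 'is', 'a', 'wide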

2020-09-23 20:25:13 1080

Original: K-means clustering algorithm implemented in Python

#main.py
import random
from kmeans_tools import Cluster, get_distance, gen_random_sample
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
def kmeans(samples, k, cutoff):
    """ kmeans function """
    # Randomly choose k sample points as the initial cluster centers

2020-09-19 15:06:16 256
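The k-means excerpt above stops right after the step that picks k random samples as initial centers. As a rough illustration of how the rest of such a loop commonly looks, here is a minimal pure-Python sketch. It does not use the post's `kmeans_tools` module (whose contents are not shown): the `get_distance` helper below, the tuple representation of points, the `seed` parameter, and the returned `(centers, groups)` pair are all assumptions made for this example, and `cutoff` is assumed to mean "stop once no center moves farther than this".

```python
import math
import random


def get_distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, cutoff, seed=None):
    """Cluster a list of point tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Randomly choose k sample points as the initial cluster centers.
    centers = rng.sample(samples, k)
    while True:
        # Assign every sample to its nearest center.
        groups = [[] for _ in range(k)]
        for point in samples:
            nearest = min(range(k), key=lambda i: get_distance(point, centers[i]))
            groups[nearest].append(point)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centers.append(center)
        # Stop once the largest center movement is within the cutoff.
        shift = max(get_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= cutoff:
            return centers, groups


samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(samples, k=2, cutoff=1e-6, seed=0)
print(sorted(centers))
```

With two well-separated blobs like these, the loop should settle on one center per blob after a few iterations, whichever samples are drawn as the starting centers.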

Original: Reading multiple file formats in Python (txt, csv, json, sqlite)

1. txt
① Read the entire content
txt_filename = './files/python_baidu.txt'
# Open the file
file_obj = open(txt_filename, 'r', encoding='utf-8')
# Read the whole file content
all_content = file_obj.read()
# Close the file
file_obj.close()
print(all_content)
② Read line by line
txt_filename = './files/python_baidu.txt'
# Open the file

2020-09-18 20:21:32 484
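The txt excerpt above opens and closes the file by hand, and its line-by-line part is cut off. One possible way to write both reads is with a `with` block, which closes the file automatically. This is only a sketch: the function names `read_all` and `read_lines` and the temporary `sample.txt` file are made up for this example, standing in for the post's `./files/python_baidu.txt`.

```python
import os
import tempfile


def read_all(path):
    """Return the entire file content as one string."""
    with open(path, 'r', encoding='utf-8') as file_obj:
        return file_obj.read()


def read_lines(path):
    """Return the file's lines with trailing newlines stripped."""
    with open(path, 'r', encoding='utf-8') as file_obj:
        return [line.rstrip('\n') for line in file_obj]


# Hypothetical sample file, created only so the sketch can run on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_filename = os.path.join(tmp_dir, 'sample.txt')
    with open(txt_filename, 'w', encoding='utf-8') as file_obj:
        file_obj.write('first line\nsecond line\n')
    all_content = read_all(txt_filename)
    lines = read_lines(txt_filename)

print(all_content)
print(lines)
```

Iterating over the file object reads one line at a time, so `read_lines` can be adapted to process large files without loading everything at once.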

Original: LeetCode [Easy] 1. Two Sum

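Problem: Given an integer array nums and a target value target, find the two integers in the array that sum to the target and return their array indices. You may assume that each input has exactly one answer. However, the same element of the array may not be used twice.
Solution 1: Brute force with two nested loops; the inner loop only needs to cover the numbers after the one picked by the outer loop. Runtime: 6440 ms.
class Solution:
    def twoSum(self,nums,target):
        n = len(nums)
        for x in range(n):
            for y in range(x+1,n):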

2020-09-18 15:57:02 61
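The Two Sum excerpt above is cut off inside the inner loop. A plausible completion of the brute-force approach is sketched below; the comparison and the `return` statements are assumptions, since the rest of the post is not shown. For contrast, the sketch also includes a commonly used single-pass dictionary variant, `two_sum_hash`, which is not taken from the post and is named here only for illustration.

```python
class Solution:
    def twoSum(self, nums, target):
        """Brute force: check every pair (x, y) with y after x."""
        n = len(nums)
        for x in range(n):
            for y in range(x + 1, n):
                if nums[x] + nums[y] == target:
                    return [x, y]
        return []  # No pair found (the problem statement rules this out).


def two_sum_hash(nums, target):
    """Single pass: remember each value's index and look up the complement."""
    seen = {}  # Maps a value to the index where it was seen.
    for i, value in enumerate(nums):
        j = seen.get(target - value)
        if j is not None:
            return [j, i]
        seen[value] = i
    return []


print(Solution().twoSum([2, 7, 11, 15], 9))
print(two_sum_hash([3, 2, 4], 6))
```

The nested loops compare O(n²) pairs, while the dictionary variant does one lookup per element, which is the usual reason it runs much faster on large inputs.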