Python
Average article quality score: 58
wangxihe2012
PCA dimensionality reduction

    import numpy as np
    import pandas as pda
    from sklearn.datasets import load_iris
    import matplotlib.pyplot as plt
    # Load the data
    iris=load_iris()
    # print(iris)
    data=iris["data"]
    labels=iris["target"]
    # print(da...

Original · 2018-05-02 09:55:19 · 231 views · 0 comments

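Python regular expressions

    import re
    #
    # ^ matches the start of the string
    # $ matches the end of the line
    # . matches any single character except a newline
    # [......] matches any single character inside the brackets
    # [^......] matches any single character not inside the brackets
    # * matches zero or more occurrences of the preceding expression
    # + matches one or more occurrences of the preceding expression
    # ? matches zero or one occurrence of the preceding expression
    # {n} matches exactly n occurrences of the preceding expression
    # {n,m} matches at least...

Original · 2018-05-30 10:31:56 · 118 views · 0 comments
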
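The metacharacters listed in the excerpt above can be tried out with the standard `re` module. The following is a minimal sketch rather than the original post's code: the sample string `text` and all variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise the metacharacters.
text = "cat cot ct caat"

# ^ anchors the match at the start; $ anchors it at the end.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_caat = re.search(r"caat$", text) is not None

# . matches any single character except a newline.
dot_matches = re.findall(r"c.t", text)
dot_skips_newline = re.search(r"c.t", "c\nt") is None

# [...] matches one character from the set; [^...] matches one character outside it.
in_set = re.findall(r"c[ao]t", text)
not_in_set = re.findall(r"c[^a]t", text)

# * is zero or more, + is one or more, ? is zero or one of the preceding expression.
zero_or_more = re.findall(r"ca*t", text)
one_or_more = re.findall(r"ca+t", text)
zero_or_one = re.findall(r"ca?t", text)

# {n} is exactly n repetitions; {n,m} is between n and m repetitions.
exactly_two = re.findall(r"ca{2}t", text)
one_to_two = re.findall(r"ca{1,2}t", text)

print(dot_matches, in_set, not_in_set)
print(zero_or_more, one_or_more, zero_or_one, exactly_two, one_to_two)
```

Note that a negated set such as `[^a]` still consumes exactly one character; it never matches an empty position.
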
Working with Excel in Python using xlrd and xlsxwriter

    import xlrd,xlwt
    # Open the Excel file and get all of its sheets
    workbook = xlrd.open_workbook(r'D:\1.xlsx')
    sheetlist=workbook.sheet_names()
    # print (sheetlist)
    coldata=[]
    for ele in sheetlist:
        sheet=workbook.sheet_by_name...

Original · 2018-05-30 13:25:17 · 1409 views · 1 comment

Automatic login to Douban (when no CAPTCHA appears)

    import urllib.request
    import xlsxwriter
    import re
    # Simulate a POST request
    import urllib.parse, urllib.request, http.cookiejar, re
    cookie = http.cookiejar.CookieJar()
    cookieProc = urllib.request.HTTPCookieProcesso...

Original · 2018-05-30 22:43:51 · 1020 views · 0 comments

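WeChat crawler: scraping page content (using a proxy and a simulated browser)

    #http://weixin.sogou.com/
    import re
    import urllib.request
    import time
    import urllib.error
    import urllib.request
    import scipy
    # Custom function: fetch a URL through a proxy server
    def use_proxy(proxy_addr,url):
        # Set up exception handling
        tr...

Original · 2018-05-31 18:45:00 · 5675 views · 0 comments
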
A basic crawler (Qiushibaike)

    import urllib.request
    import urllib.error
    import re
    headers = ("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
    opener = url...

Original · 2018-05-31 19:23:49 · 167 views · 0 comments

Qiushibaike: an introduction to multithreading

    import urllib.request
    import urllib.error
    import re
    import threading
    headers = ("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537....

Original · 2018-05-31 19:32:58 · 258 views · 0 comments

jieba word segmentation and part-of-speech tagging

    import jieba
    # Full mode
    sentence="教育系统耕耘,在重庆大学从学生成长为校长,2004年7月被明确为副部长级2010年调任武汉大学校长,任教育部副部长、党组成员"
    words1=jieba.cut(sentence,cut_all=True)
    print(words1)
    for word in words1:
        print(word)
    print("===========...

Original · 2018-06-01 16:12:47 · 2020 views · 0 comments

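Computing text similarity with jieba, wordcloud, and gensim

    '''
    1. Read the documents
    2. Tokenize the documents to be compared
    3. Arrange the documents into the required format for later computation
    4. Compute the word frequencies
    5. (Optional) Filter out low-frequency words
    6. Build a dictionary from the corpus
    7. Load the document to compare against
    8. Convert the document to compare into a sparse vector with doc2bow
    9. Process the sparse vector further to obtain a new corpus
    10. Run the new corpus through tfidfmodel to obtain the TF-IDF values
    11. Use token2id to obtain the...

Original · 2018-06-02 12:23:14 · 762 views · 0 comments
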
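The visible steps of the outline above can be followed in a small, dependency-free sketch. This is not the original post's code and it does not use jieba or gensim: whitespace splitting stands in for jieba tokenization, and the bag-of-words, TF-IDF, and cosine-similarity functions are hand-written substitutes for the gensim calls the outline names. The toy corpus, the query, and every function name are assumptions made for illustration, and the optional low-frequency filter (step 5) is omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus and query; not taken from the original post.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply today",
]
query = "a cat sat on the mat"


def tokenize(text):
    # Stand-in for jieba: lowercase and split on whitespace.
    return text.lower().split()


def build_dictionary(token_lists):
    """Map each distinct token to an integer id (a token-to-id table)."""
    token2id = {}
    for tokens in token_lists:
        for token in tokens:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Sparse bag-of-words vector as {token_id: count}; unknown tokens are dropped."""
    return dict(Counter(token2id[t] for t in tokens if t in token2id))


def idf_weights(bows, num_docs):
    """Smoothed inverse document frequency for every token id in the corpus."""
    doc_freq = Counter(token_id for bow in bows for token_id in bow)
    return {tid: math.log((1 + num_docs) / (1 + df)) + 1.0
            for tid, df in doc_freq.items()}


def tfidf(bow, idf):
    return {tid: count * idf[tid] for tid, count in bow.items()}


def cosine(a, b):
    dot = sum(weight * b.get(tid, 0.0) for tid, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


token_lists = [tokenize(doc) for doc in documents]              # steps 1-3
token2id = build_dictionary(token_lists)                        # step 6
corpus = [to_bow(tokens, token2id) for tokens in token_lists]   # bag-of-words corpus
idf = idf_weights(corpus, len(corpus))
corpus_tfidf = [tfidf(bow, idf) for bow in corpus]              # step 10
query_tfidf = tfidf(to_bow(tokenize(query), token2id), idf)     # steps 7-8
similarities = [cosine(query_tfidf, vec) for vec in corpus_tfidf]

for doc, score in zip(documents, similarities):
    print(f"{score:.3f}  {doc}")
```

The first document should score highest because it shares the rarer words "cat" and "mat" with the query, while the third shares no words at all and scores zero.
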
Statistical tests (two-sample t-test, correlation analysis, ANOVA) (data mining notes, part 1)

    # -*- coding: utf-8 -*-
    """
    Created on Sat Jul 28 13:40:57 2018

    @author: wangxihe
    """
    #%%
    import pandas as pd
    import statsmodels.api as sm
    import os
    import numpy as np
    import matplotlib.pyplot as plt...

Original · 2018-07-29 15:13:10 · 11733 views · 0 comments

A Python tool for batch-converting Word and Excel files

    # -*- coding: utf-8 -*-
    """
    Created on Thu Aug 9 14:21:16 2018

    @author: wangxihe
    """
    try:
        import os
        os.chdir(r'D:\ExpPdf')
        from win32com.client import Dispatch,constants,gencache
    except Exc...

Original · 2018-08-10 13:49:59 · 3073 views · 0 comments

Machine learning, part 2 (binary classification)

    # -*- coding: utf-8 -*-
    """
    Created on Mon Aug 6 20:37:19 2018

    @author: wangxihe
    """
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    from statsmodels...

Original · 2018-08-07 14:41:13 · 3778 views · 1 comment

Scorecard model (part 1: feature construction)

    # -*- coding: utf-8 -*-
    """
    Created on Sun Sep 16 09:24:18 2018

    @author: wangxihe
    """
    import os
    import pandas as pd
    import datetime
    import matplotlib.pyplot as plt
    import collections
    import numpy as...

Original · 2018-09-20 09:02:01 · 724 views · 0 comments

English word tokenization and dictionary sorting in Python

    speak='''Chief Justice Roberts, President Carter, President Clinton, President Bush, President Obama, fellow Americans and people of the world, thank you.
    We, the citizens of America, are now joined in...

Original · 2018-05-28 14:40:31 · 1992 views · 0 comments

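The excerpt above shows only the input text, so the tokenizing and sorting steps named in the title are not visible. One possible approach, given here as a sketch and not as the original post's method, is to extract words with a regular expression, count them in a `Counter`, and sort the resulting dictionary items. Only the first sentence of the speech is used because the rest is truncated in the listing; the regular expression and variable names are assumptions.

```python
import re
from collections import Counter

# First sentence of the excerpt; the remainder is truncated in the listing.
speak = ("Chief Justice Roberts, President Carter, President Clinton, "
         "President Bush, President Obama, fellow Americans and people "
         "of the world, thank you.")

# Lowercase the text and keep runs of letters (apostrophes allowed) as words.
words = re.findall(r"[a-z']+", speak.lower())

# Count how often each word occurs.
counts = Counter(words)

# Sort the dictionary items by count (descending), breaking ties alphabetically.
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Sort the dictionary items by key.
by_word = sorted(counts.items())

print(by_count[:3])
print(by_word[:3])
```

Sorting on the tuple `(-count, word)` keeps the order deterministic when several words share the same count.
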
Machine learning (1): handwritten digit recognition with the KNN algorithm

    from numpy import *
    import operator
    from os import listdir
    # Expand vertically (stack copies as rows)
    #tile(a,(size,1))
    def knn(k,testdata,traindata,labels):
        traindatasize=traindata.shape[0]
        dif=tile(testdata,(traindatasize,1))...

Original · 2018-06-02 20:14:11 · 392 views · 0 comments

Plotting regression fits in Python

    import numpy as np
    import pandas as pda
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import seaborn as sns
    sns.set()#color_codes=True
    np.random.seed(sum(map(ord,"regression")))
    tips=sns.l...

Original · 2018-04-29 17:30:40 · 2179 views · 0 comments

Credit card anomaly detection (oversampling, undersampling, logistic regression, confusion matrix)

    import pandas as pda
    import numpy as np
    import matplotlib.pyplot as plt
    import itertools
    import missingno
    data=pda.read_csv("creditcard.csv")
    # print(data.head())
    count_class=pda.value_counts(data.C...

Original · 2018-05-02 15:52:46 · 1280 views · 0 comments

Customer churn prediction (KNN, SVC, RF)

    import pandas as pda
    import numpy as np
    import missingno
    import matplotlib.pyplot as plt
    userData=pda.read_csv("churn.csv")
    print(userData.shape)
    # print(userData.describe())
    # print(userData.colum...

Original · 2018-05-02 16:04:20 · 870 views · 0 comments

Cluster analysis

    import pandas as pda
    import numpy as np
    import missingno
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Read in the data
    data=pda.read_csv("114_congress.csv")
    # Show the first few rows
    print(data.head())
    # Check for missing values
    missingno....

Original · 2018-05-02 16:14:39 · 339 views · 0 comments

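Python NumPy study notes

NumPy is a Python package. The name stands for "Numerical Python". It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays.

    import numpy as np
    # Define a one-dimensional array
    a1=np.array([1,42,3,4,5,6])
    print("Result:",a1)
    # Define a two-dimensional array
    a2=np.array([ [1,2,3], ["a","b","c"], ["1a...

Original · 2018-04-26 15:35:07 · 369 views · 0 comments
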
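The difference between assignment, shallow copy, and deep copy in Python

1. Assignment: only copies a reference to the existing object; no new memory is allocated.
2. Shallow copy: creates a new object whose contents are references to the original object's contents (view).
3. Deep copy: has only one form, the deepcopy function in the copy module. In contrast to a shallow copy, a deep copy copies every element of the object, including elements nested several levels deep. The object produced by a deep copy is entirely new and no longer has any connection to the original.

    import numpy as np
    a=np.array([12,10,11,...

Original · 2018-04-28 09:19:40 · 87 views · 0 comments
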
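The three cases described above can be compared side by side with the standard `copy` module. This is a minimal sketch using a nested list, not the NumPy array from the truncated excerpt; the variable names are made up for illustration.

```python
import copy

original = [1, 2, [3, 4]]

alias = original                 # assignment: a second name for the same object
shallow = copy.copy(original)    # new outer list; the nested list is still shared
deep = copy.deepcopy(original)   # fully independent copy, nested list included

original[0] = 100        # change a top-level element
original[2].append(5)    # change the nested list in place

print(alias)    # follows both changes, because it is the same object
print(shallow)  # keeps its own top level but sees the nested change
print(deep)     # unaffected by either change
```

The shallow copy is the case that tends to surprise: top-level edits to the original do not reach it, but edits inside shared nested objects do.
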
Sorting arrays in Python

    import numpy as np
    # Create a two-dimensional array
    data=np.sin(np.arange(20)).reshape(5,4)
    print("data:")
    print(data)
    # [[ 0. 0.84147098 0.90929743 0.14112001]
    # [-0.7568025 -0.95892427 -0.2794155 0.656986...

Original · 2018-04-28 10:20:17 · 1067 views · 0 comments

Selecting rows and columns and slicing in Python

    import pandas as pda
    data=pda.read_csv("food_info.csv")
    print(type(data))#<class 'pandas.core.frame.DataFrame'>
    print(data.describe())# Summary statistics
    print(data.dtypes)# Data type of each column
    print(data.head())# Get the first 5 rows...

Original · 2018-04-28 11:03:06 · 13370 views · 1 comment

Data preprocessing in Python: missing values and summary statistics

    import pandas as pda
    import numpy as np
    # Data preprocessing
    data=pda.read_csv("titanic_train.csv")
    print(data.columns)
    # Missing values
    # print(data[pda.isnull(data["Age"])])
    # # Non-missing values
    # print(data[pda.notnull(data["Age"])])...

Original · 2018-04-28 13:37:09 · 7496 views · 0 comments

Visual analysis of player and referee data in Python (part 1)

    from __future__ import absolute_import,division,print_function
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import GridSpec
    import seaborn as sns
    import numpy as n...

Original · 2018-04-28 20:17:15 · 932 views · 0 comments

Visual analysis of player and referee data in Python (part 2: univariate analysis and missing-value visualization)

    from __future__ import absolute_import,division,print_function
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import GridSpec
    import seaborn as sns
    import numpy as n...

Original · 2018-04-28 20:20:13 · 819 views · 0 comments

Python Seaborn plotting library: code notes

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    # Build the data
    def sinplot(flip=1):
        x=np.linspace(0,14,100)
        print(x)
        for i in range(1,7):
            plt.plot(x,np.sin(x+i*0.5...

Original · 2018-04-29 09:33:23 · 750 views · 0 comments

Data reports with pandas_profiling

    from __future__ import absolute_import,division,print_function
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import GridSpec
    import seaborn as sns
    import numpy as n...

Original · 2018-04-29 10:37:11 · 1100 views · 0 comments

Multivariate plots in Python

    from __future__ import absolute_import,division,print_function
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import GridSpec
    import seaborn as sns
    import numpy as np...

Original · 2018-04-29 17:20:23 · 1389 views · 0 comments

Scorecard model (part 2: data cleaning)

    # -*- coding: utf-8 -*-
    """
    Created on Sun Sep 16 19:04:53 2018

    @author: wangxihe
    """
    import os
    import pandas as pd
    import numbers
    import numpy as np
    import matplotlib.pyplot as plt
    #%%
    os.chdir(r'E...

Original · 2018-09-20 09:09:21 · 739 views · 0 comments