Data Analysis
Notes and knowledge related to data analysis
超级D洋葱
Life is short, I use Python!
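Python Data Analysis: Basic File Operations
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 3.1.1 basic file operations.py
# Write to a file
def fun_1():
    # 'w' creates the file or truncates an existing one; the with-block closes it
    with open('hello.txt', 'w', encoding='utf-8') as f:
        f.write('Hello World')
        f.write('Good Morning')  # write() adds no newline, so this continues the same line
# Write
…
```
Original · 2020-09-24 10:22:17 · 282 views · 0 comments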
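A minimal sketch of a full write, append, and read round trip with `with` blocks, which close the file even if an error occurs. It writes into a temporary directory (an assumption, so no `hello.txt` is left behind) and adds explicit `\n` characters because `write()` never adds them.

```python
import os
import tempfile


def write_and_read(path: str) -> list:
    """Write two lines, append a third, then read all lines back."""
    # 'w' creates the file or truncates an existing one
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello World\n")   # write() does not add a newline itself
        f.write("Good Morning\n")
    # 'a' appends to the end instead of truncating
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(["Good Night\n"])
    # 'r' opens for reading; splitlines() drops the line endings
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    lines = write_and_read(os.path.join(tmp, "hello.txt"))
print(lines)  # ['Hello World', 'Good Morning', 'Good Night']
```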
Python Pandas: Working with SQL Databases
You need to install:
```shell
pip install sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple
```
```python
import pandas as pd
# SQLAlchemy is an open-source SQL toolkit and object-relational mapper (ORM) for Python
from sqlalchemy import create_engine
df…
```
Original · 2020-12-27 15:23:41 · 411 views · 0 comments
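The post connects to MySQL through SQLAlchemy and PyMySQL. As a sketch that runs without a database server, the example below swaps in an in-memory SQLite database from the standard-library `sqlite3` module, which pandas accepts directly as `con`. The table name `exam` and its data are hypothetical. Against MySQL you would instead pass an engine such as `create_engine("mysql+pymysql://user:password@host:3306/dbname")`, where the credentials are placeholders.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["A", "B", "C"], "score": [90, 75, 82]})

# In-memory SQLite database stands in for a MySQL server in this sketch
con = sqlite3.connect(":memory:")
# Write the DataFrame to table "exam", replacing the table if it already exists
df.to_sql("exam", con, index=False, if_exists="replace")
# Run a SQL query and receive the result as a DataFrame
result = pd.read_sql("SELECT name, score FROM exam WHERE score > 80 ORDER BY score DESC", con)
con.close()
print(result)
```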
python pandas: Position-Based Selection
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0, 150, size=[10, 3]),  # exam scores in computer-science subjects
                  index=list('ABCDEFGHIJ'),  # row labels
                  columns=['Python', 'Tensorflow', 'Keras'])
df.iloc[4]  # select a row by integer position
df.iloc[2:8, 0:2]  # slice by integer position, as in NumPy
df.iloc…
```
Original · 2020-12-27 15:29:30 · 442 views · 0 comments
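A small sketch of the main `iloc` patterns. It uses deterministic `np.arange` values instead of random scores (an assumption, so the results are predictable); row `i` holds `3i, 3i+1, 3i+2`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3),
                  index=list("ABCDEFGHIJ"),
                  columns=["Python", "Tensorflow", "Keras"])

row = df.iloc[4]                     # 5th row (label 'E') as a Series
block = df.iloc[2:8, 0:2]            # rows 2..7, columns 0..1; the end position is excluded
picked = df.iloc[[1, 3, 5], [0, 2]]  # lists of positions select arbitrary rows/columns
cell = df.iat[0, 1]                  # fast scalar access by position
print(row.tolist(), block.shape, cell)  # [12, 13, 14] (6, 2) 1
```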
python pandas: Assignment
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0, 150, size=[10, 3]),  # exam scores in computer-science subjects
                  index=list('ABCDEFGHIJ'),  # row labels (users)
                  columns=['Python', 'Tensorflow', 'Keras'])  # exam subjects
s = pd.Series(data=np.random.randint(0, 150, size=…
```
Original · 2020-12-27 15:31:49 · 4811 views · 0 comments
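A sketch of the common assignment forms: adding a column from a Series (aligned by index label, not by position), setting single cells with `loc`/`iloc`, and conditional assignment. The small all-zero frame is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 2), dtype=int),
                  index=list("ABCD"),
                  columns=["Python", "Keras"])
s = pd.Series([1, 2, 3], index=list("ABC"))  # note: no entry for 'D'

df["PyTorch"] = s                       # aligned by label; 'D' gets NaN
df.loc["A", "Python"] = 100             # one cell by labels
df.iloc[1, 1] = 50                      # one cell by position (row 'B', column 'Keras')
df.loc[df["Keras"] == 0, "Keras"] = 60  # only rows matching the condition
print(df)
```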
Python Data Analysis: Python vs. NumPy Performance
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 3.2.2 Python vs NumPy performance.py
# @Time : 2020/9/23 10:07
import sys
from datetime import datetime
import numpy as np
import matplotlib.pyplot…
```
Original · 2020-09-24 10:27:25 · 852 views · 0 comments
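A minimal timing sketch of the comparison, assuming a sum of squares as the workload (the post's own benchmark is truncated). `time.perf_counter` is used for timing; the measured times vary by machine, so only the results are checked, not the speed.

```python
import time

import numpy as np


def py_sum_squares(n: int) -> int:
    # Pure-Python loop: one interpreted step per element
    return sum(i * i for i in range(n))


def np_sum_squares(n: int) -> int:
    # Vectorized: the loop runs in compiled code; int64 avoids overflow for this n
    a = np.arange(n, dtype=np.int64)
    return int((a * a).sum())


n = 1_000_000
t0 = time.perf_counter()
r1 = py_sum_squares(n)
t1 = time.perf_counter()
r2 = np_sum_squares(n)
t2 = time.perf_counter()
print(f"pure Python: {t1 - t0:.4f}s, NumPy: {t2 - t1:.4f}s")
```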
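Marketing Theory Models: 4P, STP, and SWOT Illustrated
I. The 4P marketing model
In the 1960s, the renowned marketing scholar McCarthy proposed the classic "4P marketing theory", which has had a profound influence on marketing theory and practice.
1. Product: the tangible and intangible products a company offers its target market, including the physical product, brand, packaging, style, service, technology, and concept.
2. Price: the economic return a company seeks from selling its products, including the combination and use of base price, discount price, payment terms, and various pricing methods and techniques.
3. Promotion: the communication activities through which a company reaches its target market via various information carriers, including advertising, personal selling, sales promotion, and public relations…
Original · 2020-10-24 15:43:55 · 85582 views · 0 comments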
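python pandas: Retrieving Data
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0, 150, size=[150, 3]),  # exam scores in computer-science subjects
                  columns=['Python', 'Tensorflow', 'Keras'])
df['Python']  # a single column, returned as a Series
df.Python  # a single column via attribute access, also a Series
df[['Python', 'Keras']]  # multiple columns, returned as a DataFrame
df[3…
```
Original · 2020-12-27 15:28:02 · 224 views · 0 comments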
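A short sketch of what plain `[]` indexing returns, using deterministic data (an assumption): a label selects a column, a list of labels selects several columns, and a slice selects rows by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  columns=["Python", "Tensorflow", "Keras"])

col = df["Python"]            # Series
attr = df.Python              # same Series; only works for valid identifier names
cols = df[["Python", "Keras"]]  # DataFrame with two columns
rows = df[1:3]                # a slice inside [] selects ROWS by position, end excluded
print(rows)
```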
python pandas: Sorting Data
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0, 30, size=(30, 3)),
                  index=list('qwertyuioijhgfcasdcvbnerfghjcf'),
                  columns=['Python', 'Keras', 'Pytorch'])
# 1. Sort by index or column labels
df.sort_index(axis=0, ascending=True)  # sort by the row index, ascending
…
```
Original · 2020-12-27 16:59:24 · 1825 views · 0 comments
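A sketch contrasting label sorting (`sort_index`) with value sorting (`sort_values`), including a tie broken by a second column. The tiny frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Python": [3, 1, 2, 1], "Keras": [9, 8, 7, 6]},
                  index=list("dbca"))

by_index_desc = df.sort_index(ascending=False)  # row labels: d, c, b, a
by_columns = df.sort_index(axis=1)              # column labels: Keras, Python
# Python ascending; rows tied on Python are ordered by Keras descending
by_values = df.sort_values(by=["Python", "Keras"], ascending=[True, False])
print(by_values.index.tolist())  # ['b', 'a', 'c', 'd']
```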
python pandas: Grouping and Aggregation with groupby
```python
import numpy as np
import pandas as pd
# Prepare the data
df = pd.DataFrame(data={'sex': np.random.randint(0, 2, size=300),  # 0 = male, 1 = female
                        'class': np.random.randint(1, 9, size=300),  # eight classes, numbered 1 to 8
                        'Python': np.random.randint(0, 151, size=300),  # Python score
                        'Keras': np.random.randint(…
```
Original · 2020-12-27 18:09:45 · 2372 views · 3 comments
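A sketch of grouping by one key and by two keys, with a different aggregation per column. It uses five hand-written rows (hypothetical) in place of the random 300, so the results can be checked.

```python
import pandas as pd

df = pd.DataFrame({"sex": [0, 1, 0, 1, 0],
                   "class": [1, 1, 2, 2, 1],
                   "Python": [100, 80, 90, 70, 60],
                   "Keras": [50, 60, 70, 80, 90]})

counts = df.groupby("sex").size()                # rows per group
mean_by_sex = df.groupby("sex")["Python"].mean()  # one column, one statistic
# Two keys give a MultiIndex; the dict picks an aggregation per column
stats = df.groupby(["class", "sex"]).agg({"Python": "max", "Keras": "mean"})
print(stats)
```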
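Python Pandas: Reading and Writing CSV Files
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0, 50, size=[50, 5]),  # salary data
                  columns=['IT', 'Chemical', 'Biology', 'Teacher', 'Soldier'])
# Save to the current directory as salary.csv (CSV: comma-separated values)
df.to_csv('./salary.csv',
          sep=';',  # field separator; the default is a comma
          header=True,  # …
```
Original · 2020-12-27 15:18:50 · 1178 views · 0 comments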
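A round-trip sketch showing why the separator used for writing must also be passed when reading. An in-memory `io.StringIO` buffer stands in for `./salary.csv` (an assumption, to avoid touching the disk).

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["IT", "Teacher"])

buf = io.StringIO()
df.to_csv(buf, sep=";", header=True, index=False)  # index=False: no extra index column
text = buf.getvalue()

back = pd.read_csv(io.StringIO(text), sep=";")  # same separator: columns restored
wrong = pd.read_csv(io.StringIO(text))          # default sep=",": each line becomes one field
print(wrong.columns.tolist())  # ['IT;Teacher']
```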
Python Pandas读写HDF5
先安装:pip install tables -i https://pypi.tuna.tsinghua.edu.cn/simpleHDF5是⼀个独特的技术套件,可以管理⾮常⼤和复杂的数据收集。HDF5,可以存储不同类型数据的⽂件格式,后缀通常是.h5,它的结构是层次性的。⼀个HDF5⽂件可以被看作是⼀个组包含了各类不同的数据集。对于HDF5⽂件中的数据存储,有两个核⼼概念:group 和 datasetdataset 代表数据集,⼀个⽂件当中可以存放不同种类的数据集,这些数据集如何管理,就原创 2020-12-27 15:25:55 · 1557 阅读 · 0 评论 -
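A sketch of the group/dataset idea with pandas, assuming PyTables (`tables`) is installed as the post instructs. A `/` inside a key creates a nested group; `HDFStore.keys()` lists every stored dataset path. The frames and key names are hypothetical, and a temporary directory keeps the file off the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
df2 = pd.DataFrame({"x": [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    df1.to_hdf(path, key="salary", mode="w")       # 'w' starts a fresh file
    df2.to_hdf(path, key="group/other", mode="a")  # '/' in the key creates a nested group
    with pd.HDFStore(path, mode="r") as store:
        keys = sorted(store.keys())                # full paths of all datasets
    back = pd.read_hdf(path, key="salary")
print(keys)  # ['/group/other', '/salary']
```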
Python Data Analysis: Reading JSON Files
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 3.1.3 reading JSON files.py
import json

def fun_1():
    data = {'Tom': {'Weight': 65, 'Score': 90, 'Height': 170}}
    json_str = json.dumps(data)  # serialize the dict to a JSON string
    pri…
```
Original · 2020-09-24 10:24:48 · 804 views · 0 comments
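A sketch of the four core `json` calls: `dumps`/`loads` work with strings, `dump`/`load` with file objects (an `io.StringIO` buffer stands in for a file here). It also shows `ensure_ascii=False`, which keeps Chinese text readable instead of escaping it.

```python
import io
import json

data = {"Tom": {"Weight": 65, "Score": 90, "Height": 170}}

json_str = json.dumps(data)        # dict -> JSON string
restored = json.loads(json_str)    # JSON string -> dict

buf = io.StringIO()                # stands in for open('data.json', 'w')
json.dump(data, buf, indent=2)     # pretty-printed with 2-space indentation
buf.seek(0)
from_file = json.load(buf)

escaped = json.dumps({"name": "张三"})                       # non-ASCII becomes \uXXXX
readable = json.dumps({"name": "张三"}, ensure_ascii=False)  # kept as-is
print(escaped, readable)
```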
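python pandas: Data Cleaning
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'color': ['red', 'blue', 'red', 'green', 'blue', None, 'red'],
                        'price': [10, 20, 10, 15, 20, 0, np.nan]})  # np.NaN was removed in NumPy 2.0
# 1. Filter duplicate rows
df.duplicated()  # flag rows that duplicate an earlier row
df.drop_duplicates()  # drop duplicate rows
# 2. Filter missing values
df.isnull()  # …
```
Original · 2020-12-27 16:02:29 · 254 views · 0 comments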
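A sketch applying the cleaning steps to the post's own sample data: flagging and dropping duplicates, counting missing values, and either dropping or filling them. Filling `price` with the column mean and `color` with `"unknown"` is an assumed strategy for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green", "blue", None, "red"],
                   "price": [10, 20, 10, 15, 20, 0, np.nan]})

dups = df.duplicated()           # True where a row repeats an earlier one
deduped = df.drop_duplicates()   # keeps the first occurrence
missing = df.isnull().sum()      # missing values per column
cleaned = df.dropna()            # drop rows with any missing value
# Fill instead of drop: a per-column replacement value
filled = df.fillna({"color": "unknown", "price": df["price"].mean()})
print(filled)
```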
Python Data Analysis: pandas Series
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : series.py
import pandas as pd
from pandas import Series, DataFrame

# Creating a Series
def fun_1():
    series1 = Series([1, 2, 3])
    print('\nseries1:')
…
```
Original · 2020-09-24 10:29:12 · 282 views · 0 comments
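A sketch of three ways to create a Series and of label alignment in arithmetic: labels present in only one operand produce `NaN`. The values are hypothetical.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])                           # default index 0, 1, 2
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])  # explicit labels
s3 = pd.Series({"b": 1, "c": 2, "d": 3})             # dict keys become labels

total = s2 + s3   # aligned by label: 'a' and 'd' exist in only one Series -> NaN
print(total)
```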
python pandas: Label-Based Selection
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0, 150, size=[10, 3]),  # exam scores in computer-science subjects
                  index=list('ABCDEFGHIJ'),  # row labels
                  columns=['Python', 'Tensorflow', 'Keras'])
df.loc[['A', 'C', 'D', 'F']]  # select the rows with the given labels
df.loc['A':'E', ['Python…
```
Original · 2020-12-27 15:28:41 · 534 views · 0 comments
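A sketch of `loc` with deterministic data (an assumption). Unlike position slices, label slices include the end label, and `loc` also accepts a boolean mask for rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3),
                  index=list("ABCDEFGHIJ"),
                  columns=["Python", "Tensorflow", "Keras"])

picked = df.loc[["A", "C", "D", "F"]]          # list of row labels
block = df.loc["A":"E", ["Python", "Keras"]]   # 'E' IS included in a label slice
cell = df.at["C", "Tensorflow"]                # fast scalar access by labels
high = df.loc[df["Python"] > 20, "Keras"]      # boolean rows + column label
print(block.shape, cell, high.tolist())  # (5, 2) 7 [23, 26, 29]
```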
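python pandas: Binning
Binning converts continuous data into categorical counterparts, for example dividing continuous height data into short, medium, and tall. There are two kinds of binning: equal-width binning and equal-frequency binning. Binning is also called bucketing or discretization.
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0, 150, size=(100, 3)),
                  columns=['Python', 'Tensorflow', 'Keras'])
# 1. Equal-width binning
pd.cut(df.Python, bins…
```
Original · 2020-12-27 17:00:20 · 9255 views · 3 comments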
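A sketch of both kinds of binning on eight hypothetical scores: `pd.cut` uses fixed edges (right-closed intervals such as `(60, 90]`), while `pd.qcut` picks edges at quantiles so each bin gets roughly the same number of values.

```python
import pandas as pd

scores = pd.Series([10, 45, 60, 75, 90, 120, 140, 30])

# Fixed edges: (0, 60], (60, 90], (90, 150]
wide = pd.cut(scores, bins=[0, 60, 90, 150], labels=["low", "mid", "high"])
# Quartile edges: 4 bins with (about) equal counts
freq = pd.qcut(scores, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(wide.value_counts())
print(freq.value_counts())
```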
Python Data Analysis: Saving and Loading CSV Files
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 3.1.2 saving and loading CSV files.py
import pandas as pd

# Save data to a CSV file with pandas
def fun_1():
    # Generate some data
    data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
    df = pd.DataFrame…
```
Original · 2020-09-24 10:23:52 · 659 views · 0 comments
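A sketch of how the row index is handled in a CSV round trip, using the post's sample data. With no path, `to_csv` returns the CSV text as a string, which keeps the example free of files. By default the index is written as an unnamed first column, and `index_col=0` reads it back as the index.

```python
import io

import pandas as pd

data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)

with_index = df.to_csv()           # first column holds the index
no_index = df.to_csv(index=False)  # data columns only
back = pd.read_csv(io.StringIO(with_index), index_col=0)  # restore that column as the index
print(with_index)
```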
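Management Theory Models: PEST, 5W2H, Time Management, Life Cycle, Logic Tree, Pyramid Principle, SMART Criteria
1. PEST
2. 5W2H
3. Time management
4. Life cycle
(1) Stage one: introduction. This stage runs from the product's design and production until it reaches the market and enters testing. Once a new product is launched, it enters the introduction stage. Product variety is limited and customers know little about it; apart from a few novelty seekers, almost nobody actually buys it. To expand sales, the producer has to spend heavily on promotion to publicize the product. Because production technology is still limited at this stage, batches are small, manufacturing costs are high, advertising costs are large, and the selling price is relatively high; sales volume is very limited, and the company usually makes no profit and may even lose money.
(2) Stage two: growth. When the product, having entered the introduction stage, begins to achieve…
Original · 2020-10-24 15:59:16 · 4321 views · 0 comments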
python pandas: Concatenating Data with concat
```python
import pandas as pd
import numpy as np
df1 = pd.DataFrame(data=np.random.randint(0, 150, size=[10, 3]),  # exam scores in computer-science subjects
                   index=list('ABCDEFGHIJ'),  # row labels (users)
                   columns=['Python', 'Tensorflow', 'Keras'])  # exam subjects
df2 = pd.DataFrame(data=np.random.randint(0, 150, s…
```
Original · 2020-12-27 15:39:11 · 277 views · 0 comments
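A sketch of the three most common `pd.concat` uses: stacking rows, placing frames side by side (aligned on the index, so non-shared labels produce `NaN`), and tagging each source with `keys`. The two small frames are hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2), dtype=int), index=["A", "B"], columns=["Python", "Keras"])
df2 = pd.DataFrame(np.zeros((2, 2), dtype=int), index=["C", "D"], columns=["Python", "Keras"])

rows = pd.concat([df1, df2])                   # axis=0: stack vertically
cols = pd.concat([df1, df2], axis=1)           # side by side, aligned on row labels
keyed = pd.concat([df1, df2], keys=["x", "y"])  # outer index level records the source
print(cols)
```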
Python Data Analysis: NumPy Basic Operations
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 3.2.2 NumPy basic operations.py
import numpy as np

# Creating arrays
def fun_1():
    narr1 = np.arange(0, 10, 1)
    print('narr1:\n', narr1)
    narr2 = np.arange(0, …
```
Original · 2020-09-24 10:25:33 · 268 views · 0 comments
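A sketch of a few core NumPy operations that follow from `np.arange`: reshaping, reducing along an axis, element-wise arithmetic, and broadcasting a row of column means across every row.

```python
import numpy as np

narr1 = np.arange(0, 10, 1)    # 0..9; the stop value is excluded
m = narr1.reshape(2, 5)        # same data as 2 rows x 5 columns
col_sum = m.sum(axis=0)        # collapse rows: one sum per column
row_sum = m.sum(axis=1)        # collapse columns: one sum per row
scaled = m * 2 + 1             # element-wise, no Python loop
centered = m - m.mean(axis=0)  # broadcasting: (2, 5) minus (5,)
grid = np.linspace(0, 1, 5)    # 5 evenly spaced points, endpoint included
print(col_sum, row_sum)  # [ 5  7  9 11 13] [10 35]
```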
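python pandas: SQL-Style Merging with Join
Merging or joining datasets links rows together through one or more keys. These operations are at the core of relational databases. In pandas, the merge function is the main entry point for performing join operations on datasets.
```python
import pandas as pd
import numpy as np
# Table 1 records names and weights
df1 = pd.DataFrame(data={'name': ['softpo', 'Daniel', 'Brandon', 'Ella'], 'weight': [70, 55, 75, 65]})
# Table 2 …
```
Original · 2020-12-27 15:54:18 · 249 views · 0 comments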
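A sketch of the join types with `pd.merge`. The first table is the post's; the second table of heights is hypothetical, because the original is truncated. `indicator=True` adds a `_merge` column recording which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["softpo", "Daniel", "Brandon", "Ella"],
                    "weight": [70, 55, 75, 65]})
# Hypothetical second table: 'Cindy' has no weight, 'Ella' has no height
df2 = pd.DataFrame({"name": ["softpo", "Daniel", "Brandon", "Cindy"],
                    "height": [172, 170, 170, 166]})

inner = pd.merge(df1, df2, on="name")              # how="inner" (default): keys in both
left = pd.merge(df1, df2, on="name", how="left")   # every row of df1 kept
outer = pd.merge(df1, df2, on="name", how="outer", indicator=True)  # all keys
print(outer)
```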
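Python Data Analysis: Fetching Data with requests and bs4 (BeautifulSoup)
Suppose we want to get data from the following website:
Sample code:
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : 2.2.2 requesting and parsing web pages.py
# @Time : 2020/9/18 10:52
# @Author : tangxing07
import requests
from bs4 import Beautifu…
```
Original · 2020-09-18 11:30:00 · 604 views · 0 comments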
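A sketch of the usual fetch-then-parse split, assuming `requests` and `beautifulsoup4` are installed. The target site in the post is not recoverable, so the HTML structure (`ul.items li a`) and the sample markup are hypothetical; parsing is shown on an inline string so it runs offline, and `fetch_html` is not called here.

```python
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    import requests  # imported lazily so the parsing part works offline
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject the default client string
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()                  # raise on HTTP 4xx/5xx
    resp.encoding = resp.apparent_encoding   # guess the charset from the content
    return resp.text


def parse_items(html: str) -> list:
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.select("ul.items li a"):   # hypothetical CSS selector
        rows.append({"title": a.get_text(strip=True), "link": a["href"]})
    return rows


sample = '<ul class="items"><li><a href="/a">First</a></li><li><a href="/b"> Second </a></li></ul>'
items = parse_items(sample)
print(items)
```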
Python Data Analysis: pandas DataFrame
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : DataFrame.py
import pandas as pd
from pandas import Series, DataFrame

# Creating a DataFrame
def fun_1():
    d2 = {'prev': [-3, -2, -1], 'now': [0, 0, 0], 'next': [1, 2, 3…
```
Original · 2020-09-24 10:28:23 · 325 views · 0 comments
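A sketch built on the post's `d2` dictionary: dict keys become columns, a derived column is computed with vectorized arithmetic, a column is dropped, and the frame is transposed. The row labels `a`, `b`, `c` are an assumption.

```python
import pandas as pd

d2 = {"prev": [-3, -2, -1], "now": [0, 0, 0], "next": [1, 2, 3]}
df = pd.DataFrame(d2, index=["a", "b", "c"])

df["diff"] = df["next"] - df["prev"]  # new column from existing ones
df = df.drop(columns="now")           # drop returns a new DataFrame
t = df.T                              # transpose: columns become rows
print(df)
```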
python pandas: Axis and Element Transformations
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0, 10, size=(10, 3)),
                  index=list('ABCDEFHIJK'),
                  columns=['Python', 'Tensorflow', 'Keras'])
df.iloc[4, 2] = None  # introduce a missing value
# 1. Rename axis labels
df.rename(index={'A': 'AA', 'B': 'BB'}, colum…
```
Original · 2020-12-27 16:18:43 · 473 views · 0 comments
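A sketch of label and element transformations on a small hypothetical frame: `rename` changes axis labels, `replace` swaps matching values anywhere, and `Series.map` / `Series.apply` transform elements one by one.

```python
import pandas as pd

df = pd.DataFrame({"Python": [1, 2, 3], "Keras": [4, 5, 6]}, index=["A", "B", "C"])

renamed = df.rename(index={"A": "AA"}, columns={"Python": "PY"})  # returns a new frame
replaced = df.replace({3: 30})                                   # value 3 -> 30 in any column
df["level"] = df["Python"].map({1: "low", 2: "mid", 3: "high"})  # lookup table per element
df["Keras_sq"] = df["Keras"].apply(lambda v: v ** 2)             # function per element
print(df)
```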
Python Data Analysis: NumPy Function Application
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @ProjectName : Data analysis study
# @ProductName : PyCharm
# @FileName : function application.py
import pandas as pd
from pandas import Series, DataFrame

def fun_1():
    d1 = {'prev': [-3, -2, -1], 'now': [0, 0, 0], 'next': [1, 2, 3]}
    df…
```
Original · 2020-09-24 10:30:02 · 244 views · 0 comments
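A sketch of applying functions to the post's `d1` data: `DataFrame.apply` passes whole columns by default or whole rows with `axis=1`, while NumPy ufuncs such as `np.abs` act element-wise and return a DataFrame.

```python
import numpy as np
import pandas as pd

d1 = {"prev": [-3, -2, -1], "now": [0, 0, 0], "next": [1, 2, 3]}
df = pd.DataFrame(d1)

col_range = df.apply(lambda col: col.max() - col.min())  # once per column
row_sum = df.apply(lambda row: row.sum(), axis=1)         # once per row
abs_df = np.abs(df)                                        # ufunc, element-wise
print(col_range.tolist(), row_sum.tolist())  # [2, 0, 2] [-2, 0, 2]
```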
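From Electronics to Products: User Experience
1. The honeycomb model:
2. The 5E principles:
3. Nielsen's ten usability heuristics:
Original · 2020-11-24 08:42:46 · 216 views · 0 comments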
Python Pandas读写Excel文件
读写excel需要用到:pip install xlrd -i https://pypi.tuna.tsinghua.edu.cn/simplepip install xlwt -i https://pypi.tuna.tsinghua.edu.cn/simple示例代码:import numpy as npimport pandas as pddf1 = pd.DataFrame(data = np.random.randint(0,50,size = [50,5]), # 薪资情况 c原创 2020-12-27 15:21:00 · 1307 阅读 · 2 评论 -
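A sketch of writing several sheets into one .xlsx workbook and reading them all back, assuming openpyxl is installed. `sheet_name=None` makes `read_excel` return a dict of DataFrames keyed by sheet name. The data and sheet names are hypothetical, and a temporary directory keeps the file out of the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["IT", "Teacher"])
df2 = df1 * 10

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "salary.xlsx")
    with pd.ExcelWriter(path) as writer:  # engine chosen from the .xlsx extension
        df1.to_excel(writer, sheet_name="salary1", index=False)
        df2.to_excel(writer, sheet_name="salary2", index=False)
    sheets = pd.read_excel(path, sheet_name=None)  # {sheet name: DataFrame}
print(list(sheets))  # ['salary1', 'salary2']
```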
python pandas: Conditional Selection
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0, 150, size=[10, 3]),  # exam scores in computer-science subjects
                  index=list('ABCDEFGHIJ'),  # row labels (users)
                  columns=['Python', 'Tensorflow', 'Keras'])  # exam subjects
cond1 = df.Python > 100  # whether each Python score exceeds 100; the result is b…
```
Original · 2020-12-27 15:30:29 · 1371 views · 0 comments
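A sketch of combining conditions on hypothetical scores. Each condition needs its own parentheses and must be joined with `&` / `|` (not `and` / `or`); `query` expresses the same filter as a string, and indexing with a whole-frame condition masks non-matching cells as `NaN`.

```python
import pandas as pd

df = pd.DataFrame({"Python": [120, 90, 130, 60], "Keras": [80, 140, 110, 50]},
                  index=list("ABCD"))

cond1 = df["Python"] > 100                                   # boolean Series
both = df[(df["Python"] > 100) & (df["Keras"] > 100)]        # AND
either = df[(df["Python"] > 100) | (df["Keras"] > 100)]      # OR
q = df.query("Python > 100 and Keras > 100")                 # same as `both`
masked = df[df > 100]                                        # cell-wise: others become NaN
print(both)
```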
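Data Visualization with matplotlib in Practice: Drawing Step Plots with plt.step()
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 10, 10)
y = np.sin(x)
# where="pre": the level of each step is set by the point at its right end
plt.step(x, y, color="#8dd3c7", where="pre", lw=2)
plt.xlim(0, 11)
plt.xticks(np.arange(1, 11, 1))
plt.ylim(-1.2, 1.2)
plt.show()
```
Original · 2020-10-04 11:33:55 · 6362 views · 0 comments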
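A sketch comparing the three `where` options side by side on hypothetical data: with `"pre"` the interval `(x[i-1], x[i]]` takes `y[i]`, with `"post"` the interval `[x[i], x[i+1])` takes `y[i]`, and with `"mid"` the jumps happen halfway between x positions. The Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
y = np.array([1, 3, 2, 4, 3])

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, where in zip(axes, ["pre", "mid", "post"]):
    ax.step(x, y, where=where, lw=2, color="#8dd3c7")
    ax.plot(x, y, "o", color="grey")  # mark the actual data points
    ax.set_title(f'where="{where}"')
fig.tight_layout()
```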
Drawing Area Charts with Matplotlib
```python
import matplotlib.pyplot as plt
plt.figure(figsize=(9, 6))
days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]
plt.stackplot(days, sleeping, eating, working, playing)
plt.xlabel('x')
plt.ylabel('y')
plt.t…
```
Original · 2021-01-07 10:22:45 · 2484 views · 0 comments
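A sketch of the same stacked area chart with labels and a legend, using the post's data. Each layer is drawn on top of the previous ones, so the top edge is the per-day total (24 hours here). The label names, colors, and Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]

fig, ax = plt.subplots(figsize=(9, 6))
polys = ax.stackplot(days, sleeping, eating, working, playing,
                     labels=["Sleeping", "Eating", "Working", "Playing"],
                     colors=["#8dd3c7", "#ffffb3", "#bebada", "#fb8072"])
ax.set_xlabel("Day")
ax.set_ylabel("Hours")
ax.legend(loc="upper left")
totals = np.sum([sleeping, eating, working, playing], axis=0)  # height of the top edge
```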
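Matplotlib: Uneven Subplot Layouts
Method 1:
```python
import numpy as np
import matplotlib.pyplot as plt
# The gridspec module needs to be imported (not used in method 1)
x = np.linspace(0, 2*np.pi, 200)
fig = plt.figure(figsize=(12, 9))
# Set up subplots with the slicing approach
ax1 = plt.subplot(3, 1, 1)  # add a subplot to the figure
ax1.plot(x, np.sin(10*x))
# Set ax1's title; xlim, ylim, xlabel, ylabel and all other properties are now…
```
Original · 2021-01-06 17:13:10 · 999 views · 0 comments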
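A sketch of an uneven layout using `GridSpec`, which the post mentions: each subplot claims a slice of a 3x3 grid, so panels can span several rows or columns. The particular layout is hypothetical, and the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

x = np.linspace(0, 2 * np.pi, 200)
fig = plt.figure(figsize=(12, 9))
gs = GridSpec(3, 3, figure=fig)

ax1 = fig.add_subplot(gs[0, :])   # row 0, all three columns
ax2 = fig.add_subplot(gs[1, :2])  # row 1, columns 0-1
ax3 = fig.add_subplot(gs[1:, 2])  # rows 1-2, column 2
ax4 = fig.add_subplot(gs[2, 0])
ax5 = fig.add_subplot(gs[2, 1])

ax1.plot(x, np.sin(10 * x))
ax2.plot(x, np.cos(x))
ax3.plot(x, np.sin(x))
fig.tight_layout()
```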
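Data Visualization with matplotlib in Practice: Drawing Pie Charts with plt.pie()
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Render Chinese labels with the SimHei font and keep minus signs displayable
mpl.rcParams["font.sans-serif"] = ["SimHei"]
mpl.rcParams["axes.unicode_minus"] = False
kinds = "简易箱", "保温箱", "行李箱", "密封箱"  # simple box, insulated box, suitcase, sealed box
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
…
```
Original · 2020-10-03 12:01:45 · 2930 views · 0 comments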
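A sketch of a labeled pie chart with percentage text and one wedge pulled out. The sales figures are hypothetical, and English labels (an assumption) avoid the need for a Chinese font. With `autopct` set, `pie` returns wedges, label texts, and percentage texts.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

kinds = ["Simple box", "Insulated box", "Suitcase", "Sealed box"]
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
sold = [5, 45, 15, 35]  # hypothetical sales figures

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sold,
    labels=kinds,
    colors=colors,
    autopct="%3.1f%%",       # percentage printed inside each wedge
    explode=(0, 0.1, 0, 0),  # pull the second wedge out slightly
    startangle=90,
    counterclock=False,
)
ax.set_aspect("equal")  # keep the pie circular
```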
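Data Visualization with matplotlib in Practice: Drawing Histograms with plt.hist()
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Render Chinese labels with the SimHei font and keep minus signs displayable
mpl.rcParams["font.sans-serif"] = ["SimHei"]
mpl.rcParams["axes.unicode_minus"] = False
# Generate data
boxWeight = np.random.randint(0, 10, 1000)
x = boxWeight
# Plot the histogram
bi…
```
Original · 2020-10-03 11:45:03 · 4104 views · 0 comments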
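A sketch of the histogram with bin edges chosen so that each integer weight gets its own bar. A seeded generator (an assumption) makes the data reproducible; `hist` returns the counts, the bin edges, and the bar patches, and the counts match `np.bincount` on the same data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
box_weight = rng.integers(0, 10, size=1000)  # integers 0..9, like randint(0, 10, 1000)

fig, ax = plt.subplots()
# Edges 0, 1, ..., 10 -> 10 bins; integer k falls in bin [k, k+1)
n, bins, patches = ax.hist(box_weight, bins=np.arange(0, 11),
                           color="#377eb8", edgecolor="black", rwidth=0.8)
ax.set_xlabel("Box weight")
ax.set_ylabel("Count")
```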
Matplotlib: Setting Legends
```python
import numpy as np
import matplotlib.pyplot as plt
# 1. Draw the plot
x = np.linspace(0, 2*np.pi)  # x axis
# y axis
y = np.sin(x)  # sine
# Draw a line plot
# Resize the figure
plt.figure(figsize=(9, 6))
plt.plot(x, y)
# 2. Legend
plt.plot(x, np.cos(x))  # cosine wave
plt.legend(['Sin', 'Cos'], fontsize=18, loc='ce…
```
Original · 2021-01-06 17:08:34 · 615 views · 0 comments
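A sketch of an alternative to passing a list of names: give each line a `label=` and call `legend()`, which keeps names attached to their lines. `loc` picks the point of the legend box that is anchored, and `bbox_to_anchor` (in axes fractions) places that point, here just above the plot. The specific positions are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(x, np.sin(x), label="Sin")
ax.plot(x, np.cos(x), label="Cos")
leg = ax.legend(fontsize=14,
                loc="upper center",          # anchor point on the legend box
                bbox_to_anchor=(0.5, 1.12),  # where that point goes, in axes fractions
                ncol=2,                      # entries side by side
                frameon=False)
```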
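Data Visualization with matplotlib in Practice: Drawing Stacked Charts with plt.bar() and barh()
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Render Chinese labels with the SimHei font and keep minus signs displayable
mpl.rcParams["font.sans-serif"] = ["SimHei"]
mpl.rcParams["axes.unicode_minus"] = False
# Generate data
x = [1, 2, 3, 4, 5]
y1 = [6, 10, 4, 5, 1]
y2 = [2, 6, 3, 8, 5]
# Draw a stacked bar chart
plt.bar(x, y1, a…
```
Original · 2020-10-04 11:22:58 · 3541 views · 1 comment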
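A sketch of stacking with the post's data: the second `bar` call starts each bar at `bottom=y1`, and the horizontal version does the same with `left=y1` in `barh`. Colors, labels, and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [6, 10, 4, 5, 1]
y2 = [2, 6, 3, 8, 5]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
bars1 = ax1.bar(x, y1, color="#66c2a5", label="y1")
bars2 = ax1.bar(x, y2, bottom=y1, color="#8da0cb", label="y2")   # stacked on top of y1
ax1.legend()
hbars1 = ax2.barh(x, y1, color="#66c2a5", label="y1")
hbars2 = ax2.barh(x, y2, left=y1, color="#8da0cb", label="y2")   # stacked right of y1
ax2.legend()
```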
Data Visualization with matplotlib in Practice: Drawing Horizontal or Vertical Reference Regions with plt.axvspan() and axhspan()
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.05, 10, 1000)
y = np.sin(x)
plt.plot(x, y, ls="-.", lw=2, c="c", label="plot figure")
plt.legend()
plt.axvspan(xmin=4.0, xmax=6.0, facecolor="y", alpha=0.3)  # vertical band from x=4 to x=6
plt.axhspan(ymin=0.0, ymax=0.5, faceco…
```
Original · 2020-10-03 10:49:38 · 1776 views · 0 comments
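A sketch of the coordinate rules for the two span functions: for `axvspan` the x limits are data coordinates while `ymin`/`ymax` are fractions of the axes height (default 0 to 1); `axhspan` is the mirror image. Each call adds one patch to the axes. The `xmin`/`xmax` values and the extra `axvline` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.05, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), ls="-.", lw=2, c="c", label="plot figure")
# x from 4 to 6 in data units; full height by default
ax.axvspan(4.0, 6.0, facecolor="y", alpha=0.3)
# y from 0 to 0.5 in data units; spans 10%-90% of the axes width
ax.axhspan(0.0, 0.5, xmin=0.1, xmax=0.9, facecolor="g", alpha=0.3)
ax.axvline(x=5.0, color="r", ls="--")  # single reference line, for comparison
ax.legend()
```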
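Data Visualization with matplotlib in Practice: Drawing Bar Charts with plt.bar()
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
# Render Chinese labels with the SimHei font and keep minus signs displayable
mpl.rcParams["font.sans-serif"] = ["SimHei"]
mpl.rcParams["axes.unicode_minus"] = False
# Generate data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 1, 4, 5, 7, 9, 7, 2]
# Draw the bar chart
plt.bar(x, y, align="center", color="c", tick_…
```
Original · 2020-10-03 11:30:23 · 4053 views · 0 comments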
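A sketch of the bar chart with custom tick labels and each bar's value written above it via `Axes.bar_label` (available in matplotlib 3.4 and later). The tick label letters are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 1, 4, 5, 7, 9, 7, 2]

fig, ax = plt.subplots()
bars = ax.bar(x, y, align="center", color="c", alpha=0.6,
              tick_label=["A", "B", "C", "D", "E", "F", "G", "H"])
labels = ax.bar_label(bars)  # one text per bar, formatted with '%g'
ax.set_ylabel("Value")
```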
Matplotlib: Adding Annotations
```python
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.arange(0.0, 5.0, 0.01)
y = np.cos(2*np.pi*x)
line, = ax.plot(x, y, lw=2)
ax.annotate('local max',  # annotation text
            xy=(2, 1),  # point the arrow points to
            xytext=(3, 1.5),  # position of the text
            arrowprops=dict(facecolo…
```
Original · 2021-01-06 17:15:27 · 1634 views · 1 comment
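A sketch that computes the annotated point instead of hard-coding it: `np.argmax` over the window 1.5 < x < 2.5 (an assumed window) finds the local maximum at x = 2, and the arrow is drawn from the text to that point. `ylim` is widened so the text at y = 1.5 stays visible.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 5.0, 0.01)
y = np.cos(2 * np.pi * x)

fig, ax = plt.subplots()
ax.plot(x, y, lw=2)
# Index of the largest y inside the window; values outside are ignored via -inf
i = int(np.argmax(np.where((x > 1.5) & (x < 2.5), y, -np.inf)))
ann = ax.annotate("local max",
                  xy=(x[i], y[i]),           # arrow tip: the computed maximum
                  xytext=(x[i] + 1, 1.5),    # text position
                  arrowprops=dict(facecolor="black", shrink=0.05))
ax.set_ylim(-2, 2)
```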
Drawing Stacked Bar Charts with Matplotlib
```python
import numpy as np
import matplotlib.pyplot as plt
labels = ['G1', 'G2', 'G3', 'G4', 'G5', 'G6']  # levels
men_means = np.random.randint(20, 35, size=6)
women_means = np.random.randint(20, 35, size=6)
men_std = np.random.randint(1, 7, size=6)
women_std = np.ran…
```
Original · 2021-01-06 17:17:06 · 5108 views · 0 comments
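A sketch of a stacked bar chart with error bars, using fixed hypothetical values instead of random ones. `yerr` draws an error bar at the top of each bar segment, and `bottom=men_means` stacks the second series on the first.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

labels = ["G1", "G2", "G3", "G4", "G5", "G6"]
men_means = [20, 34, 30, 35, 27, 25]    # hypothetical values
women_means = [25, 32, 34, 20, 25, 30]
men_std = [2, 3, 4, 1, 2, 3]
women_std = [3, 5, 2, 3, 3, 4]

fig, ax = plt.subplots()
men_bars = ax.bar(labels, men_means, width=0.35, yerr=men_std, label="Men")
women_bars = ax.bar(labels, women_means, width=0.35, yerr=women_std,
                    bottom=men_means, label="Women")  # stacked on the men bars
ax.set_ylabel("Scores")
ax.legend()
```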
Matplotlib: Nested Charts
```python
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-np.pi, np.pi, 25)
y = np.sin(x)
fig = plt.figure(figsize=(9, 6))  # create the figure
plt.plot(x, y)
# Nesting method 1: an axes region (its own x/y ranges) as a child plot
ax = plt.axes([0.2, 0.55, 0.3, 0.3])  # parameters: [left, bottom, width, height]
ax.plo…
```
Original · 2021-01-06 17:11:45 · 357 views · 0 comments
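A sketch contrasting two ways to nest a plot: `fig.add_axes` takes `[left, bottom, width, height]` as fractions of the whole figure, while `Axes.inset_axes` takes the same list as fractions of its parent axes, so it moves with that axes. The positions and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 25)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(x, y)
inset1 = fig.add_axes([0.2, 0.55, 0.3, 0.3])   # figure-fraction coordinates
inset1.plot(x, y, "r")
inset2 = ax.inset_axes([0.6, 0.1, 0.35, 0.3])  # fractions of the parent axes
inset2.plot(x, np.cos(x), "g")
```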