pandas function notes (merge / DataFrame usage / DataFrame.plot)


菜鸟知识搬运工


Category: Python学习 (Python Learning)  Tags: Pandas

Article link: https://blog.csdn.net/qq_30815237/article/details/87893578


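1. merge()  2. DataFrame usage  2.1 Creating a DataFrame  2.2 Locating elements in a DataFrame

2.3 Reading and writing CSV files: read_csv / to_csv  2.4 About len()  2.5 About set()

2.6 Iterating over the data in a DataFrame  3. DataFrame.plot plotting  4. Series attributes and methods

5. pandas read_csv() turns the first row of data into column names  6. Converting DataFrame data to ndarray data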

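1. merge()

merge() combines two DataFrames on one or more join keys. Name the keys with on (or left_on / right_on); if no key is given, pandas joins on the columns the two frames have in common.

See: https://blog.csdn.net/starter_____/article/details/79198137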
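As a minimal sketch of joining on a key, the example below merges two small tables. The frames `left` and `right` and their column names are invented for illustration and do not come from the linked article.

```python
import pandas as pd

# Hypothetical sample tables, invented for this illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Inner join (the default): keeps only the keys present in both frames.
inner = pd.merge(left, right, on="key")

# Left join: keeps every row of `left`; rows without a match get NaN in `rval`.
left_join = pd.merge(left, right, on="key", how="left")

print(inner)
print(left_join)
```

The `how` argument also accepts "right" and "outer" when rows from the other side, or from both sides, should be kept.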

2. DataFrame usage

For details, see: https://blog.csdn.net/cymy001/article/details/78275886#infodescribeheadtail_261

2.1 Creating a DataFrame

1) From a dict whose values are lists
2) From Series objects
3) From a list of dicts
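The three construction routes listed above might look like the following sketch. All values, city names, and variable names here are made up for illustration.

```python
import pandas as pd

# 1) A dict whose values are lists: each key becomes a column name.
df_from_dict = pd.DataFrame({"apts": [55000, 60000], "cars": [200000, 300000]})

# 2) A dict of Series: rows are aligned on the Series index labels,
#    and labels missing from one Series produce NaN in that column.
apts = pd.Series([55000, 60000], index=["Shanghai", "Beijing"])
cars = pd.Series([300000, 150000], index=["Beijing", "Chongqing"])
df_from_series = pd.DataFrame({"apts": apts, "cars": cars})

# 3) A list of dicts: each dict is one row; keys missing from a row become NaN.
df_from_records = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "c": 4}])

print(df_from_dict)
print(df_from_series)
print(df_from_records)
```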

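2.2 Locating elements in a DataFrame

1) Locate with a boolean expression
2) Locate with loc, iloc or ix. Any cell that can be located this way can also be assigned to.

loc: selects rows by the actual values in the row index (for example, data.loc['A'] takes the row whose index label is "A").

iloc: selects rows by integer position (for example, data.iloc[1] takes the second row).

house_data = house.iloc[:, :-1]   # every column except the last; iloc indexes by position
house_target = house.iloc[:, -1]  # the last column only

ix: selected rows by label or by position (a mix of loc and iloc). It was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead. The positional selection below was originally written as df.ix[1:4, 1:3]:

print(df.iloc[1:4, 1:3])  # select by row position and column position
             apts2  bonus
Chongqing  30000.0   2000
Guangzhou   7000.0   2000
Hangzhou       NaN   2000

Note the difference between a row label and a row position: positions start at 0.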
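A small sketch contrasting label-based and position-based selection, plus boolean selection and assignment. The frame below is invented for illustration and is not the `df` or `house` data used above.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"apts": [55000, 60000, 58000], "bonus": [2000, 2000, 2000]},
    index=["Shanghai", "Beijing", "Hangzhou"],
)

by_label = df.loc["Beijing"]   # row whose index label is "Beijing" (a Series)
by_position = df.iloc[1]       # second row by position (the same row here)

high = df[df["apts"] > 56000]  # boolean expression: keeps rows where it is True

df.loc["Hangzhou", "bonus"] = 3000  # a located cell can be assigned to

label_slice = df.loc["Shanghai":"Beijing"]  # label slices include both ends
pos_slice = df.iloc[0:1]                    # position slices exclude the end

print(high)
print(label_slice)
print(pos_slice)
```

The two slices differ in length even though they start at the same row, which is the label-versus-position distinction in practice.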

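2.3 Reading and writing CSV files: read_csv / to_csv

import pandas
file_name = "food_info.csv"  # placeholder path; point this at your own CSV file
n = 5  # placeholder row count
food_info = pandas.read_csv(file_name)  # returns a DataFrame
n_rows = food_info.head(n)  # the first n rows; still a DataFrame
column_names = food_info.columns  # all column names
dimensions = food_info.shape  # (number of rows, number of columns)

By default, pandas.read_csv() treats the first line of the file as the column labels and also assigns a label to every row. These labels can then be used to access the data in the DataFrame.

1. Use DataFrame.dtypes to get the data type of each column.
2. Use DataFrame[indices] to get column data.

Selecting a single row or a single column returns a Series.

Selecting rows:
data = food_info.loc[0]  # loc[n] takes the row labeled n; a single row comes back as a Series
# Several rows are selected with a slice, much as in numpy
datas = food_info.loc[1:2]  # a DataFrame with the rows labeled 1 and 2; unlike numpy, both ends are included, and loc[] does not accept -n to count from the end

data = food_info.iloc[i, j]  # value at the i-th row, j-th column (by position)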
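The section title mentions to_csv, so here is a hedged round-trip sketch. It writes to an in-memory buffer instead of a real file so that it runs anywhere; the column names and values are invented.

```python
import io

import pandas as pd

# Hypothetical data standing in for a real food table.
food_info = pd.DataFrame({"name": ["butter", "cheese"], "water_g": [15.87, 42.41]})

buffer = io.StringIO()
food_info.to_csv(buffer, index=False)  # index=False leaves the row labels out of the file
buffer.seek(0)

restored = pd.read_csv(buffer)  # the first line is read back as the column names

print(restored.dtypes)                # data type of each column
print(restored["name"])               # one column: a Series
print(restored[["name", "water_g"]])  # a list of columns: a DataFrame
```

Passing a file path instead of the buffer to both calls gives the usual on-disk behaviour.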

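2.4 About len()

len(data_frame)  # the number of rows in data_frame
len(data_frame.loc[0])  # the number of columns in data_frame

2.5 About set()

set(data_frame)  # the column names
set(data_frame["column1"])  # the distinct values in column "column1"
set(data_frame.loc[0])  # the distinct values in the row labeled 0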
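A quick check of the len() and set() behaviour on a made-up frame, alongside the pandas attributes that give the same information directly (`shape`, `unique()`, `nunique()`). The data is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a repeated value in each column.
data_frame = pd.DataFrame({"column1": [1, 1, 2], "column2": ["x", "y", "y"]})

n_rows = len(data_frame)          # number of rows
n_cols = len(data_frame.iloc[0])  # length of one row, i.e. number of columns
shape = data_frame.shape          # (rows, columns) in one call

column_names = set(data_frame)                # iterating a DataFrame yields column labels
distinct_set = set(data_frame["column1"])     # distinct values, unordered
distinct_arr = data_frame["column1"].unique() # distinct values in order of first appearance
n_distinct = data_frame["column1"].nunique()  # how many distinct values there are

print(n_rows, n_cols, shape)
print(column_names, distinct_set, distinct_arr, n_distinct)
```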

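2.6 Iterating over the data in a DataFrame

# A plain for loop does not iterate over rows: it yields the column labels
for data in data_frame:
    pass

# Use len and iloc to walk the rows by position (loc[i] only works when the row labels are 0..n-1)
for i in range(len(data_frame)):
    cur_data = data_frame.iloc[i]

# Use .items() to walk the columns (it was called .iteritems() before pandas 2.0)
for i, series in df.items():
    print(i, ":", type(series))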
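For row-wise loops, pandas also offers `iterrows()` and `itertuples()`. The sketch below, on invented data, shows what each iteration style yields; for arithmetic, a vectorized expression is usually preferable to any loop.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"apts": [55000, 60000], "cars": [200000, 300000]},
    index=["Shanghai", "Beijing"],
)

# Iterating the frame itself yields column labels, not rows.
columns_seen = [name for name in df]

# iterrows() yields (row label, row as a Series) pairs.
rows_seen = []
for label, row in df.iterrows():
    rows_seen.append((label, row["apts"]))

# itertuples() yields namedtuples; the row label is in the Index field.
tuples_seen = [(t.Index, t.cars) for t in df.itertuples()]

# Vectorized alternative: no explicit loop at all.
total = df["apts"] + df["cars"]

print(columns_seen, rows_seen, tuples_seen)
print(total)
```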

Source: https://blog.csdn.net/u012436149/article/details/67109953

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

3. DataFrame.plot plotting

DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)

Make plots of DataFrame using matplotlib / pylab.

Parameters:


data : DataFrame

x : label or position, default None

y : label, position or list of label, positions, default None

Allows plotting of one column versus another

kind : str

‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot
‘hexbin’ : hexbin plot

ax : matplotlib axes object, default None

subplots : boolean, default False

Make separate subplots for each column

sharex : boolean, default True if ax is None else False

In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in; Be aware, that passing in both an ax and sharex=True will alter all x axis labels for all axis in a figure!

sharey : boolean, default False

In case subplots=True, share y axis and set some y axis labels to invisible

layout : tuple (optional)

(rows, columns) for the layout of subplots

figsize : a tuple (width, height) in inches

use_index : boolean, default True

Use index as ticks for x axis

title : string or list

Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.

grid : boolean, default None (matlab style default)

Axis grid lines

legend : False/True/’reverse’

Place legend on axis subplots

style : list or dict

matplotlib line style per column

logx : boolean, default False

Use log scaling on x axis

logy : boolean, default False

Use log scaling on y axis

loglog : boolean, default False

Use log scaling on both x and y axes

xticks : sequence

Values to use for the xticks

yticks : sequence

Values to use for the yticks

xlim : 2-tuple/list

ylim : 2-tuple/list

rot : int, default None

Rotation for ticks (xticks for vertical, yticks for horizontal plots)

fontsize : int, default None

Font size for xticks and yticks

colormap : str or matplotlib colormap object, default None

Colormap to select colors from. If string, load colormap with that name from matplotlib.

colorbar : boolean, optional

If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’ plots)

position : float

Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center)

table : boolean, Series or DataFrame, default False

If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.

yerr : DataFrame, Series, array-like, dict and str

See Plotting with Error Bars for detail.

xerr : same types as yerr.

stacked : boolean, default False in line and bar plots, and True in area plot

If True, create stacked plot.

sort_columns : boolean, default False

Sort column names to determine plot ordering

secondary_y : boolean or sequence, default False

Whether to plot on the secondary y-axis If a list/tuple, which columns to plot on secondary y-axis

mark_right : boolean, default True

When using a secondary_y axis, automatically mark the column labels with “(right)” in the legend

`**kwds` : keywords

Options to pass to matplotlib plotting method

Returns:

axes : matplotlib.axes.Axes or numpy.ndarray of them

If kind = ‘scatter’ and the argument c is the name of a DataFrame column, the values of that column are used to color each point.

alpha sets the alpha value used for blending (not supported on all backends). It ranges from 0 to 1 (0.0 is transparent, 1.0 is opaque).

For example:

import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')


This can also be written as:
ax1 = df.plot(kind="scatter", x='length',
              y='width', s=1,
              c='DarkBlue')

Result: [figure: scatter plot of width against length]
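To illustrate colouring points by a column and setting transparency, here is a hedged sketch on the same five rows. The choice of the `viridis` colormap, the `Agg` backend, and the in-memory PNG buffer are assumptions made so the example runs without a display; they are not part of the original notes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=["length", "width", "species"])

# c names a column, so each point is coloured by its species value;
# alpha=0.5 makes the markers half transparent.
ax = df.plot.scatter(x="length", y="width", c="species",
                     colormap="viridis", alpha=0.5)

# Render the figure to an in-memory PNG instead of a file on disk.
png = io.BytesIO()
ax.figure.savefig(png, format="png")
```

In an interactive session, `matplotlib.pyplot.show()` would display the figure instead of saving it.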

4. Series attributes and methods

Series is the most basic object in pandas; it resembles a one-dimensional array:

from pandas import Series

se1 = Series([4, 7, -2, 8])
se1


Result:

0    4
1    7
2   -2
3    8
dtype: int64

se2 = Series([4, 7, -2, 8], index=['b', 'c', 'a', 'd'])
se2

Result:

b    4
c    7
a   -2
d    8
dtype: int64

The values and index attributes of a Series give its contents and its index:

se1.values
output: array([ 4,  7, -2,  8], dtype=int64)

se1.index

output: RangeIndex(start=0, stop=4, step=1)

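Differences between the Series and DataFrame data types:

1. Sorting: older pandas used Series.order() for a Series and DataFrame.sort() or sort_index() for a DataFrame. order() and sort() have since been removed; current versions use sort_values() and sort_index() on both types.

2. One of the most important features of a Series is that arithmetic operations automatically align data with different indexes.

3. A DataFrame is like a table, with row headers and column headers.

4. A column of a DataFrame can be retrieved as a Series, and the returned Series has the same index as the original DataFrame. Rows can also be retrieved by position or by name.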
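The alignment and column-as-Series points can be seen in a short sketch. The labels and values below are invented for illustration.

```python
import pandas as pd

# Two Series whose indexes only partly overlap.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on labels; labels present on only one side give NaN.
total = s1 + s2

# A DataFrame column comes back as a Series that shares the frame's index.
df = pd.DataFrame({"x": [3, 1, 2]}, index=["r1", "r2", "r3"])
col = df["x"]

# sort_values() works on both a Series and a DataFrame.
sorted_col = col.sort_values()
sorted_df = df.sort_values(by="x")

print(total)
print(sorted_df)
```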

Source: https://blog.csdn.net/u012474716/article/details/78550391

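5. pandas read_csv() turns the first row of data into column names

Sometimes the first line of a CSV file is not a row of column names but ordinary data. read_csv() still uses it as the header by default, so the columns get wrong names and the first record goes missing.

The documentation for the names parameter states that when the file contains no header row, you should explicitly pass header=None.

df = pd.read_csv('1.csv', header=None, names=['test'])

The column that had no name is then labeled test.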
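The effect can be reproduced without a file on disk. This sketch reads a made-up, header-less, single-column CSV from a string, first with the defaults and then with `header=None` and `names`.

```python
import io

import pandas as pd

# Hypothetical CSV content with no header line: three data rows.
csv_text = "1\n2\n3\n"

# Default behaviour: the first data row is consumed as the column name.
wrong = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every line as data; names supplies the column label.
right = pd.read_csv(io.StringIO(csv_text), header=None, names=["test"])

print(wrong)
print(right)
```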

Source: https://blog.csdn.net/a19990412/article/details/80030142

6. Converting DataFrame data to ndarray data

df = df.values
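Recent pandas documentation recommends `DataFrame.to_numpy()` over `.values` for this conversion. A minimal sketch on invented data, which also shows that mixed int and float columns come out as a single float array:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

arr = df.to_numpy()  # 2-D ndarray; the integers are upcast to float64

print(type(arr), arr.dtype, arr.shape)
```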
