Few Things You Should Be Able to Do With the Pandas Library

What is Pandas?

Pandas is an open-source data analysis library that provides easy-to-use data structures and data analysis tools. It depends on other libraries such as NumPy (see "Few Things You Should Be Able to Do With the Numpy Library") and has optional dependencies.

What does Pandas have to offer?

- A fast and efficient DataFrame object.
- Efficient data wrangling.
- High performance in data cleaning and data scraping.
- Efficient handling of missing data.
- Grouping of data for aggregation and transformations.
- Strong support for merging and joining data.
- Time series functionality.

How to install Pandas?

The easiest way to get Pandas set up is to install it through a package like the Anaconda distribution, a cross-platform distribution for data analysis and scientific computing, or by using pip as shown below.

pip install pandas

Data structures of Pandas

Some of the data structures available with Pandas are:

Series: A Series is a 1-dimensional labelled array that can hold data of any type. Its axis labels are collectively called an index.

Data frames: A data frame is a 2-dimensional labelled data structure with columns that can hold different data types.

Panels: A panel is a 3-dimensional container of data (note that the Panel type has been removed from recent versions of pandas).

Using Pandas

To use Pandas, it has to be imported into our coding environment. This is conventionally done with the following command:

import pandas as pd

What are Dataframes?

A dataframe is a table. It has rows and columns. Each column in a dataframe is a Series object, and the rows consist of the elements inside those Series. A dataframe can be constructed from a dictionary of keys and values.

This is illustrated below:

df = pd.DataFrame({'hobbies': ['Playing piano', 'singing', 'movies'],
                   'likes': ['reading', 'writing', 'learning'],
                   'dislikes': ['laziness', 'lateness', 'dishonesty']})
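
As a quick check of the claim above that each column is a Series, a minimal sketch using the df just defined:

# Selecting a single column returns a pandas Series
hobbies = df['hobbies']
type(hobbies)     # pandas.core.series.Series
hobbies.index     # the row labels, here the default RangeIndex 0, 1, 2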

To create a Series, use:

data = pd.Series(['promise', 'ogooluwa', 'daniel'])
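
The axis labels mentioned earlier can also be set explicitly when the Series is created; a minimal sketch (the label names are made up for illustration):

data = pd.Series(['promise', 'ogooluwa', 'daniel'],
                 index=['first', 'second', 'third'])   # custom axis labels
data['second']   # label-based access returns 'ogooluwa'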

To read CSV files, use:

df = pd.read_csv('filename.csv')
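
read_csv accepts many optional parameters; a minimal sketch with a few commonly used ones (the file name is a placeholder):

df = pd.read_csv('filename.csv',
                 sep=',',        # field delimiter
                 header=0,       # row to use as the column names
                 index_col=0)    # column to use as the row index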

Dataframe operations

Some dataframe operations are illustrated in the code snippets below:

# Importing pandas
import pandas as pd

# Creating a dataframe with a dictionary
df = pd.DataFrame({'hobbies': ['Playing piano', 'coding', 'movies'],
                   'likes': ['reading', 'writing', 'learning'],
                   'dislikes': ['laziness', 'lateness', 'cheating']})

# To check the data types of the dataframe's columns, use:
df.dtypes

# To confirm that an object is a DataFrame from the pandas library, use:
type(df)

# To see the columns of a dataframe, use:
df.columns

# To create a dataframe specifying the columns and index, you can do the following:
data = [['Playing piano', 'singing', 'movies'],
        ['reading', 'writing', 'learning'],
        ['laziness', 'lateness', 'dishonesty']]
df1 = pd.DataFrame(data, columns=['Promise', 'Michael', 'Gloria'],
                   index=['hobbies', 'likes', 'dislikes'])

# Indexing
# To explicitly locate rows by label, use:
df1.loc[['likes']]

# To implicitly locate rows by position, use:
df1.iloc[[1]]

Filtering dataframes

An illustration of filtering dataframes is shown in the code snippet below:

# Importing pandas
import pandas as pd

# Dataframe creation
data = [['Playing piano', 'singing', 'movies'],
        ['reading', 'writing', 'learning'],
        ['laziness', 'lateness', 'dishonesty']]
df1 = pd.DataFrame(data, columns=['Promise', 'Michael', 'Gloria'],
                   index=['hobbies', 'likes', 'dislikes'])

# To transpose, use:
df1.T

# To filter with a boolean condition, use:
# df1[condition]
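
The condition above is left abstract; a minimal sketch of a concrete boolean filter on the df1 defined above (the chosen column and values are arbitrary):

# Keep only the rows where the 'Promise' column equals 'reading'
df1[df1['Promise'] == 'reading']

# Combine conditions with & (and) or | (or), each wrapped in parentheses
df1[(df1['Promise'] == 'reading') | (df1['Michael'] == 'singing')]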

Arithmetic operations

Arithmetic operations on dataframes are applied element by element, with the rows and columns of the frames being operated on aligned by their labels. Other operations are shown in the snippet below:

# Importing pandas and numpy
import pandas as pd
import numpy as np

# Dataframe creation
data = [['Playing piano', 'singing', 'movies'],
        ['reading', 'writing', 'learning'],
        ['laziness', 'lateness', 'dishonesty']]
df1 = pd.DataFrame(data, columns=['Promise', 'Michael', 'Gloria'],
                   index=['hobbies', 'likes', 'dislikes'])

# To give an overview of what is in the dataframe, use:
df1.describe()

# To sum what is in a dataframe, use:
df1.sum()

# To represent a missing value using numpy, use:
np.nan
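
The snippet above does not actually perform arithmetic between two frames, so here is a minimal sketch with small numeric dataframes (the data is made up for illustration):

a = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
b = pd.DataFrame({'x': [4, 5, 6], 'y': [40, 50, 60]})

a + b                     # element-wise addition, aligned on index and columns
a * 2                     # scalar operations apply to every element
a.add(b, fill_value=0)    # method form, filling missing labels before adding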

Data cleaning

NA is how Pandas refers to missing values; it is used for simplicity and performance reasons. Data cleaning is the process of preparing data for analysis. It can be illustrated with the following code snippets:

# Importing pandas and numpy
import pandas as pd
import numpy as np

# Data series creation
data = pd.Series(['singing', 'eating', 'lying'])

# Dataframe creation (a few NaN values are used here so that the cleaning
# calls below have something to act on)
df1 = pd.DataFrame([['Playing piano', 'singing', 'movies'],
                    ['reading', np.nan, 'learning'],
                    [np.nan, np.nan, 'dishonesty']],
                   columns=['Promise', 'Michael', 'Gloria'],
                   index=['hobbies', 'likes', 'dislikes'])

# To remove NaN values, use:
data.dropna()

# To get True or False for non-null values, use:
data.notnull()
data[data.notnull()]

# The above is the same as:
data.dropna()

# Drop only the rows where every value is NaN (you could also pass an axis):
data.dropna(how="all")

# To create a random dataframe with 4 rows and 2 columns:
pd.DataFrame(np.random.rand(4, 2))

# To drop the rows of df1 that have fewer than 2 non-NaN values:
df1.dropna(thresh=2)

# Forward fill, i.e. fill each NaN with the value that comes just before it:
df1.fillna(method='ffill')

# To fill with a limit, use (won't fill more than 2 elements):
df1.fillna(method="ffill", limit=2)

# It is often better to fill with the mean than with just a neighbouring value.
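
Following the note above about filling with the mean, a minimal sketch on a small numeric frame (the data is made up; the string frame above has no meaningful mean):

scores = pd.DataFrame({'maths': [55, np.nan, 70], 'english': [np.nan, 62, 58]})

# Replace each NaN with the mean of its column
scores.fillna(scores.mean())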

Data Wrangling

To manipulate data, we could:

- Group by
- Join
- Combine
- Pivot
- Melt
- Reshape

This is illustrated below:

# Importing pandas
import pandas as pd

# Note: `dataframe` and the column names 'key', 'key1', 'key2', 'variable',
# 'value' and 'data' below are placeholders.

# To melt, i.e. to unpivot column names down into rows, use:
melted = pd.melt(dataframe, id_vars=['key'])

# Pivot - to reshape the melted frame back to wide form:
melted.pivot(index='key', columns='variable', values='value')

# To group data by one or more keys, use:
grouped = dataframe['data'].groupby([dataframe['key1'], dataframe['key2']])

# To view what you've grouped and aggregated as a table, aggregate and then unstack:
grouped.mean().unstack()
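
A runnable sketch of the same pattern with a small made-up dataframe (all of the names here are illustrative assumptions):

import pandas as pd

sales = pd.DataFrame({'key': ['shopA', 'shopB'],
                      'jan': [100, 150],
                      'feb': [120, 90]})

# Melt: the month columns become rows with 'variable' and 'value' columns
melted = pd.melt(sales, id_vars=['key'])

# Pivot: reshape the melted frame back to wide form
wide = melted.pivot(index='key', columns='variable', values='value')

# Group by two keys, aggregate, then unstack the result into a table
grouped = melted['value'].groupby([melted['key'], melted['variable']])
table = grouped.mean().unstack()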

Joins and Unions

# Importing pandas
import pandas as pd

df1 = pd.DataFrame({'hobbies': ['Playing piano', 'singing', 'movies'],
                    'likes': ['reading', 'writing', 'learning'],
                    'dislikes': ['laziness', 'lateness', 'dishonesty']})

data = [['Playing piano', 'singing', 'movies'],
        ['reading', 'writing', 'learning'],
        ['laziness', 'lateness', 'dishonesty']]
df2 = pd.DataFrame(data, columns=['Promise', 'Michael', 'Gloria'],
                   index=['hobbies', 'likes', 'dislikes'])

# Note: the calls below assume the two frames share a key column; 'key',
# 'name' and the series s1, s2, s3 are placeholders.

# To join, i.e. to intersect two dataframes on their common columns, use:
pd.merge(df1, df2)
# Note: make sure df2 has unique values before joining

# To join on a particular key, use:
pd.merge(df1, df2, on="key")

# Another way, specifying the join key on each side explicitly:
pd.merge(df1, df2, left_on='name', right_on="name")

# Outer join, i.e. join and put NaN wherever a key is not included:
pd.merge(df1, df2, how="outer")

# To concatenate, where s1, s2, s3 are Series, use:
pd.concat([s1, s2, s3])
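
A runnable sketch of these joins with two small made-up frames that actually share a key column:

import pandas as pd

left = pd.DataFrame({'name': ['Promise', 'Michael', 'Gloria'],
                     'hobby': ['piano', 'singing', 'movies']})
right = pd.DataFrame({'name': ['Promise', 'Michael', 'Daniel'],
                      'likes': ['reading', 'writing', 'learning']})

pd.merge(left, right, on='name')               # inner join on the shared key
pd.merge(left, right, on='name', how='outer')  # outer join, NaN where a name is missing
pd.concat([left['name'], right['name']])       # union-style concatenation of two Series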

Date and Time

To work with dates and times, we use:

from datetime import datetime, date, time

Operations on dates and times are shown below:

# Importing required libraries
from datetime import datetime, date, time
from dateutil.parser import parse
import pandas as pd
import numpy as np

# Datetime object
dt = datetime(2019, 11, 25, 11, 36, 00, 00)
dt.day

# To stringify a datetime object into a specified format:
dt.strftime('%d/%m/%Y %H:%M')

# To compute the difference between two datetimes:
difference = datetime(2019, 1, 7) - datetime(2018, 6, 24, 8, 15)
difference.days

stamp = datetime(2019, 1, 3)
print(str(stamp))
print(stamp.strftime('%Y-%m-%d'))

# To parse a string in a known format into a datetime, use:
value = '19-January-03'
datetime.strptime(value, '%y-%B-%d')

# dateutil's parse gives a best guess for an arbitrary date string:
parse('2011-January-03')

# To create a random time series:
ts = pd.Series(np.random.randn(5),
               index=pd.date_range('1/1/2000', periods=5, freq='Q'))

# Shift the values two periods forward (the first two become NaN):
ts.shift(2)

# Shift the values two periods back:
ts.shift(-2)
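
A small related sketch showing how date strings in a dataframe are usually handled (the column name and values are made up for illustration):

df = pd.DataFrame({'when': ['2019-01-03', '2019-01-07'], 'amount': [10, 25]})

# Parse the string column into datetimes and use it as the index
df['when'] = pd.to_datetime(df['when'])
df = df.set_index('when')

# Label-based slicing now works with date strings
df['2019-01-01':'2019-01-05']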

Translated from: https://medium.com/swlh/few-things-you-should-be-able-to-do-with-the-pandas-library-29354e350d1e
