Using iloc, loc, and ix to select rows and columns in Pandas DataFrames

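Pandas Data Selection

There are multiple ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials tend to focus on advanced row and column selections, which are a little complex for my requirements.

Selection Options

There are three primary options for selecting and indexing in Pandas, which can be confusing. The three selection cases and methods covered in this post are:

  1. Selecting data by row numbers (.iloc)
  2. Selecting data by label or by a conditional statement (.loc)
  3. Selecting in a hybrid approach (.ix) (deprecated as of Pandas 0.20 and removed in Pandas 1.0)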
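
As a quick orientation before the details, here is a minimal sketch of the first two options on a tiny made-up DataFrame. The name `people` and every value in it are invented for illustration and are not part of the uk-500 data used later. .ix is left out because it is not available in current Pandas versions.

```python
import pandas as pd

# A tiny made-up DataFrame (hypothetical values, not the uk-500 data set).
people = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cid"], "city": ["Leeds", "York", "Bath"]},
    index=["a", "b", "c"],
)

by_position = people.iloc[0]                          # .iloc: row at integer position 0
by_label = people.loc["b"]                            # .loc: row whose index label is "b"
by_condition = people.loc[people["city"] == "Bath"]   # .loc: rows where a condition holds

print(by_position["first_name"])
print(by_label["city"])
print(by_condition)
```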

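Data Setup

This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial is suited to the general data science situation where, typically, I find that:

  1. Each row in the data frame represents a data sample.
  2. Each column is a variable, and is usually named. I rarely select columns without their names.
  3. I need to quickly and often select relevant rows from the data frame for modelling and visualisation activities.

For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "Series" and "DataFrames". It makes your data handling much easier, and I have previously written about grouping and summarising data with Pandas.

[Figure: iloc and loc indexing in pandas takes two main arguments, one for rows and one for columns.]
Summary of the iloc and loc methods discussed in this blog post. iloc and loc are operations for retrieving data from Pandas DataFrames.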

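Selection and Indexing Methods for Pandas DataFrames

For these explorations we need some sample data - I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file. Load the data as follows (the output shown here is from a Jupyter notebook in the Anaconda Python install):

import pandas as pd
import random

# Read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# Set a numeric id for use as an index for examples.
data['id'] = [random.randint(0, 1000) for x in range(data.shape[0])]

data.head(5)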

[Figure: example data loaded from the CSV file, used for the pandas iloc, loc and ix indexing examples.]
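
If the download is not available, a small stand-in DataFrame is enough to try out the selections in this post. The following is only a sketch: the column names are ones referred to in this post, the first two names are the ones quoted in the iloc examples, and every other value (the third row, companies, cities, emails) is invented here for illustration.

```python
import random

import pandas as pd

# Stand-in for the uk-500 CSV: familiar column names, mostly invented values.
data = pd.DataFrame({
    "first_name": ["Aleshia", "Evan", "Antonio"],
    "last_name": ["Tomkiewicz", "Zigomalas", "Example"],
    "company_name": ["Acme Ltd", "Widget Works Co", "Sample And Sons Ltd"],
    "city": ["Leeds", "York", "Bath"],
    "email": ["aleshia@example.com", "evan@example.com", "antonio@example.com"],
})

# Same idea as in the loading code: a random numeric id per row.
# The seed is an addition here so that reruns give the same ids.
random.seed(0)
data["id"] = [random.randint(0, 1000) for _ in range(data.shape[0])]

print(data.head())
```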
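
1. Selecting pandas data using "iloc"

The iloc indexer selects rows and columns by integer position, using the syntax data.iloc[<row selection>, <column selection>].

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0]      # first row of data frame (Aleshia Tomkiewicz) - note a Series data type output
data.iloc[1]      # second row of data frame (Evan Zigomalas)
data.iloc[-1]     # last row of data frame (Mi Richan)
# Columns:
data.iloc[:, 0]   # first column of data frame (first_name)
data.iloc[:, 1]   # second column of data frame (last_name)
data.iloc[:, -1]  # last column of data frame (id)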

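Multiple columns and rows can be selected together using the .iloc indexer.

# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5]                       # first five rows of data frame
data.iloc[:, 0:2]                    # first two columns of data frame with all rows
data.iloc[[0, 3, 6, 24], [0, 5, 6]]  # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns
data.iloc[0:5, 5:8]                  # first 5 rows and the columns at positions 5, 6, 7 (county -> phone1)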

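There are two "gotchas" to remember when using iloc in this manner:

  1. Note that .iloc returns a Pandas Series when a single row (or a single column) is selected with an integer, and a Pandas DataFrame when multiple rows or columns are selected. To counter this, pass a single-valued list if you require DataFrame output.

    [Figure: when using .loc or .iloc, you can control the output format by passing lists or single values to the selectors.]

  2. When selecting multiple columns or multiple rows in this manner, remember that in a selection such as [1:5], the rows/columns selected run from the first number to one minus the second number. For example, [1:5] selects 1, 2, 3, 4; in general, [x:y] goes from x to y-1.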
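
Both points can be checked quickly on a small frame. This is a sketch on made-up data (the name `df` and its values are invented for illustration), not output from the uk-500 data set.

```python
import pandas as pd

# Made-up frame for illustration (not the uk-500 data).
df = pd.DataFrame({"a": [10, 11, 12, 13, 14, 15], "b": list("uvwxyz")})

one_row_series = df.iloc[0]    # integer -> Series
one_row_frame = df.iloc[[0]]   # single-valued list -> DataFrame with one row
rows_1_to_4 = df.iloc[1:5]     # slice stops before position 5

print(type(one_row_series).__name__)
print(type(one_row_frame).__name__)
print(list(rows_1_to_4.index))
```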

In practice, I rarely use the iloc indexer, unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.

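2. Selecting pandas data using "loc"

The Pandas loc indexer can be used with DataFrames for two different use cases: selecting rows by label/index (2a), and selecting rows with a boolean/conditional lookup (2b).

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].

2a. Label-based / Index-based indexing using .loc

Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame, using df.set_index(), the .loc method directly selects based on index values of any rows. For example, setting the index of our test data frame to the person's "last_name":

data.set_index("last_name", inplace=True)
data.head()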

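[Figure: Pandas DataFrame with the index set using .set_index(), for the .loc[] explanation.]
Last name set as the index on the sample data frame.

Now with the index set, we can directly select rows for different "last_name" values using .loc[<label>] - either singly, or in multiples. For example:

[Figure: .loc is used by pandas for label-based lookups in DataFrames.]
Selecting single or multiple rows using .loc index selections with pandas. Note that the first example returns a Series, and the second returns a DataFrame. You can get a single-row DataFrame by passing a single-element list to the .loc operation.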
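
The lookups described here can be sketched as follows. The frame is a made-up stand-in: only the column names and the index labels 'Andrade' and 'Veness' come from this post, and all other values are invented for illustration.

```python
import pandas as pd

# Stand-in data with invented values.
data = pd.DataFrame({
    "first_name": ["France", "Tyisha", "Eric"],
    "last_name": ["Andrade", "Veness", "Example"],
    "city": ["Leeds", "York", "Bath"],
})
data.set_index("last_name", inplace=True)

single = data.loc["Andrade"]                  # one label -> Series
multiple = data.loc[["Andrade", "Veness"]]    # list of labels -> DataFrame
single_as_frame = data.loc[["Andrade"]]       # one-element list -> one-row DataFrame

print(type(single).__name__)
print(multiple)
print(single_as_frame)
```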

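Select columns with .loc using the names of the columns. In most of my data work, I typically have named columns, and use these named selections.

[Figure: selecting columns by name in pandas .loc.]
When using the .loc indexer, columns are referred to by name, using lists of strings or ":" slices.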
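
A minimal sketch of both column styles, again on a made-up frame (values invented; only the column names mirror the sample data). Note that a label slice such as 'city':'email' includes its end label.

```python
import pandas as pd

# Invented values; only the column names mirror the sample data.
data = pd.DataFrame(
    {
        "first_name": ["France", "Tyisha"],
        "city": ["Leeds", "York"],
        "county": ["West Yorkshire", "North Yorkshire"],
        "email": ["france@example.com", "tyisha@example.com"],
    },
    index=pd.Index(["Andrade", "Veness"], name="last_name"),
)

name_and_city = data.loc[:, ["first_name", "city"]]  # list of column names
city_to_email = data.loc[:, "city":"email"]          # label slice, end label included

print(list(name_and_city.columns))
print(list(city_to_email.columns))
```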

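You can select ranges of index labels - the selection data.loc['Bruch':'Julio'] will return all rows in the data frame between the index entries for "Bruch" and "Julio", with both end labels included (unlike iloc slices). The following examples should now make sense:

# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
# Select all rows from 'Andrade' to 'Veness' (inclusive), with just 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]
# Change the index to be based on the 'id' column
data.set_index('id', inplace=True)
# Select the row(s) with 'id' = 487 (raises KeyError if no row has that id, since the ids are random)
data.loc[487]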

Note that in the last example, data.loc[487] (the row with index value 487) is not the same as data.iloc[487] (the row at integer position 487, i.e. the 488th row in the data). The index of the DataFrame can be out of numeric order, and/or a string or multi-value.
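
The difference is easy to see on a small frame whose integer index is out of order. This is a sketch with invented values; the name `df` is made up for illustration.

```python
import pandas as pd

# Integer index deliberately out of numeric order (made-up data).
df = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cid", "Dee"]},
    index=pd.Index([3, 0, 2, 1], name="id"),
)

by_label = df.loc[0, "first_name"]   # row whose index value is 0
by_position = df.iloc[0, 0]          # row at integer position 0

print(by_label)
print(by_position)
```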

2b. Boolean / Logical indexing using .loc

Conditional selection with boolean arrays using data.loc[<selection>] is the most common method that I use with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where your Series has True values.

In most use cases, you will make selections based on the values of different columns in your data set.

For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the 'data' DataFrame, where there are "True" values for the rows where the first_name is "Antonio". This type of boolean array can be passed directly to the .loc indexer as so:

[Figure: the .loc indexer can accept boolean arrays to select rows.]
Using a boolean True/False series to select rows in a pandas data frame - all rows with first name of "Antonio" are selected.
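
A minimal sketch of that two-step process, on a made-up frame (the city values and the second name are invented for illustration):

```python
import pandas as pd

# Made-up rows; only the column names mirror the sample data.
data = pd.DataFrame({
    "first_name": ["Antonio", "Evan", "Antonio"],
    "city": ["Leeds", "York", "Bath"],
})

is_antonio = data["first_name"] == "Antonio"  # boolean Series, one value per row
antonios = data.loc[is_antonio]               # keeps only the rows marked True

print(is_antonio.tolist())
print(antonios)
```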

As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list of columns, or a slice “:” operation.

[Figure: multiple column selection example using .loc.]
Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[].

Note that when selecting columns, if only one column is selected, the .loc operator returns a Series. For a single-column DataFrame, use a one-element list to keep the DataFrame format, for example:

[Figure: .loc returning Series or DataFrames depending on selection.]
If a single column is selected as a string, a Series is returned from .loc. Pass a list to get a DataFrame back.
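
The same contrast in code, as a sketch on made-up data (email addresses and the second name are invented for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "first_name": ["Antonio", "Evan", "Antonio"],
    "email": ["a1@example.com", "evan@example.com", "a2@example.com"],
})

mask = data["first_name"] == "Antonio"
as_series = data.loc[mask, "email"]    # single column name -> Series
as_frame = data.loc[mask, ["email"]]   # one-element list -> DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```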

Make sure you understand the following additional examples of .loc selections for clarity:

# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]
# Select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
# (this needs 'id' as a regular column; if it was set as the index earlier, restore it with data = data.reset_index())
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]

Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame and will give the same results: data.loc[data['id'] == 9] and data[data['id'] == 9] select the same rows.
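
One way to confirm this is DataFrame.equals, which compares two whole frames and returns a single True/False (whereas == between DataFrames compares element by element). A sketch on made-up data:

```python
import pandas as pd

# Made-up ids and names for illustration.
data = pd.DataFrame({"id": [3, 9, 12], "first_name": ["Ann", "Bob", "Cid"]})

mask = data["id"] == 9
with_loc = data.loc[mask]
with_brackets = data[mask]

same = with_loc.equals(with_brackets)
print(same)
```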

3. Selecting pandas data using ix

Note: The ix indexer was deprecated in Pandas 0.20.0 and removed in Pandas 1.0, so the examples in this section only run on older versions.

The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer type selections (as in .iloc) when passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.

Because ix is slightly more complex, I prefer to explicitly use .iloc and .loc to avoid unexpected results.

As an example:

# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]
# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
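
On Pandas versions without .ix, the mixed label/position selection it offered can be written with .loc or .iloc alone. The following is a sketch of one way to do this on a made-up frame (all values invented for illustration): either translate column positions into names for .loc, or translate the row label into a position for .iloc.

```python
import pandas as pd

# Made-up frame with a string index, as in the last_name examples.
data = pd.DataFrame(
    {
        "first_name": ["France", "Tyisha"],
        "city": ["Leeds", "York"],
        "email": ["france@example.com", "tyisha@example.com"],
    },
    index=["Andrade", "Veness"],
)

# Row by label, columns by position: look up the column names for .loc ...
via_loc = data.loc["Andrade", data.columns[0:2]]
# ... or look up the row position for .iloc.
via_iloc = data.iloc[data.index.get_loc("Andrade"), 0:2]

print(via_loc)
print(via_loc.equals(via_iloc))
```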

Setting values in DataFrames using .loc

With a slight change of syntax, you can actually update your DataFrame in the same statement as you select and filter using .loc indexer. This particular pattern allows you to update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.

As an example:

# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"

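To see the effect of such an assignment end to end, here is a small sketch on made-up data (ids and names invented for illustration). The row selection and the column name go inside one .loc call, and the assignment changes `data` itself.

```python
import pandas as pd

# Made-up ids and names for illustration.
data = pd.DataFrame({"id": [10, 2500, 3000], "first_name": ["Ann", "Bob", "Cid"]})

# Select rows by condition and one column by name, then assign in the same statement.
data.loc[data["id"] > 2000, "first_name"] = "John"

print(data)
```

Rows that do not match the condition are left untouched.
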
That’s the basics of indexing and selecting with Pandas. If you’re looking for more, take a look at the .iat, and .at operations for some more performance-enhanced value accessors in the Pandas Documentation and take a look at selecting by callable functions for more iloc and loc fun.
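
As a brief taste of those, here is a sketch of the .at and .iat scalar accessors and of a callable passed to .loc, on a made-up frame (all values invented for illustration).

```python
import pandas as pd

# Made-up frame for illustration.
data = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cid"], "id": [5, 150, 900]},
    index=["x", "y", "z"],
)

by_label = data.at["y", "first_name"]  # single value by row label and column name
by_position = data.iat[1, 0]           # single value by integer row and column positions

# A callable passed to .loc receives the DataFrame and returns an indexer.
big_ids = data.loc[lambda df: df["id"] > 100]

print(by_label, by_position)
print(big_ids)
```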
