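Using iloc, loc, & ix to select rows and columns in Pandas DataFrames

There are multiple ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials focus on advanced selections of rows and columns, which are a little complex for my requirements.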



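There are three main options for selecting and indexing in Pandas:

  1. Selecting data by row numbers (.iloc)
  2. Selecting data by label or by a conditional statement (.loc)
  3. Selecting in a hybrid approach (.ix) (now deprecated, as of Pandas 0.20)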
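
As a quick orientation, here is a minimal sketch of the first two options on a tiny frame. The rows and index labels are invented for illustration and are not taken from the uk-500 data set.

```python
import pandas as pd

# Toy data invented for illustration (not the uk-500 data set).
df = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cat"], "city": ["Leeds", "Bath", "York"]},
    index=["x", "y", "z"],
)

by_position = df.iloc[0]                     # first row, chosen by integer position
by_label = df.loc["x"]                       # same row, chosen by its index label
by_condition = df.loc[df["city"] == "Bath"]  # rows where the condition is True

print(by_position.equals(by_label))
print(by_condition)
```

Both single-row selections return the same Series here because the row at position 0 carries the label "x".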



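This tutorial suits the general data science situation where, typically:

  1. Each row in your data frame represents a data sample.
  2. Each column is a variable, and is usually named. I rarely select columns without their names.
  3. I need to quickly and often select relevant rows from the data frame for modelling and visualisation activities.

For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "series" and in "data frames". It makes your data processing much easier, and I've written before about grouping and summarising data with Pandas.

iloc and loc indexing is achieved with pandas using two main arguments for rows and columns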

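Selection and Indexing Methods for Pandas DataFrames

For these explorations we'll need some sample data. I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file used in the loading code below. Load the data as follows (the diagrams here come from a Jupyter notebook in the Anaconda Python install):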

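import random
import pandas as pd
# Read the sample data from the CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# Set a random numeric id (0 to 1000) for use as an index in later examples.
data['id'] = [random.randint(0, 1000) for _ in range(data.shape[0])]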
Example data for pandas iloc, loc and ix indexing examples.
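
1. Selecting pandas data using iloc

The iloc indexer selects by integer position, using the syntax data.iloc[<row selection>, <column selection>].

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0]    # first row of data frame (Aleshia Tomkiewicz) - note a Series data type output.
data.iloc[1]    # second row of data frame (Evan Zigomalas)
data.iloc[-1]   # last row of data frame (Mi Richan)
# Columns:
data.iloc[:, 0]   # first column of data frame (first_name)
data.iloc[:, 1]   # second column of data frame (last_name)
data.iloc[:, -1]  # last column of data frame (id)
# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5]                      # first five rows of data frame
data.iloc[:, 0:2]                   # first two columns of data frame with all rows
data.iloc[[0, 3, 6, 24], [0, 5, 6]] # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns.
data.iloc[0:5, 5:8]                 # first 5 rows and 6th, 7th, 8th columns of data frame (county -> phone1).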
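There are two gotchas to remember when using .iloc in this manner:

  1. Note that .iloc returns a Pandas Series when a single row or a single column is selected with a scalar, and a Pandas DataFrame when multiple rows or columns are selected. To counter this, pass a single-valued list if you require DataFrame output.

    When using .loc, or .iloc, you can control the output format by passing lists or single values to the selectors.

  2. When selecting multiple columns or multiple rows in this manner, remember that in a selection such as [1:5], the rows/columns selected run from the first number to one less than the second number. For example, [1:5] gives 1, 2, 3, 4, and [x:y] goes from x to y-1.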
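
Both gotchas are easy to check on a small frame. This is only a sketch on made-up data, with column names "a" and "b" chosen for illustration.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50, 60], "b": list("uvwxyz")})

row_as_series = df.iloc[0]    # scalar selector -> Series
row_as_frame = df.iloc[[0]]   # single-valued list -> one-row DataFrame
sliced = df.iloc[1:5]         # positions 1, 2, 3, 4 (position 5 is excluded)

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(list(sliced.index))
```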

In practice, I rarely use the iloc indexer, unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.


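2. Selecting pandas data using loc

The Pandas loc indexer can be used with DataFrames for two different use cases: selecting rows by label/index, and selecting rows with a boolean / conditional lookup.

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].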


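2a. Label-based / Index-based indexing using .loc

Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame, using df.set_index(), the .loc method directly selects based on the index values of any rows. For example, setting the index of our test data frame to the person's "last_name":

data.set_index("last_name", inplace=True)
Pandas DataFrame with index set using .set_index() for .loc[] explanation.
Now with the index set, we can directly select rows for different "last_name" values using .loc[<label>], either singly or in multiples. For example:

.loc is used by pandas for label based lookups in dataframes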
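
The label lookups described above can be sketched as follows. The three rows are invented for illustration; only the column names and the "Andrade" and "Veness" labels echo the examples in this post.

```python
import pandas as pd

# Invented rows for illustration; the real uk-500 rows will differ.
df = pd.DataFrame(
    {
        "last_name": ["Andrade", "Veness", "Bruch"],
        "first_name": ["France", "Tyisha", "Eric"],
        "city": ["Leeds", "Bath", "York"],
    }
).set_index("last_name")

one = df.loc["Andrade"]                  # single label -> Series
several = df.loc[["Andrade", "Veness"]]  # list of labels -> DataFrame

print(one)
print(several)
```

A single label gives a Series, while a list of labels gives a DataFrame, mirroring the behaviour of .iloc.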


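Selecting columns by name in pandas .loc

You can select ranges of index labels: the selection data.loc['Bruch':'Julio'] will return all rows in the data frame between the index entries for "Bruch" and "Julio", with both endpoints included. The following examples should now make sense: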

# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
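# Select all rows from 'Andrade' to 'Veness' (inclusive), with just 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]
# Change the index to be based on the 'id' column
# (this replaces the current 'last_name' index, and 'id' stops being a regular column)
data.set_index('id', inplace=True)
# Select the row(s) with 'id' = 487 (raises KeyError if no row has that id, since the ids are random)
data.loc[487]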
Note that in the last example, data.loc[487] (the row with index value 487) is not equal to data.iloc[487] (the 487th row in the data). The index of the DataFrame can be out of numeric order, and/or a string or multi-value.
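
A small sketch makes the label/position difference concrete. The frame below is made up, with an integer index deliberately out of numeric order.

```python
import pandas as pd

# Made-up frame whose integer index is out of numeric order.
df = pd.DataFrame({"name": ["a", "b", "c"]}, index=[487, 2, 0])
df.index.name = "id"

by_label = df.loc[0, "name"]   # row whose index LABEL is 0
by_position = df.iloc[0, 0]    # row at POSITION 0, which has label 487

# .loc[487] finds the label 487, but .iloc[487] asks for the 488th row,
# which does not exist in a three-row frame.
label_487 = df.loc[487, "name"]
try:
    df.iloc[487]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(by_label, by_position, label_487, out_of_bounds)
```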

2b. Boolean / Logical indexing using .loc

Conditional selection with boolean arrays using data.loc[<selection>] is the most common method that I use with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where your Series has True values.

In most use cases, you will make selections based on the values of different columns in your data set.

For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the 'data' DataFrame, with True values for the rows where the first_name is "Antonio". These types of boolean arrays can be passed directly to the .loc indexer like so:

The .loc indexer can accept boolean arrays to select rows
Using a boolean True/False series to select rows in a pandas data frame – all rows with first name of “Antonio” are selected.
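
As a sketch of the same idea on invented data, the boolean Series can be built first and then handed to .loc. The names and email addresses below are placeholders, not rows from the uk-500 file.

```python
import pandas as pd

# Placeholder rows for illustration only.
df = pd.DataFrame(
    {
        "first_name": ["Antonio", "Eric", "Antonio"],
        "email": ["a@gmail.com", "e@hotmail.com", "t@hotmail.com"],
    }
)

mask = df["first_name"] == "Antonio"  # boolean Series, one value per row
antonios = df.loc[mask]               # keeps only the rows where mask is True

# Masks combine with & (and) and | (or).
hotmail_antonios = df.loc[mask & df["email"].str.endswith("hotmail.com"), ["email"]]

print(mask.tolist())
print(antonios)
print(hotmail_antonios)
```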

As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list of columns, or a slice “:” operation.

Multiple column selection example using .loc
Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[]

.loc returning Series or DataFrames depending on selection
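
The Series-versus-DataFrame behaviour in the caption above can be sketched on invented data: a single column name gives a Series, and a one-element list keeps the DataFrame format.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "first_name": ["Antonio", "Eric", "Antonio"],
        "email": ["a@gmail.com", "e@hotmail.com", "t@hotmail.com"],
    }
)
mask = df["first_name"] == "Antonio"

col_series = df.loc[mask, "email"]   # single column name -> Series
col_frame = df.loc[mask, ["email"]]  # one-element list -> DataFrame

print(type(col_series).__name__, type(col_frame).__name__)
```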


# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith('hotmail.com')]
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]
Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame and will give the same results: data.loc[data['id'] == 9] returns the same rows as data[data['id'] == 9].
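
To confirm the equivalence, DataFrame.equals gives a single True/False answer, whereas == between two frames compares them element by element. A sketch on made-up data:

```python
import pandas as pd

# Made-up ids and web addresses for illustration only.
df = pd.DataFrame({"id": [9, 3, 9], "web": ["a.uk", "b.uk", "c.uk"]})

via_loc = df.loc[df["id"] == 9]
via_brackets = df[df["id"] == 9]

same = via_loc.equals(via_brackets)  # one boolean for the whole comparison
print(same)
```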

3. Selecting pandas data using ix

Note: The ix indexer was deprecated in Pandas 0.20 and removed in Pandas 1.0, so the examples in this section only run on older versions.

The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer type selections (as in .iloc) when passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.

Because ix is slightly more complex, I prefer to explicitly use .iloc and .loc to avoid unexpected results.

As an example:

# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]
# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
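
Since ix is unavailable in current Pandas, mixed selections (rows by position, columns by name) need to go through .loc or .iloc. The sketch below shows two ways of doing that on invented data; it is one possible approach, not the only one.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {"first_name": ["France", "Tyisha", "Eric"], "city": ["Leeds", "Bath", "York"]},
    index=["Andrade", "Veness", "Bruch"],
)

# Rows by position, column by label, using .loc:
# translate the positions into labels first.
via_loc = df.loc[df.index[[0, 2]], "city"]

# The same selection using .iloc:
# translate the column label into a position first.
via_iloc = df.iloc[[0, 2], df.columns.get_loc("city")]

print(via_loc.equals(via_iloc))
```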
Setting values in DataFrames using .loc

With a slight change of syntax, you can actually update your DataFrame in the same statement as you select and filter using .loc indexer. This particular pattern allows you to update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.

As an example:

# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"
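
The update pattern can be checked on a small frame. Note that the random ids generated earlier in this post run from 0 to 1000, so a condition of id > 2000 would match no rows there; the toy ids below are chosen to include larger values.

```python
import pandas as pd

# Toy ids chosen so that some exceed 2000; names are placeholders.
df = pd.DataFrame({"id": [5, 2500, 3000], "first_name": ["Ann", "Bob", "Cat"]})

# Select rows by condition and assign to one column in a single statement.
df.loc[df["id"] > 2000, "first_name"] = "John"

print(df)
```

The assignment edits df in place, so no reassignment of the result is needed.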
That’s the basics of indexing and selecting with Pandas. If you’re looking for more, take a look at the .iat, and .at operations for some more performance-enhanced value accessors in the Pandas Documentation and take a look at selecting by callable functions for more iloc and loc fun.
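
As a brief taste of those accessors, here is a sketch on invented data: .at and .iat fetch a single value by label and by position respectively, and .loc also accepts a callable that receives the DataFrame and returns a selector.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {"id": [150, 20], "city": ["Leeds", "Bath"]},
    index=["Andrade", "Veness"],
)

city_at = df.at["Andrade", "city"]  # single value by row label and column name
city_iat = df.iat[0, 1]             # single value by row and column position

# Callable selection: the function is called with the DataFrame itself.
big_ids = df.loc[lambda d: d["id"] > 100]

print(city_at, city_iat)
print(big_ids)
```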


