1. df['col_name']: extract column data by column name
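df[] is generally used to select columns: put a column name, or a list of column names, inside []. For this reason, columns are usually given explicit names rather than left as the default integer names, which are easy to confuse with the index.
- Selecting a single column returns a Series, and print shows it in Series format;
- Selecting multiple columns returns a DataFrame, and print shows it in DataFrame format.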
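To illustrate why default integer column names can be confusing, here is a minimal sketch. The frame `df_default` is a hypothetical example that is not part of the original post; it keeps the default integer labels for both rows and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with default integer column names 0..3 and a default row index 0..2
df_default = pd.DataFrame(np.arange(12).reshape(3, 4))

# A scalar inside [] is looked up as a column label
col = df_default[0]

# A slice inside [] selects rows instead of columns
rows = df_default[0:1]

print(type(col))
print(rows.shape)
```

Because `0` means a column in the first case and a row position in the second, the same-looking number is read in two different ways. Explicit string column names avoid this ambiguity.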
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(12).reshape(3, 4) * 100,
                  index=['one', 'two', 'three'],
                  columns=['a', 'b', 'c', 'd'])
print("df = ", df)
print('-' * 100)
# Select by column name: one column returns a Series, several columns return a DataFrame
data1 = df['a']
data2 = df[['a', 'c']]
print("data1 = \n{0}\ntype(data1) = {1}".format(data1, type(data1)))
print('-' * 100)
print("data2 = \n{0}\ntype(data2) = {1}".format(data2, type(data2)))
Output:
df =                 a          b          c          d
one    12.427304  39.089892  22.467365  22.711018
two    50.808058  67.916443  39.312617  95.227642
three   3.399731  57.874266  45.771234  99.649908
----------------------------------------------------------------------------------------------------
data1 =
one      12.427304
two      50.808058
three     3.399731
Name: a, dtype: float64
type(data1) = <class 'pandas.core.series.Series'>
----------------------------------------------------------------------------------------------------
data2 =
               a          c
one    12.427304  22.467365
two    50.808058  39.312617
three   3.399731  45.771234
type(data2) = <class 'pandas.core.frame.DataFrame'>
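One related detail is that the return type depends on what is passed to [], not on how many columns come back. The sketch below uses fixed values from `np.arange` instead of random numbers (a choice made here only so the result is reproducible) to show that a one-element list still returns a DataFrame, and that a row label inside [] is not found.

```python
import numpy as np
import pandas as pd

# Same layout as the example above, but with fixed values for reproducibility
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['one', 'two', 'three'],
                  columns=['a', 'b', 'c', 'd'])

as_series = df['a']     # scalar label: returns a Series
as_frame = df[['a']]    # one-element list: returns a one-column DataFrame

# A row label is not a column name, so df[] raises KeyError
try:
    df['one']
    row_label_found = True
except KeyError:
    row_label_found = False

print(type(as_series), as_series.shape)
print(type(as_frame), as_frame.shape)
print(row_label_found)
```

Writing `df[['a']]` is a handy way to keep the two-dimensional DataFrame shape when later code expects a table rather than a Series.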