Learning Pandas, Part 1

Learning Summary:

P.S. Some of the code comes from Kaggle's Learn Pandas course. I am just starting out, so if I have made mistakes, please point them out.

1. What is a DataFrame object: A DataFrame is a two-dimensional table that stores data in labeled rows and columns

2. What is a Series object: A Series is a one-dimensional, labeled list of values

3. Relationship between DataFrame and Series: A Series is one part of a DataFrame; each column of a DataFrame is a Series
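To make the relationship concrete, here is a minimal sketch. It assumes pandas is installed and reuses the Yes/No data that appears in the next item; the variable names df and yes_column are only illustrative.

```python
import pandas as pd

# Same small table as in the DataFrame examples in this post.
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Selecting a single column by name gives back a Series.
yes_column = df['Yes']
print(type(df).__name__)          # DataFrame
print(type(yes_column).__name__)  # Series

# The Series keeps the column name and the DataFrame's row labels.
print(yes_column.name)            # Yes
print(list(yes_column.index))     # [0, 1]
```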

4. Create a DataFrame:

(In all code blocks, pd is the usual alias for pandas, i.e. import pandas as pd.)

In [1]:

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Out [1]: 

   Yes   No
0   50  131
1   21    2

In [2]:

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

Out [2]:

             Bob           Sue
0    I liked it.  Pretty good.
1  It was awful.        Bland.

In [3]:

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

Out [3]:

                     Bob           Sue
Product A    I liked it.  Pretty good.
Product B  It was awful.        Bland.

The index parameter replaces the default row numbers (0, 1, ...) with the labels you pass in, which are shown in the leftmost column
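One practical effect of a custom index is that rows can be looked up by label. The sketch below is only an illustration: the variable name reviews_df is made up, and .loc is a pandas accessor that the rest of this post does not cover.

```python
import pandas as pd

reviews_df = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

# The row labels are now the strings passed as index, not 0 and 1.
print(list(reviews_df.index))              # ['Product A', 'Product B']

# .loc looks up by label: row 'Product B', column 'Sue'.
print(reviews_df.loc['Product B', 'Sue'])  # Bland.
```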

5. Create a Series:

In [1]:

pd.Series([1, 2, 3, 4, 5])

Out [1]:

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [2]:

pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

Out [2]:

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
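The index and name passed to pd.Series can be read back from the object afterwards. A minimal sketch, assuming pandas is installed; the variable name sales is only illustrative.

```python
import pandas as pd

sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

# The name and the index labels are stored on the Series itself.
print(sales.name)            # Product A
print(list(sales.index))     # ['2015 Sales', '2016 Sales', '2017 Sales']

# A value can be looked up by its index label.
print(sales['2016 Sales'])   # 35
```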

6. Read the CSV file:

You can use pandas' read_csv function to read data from a CSV file. It returns a DataFrame object

reviews = pd.read_csv('path')

read_csv can also use one of the columns in the data as the index

reviews = pd.read_csv('path', index_col=0)

With index_col=0, the data in the first column is used as the index
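The difference is easy to see on a small in-memory example. This is a sketch under assumptions: csv_text is made-up data standing in for a real file, with a first column that has no header, and io.StringIO is used so that no file on disk is needed.

```python
import io
import pandas as pd

# Hypothetical CSV text; the first column has an empty header, which is
# what DataFrame.to_csv() writes for the index by default.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n2,US,96\n"

# Without index_col, the first column is loaded as ordinary data and
# pandas labels it 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))     # ['Unnamed: 0', 'country', 'points']

# With index_col=0, the first column becomes the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))   # ['country', 'points']
print(list(indexed.index))     # [0, 1, 2]
```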

7. shape:

A DataFrame exposes its number of rows and columns through the shape attribute (it is an attribute, not a method, so there are no parentheses)

reviews.shape

The code above returns a tuple whose first element is the number of rows in reviews and whose second element is the number of columns
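As a quick check of what shape returns, here is a small sketch on a made-up table with three rows and two columns (the data and the names n_rows and n_cols are only illustrative).

```python
import pandas as pd

# Hypothetical table: 3 rows, 2 columns.
df = pd.DataFrame({'Yes': [50, 21, 7], 'No': [131, 2, 0]})

# shape is a (rows, columns) tuple.
print(df.shape)          # (3, 2)

# The tuple can be unpacked into two variables.
n_rows, n_cols = df.shape
print(n_rows, n_cols)    # 3 2
```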

8. head():

reviews.head()

The code above returns the first five rows of data from reviews

reviews.head(2)

The above code returns the first two rows of data from reviews
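The behaviour of head() can be verified on a small made-up table. This sketch assumes a ten-row DataFrame with a single hypothetical column n.

```python
import pandas as pd

# Hypothetical ten-row table.
df = pd.DataFrame({'n': range(10)})

# With no argument, head() returns the first five rows.
print(len(df.head()))            # 5

# With an argument, it returns that many rows from the top.
print(df.head(2)['n'].tolist())  # [0, 1]

# head() returns a new DataFrame; the original still has all its rows.
print(len(df))                   # 10
```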

Problems encountered in learning:

In one of the Kaggle exercises, I was asked to read a CSV file. When I read it, I did not pass index_col=0 to use the first column as the index, so my result did not match the expected one

Solution:

At the time I did not know what index_col did, so I only thought about how to remove the extra column. Searching online, I found someone describing the drop method of DataFrame, as shown below:

df.drop(['column name 1', 'column name 2', 'column name 3'], axis=1)

However, this approach did not suit me, because the column I needed to delete did not have a name. It did introduce me to the drop method, though, so I looked for more information about it and eventually found a way to delete columns by position:

df.drop(df.columns[[0]], axis=1)

The above code removes the first column, where axis=1 refers to columns and axis=0 refers to rows. Note that drop returns a new DataFrame and leaves df itself unchanged unless you assign the result back

You can also delete more than one column at a time:

df.drop(df.columns[[0, 1, 2]], axis=1)

The above code removes the first, second, and third columns (positions 0, 1, and 2)
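Putting the problem and the workaround together, the sketch below reads a small made-up CSV without index_col and then drops the extra first column by position. The data in csv_text is hypothetical. One detail worth knowing: when the header of a column is empty, pandas labels it 'Unnamed: 0', so in that case dropping by name is possible as well.

```python
import io
import pandas as pd

# Hypothetical CSV text whose first column has an empty header.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))        # ['Unnamed: 0', 'country', 'points']

# Drop the first column by position; the result is a new DataFrame.
trimmed = df.drop(df.columns[[0]], axis=1)
print(list(trimmed.columns))   # ['country', 'points']

# df itself is unchanged because the result was not assigned back to it.
print(len(df.columns))         # 3

# Dropping by the label pandas generated gives the same columns.
by_name = df.drop(['Unnamed: 0'], axis=1)
print(list(by_name.columns))   # ['country', 'points']
```

Dropping the column discards its values and keeps the default 0, 1, ... index, whereas index_col=0 keeps those values as the row labels, so the two approaches are similar but not identical.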

Reference:

Creating, Reading and Writing | Kaggle

python - About how to drop first columns from DataFrame? - Stack Overflow

