Pandas learn 1

bigpangpang_01

Tags: pandas, python, data analysis

Article link: https://blog.csdn.net/m0_60707271/article/details/126635359

Learning Summary:

PS: Some of the code comes from Kaggle's Learn Pandas course. I'm just getting started, so if I've made mistakes, please point them out.

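1. What is a DataFrame object: a two-dimensional table of data, with labeled rows (the index) and labeled columns

2. What is a Series object: a one-dimensional sequence of values, each value carrying a label from the Series' index

3. Relationship between DataFrame and Series: each column of a DataFrame is a Series, so a DataFrame can be thought of as a collection of Series that share the same index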
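A quick way to see this relationship: selecting a single column of a DataFrame with square brackets gives back a Series. A minimal sketch, reusing the small Yes/No table from the next section:

```python
import pandas as pd

df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Selecting one column by its label returns a Series
yes_column = df['Yes']

print(type(yes_column).__name__)  # Series
print(yes_column.name)            # Yes (the column label becomes the Series name)
print(yes_column.tolist())        # [50, 21]
```

The Series keeps the DataFrame's row index, which is why the columns of one table line up with each other.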

4. Create a DataFrame:

(In all code blocks, pd is the usual alias for pandas, imported with import pandas as pd.)

In [1]:

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Out [1]:

	Yes	No
0	50	131
1	21	2

In [2]:

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

Out [2]:

	Bob	Sue
0	I liked it.	Pretty good.
1	It was awful.	Bland.

In [3]:

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

Out [3]:

	Bob	Sue
Product A	I liked it.	Pretty good.
Product B	It was awful.	Bland.

The index argument replaces the default integer row labels (0, 1, ...) with the given labels, which appear in the leftmost column.
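With custom labels in place, rows can be looked up by label using .loc. A small sketch using the Bob/Sue table above:

```python
import pandas as pd

reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

print(reviews.index.tolist())           # ['Product A', 'Product B']
# .loc selects by row label first, then by column label
print(reviews.loc['Product B', 'Bob'])  # It was awful.
```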

5. Create a Series:

In [1]:

pd.Series([1, 2, 3, 4, 5])

Out [1]:

0	1
1	2
2	3
3	4
4	5

dtype: int64
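When no index is passed, pandas labels the entries 0 to n-1, and the dtype is inferred from the values. A short sketch to inspect both:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

print(s.index.tolist())  # [0, 1, 2, 3, 4]  (default integer labels)
print(s.dtype)           # an integer dtype (int64 on typical builds)
print(s.sum())           # 15
```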

In [2]:

pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

Out [2]:

2015 Sales	30
2016 Sales	35
2017 Sales	40

Name: Product A, dtype: int64
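The name and index passed in are stored on the Series, so a value can be fetched by its label. A minimal sketch:

```python
import pandas as pd

sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(sales.name)           # Product A
print(sales['2016 Sales'])  # 35 (lookup by index label)
```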

6. Read the CSV file:

You can use pandas's read_csv function to read data from a CSV file; it returns a DataFrame object:

reviews = pd.read_csv('path')

read_csv can also use one of the columns in the data as the index:

reviews = pd.read_csv('path', index_col=0)

index_col=0 uses the first column (position 0) as the row index instead of reading it as ordinary data.
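To see what index_col=0 changes, here is a sketch that reads the same small CSV text twice, with and without it. The data is made up for illustration, and io.StringIO stands in for a real file path; the empty first header field mimics a CSV that was saved together with its index.

```python
import io
import pandas as pd

# Hypothetical CSV content: the first column holds row labels and has an empty header
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

without_index = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col the label column is read as ordinary data
print(without_index.columns.tolist())  # ['Unnamed: 0', 'country', 'points']
# With index_col=0 it becomes the row index instead
print(with_index.columns.tolist())     # ['country', 'points']
print(with_index.index.tolist())       # [0, 1]
```

The extra 'Unnamed: 0' column in the first result is exactly the kind of leftover column discussed in the problem section below.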

7. shape:

shape is an attribute of a DataFrame (not a method, so no parentheses) that gives its number of rows and columns:

reviews.shape

The code above returns a tuple for reviews: the first element is the number of rows and the second is the number of columns.
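A minimal sketch with a small hand-built DataFrame (three rows, two columns) in place of reviews:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# shape is an attribute, so it is accessed without parentheses
rows, cols = df.shape
print(df.shape)  # (3, 2)
```

Unpacking the tuple into two names, as above, is a common way to get the row and column counts separately.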

8. head():

reviews.head()

The code above returns the first five rows of data from reviews

reviews.head(2)

The above code returns the first two rows of data from reviews
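A short sketch on a hypothetical ten-row DataFrame, checking how many rows each call returns:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})

print(len(df.head()))            # 5  (the default is 5 rows)
print(len(df.head(2)))           # 2
print(df.head(2)['n'].tolist())  # [0, 1]
```

If the DataFrame has fewer rows than requested, head simply returns all of them.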

Problems encountered while learning:

In one of Kaggle's exercises I was asked to read a CSV file. When reading it, I did not pass index_col=0 to use the first column as the index, so my result did not match the expected one.

Solution:

At the time I did not know what index_col does; I only wondered how to get rid of the extra column. Searching online, I found someone describing the DataFrame drop method, used like this:

df.drop(['column name 1', 'column name 2', 'column name 3'], axis=1)

However, this did not fit my case, because the column I needed to delete had no header name in the file (pandas labels such a column 'Unnamed: 0' when reading it). Still, it pointed me to the drop method, so I kept reading about drop and eventually found a way to delete columns by position:

df.drop(df.columns[[0]], axis=1)

The code above removes the first column: df.columns[[0]] looks up the label of the column at position 0, and axis=1 tells drop to remove columns (axis=0, the default, removes rows).
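A small sketch of positional dropping, using a made-up DataFrame that resembles a CSV read without index_col=0. Note that drop returns a new DataFrame and leaves the original unchanged unless you reassign the result (or pass inplace=True).

```python
import pandas as pd

# Hypothetical data resembling a CSV read without index_col=0
df = pd.DataFrame({'Unnamed: 0': [0, 1],
                   'country': ['Italy', 'Portugal'],
                   'points': [87, 87]})

# df.columns[[0]] picks the label of the column at position 0
trimmed = df.drop(df.columns[[0]], axis=1)

print(trimmed.columns.tolist())  # ['country', 'points']
print(df.columns.tolist())       # ['Unnamed: 0', 'country', 'points'] (unchanged)
```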

You can also delete several columns at once:

df.drop(df.columns[[0, 1, 2]], axis=1)

The code above removes the first three columns (positions 0, 1, and 2).
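The same positional selection also works with the columns= keyword of drop, which avoids having to remember what axis=1 means. A sketch with a hypothetical four-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4]})

# Both calls drop the columns at positions 0, 1 and 2 (labels 'a', 'b', 'c')
dropped_axis = df.drop(df.columns[[0, 1, 2]], axis=1)
dropped_kw = df.drop(columns=df.columns[[0, 1, 2]])

print(dropped_axis.columns.tolist())    # ['d']
print(dropped_axis.equals(dropped_kw))  # True
```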

Reference:

Creating, Reading and Writing | Kaggle

python - About how to drop first columns from DataFrame? - Stack Overflow
