100 Must-Practice Pandas Exercises for Beginners, from Basics to Advanced | Introductory Tutorial

Article link: https://blog.csdn.net/daigualu/article/details/140677394

Author: 郭震 (Guo Zhen)

https://www.machinelearningplus.com/python/101-pandas-exercises-python/

# Allow several prints in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

1. How to import pandas and check the version?

Import pandas and print the version that is installed.

import pandas as pd
print(pd.__version__)

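# Print all pandas dependencies.
# show_versions() prints the report itself and returns None,
# so the outer print() also emits the trailing "None" seen in the output.
print(pd.show_versions(as_json=True))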

1.0.1
{'system': {'commit': None, 'python': '3.7.6.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-29-generic', 'machine': 'x86_64', 'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'None', 'LANG': 'zh_CN.UTF-8', 'LOCALE': 'zh_CN.UTF-8'}, 'dependencies': {'pandas': '1.0.1', 'numpy': '1.18.1', 'pytz': '2019.3', 'dateutil': '2.8.1', 'pip': '20.0.2', 'setuptools': '45.2.0.post20200210', 'Cython': '0.29.15', 'pytest': '5.3.5', 'hypothesis': '5.5.4', 'sphinx': '2.4.0', 'blosc': None, 'feather': None, 'xlsxwriter': '1.2.7', 'lxml.etree': '4.6.2', 'html5lib': '1.0.1', 'pymysql': None, 'psycopg2': None, 'jinja2': '2.11.1', 'IPython': '7.12.0', 'pandas_datareader': None, 'bs4': '4.8.2', 'bottleneck': '1.3.2', 'fastparquet': None, 'gcsfs': None, 'matplotlib': '3.1.3', 'numexpr': '2.7.1', 'odfpy': None, 'openpyxl': '3.0.3', 'pandas_gbq': None, 'pyarrow': None, 'pytables': None, 'pyxlsb': None, 's3fs': None, 'scipy': '1.4.1', 'sqlalchemy': '1.3.13', 'tables': '3.6.1', 'tabulate': None, 'xarray': None, 'xlrd': '1.2.0', 'xlwt': '1.3.0', 'numba': '0.48.0'}}
None

2. How to create a series from a list, numpy array and dict?

Create a pandas Series from each of the items below: a list, a NumPy array and a dictionary.

# Input
import numpy as np
a_list = list("abcdefg")
numpy_array = np.arange(1, 10)
dictionary = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 5}

series1 = pd.Series(a_list)
print(series1)
series2 = pd.Series(numpy_array)
print(series2)
series3 = pd.Series(dictionary)
print(series3)

3. How to convert the index of a series into a column of a dataframe?

Convert the series ser into a dataframe with its index as another column of the dataframe.

# input
mylist = list('abcdefghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)
print(ser[:5])

# solution 1 using DataFrame
ser_df = pd.DataFrame(ser)
ser_df.reset_index()

# using pandas to_frame()
ser_df = ser.to_frame().reset_index()
ser_df
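By default reset_index() calls the new columns "index" and 0. A minimal sketch of how both could be named in one step, assuming a small made-up series and the hypothetical column names "letter" and "position":

```python
import pandas as pd

# Hypothetical small series; the dict keys become the index
ser = pd.Series({"a": 0, "b": 1, "c": 2})

# rename_axis names the index, reset_index(name=...) names the value column
ser_df = ser.rename_axis("letter").reset_index(name="position")
print(ser_df)
```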

4. How to combine many series to form a dataframe?

Combine ser1 and ser2 to form a dataframe.

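# input
ser1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

# using pandas DataFrame
# Caution: the second positional argument of pd.DataFrame is the index, not a second column.
# This only works here because ser2's values (0..25) match ser1's default index.
ser_df = pd.DataFrame(ser1, ser2).reset_index()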
ser_df.head()
# using pandas DataFrame with a dictionary, gives a specific name to the column
ser_df = pd.DataFrame({"col1":ser1, "col2":ser2})
ser_df.head(5)
# using pandas concat
ser_df = pd.concat([ser1, ser2], axis = 1)
ser_df.head()
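Passing two series positionally to pd.DataFrame deserves care, because the second argument is treated as the index. A small sketch of what happens when the values of the second series do not match the index of the first (the three-element series are made up for illustration):

```python
import pandas as pd

ser1 = pd.Series(list("abc"))     # default index 0, 1, 2
ser2 = pd.Series([10, 20, 30])    # values that are not labels of ser1

# ser2 is used as the index, so ser1 is reindexed by 10, 20, 30
# and every value is missing.
misaligned = pd.DataFrame(ser1, ser2)
print(misaligned)

# A dict keeps both series as columns, aligned on their shared index.
combined = pd.DataFrame({"col1": ser1, "col2": ser2})
print(combined)
```

The dictionary and concat forms do not have this problem, which is why they are usually the safer choice.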

5. How to assign a name to a series?

Give a name to the series ser, calling it ‘alphabets’.

# input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))

# using the series rename method (returns a new Series, ser itself is unchanged)
ser.rename("alphabets")
# using the series name attribute (modifies ser in place)
ser.name = "other_name"
ser
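Naming the series and naming its index are two different things. A short sketch of the difference, assuming a made-up three-letter series and the hypothetical index name "position":

```python
import pandas as pd

ser = pd.Series(list("abc"))

# rename() with a scalar sets the series name and returns a new Series
named = ser.rename("alphabets")

# rename_axis() sets the name of the index instead
indexed = named.rename_axis("position")

print(named.name, indexed.index.name)
```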

6. How to get the items of series A not present in series B?

From ser1, remove the items that are also present in ser2.

# input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

ser1[~ser1.isin(ser2)]
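The same result can probably also be reached with NumPy's set routines. A sketch using np.setdiff1d; note that, unlike the isin filter, it returns sorted unique values and discards the original index:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# values of ser1 that do not occur in ser2, as a new series
only_in_ser1 = pd.Series(np.setdiff1d(ser1.to_numpy(), ser2.to_numpy()))
print(only_in_ser1)
```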

7. How to get the items not common to both series A and series B?

Get all items of ser1 and ser2 not common to both.

# input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# using pandas
a_not_b = ser1[~ser1.isin(ser2)]
b_not_a = ser2[~ser2.isin(ser1)]

# Series.append was removed in pandas 2.0; pd.concat works in old and new versions
pd.concat([a_not_b, b_not_a], ignore_index = True)

# using numpy union and intersection
ser_u = pd.Series(np.union1d(ser1, ser2))
ser_i = pd.Series(np.intersect1d(ser1, ser2))
ser_u[~ser_u.isin(ser_i)]
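The union-minus-intersection above is a symmetric difference, which NumPy also offers directly. A minimal sketch with np.setxor1d, which returns the sorted unique values found in exactly one of the inputs:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# values present in exactly one of the two series
not_common = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))
print(not_common)
```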

8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

Compute the minimum, 25th percentile, median, 75th, and maximum of ser.

# input
state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# using pandas
ser.describe()

# or using numpy
np.percentile(ser, q = [0, 25, 50, 75, 100])
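describe() also reports count, mean and std. If only the five numbers are wanted, Series.quantile is one way to get them as a series; a sketch using the same seeded input as above:

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# min, 25th percentile, median, 75th percentile, max; the index holds the quantile levels
five_num = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(five_num)
```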

9. How to get frequency counts of unique items of a series?

Calculate the frequency counts of each unique value of ser.

# input
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

ser.value_counts()
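value_counts() sorts by frequency, most frequent first, and can also return shares instead of counts. A small sketch on a fixed, made-up series so the result is reproducible:

```python
import pandas as pd

ser = pd.Series(list("aabbbc"))

counts = ser.value_counts()                # absolute counts, most frequent first
shares = ser.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```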

10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?

From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.

# input
# keep the seeded generator and draw from it; a bare np.random.RandomState(100) call seeds nothing
state = np.random.RandomState(100)
ser = pd.Series(state.randint(1, 5, [12]))
ser

# value_counts() sorts by frequency, so its first two index labels are the two most frequent values
ser.value_counts()
# select everything that is not one of those two values and overwrite it with 'Other'
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser
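Assigning a string into an integer series in place changes its dtype, which recent pandas versions may warn about. A non-mutating variant can be sketched with Series.where, here on a fixed, made-up series so the outcome is easy to follow:

```python
import pandas as pd

ser = pd.Series([1, 1, 1, 2, 2, 3, 4])

# the two most frequent values (1 occurs three times, 2 occurs twice)
top2 = ser.value_counts().index[:2]

# keep values where the condition holds, use 'Other' everywhere else
result = ser.where(ser.isin(top2), "Other")
print(result)
```

The original ser is left untouched, and result has object dtype because it mixes numbers and strings.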

11. How to bin a numeric series to 10 groups of equal size?

Bin the series ser into 10 equal deciles and replace the values with the bin name.

# input
ser = pd.Series(np.random.random(20))
ser

pd.qcut(ser, q = 10)
# we can also pass labels
pd.qcut(ser, q = [0, .10, .20, .30, .40, .50, .60, .70, .80, .90, 1], labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()
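"Equal size" here means an equal number of items per bin, which is what qcut does; pd.cut instead makes bins of equal width. A sketch that checks the difference, assuming a seeded generator so the run is reproducible:

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.random_sample(20))

# qcut: quantile-based bins, labels=False returns the bin number 0..9
deciles = pd.qcut(ser, q=10, labels=False)
counts = deciles.value_counts()

# cut: ten bins of equal width, so the counts per bin can differ
equal_width = pd.cut(ser, bins=10, labels=False)

print(counts.sort_index())
print(equal_width.value_counts().sort_index())
```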

12. How to convert a numpy array to a dataframe of given shape?

Reshape the series ser into a dataframe with 7 rows and 5 columns.

# input
ser = pd.Series(np.random.randint(1, 10, 35))
ser

# using numpy
pd.DataFrame(np.array(ser).reshape(7, 5))

# using only pandas
pd.DataFrame(ser.values.reshape(7, 5))

13. How to find the positions of numbers that are multiples of 3 from a series?

Find the positions of numbers that are multiples of 3 from ser.

# input
state = np.random.RandomState(100)  # keep the seeded generator and draw from it
ser = pd.Series(state.randint(1, 5, 10))
ser

# using where: non-multiples become NaN and are dropped; the remaining index holds the positions
ser.where(lambda x: x%3 == 0).dropna()

# using numpy on the underlying boolean array to get the positions directly
np.argwhere((ser%3 == 0).to_numpy()).ravel()
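Positions and index labels only coincide when the series has the default 0..n-1 index. A sketch on a made-up series with letter labels that separates the two:

```python
import numpy as np
import pandas as pd

ser = pd.Series([3, 4, 6, 7, 9], index=list("abcde"))

# boolean mask as a plain NumPy array
mask = (ser % 3 == 0).to_numpy()

positions = np.flatnonzero(mask)  # integer positions of the multiples of 3
labels = ser.index[mask]          # index labels at those positions

print(positions)
print(labels)
```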

14. How to extract items at given positions from a series?

From ser, extract the items at positions in list pos.

# input

ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

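# using iloc, which selects by integer position
# (loc selects by label and only gives the same result here because of the default 0..25 index)
ser.iloc[pos]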

# using series take
ser.take(pos)
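Label-based and position-based selection diverge as soon as the index is not the default one. A small sketch, assuming a made-up series whose index labels are 10, 20, ..., 50:

```python
import pandas as pd

ser = pd.Series(list("abcde"), index=[10, 20, 30, 40, 50])
pos = [0, 2, 4]

by_position = ser.iloc[pos]  # selects by integer position
by_take = ser.take(pos)      # take() is also position-based

# ser.loc[pos] would look for the labels 0, 2 and 4, which do not exist here
print(by_position)
```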

15. How to stack two series vertically and horizontally?

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

# input
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# vertical: use pandas concat with axis = 0
# (Series.append was deprecated in pandas 1.4 and removed in 2.0)
pd.concat([ser1, ser2], axis = 0)

# horizontal
pd.concat([ser1, ser2], axis = 1)
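Two concat options are often useful here: ignore_index to renumber the stacked result, and keys to name the columns of the side-by-side result. A sketch with the hypothetical column names "numbers" and "letters":

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list("abcde"))

# vertical: without ignore_index the labels 0..4 would appear twice
stacked = pd.concat([ser1, ser2], ignore_index=True)

# horizontal: keys become the column names of the resulting dataframe
side_by_side = pd.concat([ser1, ser2], axis=1, keys=["numbers", "letters"])

print(stacked)
print(side_by_side)
```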
