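A Python pandas DataFrame is a table-like data structure made up of rows and columns. Each column holds data of a single type, and you can use the row index to step through the elements of a column. This article shows how to create a pandas DataFrame object and how to read its column and row data.
1. How to Create a Pandas DataFrame Object.
- Call the pandas module's DataFrame(data, index=index, columns=columns) constructor to create a pandas DataFrame object.
- The data parameter holds the data of the DataFrame. It can be a 2-dimensional array or a Python dictionary object.
- The index parameter supplies the row index labels of the DataFrame. It is a Python list object.
- The columns parameter supplies the column labels of the DataFrame. You can use a column label to retrieve that column's data as a pandas.Series object.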
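The role of the index and columns parameters can be illustrated with a minimal sketch. The sample data and the variable names (df_default, df_labeled) are assumptions made up for this illustration. When both parameters are omitted, pandas falls back to default integer labels starting at 0.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = [[1, 'python'], [5, 'java']]

# Without index/columns, pandas assigns default integer labels 0, 1, ...
df_default = pd.DataFrame(data)
print(list(df_default.index))    # [0, 1]
print(list(df_default.columns))  # [0, 1]

# With explicit labels, a column label retrieves that column as a Series.
df_labeled = pd.DataFrame(data, index=[1, 2], columns=['Position', 'Language'])
position = df_labeled['Position']
print(type(position).__name__)   # Series
```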
1.1 Create a Pandas DataFrame Object from a 2-Dimensional Array.
- The example below creates a pandas DataFrame object from a 2-dimensional array.
import pandas as pd


def create_dataframe_from_2_dimension_array():
    '''Create a pandas DataFrame object from a 2-dimensional array.'''
    pd.set_option('display.unicode.east_asian_width', True)
    # Define a 2-dimensional array. Each inner list is one row that holds
    # the position number, programming language and operating system.
    data = [[1, 'python', 'Windows'], [5, 'java', 'Linux'], [8, 'c++', 'macOS']]
    # Define the column list; each element is a column label.
    columns = ['Position', 'Programming Language', 'Operating System']
    # Define the row index list.
    index = [1, 2, 3]
    # Create the pandas DataFrame object.
    df = pd.DataFrame(data, index=index, columns=columns)
    # Print out the DataFrame object data.
    print(df)
    # Return the pandas DataFrame object.
    return df
- When you call the function above, it prints the following data to the console.
   Position  Programming Language  Operating System
1         1                python           Windows
2         5                  java             Linux
3         8                   c++             macOS
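Because the row index in this example is the list [1, 2, 3], the index values are labels rather than zero-based positions. The sketch below, which rebuilds the same data under the assumed variable names first_by_label, first_by_position and cell, shows one way to read a row or a single cell with the .loc and .iloc accessors.

```python
import pandas as pd

data = [[1, 'python', 'Windows'], [5, 'java', 'Linux'], [8, 'c++', 'macOS']]
columns = ['Position', 'Programming Language', 'Operating System']
df = pd.DataFrame(data, index=[1, 2, 3], columns=columns)

# .loc selects by index label: label 1 is the first row here.
first_by_label = df.loc[1]
# .iloc selects by zero-based position: position 0 is the same row.
first_by_position = df.iloc[0]
print(first_by_label['Programming Language'])    # python
print(first_by_label.equals(first_by_position))  # True

# A row label plus a column label addresses a single cell.
cell = df.loc[2, 'Operating System']
print(cell)  # Linux
```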
1.2 Create a Pandas DataFrame Object from a Python Dictionary Object.
- The example below creates a pandas DataFrame object from a Python dictionary object.
import pandas as pd


def create_dataframe_from_dictionary_object():
    '''Create a pandas DataFrame object from a Python dictionary object.'''
    pd.set_option('display.unicode.east_asian_width', True)
    # Define a Python dictionary object. Each key is a column name and each
    # value is a list that holds that column's value for every row.
    dict_obj = {'Position': [1, 5, 8],
                'Programming Language': ['python', 'java', 'c++'],
                'Operating System': ['Windows', 'Linux', 'macOS']}
    # Create a list object to store the row index numbers.
    index = [1, 2, 3]
    # Create the pandas DataFrame object.
    df = pd.DataFrame(dict_obj, index=index)
    # Print the DataFrame object's data in the console.
    print(df)
    # Return the created DataFrame object.
    return df


# The function prints the DataFrame itself, so the call is not wrapped
# in another print(), which would show the same table twice.
create_dataframe_from_dictionary_object()
- Below is the console output of the example function above.
   Position  Programming Language  Operating System
1         1                python           Windows
2         5                  java             Linux
3         8                   c++             macOS
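To check the earlier statement that each column holds a single type of data, the DataFrame's shape and dtypes attributes can be inspected. This is a minimal sketch that rebuilds the same dictionary. The variable name position_is_integer is an assumption for illustration, and the exact dtype names printed for the text columns may differ between pandas versions.

```python
import pandas as pd

dict_obj = {'Position': [1, 5, 8],
            'Programming Language': ['python', 'java', 'c++'],
            'Operating System': ['Windows', 'Linux', 'macOS']}
df = pd.DataFrame(dict_obj, index=[1, 2, 3])

# shape is a (row count, column count) tuple.
print(df.shape)  # (3, 3)

# dtypes lists one data type per column.
print(df.dtypes)

# The Position column was built from Python ints, so it has an integer dtype.
position_is_integer = pd.api.types.is_integer_dtype(df['Position'])
print(position_is_integer)  # True
```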
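2. How to Iterate a Python Pandas DataFrame Object.
2.1 Iterate the DataFrame Columns.
- The columns attribute of a pandas DataFrame object returns all of the DataFrame's column labels as a pandas Index object.
- You can loop over the returned column labels and use each label to get that column's data as a pandas Series object. Below is an example.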
def iterate_dataframe_object(dataframe_object):
    '''Iterate the DataFrame object and print out the data of each column.'''
    print('=================== iterate_dataframe_object ======================')
    # Loop over the DataFrame object's column labels.
    for column in dataframe_object.columns:
        # Print out the column name.
        print(column)
        # Get the column data as a pandas Series object.
        column_data_series = dataframe_object[column]
        # Print out the column data Series object.
        print(column_data_series)
        print('=======================================')


if __name__ == '__main__':
    #create_dataframe_from_2_dimension_array()
    df = create_dataframe_from_dictionary_object()
    iterate_dataframe_object(df)
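- Below is the output printed by iterate_dataframe_object in the example above (the DataFrame table that create_dataframe_from_dictionary_object prints first is omitted).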
=================== iterate_dataframe_object ======================
Position
1    1
2    5
3    8
Name: Position, dtype: int64
=======================================
Programming Language
1    python
2      java
3       c++
Name: Programming Language, dtype: object
=======================================
Operating System
1    Windows
2      Linux
3      macOS
Name: Operating System, dtype: object
=======================================
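As an alternative to looping over the columns attribute, the DataFrame.items() method yields each column as a (label, Series) pair. The sketch below is one possible way to use it. The column_values dictionary is an assumption introduced only to collect the results.

```python
import pandas as pd

df = pd.DataFrame({'Position': [1, 5, 8],
                   'Programming Language': ['python', 'java', 'c++']},
                  index=[1, 2, 3])

# Collect each column's values in a plain dict keyed by column label.
column_values = {}
for label, series in df.items():
    # series is the pandas Series that holds this column's data.
    column_values[label] = series.tolist()
    print(label, series.tolist())
```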
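2.2 Iterate the DataFrame Rows.
- You can call the DataFrame object's iterrows() method to get a row iterator. Each item it yields is a pair made of the row index label and the row data as a pandas Series object.
- You can then call Python's next() function on the iterator to step through the items and get the data of each DataFrame row. Below is the example source code.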
'''
Created on Oct 23, 2021

@author: songzhao
'''
import pandas as pd


def create_dataframe_from_dictionary_object():
    '''Create a pandas DataFrame object from a Python dictionary object.'''
    pd.set_option('display.unicode.east_asian_width', True)
    # Define a Python dictionary object. Each key is a column name and each
    # value is a list that holds that column's value for every row.
    dict_obj = {'Position': [1, 5, 8],
                'Programming Language': ['python', 'java', 'c++'],
                'Operating System': ['Windows', 'Linux', 'macOS']}
    # Create a list object to store the row index numbers.
    index = [1, 2, 3]
    # Create the pandas DataFrame object.
    df = pd.DataFrame(dict_obj, index=index)
    # Print the DataFrame object's data in the console.
    print(df)
    # Return the created DataFrame object.
    return df


def iterate_dataframe_rows(df_obj):
    '''Iterate the DataFrame object's rows and print the data of each row.'''
    print('=================== iterate_dataframe_rows ======================')
    # Call the DataFrame object's iterrows() method to get the row iterator.
    iterator = df_obj.iterrows()
    # Get the first item from the iterator (None if there are no rows).
    row = next(iterator, None)
    # Loop while the iterator still returns rows.
    while row is not None:
        # Each item is an (index label, Series) pair.
        row_number = row[0]
        series_obj = row[1]
        print('row number =', row_number)
        print(series_obj.index)
        print(series_obj.values)
        # Print a blank line between rows.
        print()
        # Get the next row from the iterator.
        row = next(iterator, None)


if __name__ == '__main__':
    df = create_dataframe_from_dictionary_object()
    iterate_dataframe_rows(df)
- When you run the example source code above, you get the following output.
   Position  Programming Language  Operating System
1         1                python           Windows
2         5                  java             Linux
3         8                   c++             macOS
=================== iterate_dataframe_rows ======================
row number = 1
Index(['Position', 'Programming Language', 'Operating System'], dtype='object')
[1 'python' 'Windows']

row number = 2
Index(['Position', 'Programming Language', 'Operating System'], dtype='object')
[5 'java' 'Linux']

row number = 3
Index(['Position', 'Programming Language', 'Operating System'], dtype='object')
[8 'c++' 'macOS']
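The manual next() loop above can also be written as a plain for loop, since iterrows() returns an ordinary Python iterator. The sketch below shows that form, followed by the related DataFrame.itertuples() method, which yields one named tuple per row. The names rows and positions are assumptions introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Position': [1, 5, 8],
                   'Programming Language': ['python', 'java', 'c++'],
                   'Operating System': ['Windows', 'Linux', 'macOS']},
                  index=[1, 2, 3])

# A for loop unpacks each (index label, Series) pair from iterrows().
rows = []
for row_label, series_obj in df.iterrows():
    rows.append((row_label, series_obj.tolist()))
    print(row_label, series_obj.tolist())

# itertuples() yields a named tuple per row. The index label is exposed as
# .Index, and column labels that are valid identifiers become attributes.
positions = [record.Position for record in df.itertuples()]
print(positions)  # [1, 5, 8]
```

Column labels that contain spaces, such as Programming Language, are not valid attribute names, so itertuples() exposes them under positional names instead. In that case iterrows() or column-wise access may be more convenient.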