工作中,经常遇到需要将查询结果的文本,转换为二维表(类似Excel的表格)格式,供代码处理, 或者保存为Excel/CSV。该代码:
- 切割文本为list的 list(行间分隔符,行内值分割符)
- 将切割后list的list转换为一个DataFrame或者Dictionry List。
- 然后通过pandas保存为Excel。
import pandas as pd
columns_names = ['name', 'Math_A', 'English_A', 'Math_B', 'English_B', 'Project_num', 'Sex']
data = """Alice,1.1,3.0,1.7,5.0,2,FEOLBob,2.2,2.6,2.5,2.6,3,MEOLCathy,3.3,2.0,3.6,2.4,0,MEOLDany,4.4,1.7,2.4,1.3,1,FEOLElla,5.0,3.0,5.0,3.0,7,MEOLFord,3.2,3.3,2.2,3.6,2,FEOLGary,2.4,4.4,3.3,2.4,1,MEOLHam,1.5,5.0,4.4,5.0,5,MEOLIco,4.3,3.2,1.5,2.2,3,FEOLJack,4.5,2.4,4.3,3.1,4,MEOL"""
print("\n >>>>> 1. The table string is: \n", data)
# Split table string to a list [list].切割文本为list的list。如果内层的list(行)的长度与列名的长度不一致,则为无效list。否则,将该list(行)给row_list
# 本例中,‘EOL’为行分隔符, ‘,’字段分隔符。 请根据您自己的数据和分隔符修改代码集成到您的代码中。
print("\n >>>>> 2. Start to split the table string... ")
row_list = []
for values in ([row.split(",") for row in data.split("EOL")]):
if len(values) == len(columns_names):
row_list.append(values)
else:
print(" >>>>> The invalid row is: ", values)
print(" >>>>> The split list [list] is: \n", row_list)
# Transfer list [list] to DataFrame.转换list的list为Pandas的DataFrame
row_list_df = pd.DataFrame(row_list, columns=columns_names)
print("\n >>>>> 3. The DataFrame is: \n", row_list_df)
row_list_df.to_csv('test1.csv')
# Transfer list [list] to Dictionary list。转换list的list为字典列表
# value_dict_list = [dict(zip(columns_names, value)) for value in row_list if len(value) == len(columns_names)]
value_dict_list = [dict(zip(columns_names, value)) for value in row_list]
print("\n >>>>> 4. The Dictionary list: \n", value_dict_list)
pd.DataFrame(value_dict_list).to_csv('test2.csv')
# 可以根据需求,通过pd保存为Excel或者CSV。
数据表文本转换为DataFrame/List[list]/List dictinary
**运行结果:**
C:\py\venv\Scripts\python.exe C:/py/test.py
>>>>> 1. The table string is:
Alice,1.1,3.0,1.7,5.0,2,FEOLBob,2.2,2.6,2.5,2.6,3,MEOLCathy,3.3,2.0,3.6,2.4,0,MEOLDany,4.4,1.7,2.4,1.3,1,FEOLElla,5.0,3.0,5.0,3.0,7,MEOLFord,3.2,3.3,2.2,3.6,2,FEOLGary,2.4,4.4,3.3,2.4,1,MEOLHam,1.5,5.0,4.4,5.0,5,MEOLIco,4.3,3.2,1.5,2.2,3,FEOLJack,4.5,2.4,4.3,3.1,4,MEOL
>>>>> 2. Start to split the table string...
>>>>> The invalid row is: ['']
>>>>> The split list [list] is:
[['Alice', '1.1', '3.0', '1.7', '5.0', '2', 'F'], ['Bob', '2.2', '2.6', '2.5', '2.6', '3', 'M'], ['Cathy', '3.3', '2.0', '3.6', '2.4', '0', 'M'], ['Dany', '4.4', '1.7', '2.4', '1.3', '1', 'F'], ['Ella', '5.0', '3.0', '5.0', '3.0', '7', 'M'], ['Ford', '3.2', '3.3', '2.2', '3.6', '2', 'F'], ['Gary', '2.4', '4.4', '3.3', '2.4', '1', 'M'], ['Ham', '1.5', '5.0', '4.4', '5.0', '5', 'M'], ['Ico', '4.3', '3.2', '1.5', '2.2', '3', 'F'], ['Jack', '4.5', '2.4', '4.3', '3.1', '4', 'M']]
>>>>> 3. The DataFrame is:
name Math_A English_A Math_B English_B Project_num Sex
0 Alice 1.1 3.0 1.7 5.0 2 F
1 Bob 2.2 2.6 2.5 2.6 3 M
2 Cathy 3.3 2.0 3.6 2.4 0 M
3 Dany 4.4 1.7 2.4 1.3 1 F
4 Ella 5.0 3.0 5.0 3.0 7 M
5 Ford 3.2 3.3 2.2 3.6 2 F
6 Gary 2.4 4.4 3.3 2.4 1 M
7 Ham 1.5 5.0 4.4 5.0 5 M
8 Ico 4.3 3.2 1.5 2.2 3 F
9 Jack 4.5 2.4 4.3 3.1 4 M
>>>>> 4. The Dictionary list:
[{'name': 'Alice', 'Math_A': '1.1', 'English_A': '3.0', 'Math_B': '1.7', 'English_B': '5.0', 'Project_num': '2', 'Sex': 'F'}, {'name': 'Bob', 'Math_A': '2.2', 'English_A': '2.6', 'Math_B': '2.5', 'English_B': '2.6', 'Project_num': '3', 'Sex': 'M'}, {'name': 'Cathy', 'Math_A': '3.3', 'English_A': '2.0', 'Math_B': '3.6', 'English_B': '2.4', 'Project_num': '0', 'Sex': 'M'}, {'name': 'Dany', 'Math_A': '4.4', 'English_A': '1.7', 'Math_B': '2.4', 'English_B': '1.3', 'Project_num': '1', 'Sex': 'F'}, {'name': 'Ella', 'Math_A': '5.0', 'English_A': '3.0', 'Math_B': '5.0', 'English_B': '3.0', 'Project_num': '7', 'Sex': 'M'}, {'name': 'Ford', 'Math_A': '3.2', 'English_A': '3.3', 'Math_B': '2.2', 'English_B': '3.6', 'Project_num': '2', 'Sex': 'F'}, {'name': 'Gary', 'Math_A': '2.4', 'English_A': '4.4', 'Math_B': '3.3', 'English_B': '2.4', 'Project_num': '1', 'Sex': 'M'}, {'name': 'Ham', 'Math_A': '1.5', 'English_A': '5.0', 'Math_B': '4.4', 'English_B': '5.0', 'Project_num': '5', 'Sex': 'M'}, {'name': 'Ico', 'Math_A': '4.3', 'English_A': '3.2', 'Math_B': '1.5', 'English_B': '2.2', 'Project_num': '3', 'Sex': 'F'}, {'name': 'Jack', 'Math_A': '4.5', 'English_A': '2.4', 'Math_B': '4.3', 'English_B': '3.1', 'Project_num': '4', 'Sex': 'M'}]
Process finished with exit code 0