本篇延续:自动化报告的前奏|使用python-pptx操作PPT(一)
因为在pptx-python中使用table,需要单个cell逐一输入,于是在想有没有pandas可以直接读入的方式,
有两个开源项目有类似的功能:
其中mspandas
写的比较复杂,PandasToPowerpoint
比较易懂,就以此为本篇主要的研究对象。
其他系列:
- 自动化报告的前奏|使用python-pptx操作PPT(一)
- 自动化报告pptx-python|如何将pandas的表格写入PPTX(二)
- 自动化报告pptx-python|高效通过PPT模版制造报告(三)
- 自动化报告pptx-python|pptx 分析报告的工具包:reportgen(四)
1 案例1
核心代码:pd2ppt.py
在函数df_to_table
中可以看到:
def df_to_table(slide, df, left=None, top=None, width=None, height=None,
colnames=None, col_formatters=None, rounding=None,
name=None, table_style=None):
left = process_position_parameter(left)
top = process_position_parameter(top)
width = process_position_parameter(width)
height = process_position_parameter(height)
rows, cols = df.shape
shp = slide.shapes.add_table(rows+1, cols, left, top, width, height)
if colnames is None:
colnames = list(df.columns)
# Insert the column names
for col_index, col_name in enumerate(colnames):
shp.table.cell(0,col_index).text = col_name
m = df.values
for row in range(rows):
for col in range(cols):
val = m[row, col]
if col_formatters is None:
text = str(val)
else:
text = _do_formatting(val, col_formatters[col])
shp.table.cell(row+1, col).text = text
if name is not None:
shp.name = name
if table_style is not None:
table_graphic = shp._element.graphic.graphicData.tbl
table_graphic[0][-1].text = getattr(TableStyle, table_style)
return shp
# 一个例子
df = pd.DataFrame(
{'District':['Hampshire', 'Dorset', 'Wiltshire', 'Worcestershire'],
'Population':[25000, 500000, 735298, 12653],
'Ratio':[1.56, 7.34, 3.67, 8.23]})
pres = Presentation()
blank_slide_layout = pres.slide_layouts[6]
slide = pres.slides.add_slide(blank_slide_layout)
table = df_to_table(slide, df, col_formatters=['', ',', '.2'],
rounding=['', 3, ''],
table_style = 'DarkStyle2Accent5Accent6')
pres.save(filename)
整个的核心思路遍历pandas每一个元素,然后再赋值到shape.table
的每一个元素当中;
其中:
- col_formatters,设置每一列的小数点位数,案例中的
['', ',', '.2']
,第三列需要保留2位小数 - rounding,是否四舍五入
- table_style,风格改变
其中,下述是表格的风格,其中不同版本的PPT会有不同的风格,代码示例,DarkStyle2Accent5Accent6
,这里可以看到由三种元素组成(DarkStyle2,Accent5,Accent6):
table_graphic = shp._element.graphic.graphicData.tbl
table_graphic[0][-1].text = getattr(TableStyle, table_style)
# 举例
df_to_powerpoint(
"output/pd2ppt_test_3.pptx", df, col_formatters=['', ',', '.2'],
rounding=['', 3, ''],
table_style = 'DarkStyle2Accent5Accent6'
)
2 案例2
通过pandas创建pptx并保存;
然后重新修改表格里面的数据
from pd2ppt import df_to_powerpoint
import pandas as pd
# pandas 读入
df = pd.DataFrame(
{'District':['Hampshire', 'Dorset', 'Wiltshire', 'Worcestershire'],
'Population':[25000, 500000, 735298, 12653],
'Ratio':[1.56, 7.34, 3.67, 8.23]})
df_to_powerpoint(
"output/pd2ppt_test_3.pptx", df, col_formatters=['', ',', '.2'],
rounding=['', 3, ''],
table_style = 'DarkStyle2Accent5Accent6'
)
table_graphic = shp._element.graphic.graphicData.tbl
table_graphic[0][-1].text = getattr(TableStyle, table_style)
# 修改表格数据
from pptx import Presentation
prs = Presentation("output/pd2ppt_test.pptx")
for s in prs.slides[0].shapes:
print(s.name)
shapes = prs.slides[0]
prs.slides[0].shapes[0].table.cell(0,0).text_frame.text = "District-mattttt"
prs.save(r'output/pd2ppt_test_2.pptx')
再来一个所有数据替换的:
from pptx import Presentation
import six
import pandas as pd
def _do_formatting(value, format_str):
"""Format value according to format_str, and deal
sensibly with format_str if it is missing or invalid.
"""
if format_str == '':
if type(value) in six.integer_types:
format_str = ','
elif type(value) is float:
format_str = 'f'
elif type(value) is str:
format_str = 's'
elif format_str[0] == '.':
if format_str.endswith('R'):
if type(value) in six.integer_types:
value = round_to_n(value, int(format_str[1]))
format_str = ','
if not format_str.endswith('G'):
format_str = format_str + "G"
try:
value = format(value, format_str)
except:
value = format(value, '')
return value
# 整体修改
df = pd.DataFrame(
{'new1':['a', 'b', 'c', 'd'],
'new2':[250, 5000, 7398, 153],
'new3':[1, 7, 3, 8]})
col_formatters=['', ',', '.2']
rounding=['', 3, '']
rows, cols = df.shape
shp = prs.slides[0].shapes[0]#.table
# shp = slide.shapes.add_table(rows+1, cols, left, top, width, height)
colnames = list(df.columns)
# Insert the column names
for col_index, col_name in enumerate(colnames):
shp.table.cell(0,col_index).text = col_name
m = df.values
for row in range(rows):
for col in range(cols):
val = m[row, col]
# text = str(val)
if col_formatters is None:
text = str(val)
else:
text = _do_formatting(val, col_formatters[col])
shp.table.cell(row+1, col).text = text
prs.save(r'output/pd2ppt_test_整体修改.pptx')
其中,_do_formatting
代表表格内格式的调整;
逐个通过table.cell进行修改