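Python and Word

Converting Office Word files to HTML with Python

Test environment: Windows XP, Office 2007, Python 2.5.2, pywin32 build 213. The approach is to call the Office API directly through the win32com interface. It is simple and highly compatible: anything Office can handle, Python can handle too, and the result is identical to what Word's "Save As" produces.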

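#!/usr/bin/env python
# coding=utf-8

from win32com import client as wc

# Start Word through COM
word = wc.Dispatch('Word.Application')

# Open the source document and save it as HTML (8 = wdFormatHTML)
doc = word.Documents.Open('d:/labs/math.doc')
doc.SaveAs('d:/labs/math.html', 8)

# Close the document, then shut Word down
doc.Close()
word.Quit()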

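The key line is doc.SaveAs('d:/labs/math.html', 8). Many articles online write it as doc.SaveAs('d:/labs/math.html', win32com.client.constants.wdFormatHTML), which fails immediately with:

AttributeError: class Constants has no attribute 'wdFormatHTML'

win32com.client.constants is only populated once the makepy wrappers for Word's type library have been generated (for example by creating the object with win32com.client.gencache.EnsureDispatch). With a plain Dispatch call and no generated wrappers it stays empty, which is why the numeric value 8 is used here.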
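For readers who prefer the named constant, a minimal sketch of the early-bound variant follows. It assumes pywin32 and Word are installed; the function name save_as_html is hypothetical, and the import is deferred so the module still loads on a machine without pywin32.

```python
import os


def save_as_html(src_path, dst_path):
    """Convert a Word document to HTML, looking the format up by name."""
    # Imported lazily: pywin32 only exists on Windows.
    from win32com.client import constants, gencache

    # EnsureDispatch generates (or reuses) the makepy wrappers for Word,
    # which is what fills win32com.client.constants.
    word = gencache.EnsureDispatch('Word.Application')
    try:
        # Word expects absolute paths.
        doc = word.Documents.Open(os.path.abspath(src_path))
        try:
            doc.SaveAs(os.path.abspath(dst_path), constants.wdFormatHTML)
        finally:
            doc.Close()
    finally:
        # Always shut Word down, even if the conversion failed.
        word.Quit()
```

The try/finally blocks are a design choice rather than a requirement: without them, a failed conversion can leave a hidden WINWORD.EXE process running.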

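The same code can convert a Word file to any other format that Office 2007 supports; for example, change 8 to 17 to produce a PDF. The full list of file formats supported by Office 2007, with their values: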

wdFormatDocument = 0
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
wdFormatXPS = 18

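The names map fairly literally to the corresponding file formats. Office 2003 may not support all of them. For Word-to-HTML conversion there are two choices, wdFormatHTML and wdFormatFilteredHTML (values 8 and 10). With wdFormatHTML, formulas and other OLE objects in the document are saved as WMF images; with wdFormatFilteredHTML they are saved as GIF images, and the HTML it generates is visibly much cleaner than the wdFormatHTML output.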
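As a rough illustration of how the numeric values might be chosen in a script, the sketch below picks a format code from the output file's extension. The numbers come from the table above; the extension mapping and the names FORMAT_BY_EXTENSION and pick_format are assumptions made for this example, not anything Word defines.

```python
import os

# Values copied from the WdSaveFormat table; the extension-to-format
# pairing is an illustrative assumption.
FORMAT_BY_EXTENSION = {
    '.doc': 0,    # wdFormatDocument
    '.txt': 2,    # wdFormatText
    '.rtf': 6,    # wdFormatRTF
    '.mht': 9,    # wdFormatWebArchive
    '.docx': 12,  # wdFormatXMLDocument
    '.pdf': 17,   # wdFormatPDF
    '.xps': 18,   # wdFormatXPS
}
WD_FORMAT_HTML = 8
WD_FORMAT_FILTERED_HTML = 10


def pick_format(dst_path, filtered_html=True):
    """Return the save-format number for dst_path based on its extension."""
    ext = os.path.splitext(dst_path)[1].lower()
    if ext in ('.htm', '.html'):
        # Filtered HTML is the cleaner of the two HTML variants.
        return WD_FORMAT_FILTERED_HTML if filtered_html else WD_FORMAT_HTML
    if ext not in FORMAT_BY_EXTENSION:
        raise ValueError('no Word save format known for %r' % ext)
    return FORMAT_BY_EXTENSION[ext]
```

The result can be passed straight to doc.SaveAs, for example doc.SaveAs(path, pick_format(path)).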

The Office API can of course be called through COM from any other language as well, such as PHP.

 

=========================================

Writing COM with Python

Thursday, September 3, 2009, 07:01 PM

from : http://www.cppblog.com/bigsml/archive/2008/08/14/58851.html

Python supports both calling COM objects (client COM) and writing COM components (server COM).
1. Calling COM (playing music with Windows Media Player)

from win32com.client import Dispatch

mp = Dispatch("WMPlayer.OCX")
tune = mp.newMedia("C:/WINDOWS/system32/oobe/images/title.wma")
mp.currentPlaylist.appendItem(tune)
mp.controls.play()


2. Writing a COM server
The main reference is Chapter 12, "Advanced Python and COM", of Python Programming on Win32: http://oreilly.com/catalog/pythonwin32/chapter/ch12.html
Example (splitting a string)
- Code

class PythonUtilities:
    # Methods exposed to COM clients
    _public_methods_ = ['SplitString']
    _reg_progid_ = "PythonDemos.Utilities"
    # NEVER copy the following ID
    # Use "print(pythoncom.CreateGuid())" to make a new one.
    _reg_clsid_ = "{41E24E95-D45A-11D2-852C-204C4F4F5020}"

    def SplitString(self, val, item=None):
        # Convert the incoming COM values to plain strings before splitting
        if item is not None:
            item = str(item)
        return str(val).split(item)

# Add code so that when this script is run by
# Python.exe, it self-registers.
if __name__ == '__main__':
    print("Registering COM server")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PythonUtilities)
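The comment in the class says to generate a fresh ID with pythoncom.CreateGuid(). On a machine without pywin32, the standard-library uuid module can produce a GUID in the same braced form; this is a substitute technique, not the one the source uses, and the helper name new_clsid is hypothetical.

```python
import uuid


def new_clsid():
    """Return a random GUID formatted as a registry-style CLSID string."""
    # uuid4() is random; upper-casing and bracing match the form used
    # for _reg_clsid_ in the class above.
    return '{%s}' % str(uuid.uuid4()).upper()


if __name__ == '__main__':
    print(new_clsid())
```

Each COM class needs its own ID, so the value should be generated once and then pasted into the source, not regenerated on every run.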


- Registering / unregistering the COM server

Command-Line Option    Description
(none)                 The default is to register the COM objects.
--unregister           Unregisters the objects. This removes all references to the objects from the Windows registry.
--debug                Registers the COM servers in debug mode. We discuss debugging COM servers later in this chapter.
--quiet                Register (or unregister) the object quietly (i.e., don't report success).


- Using the COM object
Run the following at the Python prompt:

>>> import win32com.client
>>> s = win32com.client.Dispatch("PythonDemos.Utilities")
>>> s.SplitString("a,b,c", ",")
((u'a', u'a,b,c'),)
>>>
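Because the server is an ordinary Python class, its logic can also be exercised without registering anything or going through COM. A minimal sketch, using a trimmed copy of the class (registration attributes omitted) and plain method calls:

```python
class PythonUtilities:
    """Trimmed copy of the COM server class, for local testing only."""
    _public_methods_ = ['SplitString']

    def SplitString(self, val, item=None):
        # Same logic as the registered server: split val on item,
        # or on whitespace when no separator is given.
        if item is not None:
            item = str(item)
        return str(val).split(item)


util = PythonUtilities()
print(util.SplitString("a,b,c", ","))  # ['a', 'b', 'c']
print(util.SplitString("a b c"))       # ['a', 'b', 'c']
```

This only checks the Python side; how the result is marshalled to a COM client is a separate matter.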


3. How a Python COM server works
The registry entries show how a Python COM server is wired up:

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}]
@="PythonDemos.Utilities"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\Debugging]
@="0"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\Implemented Categories]

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\Implemented Categories\{B3EF80D0-68E2-11D0-A689-00C04FD658FF}]

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\InprocServer32]
@="pythoncom25.dll"
"ThreadingModel"="both"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\LocalServer32]
@="D:\\usr\\Python\\pythonw.exe \"D:\\usr\\Python\\lib\\site-packages\\win32com\\server\\localserver.py\" {41E24E95-D45A-11D2-852C-204C4F4F5020}"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\ProgID]
@="PythonDemos.Utilities"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\PythonCOM]
@="PythonDemos.PythonUtilities"

[HKEY_CLASSES_ROOT\CLSID\{41E24E95-D45A-11D2-852C-204C4F4F5020}\PythonCOMPath]
@="D:\\"

The in-process server is implemented by pythoncom25.dll.
The local server is implemented by localserver.py.
The Python source information for the COM object is stored in PythonCOMPath and PythonCOM.
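To check these entries from a script instead of from the Registry Editor, something like the following sketch could be used. It assumes Python 3 on Windows (the module is named _winreg on Python 2); the function names are hypothetical, and the registry read is kept in its own function so the path-building part runs anywhere.

```python
def registration_subkeys(clsid):
    """Map each subkey of interest to its path under HKEY_CLASSES_ROOT."""
    names = ('InprocServer32', 'LocalServer32', 'ProgID',
             'PythonCOM', 'PythonCOMPath')
    return dict((name, 'CLSID\\%s\\%s' % (clsid, name)) for name in names)


def read_registration(clsid):
    """Read each subkey's default value; None where the key is missing."""
    import winreg  # Windows-only standard-library module

    values = {}
    for name, path in registration_subkeys(clsid).items():
        try:
            values[name] = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, path)
        except OSError:
            # Key not present, e.g. the server was never registered.
            values[name] = None
    return values
```

Calling read_registration("{41E24E95-D45A-11D2-852C-204C4F4F5020}") after registering the example server should return the same default values as the export above.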

4. Problems in use
When the COM object is called from PHP or C:

<?php
$com = new COM("PythonDemos.Utilities");
$rs = $com->SplitString("a b c");
foreach ($rs as $r)
    echo $r . "\n";
?>

you may run into errors like the following:
pythoncom error: PythonCOM Server - The 'win32com.server.policy' module could not be loaded.
<type 'exceptions.ImportError'>: No module named server.policy
pythoncom error: CPyFactory::CreateInstance failed to create instance. (80004005)


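There are two ways to fix this:
a. Set the environment variable PYTHONHOME = D:/usr/Python
Likewise, when embedding Python in C++, if importing a module fails with 'import site' failed; use -v for traceback, setting this variable also fixes it.

b. Build exe/dll executables for the COM server. The setup.py is as follows: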

from distutils.core import setup
import py2exe

import sys
import shutil

# Remove the build tree ALWAYS do that!
shutil.rmtree("build", ignore_errors=True)

# List of modules to exclude from the executable
excludes = ["pywin", "pywin.debugger", "pywin.debugger.dbgcon", "pywin.dialogs", "pywin.dialogs.list"]

# List of modules to include in the executable
includes = ["win32com.server"]

# ModuleFinder can't handle runtime changes to __path__, but win32com uses them
try:
    # if this doesn't work, try import modulefinder
    import py2exe.mf as modulefinder
    import win32com

    for p in win32com.__path__[1:]:
        modulefinder.AddPackagePath("win32com", p)

    for extra in ["win32com.shell", "win32com.server"]:  # ,"win32com.mapi"
        __import__(extra)
        m = sys.modules[extra]
        for p in m.__path__[1:]:
            modulefinder.AddPackagePath(extra, p)
except ImportError:
    # no build path setup, no worries.
    pass

# Set up py2exe with all the options
setup(
    options={"py2exe": {"compressed": 2,
                        "optimize": 2,
                        # "bundle_files": 1,
                        "dist_dir": "COMDist",
                        "excludes": excludes,
                        "includes": includes}},
    # The lib directory contains everything except the executables and the python dll.
    # Can include a subdirectory name.
    zipfile=None,
    com_server=['PythonDemos'],  # the module (file) name!
)
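Before choosing between fixes (a) and (b), it can help to see what the interpreter itself finds. The sketch below is a small diagnostic, written for Python 3 (importlib.util is not available on the Python 2.5 setup described here); the names module_available and report are hypothetical.

```python
import importlib.util
import os
import sys


def module_available(name):
    """Return True if the module `name` can be located for import."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. win32com) is itself missing.
        return False


def report():
    """Print the settings that decide where Python looks for modules."""
    print("PYTHONHOME =", os.environ.get("PYTHONHOME"))
    print("sys.prefix =", sys.prefix)
    for name in ("site", "win32com.server.policy"):
        status = "found" if module_available(name) else "MISSING"
        print("%-24s %s" % (name, status))


if __name__ == "__main__":
    report()
```

Running it in the same environment as the failing host process (for example, the web server account that runs PHP) is what makes the output meaningful, since PYTHONHOME may differ between accounts.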



ref:
http://oreilly.com/catalog/pythonwin32/chapter/ch12.html
http://blog.donews.com/limodou/archive/2005/09/02/537571.aspx

 

