Writing an Online Dictionary in Python with minidom

Article link: https://blog.csdn.net/hackjames/article/details/6943371

(Note: I wrote this post in 2007 on my cublog (China Unix) blog and am now moving it here.)

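When reading English papers I often need an electronic dictionary, but Kingsoft PowerWord (金山词霸) is intimidatingly bloated, and I badly wanted something small and simple. At first I used dict.cn: open IE, type in the URL, then type in the word to look up.
    By chance I noticed in dict.cn's help documentation that it offers an XML interface. I happened to be learning XML parsing at the time, so I wrote a simple online dictionary with a reasonably rich vocabulary. At the very least, I no longer have to open IE and type in a URL.
    The main flow of the dictionary is simple.
    1. Submit a POST request to http://dict.cn/ws.php?q=XXX, and the server returns a piece of XML text. For example, submitting http://dict.cn/ws.php?q=circuit makes the server return the following XML: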

    <dict>
        <audio>http://dict.cn/mp3.php?q=Wqpzt</audio>
        <pron>'sə:kit</pron>
        <def>n. 电路,一圈,巡回</def>
        <sent><orig>The rocket did one circuit of the earth and returned to base.</orig><trans>火箭绕地球运行一周后返回基地。</trans></sent>
        <sent><orig>There are two breakers in this circuit.</orig><trans>这个电路里面使用了两个断路器。</trans></sent>
        <sent><orig>The switches close the contacts and complete the circuit.</orig><trans>这些开关可使接触器接通电流形成回路。</trans></sent>
    </dict>

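    2. Parse this XML with xml.dom.minidom, take the value of the def node, and print it. (What I care about most is the translation of the word, so for simplicity I mainly use the def node; for the meaning of the other nodes, see dict.cn.)
    The key point here is the character encoding conversion: the XML must be converted to UTF-8 throughout before it can be displayed correctly.
    The source code of the program follows.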
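The parsing and re-encoding step can be sketched on its own, without the network. The block below is a minimal Python 3 illustration, not the original program: the helper name `extract_definition`, its `source_encoding` parameter, and the exact text of the XML declaration in the sample are assumptions made for the example. It decodes a GBK reply, drops the declaration that names the old encoding, and hands UTF-8 bytes to minidom.

```python
import re
from xml.dom.minidom import parseString


def extract_definition(raw_bytes, source_encoding="gbk"):
    """Return the text of the first <def> element, or None if there is none."""
    # Decode the reply from the encoding the server is assumed to use.
    text = raw_bytes.decode(source_encoding)
    # Drop a leading XML declaration so it no longer claims the old encoding.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    # Without a declaration the parser treats the bytes as UTF-8.
    dom = parseString(text.encode("utf-8"))
    nodes = dom.getElementsByTagName("def")
    if not nodes:
        return None
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# A shortened stand-in for the server reply; the declaration line is assumed.
sample = (
    '<?xml version="1.0" encoding="GBK" ?>'
    "<dict><def>n. 电路,一圈,巡回</def></dict>"
).encode("gbk")

definition = extract_definition(sample)
print(definition)
```

Stripping the declaration with a regular expression is a little more forgiving than cutting a fixed number of bytes, since it does not depend on the header having one exact length.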

#!/usr/bin/python
# -*- coding: cp936 -*-

# Python 2 modules; only the names actually used below are imported
import urllib,httplib
from xml.dom.minidom import parse
logo = """
#########################################################
#    Online English-Chinese Dictionary (TinyDict v1.0) #
#                          by yuanshl from CSLab of Lzu #
#                            E-mail: yuanshl02@gmail.com#
#########################################################
"""
def post(website,path,value): # send a POST request and return the response body
    params=urllib.urlencode(value)
    headers={"Accept":"text/html","User-Agent":"IE",\
      "Content-Type":"application/x-www-form-urlencoded"} 
    conn=httplib.HTTPConnection(website) 
    conn.request("POST",path,params,headers)
    #print params
    r=conn.getresponse() 
    #print r.status,r.reason 
    data=r.read() 
    #print data 
    conn.close()
    return data
def getTagText(root, tag):  # text of the first <tag> element, or "no" if absent
    try:
        element = root.getElementsByTagName(tag)[0]
    except IndexError:  # the document has no such element
        return "no"
    rc = ""
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE,
                              child.CDATA_SECTION_NODE):
            rc = rc + child.data
    return rc

def get_key():
    d = ["0","1"]
    word = parse("temp.xml")
    d[0] = getTagText(word,"def")  # meaning of the word
    d[1] = getTagText(word,"rel")  # entries related to the word
    return d
value = {"q":"good"}
print logo
while 1:
    value["q"] = raw_input("\nEnter the word to look up "
                       "(use the UP/DOWN keys to browse query history): ")
    if value["q"].strip() == "":
        continue
    d = post("dict.cn","/ws.php",value)
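    # Convert the GBK reply to UTF-8 and skip its first 38 bytes
    # (presumably the XML declaration that names the original encoding).
    dat = unicode(d,"gbk").encode("utf-8")[38:]
    f = open('temp.xml',"w")
    f.write(dat)
    f.close()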
    result = get_key()
    
    print "\n%s :\n %s \n" %(value["q"],result[0])
    if result[1] == "no":
        continue
    print "Related entries:  %s \n" %result[1]
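
The listing above targets Python 2. On Python 3 the same calls live in `urllib.parse` and `http.client`, so the request step might be sketched as follows. This is only an illustration under that assumption: `build_body` and the `timeout` parameter are names introduced for the example, and the sketch cannot promise that dict.cn still serves /ws.php. Nothing touches the network unless `post()` is actually called.

```python
import http.client
from urllib.parse import urlencode


def build_body(word):
    """URL-encode the form field q, as the POST body expects."""
    return urlencode({"q": word.strip()})


def post(website, path, word, timeout=10):
    """Send the POST request and return the raw response bytes."""
    headers = {
        "Accept": "text/html",
        "User-Agent": "IE",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    conn = http.client.HTTPConnection(website, timeout=timeout)
    try:
        conn.request("POST", path, build_body(word), headers)
        return conn.getresponse().read()
    finally:
        # Close the connection even if the request fails.
        conn.close()


print(build_body("circuit"))  # prints q=circuit
```

A call would then look like `post("dict.cn", "/ws.php", "circuit")`, mirroring the arguments used in the main loop.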

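Finally, one thing worth mentioning: Python's raw_input() is very convenient. Like bash on Linux, it lets you recall earlier input with the Up and Down keys, so to look up the previous word again you only need to press the Up key. (The history itself is provided by the console or, where it has been loaded, by the readline module.)
Here is the output of a run of the program: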
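
Where the Up/Down history does not work out of the box, loading the readline module is the usual way to get it. The sketch below is a Python 3 illustration under that assumption, with `input()` standing in for `raw_input()`; `preload_history` and `ask_word` are hypothetical helper names, and on platforms without readline the code simply falls back to plain input.

```python
try:
    import readline  # once loaded, input() uses it for line editing and history
except ImportError:  # readline is not available on every platform
    readline = None


def preload_history(words):
    """Seed the Up/Down history with earlier lookups; return how many were added."""
    if readline is None:
        return 0
    count = 0
    for word in words:
        readline.add_history(word)
        count += 1
    return count


def ask_word(prompt="Enter the word to look up: "):
    """Read one word from the terminal."""
    return input(prompt).strip()


added = preload_history(["circuit", "breaker"])
```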