A Simple Python Script

laxsong

Article link: https://blog.csdn.net/laxsong/article/details/915363

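Writing code in Python is a pleasure, and what made me even happier is that today I got to spend a whole working day practicing it. Of course, it was all in service of our program.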

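We are currently converting a set of static HTML pages into a portal. Because there are so many static pages, we chose to rewrite them dynamically. Since this post is not meant to introduce our project, I'll go straight to my script. Given how our work is set up, before doing anything else we need to run a simple tag analysis over all of the static pages. Here I mainly analyze TABLE, TR and TD, checking that every opening tag is closed in the right order.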
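The check itself boils down to a stack: push each opening tag, and when a closing tag appears it must match the tag on top of the stack. Below is a minimal sketch of that idea in Python 3, assuming comments have already been removed and the whole page is available as one string. `check_nesting` and `TAG_RE` are hypothetical names, and the sketch uses a regular expression instead of the `str.find` calls used in the files below.

```python
import re

# An opening or closing table/tr/td tag, e.g. "<td", "<TR ", "</table>".
# \b stops "<tr" from matching the start of "<track".
TAG_RE = re.compile(r"<(/?)(table|tr|td)\b", re.IGNORECASE)


def check_nesting(html):
    """Return None if table/tr/td tags nest correctly, else a short error message."""
    stack = []
    for match in TAG_RE.finditer(html):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_close:
            stack.append(name)  # opening tag: remember it
        elif stack and stack[-1] == name:
            stack.pop()  # closing tag matches the innermost open tag
        else:
            return f"unexpected </{name}> at offset {match.start()}, open: {stack}"
    if stack:
        return f"not closed: {stack}"
    return None
```

For example, `check_nesting("<table><tr><td>a</tr></table>")` reports the unexpected `</tr>` while `<td>` is still open, and a page that ends with unclosed tags is reported by the final `not closed` branch.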

Here is my code. It mainly consists of two files:

htmlParser.py

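import util


def htmlParse(htmlname, errorfile):
    """Check TABLE/TR/TD nesting in one HTML file; append any problem to the list errorfile."""
    stackList = []   # currently open tags, innermost last
    total = 0        # current line number
    incomment = 0    # 1 while inside a <!-- ... --> comment that spans several lines
    # the pages' encoding is not known; tag names are ASCII either way
    with open(htmlname, errors='replace') as htmlfile:
        strLines = htmlfile.readlines()
    for line in strLines:
        total += 1

        comtmp = util.judgeComment(line)
        if comtmp == 1:    # <!-- ... --> on this line: cut the comment out
            pos1 = line.find('<!--')
            pos2 = line.find('-->') + len('-->')
            line = line[0:pos1] + line[pos2:]
        elif comtmp == 2:  # only <!--: keep the text before it, then enter the comment
            line = line[0:line.find('<!--')]
            incomment = 1
        elif comtmp == 3:  # only -->: leave the comment, keep the text after it
            incomment = 0
            pos = line.find('-->') + len('-->')
            line = line[pos:]
        elif incomment == 1:  # this whole line is inside a comment
            continue

        taglist = util.getLineTagList(line)
        for item in taglist:
            res = util.addDelTag(item, stackList)
            if res == -1:
                # report file name, line number and the tags still open, then stop
                errorinfo = htmlname + ' line ' + str(total) + ' ' + str(stackList) + '\n'
                errorfile.append(errorinfo)
                return

    if len(stackList) != 0:
        result = htmlname + str(stackList) + ' are not closed!\n'
        #errorfile.append(result)


if __name__ == '__main__':
    pattern = "*.html"
    startdir = "F://sshome"
    #startdir = "D://test"
    files = util.find(pattern, startdir)
    res = []
    for filename in files:
        htmlParse(filename, res)
    res.append(str(len(res)))   # last line of the report: number of pages with errors
    with open("F://errorPages.txt", 'w') as filewrite:
        filewrite.writelines(res)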
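Matching tags with `str.find` line by line takes care with comments that span several lines, upper-case tags such as `<TD>`, and names like `<track>` that begin with `<tr`. As an alternative, and not what the script above does, Python 3's standard `html.parser.HTMLParser` already splits out tags, lower-cases their names and routes comments to `handle_comment`, so the same stack check can be sketched on top of it. `TableTagChecker`, `check_file_text` and `check_tree` are hypothetical names, and the `utf-8` default for reading pages is an assumption; the real pages may use another encoding.

```python
import fnmatch
import os
from html.parser import HTMLParser

WATCHED = {"table", "tr", "td"}


class TableTagChecker(HTMLParser):
    """Stack check for table/tr/td; comments go to handle_comment and are ignored."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.error = None  # first problem found, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lower-cased
        if tag in WATCHED and self.error is None:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in WATCHED or self.error is not None:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            line, _offset = self.getpos()  # position of this end tag
            self.error = f"line {line}: unexpected </{tag}>, open: {self.stack}"


def check_file_text(text):
    """Return None if the page is fine, else a one-line description of the problem."""
    checker = TableTagChecker()
    checker.feed(text)
    checker.close()
    if checker.error:
        return checker.error
    if checker.stack:
        return f"not closed: {checker.stack}"
    return None


def check_tree(startdir, pattern="*.html", encoding="utf-8"):
    """Check every file under startdir matching pattern; return (path, message) pairs."""
    problems = []
    for thisdir, _dirs, names in os.walk(startdir):
        for name in sorted(fnmatch.filter(names, pattern)):
            path = os.path.join(thisdir, name)
            # errors="replace": a wrong encoding guess must not abort the scan
            with open(path, encoding=encoding, errors="replace") as f:
                message = check_file_text(f.read())
            if message:
                problems.append((path, message))
    return problems
```

Calling `check_tree("F://sshome")` would return one `(path, message)` pair per problem page, which could be written out the same way as `errorPages.txt`.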

The other file, util.py:

import os
import fnmatch


# classify a line by the HTML comment markers it contains:
# 1 = <!-- and -->, 2 = only <!--, 3 = only -->, 4 = neither
def judgeComment(line):
    openTag = line.find('<!--')
    closeTag = line.find('-->')
    if openTag != -1:
        if closeTag != -1:  # <!-- ... -->
            return 1
        else:               # <!-- only
            return 2
    else:
        if closeTag != -1:  # --> only
            return 3
        else:               # no comment marker
            return 4


# sort a 2-dimensional list ([position, tag] pairs) in place by its first column
def sortFor2di(listtosort):
    listtosort.sort(key=lambda item: item[0])


# get all table/tr/td tags in a line as a list of [position, tag], ordered by position
def getLineTagList(line):
    taglist = []
    addTag2List(line, 'table', taglist)
    addTag2List(line, 'tr', taglist)
    addTag2List(line, 'td', taglist)
    sortFor2di(taglist)
    return taglist


# append every <tag ...> and </tag> in the line, not just the first one, to taglist
def addTag2List(line, tag, taglist):
    lower = line.lower()  # also catch <TABLE>, <Td> ...
    opentext = '<' + tag
    pos = lower.find(opentext)
    while pos != -1:
        nextchar = lower[pos + len(opentext):pos + len(opentext) + 1]
        # skip longer tag names that merely start with the same letters, e.g. <track>
        if nextchar in ('', '>') or nextchar.isspace():
            taglist.append([pos, '<' + tag + '>'])
        pos = lower.find(opentext, pos + 1)
    closetext = '</' + tag + '>'
    pos = lower.find(closetext)
    while pos != -1:
        taglist.append([pos, closetext])
        pos = lower.find(closetext, pos + 1)


# check one [position, tag] item against the stack; -1 means a nesting error
def addDelTag(itemlist, stackList):
    tag = itemlist[1]
    res = 0
    res += judgeWhichTag(tag, 'table', stackList)
    res += judgeWhichTag(tag, 'tr', stackList)
    res += judgeWhichTag(tag, 'td', stackList)
    if res != 0:
        return -1
    else:
        return 1


# push an opening tag; pop a closing tag if it matches the top of the stack, else -1
def judgeWhichTag(tag, label, stackList):
    if tag == '<' + label + '>':
        stackList.append(label)
        return 0
    elif tag == '</' + label + '>':
        size = len(stackList)
        if size < 1:
            return -1
        elif stackList[size - 1] == label:
            del stackList[size - 1]
            return 0
        else:
            return -1
    else:
        return 0


# alternative per-line version of the check (not called by htmlParser.py)
def tagDeal(tag, line, stackList):
    openTag = line.find('<' + tag)
    closeTag = line.find('</' + tag + '>')
    if openTag != -1:
        stackList.append(tag)
        if closeTag == -1:
            return 1
    if closeTag != -1:
        size = len(stackList)
        if size < 1:
            return -1
        else:
            lastItem = stackList[size - 1]
            if lastItem != tag:
                return -1
            else:
                del stackList[size - 1]
                return 1
    return 1


# return the sorted paths of all files under startdir whose names match pattern
def find(pattern, startdir=os.curdir):
    files = []
    for thisdir, dirs, names in os.walk(startdir):
        visitor((pattern, files), thisdir, names)
    files.sort()
    return files


def visitor(arg, thisdir, names):
    pattern, files = arg
    for name in names:
        if fnmatch.fnmatch(name, pattern):
            fullpath = os.path.join(thisdir, name)
            files.append(fullpath)
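The fiddliest part of the script is skipping HTML comments, which `judgeComment` and the `incomment` flag in `htmlParse` handle one line at a time. Here is the same idea as a small generator, a sketch that assumes comments are never nested; unlike the line classification above, it also copes with several comments on one line. `strip_comments` is a hypothetical name.

```python
def strip_comments(lines):
    """Yield each line with <!-- ... --> removed, even when a comment spans lines.

    Assumes comments are not nested (HTML comments cannot nest).
    """
    in_comment = False  # carried from one line to the next, like incomment
    for line in lines:
        kept = []
        pos = 0
        while pos < len(line):
            if in_comment:
                end = line.find("-->", pos)
                if end == -1:
                    break  # the rest of this line is inside the comment
                pos = end + len("-->")
                in_comment = False
            else:
                start = line.find("<!--", pos)
                if start == -1:
                    kept.append(line[pos:])  # no more comments on this line
                    break
                kept.append(line[pos:start])  # text before the comment survives
                pos = start + len("<!--")
                in_comment = True
        yield "".join(kept)
```

Feeding its output to the tag check means tags inside a comment, even one spanning several lines, are never counted.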

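A disclaimer: I'm a Python beginner. The program above is pretty messy; when I have time I'll tidy it up or add some comments. Of course, suggestions from anyone are very welcome.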

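In the end, though, 200 of our 1,000 static pages turned out to have errors in these three tags. That means there is a whole pile of work to deal with, and we haven't decided yet how to do it.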