Compressing a Front-End Project with Python

斯剋的苏

Last modified: 2022-07-15 12:02:21


First published: 2022-07-13 12:57:16

Article link: https://blog.csdn.net/qq_44924355/article/details/125761446

Compressing a front-end project involves two steps:

Step 1: Remove comments from HTML/JavaScript/CSS

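This step uses regular expressions to match each type of comment.
HTML comments have the form
<!-- comment content -->
and are matched in the full code below with the pattern <!-[\s\S]*?-->
An HTML file can also carry JavaScript inside its script tags, so the JavaScript patterns are applied to HTML files as well.
JavaScript and CSS comments take the following forms (CSS only has the first one):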

/* comment content */

matched with the pattern /\*{1,2}[\s\S]*?\*/

// comment content

matched with the pattern //[ \S]*?[^>\n](?=\n)

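The pattern for // comments differs from the other two in two ways:
① The comment body is matched with a literal space plus \S instead of \s. A // comment is a single-line comment and the newline marks its end, whereas \s would also match the newline.
② The character just before the closing newline must not be >. This covers the case where a comment has been added inside an HTML <%include%> tag (which is nonstandard in practice): deleting that comment would also delete the closing >, and the page would fail with an error.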
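
As a rough illustration of the two points above, the sketch below applies the same three patterns that appear in the full listing, but with `re.sub` instead of split-and-join. The function name `strip_comments` and the sample text are made up for this example.

```python
import re

# The three comment patterns from the full listing.
HTML_COMMENT = re.compile(r"<!-[\s\S]*?-->")
BLOCK_COMMENT = re.compile(r"/\*{1,2}[\s\S]*?\*/")
# Line comment: a literal space plus \S keeps the match on one line, and the
# last character before the newline must not be '>'.
LINE_COMMENT = re.compile(r"//[ \S]*?[^>\n](?=\n)")

def strip_comments(text):
    """Remove HTML, block and line comments from text (hypothetical helper)."""
    for pattern in (HTML_COMMENT, BLOCK_COMMENT, LINE_COMMENT):
        text = pattern.sub("", text)
    return text

sample = (
    "<div>keep</div><!-- drop me -->\n"
    "a { color: red; } /* drop me too */\n"
    "var x = 1; // trailing note\n"
    "<%include file='a.html' // odd note %>\n"
)
cleaned = strip_comments(sample)
print(cleaned)
```

The last sample line is left untouched because its final character before the newline is `>`. One caveat of a pattern like this: a `//` inside a string literal, such as the one in "http://example.com", is matched as well, so it is worth checking the output before deploying it.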

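The complete procedure is:
First, recursively collect the paths of all files under the project folder whose extension is js, html or css. Read each file into a string, split the string with the regular expressions above and join the pieces back together, then write the result to the output directory.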
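
The procedure just described could be sketched with `pathlib` roughly as follows. This is a simplified variant rather than the code from the full listing: `strip_tree` and `TARGET_SUFFIXES` are names chosen for the example, only the block-comment pattern is applied, and every file is assumed to be UTF-8.

```python
import re
from pathlib import Path

TARGET_SUFFIXES = {".html", ".css", ".js"}
# Only one pattern is used here to keep the sketch short.
BLOCK_COMMENT = re.compile(r"/\*[\s\S]*?\*/")

def strip_tree(src_root, dst_root):
    """Write a comment-stripped copy of every html/css/js file under src_root."""
    written = []
    for src in sorted(Path(src_root).rglob("*")):
        if not src.is_file() or src.suffix.lower() not in TARGET_SUFFIXES:
            continue
        # Mirror the relative path under the output directory.
        dst = Path(dst_root) / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8")
        dst.write_text(BLOCK_COMMENT.sub("", text), encoding="utf-8")
        written.append(dst)
    return written
```

Calling `strip_tree("./resources", "./dist")` would mirror the directory layout used in this article.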

Step 2: Delete unreferenced images from the project

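After many rounds of changes, a front-end project may contain many images that are no longer referenced, and these should be cleaned up promptly. In practice nothing is deleted: the images referenced from HTML/CSS are located and copied to the output directory, so the unreferenced ones are simply left behind.
Image references follow a fairly fixed format, so this step can locate them with a regular expression.
References appear in two places. One is the src attribute of the HTML <img> tag.
The other is the CSS url() function. The image path inside url() may be wrapped in single quotes, double quotes, or no quotes at all, so the parentheses must be accepted as delimiters too.
The regular expression designed for this is the one assigned to img_ref in the full code below.
In it, ${ctxStatic} is a template placeholder for the static resource root (a server-side template expression rather than jQuery syntax). Before an image is actually copied, the placeholder must be resolved to its real value; the code maps it to the static directory under the input path.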
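
To make the reference matching concrete, here is a minimal sketch. The pattern follows the idea of `img_ref` but is not identical to it: it captures the path without its delimiters, uses a lazy `\S+?`, and also accepts `gif`. The helper `find_image_refs` and the default `static_root` value are assumptions for the example.

```python
import re

# Opening quote or parenthesis, optional ${ctxStatic} or a lowercase prefix,
# a path separator, the rest of the path, an image extension, closing delimiter.
IMG_REF = re.compile(
    r"""['"(]((?:\$\{ctxStatic\}|[a-z]*)[\\/]\S+?\.(?:png|jpg|jpeg|svg|gif|ico))['")]"""
)

def find_image_refs(text, static_root="/static"):
    """Return referenced image paths, with ${ctxStatic} replaced by static_root."""
    return [ref.replace("${ctxStatic}", static_root) for ref in IMG_REF.findall(text)]

sample = (
    '<img src="${ctxStatic}/images/logo.png">\n'
    ".banner { background: url(images/bg.jpg); }\n"
    ".icon { background: url('images/i.svg'); }\n"
)
print(find_image_refs(sample))
# → ['/static/images/logo.png', 'images/bg.jpg', 'images/i.svg']
```

The three sample lines cover the double-quoted, unquoted and single-quoted forms mentioned above.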

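In addition, deleting images that belong to third-party plugins used by the project is very likely to cause errors, so a protection mechanism is provided for them: while the project files are read recursively, everything under a protected plugin folder is copied to the output directory unchanged. This completes the compression of the front-end project.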
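
One possible way to implement such a protection step is sketched below with `shutil.copytree` (the `dirs_exist_ok` argument needs Python 3.8 or later). It differs from the full listing in two ways that should be read as assumptions: the protected entries are taken relative to the project root rather than as full path prefixes, and whole folders are copied in one call. The helper names `load_protected` and `copy_protected` are invented for this example.

```python
import shutil
from pathlib import Path

def load_protected(list_file):
    """Read one protected directory per line, skipping blank lines."""
    list_file = Path(list_file)
    if not list_file.exists():
        return []
    lines = list_file.read_text(encoding="utf-8").splitlines()
    return [Path(line.strip()) for line in lines if line.strip()]

def copy_protected(src_root, dst_root, protected):
    """Copy each protected directory (relative to src_root) to dst_root unchanged."""
    copied = []
    for rel in protected:
        src = Path(src_root) / rel
        if src.is_dir():
            # copytree creates missing parent directories of the destination.
            shutil.copytree(src, Path(dst_root) / rel, dirs_exist_ok=True)
            copied.append(rel)
    return copied
```

Skipping blank lines matters: an empty entry used as a path prefix would match every file in the project.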

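The complete code follows. To use it, create a folder named resources and put the project inside it. Then create protected_dict.txt and list the plugin directories that should be left untouched, one per line, written as path prefixes such as ./resources/static/js/. Run the script; the output directory is ./dist.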

from os import listdir,path,mkdir
from re import split,findall
from shutil import copy
from tqdm import tqdm
import sys

class CompressProject(object):
    def __init__(self):
        self.input_path='./resources'
        self.out_path='./dist'
        self.target_file=['html','css','js']
        self.img_file=['svg','png','jpg','jpeg','gif','ico']
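        # Load the protected plugin directories, one path prefix per line
        if path.exists('protected_dict.txt'):
            with open('protected_dict.txt','r') as f:
                # Skip blank lines: an empty prefix would match every file in startswith()
                self.protected_path=[line.strip() for line in f.read().split('\n') if line.strip()]
        else:
            self.protected_path=[]

        #self.protected_path=['./resources/static/js/','./resources/static/emage/emagemodel/js','./resources/static/common/iconfont']
        # Create the output directory if it does not exist yet
        if not path.exists(self.out_path):
            mkdir(self.out_path)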

    def get_all_file(self,dir_path):
        """
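        Collect the full paths of all files under dir_path whose extension is in target_file or img_file, and copy every file under protected_path to out_path.
        :param dir_path: project path
        :return: all_img_path, all_file_path: full paths of the files outside protected_path whose extension is in img_file and target_file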
        """
        all_file_path=[]
        all_img_path=[]
        for file in listdir(dir_path):
            # If it is a directory, mirror it under out_path and recurse into it
            if path.isdir(dir_path+'/'+file):
                new_path=self.out_path+'/'+'/'.join(dir_path.split('/')[2:])+'/'+file
                if not path.exists(new_path):
                    mkdir(new_path)
                temp_img_path,temp_file_path=self.get_all_file(dir_path+'/'+file)
                all_file_path=all_file_path+temp_file_path
                all_img_path=all_img_path+temp_img_path
            # If it is a file, decide what to do based on its extension
            elif path.isfile(path.join(dir_path,file)):
                file_path = dir_path + '/' + file
                new_path = self.out_path + '/' + '/'.join(file_path.split('/')[2:])
                # A file is protected if its path starts with any protected prefix
                protected_flag = any(file_path.startswith(ppath) for ppath in self.protected_path)
                # Protected files are copied unchanged
                if protected_flag:
                    copy(file_path, new_path)
                # Files whose extension is in target_file are recorded and copied
                elif file.split('.')[-1] in self.target_file:
                    all_file_path.append(file_path)
                    copy(file_path, new_path)
                # Files whose extension is in img_file are only recorded for now
                elif file.split('.')[-1] in self.img_file:
                    all_img_path.append(file_path)

        return all_img_path,all_file_path

    def del_annotation_and_nt(self,path):
        '''
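        Remove the three kinds of comments (/* comment */, // comment, <!-- comment -->) and tab characters from the file at path, then find the images it references, written as "path/test.svg", 'path/test.jpg' or (path/test.png).
        :param path: path of the html/css/js file to process
        :return: copy_img, used_img: the images copied successfully and the images referenced by the file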
        '''
        used_img = []  # images referenced by the file at path
        copy_img = []  # images copied successfully
        write_path = self.out_path + '/'+'/'.join(path.split('/')[2:])
        # Read the file as UTF-8, falling back to GB2312
        try:
            with open(path, 'r', encoding='utf-8') as f:
                data=f.read()
        except UnicodeDecodeError:
            print(path,'read as gb2312')
            with open(path, 'r', encoding='gb2312') as f:
                data=f.read()
        re_format=[r"<!-[\s\S]*?-->",r"/\*{1,2}[\s\S]*?\*/",r"//[ \S]*?[^>\n](?=\n)",r"\t"]
        for ref in re_format:
            data=split(ref,data)
            data=''.join(data)

        # Write the comment-free file as UTF-8, falling back to GB2312
        try:
            with open(write_path, 'w',encoding='utf-8') as f:
                f.write(data)
        except UnicodeEncodeError:
            print(path,'write as gb2312')
            with open(write_path, 'w',encoding='gb2312') as f:
                f.write(data)

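        # Find the referenced image paths. Inside a character class '|' is a literal,
        # so it is dropped here; gif is added to match img_file.
        img_ref=r'[\'"(](?:\$\{ctxStatic\}|[a-z]*)[\\/]\S+\.(?:png|jpg|svg|jpeg|gif|ico)[\'")]'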
        img_path_list=findall(img_ref,data)
        for img_path in img_path_list:
            if img_path[1]=='$' or img_path[1:].startswith('static'):
                temp_list=split(r'[\\/]',img_path[1:-1])[1:]
                complete_path=self.input_path+'/static/'+'/'.join(temp_list)
            else:
                complete_path = '/'.join(path.split('/')[:-1]) + '/' + img_path[1:-1]
            out_img_path=self.out_path+'/'+'/'.join(complete_path.split('/')[2:])

            # Record the reference, whether it is closed by a quote or a parenthesis
            if img_path.endswith(('"', "'", ')')):
                used_img.append(complete_path)
            # Copy the referenced image to out_path
            try:
                copy(complete_path,out_img_path)
                copy_img.append(complete_path)
            except OSError:
                # The referenced image was not found (or its output folder is missing); skip it
                #print('no such img!',img_path,complete_path)
                pass
        return copy_img,used_img

    def process(self):
        print('get all file from path...')
        all_img,filenames=self.get_all_file(self.input_path)
        print('get all file from path successfully!')
        print('all img/all target: {}/{}'.format(len(all_img), len(filenames)))

        not_found_img=[]
        copy_img=[]
        used_img = []
        bar=tqdm(total=len(filenames))
        for idx,temp_path in enumerate(filenames):
            #print(path)
            temp_copy_img,temp_used_img=self.del_annotation_and_nt(temp_path)
            used_img=used_img+temp_used_img
            copy_img=copy_img+temp_copy_img
            bar.update(1)
        bar.close()
        print('copy img/used img/unprotected img: {}/{}/{}'.format(len(copy_img),len(used_img),len(all_img)))

if __name__=='__main__':
    com=CompressProject()
    com.process()
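
The script above prints only counts. If a list of the images that were left behind is wanted, a small helper along these lines could compare the two lists that process() already builds. `unused_images` is a hypothetical addition, not part of the original code, and it normalizes paths textually, so it assumes no symbolic links are involved.

```python
import posixpath

def unused_images(all_img, copied_img):
    """Return the normalized paths of images found on disk but never copied."""
    def norm(p):
        # Collapse './' and '..' segments so equivalent paths compare equal.
        return posixpath.normpath(p.replace("\\", "/"))
    copied = {norm(p) for p in copied_img}
    return sorted(norm(p) for p in all_img if norm(p) not in copied)

all_img = ["./resources/static/img/a.png", "./resources/static/img/b.png"]
copied_img = ["./resources/static/css/../img/a.png"]
print(unused_images(all_img, copied_img))
# → ['resources/static/img/b.png']
```

Inside process(), it could be called as `unused_images(all_img, copy_img)` after the loop finishes.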