File Compression

How the Compression Works

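The compressor builds a Huffman tree from the number of times each character occurs and derives a Huffman code from it. Characters that occur often sit near the top of the tree, and characters that occur rarely sit further down. Rare characters therefore get long codes, but frequent characters are close to the root and get short codes. Every character is stored as its string of 0 and 1 bits instead of as a full 8-bit byte, which saves a large amount of space overall. Decompression rebuilds the same Huffman tree from the character counts stored in the compressed file and uses it to restore the original data losslessly.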
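To make the idea concrete, here is a minimal, self-contained C++ sketch. It is not the post's code: the names `SketchNode`, `LighterOnTop`, `collectCodes`, `buildCodes` and `encodedBits` are hypothetical. It builds Huffman codes from a table of character counts with a min-heap, then computes the encoded size as the sum of count × code length.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for this sketch: leaves carry a character, internal nodes only a weight.
struct SketchNode {
    long long weight;
    char ch;
    SketchNode* left;
    SketchNode* right;
};

// Comparator returning a > b: std::priority_queue then keeps the LIGHTEST node on top.
struct LighterOnTop {
    bool operator()(const SketchNode* a, const SketchNode* b) const {
        return a->weight > b->weight;
    }
};

// A left edge appends '0' and a right edge appends '1'; the path to a leaf is its code.
void collectCodes(const SketchNode* n, const std::string& prefix,
                  std::map<char, std::string>& codes) {
    if (n->left == nullptr && n->right == nullptr) {
        codes[n->ch] = prefix.empty() ? "0" : prefix;  // lone leaf: "0" (this sketch's choice)
        return;
    }
    collectCodes(n->left, prefix + '0', codes);
    collectCodes(n->right, prefix + '1', codes);
}

// Builds a Huffman code for every character with a non-zero count.
std::map<char, std::string> buildCodes(const std::map<char, long long>& counts) {
    std::vector<std::unique_ptr<SketchNode>> pool;  // owns all nodes until we return
    std::priority_queue<SketchNode*, std::vector<SketchNode*>, LighterOnTop> heap;
    for (const auto& kv : counts) {
        if (kv.second == 0) continue;  // absent characters get no leaf
        pool.push_back(std::make_unique<SketchNode>(SketchNode{kv.second, kv.first, nullptr, nullptr}));
        heap.push(pool.back().get());
    }
    // Merge the two lightest nodes until only the root remains.
    while (heap.size() > 1) {
        SketchNode* a = heap.top(); heap.pop();
        SketchNode* b = heap.top(); heap.pop();
        pool.push_back(std::make_unique<SketchNode>(SketchNode{a->weight + b->weight, '\0', a, b}));
        heap.push(pool.back().get());
    }
    std::map<char, std::string> codes;
    if (!heap.empty()) collectCodes(heap.top(), "", codes);
    return codes;
}

// Encoded size in bits: sum over all characters of count * code length.
long long encodedBits(const std::map<char, long long>& counts,
                      const std::map<char, std::string>& codes) {
    long long bits = 0;
    for (const auto& kv : counts) {
        if (kv.second == 0) continue;
        auto it = codes.find(kv.first);
        assert(it != codes.end());  // every counted character must have a code
        bits += kv.second * static_cast<long long>(it->second.size());
    }
    return bits;
}
```

Take the counts a:45, b:13, c:12, d:16, e:9, f:5 (100 characters in total). Here 'a' receives a 1-bit code, while 'e' and 'f' receive 4-bit codes. The encoded data takes 224 bits instead of the 800 bits needed at 8 bits per character.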

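Compression and decompression steps:

  1. Preparation:
  Define a struct CharInfo that stores a character, its occurrence count and its Huffman code.
  Create an array of structs, HashInfos, that holds one CharInfo per possible byte value (256 entries).
  Create the class HuffmanTree, which builds the tree from the count values of the CharInfo entries in HashInfos using a min-heap (a priority queue whose top is the smallest count).
  2. Compression:
  Read the file character by character, accumulate each character's count in HashInfos, then build the tree and generate the Huffman codes.
  For every character that occurs, write a CacheInfo struct (character and count) to the compressed file, so that decompression can rebuild the Huffman tree.
  Go back to the start of the input file, read it again, and write each character's Huffman code to the compressed file as packed bits.
  3. Decompression:
  Read the CacheInfo records from the compressed file back into the HashInfos array and rebuild the Huffman tree. Then read the packed bits and walk the tree to recover each character.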
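Steps 2 and 3 produce a header of (character, count) records that ends with a count-0 record, followed by the packed code bits. The header part can be sketched as below. This is an illustrative sketch that assumes the same raw-struct layout as the post's `CacheInfo`; `HeaderRecord`, `writeHeader` and `readHeader` are hypothetical names. Because the struct is written byte for byte, the header can only be read on a machine with the same `char`/`long long` sizes, padding and byte order.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Same shape as the post's CacheInfo (renamed here to mark it as a sketch).
struct HeaderRecord {
    char ch;
    long long count;
};

// Step 2: one record per byte value that occurs, then a record with count == 0 as terminator.
void writeHeader(std::ostream& out, const std::array<long long, 256>& counts) {
    for (int i = 0; i < 256; ++i) {
        if (counts[i] > 0) {
            HeaderRecord r{static_cast<char>(i), counts[i]};
            out.write(reinterpret_cast<const char*>(&r), sizeof r);
        }
    }
    HeaderRecord end{0, 0};
    out.write(reinterpret_cast<const char*>(&end), sizeof end);
}

// Step 3: read records until the terminator. Returns false on truncated input
// instead of looping over uninitialized data.
bool readHeader(std::istream& in, std::array<long long, 256>& counts) {
    counts.fill(0);
    for (;;) {
        HeaderRecord r{};
        if (!in.read(reinterpret_cast<char*>(&r), sizeof r)) return false;
        if (r.count == 0) return true;
        counts[static_cast<unsigned char>(r.ch)] = r.count;  // map back to 0..255
    }
}
```

`readHeader` reports failure when the input ends before the terminator. The post's read loop does not check for that case.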
 
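 Note: compression works on individual characters (bytes) only. What is stored and counted are characters, and the Huffman tree is built from their occurrence counts.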
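Counting per character has one pitfall. In binary mode every byte of the file is read as a `char`, and bytes ≥ 0x80 (for example the UTF-8 bytes of Chinese text) are negative where `char` is signed, as with MSVC and GCC on x86. Indexing a 256-entry table with such a value reads outside the array, so the byte should first be converted to `unsigned char`. A minimal sketch follows; the function name `countBytes` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>

// Counts byte occurrences: convert to unsigned char first so that bytes >= 0x80
// land in slots 128..255 instead of producing a negative index.
std::array<long long, 256> countBytes(const std::string& data) {
    std::array<long long, 256> counts{};
    for (char ch : data) {
        ++counts[static_cast<unsigned char>(ch)];
    }
    return counts;
}
```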
 

Code

Huffman.h
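#pragma once
// Huffman.h must not include FileCompress.h: the two headers would include each other
#include <cstddef>   // NULL
#include <queue>     // std::priority_queue
#include <vector>    // std::vector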

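template<class W>
struct HuffmanTreeNode
{
    HuffmanTreeNode<W>* _left;
    HuffmanTreeNode<W>* _right;
    HuffmanTreeNode<W>* _parent;
    W _weight;   // for file compression, W is CharInfo (character + count)

    // Initializers listed in declaration order (avoids -Wreorder warnings)
    HuffmanTreeNode(const W& weight)
        : _left(NULL)
        , _right(NULL)
        , _parent(NULL)
        , _weight(weight)
    {}
};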


template<class W>
class HuffmanTree
{
    typedef HuffmanTreeNode<W> HTNode;
public:
    HuffmanTree()
        :_root(NULL)
    {}

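    // Comparator for std::priority_queue: the queue keeps the element that compares
    // "largest" on top, so '>' puts the node with the SMALLEST weight on top (a min-heap)
    struct NodeCompare
    {
        bool operator()(const HTNode* l, const HTNode* r) const
        {
            return l->_weight > r->_weight;
        }
    };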

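    HuffmanTree(W* weight, size_t n, const W& flag)
        :_root(NULL)
    {
        // With NodeCompare, top() always returns the node with the smallest weight
        std::priority_queue<HTNode*, std::vector<HTNode*>, NodeCompare> minheap;

        // One leaf per valid entry; entries equal to flag (count 0) are skipped
        for (size_t i = 0; i < n; ++i)
        {
            if (weight[i] != flag)
                minheap.push(new HTNode(weight[i]));
        }

        // Take the two lightest nodes out of the queue, create a parent whose count is
        // their sum, push the parent back, and repeat until only the root remains
        while (minheap.size() > 1)
        {
            HTNode* left = minheap.top();
            minheap.pop();
            HTNode* right = minheap.top();
            minheap.pop();
            HTNode* parent = new HTNode(left->_weight + right->_weight);
            parent->_left = left;
            parent->_right = right;
            left->_parent = parent;
            right->_parent = parent;
            minheap.push(parent);
        }
        // An empty input file leaves the queue empty; calling top() would be undefined
        if (!minheap.empty())
            _root = minheap.top();
    }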


    HTNode* GetRoot()
    {
        return _root;
    }

    ~HuffmanTree()
    {
        // Post-order: delete the left and right subtrees first, then the node itself
        Destroy(_root);
        _root = NULL;
    }

    void Destroy(HTNode* root)
    {
        if (root == NULL)
        {
            return;
        }
        Destroy(root->_left);
        Destroy(root->_right);
        delete root;
    }

protected:
    HTNode* _root;

private:
    HuffmanTree(const HuffmanTree<W>& t);
    HuffmanTree<W>& operator=(const HuffmanTree<W>&t);
};
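Here is a quick check of what `NodeCompare` does. `std::priority_queue` keeps on top the element that is "largest" according to its comparator, so a comparator returning `l > r` gives a min-heap. The tree is therefore built by repeatedly merging the two lightest nodes. The sketch below uses bare `long long` weights in place of `HTNode*` and replays the constructor's merge loop. The names `GreaterWeight`, `drain` and `mergeSums` are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Same idea as NodeCompare: a "greater" comparison turns priority_queue into a min-heap.
struct GreaterWeight {
    bool operator()(long long l, long long r) const { return l > r; }
};

// Pops every element and returns them in the order the queue yields them.
std::vector<long long> drain(std::vector<long long> weights) {
    std::priority_queue<long long, std::vector<long long>, GreaterWeight> heap(
        GreaterWeight{}, std::move(weights));
    std::vector<long long> order;
    while (!heap.empty()) {
        order.push_back(heap.top());
        heap.pop();
    }
    return order;
}

// Mirrors the constructor's while loop: returns each new parent weight in creation order.
// The last value is the root weight, i.e. the sum of all counts.
std::vector<long long> mergeSums(std::vector<long long> weights) {
    std::priority_queue<long long, std::vector<long long>, GreaterWeight> heap(
        GreaterWeight{}, std::move(weights));
    std::vector<long long> parents;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        parents.push_back(a + b);
        heap.push(a + b);
    }
    return parents;
}
```

For the weights 45, 5, 16, 9, 12, 13, the queue yields them in ascending order. The merge loop creates parents of weight 14, 25, 30 and 55, and finally 100, which is the total count stored in the root.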
FileCompress.h
#pragma once

#include <algorithm>   // std::reverse
#include <cassert>
#include <fstream>
#include <string>
#include "Huffman.h"
using namespace std;


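struct CharInfo
{
    char ch;           // the character (one byte)
    long long count;   // number of occurrences
    string code;       // Huffman code as a string of '0'/'1'

    // Used when two nodes are merged: an internal node only needs the summed count
    CharInfo operator+(const CharInfo& info) const
    {
        CharInfo tmp;
        tmp.ch = 0;
        tmp.count = count + info.count;
        return tmp;
    }
    bool operator>(const CharInfo& info) const
    {
        return count > info.count;
    }

    // Compared against a "flag" entry with count 0 to skip characters that never occur
    bool operator!=(const CharInfo& info) const
    {
        return count != info.count;
    }
};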

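class FileCompress
{
    typedef HuffmanTreeNode<CharInfo> HTNode;
public:
    // Header record written to the compressed file: only the character and its count.
    // The code is not stored; it is regenerated from the counts when decompressing.
    struct CacheInfo
    {
        char ch;
        long long count;
    };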

    // Initialize every slot of the hash array: slot i describes byte value i
    FileCompress()
    {
        for (size_t i = 0; i < 256; ++i)
        {
            HashInfos[i].ch = (char)i;
            HashInfos[i].count = 0;
        }
    }


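    void Compress(const char* file)
    {
        // 1. Count how often each character occurs; ifstream opens the source file for reading.
        //    The stream is closed automatically. Note: by default ifstream does NOT throw when
        //    opening fails; it only sets failbit.
        ifstream ifs(file, ios_base::in | ios_base::binary);
        assert(ifs.is_open());
        char ch;
        while (ifs.get(ch))   // get() also reads whitespace, unlike ifs >> ch
        {
            // Cast to unsigned char: bytes >= 0x80 would give a negative index if char is signed
            ++HashInfos[(unsigned char)ch].count;
        }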

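        // 2. Build the Huffman tree

        // flag marks invalid entries: characters with count 0 get no leaf
        CharInfo flag;
        flag.count = 0;

        // Build the HuffmanTree from the values in HashInfos (a min-heap is used internally)
        HuffmanTree<CharInfo> tree(HashInfos, 256, flag);

        // 3. Derive the Huffman code (a string of 0s and 1s) for every leaf of the tree
        ConvertHuffmanCode(tree.GetRoot());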

        // 4. Compress: create <file>.huffman as the output file, written through ofs
        string compressfile = file;
        compressfile += ".huffman";
        ofstream ofs(compressfile.c_str(), ios_base::out | ios_base::binary);

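        // 5. Write each character that occurs together with its count, so that
        //    decompression can rebuild the same Huffman tree from this file
        for (size_t i = 0; i < 256; ++i)
        {
            if (HashInfos[i].count > 0)
            {
                CacheInfo info;   // CacheInfo, not CharInfo (which also contains a std::string)
                info.ch = HashInfos[i].ch;
                info.count = HashInfos[i].count;
                ofs.write((char*)&info, sizeof(CacheInfo));
            }
        }
        // A record with count 0 marks the end of the header
        CacheInfo end;
        end.ch = 0;
        end.count = 0;
        ofs.write((char*)&end, sizeof(CacheInfo));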

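        // Encode: append each character's Huffman code (from HashInfos) to value bit by bit,
        // and write value to the file whenever a full byte (8 bits) has been collected
        ifs.clear();    // clear the EOF state left by the first pass
        ifs.seekg(0);   // rewind to the start of the input file
        char value = 0;
        int pos = 0;
        while (ifs.get(ch))
        {
            const string& code = HashInfos[(unsigned char)ch].code;
            for (size_t i = 0; i < code.size(); ++i)
            {
                // Each code bit goes to bit position pos (least significant bit first)
                if (code[i] == '0')
                    value &= ~(1 << pos);
                else if (code[i] == '1')
                    value |= (1 << pos);
                else
                    assert(false);
                ++pos;

                // 8 bits collected: write the byte and start a new one
                if (pos == 8)
                {
                    ofs.put(value);
                    pos = 0;
                    value = 0;
                }
            }
        }
        // The last code bits may not fill a whole byte; write that partial byte too
        // (its unused high bits stay 0)
        if (pos > 0)
        {
            ofs.put(value);
        }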

    }


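    // Decompression
    void UnCompress(const char* file)
    {
        // Open the compressed file and derive the output name by removing the extension
        // (".huffman"); debug builds append ".unhuffman" so the original is not overwritten
        ifstream ifs(file, ios_base::in | ios_base::binary);
        assert(ifs.is_open());
        string uncompressfile = file;
        // rfind returns string::npos (the largest size_t value) if no '.' is found
        size_t pos = uncompressfile.rfind('.');
        assert(pos != string::npos);
        uncompressfile.erase(pos);

#ifdef _DEBUG
        uncompressfile += ".unhuffman";
#endif

        ofstream ofs(uncompressfile.c_str(), ios_base::out | ios_base::binary);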

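        // Read the header records into the matching HashInfos slots until the count-0 marker
        while (true)
        {
            CacheInfo info;
            if (!ifs.read((char*)&info, sizeof(CacheInfo)))
                return;   // truncated or invalid file: stop instead of using garbage values
            if (info.count > 0)
                HashInfos[(unsigned char)info.ch].count = info.count;
            else
                break;
        }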

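        CharInfo flag;
        flag.count = 0;
        // Rebuild the same Huffman tree from the counts in HashInfos
        HuffmanTree<CharInfo> tree(HashInfos, 256, flag);

        // Decode
        HTNode* root = tree.GetRoot();
        if (root == NULL)   // the original file was empty
            return;
        HTNode* cur = root;
        // The root's count is the total number of characters in the original file
        long long AllCount = root->_weight.count;

        // Only one distinct character: the root is a leaf and no code bits were written,
        // so output that character AllCount times
        if (root->_left == NULL && root->_right == NULL)
        {
            while (AllCount-- > 0)
                ofs.put(root->_weight.ch);
            return;
        }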
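        char ch;
        // Read one byte at a time and examine its bits from least to most significant:
        // a 0 bit moves to the left child, a 1 bit to the right child. After each step, if the
        // node is a leaf, write its character to the output file and return to the root.
        while (ifs.get(ch))
        {
            for (size_t i = 0; i < 8; ++i)
            {
                if ((ch & (1 << i)) != 0)
                    cur = cur->_right;
                else
                    cur = cur->_left;

                if (cur->_left == NULL
                    && cur->_right == NULL)
                {
                    ofs.put(cur->_weight.ch);
                    cur = root;
                    // Stop after AllCount characters: the padding bits of the last byte are not data
                    if (--AllCount == 0)
                        return;
                }
            }
        }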
    }


    // Generate the Huffman codes
    // Method 1: from each leaf, walk up to the root collecting bits, then reverse them
    void ConvertHuffmanCode(HTNode* root)
    {
        if (root == NULL)
            return;
        // Leaf found: build its code from the leaf upwards
        if (root->_left == NULL && root->_right == NULL)
        {
            string& code = HashInfos[(unsigned char)root->_weight.ch].code;
            HTNode* cur = root;
            HTNode* parent = cur->_parent;
            while (parent != NULL)
            {
                if (cur == parent->_left)
                    code += '0';
                else
                    code += '1';
                cur = parent;
                parent = parent->_parent;
            }
            // The bits were collected leaf-to-root; the code must read root-to-leaf
            reverse(code.begin(), code.end());
        }
        ConvertHuffmanCode(root->_left);
        ConvertHuffmanCode(root->_right);
    }
    // Method 2: pass the code down from the root; each child gets its parent's code
    // plus '0' (left) or '1' (right)
    /*void ConvertHuffmanCode(HTNode* root)
    {
        if (root == NULL)
            return;

        if (root->_left == NULL && root->_right == NULL)
        {
            HashInfos[(unsigned char)root->_weight.ch].code = root->_weight.code;
            return;
        }

        if (root->_left != NULL)
        {
            root->_left->_weight.code = root->_weight.code + '0';
            ConvertHuffmanCode(root->_left);
        }

        if (root->_right != NULL)
        {
            root->_right->_weight.code = root->_weight.code + '1';
            ConvertHuffmanCode(root->_right);
        }
    }*/

private:
    CharInfo HashInfos[256];
};

// Compress the test file first
inline void _FileCompress()
{
    FileCompress FC;
    FC.Compress("I have a dream.txt");
}

// Then decompress it
inline void _FileUnCompress()
{
    FileCompress FC;
    FC.UnCompress("I have a dream.txt.huffman");
}
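`Compress` and `UnCompress` share one bit layout: each code bit goes to bit `1 << pos` of the current byte, least significant bit first, and the last byte is zero-padded. This layout can be checked in isolation. The sketch below is a simplified stand-in. `packBits` and `unpackBits` are hypothetical names, and decoding matches the accumulated bits against the code table instead of walking the tree, which gives the same result for a prefix-free code. The sketch also shows why `UnCompress` counts down `AllCount`: the padding zeros can themselves form a valid code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Packs code bits into bytes as Compress() does: least significant bit first,
// with a final partial byte padded with zero bits.
std::vector<unsigned char> packBits(const std::string& text,
                                    const std::map<char, std::string>& codes) {
    std::vector<unsigned char> out;
    unsigned char value = 0;
    int pos = 0;
    for (char ch : text) {
        for (char bit : codes.at(ch)) {
            if (bit == '1') value |= static_cast<unsigned char>(1u << pos);
            if (++pos == 8) { out.push_back(value); value = 0; pos = 0; }
        }
    }
    if (pos > 0) out.push_back(value);  // flush the padded last byte
    return out;
}

// Reads the bits back in the same order and stops after `total` characters,
// like UnCompress() with AllCount, so padding bits are never decoded.
std::string unpackBits(const std::vector<unsigned char>& bytes,
                       const std::map<char, std::string>& codes, long long total) {
    std::map<std::string, char> byCode;  // reverse lookup: code -> character
    for (const auto& kv : codes) byCode[kv.second] = kv.first;
    std::string out, current;
    for (unsigned char byte : bytes) {
        for (int i = 0; i < 8 && static_cast<long long>(out.size()) < total; ++i) {
            current += (byte & (1u << i)) ? '1' : '0';
            auto it = byCode.find(current);
            if (it != byCode.end()) { out += it->second; current.clear(); }
        }
    }
    return out;
}
```

Take the codes a=0, b=10, c=11: the text "abcabc" packs into the two bytes 0x5A and 0x03. Decoding with the real total of 6 returns "abcabc". If the total were ignored, six spurious 'a' characters decoded from the padding bits would be appended.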
FileCompress.cpp
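// FileCompress.cpp: defines the entry point of the console application.


#include "FileCompress.h"
#include <chrono>
#include <iostream>
using namespace std;
int main()
{
    // Measure the total time for compression plus decompression in ms
    // (std::chrono is used instead of the Windows-only GetTickCount)
    auto start = chrono::steady_clock::now();
    _FileCompress();
    _FileUnCompress();
    auto end = chrono::steady_clock::now();
    cout << chrono::duration_cast<chrono::milliseconds>(end - start).count() << endl;
    return 0;
}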

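The text of Martin Luther King Jr.'s speech "I Have a Dream" is included below and can be used for testing (save it as "I have a dream.txt").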

Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of bad captivity.
But one hundred years later, the Negro still is not free. One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. One hundred years later, the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the Negro is still languished in the corners of American society and finds himself an exile in his own land. So we’ve come here today to dramatize a shameful condition.
I am not unmindful that some of you have come here out of great trials and tribulations. Some of you have come fresh from narrow jail cells. Some of you have come from areas where your quest for freedom left you battered by the storms of persecution and staggered by the winds of police brutality. You have been the veterans of creative suffering. Continue to work with the faith that unearned suffering is redemptive.
Go back to Mississippi, go back to Alabama, go back to South Carolina, go back to Georgia, go back to Louisiana, go back to the slums and ghettos of our northern cities, knowing that somehow this situation can and will be changed. Let us not wallow in the valley of despair.
I say to you today, my friends, so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.
I have a dream that one day this nation will rise up, live up to the true meaning of its creed: “We hold these truths to be self-evident; that all men are created equal.”
I have a dream that one day on the red hills of Georgia the sons of former slaves and the sons of former slave-owners will be able to sit down together at the table of brotherhood.
I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
I have a dream that my four children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.
I have a dream today.
I have a dream that one day down in Alabama with its governor having his lips dripping with the words of interposition and nullification, one day right down in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.
I have a dream today.
I have a dream that one day every valley shall be exalted, every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed, and all flesh shall see it together.
This is our hope. This is the faith that I go back to the South with. With this faith we will be able to hew out of the mountain of despair a stone of hope. With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.
This will be the day when all of God’s children will be able to sing with new meaning.
My country, ’tis of thee,
Sweet land of liberty,
Of thee I sing:
Land where my fathers died,
Land of the pilgrims’ pride,
From every mountainside
Let freedom ring.
And if America is to be a great nation this must become true. So let freedom ring from the prodigious hilltops of New Hampshire.
Let freedom ring from the mighty mountains of New York!
Let freedom ring from the heightening Alleghenies of Pennsylvania!
Let freedom ring from the snowcapped Rockies of Colorado!
Let freedom ring from the curvaceous slopes of California!
But not only that; let freedom ring from Stone Mountain of Georgia!
Let freedom ring from Lookout Mountain of Tennessee!
Let freedom ring from every hill and molehill of Mississippi!
From every mountainside, let freedom ring!
When we let freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God’s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual, “Free at last! free at last! thank God almighty, we are free at last!”
