哈夫曼树与哈夫曼编码和解码实现

最新推荐文章于 2022-10-21 20:03:19 发布

守护荆棘

最新推荐文章于 2022-10-21 20:03:19 发布

阅读量1.6k

点赞数 4

分类专栏：数据结构学习代码文章标签：哈夫曼树数据结构

本文链接：https://blog.csdn.net/SGL_LGS/article/details/108204948

版权

数据结构学习代码专栏收录该内容

22 篇文章 0 订阅

订阅专栏

今天在复习哈夫曼树的时候，发现了一个不错的B站视频，讲解的非常清晰直观。地址传送门。有讲到了哈夫曼树的原理及哈夫曼编码解码的过程。原理讲解直接移步上述链接，就不再画蛇添足。作为巩固随手练习一道题：

给定一篇文章，统计里面的各个字符出现的频次，并构建哈夫曼树，实现哈夫曼编码和解码的过程。并且计算哈夫曼编码和定长编码的空间节省了多少？
输入的文本：

each year, the american heart association (aha), in conjunction with the centers for disease control and prevention, the national institutes of health, and other government agencies, brings together the most up-to-date statistics on heart disease, stroke, other vascular diseases, and their risk factors and presents them in its heart disease and stroke statistical update. the statistical update is a valuable resource for researchers, clinicians, healthcare policy makers, media professionals, the lay public, and many others who seek the best national data available on disease morbidity and mortality and the risks, quality of care, medical procedures and operations, and costs associated with the management of these diseases in a single document. indeed, since 1999, the statistical update has been cited more than 8700 times in the literature (including citations of all annual versions). in 2009 alone, the various statistical updates were cited 1600 times (data from isi web of science). in recent years, the statistical update has undergone some major changes with the addition of new chapters and major updates across multiple areas. for this year s edition, the statistics committee, which produces the document for the aha, updated all of the current chapters with the most recent nationally representative data and inclusion of relevant articles from the literature over the past year and added a new chapter detailing how family history and genetics play a role in cardiovascular disease (cvd) risk. also, the 2011 statistical update is a major source for monitoring both cardiovascular health and disease in the population, with a focus on progress toward achievement of the aha s 2020 impact goals. below are a few highlights from this year s update. mortality data are presented according to the underlying cause of death. any-mention mortality means that the condition was nominally selected as the underlying cause or was otherwise mentioned on the death certificate. for many deaths classified as attributable to cvd, selection of the single most likely underlying cause can be difficult when several major comorbidities are present, as is often the case in the elderly population. it is useful, therefore, to know the extent of mortality due to a given cause regardless of whether it is the underlying cause or a contributing cause (ie, its any-mention status). the number of deaths in 2007 with any mention of specific causes of death was tabulated by the nhlbi from the nchs public-use electronic files on mortality. the first set of statistics for each disease in this update includes the number of deaths for which the disease is the underlying cause. two exceptions are chapter 7 (high blood pressure) and chapter 9 (heart failure). high bp, or hypertension, increases the mortality risks of cvd and other diseases, and hf should be selected as an underlying cause only when the true underlying cause is not known. in this update, hypertension and hf death rates are presented in 2 ways: (1) as nominally classified as the underlying cause and (2) as any-mention mortality. national and state mortality data presented according to the underlying cause of death were computed from the mortality tables of the nchs world wide web site, the health data interactive data system of the nchs, or the cdc compressed mortality file. any-mention numbers of deaths were tabulated from the electronic mortality files of the nchs world wide web site and from health data interactive.

step1:统计文章中每个字符出现的次数，并保存在map中
step2:将map中的每一项转化为一个单节点树，构成深林
step3:构建霍夫曼树
step4:获得每个字符的编码，保存在map中
step5:编码文章的字符，并将编码的结果输出
step6:依据编码的文档，加上霍夫曼树的信息，解码文档输出
在这里插入图片描述

#include<iostream>
#include<fstream>
#include<unordered_map>
#include<algorithm>

using namespace std;

struct TreeNode {
	int count;
	char charcate;
	TreeNode* left;
	TreeNode* right;
	TreeNode(int n, char c) { 
		count = n; charcate = c; 
		left = nullptr; right = nullptr;
	}
};

bool compare(TreeNode* s1, TreeNode* s2) {
	return s1->count < s2->count;
}

class Solution {
private:
	unordered_map<char, string> codingMap;
public:
	//step1:统计文章中每个字符出现的次数，并保存在map中
	unordered_map<char, int> countC(string& s) {
		ifstream ifs;
		ifs.open(s);
		_ASSERT(ifs.is_open());
		ifs >> noskipws;
		char temp;
		unordered_map<char, int> data;
		while (!ifs.eof()) {
			ifs >> temp;
			if (data.find(temp) == data.end())
				data[temp] = 1;
			else
				data[temp]++;
		}
		ifs.close();
		return data;
	}

	//step2:将map中的每一项转化为一个单节点树，构成深林
	vector<TreeNode*> mapToNode(unordered_map<char, int>& data) {
		vector<TreeNode*> out;
		for (auto ptr = data.begin(); ptr != data.end(); ptr++) {
			TreeNode* temp = new TreeNode(ptr->second, ptr->first);
			out.push_back(temp);
		}
		return out;
	}

	//step3:构建霍夫曼树
	TreeNode* creatTree(vector<TreeNode*>& Node) {
		while (Node.size() > 1) {
			sort(Node.begin(), Node.end(), compare);
			TreeNode* left = Node[0];
			TreeNode* right = Node[1];
			Node.erase(Node.begin());
			Node.erase(Node.begin());
			TreeNode* new_Node = new TreeNode(left->count + right->count, '#');
			new_Node->left = left;
			new_Node->right = right;
			Node.push_back(new_Node);
		}
		return Node[0];
	}

	//step4:获得每个字符的编码，保存在map中
	void codeString(TreeNode* root, string code) {
		if (root == nullptr) return;
		if (root->left == nullptr&&root->right == nullptr) {
			codingMap[root->charcate] = code;
			return;
		}

		codeString(root->left, code + "0");
		codeString(root->right, code + "1");
	}

	//step5:编码文章的字符，并将编码的结果输出
	void coding(string& input, string& output) {
		ifstream ifs;
		ifs.open(input);
		_ASSERT(ifs.is_open());
		ifs >> noskipws;
		char temp;

		ofstream ofs;
		ofs.open(output);
		_ASSERT(ofs.is_open());
		while (!ifs.eof()) {
			ifs >> temp;
			string codeCharacte = codingMap[temp];
			for (auto single : codeCharacte)
				ofs << single;
		}
		ofs.close();
	}

	//step6:依据编码的文档，加上霍夫曼树的信息，解码文档输出
	void encoding(string& input_path, string& output_path, TreeNode* root) {
		ifstream ifs;
		ifs.open(input_path);
		_ASSERT(ifs.is_open());
		ifs >> noskipws;
		char temp;

		ofstream ofs;
		ofs.open(output_path);
		_ASSERT(ofs.is_open());
		TreeNode* ptr = root;
		while (!ifs.eof()) {
			ifs >> temp;
			if (temp == '0' && ptr->left != nullptr)
				ptr = ptr->left;
			else if (temp == '1' && ptr->right != nullptr)
				ptr = ptr->right;
			else {
				ofs << ptr->charcate;
				if (temp == '0') ptr = root->left;
				if (temp == '1') ptr = root->right;
			}
		}
	}
};

int main() {
	Solution sol;
	string input_path = "C:\\Users\\426-4\\Desktop\\artic_input.txt";
	//step1:统计文章中每个字符出现的次数，并保存在map中
	unordered_map<char, int> map = sol.countC(input_path);
	//step2:将map中的每一项转化为一个单节点树，构成深林
	vector<TreeNode*> node = sol.mapToNode(map);
	//step3:构建霍夫曼树
	TreeNode* root = sol.creatTree(node);
	//step4:获得每个字符的编码，保存在map中
	sol.codeString(root, "");
	//step5:编码文章的字符，并将编码的结果输出
	string coding_path = "C:\\Users\\426-4\\Desktop\\artic_coding.txt";
	sol.coding(input_path, coding_path);
	//step6:依据编码的文档，加上霍夫曼树的信息，解码文档输出
	string encoding_path = "C:\\Users\\426-4\\Desktop\\artic_encoding.txt";
	sol.encoding(coding_path, encoding_path, root);
	return 0;
}