Building a Huffman tree from the letters in a paragraph


ZoeLandia


Category: C. Tags: hashmap, data structures, C language

Article link: https://blog.csdn.net/triggerV/article/details/118106224


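Process

Computing the letter frequencies

Compute the probability of each English letter, ignoring case and all other characters, and use that probability as the weight.

The occurrence counts are computed with Java's HashMap.

HashMap is a hash-table-based implementation of the Map interface. Each entry has two parts: a key and a value. The idea is to use each letter that appears in the text as the key and its number of occurrences as the value. Because keys in a HashMap are unique, counting letters reduces to counting how many times each key is seen. My implementation walks the text from the first character and stores each letter in the map: the first letter is stored as a : 1, then the second as f : 1, and when a letter that is already in the map appears again, its value is incremented by 1. For example, with aba, the value for a becomes 2 when the third character is reached, giving a : 2. Spaces, digits, punctuation and other special characters are skipped and never inserted; to count punctuation as well, drop that check and insert every character. All other letters are handled the same way.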

import java.util.HashMap;

public class Hashmap{
    public static void main(String[] args){
        HashMap <String, Integer> map = new HashMap <String, Integer> ();
        String str = "Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectiffied Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation";
        for(int i = 0; i < str.length(); i++){
            char ch = str.charAt(i);
            String key = String.valueOf(ch);
            if(map.containsKey(key)){
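                Integer value = map.get(key);
                map.put(key, value + 1);
            }else{
                // Calling map.put(key, 1) unconditionally would count every character, including Chinese ones.
                // The ASCII range check below skips digits, spaces, punctuation and other special characters, so only letters are counted.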
                if(ch >= 'A' && ch <= 'Z' || ch >= 'a' && ch <= 'z'){
                    map.put(key, 1);
                }
            }
        }
        System.out.println(map);
    }
}

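Result

{B=1, C=2, E=1, L=3, M=1, N=4, R=2, T=2, W=3, a=52, b=3, c=21, d=16, e=54, f=27, g=10, h=15, i=55, k=4, l=36, m=22, n=41, o=41, p=10, r=31, s=53, t=48, u=14, v=3, w=8, y=13}

Probability of each letter, tallied by hand from the counts above with upper and lower case merged. The output also lists s=53, which the table leaves out; the probabilities are taken over the 543 occurrences that are listed.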


Letter	Count	Probability
V	3	0.006
B	4	0.007
K	4	0.007
G	10	0.018
P	10	0.018
W	11	0.020
Y	13	0.024
U	14	0.026
H	15	0.028
D	16	0.029
C	23	0.042
M	23	0.042
F	27	0.050
R	33	0.061
L	39	0.072
O	41	0.076
N	45	0.083
T	50	0.092
A	52	0.096
E	55	0.101
I	55	0.101
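The manual merging of upper and lower case can also be done in code. The following is a minimal sketch in C rather than the Java HashMap used above; the function name count_letters and the 26-slot array layout are assumptions made for illustration, not part of the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count ASCII letters case-insensitively.
   counts[0] holds 'a'/'A', ..., counts[25] holds 'z'/'Z'.
   Returns the total number of letters seen; everything else is skipped. */
int count_letters(const char *text, int counts[26]) {
    assert(text != NULL);
    int total = 0;
    for (int i = 0; i < 26; i++) {
        counts[i] = 0;
    }
    for (const char *p = text; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        /* Restrict to 7-bit ASCII so that only A-Z and a-z are counted. */
        if (c < 128 && isalpha(c)) {
            counts[tolower(c) - 'a']++;
            total++;
        }
    }
    return total;
}
```

The probability of a letter is then (double)counts[i] / total, which is how the Probability column in the table is derived from the Count column.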

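Building the Huffman tree

Build the Huffman tree with the standard construction: repeatedly merge the two nodes with the smallest weights into a new parent node. By convention here, the smaller node goes on the left and the larger on the right; a left branch is encoded as '0' and a right branch as '1'.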
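To make the left/right rule concrete, here is a small non-interactive sketch of the construction. The names Node, build_tree and leaf_code, and the bound MAX_LEAF, are assumptions for illustration; it is a simplified variant of the idea, not the article's program.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEAF 32                       /* assumed upper bound for this sketch */
#define MAX_NODE (2 * MAX_LEAF - 1)

typedef struct {
    double weight;
    int parent, lchild, rchild;           /* array indices, -1 means "none" */
} Node;

/* Build the tree for n leaf weights; nodes n .. 2n-2 are the merged nodes. */
void build_tree(Node t[], const double w[], int n) {
    assert(n >= 1 && n <= MAX_LEAF);
    for (int i = 0; i < 2 * n - 1; i++) {
        t[i].weight = (i < n) ? w[i] : 0.0;
        t[i].parent = t[i].lchild = t[i].rchild = -1;
    }
    for (int i = 0; i < n - 1; i++) {
        int x1 = -1, x2 = -1;             /* smallest and second smallest */
        for (int j = 0; j < n + i; j++) {
            if (t[j].parent != -1) {
                continue;                 /* already merged into a parent */
            }
            if (x1 == -1 || t[j].weight < t[x1].weight) {
                x2 = x1;
                x1 = j;
            } else if (x2 == -1 || t[j].weight < t[x2].weight) {
                x2 = j;
            }
        }
        t[x1].parent = t[x2].parent = n + i;
        t[n + i].weight = t[x1].weight + t[x2].weight;
        t[n + i].lchild = x1;             /* smaller node on the left: bit 0 */
        t[n + i].rchild = x2;             /* larger node on the right: bit 1 */
    }
}

/* Write the code of one leaf into out as a string of '0' and '1'.
   out must have room for at least n characters. */
void leaf_code(const Node t[], int leaf, char out[]) {
    char rev[MAX_LEAF];
    int len = 0;
    /* Climb from the leaf to the root; this yields the bits last-to-first. */
    for (int k = leaf, p = t[leaf].parent; p != -1; k = p, p = t[p].parent) {
        rev[len++] = (t[p].lchild == k) ? '0' : '1';
    }
    for (int i = 0; i < len; i++) {
        out[i] = rev[len - 1 - i];
    }
    out[len] = '\0';
}
```

For the hypothetical weights 1, 2, 4, 8 the merges are 1+2=3, then 3+4=7, then 7+8=15, and the resulting codes are 000, 001, 01 and 1: the heaviest leaf gets the shortest code.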

Code and results

Code

#include <stdio.h>
#include <stdlib.h>
#define MAXBIT  50
#define MAXLEAF 50
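#define MAXNODE (2 * MAXLEAF - 1)
#define INFINITY 65535

// Code entry: the bits of a code are stored in bit[start + 1 .. n - 1]
typedef struct HCodeType{
	int start;
	int bit[MAXBIT];
}HCodeType;

// Tree node: parent, lchild and rchild are array indices, -1 means none
typedef struct HNode{
	int parent, lchild, rchild;
	double weight;
	char data;
}HNode;

// Build the Huffman tree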
void HuffmanTree(HNode HN[MAXNODE], int n){
	int i = 0, j;
	int x1, x2;
	double s1, s2;
	
	printf("Enter the letters\n");
	// Initialise the leaf nodes of the Huffman tree
	while(i < n){
		HN[i].parent = -1;
		HN[i].lchild = -1;
		HN[i].rchild = -1;
		HN[i].weight = 0;
		// The leading space in " %c" skips whitespace (spaces, newlines) between letters
		scanf(" %c", &HN[i].data);
		i++;
	}
	
	printf("Enter the weights of the letters in the same order\n");
	i = 0;
	while(i < n){
		scanf("%lf", &HN[i].weight);
		i++;
	} 
	
	// Initialise the remaining (internal) nodes of the Huffman tree
	for(i = n; i < 2 * n - 1; i++){
		HN[i].parent = -1;
		HN[i].lchild = -1;
		HN[i].rchild = -1;
		HN[i].weight = 0;
		HN[i].data = '0';
	}
	
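	// Repeat n - 1 times: pick the two parentless nodes with the smallest weights and merge them
	for(i = 0; i < n - 1; i++){
		s1 = s2 = INFINITY;
		x1 = x2 = 0;
		
		for(j = 0; j < n + i; j++){
			if(HN[j].weight < s1 && -1 == HN[j].parent){
				// New smallest: the previous smallest becomes the second smallest
				s2 = s1;
				x2 = x1;
				s1 = HN[j].weight;
				x1 = j;
			}else if(HN[j].weight < s2 && -1 == HN[j].parent){
				s2 = HN[j].weight;
				x2 = j;
			}
		}
		// The smaller node x1 becomes the left child, the larger node x2 the right child
		HN[x1].parent = n + i;
		HN[x2].parent = n + i;
		HN[n + i].weight = HN[x1].weight + HN[x2].weight;
		HN[n + i].lchild = x1;
		HN[n + i].rchild = x2;
	}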
}

int main(){
	HNode HN[MAXNODE];
	HCodeType HC[MAXLEAF], cd;
	int i, j, k, p;
	int n;
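	printf("Enter the number of distinct letters\n");
	// Reject input that would overflow the fixed-size arrays
	if(scanf("%d", &n) != 1 || n < 1 || n > MAXLEAF){
		return 1;
	}
	HuffmanTree(HN, n);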
	
	// Walk from each leaf up to the root to collect its code; bits are stored back to front
	for(i = 0 ; i < n; i++){
		cd.start = n - 1;
		k = i;
		p = HN[k].parent;
		while(p != -1){
			if(k == HN[p].lchild){
				cd.bit[cd.start] = 0;
			}else{
				cd.bit[cd.start] = 1;
			}
			cd.start--;
			k = p;
			p = HN[k].parent; 
		}
		
		// Copy the finished code into the table
		HC[i].start = cd.start;
		for(j = cd.start + 1; j < n; j++){
			HC[i].bit[j] = cd.bit[j];
		}
	}
	
	// Print the codes
	for(i = 0; i < n; i++){
		printf("%c  ", HN[i].data);
		for(j = HC[i].start + 1; j < n; j++){
			printf("%d", HC[i].bit[j]);
		}
		printf("\n");
	} 
	
	return 0;
}

Run output
