File Compression


hope2jiang


Article link: https://blog.csdn.net/hope2jiang/article/details/593143


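　　Improving the compression ratio of files has long been a goal. In recent years an algorithm has been proposed that only rearranges a file and does not compress it by itself, yet in most cases the rearranged file compresses better than the original.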

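　　The algorithm works as follows. Given a string S of length n, first construct n strings from it, where the i-th string is obtained by moving the first i-1 characters of S to the end. Then sort these n strings in ascending order of their first character; if two strings have the same first character, order them by their position in S, smallest first. The last characters of the sorted strings form a new string S', which also has length n and contains every character of S. Finally, output S' together with p, the position in S' of the first character of S. For example: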

　　S: example

　　1. Construct the n strings

　　example
　　xamplee
　　ampleex
　　mpleexa
　　pleexam
　　leexamp
　　eexampl

　　2. Sort the strings

　　ampleex
　　example
　　eexampl
　　leexamp
　　mpleexa
　　pleexam
　　xamplee

　　3. Output
　　xelpame S'
　　7 　　　p
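
The three steps above can be followed literally in code. The sketch below is in Python rather than the Pascal the task asks for, and the function name `transform_by_rotations` is made up for illustration. It relies on Python's `list.sort` being stable to break ties by position in S, and it assumes n >= 1, as the problem guarantees.

```python
def transform_by_rotations(s):
    """Return (S', p) by building every rotation, as in steps 1 to 3."""
    n = len(s)  # assumed to be at least 1
    # Step 1: the rotation with 0-based start i moves the first i chars to the end.
    rotations = [(s[i:] + s[:i], i) for i in range(n)]
    # Step 2: sort by first character only; the stable sort keeps equal
    # first characters in their original (position in S) order.
    rotations.sort(key=lambda item: item[0][0])
    # Step 3: S' is made of the last characters of the sorted rotations.
    s_prime = "".join(rot[-1] for rot, _ in rotations)
    # The first character of S is the last character of the rotation that
    # starts at 0-based index 1 (index 0 when n == 1); p is its 1-based row.
    target = 1 % n
    p = next(row for row, (_, start) in enumerate(rotations, 1) if start == target)
    return s_prime, p


print(transform_by_rotations("example"))
```

This version stores n strings of n characters each, so it is meant as a readable reference for small inputs, not as an efficient solution.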

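　　Because of how English words are built, certain letter pairs occur very frequently, so identical letters are likely to end up next to each other in S', which makes S' more compressible. Although the algorithm exploits a property of English words, in practice it has been found to help with the compression of almost any file.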

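　　Task 1: zip1.pas (zip1.exe)
　　Read the string S and output S' and p.
　　The input file zip1.in contains two lines: line 1 is an integer n (1 <= n <= 10000), the length of S, and line 2 is the string S.
　　The output file zip1.out contains two lines: line 1 is S' and line 2 is the integer p.

　　Task 2: zip2.pas (zip2.exe)
　　Read S' and p and output the string S.
　　The input file zip2.in contains three lines: line 1 is an integer n (1 <= n <= 10000), the length of S', line 2 is the string S', and line 3 is the integer p.
　　The output file zip2.out contains a single line, S.

　　Sample input 1:
　　7
　　example

　　Sample output 1:
　　xelpame
　　7

　　Sample input 2:
　　7
　　xelpame
　　7

　　Sample output 2:
　　example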

Solution:
1. S --> S'
Figure 1 shows the main process for obtaining S' from S.

Figure 1. S-S' Process

If you look carefully at list <A> (the rotations in their original order), you will find that the first characters of its words, read in order, spell exactly S, i.e., 'example'. Likewise, the first characters of the words in the sorted list, read in order, are exactly the characters of S in sorted order, i.e., 'example' --sort--> 'aeelmpx'.

Now look carefully at the first and last character of each word in the sorted list, and refer to Figure 2 below. Do you notice anything interesting? The secret is that, for each word in the sorted list, the last character of the word is exactly the character that precedes the word's first character on the circle in Figure 2, and these last characters make up the result S'.
Take the first word, 'ampleex', as an example: its first character is 'a', and in Figure 2 the character before 'a' is 'x', so the last character of 'ampleex' is 'x', which becomes the first character of S'. By the same reasoning, for the second word, 'example', the character before its leading 'e' is the other 'e' (the circle wraps around), so the second character of S' is 'e'.

          |-----------------<------------------|
          |->[e]->[x]->[a]->[m]->[p]->[l]->[e]-|

Figure 2. Circle

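With this secret in hand, we can simplify the process as shown in Figure 3. Then how do we get p, the position in S' of the first character of S? It equals the position in <C> of the second character of S: here the second character is 'x', which is at position 7 in <C>, so p = 7. When a character occurs several times, equal characters in <C> keep the order they have in S. See Figure 3.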

              (1) sort each char           (2) get the previous char
                  in S                         of each char
example (S) =====================> aeelmpx ================> xelpame (S')
                                     <C>

Figure 3. Simple S-S' Process
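
The simplified process of Figure 3 can be sketched without building any rotation: sort the positions of S by their character, then take the character just before each position. This is a Python sketch under the same assumptions as the problem statement (n >= 1); the name `forward_transform` is hypothetical, and the tie-breaking again comes from `sorted` being stable.

```python
def forward_transform(s):
    """Return (S', p) using only a stable sort of the positions of S."""
    n = len(s)  # assumed to be at least 1
    # <C>: positions of S ordered by character; ties keep their order in S.
    order = sorted(range(n), key=lambda i: s[i])
    # The character before position i on the circle; s[-1] handles i == 0.
    s_prime = "".join(s[i - 1] for i in order)
    # p is the 1-based position in <C> of the second character of S
    # (0-based index 1, or index 0 when the string has a single character).
    p = order.index(1 % n) + 1
    return s_prime, p


print(forward_transform("example"))
```

Sorting n indices takes O(n log n) time and O(n) extra memory, which fits the limit n <= 10000 comfortably.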

2. S' --> S

How do we get S from S' and p? See Figure 4 below, where <D> is S' written as a column and <E> is <D> sorted. We know that p is 7, so it points to the last 'e' in S'. We can then build S like this. First write down that 'e' from <D>, then look at its counterpart in the same row of <E>: it is 'x', so we append 'x' to 'e' and the result is 'ex'. Next find 'x' in <D> and look at its counterpart in <E>: it is 'a', so we append it and get 'exa'. Continue in the same way, using each row of <D> only once. See Figure 5, which walks through the process. The characters in square brackets, read in order, form S.

                 -----           -----
                 | x |           | a |
                 | e |           | e |
                 | l |   sort    | e |
                 | p |   ====>   | l |
                 | a |           | m |
 start point     | m |           | p |
 p ------------> | e |           | x |
                 -----           -----
                  <D>             <E>

 Figure 4. S' - S process

          [e] ---- x -|
           |-----<----|
           V
          [x] ---- a -|
           |-----<----|
           V
          [a] ---- m -|
           |-----<----|
           V
          [m] ---- p -|
           |-----<----|
           V
          [p] ---- l -|
           |-----<----|
           V
          [l] ---- e -|
           |-----<----|
           V
          [e] ---- e

 Figure 5. S' - S process, step by step
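
When a character occurs more than once, the step "find it in <D>" can match several rows, so a program needs a rule for picking one. The Python sketch below is not the walk of Figure 5 but a variant that rebuilds S from its end toward its start, which makes every choice forced. It is derived from the sorting rule of the problem statement: rows of <E> with the same character are ordered by where that character occurs in S. The function name `inverse_transform` is hypothetical, and the input is assumed to be a valid (S', p) pair with n >= 1.

```python
def inverse_transform(s_prime, p):
    """Rebuild S from S' and the 1-based position p."""
    n = len(s_prime)  # assumed to be at least 1
    column_e = sorted(s_prime)  # column <E>; s_prime itself is column <D>
    first_row, last_row = {}, {}
    for row, ch in enumerate(column_e):
        first_row.setdefault(ch, row)  # first row of this character's group
        last_row[ch] = row             # last row of this character's group
    s = [""] * n
    s[0] = s_prime[p - 1]  # p points at the first character of S
    if n == 1:
        return s[0]
    # The rotation starting at S[0] holds the earliest occurrence of its
    # character, so it is the first row of that group; it ends with S[n-1].
    s[n - 1] = s_prime[first_row[s[0]]]
    # Going backward, each occurrence of a character takes the last row of
    # its group that is still unused; that row's <D> entry is the char before it.
    for i in range(n - 1, 1, -1):
        ch = s[i]
        row = last_row[ch]
        last_row[ch] -= 1
        s[i - 1] = s_prime[row]
    return "".join(s)


print(inverse_transform("xelpame", 7))
```

The rebuild needs one sort plus a single pass over the string, so it stays within O(n log n) time for n <= 10000.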
3. Code