UVa 213 Message Decoding: String Processing

Some message encoding schemes require that an encoded message be sent in two parts. The first part,
called the header, contains the characters of the message. The second part contains a pattern that
represents the message. You must write a program that can decode messages under such a scheme.


The heart of the encoding scheme for your program is a sequence of “key” strings of 0’s and 1’s as
follows:


0, 00, 01, 10, 000, 001, 010, 011, 100, 101, 110, 0000, 0001, ..., 1011, 1110, 00000, ...


The first key in the sequence is of length 1, the next 3 are of length 2, the next 7 of length 3, the
next 15 of length 4, etc. If two adjacent keys have the same length, the second can be obtained from
the first by adding 1 (base 2). Notice that there are no keys in the sequence that consist only of 1’s.
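
The rule above can be sketched in code. This is a minimal illustration, not part of the original problem statement; the function name `key_sequence` is hypothetical. For each length it writes the values 0 through 2^len - 2 as len-bit binary strings and skips the all-ones value.

```cpp
#include <string>
#include <vector>

// Returns every key of length 1..max_len in sequence order.
// For each length, the values 0 .. 2^len - 2 are written as len-bit binary;
// the all-ones value 2^len - 1 is skipped because it is not a key.
std::vector<std::string> key_sequence(int max_len) {
    std::vector<std::string> keys;
    for (int len = 1; len <= max_len; ++len) {
        for (int value = 0; value < (1 << len) - 1; ++value) {
            std::string key(len, '0');
            for (int bit = 0; bit < len; ++bit) {
                // Set the character for each 1 bit, most significant bit first.
                if (value & (1 << (len - 1 - bit))) key[bit] = '1';
            }
            keys.push_back(key);
        }
    }
    return keys;
}
```

Calling `key_sequence(3)` yields the eleven keys from 0 through 110 listed at the start of the sequence.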


The keys are mapped to the characters in the header in order. That is, the first key (0) is mapped
to the first character in the header, the second key (00) to the second character in the header, the kth
key is mapped to the kth character in the header. For example, suppose the header is:


AB#TANCnrtXc 


Then 0 is mapped to A, 00 to B, 01 to #, 10 to T, 000 to A, ..., 110 to X, and 0000 to c. 
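
As an illustration of this mapping, the sketch below pairs each key string with the header character at the same position. The name `map_keys_to_header` and the use of `std::map` are assumptions made for readability; the solution later in this post uses a two-dimensional array instead.

```cpp
#include <map>
#include <string>

// Assigns the k-th key in the sequence to the k-th header character.
// Keys longer than 7 bits are never generated, matching the problem's limit.
std::map<std::string, char> map_keys_to_header(const std::string& header) {
    std::map<std::string, char> table;
    std::size_t next = 0;  // index of the next unassigned header character
    for (int len = 1; len <= 7 && next < header.size(); ++len) {
        for (int value = 0; value < (1 << len) - 1 && next < header.size(); ++value) {
            std::string key(len, '0');
            for (int bit = 0; bit < len; ++bit) {
                if (value & (1 << (len - 1 - bit))) key[bit] = '1';
            }
            table[key] = header[next++];
        }
    }
    return table;
}
```

For the header `AB#TANCnrtXc`, the resulting table has 12 entries, with `"110"` mapped to `X` and `"0000"` mapped to `c`.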


The encoded message contains only 0’s and 1’s and possibly carriage returns, which are to be ignored.
The message is divided into segments. The first 3 digits of a segment give the binary representation
of the length of the keys in the segment. For example, if the first 3 digits are 010, then the remainder
of the segment consists of keys of length 2 (00, 01, or 10). The end of the segment is a string of 1’s
which is the same length as the length of the keys in the segment. So a segment of keys of length 2 is
terminated by 11. The entire encoded message is terminated by 000 (which would signify a segment
in which the keys have length 0). The message is decoded by translating the keys in the segments
one-at-a-time into the header characters to which they have been mapped.
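
The segment rules can be tried out on in-memory strings. The following is a sketch under stated assumptions: `decode_message` is a hypothetical helper that takes the header and the whole encoded message (line breaks included) as strings, whereas a real submission would read from standard input.

```cpp
#include <string>
#include <vector>

// Decodes `message` using the key-to-character mapping implied by `header`.
std::string decode_message(const std::string& header, const std::string& message) {
    // table[len][value] is the header character for the key of that length and value.
    std::vector<std::vector<char>> table(8);
    std::size_t next = 0;
    for (int len = 1; len <= 7; ++len) {
        table[len].assign(1 << len, '\0');
        for (int value = 0; value < (1 << len) - 1 && next < header.size(); ++value)
            table[len][value] = header[next++];
    }

    std::size_t pos = 0;
    // Reads `count` binary digits as an integer, skipping line breaks.
    // Returns -1 if the message runs out (not expected for valid input).
    auto read_bits = [&](int count) {
        int value = 0;
        while (count > 0) {
            if (pos >= message.size()) return -1;
            char c = message[pos++];
            if (c == '\n' || c == '\r') continue;
            value = value * 2 + (c - '0');
            --count;
        }
        return value;
    };

    std::string out;
    for (;;) {
        int len = read_bits(3);
        if (len <= 0) break;  // 000 ends the whole message
        for (;;) {
            int key = read_bits(len);
            if (key < 0 || key == (1 << len) - 1) break;  // all ones ends the segment
            out += table[len][key];
        }
    }
    return out;
}
```

With the header `$#**\` and the message `0100000101101100011100101000`, this returns `##*\$`.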


Input 


The input file contains several data sets. Each data set consists of a header, which is on a single line
by itself, and a message, which may extend over several lines. The length of the header is limited
only by the fact that key strings have a maximum length of 7 (111 in binary). If there are multiple
copies of a character in a header, then several keys will map to that character. The encoded message
contains only 0’s and 1’s, and it is a legitimate encoding according to the described scheme. That is,
the message segments begin with the 3-digit length sequence and end with the appropriate sequence of
1’s. The keys in any given segment are all of the same length, and they all correspond to characters in
the header. The message is terminated by 000.
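
The limit on the header length follows from counting keys: there are 2^len - 1 keys of each length, so lengths 1 through 7 give 1 + 3 + 7 + 15 + 31 + 63 + 127 = 247 keys in total. A small sketch of that arithmetic, with `max_header_length` as a hypothetical helper name:

```cpp
// Number of keys with length 1..max_key_len, which is also the longest
// header that can be fully addressed by such keys.
int max_header_length(int max_key_len) {
    int total = 0;
    for (int len = 1; len <= max_key_len; ++len) {
        total += (1 << len) - 1;  // 2^len strings minus the all-ones one
    }
    return total;
}
```

`max_header_length(7)` evaluates to 247, so a buffer of a few hundred characters is enough for any header.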


Carriage returns may appear anywhere within the message part. They are not to be considered as
part of the message.


Output 


For each data set, your program must write its decoded message on a separate line.  There should not 
be blank lines between messages. 


Sample input 


TNM AEIOU
0010101100011 
1010001001110110011 
11000 
$#**\ 
0100000101101100011100101000 


Sample output 


TAN ME
##*\$


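This is a string-processing problem: a string has to be decoded according to the given rules. The key difficulty is that the encoded message may be split across several lines, so we need a function that filters out line breaks. With that in place, the rest follows the problem statement directly: read three bits to get the key length, then repeatedly read that many bits, compute their value, and map it to a character through the precomputed array code.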

#include <stdio.h>
#include <string.h>

// code[len][value] holds the header character mapped to the key that has
// len bits and binary value `value`. Key lengths run from 1 to 7.
int code[8][1<<7];



// Returns the next input character that is not a line break.
// EOF is passed through unchanged.
int readchar()
{
    int c;
    while(1)
    {
        c=getchar();
        if(c!='\n'&&c!='\r')
            break;
    }
    return c;
}

// Reads len binary digits (skipping line breaks) and returns their value.
int readmsg(int len)
{
    int res=0;
    while(len--)
        res=(res<<1)+readchar()-'0';
    return res;
}

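// Reads one header line into code[][]. Returns 0 at end of input, 1 otherwise.
int readcode()
{
    memset(code,0,sizeof(code));
    int c;
    // readchar() skips the line break left over from the previous message.
    code[1][0]=readchar();
    if(code[1][0]==EOF)
        return 0;
    for(int i=2; i<8; i++)
    {
        // Length i has 2^i-1 keys; the all-ones string is not a key.
        for(int j=0; j<(1<<i)-1; j++)
        {
            c=getchar();
            if(c==EOF)
                return 0;
            if(c=='\n'||c=='\r')
                return 1;   // end of the header line
            code[i][j]=c;
        }
    }
    return 1;
}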





int main()
{
    int len,dcode;
    while(readcode())
    {
        while(1)
        {
            len=readmsg(3);         // 3-bit key length of this segment
            if(len==0)
                break;              // 000 terminates the message
            while(1)
            {
                dcode=readmsg(len);
                if(dcode==((1<<len)-1))
                    break;          // all ones terminates the segment
                putchar(code[len][dcode]);
            }
        }
        printf("\n");
    }
    return 0;
}

