Using scanf "regular expressions" through three string problems: Codeforces 697B, 727B and 8A

Latest recommended article published on 2021-04-13 19:11:11

ccutyear

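Category columns: My own understanding of using regular expressions in ACM contests; codeforces. Tags: regular expressions, simulation, solutions, codeforces

Article link: https://blog.csdn.net/ccutyear/article/details/52900124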

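Regular expressions are not a heavily used technique in ACM contests, because anything they can do we can also do by typing a few more lines of code. Still, using them can make our code noticeably more compact. (Shorter code, less memory and less running time: isn't that what we are aiming for?) Strictly speaking, what scanf offers is the scanset conversion `%[...]` together with a few related specifiers rather than full regular expressions, but I will keep calling them regular expressions here. Below I discuss how to put them to work, using three problems I ran into this week.

First, here is a blog post I found that explains the topic well: http://blog.csdn.net/huangxy10/article/details/8117870

We start with Codeforces problem 697B: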

Barney is standing in a bar and staring at a pretty girl. He wants to shoot her with his heart arrow but he needs to know the distance between him and the girl to make his shot accurate.

Barney asked the bar tender Carl about this distance value, but Carl was so busy talking to the customers so he wrote the distance value (it's a real number) on a napkin. The problem is that he wrote it in scientific notation. The scientific notation of some real number x is the notation of form AeB, where A is a real number and B is an integer and x = A × 10^B is true. In our case A is between 0 and 9 and B is non-negative.

Barney doesn't know anything about scientific notation (as well as anything scientific at all). So he asked you to tell him the distance value in usual decimal representation with minimal number of digits after the decimal point (and no decimal point if it is an integer). See the output format for better understanding.

Input

The first and only line of input contains a single string of form a.deb where a, d and b are integers and e is usual character 'e' (0 ≤ a ≤ 9, 0 ≤ d < 10^100, 0 ≤ b ≤ 100): the scientific notation of the desired distance value.

a and b contain no leading zeros and d contains no trailing zeros (but may be equal to 0). Also, b can not be non-zero if a is zero.

Output

Print the only real number x (the desired distance value) in the only line in its decimal notation.

Thus if x is an integer, print its integer value without decimal part and decimal point and without leading zeroes.

Otherwise print x in a form of p.q such that p is an integer that has no leading zeroes (but may be equal to zero), and q is an integer that has no trailing zeroes (and may not be equal to zero).

Examples

input
8.549e2

output
854.9

input
8.549e3

output
8549

input
0.33e0

output
0.33

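The task is simple: a number is given in exponent form and we have to print it as an ordinary decimal number. The input has the form "a.deb" (0<=a<=9, 0<=d<10^100, 0<=b<=100).

If all you want is an AC, this problem is not hard. Making the code compact, however, is a skill.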

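The output part of the code below also uses a trick that is not seen very often:

printf("%*.*s",dlen,len,s) prints at most the first len characters of the string s, in a field at least dlen characters wide (padded with spaces on the left by default).

The "*" after the decimal point is the precision, i.e. the number of characters to print; its value comes from the argument list.
In a printf format string, any constant that controls width or precision can be replaced by a variable: write "*" in place of the constant and pass the value as an extra int argument.

Likewise, a "*" can be written before the decimal point. It also takes its value from an argument, and that value is the minimum field width of the output.

With printf("%*.*d",dlen,len,5), the number is printed with at least len digits, padded with leading zeros, so the result is a run of 0s ending in 5, placed in a field at least dlen characters wide.

Here is the compact code: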
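To make the width and precision rules concrete, here is a small sketch. The helper names `pad_str` and `pad_int` are made up for this illustration; they write into a buffer with snprintf, which follows the same formatting rules as printf.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a string with run-time width and precision.
std::string pad_str(int width, int precision, const char *s) {
    char buf[128];
    // precision = maximum number of characters taken from s,
    // width     = minimum field width (space-padded on the left).
    std::snprintf(buf, sizeof buf, "%*.*s", width, precision, s);
    return buf;
}

// Hypothetical helper: formats an int with run-time width and precision.
std::string pad_int(int width, int precision, int value) {
    char buf[128];
    // precision = minimum number of digits (zero-padded),
    // width     = minimum field width (space-padded on the left).
    std::snprintf(buf, sizeof buf, "%*.*d", width, precision, value);
    return buf;
}
```

For example, `pad_str(6, 3, "abcdef")` gives three spaces followed by `abc`, and `pad_int(6, 4, 5)` gives two spaces followed by `0005`. One corner case matters for the solution in this post: printing the value 0 with precision 0 produces no characters at all.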

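#include <cstdio>
int main() {
    int a, dlen, b;
    char d[101];
    // The statement guarantees 0 <= a <= 9, so "%d." reads a and then
    // skips the decimal point that follows it.
    while (scanf("%d.", &a) == 1) {
        /* This is where the "regular expression" comes in. d is the fractional
           part (at most 100 digits) and is always followed by the character
           'e', so %[^e] reads everything up to that 'e' and no later clean-up
           is needed. %n consumes no input: it stores the number of characters
           this scanf call has read so far, which here is the length of d, in
           dlen. The literal 'e' then skips the 'e' after d, and %d reads the
           non-negative exponent into b. */
        // To my mind this input line alone is already admirable.
        scanf("%100[^e]%ne%d", d, &dlen, &b);
        /* All the hard work was done while reading, so output is easy.
           The only special case is "d is 0 and the exponent is 0": print a. */
        if (dlen == 1 && d[0] == '0' && b == 0)
            printf("%d\n", a);
        // If b is at least the length of d, append b - dlen zeros after d.
        // "%.*d" with value 0 prints exactly that many zeros (none for 0).
        else if (b >= dlen)
            printf("%d%s%.*d\n", a, d, b - dlen, 0);
        /* Otherwise "%.*s" takes b and d and prints the first b characters of
           d; then comes a '.', and "%s" with d + b prints d from position b to
           the end. No trailing zeros need to be stripped, because the statement
           guarantees d has none. */
        else
            printf("%d%.*s.%s\n", a, b, d, d + b);
    }
    return 0;
}

This code uses no complicated ideas and no advanced algorithms at all, only input and output features that everyone knows about but rarely uses.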
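The same parsing idea can be tried on an in-memory string with sscanf. The sketch below is not the solution above but a variation of it: `expand` is a hypothetical helper name, and it uses two `%n` conversions in a single call, so the length of d is the difference between the two recorded offsets.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: expands "a.deb" into plain decimal notation.
std::string expand(const char *text) {
    int a = 0, b = 0, start = 0, end = 0;
    char d[101];
    // %n stores how many characters this call has consumed so far,
    // so (end - start) is the number of fractional digits in d.
    // %n does not count towards the return value, hence the 3.
    if (std::sscanf(text, "%d.%n%100[^e]%ne%d", &a, &start, d, &end, &b) != 3)
        return "";
    int dlen = end - start;
    char out[256];
    if (dlen == 1 && d[0] == '0' && b == 0)
        std::snprintf(out, sizeof out, "%d", a);            // plain integer a
    else if (b >= dlen)
        std::snprintf(out, sizeof out, "%d%s%.*d", a, d, b - dlen, 0);
    else
        std::snprintf(out, sizeof out, "%d%.*s.%s", a, b, d, d + b);
    return out;
}
```

Calling `expand("8.549e2")` should reproduce the first sample, and `expand("8.549e3")` the second, where the precision `b - dlen` is 0 and the trailing `%.*d` contributes nothing.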

Next is a Codeforces problem from a few weeks ago (727B):

Vasily exited from a store and now he wants to recheck the total price of all purchases in his bill. The bill is a string in which the names of the purchases and their prices are printed in a row without any spaces. Check has the format "name_1price_1name_2price_2...name_nprice_n", where name_i (name of the i-th purchase) is a non-empty string of length not more than 10, consisting of lowercase English letters, and price_i (the price of the i-th purchase) is a non-empty string, consisting of digits and dots (decimal points). It is possible that purchases with equal names have different prices.

The price of each purchase is written in the following format. If the price is an integer number of dollars then cents are not written.

Otherwise, after the number of dollars a dot (decimal point) is written followed by cents in a two-digit format (if number of cents is between 1 and 9 inclusively, there is a leading zero).

Also, every three digits (from less significant to the most) in dollars are separated by dot (decimal point). No extra leading zeroes are allowed. The price always starts with a digit and ends with a digit.

For example:

"234", "1.544", "149.431.10", "0.99" and "123.05" are valid prices,
".333", "3.33.11", "12.00", ".33", "0.1234" and "1.2" are not valid.

Write a program that will find the total price of all purchases in the given bill.

Input

The only line of the input contains a non-empty string s with length not greater than 1000 — the content of the bill.

It is guaranteed that the bill meets the format described above. It is guaranteed that each price in the bill is not less than one cent and not greater than 10^6 dollars.

Output

Print the total price exactly in the same format as prices given in the input.

Examples

input
chipsy48.32televizor12.390

output
12.438.32

input
a1b2c3.38

output
6.38

input
aa0.01t0.03

output
0.04

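Summary: the input gives a product name (letters only) immediately followed by its price, and the pairs follow one another with no separator at all. In a price, a dot appears after every three digits. If exactly two digits follow the last dot, those two digits are the cents; if three digits follow it, they still belong to the integer part. Print the total price of all products: cents are printed normally, and the integer part needs a dot '.' every three digits.

This is plain simulation. The total has nothing to do with the product names, so we only need to pick out the numeric parts, decide whether each has a cents part, and add it to the sum. Finally we print in the required format. (My own idea was to store the digits in an array as I found them and add them there, taking care of carries, and then to print a dot '.' after every three digits in the output.)

My own code is too clumsy, too long and too poor to post, so here is someone else's compact code instead. (I read a lot of solutions to this problem and this one was the shortest.)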

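#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
ll a, b, c;
char s[1005];
int main() {
    //freopen("数据输入.IN","r",stdin);
    while (scanf("%*[^0-9]%1000[.0-9]", s) != EOF) {
        //printf("%s\n",s); // this commented-out line is discussed later in the text
        /* This input statement is a typical example of putting the "regular
           expression" to full use. %*[^0-9] skips the non-digit characters
           (that is, the product name) without storing them, and %[.0-9]
           accepts only digits and '.'. For now, think of it as reading one
           product name and its price; why that works is explained later. */
        int len = strlen(s);
        // With %n conversions added to the format, this strlen could be avoided.
        /* Check whether the last two characters are cents. If so, add them
           to b and reduce len by 3, because the cents must not be counted
           when computing the integer part. */
        if (len >= 4 && s[len - 3] == '.') {
            b += (s[len - 2] - '0') * 10 + (s[len - 1] - '0');
            len -= 3;
        }
        c = 0;
        for (int i = 0; i < len; i++)
            if (s[i] != '.') c = c * 10 + (s[i] - '0');  // integer part
        a += c;       // a accumulates the integer part of the total
        s[0] = '\0';  // reset s so a failed read contributes nothing
    }
    a = a + b / 100;  // whole dollars carried over from the cents
    vector<ll> v;
    while (a) {       // split the integer part into groups of three digits
        v.push_back(a % 1000);
        a /= 1000;
    }
    // v is empty exactly when the integer part is 0: print a single 0.
    if (v.empty()) printf("0");
    // Print the integer part with a '.' between groups of three digits.
    for (int i = (int)v.size() - 1; i >= 0; i--) {
        if (i == (int)v.size() - 1) printf("%lld", v[i]);
        else printf(".%03lld", v[i]);
    }
    // Finally the cents: width 2, padded with 0 when needed.
    if (b % 100) printf(".%02lld", b % 100);
    return 0;
}

With my comments added it no longer feels all that short. -_-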

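This code is not as compact as the 697B solution above, but its use of regular expressions in the input is quite magical. On Codeforces every test contains a single case (which is why code written for Codeforces often needs no input loop), unlike many university OJs that feed in several test cases. The EOF test simply makes sure we keep reading until the end of the file. So suppose we uncomment the printf line and use the input file shown in the figure below:

Figure (1): contents of the input file

The final output is shown in the figure below:

Figure (2): the output

Here we see that the code handles the whole input in one go, and in the end it still arrives at the answer "478.666.49". The reason is that "!=EOF" makes the loop read to the end of the text. In other words, the first while loop has already processed every string; once everything in the file has been read and processed, the loop exits and the output begins.

There is a caveat, though: this way of reading data only suits problems with a single test case. On judges such as HDU or POJ, where the input holds T test cases or several test cases, it does not apply. So regular expressions in scanf often come with limitations. I felt those limitations even more strongly on Codeforces problem 8A. (Of course I believe it can be done with regular expressions. My skills are simply too limited, so I could not use them in the problem below.)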
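The read-until-the-end loop can also be tried without an input file by running the same scansets over an in-memory string. This is only a sketch under my own assumptions, not the solution quoted above: `total_cents` and `format_cents` are hypothetical helper names, the total is kept in cents, and `%n` is used to resume after each price.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: sums every price in a bill string, in cents.
long long total_cents(const char *bill) {
    long long cents = 0;
    char price[1001];
    int used = 0;
    // %*[^0-9] skips a name, %1000[.0-9] captures a price, and %n
    // reports how far this call advanced so the next call resumes there.
    while (std::sscanf(bill, "%*[^0-9]%1000[.0-9]%n", price, &used) == 1) {
        int len = (int)std::strlen(price);
        long long frac = 0;
        if (len >= 4 && price[len - 3] == '.') {  // trailing ".dd" is cents
            frac = (price[len - 2] - '0') * 10 + (price[len - 1] - '0');
            len -= 3;
        }
        long long dollars = 0;
        for (int i = 0; i < len; i++)
            if (price[i] != '.') dollars = dollars * 10 + (price[i] - '0');
        cents += dollars * 100 + frac;
        bill += used;
    }
    return cents;
}

// Hypothetical helper: prints cents in the bill's own price format.
std::string format_cents(long long cents) {
    std::string digits = std::to_string(cents / 100);
    std::string out;
    for (size_t i = 0; i < digits.size(); i++) {
        if (i > 0 && (digits.size() - i) % 3 == 0) out += '.';
        out += digits[i];
    }
    if (cents % 100) {
        char buf[32];
        std::snprintf(buf, sizeof buf, ".%02lld", cents % 100);
        out += buf;
    }
    return out;
}
```

`format_cents(total_cents("chipsy48.32televizor12.390"))` should reproduce the first sample output. Note that the loop relies on every price being preceded by a name, as the statement guarantees; on a string that starts with a digit, `%*[^0-9]` fails to match and the function returns 0.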

Peter likes to travel by train. He likes it so much that on the train he falls asleep.

Once in summer Peter was going by train from city A to city B, and as usual, was sleeping. Then he woke up, started to look through the window and noticed that every railway station has a flag of a particular colour.

The boy started to memorize the order of the flags' colours that he had seen. But soon he fell asleep again. Unfortunately, he didn't sleep long, he woke up and went on memorizing the colours. Then he fell asleep again, and that time he slept till the end of the journey.

At the station he told his parents about what he was doing, and wrote two sequences of the colours that he had seen before and after his sleep, respectively.

Peter's parents know that their son likes to fantasize. They give you the list of the flags' colours at the stations that the train passes sequentially on the way from A to B, and ask you to find out if Peter could see those sequences on the way from A to B, or from B to A. Remember, please, that Peter had two periods of wakefulness.

Peter's parents put lowercase Latin letters for colours. The same letter stands for the same colour, different letters — for different colours.

Input

The input data contains three lines. The first line contains a non-empty string, whose length does not exceed 10^5, the string consists of lowercase Latin letters: the flags' colours at the stations on the way from A to B. On the way from B to A the train passes the same stations, but in reverse order.

The second line contains the sequence, written by Peter during the first period of wakefulness. The third line contains the sequence, written during the second period of wakefulness. Both sequences are non-empty, consist of lowercase Latin letters, and the length of each does not exceed 100 letters. Each of the sequences is written in chronological order.

Output

Output one of the four words without inverted commas:

«forward» — if Peter could see such sequences only on the way from A to B;
«backward» — if Peter could see such sequences on the way from B to A;
«both» — if Peter could see such sequences both on the way from A to B, and on the way from B to A;
«fantasy» — if Peter could not see such sequences.

Examples

input
atob
a
b

output
forward

input
aaacaaa
aca
aa

output
both

Note

It is assumed that the train moves all the time, so one flag cannot be seen twice. There are no flags at stations A and B.

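Summary: three strings s, a and b are given. The output depends on whether a, followed later by b, can be found in s read forward, in s read backward, in both, or in neither.

I am not going to post a correct solution for this problem. I only want to use it to show that regular expressions in scanf are not a cure-all.

My original plan was to use sscanf to process the already-read strings s, a or b, so that I would only have to check whether the last string assigned was empty. So I tried to find out whether, in %[str], the contents of a string variable str could be used inside the "[]". My experiment showed that in sscanf(s,"%*[^a]%s",ch1); the a cannot act as a string: whatever is written between the brackets is treated as a set of individual characters. That makes it impossible to skip directly over the part that matches the string a, the way the previous two problems skipped their prefixes.

To sum up: if you want to use regular expressions in the input to shorten your code, the problem must already impose the necessary restrictions, that is, the statement must say that the input format is fixed. Otherwise the technique does not apply.

All of the above is only my personal understanding of regular expressions. Even so, I still feel that Codeforces 8A can be solved with regular-expression input, and very compactly. If some expert who has read through my rambling can come up with such a solution, I hope you will send me the code in a private message; I would be very glad to receive it. Thank you. (What I am asking for is a regular-expression solution to Codeforces 8A, not just any answer to Codeforces 8A.)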
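The limitation is easy to reproduce. In the sketch below (both function names are made up for the illustration), the scanset `[^ab]` stops at the first character that is either 'a' or 'b', whereas `strstr` really searches for the substring "ab".

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: skips with the scanset %*[^ab], then reads a token.
std::string after_scanset_skip(const char *text) {
    char rest[128] = "";
    // [^ab] means "any character other than 'a' or 'b'",
    // not "everything up to the substring ab".
    std::sscanf(text, "%*[^ab]%127s", rest);
    return rest;
}

// For comparison: strstr looks for the substring "ab" itself.
std::string after_substring(const char *text) {
    const char *hit = std::strstr(text, "ab");
    return hit ? hit : "";
}
```

On the input `xxbyyab`, the scanset version stops at the lone 'b' and returns `byyab`, while the `strstr` version returns `ab`. If the text already starts with 'a' or 'b', `%*[^ab]` matches nothing, sscanf stops there and the result stays empty.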