Sicily1133: SPAM (Email Address Matching)


Description

You never had any friends, and don’t really want any anyways, and so you have decided to collect email addresses from web pages for direct e-mail advertising.

The text delivered to a web browser is usually marked up HTML, which may contain email addresses of the form: user@server

• Both user and server are of the form alpha.numeric.with.dots. By alpha.numeric.with.dots, we mean a sequence of one or more characters which are alphabetic (A-Z,a-z), numeric (0-9), hyphens (-), underbars (_) and/or periods (.), with the following restrictions on periods:
1. The sequence neither starts nor ends with a period.
2. No periods are adjacent.

• Email addresses are preceded by the beginning of the file, or some character other than a letter (A-Z,a-z), digit (0-9), hyphen (-), or underbar (_).
• Email addresses are succeeded by the end of the file, or some character other than a letter (A-Z,a-z), digit (0-9), hyphen (-), or underbar (_).
• If the scanned text contains a sequence of the form first@second@third, then the output should contain first@second and second@third as email addresses. In a longer run, each pair split by an @-sign should appear as an email address in the output.
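
The alpha.numeric.with.dots rules above can be checked in isolation. The following is only a sketch: `isDottedName` and `checkDottedNameExamples` are hypothetical names that do not appear in the problem or in the solution further down, and the helper assumes it is handed a candidate user or server part that has already been cut out of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption, not part of the original solution):
// returns true when s is a valid "alpha.numeric.with.dots" sequence.
bool isDottedName(const std::string& s) {
    // One or more characters, neither starting nor ending with a period.
    if (s.empty() || s.front() == '.' || s.back() == '.') {
        return false;
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        char ch = s[i];
        bool allowed = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
                       (ch >= '0' && ch <= '9') || ch == '-' || ch == '_' ||
                       ch == '.';
        if (!allowed) {
            return false;
        }
        // The last character is never a period here, so s[i + 1] is in range.
        if (ch == '.' && s[i + 1] == '.') {
            return false;  // two adjacent periods
        }
    }
    return true;
}

// A few examples mirroring the two restrictions on periods.
void checkDottedNameExamples() {
    assert(isDottedName("banks.com"));
    assert(!isDottedName(".com"));   // starts with a period
    assert(!isDottedName("a..b"));   // adjacent periods
}
```

Calling `checkDottedNameExamples()` from a `main` function runs the examples; the function returns silently when all of them hold.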

The point of this problem is to extract and record the email addresses embedded in other text.

Input
The input file will contain zero or more lines of ASCII text.

Output
Other than the standard leader and trailer, the output file has each email address found in the input file in the order it was found (duplicates not removed).

Sample Input
bob@banks.com wrote:
What does x=7 mean for this problem? For
example,

..a@a@aa@aaa@aaa..a@a@aa@aaa@aaa..a@a..@a…a@..@..

this scrolling @-example from jim@jones.com

Sample Output
bob@banks.com
a@a
a@aa
aa@aaa
aaa@aaa
a@a
a@aa
aa@aaa
aaa@aaa
a@a
jim@jones.com

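Problem overview:

You are given an input file that contains a number of email addresses mixed with plenty of distracting text. Write a program that finds every email address in it and prints them one by one.

The lengthy statement boils down to defining the email address format user@server. The specific requirements are:
1. user and server may consist of the following characters: alphabetic (A-Z,a-z), numeric (0-9), hyphens (-), underbars (_) and/or periods (.);
2. Periods (.) carry two restrictions: I. a period cannot be the first or last character of user or of server; II. two periods cannot be adjacent.
3. user and server cannot be empty;
4. Boundaries: an email address is bounded by the beginning or end of the file, or by any character other than alphabetic (A-Z,a-z), numeric (0-9), hyphens (-) and underbars (_). A period that cannot be part of user or server under rule 2 therefore also acts as a boundary.

Besides these rules, the statement describes one special case: first@second@third
This string should be parsed as two email addresses: first@second and second@third. Runs with more @ signs are handled the same way.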
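
To make the pairing rule for chained @ signs concrete, here is a minimal sketch. `pairAdjacent` and `checkPairAdjacentExample` are hypothetical names, and the helper assumes the pieces between the @ signs are already valid names, so it deliberately ignores the period rules and only skips empty pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not part of the original solution):
// splits a run such as "first@second@third" at every '@' and joins each
// pair of neighbouring pieces back into one address.
std::vector<std::string> pairAdjacent(const std::string& run) {
    std::vector<std::string> pieces;
    std::size_t begin = 0;
    while (true) {
        std::size_t at = run.find('@', begin);
        // When no '@' is left, substr clamps the count and takes the rest.
        pieces.push_back(run.substr(begin, at - begin));
        if (at == std::string::npos) {
            break;
        }
        begin = at + 1;
    }

    std::vector<std::string> emails;
    for (std::size_t i = 0; i + 1 < pieces.size(); ++i) {
        // user and server must both be non-empty.
        if (!pieces[i].empty() && !pieces[i + 1].empty()) {
            emails.push_back(pieces[i] + "@" + pieces[i + 1]);
        }
    }
    return emails;
}

// The example from the problem statement.
void checkPairAdjacentExample() {
    std::vector<std::string> expected = {"first@second", "second@third"};
    assert(pairAdjacent("first@second@third") == expected);
}
```

Each middle piece is used twice, once as a server and once as a user, which is why a run with n @ signs can yield up to n addresses.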

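Approach:
1. Read the file line by line with while (getline(istream&, string&)), storing each line in a string;
2. Scan the string for each @ and record its index. Let a start variable and an end variable extend outward from the two sides of the @, take the substring that satisfies the rules as the email address, and print it.

Here is my code: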

#include <iostream>
#include <string>

using namespace std;

bool isValid(char ch); // Whether ch may appear inside user or server
void solve(string str); // Find every '@' in the line and print the addresses around it
string split_email(int index, string str); // Extract the address around the '@' at index, or "" if there is none

int main(void) {
    string str;
    while(getline(cin, str)) { // getline() is required because a line may contain spaces
        solve(str);
    }
    return 0;
}
bool isValid(char ch) {
    return ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || 
           (ch >= '0' && ch <= '9') || (ch == '-') || (ch == '.') ||
           (ch == '_'));
}

void solve(string str) {
    for (int i = 0; i < str.size(); i++) {
        if (str[i] == '@') {
            string result = split_email(i, str);
            if (result != "") {
                cout << result << endl;
            }
        }
    }
}

string split_email(int index, string str) {
    int n = static_cast<int>(str.size());
    int start = index, end = index;
    // Extend left over the user part. A period is accepted only when the
    // character to its right is neither a period nor the '@' itself, so the
    // user never ends with a period and never contains two adjacent periods.
    while (start > 0 && isValid(str[start-1])) {
        if (str[start-1] == '.' && (str[start] == '.' || str[start] == '@')) {
            break;
        }
        start--;
    }
    if (start < index && str[start] == '.') { // the user cannot start with a period
        start++;
    }
    // Extend right over the server part, mirroring the rules above.
    while (end + 1 < n && isValid(str[end+1])) {
        if (str[end+1] == '.' && (str[end] == '.' || str[end] == '@')) {
            break;
        }
        end++;
    }
    if (end > index && str[end] == '.') { // the server cannot end with a period
        end--;
    }
    if (start == index || end == index) { // user and server cannot be empty
        return "";
    }
    return str.substr(start, end-start+1); // get the email
}

The above is just my own view. Criticism and suggestions are welcome, so let's discuss it together!

