Fighting Recruiter Spam with PHP: Proof of Concept

Ever since I moved off Google services (due to quality, not privacy concerns), I’ve been looking for the perfect email service. Having tried several, and having been with FastMail for a while now, I’ve come to realize that there’s no such thing. My biggest concern with modern email providers is that they are all quite bad at spam control.

I don’t mean the “Nigerian prince” type of spam, which is mostly blocked successfully (unless you’re using FastMail – they can’t even recognize those), but stuff that I’m really, really not interested in getting. Case in point: recruiter spam.

Illustration of blocked email

In this tutorial, we’ll get started with building a custom email processor which can read individual emails, run them through some predefined rules, and act on them. The end result will be very similar to what many providers offer out of the box, but it’ll lay the groundwork for more advanced aspects in future posts. Example uses of our app:


  • when recruiter-type keywords are detected, reply to the email with a template response and delete it. This is possible to some extent with the rules most email providers offer, but those aren’t very detailed and usually don’t support variables.
  • when companies keep sending you emails even after you unsubscribe or report them for spam (e.g. Ello), the engine should remember these senders and purge their emails automatically in the future. Some providers (e.g. FastMail) won’t stop a sender from getting into your inbox even after hundreds of spam reports.

This way, we can keep the provider we’re used to, but also make some manual improvements their team just didn’t know how to make.

In this post, we’ll focus on the first use case.


Bootstrapping

Feel free to use your own environment if you have one set up – we’ll be using our Homestead Improved box as usual to get a pre-made environment going in a matter of minutes. To follow along, execute the following:


git clone https://github.com/swader/homestead_improved hi_mail
cd hi_mail
./bin/folderfix.sh
vagrant up; vagrant ssh
mkdir -p Project/public

You should already have homestead.app in your /etc/hosts file if you’ve used Homestead Improved before. If not, add it as per the instructions. The default site included with the box points to ~/Code/Project, which is good enough for us.

Once inside the box, we’ll create an index.php file in ~/Code/Project/public with some demo code:


<?php
phpinfo();

Opening the site in a browser brings up the phpinfo screen, which immediately tells us what we need to know: is php-imap installed, or not?

screenshot of php-imap shown on the phpinfo screen

Sure enough, it comes pre-installed with the Homestead Improved box. If you’re missing php-imap, please follow the instructions to get it installed – we’ll need it before moving on (on Ubuntu, sudo apt-get install php7.0-imap should do the trick).

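If you’d rather check from a script than read through the phpinfo screen, something like the following should work. It’s a minimal sketch: the helper name missingExtensions is made up for this example, while extension_loaded() is a standard PHP function.

```php
<?php

/**
 * Hypothetical helper: returns the names of the required extensions
 * that are not loaded in the current PHP runtime.
 */
function missingExtensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }

    return $missing;
}

$missing = missingExtensions(['imap']);
if (empty($missing)) {
    echo "All required extensions are loaded.\n";
} else {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
}
```

Keep in mind that the CLI and the web server can load different php.ini files, so it’s worth running the check the same way the app itself will run.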

As a final bootstrapping step, let’s install the package we’ll use to interact with our IMAP inbox: tedivm/fetch.


composer require tedivm/fetch

Of course, we need to modify our index.php file to include Composer’s autoloader as well:


<?php

require_once '../vendor/autoload.php';

Reading IMAP Inboxes

I have both a Gmail account and a FastMail account. The examples below use both, so that we can show the differences between the two inboxes and apply tweaks as necessary to make our project provider-agnostic.

In PHP, built-in imap functions work like native file functions – you create a handle, and then pass it around into other functions. The API is old (really, really old!), so it only exists in this procedural form. This is why, whenever we can, we’ll use the Fetch library we installed previously.

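To make the “handle” idea concrete, here’s a rough sketch of what reading unread subjects looks like with only the native functions. The helper names (mailboxString, unseenSubjects) and the IMAP_USER / IMAP_PASS environment variables are assumptions made for this example; imap_open, imap_search, imap_headerinfo and imap_close are the built-in calls.

```php
<?php

/**
 * Hypothetical helper: builds the mailbox string that imap_open() expects,
 * e.g. "{imap.example.com:993/imap/ssl}INBOX".
 */
function mailboxString(string $host, int $port = 993, string $folder = 'INBOX'): string
{
    return sprintf('{%s:%d/imap/ssl}%s', $host, $port, $folder);
}

/**
 * Returns the raw subject headers of all unread messages, keyed by message
 * number, using only native imap_* calls. Subjects may still be MIME-encoded.
 */
function unseenSubjects(string $mailbox, string $user, string $password): array
{
    $imap = imap_open($mailbox, $user, $password); // this is the handle
    if ($imap === false) {
        throw new RuntimeException('Could not connect: ' . imap_last_error());
    }

    $subjects = [];
    $numbers = imap_search($imap, 'UNSEEN') ?: []; // false when nothing matches
    foreach ($numbers as $number) {
        $header = imap_headerinfo($imap, $number);
        $subjects[$number] = isset($header->subject) ? $header->subject : '';
    }
    imap_close($imap);

    return $subjects;
}

// Only connect when credentials are supplied via the (assumed) environment variables.
if (getenv('IMAP_USER') && getenv('IMAP_PASS')) {
    $mailbox = mailboxString('imap.googlemail.com');
    print_r(unseenSubjects($mailbox, getenv('IMAP_USER'), getenv('IMAP_PASS')));
}
```

Every call takes the handle as its first argument, much like fopen/fread/fclose. Fetch wraps the same calls behind objects, which is why the rest of the post sticks with it.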

Gmail – Basic Fetching

Let’s start with baby steps and log into our Gmail account. First, under account settings and in the Forwarding and POP/IMAP tab, make sure Enable IMAP is activated.


Enabled IMAP in Gmail
<?php

require_once '../vendor/autoload.php';

use Fetch\Server;

$server = new Server('imap.googlemail.com', 993);
$server->setAuthentication('account@gmail.com', 'password');


$messages = $server->getMessages();
/** @var $message \Fetch\Message */
foreach ($messages as $i => $message) {
    echo "Subject for {$i}: {$message->getSubject()}\n<br>";
}

Following the Fetch docs, we attempt to initiate a connection with our Gmail account. Unfortunately, if you have 2FA (two-factor authentication) activated, you’ll see the following error:

Exception requesting an application specific password for Gmail

This is easily rectified. We can go to our Google account’s app passwords page and generate one (select “Other” from the menu, give it an arbitrary name, and copy the password into the code). Now if we test things…


Gmail email subjects displayed on screen

Excellent – we got our Gmail emails. Now let’s hook into Fastmail.


FastMail – Basic Fetching

Similar to Gmail, FastMail also supports app passwords, but it requires them regardless of whether you use 2FA. Create one here.

Generating an app password for Fastmail

The values for FastMail are as follows:


$server = new Server('imap.fastmail.com', 993);
$server->setAuthentication('username@fastmail.com', 'password');

$messages = $server->getMessages();
/** @var $message \Fetch\Message */
foreach ($messages as $i => $message) {
    echo "Subject for {$i}: {$message->getSubject()}\n<br>";
}

Note that messages are fetched in a non-nested fashion. Even though your inbox might show a number like 100, the real number of emails retrieved might be much higher, because replies that the provider groups into one conversation for context are fetched as individual messages.

Both Gmail and Fastmail default to the Inbox folder, which is exactly what we want.


Targeted Emails

Depending on the number of emails in your inboxes, you may have noticed a huge performance issue – it took forever to fetch them! Obviously, this can’t work if we want to process incoming emails in a timely manner. After all, our goal is to process all emails that come in, and deal with them if possible.


Unfortunately, since the email protocols were designed back in the stone age of the internet, PHP’s imap functions give us no simple way to get a push notification when a new email arrives. There is another way, though.

IMAP supports searching, and this can include flag statuses. As per the docs, passing in “UNSEEN” should return all unread messages:


$messages = $server->search('UNSEEN');
A single unseen message from the Gmail inbox

Sure enough, our email stating that we’ve successfully created an app password for this very app we’re building is still unread, sitting in the inbox. Success, and the call was quite fast, too!

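Search criteria can also be combined, which helps if an inbox holds a lot of old unread mail. The sketch below builds such a string; the helper name unseenSince is made up for this example, and it assumes the server accepts the usual IMAP date format (day-month-year, e.g. 5-Jan-2017).

```php
<?php

/**
 * Hypothetical helper: builds an IMAP search string for unread messages
 * received on or after the given date.
 */
function unseenSince(DateTimeInterface $date): string
{
    return 'UNSEEN SINCE "' . $date->format('j-M-Y') . '"';
}

$criteria = unseenSince(new DateTime('2017-01-05'));
echo $criteria, "\n";

// With Fetch, the string would be passed along as $server->search($criteria).
```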

Scanning the Emails

Now that we know how to retrieve unread messages, it’s time to analyze them and perform some actions on them if they trigger our rule checks.


Let’s use the first example – getting rid of recruiter spam. Recruiter emails come in different shapes and sizes, so there is no absolute way to identify them all. Instead, we should rely on several indicators, each of which carries a point value; when the sum of the matched indicators reaches a given threshold, the email is flagged. For example, if 100 is the threshold required to mark an email as recruiter spam, we can produce the following table:

Rule      Value                       Pts
contains  finding IT opportunities    100
contains  PHP specialists?            80
contains  startups?                   10
contains  saw your profile on GitHub  50
contains  explore-group.com           100
from      @explorerec.com             100
contains  new position                20
contains  urgent(ly)? need            30
contains  huge plus                   15
contains  full-stack developer        30
contains  interviews?                 20
contains  CV                          60
contains  skills                      10
contains  candidates?                 20

I’d like to thank all my Twitter followers who sent in some recruiter spam email examples and helped build the above table.


The explore-group ones refer to a team of brutally persistent spammers, so their emails will automatically trigger a recruiter spam alert.


Values are regular expressions – this allows us to do partial matches, which is particularly useful in recognizing sender domains, or strings that may vary slightly, but are essentially the same, like “PHP specialist” and “PHP specialists”.

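Here’s a quick illustration of how those partial matches behave. The toRegex helper is a made-up name for this example; it only wraps a value in delimiters and adds the case-insensitive flag.

```php
<?php

// Wrap a rule value in delimiters and make it case-insensitive.
function toRegex(string $value): string
{
    return '/' . $value . '/i';
}

$pattern = toRegex('PHP specialists?');

// Singular form
var_dump(preg_match($pattern, 'We are hiring a PHP specialist') === 1);
// Plural form, different letter case
var_dump(preg_match($pattern, 'Looking for php SPECIALISTS today') === 1);
// Unrelated text
var_dump(preg_match($pattern, 'PHP generalists welcome') === 1);

// Sender domains are matched the same way, against the whole "from" string.
$from = toRegex('@explorerec\.com');
var_dump(preg_match($from, 'Some Recruiter <someone@explorerec.com>') === 1);
```

One thing to watch out for: very short values such as CV will also match inside longer words when the check is case-insensitive. Wrapping them in word boundaries (\bCV\b) is one way to tighten that up.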

For the sake of performance, it makes sense to check the rules in descending order of their point value. If a single 100-point rule matches, there’s no need to check the rest.
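The effect of the ordering is easy to see on a toy example. This is a sketch with made-up helper names (sortByPoints, rulesChecked); it sorts with usort() and counts how many rules are evaluated before the threshold is reached.

```php
<?php

/**
 * Hypothetical helper: sorts rules by points, highest first.
 */
function sortByPoints(array $rules): array
{
    usort($rules, function (array $a, array $b) {
        return $b['points'] <=> $a['points'];
    });

    return $rules;
}

/**
 * Returns how many rules had to be checked before the threshold was reached
 * (or the total number of rules if it never was).
 */
function rulesChecked(array $rules, string $text, int $threshold = 100): int
{
    $sum = 0;
    $checked = 0;
    foreach ($rules as $rule) {
        $checked++;
        if (preg_match('/' . $rule['contains'] . '/i', $text)) {
            $sum += $rule['points'];
        }
        if ($sum >= $threshold) {
            break;
        }
    }

    return $checked;
}

$rules = [
    ['contains' => 'skills', 'points' => 10],
    ['contains' => 'interviews?', 'points' => 20],
    ['contains' => 'finding IT opportunities', 'points' => 100],
];
$text = 'I specialise in finding IT opportunities for people with your skills.';

echo rulesChecked($rules, $text), "\n";               // rules in their original order
echo rulesChecked(sortByPoints($rules), $text), "\n"; // highest points first
```

With only a handful of rules the saving is tiny, but every check runs a regex over the whole HTML body, so it adds up once the rule list and the inboxes grow.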

Let’s see the code for this. Please forgive the code’s spaghetti nature – as this is just a proof of concept, it’ll be OOP-ed and packaged up in a followup article.


<?php

require_once '../vendor/autoload.php';

use Fetch\Message;
use Fetch\Server;

$inboxes = [
    'primary@gmail.com' => [
        'username' => 'primary@gmail.com',
        'password' => 'password',
        'aliases' => ['onealias@gmail.com', 'anotheralias@gmail.com'],
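        'smtp' => 'smtp.googlemail.com',
        'smtp_port' => '587', // the reply code further down reads this key for every inbox
        'imap' => 'imap.googlemail.com',
        'starttls' => true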
    ],
    'primary@mydomain.com' => [
        'username' => 'someusername',
        'password' => 'password',
        'aliases' => ['alias@mydomain.com'],
        'smtp' => 'smtp.fastmail.com',
        'smtp_port' => '587',
        'imap' => 'imap.fastmail.com',
        'starttls' => true
    ]
];

$rules = [
    ['contains' => 'finding IT opportunities', 'points' => 100],
    ['contains' => 'PHP specialists?', 'points' => 80],
    ['contains' => 'startups?', 'points' => 10],
    ['contains' => 'saw your profile on GitHub', 'points' => 50],
    ['contains' => 'explore-group\.com', 'points' => 100],
    ['from' => '@explorerec\.com', 'points' => 100],
    ['contains' => 'new position', 'points' => 20],
    ['contains' => 'urgent(ly)? need', 'points' => 30],
    ['contains' => 'huge plus', 'points' => 15],
    ['contains' => 'full-stack developer', 'points' => 30],
    ['contains' => 'interviews?', 'points' => 20],
    ['contains' => 'CV', 'points' => 60],
    ['contains' => 'skills', 'points' => 10],
    ['contains' => 'candidates?', 'points' => 20],
];

$points = [];
foreach ($rules as $key => &$rule) {
    $points[$key] = $rule['points'];
    if (isset($rule['contains'])) {
        $rule['contains'] = '/' . $rule['contains'] . '/i';
    }
    if (isset($rule['from'])) {
        $rule['from'] = '/' . $rule['from'] . '/i';
    }
}
unset($rule); // drop the reference left behind by the by-reference foreach
array_multisort($points, SORT_DESC, $rules);

$unreadMessages = [];
foreach ($inboxes as $id => $inbox) {
    $server = new Server($inbox['imap'], 993);
    $server->setAuthentication($inbox['username'], $inbox['password']);
    $unreadMessages[$id] = $server->search('UNSEEN');
}

foreach ($unreadMessages as $id => $messages) {
    echo "Now processing: ".$id. "<br>";
    /**
     * @var Message $message
     */
    foreach ($messages as $i => $message) {

        $spam = isRecruiterSpam($rules, $message) ? '' : 'not';
        echo "Subject for {$i}: {$message->getSubject()} is probably {$spam} recruiter spam.\n<br>";
    }
}

function isRecruiterSpam($rules, Message $message)
{
    $sum = 0;
    foreach ($rules as $rule) {
        if (isset($rule['contains'])) {
            if (preg_match($rule['contains'], $message->getSubject())
                || preg_match($rule['contains'], $message->getHtmlBody())
            ) {
                $sum += $rule['points'];
            }
        } else {
            if (isset($rule['from'])) {
                if (preg_match($rule['from'], $message->getOverview()->from)
                ) {
                    $sum += $rule['points'];
                }
            }
        }
        if ($sum > 99) {
            return true;
        }
    }

    return false;
}

First, we define our inboxes and all the necessary configuration values. Then, we sort the rules array by the value of the points key, and turn the strings into regexes by adding delimiters. Next, we extract all the unseen messages from all our accounts, and then iterate through them.


At this point, we call the isRecruiterSpam function on each message, which in turn grabs the from field, the subject, and the HTML body, and runs the checks on them. After every rule, we check whether $sum has reached 100 points, and if so, we return true – we’re fairly certain the message is recruiter spam at that point. Otherwise, we keep summing up, and finally return false if all the rules have been checked and the result is still under 100.
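When a message is (or isn’t) flagged, it helps to see which rules contributed to the score. Here’s a small debugging sketch that works on a plain string instead of a Fetch message; the explainScore name and the sample text are made up for this example.

```php
<?php

/**
 * Hypothetical debugging helper: returns every rule that matched the text
 * as [pattern => points], so the final score can be explained.
 */
function explainScore(array $rules, string $text): array
{
    $matched = [];
    foreach ($rules as $rule) {
        if (preg_match('/' . $rule['contains'] . '/i', $text)) {
            $matched[$rule['contains']] = $rule['points'];
        }
    }

    return $matched;
}

$rules = [
    ['contains' => 'PHP specialists?', 'points' => 80],
    ['contains' => 'new position', 'points' => 20],
    ['contains' => 'huge plus', 'points' => 15],
];
$body = 'We have a new position for a PHP specialist in our Berlin office.';

$matched = explainScore($rules, $body);
foreach ($matched as $pattern => $points) {
    echo "{$pattern}: {$points}\n";
}
echo 'Total: ' . array_sum($matched) . "\n";
```

Here two rules match (80 + 20), which lands exactly on the threshold of 100, so this sample would be flagged.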

No messages have been detected as recruiter spam

In my initial test, no messages were flagged as recruiter spam. Let’s try forwarding a past one over from the other email account, and see what happens.


Two messages have been flagged as recruiter spam!

Success! Our engine has successfully recognized recruiter spam! Now, let’s see about replying.


Sending the Replies

To reply to a message, we’ll need to pull in another package. Let’s make it SwiftMailer, as it’s the de-facto battle-tested standard in sending emails from PHP.


composer require swiftmailer/swiftmailer

We won’t be going through the very basics of SwiftMailer here; that’s documented elsewhere.

Let’s think about what needs to be done now:


  1. A message, once read and identified as recruiter spam, needs to be marked as read; otherwise it’ll keep getting picked up in subsequent searches.
  2. When replying, the reply should come from the email address the original was sent to.
  3. Ideally, an auto-replied message should be moved into another folder on the server. This is useful for periodically checking for false positives.

With the requirements defined, let’s see what the code might look like.


foreach ($unreadMessages as $id => $messages) {
    echo "Now processing: " . $id . "<br>";
    if (!empty($messages)) {
        $mailer = Swift_Mailer::newInstance(
            Swift_SmtpTransport::newInstance(
                $inboxes[$id]['smtp'], $inboxes[$id]['smtp_port'],
                (isset($inboxes[$id]['starttls'])) ? 'tls' : null
            )
                ->setUsername($inboxes[$id]['username'])
                ->setPassword($inboxes[$id]['password'])
                ->setStreamOptions(
                    [
                        'ssl' => [
                            'allow_self_signed' => true,
                            'verify_peer' => false,
                        ],
                    ]
                )

        );
    } else {
        continue;
    }
    /**
     * @var Message $message
     */
    foreach ($messages as $i => $message) {

        if (isRecruiterSpam($rules, $message)) {

            $message->setFlag(Message::FLAG_SEEN);

            $potentialSender = $message->getAddresses('to')[0]['address'];
            $sender = (in_array($potentialSender, $inboxes[$id]['aliases']))
                ? $potentialSender : $inboxes[$id]['aliases'][0];

            $reply = Swift_Message::newInstance('Re: ' . $message->getSubject())
                ->setFrom($sender)
                ->setTo($message->getAddresses('from')['address'])
                ->setBody(
                    file_get_contents('../templates/recruiter.html'),
                    'text/html'
                );

            $result = $mailer->send($reply);
        }

    }
}

In a nutshell:


  • if an inbox under $unreadMessages has a non-empty array (meaning it has unread messages to check), we initialize a Mailer for it. This is for performance reasons: if we have many inboxes, we don’t want to build a Mailer instance for the ones with nothing to process.
  • we iterate through the unread messages with the Mailer prepared, and build an email reply for each one identified as recruiter spam. For the recipient we select the original sender, and for the sender we select either the first address in the original email’s list of recipients, if it’s among the aliases defined for this inbox, or, if not, the first alias from the list. This is because some recruiters are so lazy they’ll mass mail a thousand people, put themselves as the recipient, and put the “victims” in BCC.
  • finally, we grab the contents of a prepared template email to inject as the email’s body, and send.
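The sender selection from the second point can be pulled out into a small function, which makes it easy to try without a live inbox. This is a sketch: pickReplySender is a made-up name, the recipient list is assumed to have the same shape the loop above reads from getAddresses('to'), and unlike the in_array check it compares addresses case-insensitively.

```php
<?php

/**
 * Hypothetical helper mirroring the sender selection described above.
 *
 * @param array $recipients list of ['address' => ..., 'name' => ...] entries
 * @param array $aliases    addresses configured for the inbox (must not be empty)
 */
function pickReplySender(array $recipients, array $aliases): string
{
    $candidate = isset($recipients[0]['address']) ? $recipients[0]['address'] : '';

    // Case-insensitive comparison: a deliberate deviation from in_array().
    foreach ($aliases as $alias) {
        if (strcasecmp($alias, $candidate) === 0) {
            return $alias;
        }
    }

    // The first recipient isn't one of ours (e.g. we were in BCC), so fall back.
    return $aliases[0];
}

$aliases = ['onealias@gmail.com', 'anotheralias@gmail.com'];

echo pickReplySender([['address' => 'AnotherAlias@gmail.com', 'name' => 'Me']], $aliases), "\n";
echo pickReplySender([['address' => 'recruiter@example.com', 'name' => 'R']], $aliases), "\n";
```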

At this point, we need to actually write the email template:


<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Recruiter reply</title>
</head>
<body>
<p>Hello!</p>
<p>Please do not be alarmed by the quick reply - it's automated.</p>
<p>Based on your email's contents, you sound like a recruiter. As such, before getting back in touch with me, please read <a href="https://www.linkedin.com/pulse/20140516082146-67624539-dear-recruiters?trk=prof-post">this</a>.</p>
<p>In case we misidentified your intentions and your email, we apologize - our bot is still young and going through some growing pains. If that's the case and you have a genuine non-recruiter concern you'd like to discuss, please feel free to reply directly to this email.</p>
<p>Kind regards,<br>Bruno</p>
</body>
</html>

Sure enough, our reply comes back as planned.


An automatic reply to recruiter spam
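Since variable support was one of the things missing from provider rules, here’s a rough idea of how the template could take some. It’s only a sketch: the {{name}} placeholder syntax and the renderTemplate helper are made up for this example and aren’t part of the template above.

```php
<?php

/**
 * Hypothetical helper: replaces {{placeholder}} tokens in a template.
 * Values are HTML-escaped because they come from an untrusted email.
 */
function renderTemplate(string $template, array $variables): string
{
    $replacements = [];
    foreach ($variables as $name => $value) {
        $replacements['{{' . $name . '}}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }

    // Nothing to replace: hand the template back untouched.
    if (empty($replacements)) {
        return $template;
    }

    return strtr($template, $replacements);
}

$template = '<p>Hello {{name}}!</p><p>This is an automated reply to "{{subject}}".</p>';

echo renderTemplate($template, [
    'name' => 'Jane',
    'subject' => 'PHP <specialists> needed',
]), "\n";
```

In the reply loop, the output of such a function would take the place of the plain file_get_contents() result passed to setBody().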

Whitelisting

But what happens if the recruiter replies to our reply? You might think it’ll get an auto-reply again, since email chains contain quoted emails from before. Not so! It’s the email providers who connect emails into chains – the actual reply contains nothing but our own content, so there’s no reason to worry about someone getting stuck in a reply loop with our automatic engine.


Still, we can take care of this edge case to be on the safe side, by putting some extra content at the bottom of our template, e.g.:


...

<p>Kind regards,<br>Bruno</p>
<p style="text-align: right"><em>sent via our-little-app</em></p>
</body>
</html>

We can target this text in our rules like so:


Rule      Value                    Pts
contains  sent via our-little-app  -1000

… but then we’d have to remove the early-detection mechanism which triggers a positive identification as soon as 100 points are reached, and it’d be clumsy to keep these whitelist rules in the same set as the blacklist ones.


It’s better if we just make a brand new detection mechanism for skipping blacklist scans entirely. It’ll be a separate loop that also checks body, subject, and headers, but the performance impact will be negligible because Fetch caches the message’s properties for repeat calls (a fetched message’s content cannot change – if it changes, it’s a new message – only flags can change).


We’ll make a new isWhitelisted function, and call it before we check the other rules.


if (!isWhitelisted($whitelistRules, $message)
            && isRecruiterSpam($rules, $message)
        ) {

// ...

function isWhitelisted($rules, Message $message) {
    foreach ($rules as $rule) {
        if (isset($rule['contains'])) {
            if (preg_match($rule['contains'], $message->getSubject())
                || preg_match($rule['contains'], $message->getHtmlBody())
            ) {
                return true;
            }
        } else {
            if (isset($rule['from'])) {
                if (preg_match($rule['from'], $message->getOverview()->from)
                ) {
                    return true;
                }
            }
        }
    }
    return false;
}

You’ll notice it’s almost identical to isRecruiterSpam, only there are no points.


Naturally, we also need the $whitelistRules array, which at this point is fairly small:


$whitelistRules = [
    ['contains' => '/sent via our-little-app/i'],
];

With this, we not only made sure that emails that contain our auto-reply are ignored, but we can also easily let through emails from people/domains we know and trust. Coupled with the Contacts API many providers provide, the power we now wield over our inboxes is truly immense.

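Letting a trusted domain through could look something like this. It’s a sketch: trustedDomainRule is a made-up helper and example.com is a placeholder domain. The rule uses the same from key that isWhitelisted already checks.

```php
<?php

/**
 * Hypothetical helper: builds a whitelist rule for a trusted sender domain.
 * preg_quote() escapes the dots, and the lookahead stops the pattern from
 * matching longer domains such as example.com.evil.org.
 */
function trustedDomainRule(string $domain): array
{
    return ['from' => '/@' . preg_quote($domain, '/') . '(?![\w.-])/i'];
}

$whitelistRules = [
    ['contains' => '/sent via our-little-app/i'],
    trustedDomainRule('example.com'),
];

$rule = $whitelistRules[1];

// A sender on the trusted domain
var_dump(preg_match($rule['from'], 'Jane Doe <jane@example.com>') === 1);
// A lookalike domain that merely starts with the trusted one
var_dump(preg_match($rule['from'], 'Spammer <x@example.com.evil.org>') === 1);
```

Bear in mind that the From header is trivially spoofed, so a rule like this is a convenience rather than a security measure.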

Folders

The final requirement for our proof of concept was moving the messages we auto-replied to into a separate folder. The IMAP API has folder support, and Fetch has implemented it rather smoothly, so it’s only a matter of a couple of lines of code.

First, we’ll make a folder named “autoreplied” in each inbox if it doesn’t already exist.

// Inside the foreach ($inboxes as $id => $inbox) loop:
$server = new Server($inbox['imap'], 993);
$server->setAuthentication($inbox['username'], $inbox['password']);

// Create the target folder on first run
if (!$server->hasMailBox('autoreplied')) {
    $server->createMailBox('autoreplied');
}

$unreadMessages[$id] = $server->search('UNSEEN');

Then, after a reply has been sent, we’ll move (copy and delete) the original message to this folder.

$result = $mailer->send($reply);
if ($result) {
    $message->moveToMailBox('autoreplied');
}

Sure enough, it moves the message into the newly created folder:


Message we autoreplied to has been marked as read and moved into a separate folder

Conclusion

In this tutorial, we touched on reading our inbox for recent messages, running them through some rules, and performing some actions on them based on what the rules said. We now have a way to filter our inboxes programmatically!


By now, you’ve probably noticed several problems with this implementation, some of which may be:


  • there’s no way to dynamically define rules or whitelists; you must change the code. A database would come in handy, as would a log-in system with a CRUD interface of sorts.
  • there’s no caching whatsoever, so every call is quite slow.
  • as we come up with more rules and conditions, the isRecruiterSpam function will become more and more complex, finally reaching the point of total chaos. This is something we need to fix if we want a flexible, scalable, dynamic system – especially if we want to identify more types of emails than just recruiter spam!
  • adding more functionality to this app is tedious at best – we’re breaking every SOLID principle with this code, and need to refactor. Ideally, we want the app to be usable by multiple users at once. Not only that, but we also want to share some training data between users, for better spam protection.
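As a small first step towards the first point, the rules could live outside the code, for example in a JSON file, well before a database enters the picture. This is a sketch under that assumption; loadRules and the JSON layout are made up for this example.

```php
<?php

/**
 * Hypothetical loader: decodes a JSON list of rules and keeps only entries
 * that have integer "points" and a "contains" or "from" pattern.
 */
function loadRules(string $json): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Rules must be a JSON array: ' . json_last_error_msg());
    }

    $rules = [];
    foreach ($decoded as $entry) {
        if (!is_array($entry) || !isset($entry['points']) || !is_int($entry['points'])) {
            continue; // skip malformed entries
        }
        if (isset($entry['contains']) || isset($entry['from'])) {
            $rules[] = $entry;
        }
    }

    return $rules;
}

$json = '[
    {"contains": "PHP specialists?", "points": 80},
    {"from": "@explorerec\\\\.com", "points": 100},
    {"points": 10}
]';

print_r(loadRules($json));
```

In practice the string would come from something like file_get_contents() on a rules file, and the loaded array would go through the same delimiter-wrapping and sorting as the hardcoded $rules.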

We’ll deal with all of these problems in subsequent articles – now that we know everything we’ll need and our proofs of concept work, we can clean up the code and turn it all into something worth the effort.


In a followup, we’ll turn our spaghetti script experiment into a multi-user app by properly designing and structuring it. We’ll also power it up with a cronjob, and start building a proper rule engine. Stay tuned!


Translated from: https://www.sitepoint.com/fighting-recruiter-spam-with-php-proof-of-concept/
