

If you write original content day in and day out, you already know that your posts can end up on a bunch of spam sites within a few days, sometimes within minutes. Some users have even noted that a site with stolen content outranked the original post. As a website owner, it is very frustrating to see someone stealing your content without permission, monetizing it, outranking you in SERPs, and stealing your audience. Content scraping is a huge problem these days because it is so easy for someone to steal your content. In this article, we will cover what blog content scraping is, how to catch content scrapers, how to deal with them, how you can reduce and prevent content scraping, how to take advantage of it and make money from content scrapers, and whether content scraping is ever good.

What is Blog Content Scraping?

Blog content scraping is usually performed with scripts that extract content from numerous sources and pull it into one site. It is now so easy that anyone can install a WordPress site, add a free or commercial theme, and install a few plugins that will scrape content from selected blogs and publish it on their own site.

Why are they Stealing my Content?

Some of our users have asked us, "Why are they stealing my content?" The simple answer is because you are AWESOME. The truth is that these content scrapers have ulterior motives. Below are just a few reasons why someone would scrape your content:

  • Affiliate commission – There are some dirty affiliate marketers out there who just want to exploit the system to make a few extra bucks. They will use your content and other people's content to bring traffic to their site through search engines. These sites are usually targeted towards a specific niche, so they have related products that they are promoting.
  • Lead Generation – Often we see lawyers and realtors doing this. They want to seem like industry leaders in their small communities. They do not have the bandwidth to produce quality content, so they go out and scrape content from other sources. Sometimes they are not even aware of this, because they are paying some scumbag $30/month to add content and help them get better SEO. We have encountered quite a few of these in the past.
  • Advertising Revenue – Some folks just want to create a “hub” of knowledge, a one-stop shop for users in a specific niche. If we had a penny for every time someone has done this with our content, we would have a few hundred pennies. Often we notice that our site content is being scraped, and the scraper always replies, “I was doing this for the good of the community.” Except the site is plastered with ads.

These are just a few reasons why someone would steal your content.


How to Catch Content Scrapers?

Catching content scrapers is a tedious task and can take up a lot of time. There are a few ways you can catch content scrapers.

Search Google with Your Post Titles


Yup, that is as painful as it sounds. This method is probably not worth it, especially if you are writing about a very popular topic.
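If you do want to try the title search, a small helper can at least take the typing out of it. The following is a minimal sketch, not an official tool: the function name `scraper_search_url` and the domain `example.com` are hypothetical, and it only builds a search URL that quotes the title exactly and excludes your own site with the `-site:` operator. You still have to open the result and read through it yourself.

```python
from urllib.parse import quote_plus


def scraper_search_url(post_title: str, own_domain: str) -> str:
    """Build a Google search URL for an exact post title, excluding your own site."""
    # Quoting the title asks for an exact-phrase match; -site: hides your own pages.
    query = f'"{post_title}" -site:{own_domain}'
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # example.com stands in for your own blog's domain.
    print(scraper_search_url("How to Catch Content Scrapers", "example.com"))
```

For a popular topic the results will still be noisy, which is why this method rarely pays off.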



If you add internal links in your posts, you will receive a trackback when a site steals your content. This is pretty much the scraper telling you that they are scraping your content. If you are using Akismet, a lot of these trackbacks will show up in the SPAM folder. Again, this only works if you have internal links in your posts.

Webmaster Tools


If you use Google Webmaster Tools, then you are probably aware of the “Links to your site” page. If you look under “Traffic”, you will see a page that says “Links to your site”. Chances are your scrapers will be among the top ones there. They will have hundreds, if not thousands, of links to your pages (assuming that you have internal links).


FeedBurner Uncommon Uses


If you have set up FeedBurner for your WordPress blog, then you can see some uncommon uses. In the Analyze tab under Feed Stats, you will see “Uncommon Uses”. There you will see a list of sites.

How to Deal with Content Scrapers

There are a few approaches that people take when dealing with content scrapers: the Do Nothing approach, the Kill Them All approach, and the Take Advantage of Them approach.

The Do Nothing Approach


This is by far the easiest approach you can take. The most popular bloggers usually recommend it because fighting the scrapers takes A LOT of time. This approach simply recommends that “instead of fighting them, spend your time producing even more quality content and having fun”. Now obviously, well-known blogs like Smashing Magazine, CSS-Tricks, Problogger, and others do not have to worry about it. They are authority sites in Google’s eyes.

However, during the Panda Update, we know some good sites got flagged as scrapers because Google treated the scrapers' copies as the original content. So in our opinion, this approach is not always the best.

The Kill Them All Approach


This is the exact opposite of the “Do Nothing Approach”. In this approach, you simply contact the scraper and ask them to take the content down. If they refuse to do so or simply do not reply to your requests, then you file a DMCA (Digital Millennium Copyright Act) complaint with their host. In our experience, the majority of scraping websites do not have a contact form available. If they do, then use it. If they do not, then you need to do a Whois lookup.

Whois Lookup

You can see the contact info under the administrative contact. Usually the administrative and technical contacts are the same. The Whois record also shows the domain registrar. Most well-known web hosting companies and domain registrars have DMCA forms or email addresses. In the lookup we ran, you can tell that this specific person is with HostGator because of their nameservers, and HostGator has a form for DMCA complaints. If the nameserver is something like ns1.theirdomain.com, then you have to dig deeper by doing reverse IP lookups and searching for IPs.
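The nameserver check above is easy to express as a rule of thumb. The sketch below is only an illustration under our own assumptions: the function names are hypothetical, and it expects you to paste in the nameservers you read from the Whois record rather than fetching them itself. It flags the case where every nameserver sits under the scraper's own domain, which is the situation where the nameservers tell you nothing about the host.

```python
def is_vanity_nameserver(nameserver: str, domain: str) -> bool:
    """Return True when the nameserver lives under the looked-up domain itself."""
    ns = nameserver.strip().rstrip(".").lower()
    dom = domain.strip().rstrip(".").lower()
    return ns == dom or ns.endswith("." + dom)


def needs_reverse_ip_lookup(nameservers: list[str], domain: str) -> bool:
    """True when no nameserver points at an outside hosting company."""
    return bool(nameservers) and all(
        is_vanity_nameserver(ns, domain) for ns in nameservers
    )


if __name__ == "__main__":
    # Hypothetical values copied by hand from a Whois record.
    print(needs_reverse_ip_lookup(
        ["ns1.theirdomain.com", "ns2.theirdomain.com"], "theirdomain.com"
    ))
```

When this returns False, the nameserver names themselves usually point to the hosting company you should contact.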

You can also use a third-party service such as DMCA.com for takedowns.


Jeff Starr, in his article, suggests that you block the bad guys' IPs. Check your access logs for their IP address, and then block it with something like this in your root .htaccess file:

# Placeholder address; substitute the scraper's IP from your logs
Deny from 192.0.2.10
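Finding the address in the first place is the tedious part. As a rough sketch, assuming your server writes access logs in the common or combined log format and that your feed lives under `/feed` (both are assumptions, and the function name is hypothetical), you could tally feed requests per IP and look at the heaviest hitters:

```python
import re
from collections import Counter

# Common/combined log format starts with: IP identity user [timestamp] "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)')


def top_feed_clients(log_lines, feed_prefix="/feed", limit=10):
    """Count requests per IP for paths starting with feed_prefix, busiest first."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(feed_prefix):
            counts[match.group(1)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # "access.log" is a placeholder path; use wherever your host stores the log.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for ip, hits in top_feed_clients(handle):
            print(ip, hits)
```

A busy address is not proof of scraping, since legitimate feed readers also poll your feed regularly, so check who owns an IP before you block it.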

You can also redirect them to a dummy feed by doing something like this:


RewriteEngine On
# Placeholder range; substitute the scraper's IP prefix
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* http://dummyfeed.com/feed [R,L]

You can get really creative here, as Jeff suggests. Send them to really large text feeds full of Lorem Ipsum. You can send them some disgusting images of bad things. You can also send them right back to their own server, causing an infinite loop that will crash their site.

The last approach, and the one we take, is to take advantage of them.


How to Take Advantage of Content Scrapers

This is our approach to dealing with content scrapers, and it turns out quite well. It helps our SEO and helps us make extra bucks. The majority of scrapers use your RSS feed to steal your content, so these are some of the things that you can do:

  • Internal Linking – You need to interlink the CRAP out of your posts. With the internal linking feature in WordPress 3.1, it is now easier than ever. Internal links in your articles help you increase pageviews and reduce the bounce rate on your own site. Secondly, they get you backlinks from the people who are stealing your content. Lastly, they allow you to steal their audience. If you are a talented blogger, then you understand the art of internal linking. You have to place your links on interesting keywords and make it tempting for the user to click them. If you do that, then the scraper’s audience will click on them too. Just like that, you took a visitor from their site and brought them back to where they should have been in the first place.
  • Auto Link Keywords with Affiliate Links – Plugins such as Ninja Affiliate and SEO Smart Links will automatically replace assigned keywords with affiliate links. For example: HostGator, StudioPress, MaxCDN, Gravity Forms << These all will be auto-replaced with affiliate links when this post goes live.
  • Get Creative with RSS Footer – You can use either the RSS Footer plugin or the WordPress SEO by Yoast plugin to add custom items to your RSS footer. You can add just about anything you want here. We know some people who like to promote their own products to their RSS readers, so they add banners. Guess what, now those banners will appear on the scrapers' websites as well. In our case, we always add a little disclaimer at the bottom of our posts in our RSS feeds. It simply reads like “How to Put Your WordPress Site in Read Only State for Site Migrations and Maintenance is a post from: WPBeginner which is not allowed to be copied on other sites.” By doing this, we get a backlink to the original article from the scraper’s site, which lets Google and other search engines know we are the authority. It also lets their users know that the site is stealing our content. If you are good with code, you can go totally nuts, such as adding related posts just for your RSS readers and a bunch of other stuff. Check out our guide to completely manipulating your WordPress RSS feed.
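To make the RSS footer idea concrete, here is a minimal sketch of what such a footer amounts to. On a WordPress site the plugins mentioned above do this for you; the Python below is only an illustration, and the function names and the `example.com` URLs are hypothetical. It builds the disclaimer with a link back to the original post and appends it to an item's HTML, which is the part a scraper ends up republishing.

```python
from html import escape


def feed_footer(post_title: str, post_url: str, site_name: str, site_url: str) -> str:
    """Build a disclaimer paragraph linking back to the original post and site."""
    return (
        f'<p><a href="{escape(post_url, quote=True)}">{escape(post_title)}</a> '
        f'is a post from: <a href="{escape(site_url, quote=True)}">{escape(site_name)}</a> '
        "which is not allowed to be copied on other sites.</p>"
    )


def with_footer(item_html: str, footer_html: str) -> str:
    """Append the footer to the HTML body of one feed item."""
    return item_html + "\n" + footer_html


if __name__ == "__main__":
    footer = feed_footer(
        "My Post Title", "https://example.com/my-post/", "Example Blog", "https://example.com/"
    )
    print(with_footer("<p>Post body goes here.</p>", footer))
```

Because the link travels inside the item body, a scraper that republishes the feed verbatim carries the attribution along with it.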

How You Can Reduce Blog Content Scraping and Possibly Prevent It

If you take our approach of heavy internal linking, affiliate links, RSS banners, and so on, chances are that you will reduce content scraping by a good measure. If you take Jeff Starr’s suggestion of redirecting content scrapers, that too will stop those scrapers. Aside from what we have shared above, there are a few other tricks that you can use.

Full vs. Summary RSS Feed

There has been a debate in the blogging community over whether to offer a full RSS feed or a summary RSS feed. We are not going to go into much detail about that debate; however, one of the PROS of having a summary-only RSS feed is that it limits content scraping, because scrapers only get an excerpt. You can change the setting by going to your WordPress admin panel and opening Settings » Reading. Then set “For each article in a feed, show” to Summary.
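If you are not sure which mode your feed is currently in, you can inspect the feed itself. The sketch below rests on an assumption about how WordPress builds its RSS 2.0 feed: full-text items carry a `content:encoded` element, while summary items only have a `description`. Treat it as a heuristic rather than a guarantee; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def feed_exposes_full_text(rss_xml) -> bool:
    """Return True if any <item> carries a non-empty <content:encoded> element."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        body = item.find(CONTENT_ENCODED)
        if body is not None and (body.text or "").strip():
            return True
    return False


if __name__ == "__main__":
    # "feed.xml" is a placeholder for a saved copy of your own feed.
    with open("feed.xml", "rb") as handle:
        print(feed_exposes_full_text(handle.read()))
```

If it returns True after you switch to Summary, a plugin or theme may be adding the full content back into the feed.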

Note: We have full feed because we care more about our RSS readers than the spammers.


Trackback SPAM

Trackbacks and pingbacks definitely had great uses; however, they are now constantly being abused. Themes often display trackbacks and pingbacks under or among the comments. This gives the spammer an incentive to scrape your site and send trackbacks. If you mistakenly approve one, then they get a backlink and a mention from your site. You can disable trackbacks on all future posts, and you can disable trackbacks and pings on existing WordPress posts as well.

Is Content Scraping Ever Good?

It can be. If you see that you are making money from the scraper’s site, then sure, it can be. If you see a lot of traffic from a scraper’s site, then it can be. In most cases, however, it is not. You should always try to get your content taken down. But as your blog gets larger, you will realize that it is almost impossible to keep track of all content scrapers. We still send out DMCA complaints; however, we know that there are tons of other sites stealing our content that we just cannot keep up with.

What are your thoughts? Do you use any other methods to prevent content scraping? We would love to hear from you.

Translated from: https://www.wpbeginner.com/beginners-guide/beginners-guide-to-preventing-blog-content-scraping-in-wordpress/





