SEO vs. React: Web Crawlers are Smarter Than You Think

by Patrick Hund

Many people still worry that if you build a website using tools like React, Angular, or Ember, it will hurt your search engine ranking.

The thinking goes something like this: the web crawlers that search engines use won’t be able to crawl a page properly unless it’s completely rendered in the user’s browser. Instead, they’ll only see the HTML code delivered from the backend.

If that HTML code contains nothing more than a couple of meta tags and script tags, the search engine will assume your page is basically blank and rank you poorly.

I commonly see Search Engine Optimization (SEO) consultants recommend that you render your page on the backend, so that web crawlers can see a lot of nice HTML code that they can then index.

To me, this advice seems unreasonable and unrealistic. It’s 2016. Users expect pages to be dynamic and provide them with a snappy user experience. They don’t want to wait for a new HTML page to load every time they click on something.

So is the statement “client-side rendering hurts your page rank” still valid?

Doing The Research

First, a disclaimer: I’m by no means an SEO expert. But I did read up on the topic a bit, and here’s what I found.

Here’s an announcement from Google on their webmaster blog from October 2015:

Today, as long as you’re not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers. To reflect this improvement, we recently updated our technical Webmaster Guidelines to recommend against disallowing Googlebot from crawling your site’s CSS or JS files.
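The condition in that announcement (not blocking Googlebot from your JavaScript or CSS files) can be checked against a robots.txt file. Here is a minimal TypeScript sketch under stated assumptions: the function names `parseRobots` and `isBlockedForGooglebot` are hypothetical, wildcard patterns (`*` and `$`) are not supported, and the longest matching rule wins, with Allow winning ties. It is a simplification, not a full implementation of how Google evaluates robots.txt.

```typescript
// Simplified robots.txt check: is a path blocked for Googlebot?
// Assumptions: no wildcard (* or $) support in rule paths; the longest
// matching rule wins, and Allow wins ties.

type Rule = { allow: boolean; path: string };

function parseRobots(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let readingAgents = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!readingAgents) agents = []; // a new group starts here
      readingAgents = true;
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
    } else if (field === "allow" || field === "disallow") {
      readingAgents = false;
      if (value === "") continue; // an empty rule matches nothing
      for (const agent of agents) {
        groups.get(agent)!.push({ allow: field === "allow", path: value });
      }
    }
  }
  return groups;
}

function isBlockedForGooglebot(robotsTxt: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  // A group naming Googlebot takes priority over the catch-all group.
  const rules = groups.get("googlebot") ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best !== undefined && !best.allow;
}

// Hypothetical robots.txt that hides a script directory from all crawlers.
const robots = `
User-agent: *
Disallow: /assets/js/
Allow: /assets/js/public/
`;

console.log(isBlockedForGooglebot(robots, "/assets/js/bundle.js"));     // true
console.log(isBlockedForGooglebot(robots, "/assets/js/public/app.js")); // false
console.log(isBlockedForGooglebot(robots, "/index.html"));              // false
```

If the bundle that renders your page comes back as blocked, the crawler cannot run it, and the page is judged on its static HTML alone.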

Here’s a Search Engine Land article from May 2015:

We ran a series of tests that verified Google is able to execute and index JavaScript with a multitude of implementations. We also confirmed Google is able to render the entire page and read the DOM, thereby indexing dynamically generated content.

SEO signals in the DOM (page titles, meta descriptions, canonical tags, meta robots tags, etc.) are respected. Content dynamically inserted in the DOM is also crawlable and indexable. Furthermore, in certain cases, the DOM signals may even take precedence over contradictory statements in HTML source code. This will need more work, but was the case for several of our tests.

These two sources suggest that it is, indeed, safe to use a client-side rendered layout.

The Preactjs.com Test

I recently tweeted a lament about SEO consultants nitpicking about my beloved React. To be precise, I’m in the process of migrating to Preact, a lightweight alternative to Facebook’s React. I got this reply from Jason Miller, one of the developers working on Preact:

Aside from the blog article from Search Engine Land that I’ve quoted above, Jason tweeted a link to a Google search for the Preact home page, which looks like this:

This page is rendered entirely client-side, using Preact, as a look at its source code proves:

<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Preact: Fast 3kb React alternative with the same ES6 API. Components & Virtual DOM.</title>
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimal-ui">
<meta name="mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="format-detection" content="telephone=no">
<meta name="theme-color" content="#673AB8">
<link rel="manifest" href="/manifest.json">
<link rel="icon" type="image/png" href="/assets/app-icon-192.png" sizes="192x192">
<script>(function(url){window['_boostrap_'+url]=fetch(url);})('/content'+location.pathname.replace(/^\/(repl)?\/?$/, '/index')+'.md');</script>
<link rel="shortcut icon" href="/favicon.ico">
<link href="/style.6bae35e4ff9d687cb418.css" rel="stylesheet">
</head><body>
<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','//www.google-analytics.com/analytics.js','ga');ga('create', 'UA-6031694-20', 'auto');ga('send', 'pageview');</script>
<script type="text/javascript" src="/bundle.a0afd09fd48712ed0f26.js"></script>
</body></html>

If the Googlebot were not able to read the HTML code rendered by Preact, it wouldn’t show more than the contents of the meta tags.
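To make that concrete, here is a rough TypeScript sketch of what a crawler that does not execute JavaScript could extract from such a page. The `staticView` function is hypothetical and uses regular expressions rather than a real HTML parser, so treat it as an approximation; `shell` is a trimmed copy of the markup above.

```typescript
// Approximates what a crawler that does NOT run JavaScript can read:
// the <title> and whatever visible text sits in <body> once scripts
// and styles are removed. Regex-based, so only a rough approximation.

interface StaticView {
  title: string;
  bodyText: string;
}

function staticView(html: string): StaticView {
  const title = (/<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1] ?? "").trim();
  const body = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html)?.[1] ?? "";
  const bodyText = body
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop code and CSS
    .replace(/<[^>]+>/g, " ")                                // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return { title, bodyText };
}

// Trimmed copy of the preactjs.com markup shown above.
const shell = `<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Preact: Fast 3kb React alternative with the same ES6 API. Components & Virtual DOM.</title>
</head><body>
<script type="text/javascript" src="/bundle.a0afd09fd48712ed0f26.js"></script>
</body></html>`;

const view = staticView(shell);
console.log(view.title);           // the page title from the <head>
console.log(view.bodyText.length); // 0: no visible text without running the bundle
```

Anything a search result shows beyond the title and meta tags must therefore have come from executing the script bundle.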

And yet, here’s what the Google results look like when searching for site:preactjs.com:

Another article, by Andrew Farmer from March 2016, warns about the lack of JavaScript support in search engines other than Google:

In my research I couldn’t find any evidence that Yahoo, Bing, or Baidu support JavaScript in their crawlers. If SEO on these search engines is important to you, you’ll need to use server-side rendering, which I’ll discuss in a future article.

So I decided to try out Jason’s test with other search engines:

✅ Bing

Andrew’s warning regarding Bing seems unfounded. Here are the Bing results when searching for site:preactjs.com:

✅ Yahoo

And the Yahoo results when searching for site:preactjs.com:

✅ Duck Duck Go

And the Duck Duck Go results when searching for site:preactjs.com:

⚠️ Baidu

Chinese search engine Baidu does have problems with preactjs.com. Here are its results when searching for site:preactjs.com:

So it would seem that unless ranking high in what is essentially a China-only search engine is a priority for you, there’s nothing wrong with rendering your web pages on the client side using JavaScript, as long as you follow some basic rules (quoted from Andrew Farmer’s blog post):

  • Render your components before doing anything asynchronous.
  • Test each of your pages with Fetch as Google to ensure that the Googlebot is finding your content.
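One way to read the first rule is: put meaningful markup on the page synchronously, and only then start fetching data. The TypeScript sketch below illustrates that ordering. Everything in it is hypothetical: `Root` is a plain object standing in for a DOM mount point so the example runs without a browser, and `render` and `start` are illustration names, not Preact or React APIs.

```typescript
// Illustration of "render before doing anything asynchronous".
// Root is a stand-in for a DOM mount point (assumption for this sketch).

type Root = { html: string };

// Renders either a placeholder view (items === null) or the full list.
function render(root: Root, items: string[] | null): void {
  root.html =
    items === null
      ? "<h1>Articles</h1><p>Loading...</p>"
      : `<h1>Articles</h1><ul>${items.map((i) => `<li>${i}</li>`).join("")}</ul>`;
}

async function start(root: Root, loadItems: () => Promise<string[]>): Promise<void> {
  render(root, null);              // 1. synchronous first render
  const items = await loadItems(); // 2. asynchronous work starts afterwards
  render(root, items);             // 3. re-render once the data has arrived
}

const root: Root = { html: "" };
const finished = start(root, async () => ["First post", "Second post"]);

// The first render has already happened, before any promise resolves.
console.log(root.html); // <h1>Articles</h1><p>Loading...</p>

finished.then(() => console.log(root.html)); // now contains the <ul> with both posts
```

A crawler that takes its snapshot early then at least sees the heading and placeholder rather than an empty page.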

Thanks for reading!

Update 25th October 2016

Andrew Ingram ran the same tests that I ran and came to a different conclusion.

Quote from Andrew:

Here’s how many pages various search engines have indexed using the query “site:preactjs.com”

Google: 17 Bing: 6 Yahoo: 6 Baidu: 1

One of the Google results is an error page, but it presumably can’t be de-indexed automatically due to there not yet being a way of declaring a 404-equivalent in SPAs.

I’ve also read (I can’t recall where) that Google has a latency of a few days when it comes to indexing SPAs compared to server-rendered apps. This may not be a problem for you, but it’s worth knowing about.

His working hypothesis is that search engine robots other than Google are able to index client-side rendered pages, but not crawl them, i.e. follow links and index other pages of a website.

Follow the discussion on Hacker News

Acknowledgements

Thanks to Adam Audette (Search Engine Land) and Andrew Farmer for their excellent blog articles from which I quoted, Jason Miller for his input and inspiration, my colleagues from the eBay Classifieds Group for their support, and Quincy Larson of Free Code Camp for publishing this article!

Translated from: https://www.freecodecamp.org/news/seo-vs-react-is-it-neccessary-to-render-react-pages-in-the-backend-74ce5015c0c9/
