How to serve AJAX pages (Ember.js, Angular, etc) to Google's dummy bots?


There may / must be better ways but here we go.

Recipes: 

  1. Headless browser component, e.g.
    1. PhantomJS [1] is the preferred choice because it is 1) WebKit-based and 2) lightweight compared to the next option
    2. Firefox + Xvfb. I had to use this one because my site breaks under PhantomJS (even though it works fine under Chrome)
  2. Selenium to drive the browser and generate the HTML.
  3. Web server that serves the bots.
As defined by Google [2], an AJAX app should use #! in its URLs to tell the bots that the page is an AJAX page. A bot then requests the corresponding ?_escaped_fragment_= URL instead and expects a JavaScript-free page in response. So something has to run the JavaScript and generate the proper DOM for the dummy bots. This is where the headless browser comes in.
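The web server has to map the bot's ?_escaped_fragment_= request back to the #! URL that the headless browser should load. A minimal sketch of that mapping in Python follows; the function name to_hashbang_url and the example.com URLs are made up for illustration and are not taken from the code at [3].

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_hashbang_url(crawler_url):
    """Turn a crawler URL with ?_escaped_fragment_=... back into its #! form.

    Returns None when the URL carries no _escaped_fragment_ parameter,
    i.e. when the request does not look like it comes from a bot.
    """
    parts = urlsplit(crawler_url)
    fragment = None
    remaining = []
    # parse_qsl percent-decodes the values for us.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "_escaped_fragment_" and fragment is None:
            fragment = value
        else:
            remaining.append((key, value))
    if fragment is None:
        return None
    # Keep the other query parameters and re-attach the fragment after "#!".
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), "!" + fragment)
    )


if __name__ == "__main__":
    print(to_hashbang_url("http://example.com/?_escaped_fragment_=/posts/42"))
```

The web server would call something like this on every incoming request and only hand the result to the headless browser when it is not None.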

Xvfb is a special X server that runs on Linux (at least for me) and requires no interaction with graphics devices. It renders everything in memory, so it can easily run on headless servers such as Amazon EC2 Linux instances. Firefox is the de facto standard browser on Linux, works pretty well with Xvfb, and is the default driver for Selenium, so it's the obvious choice.
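For reference, starting Xvfb from Python could look roughly like the sketch below. The display number :99 and the 1280x1024x24 screen size are arbitrary assumptions, and the helper names xvfb_command and start_xvfb are hypothetical. The Xvfb binary must be installed for start_xvfb to work.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the command line for an in-memory X server on the given display."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def start_xvfb(display=":99"):
    """Start Xvfb and point this process (and its children) at it.

    Firefox launched afterwards picks up DISPLAY from the environment
    and renders into Xvfb's memory buffer instead of a real screen.
    """
    process = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display
    return process
```

Calling process.terminate() on the returned object shuts the virtual display down again.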

Selenium was designed for browser-based test automation. It can drive different browsers: Firefox (with built-in support), and Chrome and IE (both of which require an extra "driver"). Python has an API for Selenium, but there are also easier APIs such as Splinter, which is my choice.
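With Splinter, rendering a page into static HTML can be sketched as follows. This is not the code from [3]. The function render_html, the browser_factory parameter and the fixed two-second wait are assumptions made for this example, and a real setup would probably wait for a specific element instead of sleeping.

```python
import time


def render_html(url, browser_factory=None, settle_seconds=2.0):
    """Load url in a real browser and return the DOM as an HTML string.

    browser_factory is a hypothetical hook that lets the browser be swapped
    out; by default a Firefox instance is started through Splinter.
    """
    if browser_factory is None:
        # Imported here so the function can be exercised without Splinter.
        from splinter import Browser

        def browser_factory():
            return Browser("firefox")

    browser = browser_factory()
    try:
        browser.visit(url)
        # Crude assumption: give the AJAX app some time to finish rendering.
        time.sleep(settle_seconds)
        return browser.html
    finally:
        browser.quit()
```

On a headless server, DISPLAY has to point at a running Xvfb before Firefox is started.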

If every URL is simply forwarded to Firefox, a page loads 20 - 100x slower than it does when loaded directly in Firefox. This is because for each resource (CSS, JavaScript, images) the server actually starts a new Firefox tab (if not a window) to retrieve it, even though the first AJAX page load would already have fetched them once. To avoid this, a hack loads all static resources via Requests instead. Better optimisations are available, though.
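The hack can be sketched like this: decide by file extension whether a URL is a static resource, and fetch those with Requests rather than through the browser. The extension list and the function names are assumptions for illustration, and the actual code at [3] may decide differently.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions that never need a browser to be rendered.
STATIC_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg", ".woff",
}


def is_static_resource(url):
    """Return True when the URL path ends in a known static file extension."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower() in STATIC_EXTENSIONS


def fetch_static(url, timeout=10):
    """Fetch a static resource directly, bypassing the headless browser."""
    # Imported here so is_static_resource works without Requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "application/octet-stream")
    return response.content, content_type
```

The web server would then send only the URLs for which is_static_resource returns False to Firefox.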

Everything put together is at [3].

Good luck.

[1] PhantomJS  http://phantomjs.org 
