How glow.mozilla.org gets its data

Source: http://blog.mozilla.com/data/2011/03/22/how-glow-mozilla-org-gets-its-data/

As I’m sure you’ve heard by now, Firefox 4 has been officially released. The Metrics team did its part by working with webdev to release a new real-time download visualization:

World map visualizing real-time Firefox 4 downloads

http://glow.mozilla.org/

The basic backend flow is like this:

  1. The various load balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog server.
  2. The remote server runs rsyslog with a config that filters those remote syslog events into a dedicated file that rolls over hourly.
  3. SQLStream is installed on that server and tails those log files as they appear.
  4. The SQLStream pipeline does the following for each request:
    1. filters out anything other than valid download requests
    2. uses MaxMind GeoIP to get a geographic location from the IP address
    3. uses a streaming group by to aggregate the number of downloads by product, location, and timestamp
    4. every 10 seconds, sends a stream of counter increments to HBase: one row per timestamp, with a column qualifier for each distinct location that had downloads in that interval
  5. The glow backend is a Python app that pulls the data out of HBase using the Python Thrift interface and, every minute, writes a file containing a JSON representation of the data.
  6. That JSON file can be cached on the front end forever, since each minute of data has a distinct filename.
  7. The glow website pulls down that data and either plays back the downloads or lets you browse the geographic totals in the arc chart view.
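Step 4 ran inside SQLStream, whose pipeline code is not shown here. Purely as an illustration, the Python sketch below mimics sub-steps 1 to 3 of step 4 on an already-parsed, time-ordered stream of requests: it drops invalid requests, maps each IP to a location, and emits per-(product, location) counts for each 10-second window. The record shape `(unix_time, ip, product, status)`, the rule that a valid download is a request answered with HTTP 302, and the `HYPOTHETICAL_GEO` table standing in for a MaxMind GeoIP lookup are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record shape: (unix_time, client_ip, product, http_status)
Event = Tuple[int, str, str, int]

WINDOW_SECONDS = 10
VALID_STATUSES = {302}  # assumption: a counted download is a redirect to a mirror

# Hypothetical stand-in for a MaxMind GeoIP lookup (IP -> location key).
HYPOTHETICAL_GEO = {
    "203.0.113.5": "US-CA",
    "198.51.100.7": "DE-BE",
}


def lookup_location(ip: str) -> Optional[str]:
    return HYPOTHETICAL_GEO.get(ip)


def windowed_counts(
    events: Iterable[Event],
) -> Iterator[Tuple[int, Dict[Tuple[str, str], int]]]:
    """Yield (window_start, {(product, location): downloads}) per 10-second window.

    Assumes events arrive in timestamp order, as they would when tailing a log.
    """
    current_window: Optional[int] = None
    counts: Counter = Counter()
    for ts, ip, product, status in events:
        if status not in VALID_STATUSES:
            continue  # filter: not a valid download request
        location = lookup_location(ip)
        if location is None:
            continue  # this sketch simply drops IPs it cannot geolocate
        window = ts - ts % WINDOW_SECONDS
        if current_window is not None and window != current_window:
            # First event of a new window: the previous window is complete.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[(product, location)] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window
```

Emitting a window as soon as the first event of the next window arrives only works when events arrive in timestamp order; a real streaming engine also has to cope with late or out-of-order rows.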
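The last sub-step of step 4 writes those counts to HBase as counter increments, with one row per timestamp and one column qualifier per location. The sketch below shows that layout, assuming (hypothetically) one table per product and a column family named `loc`. `FakeCounterTable` is an in-memory stand-in whose `counter_inc(row, column, value)` method mirrors the shape of happybase's `Table.counter_inc`; the original pipeline sent its increments from SQLStream, not from Python.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple


class FakeCounterTable:
    """In-memory stand-in for an HBase table.

    counter_inc(row, column, value) mirrors the shape of happybase's
    Table.counter_inc; here the "table" is just a dict of cell counters.
    """

    def __init__(self) -> None:
        self.cells: DefaultDict[Tuple[bytes, bytes], int] = defaultdict(int)

    def counter_inc(self, row: bytes, column: bytes, value: int = 1) -> int:
        self.cells[(row, column)] += value
        return self.cells[(row, column)]


def send_increments(
    table: FakeCounterTable,
    window_start: int,
    location_counts: Dict[str, int],
    family: bytes = b"loc",  # hypothetical column family name
) -> None:
    """One row per 10-second timestamp; one column qualifier per location."""
    row = b"%010d" % window_start  # fixed width, so rows sort by time
    for location, n in location_counts.items():
        table.counter_inc(row, family + b":" + location.encode(), n)
```

Because each write is an increment rather than a put, two flushes for the same timestamp and location add up instead of overwriting each other. Zero-padding the timestamp to a fixed width keeps row keys in time order, so one minute of data is a contiguous row range.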
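Steps 5 and 6 depend on giving every minute its own file name. Here is a minimal sketch, assuming the six 10-second rows for a minute have already been read out of HBase into a dict (the Thrift read itself is omitted), and assuming a hypothetical `YYYYMMDDHHMM.json` naming scheme and payload layout:

```python
import json
import os
import time
from typing import Dict

# Hypothetical input shape: {window_start_unix_time: {location: count}}
Rows = Dict[int, Dict[str, int]]


def minute_filename(minute_start: int) -> str:
    """Hypothetical naming scheme: one file per UTC minute, e.g. 201103221320.json."""
    return time.strftime("%Y%m%d%H%M", time.gmtime(minute_start)) + ".json"


def build_payload(minute_start: int, rows: Rows) -> dict:
    """Keep only the 10-second rows that fall inside this minute, in time order."""
    return {
        "minute": minute_start,
        "intervals": [
            {"t": t, "counts": rows[t]}
            for t in sorted(rows)
            if minute_start <= t < minute_start + 60
        ],
    }


def write_minute(out_dir: str, minute_start: int, rows: Rows) -> str:
    """Write one minute of data as JSON and return the file path."""
    path = os.path.join(out_dir, minute_filename(minute_start))
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(build_payload(minute_start, rows), f, sort_keys=True)
    # Rename into place so readers never see a half-written file.
    os.replace(tmp_path, path)
    return path
```

Forever-caching is only safe if each file is written after its minute has closed and never changed afterwards; if a file could be rewritten to include late increments, cached copies would keep serving the old totals.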

Some links for people interested in the code:
