Facebook unveils Presto engine for querying 250 PB data warehouse

SUMMARY:

As Facebook’s user base grows, the data it holds about those users is growing much faster, and the social networking giant has developed a faster way to analyze it all: its Presto query engine.

At a conference for developers at Facebook headquarters on Thursday, Facebook engineers revealed that the company is using a new homegrown query engine called Presto to run fast interactive analysis on its already enormous 250-petabyte-and-growing data warehouse.

More than 850 Facebook employees use Presto every day, scanning 320 TB each day, engineer Martin Traverso said.

“Historically, our data scientists and analysts have relied on Hive for data analysis,” Traverso said. “The problem with Hive is it’s designed for batch processing. We have other tools that are faster than Hive, but they’re either too limited in functionality or too simple to operate against our huge data warehouse. Over the past few months, we’ve been working on Presto to basically fill this gap.”

Facebook created Hive several years ago to give Hadoop some data warehouse and SQL-like capabilities, but Hive is showing its age in terms of speed because it relies on MapReduce. Scanning an entire dataset can take anywhere from many minutes to hours, which isn’t ideal if you’re trying to ask and answer questions in a hurry.

With Presto, however, simple queries can run in a few hundred milliseconds, while more complex ones take a few minutes, Traverso said. The engine runs in memory and never writes to disk, he added.

Traverso explains the architecture of Facebook's new Presto engine. Source: Jordan Novet

Think of Presto as Facebook’s version of Cloudera’s Impala SQL querying engine or what Hortonworks is working on with Stinger, but custom-fit for fast performance at Facebook scale. Presto isn’t competing with commercial products out there, although it could well rock the big data world soon. Facebook plans to release Presto in open source this fall, Traverso said.

The size of the data warehouse is growing faster than the number of the site’s users, said Ravi Murthy, a Facebook engineering manager. It’s 4,000 times bigger than it was four years ago. “As we project out this growth, over the next few years, it’s quite clear to us that at some point soon we will reach one exabyte,” Murthy said. “Looking at this exabyte scale, we have to rethink a lot of different things.”
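To put those growth figures in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the 4,000-fold growth over four years was spread evenly and would continue at the same pace, and that one exabyte equals 1,000 petabytes. Neither assumption comes from Facebook, and the function names are hypothetical.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Average yearly multiplier implied by a total growth factor (assumes even compounding)."""
    return total_factor ** (1.0 / years)


def years_to_reach(current: float, target: float, annual_factor: float) -> float:
    """Years until `current` reaches `target` at a constant yearly multiplier (an assumption)."""
    return math.log(target / current) / math.log(annual_factor)


if __name__ == "__main__":
    # Figures quoted in the article: 4,000x growth in four years, 250 PB today.
    factor = annual_growth_factor(4000, 4)
    # Assumption: 1 exabyte = 1,000 petabytes (decimal units).
    years = years_to_reach(250, 1000, factor)
    print(f"Implied average growth: about {factor:.1f}x per year")
    print(f"Years to 1 EB at that constant rate: about {years:.2f}")
```

Under those assumptions the warehouse would have grown roughly eightfold a year and would cross one exabyte in well under a year. Murthy's more cautious "over the next few years" suggests the historical pace is not expected to hold, so treat this as an upper bound on speed rather than a forecast.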

Presto is one of those things. Alongside enabling fast queries, the engine is up to seven times more efficient on the CPU than Hive, Traverso said.

Another ongoing project is cutting down on the amount of space analytics data takes up in storage in Facebook’s data centers. Sambavi Muthukrishnan, an engineering manager, talked about how Facebook has been working on maintaining high availability of data while lowering the number of replicas it keeps. This is especially feasible for colder, or less frequently accessed, data.

As a preeminent webscale company, Facebook just keeps on innovating on hardware and software, from switches to servers to social graphs. These topics are sure to come up when my colleague Stacey Higginbotham sits down with Jay Parikh, Facebook’s vice president of infrastructure engineering, at GigaOM’s Structure conference in San Francisco on June 19. Presto is just the latest reason that conversation should be fascinating.
