Twitter & Performance: An update

On Monday, a fault in the database that stores Twitter user records caused problems on both Twitter.com and our API. The short, non-technical explanation is that a mistake led to some problems that we were able to fix without losing any data.

While we were able to resolve these issues by Tuesday morning, we want to talk about what happened and use this as an opportunity to discuss the recent progress we’ve made in improving Twitter’s performance and availability. We recently covered these topics in a pair of June posts, here and on our company blog.

Riding a rocket
Making sure Twitter is a stable platform and a reliable service is our number one priority. The bulk of our engineering work is currently focused on this, and we have moved resources from other important projects to support it.

As we said last month, keeping pace with record growth in Twitter’s user base and activity presents some unique and complex engineering challenges. We frequently compare the tasks of scaling, maintaining, and tweaking Twitter to building a rocket in mid-flight.

During the World Cup, Twitter set records for usage. While the event was happening, our operations and infrastructure engineers worked to improve the performance and stability of the service. We have made more than 50 optimizations and improvements to the platform, including:

  • Doubling the capacity of our internal network;
  • Improving the monitoring of our internal network;
  • Rebalancing the traffic on our internal network to redistribute the load;
  • Doubling the throughput to the database that stores tweets;
  • Making a number of improvements to the way we use memcache, improving the speed of Twitter while reducing internal network traffic; and,
  • Improving page caching of the front and profile pages, reducing page load time by 80 percent for some of our most popular pages.
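To illustrate the page-caching idea in the last item, here is a minimal sketch of a time-to-live cache for rendered pages. It is an illustration only, not a description of Twitter's implementation: the `PageCache` class, its `get` method, and the 60-second TTL are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory page cache with a time-to-live (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, render: Callable[[], str]) -> str:
        """Return the cached page for key, rendering it only if missing or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        # Cache miss or expired entry: do the expensive render once and store it.
        self.misses += 1
        page = render()
        self._entries[key] = (now, page)
        return page


if __name__ == "__main__":
    cache = PageCache(ttl_seconds=60)
    cache.get("/", lambda: "<html>front page</html>")  # rendered
    cache.get("/", lambda: "<html>front page</html>")  # served from cache
    print(cache.hits, cache.misses)
```

For a popular page, most requests inside the TTL window skip rendering entirely, which is the general mechanism by which caching can cut load time; the actual gain depends on how expensive the render is and how often the page changes.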

So what happened Monday?
While we’re continuously improving the performance, stability and scalability of our infrastructure and core services, there are still times when we run into problems unrelated to Twitter’s capacity. That’s what happened this week.

On Monday, our users database, where we store millions of user records, got hung up on a long-running query; as a result, most of the table became locked. The locked users table manifested itself in many ways: users were unable to sign up, sign in, or update their profile or background images, and responses from the API were malformed, making them unusable for many API clients. In the end, this affected most of the Twitter ecosystem: our mobile, desktop, and web-based clients, the Twitter support and help system, and Twitter.com.

To remedy the locked table, we force-restarted the database server in recovery mode, a process that took more than 12 hours (the database covers records for more than 125 million users -- that’s a lot of records). During the recovery, the users table and related tables remained unavailable. Unfortunately, even after the recovery process completed, the table remained in an unusable state. Finally, yesterday morning we replaced the partially-locked user db with a copy that was fully available (in the parlance of database admins everywhere, we promoted a slave to master), fixing the database and all of the related issues.

We have taken steps to ensure we can more quickly detect and respond to similar issues in the future. For example, we are prepared to more quickly promote a slave db to a master db, and we put additional monitoring in place to catch errant queries like the one that caused Monday’s incidents.
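As a rough illustration of what catching errant queries can look like, the sketch below flags queries that have been running longer than a threshold, given a snapshot of what the database is currently executing. Everything here is an assumption made for the example: the `RunningQuery` record, the `find_errant_queries` function, and the 30-second threshold are hypothetical, and the post does not describe the actual monitoring that was put in place.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RunningQuery:
    """One row of a hypothetical 'currently running queries' snapshot."""
    query_id: int
    seconds_running: float
    sql: str


def find_errant_queries(queries: Iterable[RunningQuery],
                        max_seconds: float) -> List[RunningQuery]:
    """Return queries running longer than max_seconds, longest first."""
    slow = [q for q in queries if q.seconds_running > max_seconds]
    return sorted(slow, key=lambda q: q.seconds_running, reverse=True)


if __name__ == "__main__":
    # Made-up snapshot data, for illustration only.
    snapshot = [
        RunningQuery(1, 0.2, "SELECT ... FROM users WHERE id = ?"),
        RunningQuery(2, 950.0, "UPDATE users SET ..."),
        RunningQuery(3, 45.0, "SELECT ... FROM users ORDER BY ..."),
    ]
    for q in find_errant_queries(snapshot, max_seconds=30.0):
        print(f"ALERT: query {q.query_id} running for {q.seconds_running:.0f}s")
```

In practice a check like this would run on a schedule and page an operator, or kill the query, well before a lock spreads across a table.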

Long-term solutions
As we said last month, we are working on long-term solutions to make Twitter more reliable (news that we are moving into our own data center this fall, which we announced this afternoon, is just one example). This will take time, and while there has been short-term pain, our capacity has improved over the past month.

Finally, despite the rapid growth of our company, we’re still a relatively small crew maintaining a comparatively large (rocket) ship. We’re actively looking for engineering talent, with more than 20 openings currently. If you’re interested in learning more about the problems we’re solving or “joining the flock,” check out our jobs page.
