Let's say you want to delete a column from a database table. You would first push code that no longer uses that column, then wait until that push is finished, and then delete the column.
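The ordering matters: drop the column first and the old code still running on some servers will crash. A toy sketch of the sequence, using an in-memory "schema" set and made-up handler names (nothing here is from a real system):

```python
# Illustrative only: a set standing in for the table's columns.
columns = {"id", "email", "legacy_flag"}

def handle_request_v1():
    # Old code still reads the column; dropping it now would break this.
    assert "legacy_flag" in columns

def handle_request_v2():
    # New code ignores `legacy_flag` entirely.
    assert "id" in columns

# 1. Push the new code everywhere and wait for the rollout to finish.
handler = handle_request_v2

# 2. Only then is the column truly unused, so it is safe to drop.
columns.discard("legacy_flag")
handler()  # still works: nothing left reads the dropped column
```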
If you want to rename a column, it's trickier. The right way to do this would be roughly:
- Create a new column with the new name.
- Push code that duplicates writes to both columns, but still reads only from the old column.
- Run a query to copy all data from the old column into the new column. At this point the columns are identical and will be maintained as identical because of the duplicate writes.
- Push code that switches reads to use the new column.
- Push code that stops the duplicate writes and just writes to the new column.
- Drop the old column.
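The steps above can be sketched as a minimal simulation, with an in-memory "table" of dicts standing in for the database; the column names and step functions are illustrative, not code from the post:

```python
# Step 1: the "table" with the old column; the new column doesn't exist yet.
table = [{"old_name": "alice"}, {"old_name": "bob"}]

# Step 2: duplicate writes hit both columns; reads still use the old one.
def write(row, value):
    row["old_name"] = value
    row["new_name"] = value

def read(row):
    return row["old_name"]

# Step 3: one-off backfill copies existing data into the new column.
# Dual writes keep the two columns identical from here on.
for row in table:
    row.setdefault("new_name", row["old_name"])

# Step 4: a redeploy switches reads to the new column.
def read(row):
    return row["new_name"]

# Step 5: another redeploy stops the duplicate writes.
def write(row, value):
    row["new_name"] = value

# Step 6: drop the old column.
for row in table:
    row.pop("old_name", None)
```

Each redefinition of `read`/`write` stands in for a separate code push; in a real rollout every server must be running one step's code before the next step starts.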
This generalizes to any kind of data migration. You probably wouldn't go through the overhead for this just to rename a column, but imagine changing from one data format to a more compressed representation: it's the same process.
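For a concrete instance of that format change, here is a hypothetical sketch where the old column stores JSON text and the new one stores a zlib-compressed encoding; during the dual-write window every write produces both (column names and helpers are invented for illustration):

```python
import json
import zlib

def encode_old(value):
    # Old format: plain JSON text.
    return json.dumps(value)

def encode_new(value):
    # New format: zlib-compressed JSON bytes.
    return zlib.compress(json.dumps(value).encode())

def decode_new(blob):
    return json.loads(zlib.decompress(blob))

# During the dual-write window, each write stores both representations;
# reads still use the old column until the backfill finishes.
row = {}
payload = {"user": 1, "tags": ["a", "b"]}
row["data"] = encode_old(payload)      # old column, still read from
row["data_v2"] = encode_new(payload)   # new column, write-only for now

assert decode_new(row["data_v2"]) == payload
```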
Do all new engineers know this? No, they learn it as necessary. Often we just tolerate some errors during the restart window and don't go through a process like this.
Doesn't this make developers' lives difficult? Occasionally, but there is no way around it. This is a consequence of running a service that never stops. It's not feasible to make code deployment atomic when you have hundreds of servers and don't want downtime. And if a data migration will take time to run, you'd need this process even if you could deploy code atomically. I'd also suggest that it's not that difficult: because of all the continuous deployment infrastructure, our pushes are really lightweight. We always have the option of taking a service down to do a migration and avoid some of this overhead, and we have done that on a few occasions.
Doesn't this make the code look dirty? Temporarily, while the process is happening, it does; but at the end, a few hours later, the code is back to being clean.