System Design, DDIA Chapter 7 "Transactions": Single-Object and Multi-Object Operations

Single-Object and Multi-Object Operations in Databases:

In the context of databases, atomicity and isolation are key properties of ACID transactions that ensure reliability and consistency:

  • Atomicity guarantees that a transaction is either fully completed or entirely rolled back in case of failure, preventing partial updates.
  • Isolation ensures that concurrent transactions do not interfere with each other, producing the same result as if they were executed sequentially.

Single-object operations apply these properties to individual objects, such as a single document or a row in a table. Databases achieve atomicity using techniques like Write-Ahead Logs (WAL) and isolation through locking mechanisms. Lightweight operations, like compare-and-set or increment, provide some benefits of transactions but are limited to single objects and do not offer full transaction guarantees.

Multi-object transactions are necessary when multiple related objects need to be updated together, such as maintaining referential integrity between tables with foreign keys or synchronizing denormalized data. However, implementing multi-object transactions across partitions in distributed databases is complex and can impact performance and availability. Therefore, many distributed databases avoid them, leading to more complicated error handling and concurrency control at the application level.

In systems that do not strictly follow the ACID philosophy, developers must use strategies like retries with backoff, idempotent operations, eventual consistency, conflict resolution policies, and application-level compensation logic to handle errors and maintain data consistency effectively.

Question List

  1. What are atomicity and isolation in the context of ACID transactions, and why are they important?
  2. Why might a multi-object transaction be necessary in certain applications, and can you provide an example?
  3. What are the potential issues when performing single-object writes, and how do databases typically ensure atomicity and isolation for these operations?
  4. What are some examples of "lightweight transactions," and why are they not considered full transactions?
  5. Why do many distributed databases avoid implementing multi-object transactions, and what are the consequences of this choice?
  6. In which scenarios are single-object operations sufficient, and when are multi-object transactions needed?
  7. What challenges are associated with handling errors and aborts in transactions, and why is retrying aborted transactions not always straightforward?
  8. Why do some systems choose not to undo actions when an error occurs, and what are the implications for application developers?
  9. What strategies can be used for effectively handling errors in systems that do not adhere to strict ACID principles?

Reference Answers
  1. What are atomicity and isolation in the context of ACID transactions, and why are they important?

    • Answer: Atomicity means that a transaction is treated as a single, indivisible unit: it must either complete entirely or have no effect at all. If any part of the transaction fails, all changes made during the transaction are rolled back, ensuring there are no partial updates or inconsistent states. This property allows transactions to be safely retried without worrying about half-finished operations.
      Isolation ensures that concurrently running transactions do not interfere with each other. It guarantees that the outcome of executing transactions concurrently is the same as if the transactions were executed sequentially, one after the other. This prevents issues such as dirty reads (reading uncommitted data) and race conditions.
      Importance: These properties are crucial for maintaining database consistency, preventing errors, and simplifying application development. Atomicity and isolation help ensure that data remains accurate and consistent even in the presence of failures or concurrent access, making the system more reliable and easier to reason about.
  2. Why might a multi-object transaction be necessary in certain applications, and can you provide an example?

    • Answer: Multi-object transactions are necessary when multiple changes across different objects (such as rows, tables, or documents) must be coordinated to maintain consistency. For example, when Alice wants to buy a used car, two tables are involved:
      1. The inventory table needs to be updated to decrease the number of available cars by 1.
      2. The order table needs a new record to be inserted, indicating that Alice has purchased a car.
        Both of these changes must be performed together as part of a single transaction. Without a multi-object transaction, there is a risk that only one of the changes will occur—such as the inventory being decreased without a corresponding order record. This would leave the database in an inconsistent state, leading to data integrity issues.
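The two-table purchase above can be sketched with SQLite, whose connection context manager wraps both statements in a single transaction that commits on success and rolls back on any error. The table and column names here are illustrative, not taken from the chapter:

```python
import sqlite3

# In-memory database with hypothetical inventory and orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (car_model TEXT PRIMARY KEY, available INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, buyer TEXT, car_model TEXT);
    INSERT INTO inventory VALUES ('sedan', 3);
""")

def buy_car(buyer, model):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE inventory SET available = available - 1 "
                "WHERE car_model = ? AND available > 0", (model,))
            if cur.rowcount == 0:
                raise RuntimeError("car not available")
            conn.execute(
                "INSERT INTO orders (buyer, car_model) VALUES (?, ?)",
                (buyer, model))
        return True
    except Exception:
        return False  # neither change was applied

buy_car("Alice", "sedan")
```

If the INSERT fails, the `with conn:` block rolls back the inventory decrement as well, so the database never ends up with a sold car and no order record.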
  3. What are the potential issues when performing single-object writes, and how do databases typically ensure atomicity and isolation for these operations?

    • Answer: When performing single-object writes, several issues can arise that could leave the database in an inconsistent state:
    • If a power failure occurs while writing to the database, only part of the data might be written, leaving a corrupted or incomplete record.
    • If a network failure happens halfway through writing a large object, like a 20 KB JSON document, only a portion of the data may be stored, resulting in an unparseable or partial document.
      These partial updates can lead to confusion and errors, as the database may contain incomplete or inconsistent data.
      Ensuring Atomicity: Databases use a Write-Ahead Log (WAL) to achieve atomicity. Before making any changes to the actual data, the intended changes are written to a log. If a failure occurs during the write operation, the database can use this log to roll back any partial changes, ensuring that the object is either fully updated or not changed at all.
      Ensuring Isolation: Isolation is typically implemented using locks. When a transaction needs to update an object, it acquires a lock on that object, preventing other transactions from accessing it until the operation is complete. This ensures that only one transaction can modify the object at a time, preventing conflicts and maintaining consistency.
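As a conceptual toy combining both mechanisms, the sketch below keeps an in-memory undo log and a per-store lock. This is only an illustration of the idea: a real WAL is persisted to disk before the data pages are touched, and production systems typically log redo (and undo) records rather than whole old values:

```python
import threading

class TinyStore:
    """Toy single-object store: an undo log records old values so a failed
    write can be rolled back, and a lock provides isolation between writers."""

    def __init__(self):
        self.data = {}
        self.wal = []                 # undo log: (key, old_value) entries
        self.lock = threading.Lock()

    def write(self, key, value, fail=False):
        with self.lock:               # isolation: one writer at a time
            self.wal.append((key, self.data.get(key)))  # log before changing
            self.data[key] = value
            if fail:                  # simulate a crash mid-write
                k, old = self.wal.pop()
                if old is None:       # recovery: undo via the log
                    self.data.pop(k, None)
                else:
                    self.data[k] = old
                raise RuntimeError("write failed; change rolled back")
            self.wal.pop()            # success: discard the undo entry

store = TinyStore()
store.write("doc", {"n": 1})
try:
    store.write("doc", {"n": 2}, fail=True)
except RuntimeError:
    pass
# store.data["doc"] is still {"n": 1}: the failed write left no partial state
```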
  4. What are some examples of "lightweight transactions," and why are they not considered full transactions?

    • Answer: Some databases provide operations like compare-and-set (CAS) or increment as "lightweight transactions." The compare-and-set operation allows a write to proceed only if the current value has not been modified by another concurrent transaction, effectively preventing lost updates. Similarly, an increment operation allows a value to be increased atomically without requiring a separate read-modify-write cycle.
      These operations are not considered full transactions because they only apply to single objects, not multiple objects. A full transaction typically groups multiple operations across different objects into one atomic and isolated execution unit, ensuring that all changes occur together or not at all. In contrast, lightweight transactions operate on a single object and lack the broader guarantees that full multi-object transactions provide.
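A compare-and-set loop can be sketched in Python, with a lock standing in for the atomic primitive the database (or hardware) would provide; the `Cell` class is hypothetical:

```python
import threading

class Cell:
    """Single value with an atomic compare-and-set, guarded by a lock."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False      # someone else changed it; caller must retry
            self._value = new
            return True

    def get(self):
        with self._lock:
            return self._value

counter = Cell(0)

def increment(cell):
    # read-modify-write loop built on CAS: retry until no concurrent change
    while True:
        cur = cell.get()
        if cell.compare_and_set(cur, cur + 1):
            return

threads = [threading.Thread(target=increment, args=(counter,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
# counter.get() == 100: no lost updates despite 100 concurrent writers
```

Note that the guarantee covers only this one cell; coordinating a change to two cells atomically would require a full multi-object transaction.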
  5. Why do many distributed databases avoid implementing multi-object transactions, and what are the consequences of this choice?

    • Answer: Many distributed databases avoid multi-object transactions because implementing them across multiple partitions is complex and can compromise high availability and performance.
    • Difficulty Across Partitions: Coordinating transactions that span multiple partitions requires ensuring consistency and isolation across different nodes, which is technically challenging and often involves significant overhead.
    • Impact on High Availability: Multi-object transactions require coordination and synchronization between multiple nodes, which can slow down the system and reduce its ability to quickly handle requests. In some cases, nodes may need to wait for others to complete their part of a transaction, reducing the overall system's responsiveness and availability.
      Implications of This Choice:
    • Complicated Error Handling: Without multi-object transactions, developers must manually handle cases where only some parts of a multi-step operation succeed. This increases complexity and the likelihood of bugs.
    • Concurrency Challenges: Without the isolation provided by multi-object transactions, concurrent operations might lead to inconsistent data or conflicts, requiring more careful management of concurrent access to data.
  6. In which scenarios are single-object operations sufficient, and when are multi-object transactions needed?

    • Answer: Single-object operations are sufficient when a transaction only involves modifications to a single object, such as a single row, document, or key-value pair. For example, if a table does not have any foreign keys or dependencies on other tables, then any changes (insert, update, delete) only affect this single object. In this case, there is no need for a multi-object transaction.
      Multi-object transactions become necessary when multiple related objects need to be updated together to maintain consistency. For example, if a table has a foreign key that references another table, any modification to a row in this table may require corresponding updates to the referenced row to ensure referential integrity. In such cases, multi-object transactions ensure that all related changes are applied together or not at all, preventing data inconsistencies.
  7. What challenges are associated with handling errors and aborts in transactions, and why is retrying aborted transactions not always straightforward?

    • Answer: Challenges include:
      1. Overloading Nodes: If a node is overloaded or slow to respond, retrying the transaction may worsen the problem by adding more requests to an already strained node, potentially leading to a feedback loop of retries that further degrades performance.
      2. Duplicate Operations: If a transaction is actually successful, but the client does not receive the acknowledgment due to a network failure, retrying the transaction could result in the same operations being applied twice.
      3. Retriable vs. Non-Retriable Errors: Not all errors are suitable for retries. Temporary errors like network delays can be retried, but permanent errors like invalid input cannot be resolved through retries.
      4. Client Process Failure During Retry: If the client process fails during a retry, any data it intended to commit may be lost, resulting in an incomplete state.
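The duplicate-operations problem (challenge 2) is commonly mitigated by having the client attach an idempotency key to each logical request, so a retried request is recognized and not applied twice. A minimal sketch, with a hypothetical `PaymentService`:

```python
import uuid

class PaymentService:
    """Toy server that deduplicates retried requests by idempotency key."""
    def __init__(self):
        self.processed = {}   # idempotency_key -> cached result
        self.charges = []

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.processed:        # retry of a request that
            return self.processed[idempotency_key]   # already succeeded
        self.charges.append(amount)                  # apply the charge once
        result = {"status": "ok", "amount": amount}
        self.processed[idempotency_key] = result
        return result

service = PaymentService()
key = str(uuid.uuid4())    # client picks the key once, reuses it on retry

service.charge(key, 100)   # original request; suppose the ack was lost
service.charge(key, 100)   # client retries with the same key
# the charge was applied only once despite two requests
```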
  8. Why do some systems choose not to undo actions when an error occurs, and what are the implications for application developers?

    • Answer: In Dynamo-style databases with leaderless replication, a "best effort" approach is often used for handling errors. This means that the database will attempt to perform the requested operations to the best of its ability, but if an error occurs, it does not automatically roll back changes or undo actions that were already taken.
      Implications for Application Developers:
    • Since the database does not guarantee automatic rollbacks in the case of errors, application developers must implement their own mechanisms for detecting and recovering from errors. This may involve adding application-level checks, retries, and logic to handle inconsistencies that can arise when partial writes occur.
    • Developers need to be more cautious when designing applications to handle potential errors, network failures, or partial updates. This can increase the complexity of the application code and make it harder to ensure data consistency across all operations.
  9. What strategies can be used for effectively handling errors in systems that do not adhere to strict ACID principles?

    • Answer: Strategies include:
      • Catching Errors and Throwing Exceptions: A basic approach where errors are caught and exceptions are thrown to the user or application, which must then handle the error.
      • Retries with Backoff: Implementing retries with exponential backoff to handle transient errors, such as network timeouts or temporary service unavailability.
      • Idempotent Operations: Design operations to be idempotent, so that applying the same operation multiple times has the same effect as applying it once, making retries safer.
      • Eventual Consistency Mechanisms: Use an eventual consistency model, where data becomes consistent over time, allowing background processes or retries to resolve discrepancies.
      • Conflict Resolution Policies: Implement conflict resolution strategies like "last write wins" to automatically resolve conflicts when nodes disagree about the state of the data.
      • Application-Level Compensation Logic: Develop compensation logic to handle partial failures, such as manual intervention or compensating transactions to correct inconsistent states.
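The retries-with-backoff strategy above might be sketched as follows; `TransientError`, the delay constants, and the `flaky` operation are all illustrative:

```python
import random
import time

class TransientError(Exception):
    """An error worth retrying (e.g. a timeout); permanent errors should not be."""

def retry_with_backoff(op, max_attempts=5, base_delay=0.05):
    """Retry op() on transient errors, doubling the delay each attempt and
    adding jitter so many clients do not retry in lockstep."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                 # give up after the last attempt
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

calls = {"n": 0}

def flaky():
    # fails twice with a transient error, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary failure")
    return "done"

result = retry_with_backoff(flaky)
```

Pairing this with idempotent operations is what makes the retries safe: without idempotency, a retry of a request that actually succeeded can apply the same change twice.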