System Design: Design a Payment System

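These are notes on the North American mock interview question "Design a Payment System". The original video is available on the System Design Guru channel.

North American system design interviews have no standard answer. They are entirely open-ended: any design is acceptable as long as it is well reasoned and you can clearly explain the tradeoffs behind each choice.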

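Problem

Design the payment system for an online shopping site:

  • Whenever a user clicks the "Pay" button in the shopping cart, the request is sent to your system
  • Only credit card payments are supported
  • A third-party service (e.g., VISA) handles the actual credit card processing
  • The focus is on ensuring "exactly-once" delivery of the payment operation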

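Key Points

1. Sync vs. Async

  1. Sync:
    1. Pros:
      • Simple design
    2. Cons:
      • Tight coupling
      • If one part fails, the entire system can fail
      • Hard to scale
      • Requests block until processing finishes
    3. Use cases:
      • Immediate feedback is required
      • Payment processing time is predictable and short
  2. Async:
    1. Pros:
      • Loose coupling
      • Easy to scale
    2. Cons:
      • More complex, especially in error handling and retry mechanisms
      • Delayed feedback
    3. Use cases:
      • Long-running operations that do not need immediate feedback
      • High-load systems where scalability is the primary concern
      • Situations where the system must stay responsive while handling other tasks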

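2. Example Workflow

  1. User clicks the Pay button (sync):
    • Validate the payment details and create a payment request.
    • Give the user immediate feedback (e.g., "Processing your payment...").
  2. Payment processing (async):
    • Add the payment request to a message queue for asynchronous processing.
    • The payment service processes the request and interacts with the third-party payment gateway (e.g., VISA).
  3. Notification (async):
    • On completion, update the payment status in the database.
    • Notify the user of the payment result through an asynchronous notification.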

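3. Exactly-Once Delivery

  1. In a payment system, as in any distributed system, exactly-once delivery is a challenging but crucial requirement. It ensures that each payment request is processed exactly once, preventing duplicate charges and missed payments.
  2. Use retries plus deduplication to simulate exactly-once delivery: retries provide at-least-once delivery, and deduplication discards the repeats.
  3. Idempotency keys:
    • An idempotency key is a unique identifier that the client generates for each payment request. It ensures that the server processes a request only once, even if the request is retried because of network issues, timeouts, and so on.
    • When a payment request arrives, the server checks whether the idempotency key already exists in the database:
      • If it exists, the server returns the previously stored result.
      • If it does not exist, the server processes the request, stores the result under the idempotency key, and returns the result to the client.
  4. Send the payment request to a message queue with exactly-once delivery semantics.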

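4. Handling Inconsistency

  1. Settlement file with the bank:
    • A settlement file is a batch file generated by the payment system that summarizes all transactions over a period (usually one day). It is sent to the bank or payment processor for reconciliation, to confirm that both systems agree on the total amount processed.
  2. Double-entry accounting:
    • Every transaction is recorded as two offsetting entries, so the total always balances to zero. This helps maintain consistency and detect discrepancies.
    • For each payment, create two entries: one debiting the payer's account and one crediting the recipient's account.
  3. Ledger:
    • A comprehensive, organized record of financial transactions. Each entry includes the transaction amount, the accounts involved, and the entry type (debit or credit).
    • Maintain a ledger that records every transaction, and periodically check that it sums to zero. Any imbalance indicates a discrepancy that needs investigation.
  4. Retryable and non-retryable failures:
    1. Classify errors:
      • Retryable errors: transient errors that a later attempt can resolve (e.g., network timeouts, temporary service unavailability, rate limiting).
      • Non-retryable errors: permanent errors that retrying cannot resolve (e.g., invalid payment details, insufficient funds, authorization failures).
    2. Retryable example: the payment service provider (PSP) returns a 500 error.
    3. Non-retryable example: a downstream service is permanently unavailable, or the user input is invalid.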

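5. Latency Optimization in Async Systems

  1. Reuse TCP connections:
    1. Reduced latency:
      • Connection setup time: establishing a new TCP connection requires a three-way handshake (SYN, SYN-ACK, ACK), which adds latency. Reusing an existing connection avoids this overhead.
      • Faster data transfer: once a TCP connection is established, later transfers can start immediately with no further handshake.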

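6. Latency vs. Throughput

  1. Latency: the time a request takes to travel from source to destination and back. Put simply, it is the delay between sending a request and receiving the first response.
  2. Throughput: the amount of work or number of transactions a system can handle in a given time, usually measured in requests per second (RPS) or data transferred per second.
  3. In general, with a message queue (MQ), overprovisioning workers lowers latency: more tasks run concurrently, so requests spend less time waiting in the queue.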

7. Define the State Machine Properly

  1. Possible states:
    1. NOT_STARTED
    2. QUEUED
    3. EXECUTING
    4. SUCCEEDED
    5. FAILED
    6. RETRYING (if more clarity is needed)

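8. Timeout and Retry Mechanism

  1. Always set a timeout when calling an external system.
    • If the external system does not respond within the expected time, do not wait indefinitely; time out and retry later.
  2. Three elements of a retry mechanism:
    1. Exponential backoff
    2. Jitter
    3. Retry limit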

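9. Entity Design

  1. Checkout Order:
    1. id: Unique identifier for the checkout order.
    2. Product Info: Details of the products included in the checkout order.
    3. Payment Order IDs: List of IDs of the associated payment orders.
  2. Payment Order:
    1. id: Unique identifier for the payment order.
    2. Amount: The total amount of the payment.
    3. Currency: The currency in which the payment is made.
    4. Payer: Details about the payer (e.g., payer ID, name, email).
    5. Recipient: Details about the recipient (e.g., recipient ID, name, email).
    6. Order State: The current state of the payment order (e.g., NOT_STARTED, QUEUED, EXECUTING, SUCCEEDED, FAILED).
    7. Retry Count: Number of times the payment order has been retried.
    8. CreateEpoch: Timestamp when the payment order was created.
    9. UpdateEpoch: Timestamp when the payment order was last updated.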

The key points above are restated below:

Key Points:

1. Sync vs. Async

  1. SYNC:
    1. PRO:
      • Easy design
    2. CON:
      • Tight coupling
      • If one part fails, the entire system can fail
      • Hard to scale
      • Blocking request
    3. Use Cases:
      • Immediate feedback is required
      • The payment processing time is predictable and short
  2. ASYNC:
    1. PRO:
      • Loose coupling
      • Easy to scale
    2. CON:
      • More complex, especially in error handling or retry mechanisms
      • Delayed feedback
    3. Use Cases:
      • Long-running operations where immediate feedback is not critical
      • High-load systems where scalability is a primary concern
      • Situations where the system needs to remain responsive and handle other tasks concurrently

2. Example Workflow

  1. User Clicks Pay Button (Sync):
    • Validate payment details and create a payment request.
    • Provide immediate feedback to the user (e.g., "Processing your payment...").
  2. Payment Processing (Async):
    • Add the payment request to a message queue for asynchronous processing.
    • The payment service processes the request and interacts with third-party payment gateways (e.g., VISA).
  3. Notification (Async):
    • Upon completion, update the payment status in the database.
    • Notify the user about the payment result using asynchronous notifications.
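
The three workflow steps above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the message queue, a dict stands in for the database, and `call_gateway`, `handle_pay_click`, and `worker` are hypothetical names, not part of any real payment API.

```python
import queue
import threading
import uuid

payments = {}                # payment_id -> record; stands in for the database
work_queue = queue.Queue()   # stands in for the message queue
notifications = []           # stands in for an async notification channel


def call_gateway(record):
    """Hypothetical third-party call; always approves in this sketch."""
    return True


def handle_pay_click(card_number, amount):
    """Sync step: validate, persist, enqueue, and answer immediately."""
    if amount <= 0 or not card_number.isdigit():
        raise ValueError("invalid payment details")
    payment_id = str(uuid.uuid4())
    payments[payment_id] = {"amount": amount, "status": "QUEUED"}
    work_queue.put(payment_id)
    return {"payment_id": payment_id, "message": "Processing your payment..."}


def worker():
    """Async step: drain the queue, call the gateway, record and notify."""
    while True:
        payment_id = work_queue.get()
        if payment_id is None:          # sentinel: stop the worker
            work_queue.task_done()
            break
        record = payments[payment_id]
        record["status"] = "EXECUTING"
        record["status"] = "SUCCEEDED" if call_gateway(record) else "FAILED"
        notifications.append((payment_id, record["status"]))
        work_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_pay_click("4111111111111111", 25)   # returns before the charge runs
work_queue.join()                                      # wait for the async processing
work_queue.put(None)
thread.join()
```

The caller gets its response as soon as the request is enqueued; the final status only becomes visible later through the stored record and the notification.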

3. Exactly Once Delivery

  1. Achieving exactly-once delivery in a payment system, or any distributed system, is a challenging but crucial requirement. It ensures that each payment request is processed exactly once, preventing issues such as duplicate charges or missed payments.
  2. Use retries plus deduplication to simulate exactly-once delivery: retries provide at-least-once delivery, and deduplication discards the repeats.
  3. Idempotency Keys:
    • An idempotency key is a unique identifier generated by the client for each payment request. This key ensures that even if a request is retried (due to network issues, timeouts, etc.), the server processes it only once.
    • When a payment request is received, the server checks if the idempotency key already exists in the database:
      • If it exists, the server returns the previously stored result.
      • If it does not exist, the server processes the request, stores the result associated with the idempotency key, and returns the result to the client.
  4. Send the payment request to a message queue with exactly-once delivery semantics.
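
The idempotency-key check described above might look like the following minimal sketch. It assumes an in-memory dict in place of the database and a single lock in place of a unique-key constraint; `IdempotentPaymentHandler` and `fake_charge` are hypothetical names used only for illustration.

```python
import threading


class IdempotentPaymentHandler:
    """Processes each idempotency key at most once and replays the stored result."""

    def __init__(self, charge_fn):
        self._charge_fn = charge_fn   # hypothetical function that performs the charge
        self._results = {}            # idempotency key -> stored result (stand-in for a DB table)
        self._lock = threading.Lock()

    def handle(self, idempotency_key, amount):
        # Holding one lock for the whole call keeps the sketch simple; a real
        # system would more likely rely on a unique constraint in the database.
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]   # retry: return the earlier result
            result = self._charge_fn(amount)            # first time: process the request
            self._results[idempotency_key] = result
            return result


charges = []   # records every real charge so duplicates would be visible


def fake_charge(amount):
    charges.append(amount)
    return {"status": "SUCCEEDED", "amount": amount}


handler = IdempotentPaymentHandler(fake_charge)
first = handler.handle("key-123", 50)
retry = handler.handle("key-123", 50)   # same key, e.g. the client retried after a timeout
```

After both calls `charges` holds a single entry, because the second call is answered from the stored result rather than charging again.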

4. Handle Inconsistency

  1. Settlement File with Bank:
    • Settlement files are batch files generated by the payment system that summarize all transactions over a certain period (usually daily). These files are sent to the bank or payment processor to reconcile the transactions and ensure both systems agree on the total amount processed.
  2. Double-Entry Accounting:
    • Every transaction is recorded as two offsetting entries, ensuring that the total amount always balances to zero. This helps in maintaining consistency and detecting discrepancies within the system.
    • For each payment, create two entries: one debiting the payer's account and one crediting the recipient's account.
  3. Ledger:
    • A comprehensive and organized record of all financial transactions. Each transaction recorded in the ledger includes details such as the transaction amount, the accounts involved, and the type of transaction (debit or credit).
    • Maintain a ledger that records all transactions and periodically checks for a zero-sum balance. Any imbalance in the ledger indicates a discrepancy that needs to be investigated.
  4. Retryable and Non-Retryable Failures:
    1. Classify Errors:
      • Retryable Errors: Transient errors that can be resolved with subsequent attempts (e.g., network timeouts, temporary service unavailability, rate limiting).
      • Non-Retryable Errors: Permanent errors that cannot be resolved by retrying (e.g., invalid payment details, insufficient funds, authorization failures).
    2. Retryable Example: Payment service provider (PSP) returns a 500 error.
    3. Non-Retryable Example: Downstream service is permanently unavailable or user input is invalid.
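
The double-entry and ledger items above can be illustrated with a small sketch. The sign convention (negative for a debit, positive for a credit), the use of integer minor units, and the names `Entry` and `Ledger` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    amount: int   # minor units (e.g. cents); negative = debit, positive = credit


class Ledger:
    def __init__(self):
        self.entries = []

    def record_payment(self, payer, recipient, amount):
        """Write two offsetting entries for one payment."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.entries.append(Entry(payer, -amount))      # debit the payer
        self.entries.append(Entry(recipient, amount))   # credit the recipient

    def is_balanced(self):
        """A healthy ledger sums to zero; anything else is a discrepancy."""
        return sum(e.amount for e in self.entries) == 0

    def balance(self, account):
        return sum(e.amount for e in self.entries if e.account == account)


ledger = Ledger()
ledger.record_payment("alice", "shop", 1200)
ledger.record_payment("bob", "shop", 800)
```

Running `ledger.is_balanced()` periodically is the zero-sum check: an entry written without its counterpart makes it return `False`.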

5. Latency Optimization in Async Systems

  1. Reuse TCP Connection:
    1. Reduced Latency:
      • Connection Setup Time: Establishing a new TCP connection involves a three-way handshake (SYN, SYN-ACK, ACK), which introduces latency. Reusing existing connections avoids this overhead.
      • Faster Data Transmission: Once a TCP connection is established, subsequent data transfers can start immediately without the need for additional handshakes.
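
To make connection reuse concrete, the sketch below starts a throwaway local HTTP server and compares one reused connection with a fresh connection per request. The local server is purely an assumption for the demo (a real system would be talking to the payment gateway); the client's source port is used as a proxy for "which TCP connection was this".

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

client_ports = []   # source port of each request, as seen by the server


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enables persistent (keep-alive) connections

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))   # required for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# One connection reused for three requests: a single TCP handshake.
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(3):
    conn.request("GET", "/")
    conn.getresponse().read()   # drain the body before sending the next request
conn.close()
reused_ports = set(client_ports)

# A fresh connection per request: a new handshake every time.
client_ports.clear()
for _ in range(3):
    fresh = http.client.HTTPConnection("127.0.0.1", port)
    fresh.request("GET", "/")
    fresh.getresponse().read()
    fresh.close()
fresh_ports = set(client_ports)

server.shutdown()
server.server_close()
```

With the reused connection the server sees the same source port for all three requests, while the per-request connections normally arrive from different ports, each having paid for its own handshake.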

6. Latency vs. Throughput

  1. Latency: The time it takes for a request to travel from the source to the destination and back. It's the delay from the moment a request is made until the first response is received.
  2. Throughput: The amount of work or number of transactions a system can handle in a given amount of time. It is often measured in requests per second (RPS) or data transferred per second.
  3. Generally, with a message queue (MQ), overprovisioning workers can reduce latency by handling more tasks concurrently and reducing wait times in the queue.
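
A small worked calculation shows the effect of adding workers. It assumes a deliberately simplified model that is not from the original notes: a burst of tasks is enqueued at time zero, every task takes the same service time, and workers pull tasks in FIFO order.

```python
def average_latency(num_tasks, service_time, num_workers):
    """Average completion time when all tasks are enqueued at t = 0.

    Task i (0-based) starts in round i // num_workers and finishes
    one service_time after that round begins.
    """
    finish_times = [(i // num_workers + 1) * service_time for i in range(num_tasks)]
    return sum(finish_times) / num_tasks


def throughput(num_tasks, service_time, num_workers):
    """Tasks completed per second over the whole burst."""
    rounds = -(-num_tasks // num_workers)   # ceiling division
    return num_tasks / (rounds * service_time)


# Burst of 8 tasks, 1 second each:
#   2 workers  -> average latency 2.5 s, throughput 2 tasks/s
#   4 workers  -> average latency 1.5 s, throughput 4 tasks/s
#   8 workers  -> average latency 1.0 s, throughput 8 tasks/s
#   16 workers -> average latency 1.0 s, throughput 8 tasks/s (no further gain)
table = {w: (average_latency(8, 1.0, w), throughput(8, 1.0, w)) for w in (2, 4, 8, 16)}
```

In this model latency falls as workers are added until nothing waits in the queue; beyond that point the extra workers sit idle, which is the cost of overprovisioning.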

7. Properly Define State Machine

  1. Possible States:
    1. NOT_STARTED
    2. QUEUED
    3. EXECUTING
    4. SUCCEEDED
    5. FAILED
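
One way to enforce the states listed above is an explicit transition table. The notes only name the states, so the allowed transitions here are an assumption, including the choice to model a retry as EXECUTING going back to QUEUED; `PaymentState` and `transition` are hypothetical names.

```python
from enum import Enum


class PaymentState(Enum):
    NOT_STARTED = "NOT_STARTED"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Assumed transition table: terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    PaymentState.NOT_STARTED: {PaymentState.QUEUED},
    PaymentState.QUEUED: {PaymentState.EXECUTING},
    PaymentState.EXECUTING: {
        PaymentState.SUCCEEDED,
        PaymentState.FAILED,
        PaymentState.QUEUED,      # retry after a retryable failure (assumption)
    },
    PaymentState.SUCCEEDED: set(),
    PaymentState.FAILED: set(),
}


def transition(current, new):
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


state = PaymentState.NOT_STARTED
for nxt in (PaymentState.QUEUED, PaymentState.EXECUTING, PaymentState.SUCCEEDED):
    state = transition(state, nxt)
```

Rejecting moves such as SUCCEEDED to EXECUTING at this layer means a duplicated or out-of-order message cannot re-run a payment that has already finished.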

8. Timeout and Retry Mechanism

  1. When calling an external system, always set a timeout.
    • If the external system doesn't respond within the expected time frame, time out and retry later.
  2. Three elements of a retry mechanism:
    1. Exponential Backoff
    2. Jittering
    3. Retry Limit
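
The three retry elements can be combined as in the sketch below, which also separates retryable from non-retryable failures. It assumes the wrapped operation enforces its own timeout and raises `RetryableError` when it expires; the exception classes, the "full jitter" variant of backoff, and the default parameters are all choices made for this illustration.

```python
import random
import time


class RetryableError(Exception):
    """Transient failure, e.g. a timeout, a 500 from the PSP, or rate limiting."""


class NonRetryableError(Exception):
    """Permanent failure, e.g. invalid payment details or insufficient funds."""


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(operation, max_attempts=4, base=0.1, cap=5.0, sleep=time.sleep):
    """Run operation(), retrying only RetryableError, at most max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise                                   # retry limit reached: give up
            sleep(backoff_delay(attempt, base, cap))    # wait longer after each failure
    # NonRetryableError is never caught, so it propagates on the first attempt.


# Usage: an operation that times out twice, then succeeds.
calls = {"count": 0}


def flaky_charge():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableError("gateway timed out")
    return "SUCCEEDED"


delays = []
result = call_with_retry(flaky_charge, sleep=delays.append)   # record delays instead of sleeping
```

The jitter spreads retries from many clients over time so they do not hit the recovering gateway in lockstep, and the retry limit bounds how long a single payment can stay in flight.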

9. Entity Design

  1. Checkout Order:
    1. id: Unique identifier for the checkout order.
    2. Product info: Details about the products included in the checkout order.
    3. Payment order ids: List of associated payment order IDs.
  2. Payment Order:
    1. id: Unique identifier for the payment order.
    2. Amount: The total amount of the payment.
    3. Currency: The currency in which the payment is made.
    4. Payer: Details about the payer (e.g., payer ID, name, email, etc.).
    5. Recipient: Details about the recipient (e.g., recipient ID, name, email, etc.).
    6. Order State: The current state of the payment order (e.g., NOT_STARTED, QUEUED, EXECUTING, SUCCEEDED, FAILED).
    7. Retry count: Number of times the payment order has been retried.
    8. CreateEpoch: Timestamp when the payment order was created.
    9. UpdateEpoch: Timestamp when the payment order was last updated.
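
The two entities above map naturally onto plain data classes. This is a sketch with some simplifying assumptions: payer and recipient are reduced to ID strings, the amount is stored as an integer in minor units, and the Python field names are adapted from the list above.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OrderState(Enum):
    NOT_STARTED = "NOT_STARTED"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class PaymentOrder:
    id: str
    amount: int                 # minor units, e.g. cents (assumption)
    currency: str
    payer_id: str               # simplified: the notes list fuller payer details
    recipient_id: str           # simplified: the notes list fuller recipient details
    state: OrderState = OrderState.NOT_STARTED
    retry_count: int = 0
    create_epoch: float = field(default_factory=time.time)
    update_epoch: float = field(default_factory=time.time)


@dataclass
class CheckoutOrder:
    id: str
    product_info: List[str]
    payment_order_ids: List[str] = field(default_factory=list)


payment = PaymentOrder(id="po-1", amount=1999, currency="USD",
                       payer_id="user-1", recipient_id="merchant-1")
checkout = CheckoutOrder(id="co-1", product_info=["sku-42"],
                         payment_order_ids=[payment.id])
```

Keeping the payment order IDs as a list on the checkout order allows one checkout to be split into several payments, each tracked with its own state and retry count.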
