Writing Data to a Database by Calling an API

Latest recommended article published 2024-06-03 21:56:49

Davina_yu

Tags: python, algorithm, programming language

Article link: https://blog.csdn.net/Davina_yu/article/details/139261035

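We have a batch of data to write into a database by calling an API. The API allows 100 QPS (queries per second), and between 80 and 120 threads call it concurrently. Design an algorithm for writing the data.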
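Check the numbers before picking a design. Little's law says that the average number of calls in flight equals the call rate times the per-call latency. Suppose 80 to 120 threads share a 100 QPS budget. Then each thread can start at most one call every 0.8 to 1.2 seconds on average. If the API answers faster than that, threads must wait. The thread count alone therefore does not enforce the QPS limit, and an explicit rate limiter is needed. The short Python sketch below just does this arithmetic. The function names are illustrative, and the 0.5 s latency is an assumed value.

```python
def min_interval_per_thread(qps: float, threads: int) -> float:
    """Average seconds between call starts on one thread when `threads`
    threads share a budget of `qps` calls per second."""
    return threads / qps


def busy_threads(qps: float, latency_s: float) -> float:
    """Little's law: calls in flight = call rate x per-call latency."""
    return qps * latency_s


for threads in (80, 100, 120):
    print(threads, "threads ->", min_interval_per_thread(100, threads), "s between calls per thread")

# Assumed 0.5 s latency: only about 50 calls are in flight at 100 QPS
print(busy_threads(100, 0.5))
```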

Python example

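High-concurrency writes to a database raise a few key concerns. Concurrency must be controlled so that the API's QPS limit is never exceeded, errors need handling and retries, and batching the data may improve efficiency. Below is a simplified Python example built on a thread pool. It assumes a write_to_db function that simulates the actual database write. It uses concurrent.futures.ThreadPoolExecutor to manage the worker threads and throttles how often write calls start, so that the QPS stays at or below the configured value.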

import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Simulated database write function (stands in for the real API call)
def write_to_db(item):
    # Simulate the write latency; time.sleep stands in for the real call
    time.sleep(0.5)  # assume a single write takes 0.5 seconds
    print(f"Written item: {item}")
    return True

# Data generator that simulates a stream of incoming records
def data_generator(total_items):
    for i in range(total_items):
        yield i

# Thread-safe limiter: spaces the start times of API calls at least 1/qps seconds apart
class RateLimiter:
    def __init__(self, qps):
        self.interval = 1.0 / qps
        self.next_slot = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            # Reserve the next free time slot and push the following one back by one interval
            slot = max(self.next_slot, time.monotonic())
            self.next_slot = slot + self.interval
        # Sleep outside the lock so other threads can reserve later slots meanwhile
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)

# Write every item from `data` while keeping the call rate at or below qps
def controlled_write(db_writer, data, qps, max_workers=100):
    limiter = RateLimiter(qps)

    def limited_write(item):
        limiter.acquire()  # wait for a rate-limit slot before calling the API
        return db_writer(item)

    items_written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # All items are submitted up front; for very large inputs, submit in chunks instead
        future_to_item = {executor.submit(limited_write, item): item for item in data}

        # Check each task as soon as it finishes
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            exc = future.exception()
            if exc is not None:
                print(f"Error writing item {item}: {exc}")
            else:
                items_written += 1

    print(f"Total items written: {items_written}")
    return items_written

# Set parameters and run (1000 items is an illustrative count)
controlled_write(write_to_db, data_generator(1000), qps=100, max_workers=120)

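Note that this example is a simplified model. A real implementation may need adjusting for the specific database (transaction handling, connection pool management, and so on), for network conditions, and for system resources. A production system also needs exception handling, logging, and guarantees of data consistency.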
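A common building block for exception handling is retrying transient failures with exponential backoff. The sketch below illustrates this and is not part of the original example. The name `call_with_retry` and its parameters are hypothetical. It retries on any exception, whereas a real version would retry only errors known to be transient, such as timeouts or HTTP 429/503. The `sleep` parameter exists only so the backoff can be tested without waiting.

```python
import random
import time


def call_with_retry(func, item, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(item); on an exception, wait and retry with exponential backoff.
    Re-raises the last exception once max_attempts attempts have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(item)
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles on each failed attempt (base, 2*base, 4*base, ...),
            # plus up to 10% random jitter so threads do not retry in lockstep
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

Every retry is another API call and counts against the 100 QPS budget. The function passed as `func` should therefore wait for the rate limiter before each call, not only before the first one. Also, a retried write can be applied twice if the first attempt succeeded but its response was lost. Writes should therefore be idempotent, for example keyed by a unique record ID.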

Java example

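This kind of data-writing algorithm has to respect the API's QPS limit, keep concurrency under control to avoid overload, and preserve data integrity and consistency. Below is a simple Java example that uses a Semaphore to throttle calls to the API. A BlockingQueue serves as a buffer for data waiting to be written, so data produced faster than the API can accept it waits there instead of overwhelming the API.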

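import java.util.concurrent.*;

public class DataWriter {
    // API QPS limit: at most 100 calls per second
    private static final int QPS_LIMIT = 100;
    // Each second is split into 4 time slices of 250 ms
    private static final int SLICES_PER_SECOND = 4;
    // Calls allowed per time slice (100 / 4 = 25)
    private static final int REQUESTS_PER_TIMESLICE = QPS_LIMIT / SLICES_PER_SECOND;
    // Maximum number of threads calling the API concurrently
    private static final int MAX_THREADS = 120;
    // Sentinel that tells a worker thread to stop
    private static final String POISON_PILL = "__STOP__";

    // Rate-limit permits, topped back up to REQUESTS_PER_TIMESLICE every slice.
    // Averages QPS_LIMIT per second; any 1-second window may see at most one extra slice in a burst
    private final Semaphore rateLimiter = new Semaphore(REQUESTS_PER_TIMESLICE);
    // Blocking queue that buffers data waiting to be written
    private final BlockingQueue<String> dataQueue = new LinkedBlockingQueue<>();
    // Fixed pool of MAX_THREADS consumers; this alone caps the number of concurrent API calls
    private final ExecutorService workers = Executors.newFixedThreadPool(MAX_THREADS);
    private final ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        long sliceMillis = 1000 / SLICES_PER_SECOND;
        refiller.scheduleAtFixedRate(() -> {
            // Only the refiller releases permits, so this never exceeds REQUESTS_PER_TIMESLICE
            int missing = REQUESTS_PER_TIMESLICE - rateLimiter.availablePermits();
            if (missing > 0) {
                rateLimiter.release(missing);
            }
        }, sliceMillis, sliceMillis, TimeUnit.MILLISECONDS);
        for (int i = 0; i < MAX_THREADS; i++) {
            workers.execute(this::writeLoop);
        }
    }

    // Producer side: put data into the queue
    public void sendData(String data) throws InterruptedException {
        dataQueue.put(data);
    }

    // Consumer loop run by every worker thread
    private void writeLoop() {
        try {
            while (true) {
                String data = dataQueue.take();
                if (POISON_PILL.equals(data)) {
                    return;
                }
                rateLimiter.acquire(); // wait for a permit in the current time slice
                invokeApi(data);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            System.out.println("Thread interrupted while writing data.");
        }
    }

    // Queue one sentinel per worker (after all data), then wait for the workers to finish
    public void shutdown() throws InterruptedException {
        for (int i = 0; i < MAX_THREADS; i++) {
            dataQueue.put(POISON_PILL);
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        refiller.shutdownNow();
    }

    private void invokeApi(String data) {
        // Simulated API call; in practice send an HTTP request with HttpClient, OkHttp, etc.
        System.out.println("Writing data to DB: " + data);
        try {
            Thread.sleep(10); // assume each call takes about 10 ms
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DataWriter writer = new DataWriter();
        writer.start();
        // Simulate producing the data: assume 500 records need to be written
        for (int i = 0; i < 500; i++) {
            writer.sendData("Data " + i);
        }
        writer.shutdown();
    }
}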

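Note that the code above is a simplified example. It mainly shows how a Semaphore can throttle access to the API and how a BlockingQueue serves as the buffer in a producer-consumer model. In practice you may need to tune the parameters, for example by matching the number of concurrent threads and the queue size to the real QPS limit and system performance. You may also need exception handling, logging, and batching optimizations. For high-concurrency database writes, database-level optimizations such as batch inserts and transaction management are also worth considering.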
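The Java code uses an unbounded LinkedBlockingQueue. If data is produced much faster than the API accepts it, that queue can grow without limit. Tuning the queue size usually means using a bounded queue. Once it is full, the producer's put blocks until a consumer frees a slot, a mechanism known as backpressure. Below is a minimal Python sketch of the same producer-consumer pattern. It uses a bounded queue.Queue and one sentinel per worker for shutdown. The names run_pipeline, num_workers and max_pending are illustrative, and rate limiting is left out to keep the focus on the queue.

```python
import queue
import threading

_STOP = object()  # sentinel: tells a worker to exit


def run_pipeline(items, write, num_workers=4, max_pending=10):
    """Bounded producer-consumer: put() blocks once max_pending items are
    waiting, which slows a fast producer down to the consumers' pace."""
    pending = queue.Queue(maxsize=max_pending)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is _STOP:
                return
            try:
                result = write(item)
            except Exception as exc:  # record the failure but keep the worker alive
                result = exc
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        pending.put(item)   # blocks while the queue is full (backpressure)
    for _ in threads:
        pending.put(_STOP)  # one sentinel per worker, queued after all data
    for t in threads:
        t.join()
    return results
```

In the Java version, passing a capacity to the LinkedBlockingQueue constructor has the same effect.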

Scala example

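In Scala, a database-writing algorithm that is both efficient and safe, and that respects the API's QPS limit and concurrency requirements, can be built on the Akka framework. Akka manages the thread pool and asynchronous operations and keeps concurrency under control. Below is a simplified example built on Akka's actor model.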

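First, make sure your project includes the Akka Typed dependency (the akka-actor-typed module). If you use sbt as the build tool, declare it in build.sbt. The example code follows: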

import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import scala.collection.immutable.Queue
import scala.concurrent.duration._

// Data write task
final case class WriteTask(data: String)

object DataWriter {
  // Forwards each write task to the database interface actor
  def apply(dbInterface: ActorRef[DbInterface.Command]): Behavior[WriteTask] =
    Behaviors.receiveMessage { case WriteTask(data) =>
      dbInterface ! DbInterface.Write(data)
      Behaviors.same
    }
}

// Database interface actor (simulates the real database interaction):
// buffers write requests and starts at most qpsLimit of them per second
object DbInterface {
  sealed trait Command
  final case class Write(data: String) extends Command
  private case object Tick extends Command
  private case object WriteDone extends Command

  def apply(qpsLimit: Int): Behavior[Command] =
    Behaviors.setup[Command] { context =>
      Behaviors.withTimers[Command] { timers =>
        // One Tick every 1/qpsLimit seconds; each Tick starts at most one write
        timers.startTimerAtFixedRate(Tick, (1000000 / qpsLimit).micros)

        def buffering(pending: Queue[String]): Behavior[Command] =
          Behaviors.receiveMessage {
            case Write(data) =>
              buffering(pending.enqueue(data))
            case Tick =>
              pending.dequeueOption match {
                case Some((data, rest)) =>
                  // Simulated write; a real API call should be asynchronous so Ticks are not delayed
                  println(s"Writing data to DB: $data")
                  context.scheduleOnce(10.milliseconds, context.self, WriteDone) // simulated write latency
                  buffering(rest)
                case None =>
                  Behaviors.same
              }
            case WriteDone =>
              println("Write completed.")
              Behaviors.same
          }

        buffering(Queue.empty)
      }
    }
}

object Main extends App {
  // Guardian: spawns both actors inside a single ActorSystem, then sends the simulated tasks
  val root = Behaviors.setup[Nothing] { context =>
    val dbInterface = context.spawn(DbInterface(100), "dbInterface") // assume a QPS limit of 100
    val dataWriter = context.spawn(DataWriter(dbInterface), "dataWriter")
    (1 to 120).foreach(i => dataWriter ! WriteTask(s"Data $i"))
    Behaviors.empty
  }
  ActorSystem[Nothing](root, "DataWriterSystem")
}

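Note that, for simplicity, this example uses timers to simulate the database write latency and the completion notification. In a real application you need to adapt the write strategy to the actual database API and its QPS limit. That may include a RateLimiter for more precise control of the request rate, error handling, and retry logic.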
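A common design for such a RateLimiter is a token bucket. Tokens accumulate at `rate` per second up to `capacity`, and each call spends one token. The long-run rate therefore stays at `rate`, while bursts are capped at `capacity`. The Python sketch below illustrates this design under those assumptions. TokenBucket, try_acquire and the injectable `clock` parameter are hypothetical names, and the clock parameter exists only so the behavior can be tested deterministically.

```python
import threading
import time


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens are added per second, up to
    `capacity`; every call spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst of `capacity` is allowed
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add the tokens earned since the last refill, never exceeding capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Spend one token if one is available; never blocks."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self):
        """Block until a token is available, then spend it."""
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate  # time until one full token accrues
            time.sleep(wait)
```

For this scenario, calling TokenBucket(rate=100, capacity=1).acquire() before each API call spaces call starts about 10 ms apart. A larger capacity tolerates short bursts, which only makes sense if the API measures QPS over a window that allows them.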

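Depending on your requirements, you may also need to consider how to batch data effectively, how to recover from errors, and how to monitor and tune system performance.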
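On batching: the QPS limit counts calls, not records. If the API accepted several records per call, throughput would rise to roughly 100 times the batch size in records per second. Whether the API accepts batches is an assumption, since the original problem does not say. The sketch below shows only the splitting step. chunked is a hypothetical helper name, and each resulting batch would then be sent as one rate-limited API call.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` consecutive items; the last list may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 250 records in batches of 10 need 25 API calls instead of 250
batches = list(chunked(range(250), 10))
print(len(batches), batches[-1])
```

Python 3.12 and later ship itertools.batched, which does the same job but yields tuples.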