pluck_in_batches Tutorial

Latest recommended article published 2024-08-31 07:31:57

郝言元


Article link: https://blog.csdn.net/gitblog_00987/article/details/141734141


pluck_in_batches: A faster alternative to the custom use of `in_batches` with `pluck`. Project URL: https://gitcode.com/gh_mirrors/pl/pluck_in_batches

Project Overview

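pluck_in_batches is a Ruby gem that offers a faster, more memory-efficient alternative to combining ActiveRecord's in_batches with pluck. With in_batches plus pluck, each batch takes two queries: one to load the batch's primary keys and a second to pluck the requested columns. pluck_in_batches fetches each batch with a single query, which reduces both the number of SQL queries and the number of memory allocations.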
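To make the query-count difference concrete, here is a pure-Ruby simulation over an in-memory array. ROWS, fetch_after, keyset_pluck_in_batches and two_step_in_batches are hypothetical names; this is a model of keyset batching, assuming batches are paginated by primary key, and not the gem's actual implementation. Each call to fetch_after stands for one SQL round trip.

```ruby
require "set"

# Hypothetical in-memory table: 2,500 [id, email] rows sorted by id.
ROWS = (1..2500).map { |i| [i, "user#{i}@example.com"] }.freeze

# Simulates one SQL round trip, roughly:
#   SELECT id, email FROM users WHERE id > last_id ORDER BY id LIMIT limit
def fetch_after(last_id, limit, stats)
  stats[:queries] += 1
  ROWS.select { |id, _| last_id.nil? || id > last_id }.first(limit)
end

# Single-query style: the cursor column (id) is plucked together with the
# requested column, so one query per batch is enough to advance the cursor.
def keyset_pluck_in_batches(batch_size, stats)
  last_id = nil
  loop do
    rows = fetch_after(last_id, batch_size, stats)
    break if rows.empty?

    yield rows.map(&:last)
    break if rows.size < batch_size

    last_id = rows.last.first
  end
end

# Two-query style (roughly in_batches + pluck): load the batch's ids first,
# then run a second query for the plucked column WHERE id IN (ids).
def two_step_in_batches(batch_size, stats)
  last_id = nil
  loop do
    ids = fetch_after(last_id, batch_size, stats).map(&:first)
    break if ids.empty?

    wanted = ids.to_set
    stats[:queries] += 1 # the second query of this batch
    yield ROWS.select { |id, _| wanted.include?(id) }.map(&:last)
    break if ids.size < batch_size

    last_id = ids.last
  end
end

single = { queries: 0 }
double = { queries: 0 }
keyset_pluck_in_batches(1000, single) { |emails| }
two_step_in_batches(1000, double) { |emails| }
puts single[:queries] # => 3
puts double[:queries] # => 6
```

With 2,500 rows and a batch size of 1,000, both styles yield the same emails, but the single-query style needs 3 round trips and the two-query style needs 6.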

Quick Start

Installation

First, add the following line to your Gemfile:

gem 'pluck_in_batches'

Then run:

bundle install

Or install it manually:

gem install pluck_in_batches

Usage Examples

Single column

User.where(active: true).pluck_each(:email) do |email|
  # process each email
end

Multiple columns

User.where(active: true).pluck_each(:id, :email) do |id, email|
  # process each id and email pair
end
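The two pluck_each examples differ only in what the block receives: with one column each row is a single value, while with several columns each row is an array that Ruby's block parameters destructure (|id, email|). The sketch that follows layers a per-row iterator over batches to show this; the helper each_row_from_batches and the sample data are hypothetical, and this is an illustration of the idea, not the gem's source.

```ruby
# Hypothetical helper: flattens an enumerable of batches into a per-row
# iterator. Without a block it returns an Enumerator.
def each_row_from_batches(batches)
  return enum_for(:each_row_from_batches, batches) unless block_given?

  batches.each do |batch|
    batch.each { |row| yield row }
  end
end

# One column: each batch holds plain values.
single = [["a@example.com", "b@example.com"], ["c@example.com"]]
# Two columns: each batch holds [id, email] arrays.
multi = [[[1, "a@example.com"], [2, "b@example.com"]], [[3, "c@example.com"]]]

each_row_from_batches(single) { |email| puts email }
# The yielded array is auto-splatted into the two block parameters.
each_row_from_batches(multi) { |id, email| puts "#{id}: #{email}" }
```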

Batches

User.where("age > 21").pluck_in_batches(:email) do |emails|
  jobs = emails.map { |email| PartyReminderJob.new(email) }
  # enqueue the whole batch in one call (ActiveJob.perform_all_later, Rails 7.1+)
  ActiveJob.perform_all_later(jobs)
end
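pluck_in_batches hands the block a whole array of values, so per-batch work such as enqueueing jobs can be done in one bulk call instead of one call per email. The pure-Ruby sketch that follows illustrates that effect; PartyReminderJob is modeled as a plain Struct, FakeQueue and push_bulk are hypothetical stand-ins for a bulk enqueue, and each_slice stands in for the batches the gem would yield.

```ruby
# Hypothetical stand-ins: a job object and a queue that counts one
# round trip per bulk push.
PartyReminderJob = Struct.new(:email)

class FakeQueue
  attr_reader :round_trips, :jobs

  def initialize
    @round_trips = 0
    @jobs = []
  end

  # Enqueues a whole batch of jobs in a single round trip.
  def push_bulk(batch_jobs)
    @round_trips += 1
    @jobs.concat(batch_jobs)
  end
end

emails = (1..2500).map { |i| "user#{i}@example.com" }
queue = FakeQueue.new

# each_slice(1000) plays the role of the batches of 1,000 emails.
emails.each_slice(1000) do |batch|
  queue.push_bulk(batch.map { |email| PartyReminderJob.new(email) })
end

puts queue.round_trips # => 3
puts queue.jobs.size   # => 2500
```

Enqueueing 2,500 jobs this way costs 3 round trips rather than 2,500.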

Use Cases and Best Practices

Use case

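Suppose your database holds millions of users and you need to process the email address of every active user (for example, to send each of them an email). pluck_in_batches loads only the email column, one batch at a time, instead of instantiating a full User object for every row, which makes the job noticeably faster and lighter on memory.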

User.where(active: true).pluck_in_batches(:email) do |emails|
  emails.each do |email|
    # send the email
  end
end
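At the scale of millions of rows, the query savings can be estimated with simple arithmetic. The helpers single_query_batches and two_query_batches are hypothetical and assume a batch size of 1,000 (ActiveRecord's default for its batch methods), pagination by primary key, and a loop that stops after the first batch shorter than the batch size, so a row count that is an exact multiple of the batch size costs one extra, empty query.

```ruby
# Queries for the single-query style: one per full batch, plus one final
# query that returns either a short batch or nothing.
def single_query_batches(rows, batch_size = 1000)
  rows / batch_size + 1
end

# Queries for the two-query style (ids, then pluck) under the same rules:
# two per non-empty batch, one for a final empty id query.
def two_query_batches(rows, batch_size = 1000)
  full, rest = rows.divmod(batch_size)
  2 * full + (rest.zero? ? 1 : 2)
end

puts single_query_batches(5_000_000) # => 5001
puts two_query_batches(5_000_000)    # => 10001
```

For 5,000,000 active users this model gives about 5,001 queries versus 10,001.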

Best practices

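Process in batches: prefer the batch method pluck_in_batches so that only one batch of values is held in memory at a time, which lowers memory use and improves performance.
Keep memory bounded: when processing large amounts of data, make sure results do not pile up in memory. If the loop runs where the ActiveRecord query cache is enabled, wrap it in uncached so the results of the batch queries are not retained by the cache.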

User.uncached do
  User.where(active: true).pluck_in_batches(:email) do |emails|
    emails.each do |email|
      # send the email
    end
  end
end
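Why uncached helps: every batch query has a different cursor value and therefore a different SQL string, so a query cache keyed by SQL keeps one result set per batch. The sketch that follows is a deliberately simplified model with no size limit; FakeQueryCache, fetch and run_batches are hypothetical names, and real ActiveRecord query-cache behavior varies by Rails version.

```ruby
# Hypothetical, simplified query cache: result sets are memoized by SQL
# string while caching is enabled (no eviction).
class FakeQueryCache
  def initialize
    @store = {}
    @enabled = true
  end

  def size
    @store.size
  end

  # Runs the block with caching disabled, similar in spirit to User.uncached { ... }.
  def uncached
    previous = @enabled
    @enabled = false
    yield
  ensure
    @enabled = previous
  end

  # Returns the cached result for sql, or runs the block (the "query").
  def fetch(sql)
    return yield unless @enabled

    @store[sql] ||= yield
  end
end

# Each batch uses a different cursor value, hence a distinct SQL string.
def run_batches(cache, batches)
  batches.times do |i|
    sql = "SELECT email FROM users WHERE id > #{i * 1000} ORDER BY id LIMIT 1000"
    cache.fetch(sql) { Array.new(1000) { |j| "user#{i * 1000 + j + 1}@example.com" } }
  end
end

cached = FakeQueryCache.new
run_batches(cached, 5)
puts cached.size # => 5

plain = FakeQueryCache.new
plain.uncached { run_batches(plain, 5) }
puts plain.size # => 0
```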

Related Ecosystem Projects

Sidekiq-Iteration

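sidekiq-iteration is a gem that works with Sidekiq to make long-running jobs that iterate over large collections interruptible and resumable. It can be combined with pluck_in_batches to make that processing even more efficient.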

class UserJob
  include Sidekiq::Job
  include SidekiqIteration::Iteration

  def build_enumerator(cursor:)
    User.where(active: true).pluck_each_enumerator(:email, cursor: cursor)
  end

  def each_iteration(email)
    # process each email
  end
end

By combining pluck_in_batches with sidekiq-iteration, you can process large amounts of data in batches both efficiently and reliably.
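The reliability comes from cursors: the job records the cursor of the last processed item and, after an interruption, rebuilds its enumerator starting after that cursor. The pure-Ruby sketch that follows models this; it assumes the enumerator yields [item, cursor] pairs with the id as cursor, and USERS and this stand-alone build_enumerator are hypothetical, not sidekiq-iteration's or pluck_in_batches' code.

```ruby
# Hypothetical data: [id, email] rows ordered by id (the cursor column).
USERS = [[1, "a@example.com"], [2, "b@example.com"],
         [3, "c@example.com"], [4, "d@example.com"]].freeze

# Yields [email, cursor] pairs; a nil cursor means "start from the beginning".
def build_enumerator(cursor:)
  Enumerator.new do |y|
    USERS.each do |id, email|
      next if cursor && id <= cursor

      y << [email, id]
    end
  end
end

processed = []
saved_cursor = nil

# First run: process two rows, then simulate an interruption (e.g. a deploy).
build_enumerator(cursor: nil).each do |email, id|
  processed << email
  saved_cursor = id
  break if processed.size == 2
end

# Second run resumes after the saved cursor instead of starting over.
build_enumerator(cursor: saved_cursor).each do |email, id|
  processed << email
  saved_cursor = id
end

p processed # => ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
```

No email is skipped or processed twice, because the second run starts strictly after the last recorded cursor.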
