Paper Digest | Operations Research May Article Roundup


Editor's note:

In this installment of the series, we summarize the basic information of six articles published in May 2024 by the leading operations research journal Operations Research, to help readers quickly catch up on new developments in the field.

Recommended Article 1

  • Title: Drone-Delivery Network for Opioid Overdose: Nonlinear Integer Queueing-Optimization Models and Methods
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0489
  • Published: 2024/05/07
  • Authors: Miguel A. Lejeune, Wenbo Ma
  • Abstract
    • We propose a new stochastic emergency network design model that uses a fleet of drones to quickly deliver naloxone in response to opioid overdoses. The network is represented as a collection of M/G/K queueing systems in which the capacity K of each system is a decision variable, and the service time is modeled as a decision-dependent random variable. The model is a queueing-based optimization problem that locates fixed (drone bases) and mobile (drones) servers and determines the drone dispatching decisions; it takes the form of a nonlinear integer problem intractable in its original form. We develop an efficient reformulation and algorithmic framework. Our approach reformulates the multiple nonlinearities (fractional, polynomial, exponential, and factorial terms) to give a mixed-integer linear programming (MILP) formulation. We demonstrate its generalizability and show that the problem of minimizing the average response time of a collection of M/G/K queueing systems with unknown capacity K is always MILP-representable. We design an outer approximation branch-and-cut algorithmic framework that is computationally efficient and scales well. The analysis based on real-life data reveals that, in Virginia Beach, drones can: (1) decrease the response time by 82%, (2) increase the survival chance by more than 273%, (3) save up to 33 additional lives per year, and (4) provide annually up to 279 additional quality-adjusted life years.
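The objective at the heart of this model is the mean response time of an M/G/K queue whose capacity K is a decision variable. As a rough illustration of what that quantity looks like (this is not the paper's MILP reformulation), the sketch below evaluates the standard Allen–Cunneen approximation of the mean M/G/K queueing delay, built on the exact Erlang-C formula for M/M/K:

```python
import math

def erlang_c(lam, mu, k):
    """P(wait) in an M/M/k queue (exact Erlang C formula)."""
    a = lam / mu                      # offered load
    rho = a / k                       # utilization; need rho < 1 for stability
    assert rho < 1, "unstable system"
    s = sum(a**n / math.factorial(n) for n in range(k))
    tail = a**k / (math.factorial(k) * (1 - rho))
    return tail / (s + tail)

def mgk_mean_wait(lam, mu, scv, k):
    """Allen-Cunneen approximation of the mean queueing delay in an M/G/k
    queue: the exact M/M/k delay scaled by (1 + SCV of service time) / 2,
    where scv is the squared coefficient of variation of the service time."""
    w_mmk = erlang_c(lam, mu, k) / (k * mu - lam)   # exact M/M/k mean wait
    return w_mmk * (1 + scv) / 2
```

Evaluating this for candidate capacities K gives the kind of nonlinear (fractional, factorial, exponential) terms the paper must linearize to reach an MILP.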

Recommended Article 2

  • Title: Online Learning for Constrained Assortment Optimization Under Markov Chain Choice Model
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0693
  • Published: 2024/05/15
  • Authors: Shukai Li, Qi Luo, Zhiyuan Huang, Cong Shi
  • Abstract
    • We study a dynamic assortment selection problem where arriving customers make purchase decisions among offered products from a universe of products under a Markov chain choice (MCC) model. The retailer only observes the assortment and the customer's single choice per period. Given limited display capacity, resource constraints, and no a priori knowledge of problem parameters, the retailer's objective is to sequentially learn the choice model and optimize cumulative revenues over a finite selling horizon. We develop a fast linear-system-based explore-then-commit (FastLinETC for short) learning algorithm that balances the tradeoff between exploration and exploitation. The algorithm can simultaneously estimate the arrival and transition probabilities in the MCC model by solving a linear system of equations and determining the near-optimal assortment based on these estimates. Furthermore, our consistent estimators offer superior computational times compared with existing heuristic estimation methods, which often suffer from inconsistency or a significant computational burden.
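To fix ideas, here is a generic explore-then-commit skeleton for assortment selection. This is only an illustrative sketch under a multinomial-logit stand-in choice model; the paper's FastLinETC instead estimates the Markov chain's arrival and transition probabilities by solving a linear system and then optimizes against those estimates:

```python
import itertools
import random

def mnl_purchase(assortment, weights, revenues, rng):
    """One customer's choice under a multinomial-logit stand-in model
    (hypothetical; the paper's customers follow a Markov chain choice model).
    Returns the revenue of the purchased product, or 0.0 for no purchase."""
    denom = 1.0 + sum(weights[i] for i in assortment)
    u = rng.random() * denom
    if u < 1.0:                       # no-purchase option has weight 1
        return 0.0
    acc = 1.0
    for i in assortment:
        acc += weights[i]
        if u < acc:
            return revenues[i]
    return 0.0                        # unreachable; guards float edge cases

def explore_then_commit(products, revenues, weights, capacity, T, n_explore, seed=0):
    """Generic explore-then-commit for capacitated assortment selection."""
    rng = random.Random(seed)
    assortments = [s for r in range(1, capacity + 1)
                   for s in itertools.combinations(products, r)]
    trials = {s: 0 for s in assortments}
    revenue = {s: 0.0 for s in assortments}
    t = 0
    # Exploration phase: cycle through the candidate assortments.
    while t < n_explore * len(assortments) and t < T:
        s = assortments[t % len(assortments)]
        revenue[s] += mnl_purchase(list(s), weights, revenues, rng)
        trials[s] += 1
        t += 1
    # Commit phase: play the empirically best assortment for the rest of T.
    best = max(assortments, key=lambda s: revenue[s] / max(trials[s], 1))
    total = sum(revenue.values())
    while t < T:
        total += mnl_purchase(list(best), weights, revenues, rng)
        t += 1
    return best, total
```

Enumerating all assortments, as above, is exponential in the capacity; the point of estimating the choice-model parameters directly, as FastLinETC does, is to avoid treating each assortment as an independent arm.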

Recommended Article 3

  • Title: Matching Impatient and Heterogeneous Demand and Supply
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0005
  • Published: 2024/05/15
  • Authors: Angelos Aveklouris, Levi DeValve, Maximiliano Stock, Amy Ward
  • Abstract
    • Service platforms must determine rules for matching heterogeneous demand (customers) and supply (workers) that arrive randomly over time and may be lost if forced to wait too long for a match. Our objective is to maximize the cumulative value of matches, minus costs incurred when demand and supply wait. We develop a fluid model that approximates the evolution of the stochastic model and explicitly captures the nonlinear dependence between the amount of demand and supply waiting and the distribution of their patience times, also known as reneging or abandonment times in the literature. The fluid model's invariant states approximate the steady-state mean queue lengths in the stochastic system and, therefore, can be used to develop an optimization problem whose optimal solution provides matching rates between demand and supply types that are asymptotically optimal (on fluid scale as demand and supply rates grow large). We propose a discrete review matching policy that asymptotically achieves the optimal matching rates. We further show that, when the aforementioned matching optimization problem has an optimal extreme point solution, which occurs when the patience time distributions have increasing hazard rate functions, a state-independent priority policy that ranks the edges on the bipartite graph connecting demand and supply is asymptotically optimal. A key insight from this analysis is that the ranking critically depends on the patience time distributions and may be different for different distributions even if they have the same mean, demonstrating that models assuming, for example, exponential patience times for tractability, may lack robustness. Finally, we observe that, when holding costs are zero, a discrete review policy that does not require knowledge of interarrival and patience time distributions is asymptotically optimal.
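A state-independent priority policy of the kind described above is mechanically very simple: scan the edges of the bipartite demand-supply graph in a fixed ranking and match whatever is queued. The sketch below shows that mechanic only; the paper's contribution is deriving the ranking (from the fluid optimization and the patience-time distributions), which is assumed given here:

```python
def priority_match(demand, supply, ranking):
    """Match queued demand and supply units by scanning the bipartite graph's
    edges in a fixed priority order (a minimal sketch of a state-independent
    priority policy; the correct edge ranking depends on the patience-time
    distributions, not just the match values).

    demand / supply: dict mapping type -> queued units (mutated in place).
    ranking: list of ((demand_type, supply_type), match_value), ranked.
    """
    matches, total_value = {}, 0.0
    for (i, j), value in ranking:
        m = min(demand.get(i, 0), supply.get(j, 0))
        if m > 0:
            demand[i] -= m
            supply[j] -= m
            matches[(i, j)] = m
            total_value += m * value
    return matches, total_value
```

Under a discrete review policy, a routine like this would run once per review period on the queues accumulated since the last review.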

Recommended Article 4

  • Title: On the Robustness of Second-Price Auctions in Prior-Independent Mechanism Design
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0428
  • Published: 2024/05/16
  • Authors: Jerry Anunrojwong, Santiago R. Balseiro, Omar Besbes
  • Abstract
    • Classical Bayesian mechanism design relies on the common prior assumption, but the common prior is often not available in practice. We study the design of prior-independent mechanisms that relax this assumption: The seller is selling an indivisible item to n buyers such that the buyers' valuations are drawn from a joint distribution that is unknown to both the buyers and the seller, buyers do not need to form beliefs about competitors, and the seller assumes the distribution is adversarially chosen from a specified class. We measure performance through the worst-case regret, or the difference between the expected revenue achievable with perfect knowledge of buyers' valuations and the actual mechanism revenue. We study a broad set of classes of valuation distributions that capture a wide spectrum of possible dependencies: independent and identically distributed (i.i.d.) distributions, mixtures of i.i.d. distributions, affiliated and exchangeable distributions, exchangeable distributions, and all joint distributions. We derive in quasi closed form the minimax values and the associated optimal mechanism. In particular, we show that the first three classes admit the same minimax regret value, which is decreasing with the number of competitors, whereas the last two have the same minimax regret equal to that of the case n = 1. Furthermore, we show that the minimax optimal mechanisms have a simple form across all settings: a second-price auction with random reserve prices, demonstrating its robustness in prior-independent mechanism design. En route to our results, we also develop a principled methodology to determine the form of the optimal mechanism and worst-case distribution via first-order conditions that should be of independent interest in other minimax problems.
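The mechanism class the paper singles out, a second-price auction with a random reserve, is easy to state operationally. The sketch below implements the payment rule and a Monte Carlo revenue estimate under arbitrary (user-supplied) value and reserve distributions; the paper derives the minimax-optimal reserve distribution in quasi closed form, which is not reproduced here:

```python
import random

def spa_revenue(values, reserve):
    """Seller revenue of a second-price auction with reserve price `reserve`:
    no sale if the top value is below the reserve; otherwise the winner pays
    max(second-highest value, reserve)."""
    bids = sorted(values, reverse=True)
    if not bids or bids[0] < reserve:
        return 0.0
    if len(bids) == 1 or bids[1] < reserve:
        return reserve
    return bids[1]

def expected_revenue_random_reserve(value_sampler, reserve_sampler, n=10_000, seed=0):
    """Monte Carlo estimate of expected revenue when the reserve is drawn
    fresh for each auction from a randomized reserve distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += spa_revenue(value_sampler(rng), reserve_sampler(rng))
    return total / n
```

The randomization over reserves is what hedges the seller against the adversarially chosen valuation distribution.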

Recommended Article 5

  • Title: Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0446
  • Published: 2024/05/23
  • Authors: Michael Jordan, Tianyi Lin, Zhengyuan Zhou
  • Abstract
    • Online gradient descent (OGD) is well-known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of Θ(log T) for strongly convex cost functions, and (2) in the multiagent setting of strongly monotone games with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of Θ(1/T). Whereas these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves O(log² T) regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of O(log³ T / T), again optimal up to log factors. We illustrate our algorithms in a learning version of the classic newsvendor problem, in which, because of lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multiretailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step algorithm.
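The drawback the paper addresses is visible in the classical algorithm itself. The sketch below is plain OGD with the textbook step size 1/(μt) for μ-strongly-convex losses, which attains Θ(log T) regret but requires μ up front; removing that requirement is exactly AdaOGD's contribution (its adaptive step-size schedule is not reproduced here):

```python
def ogd_strongly_convex(grad, x0, mu, T):
    """Online gradient descent with the classical step size 1/(mu * t) for
    mu-strongly-convex losses. Note the schedule needs the strong convexity
    parameter mu in advance."""
    x = x0
    for t in range(1, T + 1):
        x -= grad(x) / (mu * t)   # eta_t = 1 / (mu * t)
    return x
```

As a stand-in for the newsvendor objective, running this on the deterministic quadratic f(x) = (μ/2)(x − x*)² drives the iterate to the minimizer x*; in the paper's setting only noisy gradients of the (lost-sales) newsvendor cost are observed.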

Recommended Article 6

  • Title: The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
  • Journal: Operations Research
  • Link: https://doi.org/10.1287/opre.2022.0357
  • Published: 2024/05/30
  • Authors: Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
  • Abstract
    • Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors that may lead to instabilities when using approximate dynamic programming techniques to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation. We quantitatively characterize the impact of lookahead and m-step rollout on the performance of approximate dynamic programming (DP) with function approximation. (i) Without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge. (ii) Both lookahead and m-step rollout improve the convergence rate of approximate DP. (iii) Lookahead helps mitigate the effect of function approximation and the discount factor on the asymptotic performance of the algorithm. Our results are presented for two approximate DP methods: one that uses least-squares regression to perform function approximation and another that performs several steps of gradient descent of the least-squares objective in each iteration.
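In the exact tabular case, h-step lookahead simply means applying the Bellman optimality operator h times per iteration, so the error contracts by γ^h instead of γ per iteration. The sketch below shows that effect on a toy MDP; it deliberately omits the function approximation and m-step rollout that the paper actually analyzes:

```python
def bellman_backup(V, P, R, gamma):
    """One exact Bellman optimality backup:
    (TV)(s) = max_a [ R[s][a] + gamma * sum_s' P[s][a][s'] * V[s'] ]."""
    n = len(V)
    return [max(R[s][a] + gamma * sum(P[s][a][sp] * V[sp] for sp in range(n))
                for a in range(len(R[s])))
            for s in range(n)]

def lookahead_vi(P, R, gamma, h, iters):
    """Value iteration with h-step lookahead: each iteration applies the
    Bellman operator h times, contracting the error by gamma**h per
    iteration (tabular sketch only; no function approximation)."""
    V = [0.0] * len(R)
    for _ in range(iters):
        for _ in range(h):
            V = bellman_backup(V, P, R, gamma)
    return V
```

On any fixed MDP, the iterates with h = 2 are closer to the fixed point after the same number of outer iterations than with h = 1, which is the tabular shadow of result (ii) in the abstract.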