Paper Digest | Management Science: July Article Roundup

运筹OR帷幄

Last modified: 2024-08-28 05:55:49

First published: 2024-08-28 05:55:37

Article link: https://blog.csdn.net/weixin_53463894/article/details/141291385

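Editor's note:

In this series, we select 9 of the 28 articles published in July by Management Science, a top operations research journal, and present their basic information to help readers quickly catch up on the latest developments in the field.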

Article 1

Title
How Framing Influences Strategic Interactions
Authors
Christopher K. Hsee, Alex Imas, Xilin Li
Link
https://doi.org/10.1287/mnsc.2023.00518
Published
July 3, 2024
Abstract
In many settings, a person’s outcome depends not only on her own behavior, but also on her counterpart’s. Such strategic decisions have traditionally been studied using normative game theory, which assumes that people adopt equilibrium strategies and will reach the same decision, regardless of how the problem is described (framed). We examine a potentially important type of framing effect—focusing on how the relationship between players’ actions generates joint outcomes. Any strategic interaction can be described by either spelling out the outcomes of all possible action combinations (which we call “outcome framing,” or simply “O-framing”) or describing what will happen if different players choose the same action or choose different actions (which we call “relation framing,” or simply “R-framing”). O-framing has been the typical way to describe a strategic problem in prior work, whereas R-framing is commonly employed in real-life communications. We propose that these functionally equivalent frames induce different psychological processes and lead to different decisions: Relative to O-framing, R-framing increases players’ beliefs about their counterparts’ likelihood of coordinating on a cooperative option. We demonstrate this effect in the context of classic games such as the Prisoner’s Dilemma and the Stag Hunt. We find that, compared with O-framing, R-framing significantly increases people’s likelihood to choose the action that maximizes collective benefits rather than individual interests, and it does so by increasing beliefs that one’s partner will choose the same action as well. We derive conditions when this effect is likely to emerge and discuss the managerial implications of this research.
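To make the two frames concrete, here is a minimal sketch that renders the same Prisoner's Dilemma in both ways. The payoff numbers and the function names `outcome_framing` and `relation_framing` are illustrative assumptions, not taken from the paper, and the relation frame as written assumes a symmetric two-action game.

```python
# Illustrative Prisoner's Dilemma payoffs (hypothetical numbers, not from the paper).
# PAYOFFS[(my_action, other_action)] = (my_payoff, other_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def outcome_framing(payoffs):
    """O-framing: spell out the outcome of every action combination."""
    return [
        f"If you choose {mine} and your counterpart chooses {theirs}, "
        f"you get {p_me} and your counterpart gets {p_them}."
        for (mine, theirs), (p_me, p_them) in payoffs.items()
    ]


def relation_framing(payoffs):
    """R-framing: describe what happens when the players choose the same
    action versus different actions (assumes a symmetric two-action game)."""
    actions = sorted({mine for mine, _ in payoffs})
    lines = [
        f"If you both choose {a}, each of you gets {payoffs[(a, a)][0]}."
        for a in actions
    ]
    a, b = actions
    p_a, p_b = payoffs[(a, b)]
    lines.append(
        f"If you choose different actions, whoever chooses {a} gets {p_a} "
        f"and whoever chooses {b} gets {p_b}."
    )
    return lines


if __name__ == "__main__":
    print("\n".join(outcome_framing(PAYOFFS)))
    print()
    print("\n".join(relation_framing(PAYOFFS)))
```

Both functions read from the same payoff table, which is the sense in which the frames are functionally equivalent: only the wording presented to the players differs.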

Article 2

Title
Estimating Stockout Costs and Optimal Stockout Rates: A Case on the Management of Ugly Produce Inventory
Authors
Stanley Frederick W. T. Lim, Elliot Rabinovich, Sanghak Lee, Sungho Park
Link
https://doi.org/10.1287/mnsc.2021.03174
Published
July 9, 2024
Abstract
Efficiently managing inventories requires an accurate estimation of stockout costs. This estimation is complicated by challenges in determining how to compensate consumers monetarily to ensure they will maintain the same level of utility they would have obtained had stockouts not occurred. This paper presents an analysis of these compensation costs as applied to the design of optimal stockout rates by an online retailer marketing to consumers aesthetically substandard fruits and vegetables rejected by mainstream grocery chains. Because growers face high uncertainty in their harvesting conditions and in the aesthetic quality of their crops and there are little data on hand to predict the value consumers attach to the availability of subpar produce, it is difficult to optimally match the supply of these products with consumers’ demand. Our analysis draws from a multiple discrete-continuous extreme value (MDCEV) choice model to calculate consumer compensations that the retailer can use to estimate the opportunity costs of stockouts to manage its inventory. We show that not taking into account these compensation costs could unduly inflate the optimal stockout rates for these products. Armed with these compensation cost estimates, we show how these costs can serve as incentives for retailers to source greater inventory amounts of imperfect produce from growers and how this will ultimately translate into less waste in the supply chain.
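The claim that ignoring compensation costs inflates the optimal stockout rate can be illustrated with the textbook newsvendor model. This is only a sketch of the intuition: the paper estimates compensation with an MDCEV choice model, which is not reproduced here, and all cost figures below are hypothetical.

```python
def optimal_stockout_rate(overage_cost, underage_cost):
    """Textbook newsvendor result: stocking up to the critical fractile
    underage / (underage + overage) leaves an optimal stockout
    probability of overage / (underage + overage)."""
    return overage_cost / (underage_cost + overage_cost)


# Hypothetical per-unit figures, chosen only for illustration.
leftover_cost = 1.0   # cost of one unsold unit (overage)
lost_margin = 2.0     # margin lost on one unmet unit of demand
compensation = 2.0    # assumed consumer compensation per stockout

# Underage cost without and with the compensation term.
rate_ignoring = optimal_stockout_rate(leftover_cost, lost_margin)
rate_including = optimal_stockout_rate(leftover_cost, lost_margin + compensation)

if __name__ == "__main__":
    print(f"ignoring compensation:  {rate_ignoring:.3f}")
    print(f"including compensation: {rate_including:.3f}")
```

Adding the compensation term raises the cost of running out, so the optimal stockout probability falls and the retailer has a reason to source more inventory.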

Article 3

Title
Improving the Efficiency of Payments Systems Using Quantum Computing
Authors
Christopher McMahon, Donald McGillivray, Ajit Desai, Francisco Rivadeneyra, Jean-Paul Lam, Thomas Lo, Danica Marsden, Vladimir Skavysh
Link
https://doi.org/10.1287/mnsc.2023.00314
Published
July 12, 2024
Abstract
High-value payment systems (HVPSs) are typically liquidity intensive because payments are settled on a gross basis. State-of-the-art solutions to this problem include algorithms that seek netting sets and allow for ad hoc reordering of submitted payments. This paper introduces a new algorithm that explores the entire space of payments reordering to improve the liquidity efficiency of these systems without significantly increasing payment delays. Finding the optimal payment order among the entire space of reorderings is, however, an NP-hard combinatorial optimization problem. We solve this problem using a hybrid quantum annealing algorithm. Despite the limitations in size and speed of today’s quantum computers, our algorithm provides quantifiable liquidity savings when applied to the Canadian HVPS using a 30-day sample of transaction data. By reordering batches of 70 payments, we achieve an average of Canadian (C) $240 million in daily liquidity savings, with a settlement delay of approximately 90 seconds. For a few days in the sample, the liquidity savings exceed C$1 billion. Compared with classical computing and with current algorithms in HVPS, our quantum algorithm offers larger liquidity savings, and it offers more reliable and consistent solutions, particularly under time constraints.
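Why the order of payments matters for liquidity can be seen on a toy batch. The sketch below uses a classical brute-force search over all orderings, not the paper's hybrid quantum annealing algorithm. The liquidity measure (each bank starts at zero and must fund its largest running shortfall) and the three-bank example are simplifying assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def liquidity_needed(order):
    """Total liquidity the banks must post to settle `order` one payment at
    a time on a gross basis, assuming every bank starts with a zero balance."""
    balance = defaultdict(float)
    worst = defaultdict(float)  # most negative running balance per bank
    for sender, receiver, amount in order:
        balance[sender] -= amount
        worst[sender] = min(worst[sender], balance[sender])
        balance[receiver] += amount
    return -sum(worst.values())


def best_order(payments):
    """Brute force over all reorderings; only feasible for tiny batches."""
    return min(permutations(payments), key=liquidity_needed)


# Hypothetical batch: (sender, receiver, amount), in submission order.
SUBMITTED = [("A", "B", 10), ("C", "A", 10), ("B", "C", 10)]

if __name__ == "__main__":
    print("as submitted:", liquidity_needed(SUBMITTED))
    best = best_order(SUBMITTED)
    print("best order:  ", best, liquidity_needed(best))
```

In this batch, settling in submission order forces both A and C to fund a payment before receiving anything, while the reordered batch lets each incoming payment fund the next outgoing one. A batch of 70 payments has 70! possible orderings, so exhaustive search like this is not an option at the scale the abstract describes.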

Article 4

Title
Discovering Causal Models with Optimization: Confounders, Cycles, and Instrument Validity
Authors
Frederick Eberhardt, Nur Kaynar, Auyon Siddiq
Link
https://doi.org/10.1287/mnsc.2021.02066
Published
July 17, 2024
Abstract
We propose a new optimization-based method for learning causal structures from observational data, a process known as causal discovery. Our method takes as input observational data over a set of variables and returns a graph in which causal relations are specified by directed edges. We consider a highly general search space that accommodates latent confounders and feedback cycles, which few extant methods do. We formulate the discovery problem as an integer program and propose a solution technique that exploits the conditional independence structure in the data to identify promising edges for inclusion in the output graph. In the large-sample limit, our method recovers a graph that is (Markov) equivalent to the true data-generating graph. Computationally, our method is competitive with the state-of-the-art, and can solve in minutes instances that are intractable for alternative causal discovery methods. We leverage our method to develop a procedure for investigating the validity of an instrumental variable and demonstrate it on the influential quarter-of-birth and proximity-to-college instruments for estimating the returns to education. In particular, our procedure complements existing instrument tests by revealing the precise causal pathways that undermine instrument validity, highlighting the unique merits of the graphical perspective on causality.
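The conditional independence structure that the abstract refers to can be illustrated on simulated data. The sketch below does not implement the paper's integer program; it only shows, for an assumed linear-Gaussian chain X -> Y -> Z, that X and Z are correlated marginally but nearly uncorrelated once Y is controlled for. The data-generating process, sample size, and seed are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, z, given):
    """Correlation of x and z after linearly controlling for `given`."""
    r_xz = pearson(x, z)
    r_xg = pearson(x, given)
    r_zg = pearson(z, given)
    return (r_xz - r_xg * r_zg) / math.sqrt((1 - r_xg ** 2) * (1 - r_zg ** 2))


# Simulate the chain X -> Y -> Z with unit-variance Gaussian noise.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
z = [yi + rng.gauss(0, 1) for yi in y]

marginal = pearson(x, z)            # clearly nonzero: X influences Z through Y
conditional = partial_corr(x, z, y)  # close to zero: Y screens X off from Z

if __name__ == "__main__":
    print(f"corr(X, Z)     = {marginal:.3f}")
    print(f"corr(X, Z | Y) = {conditional:.3f}")
```

A near-zero partial correlation is evidence against a direct edge between X and Z, which is the kind of signal a discovery method can use to shortlist candidate edges.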

Article 5

Title
The Cost of Impatience in Dynamic Matching: Scaling Laws and Operating Regimes
Authors
Angela Kohlenberg, Itai Gurvich
Link
https://doi.org/10.1287/mnsc.2023.01513
Published
July 19, 2024
Abstract
We study matching queues with abandonment. The simplest of these is the two-sided queue with servers on one side and customers on the other, both arriving dynamically over time and abandoning if not matched by the time their patience elapses. We identify nonasymptotic and universal scaling laws for the matching loss due to abandonment, which we refer to as the “cost of impatience.” The scaling laws characterize the way in which this cost depends on the arrival rates and the (possibly different) mean patience of servers and customers. Our characterization reveals four operating regimes identified by an operational measure of patience that brings together mean patience and utilization. The four regimes subsume the regimes that arise in asymptotic (heavy-traffic) approximations. The scaling laws, specialized to each regime, reveal the fundamental structure of the cost of impatience and show that its order of magnitude is fully determined by (i) a “winner-take-all” competition between customer impatience and utilization, and (ii) the ability to accumulate inventory on the server side. Practically important is that when servers are impatient, the cost of impatience is, up to an order of magnitude, given by an insightful expression where only the minimum of the two patience rates appears. Considering the trade-off between abandonment and capacity costs, we characterize the scaling of the optimal safety capacity as a function of costs, arrival rates, and patience parameters. We prove that the ability to hold inventory of servers means that the optimal safety capacity grows logarithmically in abandonment cost and, in turn, slower than the square-root growth in the single-sided queue.
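The two-sided queue described in the abstract can be simulated in a few lines. The sketch below assumes Poisson arrivals and exponentially distributed patience, so the system is a continuous-time Markov chain and only its sequence of jumps needs to be sampled. These distributional assumptions, the parameter names, and the numbers are illustrative; the sketch estimates the abandonment fraction empirically and does not reproduce the paper's scaling laws.

```python
import random


def abandonment_fraction(lam, mu, theta_c, theta_s, n_events=200_000, seed=0):
    """Simulate a two-sided matching queue and return the fraction of
    arriving customers who abandon before being matched.

    lam, mu: arrival rates of customers and servers.
    theta_c, theta_s: abandonment rates (1 / mean patience) per waiting
    customer and per waiting server.
    State x > 0 means x customers are waiting; x < 0 means -x servers are.
    """
    rng = random.Random(seed)
    x = 0
    arrivals = abandoned = 0
    for _ in range(n_events):
        abandon_rate = x * theta_c if x > 0 else -x * theta_s
        u = rng.random() * (lam + mu + abandon_rate)
        if u < lam:
            # Customer arrives: matches a waiting server or joins the queue.
            arrivals += 1
            x += 1
        elif u < lam + mu:
            # Server arrives: matches a waiting customer or joins the queue.
            x -= 1
        elif x > 0:
            # A waiting customer runs out of patience.
            abandoned += 1
            x -= 1
        else:
            # A waiting server runs out of patience.
            x += 1
    return abandoned / arrivals


if __name__ == "__main__":
    for theta in (0.1, 0.5, 2.0):
        frac = abandonment_fraction(lam=1.0, mu=1.2, theta_c=theta, theta_s=theta)
        print(f"theta = {theta}: customer abandonment fraction = {frac:.3f}")
```

Because an arrival is matched immediately whenever the other side is waiting, at most one side of the queue is ever non-empty, which is why a single signed integer is enough to describe the state.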

Article 6

Title
A High-Dimensional Choice Model for Online Retailing
Authors
Zhaohui (Zoey) Jiang, Jun Li, Dennis Zhang
Link
https://doi.org/10.1287/mnsc.2020.02715
Published
July 19, 2024
Abstract
Online retailers are facing an increasing variety of product choices and diversified consumer decision journeys. To improve many operations decisions for online retailers, such as demand forecasting, inventory management, and pricing, an important first step is to obtain an accurate estimate of the substitution patterns among a large number of products offered in the complex online environment. Classic choice models either do not account for these substitution patterns beyond what is reflected through observed product features or do so in a simplified way by making a priori assumptions. These shortcomings become particularly restrictive when the underlying substitution patterns get complex as the number of options increases. We provide a solution by developing a high-dimensional choice model that allows for flexible substitution patterns and easily scales up. We leverage consumer clickstream data and combine econometric and machine learning (graphical lasso, in particular) methods to learn the substitution patterns among a large number of products. We show our method offers more accurate demand forecasts in a wide range of synthetic scenarios when compared with classical models (e.g., the independent and identically distributed Probit model), reducing out-of-sample mean absolute percentage error by 10%–30%. Such performance improvement is further supported by observations from a real-world empirical setting. More importantly, our method excels in precisely recovering substitution patterns across products. Compared with benchmark models, it reduces the percentage deviation from the underlying elasticity matrix by approximately half. This precision serves as a critical input for enhancing business decisions such as assortment planning, inventory management, and pricing strategies.

Article 7

Title
Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes
Authors
Parshan Pakiman, Selvaprabu Nadarajah, Negar Soheili, Qihang Lin
Link
https://doi.org/10.1287/mnsc.2020.00038
Published
July 23, 2024
Abstract
Approximate linear programs (ALPs) are well-known models based on value function approximations (VFAs) to obtain policies and lower bounds on the optimal policy cost of discounted-cost Markov decision processes (MDPs). Formulating an ALP requires (i) basis functions, the linear combination of which defines the VFA, and (ii) a state-relevance distribution, which determines the relative importance of different states in the ALP objective for the purpose of minimizing VFA error. Both of these choices are typically heuristic; basis function selection relies on domain knowledge, whereas the state-relevance distribution is specified using the frequency of states visited by a baseline policy. We propose a self-guided sequence of ALPs that embeds random basis functions obtained via inexpensive sampling and uses the known VFA from the previous iteration to guide VFA computation in the current iteration. In other words, this sequence takes multiple shots at randomly approximating the MDP value function with VFA-based guidance between consecutive approximation attempts. Self-guided ALPs mitigate domain knowledge during basis function selection and the impact of the state-relevance-distribution choice, thus reducing the ALP implementation burden. We establish high-probability error bounds on the VFAs from this sequence and show that a worst-case measure of policy performance is improved. We find that these favorable implementation and theoretical properties translate to encouraging numerical results on perishable inventory control and options pricing applications, where self-guided ALP policies improve upon policies from problem-specific methods. More broadly, our research takes a meaningful step toward application-agnostic policies and bounds for MDPs.
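For readers unfamiliar with ALPs, the standard formulation for a discounted-cost MDP is sketched below. The notation is an assumption made for illustration and may differ from the paper's: a finite state space $\mathcal{S}$, actions $\mathcal{A}_s$, one-stage cost $c$, discount factor $\gamma$, transition kernel $P$, basis functions $\varphi_1, \dots, \varphi_B$, and state-relevance distribution $\nu$.

```latex
\begin{align*}
\max_{\beta \in \mathbb{R}^{B}} \quad
  & \sum_{s \in \mathcal{S}} \nu(s)\, V_\beta(s),
  \qquad \text{where } V_\beta(s) = \sum_{b=1}^{B} \beta_b\, \varphi_b(s), \\
\text{s.t.} \quad
  & V_\beta(s) \le c(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V_\beta(s')
  \qquad \forall\, s \in \mathcal{S},\ a \in \mathcal{A}_s .
\end{align*}
```

Any feasible $V_\beta$ lies below the optimal cost-to-go at every state, which is where the lower bound mentioned in the abstract comes from. The two inputs the abstract calls heuristic appear explicitly: the basis functions $\varphi_b$ and the weights $\nu$ in the objective.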

Article 8

Title
Nonlinear Decision Rules Made Scalable by Nonparametric Liftings
Authors
Eojin Han, Omid Nohadani
Link
https://doi.org/10.1287/mnsc.2024.4988
Published
July 30, 2024
Abstract
Sequential decision making often requires dynamic policies, which are computationally not tractable in general. Decision rules provide approximate solutions by restricting decisions to simple functions of uncertainties. In this paper, we consider a nonparametric lifting framework where the uncertainty space is lifted to higher dimensions to obtain nonlinear decision rules. Current lifting-based approaches require predetermined functions and are parametric. We propose two nonparametric liftings, which derive the nonlinear functions by leveraging the uncertainty set structure and problem coefficients. Both methods integrate the benefits from lifting and nonparametric approaches, and hence provide scalable decision rules with performance bounds. More specifically, the set-driven lifting is constructed by finding polyhedrons within uncertainty sets, inducing piecewise-linear decision rules with performance bounds. The dynamics-driven lifting, on the other hand, is constructed by extracting geometric information and accounting for problem coefficients. This is achieved by using linear decision rules of the original problem, also enabling one to quantify lower bounds of objective improvements over linear decision rules. Numerical comparisons with competing methods demonstrate superior computational scalability and comparable performance in objectives. These observations are magnified in multistage problems with extended time horizons, suggesting practical applicability of the proposed nonparametric liftings in large-scale dynamic robust optimization.
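The idea of lifting can be shown with a scalar uncertainty and a single breakpoint. Note that this sketch uses a predetermined breakpoint, which is the parametric style of lifting the abstract contrasts with; the paper's set-driven and dynamics-driven liftings are not reproduced here. The function names and numbers are hypothetical.

```python
def lift(xi, breakpoint):
    """Split a scalar uncertainty into two pieces that sum back to xi:
    the part up to the breakpoint and the part above it."""
    return (min(xi, breakpoint), max(xi - breakpoint, 0.0))


def linear_rule(xi, intercept, slope):
    """Linear decision rule: the decision is affine in the uncertainty."""
    return intercept + slope * xi


def lifted_rule(xi, intercept, slopes, breakpoint):
    """A rule that is linear in the lifted variables and therefore
    piecewise linear, with a kink at the breakpoint, in xi itself."""
    low, high = lift(xi, breakpoint)
    return intercept + slopes[0] * low + slopes[1] * high


if __name__ == "__main__":
    # Example: order up to demand xi, but never more than a capacity of 1.0.
    for xi in (0.5, 1.0, 3.0):
        capped = lifted_rule(xi, 0.0, (1.0, 0.0), breakpoint=1.0)
        print(f"xi = {xi}: linear rule = {linear_rule(xi, 0.0, 1.0)}, lifted rule = {capped}")
```

With equal slopes the lifted rule collapses to the linear rule, so the lifted family always contains the linear one; unequal slopes add the nonlinearity while the optimization over the coefficients stays linear.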

Article 9

Title
The Bright Side of Price Volatility in Global Commodity Procurement
Authors
Wei Xing, Liming Liu, Fuqiang Zhang, Qian Zhao
Link
https://doi.org/10.1287/mnsc.2023.00304
Published
July 30, 2024
Abstract
This paper studies two competing firms’ choices between the contingent-price contract (CPC) and fixed-price contract (FPC) in global commodity procurement. The FPC price is determined when signing the contract, whereas the CPC price is pegged to an underlying index and remains open until the delivery date. Under both contracts, each firm determines its order quantity based on the updated belief about the market demand. The unrealized CPC price correlates with the market demand, allowing a firm to update its belief about the CPC price using demand information, thereby generating a price-learning effect. We find that, contrary to conventional wisdom, a larger price volatility could benefit the firms, and, under differentiated contracts, a firm might benefit from the improvement of forecast accuracy at its rival. We further show that the price-learning effect plays a critical role in the firms’ contract choices. First, significant price volatility forces the firms to pursue the responsiveness of the CPC. Second, the firms may adopt differentiated contracts to enhance their responses to market changes and dampen competition, and a higher competition intensity more likely leads to contract differentiation. Third, the firms in a small market seek responsiveness and contract differentiation rather than cost efficiency. This study reveals the bright side of price volatility and takes a step toward understanding the effect of two-dimensional information updating.