Operations Research Series 53: A Survey of GPUs for Linear Programming and Integer Programming

IE06

Category: Operations Research

Article link: https://blog.csdn.net/kittyzc/article/details/109043633

1. Overview

GPUs are mainly used for parallel high-performance computing. Survey articles on their use in operations research (OR) include:
Brodtkorb et al. (2013) and Schulz et al. (2013) deal with routing problems.
Luong (2011b) considers metaheuristics on GPU.
Alba et al. (2013) study parallel metaheuristics.

2. Exact Methods

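The main exact methods are the simplex method, interior and exterior point methods, dynamic programming, and branch and bound. Because the solution process is tree-structured, only parts of it can be accelerated. The literature on simplex-type and interior point methods is summarized below:
The Simplex Tableau: Lalami et al. (2011a,b)
The Two-Phase Simplex: Meyer et al. (2011)
The Revised Simplex: Ploskas and Samaras (2015), Nikolaos and Nikolaos (2013), Bieling et al. (2010), Spampinato and Elster (2009), Greeff (2005)
The Interior Point Method: Jung and O’Leary (2008)
The Exterior Point Method: Ploskas and Samaras (2015)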

2.1 Linear Programming

Greeff (2005) used a GPU to accelerate the revised simplex method and reported a speedup of 11.5x.
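The simplex method involves many matrix operations, and tools such as BLAS and MATLAB can already run operations like matrix inversion and matrix multiplication on the GPU. Spampinato and Elster (2009) reported that, for a problem with 2,000 variables and 2,000 constraints, an NVIDIA GeForce GTX 280 GPU achieved a 2.5x speedup over an ATLAS-based solver (Whaley and Dongarra, 1999) running on an Intel Core 2 Quad 2.83 GHz processor.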
Nikolaos and Nikolaos (2013) used MATLAB to solve problems with 5,000 variables and 5,000 constraints; an NVIDIA Quadro 6000 was 5.5x faster than an Intel Core i7 3.4 GHz.
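Ploskas and Samaras (listed in the references as Nikolaos and Nikolaos (2013)) studied the general form of the revised simplex method and proposed GPU versions of two basis-update schemes: the Product Form of the Inverse (PFI, based on Dantzig and Orchard-Hays (1954)) and a Modification of the PFI (MPFI, based on Benhamadou (2002)). Their results show PFI slightly outperforming MPFI.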
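To make the PFI idea concrete, here is a minimal CPU-side sketch in Python. It is not the authors' GPU code, and the function name `pfi_update` is hypothetical. It assumes the standard textbook form of the update: when column a_q enters the basis at position r, the new inverse is E · B⁻¹, where E is the identity matrix except for column r (the "eta" column), which is built from d = B⁻¹ a_q.

```python
from fractions import Fraction


def pfi_update(B_inv, a_q, r):
    """Return the basis inverse after column a_q replaces basic column r.

    B_inv: current inverse as a list of rows; a_q: entering column;
    r: position of the leaving column in the basis.
    """
    m = len(B_inv)
    # d = B_inv * a_q: the entering column expressed in the current basis.
    d = [sum(B_inv[i][k] * a_q[k] for k in range(m)) for i in range(m)]
    if d[r] == 0:
        raise ValueError("pivot element is zero")
    # Eta column: the only column of E that differs from the identity.
    eta = [-d[i] / d[r] for i in range(m)]
    eta[r] = 1 / d[r]
    # E * B_inv: row r is scaled, every other row adds a multiple of old row r.
    # Each output entry is independent, which is what makes this step
    # a natural candidate for data-parallel execution.
    old_r = B_inv[r]
    new = []
    for i in range(m):
        if i == r:
            new.append([eta[r] * x for x in old_r])
        else:
            new.append([B_inv[i][j] + eta[i] * old_r[j] for j in range(m)])
    return new
```

Exact `Fraction` arithmetic is used here only to keep the example easy to check by hand; a GPU implementation would work in floating point.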
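Ploskas and Samaras (2015) implemented GPU versions of the revised simplex method and the exterior point method. On the netlib benchmark with an NVIDIA Quadro 6000, the exterior point method clearly outperformed the simplex method, with speedups of up to 20x on sparse matrices and up to 181x on dense matrices.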
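Bieling et al. (2010) added further optimizations on top of the work above, using the steepest-edge heuristic of Goldfarb and Reid (1977) to select the entering and leaving variables. The authors compared against the GLPK solver: for a problem with 8,000 variables and 2,700 constraints, an NVIDIA GeForce 9600 GT GPU was 18x faster than an Intel Core 2 Duo E8400 3.0 GHz processor.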
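Bieling et al. (2010) showed that organizing the data as a tableau makes it easier for the GPU to be effective. Lalami et al. (2011a,b) and Meyer et al. (2011) implemented this idea. Lalami et al. (2011b) based their implementation on Garfinkel and Nemhauser (1972) and used a horizontal decomposition: the rows of the tableau are split across different GPUs, with the CPU doing the scheduling. For a problem with 27,000 variables and 27,000 constraints, on an Intel Xeon E5640 2.66 GHz CPU with two NVIDIA C2050 GPUs, the speedup was 24.5x; with a single GPU it was 12.5x.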
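The row-wise (horizontal) decomposition can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the implementation of Lalami et al.: the helper names `split_rows`, `pivot_block`, and `pivot` are hypothetical, and the "devices" are simulated by a plain loop. The point it shows is that, once the normalized pivot row has been shared, each block of rows can be updated without touching any other block.

```python
from fractions import Fraction


def split_rows(n_rows, n_devices):
    """Assign contiguous row ranges to devices (hypothetical helper)."""
    base, extra = divmod(n_rows, n_devices)
    ranges, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def pivot_block(tableau, rows, pivot_row, r, q):
    """Update one block of rows; it needs only the normalized pivot row."""
    for i in rows:
        if i == r:
            tableau[i] = list(pivot_row)
        else:
            factor = tableau[i][q]
            tableau[i] = [x - factor * p for x, p in zip(tableau[i], pivot_row)]


def pivot(tableau, r, q, n_devices=2):
    """Gauss-Jordan pivot on element (r, q), processed block by block."""
    # The host normalizes the pivot row once and "broadcasts" it.
    pivot_row = [x / tableau[r][q] for x in tableau[r]]
    for rows in split_rows(len(tableau), n_devices):
        # On real hardware these calls would run concurrently, one per GPU.
        pivot_block(tableau, rows, pivot_row, r, q)
    return tableau
```

In this layout the only data exchanged per iteration is the pivot row and the pivot position, which is why the CPU can act as a lightweight scheduler.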
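Meyer et al. (2011) proposed a multi-GPU implementation of the two-phase simplex method. It uses a vertical decomposition, splitting the variables across the GPUs, which reduces communication between GPUs. For a problem with 25,000 variables and 5,000 constraints, on a machine with two Intel Xeon X5570 2.93 GHz processors and four NVIDIA Tesla S1070 GPUs, it outperformed the CLP solver.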
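One step where a column-wise (vertical) split pays off is choosing the entering variable. The sketch below is a hypothetical illustration, not the code of Meyer et al.: it assumes Dantzig's rule with the convention that the most negative reduced cost enters, a non-empty list of reduced costs, and helper names (`split_columns`, `local_entering`, `choose_entering`) invented for this example. Each device scans only its own columns, and only one (cost, index) pair per device has to be communicated.

```python
def split_columns(n_cols, n_devices):
    """Assign contiguous column ranges to devices (hypothetical helper)."""
    step = -(-n_cols // n_devices)  # ceiling division
    return [range(s, min(s + step, n_cols)) for s in range(0, n_cols, step)]


def local_entering(reduced_costs, cols):
    """Most negative reduced cost within one column block, as (cost, index)."""
    return min((reduced_costs[j], j) for j in cols)


def choose_entering(reduced_costs, n_devices=4):
    """Reduce the per-device candidates to a single entering column.

    Returns None when no reduced cost is negative, meaning no column
    can improve the objective under the assumed sign convention.
    """
    candidates = [
        local_entering(reduced_costs, cols)  # would run on separate GPUs
        for cols in split_columns(len(reduced_costs), n_devices)
    ]
    cost, j = min(candidates)
    return j if cost < 0 else None
```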
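Jung and O’Leary (2008) studied a mixed-precision primal-dual interior point method, using the GPU for the Cholesky factorization, forward and back substitution, and similar operations. The results, however, were no better than on the CPU.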
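For readers unfamiliar with the kernels named above, the following Python sketch shows a Cholesky factorization followed by forward and back substitution for a symmetric positive definite system M x = b, the kind of system an interior point iteration has to solve. It is a plain double-precision illustration with hypothetical function names; it does not reproduce the mixed-precision scheme or the GPU code of Jung and O’Leary.

```python
import math


def cholesky(M):
    """Return lower-triangular L with L * L^T = M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j are independent of each other.
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_spd(M, b):
    """Solve M x = b via Cholesky, forward substitution, then back substitution."""
    L = cholesky(M)
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

The two substitution loops are inherently sequential across rows, which is one plausible reason this part of the algorithm is harder to speed up on a GPU than dense matrix products.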

2.2 Dynamic Programming

So far, the knapsack problem is the only problem that has been solved with GPU-based dynamic programming.
Knapsack Problem: Boyer et al. (2011, 2012)
Multi-Choice Knapsack Problem: Suri et al. (2012)
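In Boyer et al. (2011), the data is organized as a table in which columns represent items and rows represent knapsack capacities. The authors also proposed a data compression method to reduce data communication. On a test machine with an Intel Xeon 3.0 GHz and an NVIDIA GTX 260 GPU, with 100,000 variables, computation time was reduced by 26%, and overhead accounted for no more than 3% of the total time.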
Boyer et al. (2012) gave a multi-GPU solution. With an Intel Xeon 3 GHz processor and a Tesla S1070 GPU, one GPU gave a 14x speedup and two GPUs gave a 28x speedup.
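The reason dense dynamic programming maps well to a GPU can be seen in a short Python sketch of the 0/1 knapsack recurrence. This is a generic textbook formulation with a hypothetical function name, not the implementation of Boyer et al., and it omits their compression technique. When the table is filled one item at a time, every capacity entry for the current item reads only the entries of the previous item, so all capacities can be computed in parallel.

```python
def knapsack_dp(weights, profits, capacity):
    """Best total profit for a 0/1 knapsack with positive integer weights."""
    prev = [0] * (capacity + 1)  # best profit per capacity, no items considered
    for w, p in zip(weights, profits):
        # Every entry of `cur` depends only on `prev`, never on `cur` itself,
        # so the whole list could be computed by one thread per capacity.
        cur = [
            max(prev[c], prev[c - w] + p) if c >= w else prev[c]
            for c in range(capacity + 1)
        ]
        prev = cur
    return prev[capacity]
```

Only two table slices are alive at any time in this sketch; recovering which items were chosen would require storing one decision per item and capacity, which is where the volume of data to transfer becomes a concern.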

2.3 Branch and Bound

Branch and bound (B&B) is hard to run on a GPU because the B&B tree is often very irregular. So far, GPU solutions have been attempted for three classes of problems:
Knapsack Problem: Boukedjar et al. (2012)
Flow-shop Scheduling Problem: Lalami and El Baz (2012), Lalami (2012), Chakroun and Melab (2012)
Traveling Salesman Problem: Chakroun et al. (2012, 2013), Melab et al. (2012), Carneiro et al. (2011)

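For the knapsack problem, the approach is as follows: the CPU first generates the tree; once the number of nodes reaches a threshold, the GPU processes the nodes in parallel, with one GPU thread per node. Bounds comparison and node pruning also run on the GPU, and at each step the CPU merges the nodes. On an Intel Xeon E5640 2.66 GHz processor with an NVIDIA C2050 GPU, the speedup for a problem with 1,000 variables was 52x.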
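The node-parallel pattern described above can be sketched in Python as a level-by-level B&B in which the whole frontier is branched, bounded, and pruned as a batch. This is a simplified illustration under stated assumptions, not the cited implementation: the function names are hypothetical, weights are assumed positive, the bound is the usual fractional (LP relaxation) bound, and the CPU warm-up phase and node-count threshold are left out.

```python
def upper_bound(node, items, capacity):
    """Fractional bound for node = (level, weight, profit); items sorted by density."""
    level, weight, profit = node
    bound, room = float(profit), capacity - weight
    for w, p in items[level:]:
        if w <= room:
            room -= w
            bound += p
        else:
            bound += p * room / w  # take a fraction of the first item that does not fit
            break
    return bound


def batch_branch_and_bound(weights, profits, capacity):
    """Best 0/1 knapsack profit, exploring the tree one frontier at a time."""
    # Sort by profit density so the greedy fractional bound is valid.
    items = sorted(zip(weights, profits), key=lambda wp: wp[1] / wp[0], reverse=True)
    best = 0
    frontier = [(0, 0, 0)]
    while frontier:
        # Branch: each node spawns a "skip item" child and, if feasible, a "take item" child.
        children = []
        for level, weight, profit in frontier:
            if level == len(items):
                continue
            w, p = items[level]
            children.append((level + 1, weight, profit))
            if weight + w <= capacity:
                children.append((level + 1, weight + w, profit + p))
        # Bound: independent per node, so this is the one-thread-per-node step.
        bounds = [upper_bound(n, items, capacity) for n in children]
        # Every child is feasible, so its profit can update the incumbent.
        best = max([best] + [n[2] for n in children])
        # Prune: keep only nodes that might still beat the incumbent.
        frontier = [n for n, b in zip(children, bounds) if b > best]
    return best
```

Because the surviving frontier changes size unpredictably from one level to the next, the amount of parallel work per step is irregular, which is the difficulty the text attributes to B&B on GPUs.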
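In the flow-shop problem, 99% of the time is spent evaluating bounds, so the GPU is used mainly to speed up the bounding step while the other steps run on the CPU. On a system with an Intel Xeon E5520 2.27 GHz and an NVIDIA C2050, using the Taillard (1993) test instances, the speedup was up to 77x. Chakroun and Melab (2012) tested a multi-GPU setup: on a machine with two Tesla T10 GPUs, the speedup was up to 105x.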
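Chakroun et al. (2013) designed a hybrid CPU-GPU algorithm that also places the branching and pruning operators on the GPU. On a machine with an Intel Xeon E5520 2.27 GHz and an NVIDIA C2050, using the Taillard (1993) instances, the speedup was 160x. The cooperative approach was 36% faster than the concurrent one. The authors recommend using the GPU cores to explore the tree and the CPU cores for data preparation and data transfer.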
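To give a feel for the bounding step that dominates flow-shop B&B, here is a Python sketch that evaluates a lower bound for a batch of partial schedules. It deliberately uses a simple one-machine bound chosen for this illustration, not the bound used in the cited papers, and all function names are hypothetical. `proc[j][k]` is assumed to hold the processing time of job j on machine k, and a node is the list of jobs already sequenced.

```python
def completion_times(proc, sequence):
    """Per-machine completion times after running `sequence` in a permutation flow shop."""
    m = len(proc[0])
    done = [0] * m
    for j in sequence:
        done[0] += proc[j][0]
        for k in range(1, m):
            # Job j starts on machine k when both the machine and the job are free.
            done[k] = max(done[k], done[k - 1]) + proc[j][k]
    return done


def lower_bound(proc, partial):
    """One-machine lower bound on the makespan of any completion of `partial`."""
    m = len(proc[0])
    rest = [j for j in range(len(proc)) if j not in partial]
    done = completion_times(proc, partial)
    if not rest:
        return done[-1]
    best = 0
    for k in range(m):
        load = sum(proc[j][k] for j in rest)  # remaining work on machine k
        tail = min(sum(proc[j][k + 1:]) for j in rest)  # shortest finish after machine k
        best = max(best, done[k] + load + tail)
    return best


def bound_batch(proc, nodes):
    """Bound many nodes at once; each evaluation is independent of the others."""
    return [lower_bound(proc, node) for node in nodes]
```

For example, with `proc = [[3, 2], [1, 4]]`, calling `bound_batch(proc, [[0], [1]])` bounds the two children of the root in one batch; on a GPU each list element would be handled by its own thread.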

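For the TSP, Carneiro et al. (2011) obtained a speedup of up to 11x on a machine with an Intel Core i5 750 2.66 GHz and an NVIDIA GeForce GTS 450. Many heuristic algorithms exist for the TSP, and GPUs are mainly used to accelerate those heuristics.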

3. References

Alba, E., Dorronsoro, B., 2008. Cellular genetic algorithms. Vol. 42. Springer.
Alba, E., Luque, G., Nesmachnow, S., 2013. Parallel metaheuristics: recent advances and new trends. International Transactions in Operational Research 20 (1), 1–48.
Bai, H., Ouyang, D., Li, X., He, L., Yu, H., dec. 2009. MAX-MIN Ant System on GPU with CUDA. In: Fourth International Conference on Innovative Computing, Information and Control (ICICIC 2009). pp. 801–804.
Bellman, R., 1957. Dynamic Programming. Princeton University Press.
Benhamadou, M., 2002. On the simplex algorithm ‘revised form’. Advances in Engineering Software 33 (11), 769–777.
Bieling, J., Peschlow, P., Martini, P., april 2010. An efficient GPU implementation of the revised simplex method. In: 24th IEEE International Parallel Distributed Processing Symposium, Workshops and Phd Forum (IPDPSW 2010). pp. 1–8.
Boukedjar, A., Lalami, M., El Baz, D., feb. 2012. Parallel branch and bound on a CPU-GPU system. In: 20th International Conference on Parallel, Distributed and network-based Processing (PDP 2012). pp. 392–398.
Boyer, V., El Baz, D., Elkihel, M., feb. 2011. Dense dynamic programming on multi GPU. In: 19th International Conference on Parallel, Distributed and network-based Processing (PDP 2011). pp. 545–551.
Boyer, V., El Baz, D., Elkihel, M., 2012. Solving knapsack problems on GPU. Computers and Operations Research 39 (1), 42–47.
Bożejko, W., 2009. Solving the flow shop problem by parallel programming. Journal of Parallel and Distributed Computing 69 (5), 470–481.
Brodtkorb, A. R., Hagen, T. R., Schulz, C., Hasle, G., 2013. GPU computing in discrete optimization. Part I: Introduction to the GPU. EURO Journal on Transportation and Logistics, 1–29.
Bukata, L., Šůcha, P., Hanzálek, Z., 2015. Solving the resource constrained project scheduling problem using the parallel tabu search designed for the CUDA platform. Journal of Parallel and Distributed Computing 77, 58–68.
Bukata, L., Sucha, P., 2013. A GPU algorithm design for resource constrained scheduling problem. In: 21st Conference on Parallel, Distributed and networked-based Processing (PDP). pp. 367–374.
Carneiro, T., Muritiba, A. E., Negreiros, M., Lima de Campos, G. A., 2011. A new parallel schema for branch-and-bound algorithms using gpgpu. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on. IEEE, pp. 41–47.
Catala, A., Jaen, J., Modioli, J., sept. 2007. Strategies for accelerating ant colony optimization algorithms on graphical processing units. In: 2007 IEEE Congress on Evolutionary Computation (CEC 2007). pp. 492–500.
Cecilia, J., Garcia, J., Ujaldon, M., Nisbet, A., Amos, M., may 2011. Parallelization strategies for ant colony optimisation on GPUs. In: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum (IPDPSW 2011). pp. 339–346.
Chakroun, I., Melab, N., 2012. An adaptative multi-GPU based branch-and-bound. A case study: the flow-shop scheduling problem. In: IEEE 14th International Conference on High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS). pp. 389–395.
Chakroun, I., Melab, N., Mezmaz, M., Tuyttens, D., 2013. Combining multicore and gpu computing for solving combinatorial optimization problems. Journal of Parallel and Distributed Computing 73 (12), 1563–1577.
Chakroun, I., Mezmaz, M., Melab, N., Bendjoudi, A., 2012. Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurrency and Computation: Practice and Experience 25 (8), 1121–1136.
Chen, S., Davis, S., Jiang, H., A., N., 2011. CUDA-based genetic algorithm on traveling salesman problem. In: Lee, R. (Ed.), Computers and Information Science. Springer Berlin Heidelberg, pp. 241–252.
Czapiński, M., Barnes, S., 2011. Tabu Search with two approaches to parallel flow shop evaluation on CUDA platform. Journal of Parallel and Distributed Computing 71, 802–811.
Dantzig, G., 1951. Maximization of a linear function of variables subject to linear inequalities. In: Activity Analysis of Production and Allocation. Wiley and Chapman-Hall, pp. 339–347.
Dantzig, G. B., Orchard-Hays, W., 1954. The product form for the inverse in the simplex method. Mathematical Tables and Other Aids to Computation, 64–67.
Delévacq, A., Delisle, P., Gravel, M., Krajecki, M., 2013. Parallel ant colony optimization on graphics processing units. Journal of Parallel and Distributed Computing 73 (1), 52–61.
Dorigo, M., Birattari, M., Stützle, T., 2006. Ant colony optimization. Computational Intelligence Magazine, IEEE 1 (4), 28–39.
Fu, J., Lei, L., Zhou, G., aug. 2010. A parallel ant colony optimization algorithm with GPU-acceleration based on all-in-roulette selection. In: Third International Workshop on Advanced Computational Intelligence (IWACI 2010). pp. 260–264.
Garfinkel, R. S., Nemhauser, G. L., 1972. Integer programming. Vol. 4. Wiley New York.
Glover, F., 1989. Tabu search - part i. ORSA Journal on computing 1 (3), 190–206.
Glover, F., 1990. Tabu search - part ii. ORSA Journal on computing 2 (1), 4–32.
Goldfarb, D., Reid, J., 1977. A practicable steepest-edge simplex algorithm. Mathematical Programming 12 (1), 361–371.
Greeff, G., 2005. The revised simplex algorithm on a GPU. Univ. of Stellenbosch, Tech. Rep.
Harris, M., Sengupta, S., Owens, J. D., 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3 (39), 851–876.
Ibarra, O. H., Kim, C. E., Apr. 1977. Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of the ACM 24 (2), 280–289.
Janiak, A., Janiak, W., Lichtenstein, M., jul 2008. Tabu Search on GPU. Journal of Universal Computer Science 14 (14), 2416–2427.
Jiening, W., Jiankang, D., Chunfeng, Z., aug. 2009. Implementation of ant colony algorithm based on GPU. In: Sixth International Conference on Computer Graphics, Imaging and Visualization, 2009 (CGIV ’09). pp. 50–53.
Jung, J., O’Leary, D., 2008. Implementing an interior point method for linear programs on a CPU-GPU system. Electronic Transactions on Numerical Analysis 28, 174–189.
Kallioras, N. A., Kepaptsoglou, K., Lagaros, N. D., 2015. Transit stop inspection and maintenance scheduling: A gpu accelerated metaheuristics approach. Transportation Research Part C: Emerging Technologies 55, 246–260.
Kung, H., 1982. Why systolic architectures? IEEE computer 15 (1), 37–46.
Kung, H., Leiserson, C. E., 1978. Systolic arrays (for vlsi). In: Sparse Matrix Proceedings. pp. 256–282.
Lageweg, B., Lenstra, J., Kan, A. R., 1978. A general bounding scheme for the permutation flow-shop problem. Operations Research 26 (1), 53–67.
Lalami, M., Boyer, V., El Baz, D., may 2011a. Efficient implementation of the simplex method on a CPU-GPU system. In: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum (IPDPSW), Workshop PCO’11. pp. 1999–2006.
Lalami, M., El Baz, D., may 2012. GPU implementation of the branch and bound method for knapsack problems. In: 26th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum (IPDPSW), Workshop PCO’12. pp. 1769–1777.
Lalami, M., El Baz, D., Boyer, V., sept. 2011b. Multi GPU implementation of the simplex algorithm. In: 13th IEEE International Conference on High Performance Computing and Communications (HPCC 2011). pp. 179–186.
Laporte, G., Martello, S., 1990. The selective travelling salesman problem. Discrete Applied Mathematics 26 (2), 193–207.
Li, J., Hu, X., Pang, Z., Qian, K., 2009a. A parallel ant colony optimization algorithm based on fine-grained model with GPU-acceleration. International Journal of Innovative Computing, Information and Control 5 (11), 3707–3716.
Li, J., Zhang, L., Liu, L., 2009b. A parallel immune algorithm based on fine-grained model with gpu-acceleration. In: 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC). pp. 683–686.
Luong, T., 2011a. Metaheuristiques paralleles sur GPU. Ph. D. Thesis, Universite Lille 1.
Luong, T. V., 2011b. Métaheuristiques parallèles sur GPU. Ph.D. thesis, Université Lille 1 - Sciences et Technologies.
Luong, T. V., Melab, N., Talbi, E., 2011. GPU-based approaches for multiobjective local search algorithms. A case study: the flowshop scheduling problem. Evolutionary Computation in Combinatorial Optimization, 155–166.
Martello, S., Toth, P., 1990. Knapsack problems: algorithms and computer implementations. John Wiley & Sons, Inc.
Melab, N., Chakroun, I., Mezmaz, M., Tuyttens, D., sept. 2012. A GPU-accelerated branch-and-bound algorithm for the flow-shop scheduling problem. In: 2012 IEEE International Conference on Cluster Computing (CLUSTER). pp. 10–17.
Meyer, X., Albuquerque, P., Chopard, B., 2011. A multi-GPU implementation and performance model for the standard simplex method. In: Euro-Par 2011. pp. 312–319.
Moz, M., Pato, M. V., 2007. A genetic algorithm approach to a nurse rerostering problem. Computers & Operations Research 34 (3), 667–691.
Naiem, A., El-Beltagy, M., 2009. Deep greedy switching: A fast and simple approach for linear assignment problems. In: 7th International Conference of Numerical Analysis and Applied Mathematics.
Nguyen, H., 2008. GPU GEMS 3. Addison Wesley Professional.
Nikolaos, P., Nikolaos, S., 2013. A computational comparison of basis updating schemes for the simplex algorithm on a cpu-gpu system. American Journal of Operations Research 3, 497.
Ólafsson, S., 2006. Chapter 21 metaheuristics. In: Henderson, S. G., Nelson, B. L. (Eds.), Simulation. Vol. 13 of Handbooks in Operations Research and Management Science. Elsevier, pp. 633–654.
Osman, I. H., Laporte, G., 1996. Metaheuristics: A bibliography. Annals of Operations Research 63 (5), 511–623.
Paquete, L., Stützle, T., 2006. Stochastic local search algorithms for multiobjective combinatorial optimization: A review. Tech. rep., Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle.
Pedemonte, M., Alba, E., Luna, F., may 2012. Towards the design of systolic genetic search. In: 26th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum (IPDPSW 2012), Workshop PCO’12. pp. 1778–1786.
Pinel, F., Dorronsoro, B., Bouvry, P., 2010. A new cellular genetic algorithm to solve the scheduling problem designed for the gpu. In: Metaheuristics Conference (META).
Pinel, F., Dorronsoro, B., Bouvry, P., 2013. Solving very large instances of the scheduling of independent tasks problem on the GPU. Journal of Parallel and Distributed Computing 73 (1), 101–110.
Ploskas, N., Samaras, N., 2015. Efficient GPU-based implementations of simplex type algorithms. Applied Mathematics and Computation 250, 552–570.
Reinelt, G., 1991. TSPLIB - a traveling salesman problem library. ORSA Journal on Computing 3 (4), 376–384.
Reinelt, G., 1994. The traveling salesman: computational solutions for TSP applications. Springer-Verlag.
Roverso, R., Naiem, A., El-Beltagy, M., El-Ansary, S., Haridi, S., 2011. A GPU-enabled solver for time-constrained linear sum assignment problems. In: 7th International Conference on Informatics and Systems (INFOS). pp. 1–6.
Scavo, T., Aug 2010. Scatter-to-gather transformation for scalability. URL https://hub.vscse.org/resources/223
Schrijver, A., 1986. Theory of integer and linear programming. Wiley, Chichester.
Schulz, C., Hasle, G., Brodtkorb, A. R., Hagen, T. R., 2013. GPU computing in discrete optimization. Part II: Survey focused on routing problems. EURO Journal on Transportation and Logistics, 1–28.
Sivanandam, S., Deepa, S., 2007. Introduction to genetic algorithms. Springer.
Spampinato, D., Elster, A., may 2009. Linear optimization on modern GPUs. In: 2009 IEEE International Parallel and Distributed Processing Symposium (IPDPS). pp. 1–8.
Stützle, T., Hoos, H. H., 2000. MAX-MIN ant system. Future Generation Computer Systems 16 (8), 889–914.
Suri, B., Bordoloi, U., Eles, P., 2012. A scalable GPU-based approach to accelerate the multiple-choice knapsack problem. In: Design, Automation Test in Europe Conference Exhibition (DATE). pp. 1126–1129.
Taillard, E., 1993. Benchmarks for basic scheduling problems. European Journal of Operational Research 64, 278–285.
Uchida, A., Ito, Y., Nakano, K., 2014. Accelerating ant colony optimisation for the travelling salesman problem on the gpu. International Journal of Parallel, Emergent and Distributed Systems 29 (4), 401–420.
Whaley, R. C., Dongarra, J., 1999. Automatically tuned linear algebra software. In: Ninth SIAM Conference on Parallel Processing for Scientific Computing. pp. 1–27.
Winston, W. L., Goldberg, J. B., 2004. Operations research: applications and algorithms. Vol. 3. Duxbury press Boston.
You, Y., 2009. Parallel ant system for traveling salesman problem on GPUs. In: Eleventh annual conference on genetic and evolutionary computation. pp. 1–2.
Yu, Q., Chen, C., Pan, Z., 2005. Parallel genetic algorithms on programmable graphics hardware. In: Advances in Natural Computation. Springer, pp. 1051–1059.
Zajíček, T., Sucha, P., 2011. Accelerating a flow shop scheduling algorithm on the GPU. In: Workshop on Models and Algorithms for Planning and Scheduling Problems (MAPSP).
Zdeněk, B., Jan, D., Přemysl, S., Zdeněk, H., 2013. An acceleration of the algorithm for the nurse rerostering problem on a graphics processing unit. Lecture Notes in Management Science 5, 101–110.
