Exploring DeepOps: A Powerful Tool for Automating GPU Cluster Operations

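Project Overview

DeepOps is an infrastructure automation toolkit built specifically for Kubernetes and Slurm clusters equipped with NVIDIA GPUs. This open-source project packages best practices for deploying and managing GPU server clusters end to end, and it can be customized for a specific environment. Whether you want to build a complete management stack in your own data center or add capabilities to an existing Kubernetes or Slurm cluster, DeepOps is a strong candidate.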

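Technical Analysis

DeepOps uses Ansible for automated deployment and integrates closely with two mainstream cluster resource managers: Kubernetes (deployed through the Kubespray submodule) and Slurm (the scheduler developed by SchedMD). This design lets DeepOps fit a range of scenarios, including but not limited to:

  1. Kubernetes: uses Kubespray to automate the deployment, scaling, and management of containerized applications, with cross-platform support.
  2. Slurm: provides open-source cluster resource management and job scheduling so that compute resources are used efficiently.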
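
Because Ansible drives the deployment, the hosts for each cluster role are described in an inventory file. The sketch below renders an INI-style inventory from a Python mapping. The host names and the group names (`kube-master`, `kube-node`, `slurm-master`, `slurm-node`) are assumptions made for illustration and should be checked against the DeepOps documentation for the release you use.

```python
from typing import Dict, List


def render_inventory(groups: Dict[str, List[str]]) -> str:
    """Render an INI-style Ansible inventory from a group -> hosts mapping."""
    sections = []
    for group, hosts in groups.items():
        # Each section is a "[group]" header followed by one host per line.
        lines = [f"[{group}]"] + list(hosts)
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


# Hypothetical hosts and group names, for illustration only.
example_groups = {
    "kube-master": ["mgmt01"],
    "kube-node": ["gpu01", "gpu02"],
    "slurm-master": ["login01"],
    "slurm-node": ["gpu01", "gpu02"],
}

inventory_text = render_inventory(example_groups)
print(inventory_text)
```

The same GPU nodes can appear under both the Kubernetes and the Slurm groups in this sketch; whether a real cluster should run both on the same hosts is a deployment decision outside its scope.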

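DeepOps also offers a virtualized deployment option: you can use Vagrant to build a GPU-enabled virtual cluster on a single node, which is convenient for testing and experimentation.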

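Use Cases

  • Large data centers: for large data centers running NVIDIA DGX systems, DeepOps can set up and manage the entire cluster management stack.
  • Existing Kubernetes clusters: if your cluster already runs Kubernetes, DeepOps can be used to deploy KubeFlow and connect NFS storage.
  • Upgrading an existing cluster: when you need to add a resource manager or batch scheduler, DeepOps can install Slurm or Kubernetes.
  • Single-node setup: on a standalone machine that needs no scheduler, DeepOps installs just the NVIDIA driver, Docker, and the NVIDIA Container Runtime.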
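
For the single-node case, a quick way to see which of the three components are already present is to look for their command-line tools on `PATH`. The following is a minimal sketch, not part of DeepOps; it assumes that `nvidia-smi`, `docker`, and `nvidia-container-runtime` are the executables that indicate each component, which may differ on your system.

```python
import shutil
from typing import Callable, Dict, Optional

# Assumed mapping from component to the executable that signals its presence.
REQUIRED_COMMANDS = {
    "NVIDIA driver": "nvidia-smi",
    "Docker": "docker",
    "NVIDIA Container Runtime": "nvidia-container-runtime",
}


def check_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Return, per component, whether its executable is found on PATH."""
    return {name: which(cmd) is not None for name, cmd in REQUIRED_COMMANDS.items()}


if __name__ == "__main__":
    for component, present in check_prerequisites().items():
        print(f"{component}: {'found' if present else 'missing'}")
```

The `which` parameter is injectable so the check can be exercised without the tools installed. Finding an executable only shows that it is on `PATH`, not that the driver or runtime is correctly configured.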

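Features

  1. Flexibility: DeepOps can be tailored to different cluster requirements, whether you are building a new cluster from scratch or adding capabilities to an existing environment.
  2. Automation: by building on Ansible, DeepOps greatly simplifies cluster deployment and maintenance.
  3. Compatibility: supports several operating systems, such as Ubuntu, CentOS, and NVIDIA DGX OS, and works across multiple cloud environments.
  4. Community support: the project is open source and developed on GitHub, where users can contribute by filing Issues and submitting Pull Requests.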

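At the time of writing, the latest release is DeepOps 23.08. Using a stable release branch is recommended for reliable behavior. Development happens mainly on the master branch, which may change frequently but is usually kept in working order.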
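
When scripting a checkout, it can help to pick the newest release name automatically rather than tracking master. The sketch below assumes that release names follow the two-part `YY.MM` pattern suggested by "23.08"; the list of names is hypothetical and would normally come from the repository itself.

```python
import re
from typing import Iterable, Optional

# Assumed release naming scheme: two-digit year, dot, two-digit month.
RELEASE_PATTERN = re.compile(r"^(\d{2})\.(\d{2})$")


def latest_release(names: Iterable[str]) -> Optional[str]:
    """Return the newest YY.MM release name, ignoring names like 'master'."""
    releases = []
    for name in names:
        match = RELEASE_PATTERN.match(name)
        if match:
            year, month = int(match.group(1)), int(match.group(2))
            releases.append(((year, month), name))
    # Tuples compare by (year, month) first, so max() picks the newest.
    return max(releases)[1] if releases else None


# Hypothetical list of branch or tag names, for illustration only.
example_names = ["22.04", "master", "23.08", "23.03"]
print(latest_release(example_names))
```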

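In short, for anyone who wants to manage GPU clusters efficiently, from newcomers to experienced system administrators, DeepOps is a dependable companion. Start exploring DeepOps and unlock the potential of your GPU cluster.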
