1. Improving Ray for Large-scale Applications
5:20am-5:55am CST
At Ant Group, we have built various kinds of distributed systems on top of Ray and deployed them in production at large scale. In this talk, we'll cover the problems we've encountered and the improvements we've made to turn Ray into an industry-grade system with high scalability and stability.
https://raysummit.anyscale.com/content/Videos/GNDFYdDsqiMWnbibW
2. Ray Internals: Object Management with the Ownership Model
4:45am-5:20am CST
In this talk, we'll do a deep dive into Ray's distributed object management layer. We'll explain the Ray execution model and the basics behind the Ray distributed object store. Next, we'll describe the challenges with achieving both performance and reliability for object management. We'll present our solution to this problem, which is based on a novel concept called ownership that ensures object metadata consistency with low overhead. Finally, we'll present some exciting upcoming work on how to extend ownership to better support recent use cases.
https://raysummit.anyscale.com/content/Videos/zHgY8mePbxsdB2FTB
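To make the ownership idea concrete, here is a minimal pure-Python sketch (all class and method names are hypothetical, not Ray's actual API): the worker that creates an object becomes its owner and serves as the single source of truth for that object's metadata, so no centralized object directory is needed.

```python
# Toy sketch of the ownership concept (hypothetical names, not Ray's API):
# the worker that creates an object "owns" its metadata -- the set of
# nodes holding copies and the outstanding reference count.

class Owner:
    """A worker process that owns the objects it creates."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}  # object id -> {"locations": set, "refs": int}

    def create_object(self, obj_id, node):
        # Creating an object registers metadata locally at the owner,
        # avoiding a round-trip to a centralized directory.
        self.metadata[obj_id] = {"locations": {node}, "refs": 1}

    def add_location(self, obj_id, node):
        # A node that receives a copy reports the new location back to
        # the owner, keeping metadata consistent with low overhead.
        self.metadata[obj_id]["locations"].add(node)

    def resolve(self, obj_id):
        # Any borrower asks the owner where the object can be fetched.
        return self.metadata[obj_id]["locations"]

owner = Owner("worker-1")
owner.create_object("obj-A", "node-1")
owner.add_location("obj-A", "node-2")
print(sorted(owner.resolve("obj-A")))
```

Because each object has exactly one owner, metadata updates never race across processes, which is the consistency-with-low-overhead property the talk describes.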
3. Scaling Interactive Data Science with Modin and Ray
4:00am-4:35am CST
Interactive data science at scale with Modin, including a set of comparisons and benchmarks against existing systems and solutions.
https://raysummit.anyscale.com/content/Videos/PYuhGZgrgR4HgwtjB
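Modin's appeal is that scaling an existing pandas workflow is a one-line change. A minimal sketch, assuming only pandas is installed (the Modin import is shown commented out; with Modin and Ray installed it is a drop-in swap for the same code):

```python
import pandas as pd            # to scale with Modin, replace this line with:
# import modin.pandas as pd    # same DataFrame API, executed in parallel on Ray

df = pd.DataFrame({"group": ["a", "b", "a", "b"],
                   "value": [1, 2, 3, 4]})
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())        # {'a': 4, 'b': 6}
```

The rest of the script is unchanged either way, which is what makes interactive, notebook-style work practical at scale.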
4. Mars-on-Ray: Accelerating Large-scale Tensor and DataFrame Workloads
4:35am-5:15am CST
Mars is a tensor-based unified framework for large-scale data computation that scales numpy, pandas, scikit-learn, and Python functions. Ray provides a general-purpose task and actor API that is extremely expressive and delivers high performance for distributed Python applications. Mars-on-Ray takes advantage of both: Mars contributes familiar data science APIs and highly optimized, locality-aware, fine-grained task scheduling, while Ray lets Mars workloads run with the same scheduling strategy and scale out across Ray clusters.
https://raysummit.anyscale.com/content/Videos/pKBTjMturGLDLGfTR
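Because Mars's tensor module mirrors the NumPy API, moving a workload over is largely an import swap. A minimal sketch using plain NumPy, with the Mars equivalent shown in comments (a sketch of the idea, not code from the talk):

```python
import numpy as np             # the Mars equivalent is a drop-in swap:
# import mars.tensor as mt     # mt mirrors the NumPy API; expressions build a
#                              # task graph and run lazily via .execute()

a = np.ones((1000, 1000))
result = (a * 2).sum()
print(result)                  # 2000000.0
# With Mars-on-Ray, the same expression is partitioned into fine-grained
# chunks and scheduled as tasks across a Ray cluster instead of running
# eagerly on one machine.
```

The same drop-in pattern applies to Mars's pandas- and scikit-learn-compatible modules.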
5. A Growing Ecosystem of Scalable ML Libraries on Ray
4:35am-5:10am CST
The open-source Python ML ecosystem has seen rapid growth in recent years. As these libraries mature, there is increased demand for distributed execution frameworks that let programmers handle large amounts of data and coordinate computational resources. In this talk, we discuss our experiences collaborating with the open-source Python ML ecosystem as maintainers of Ray, a popular distributed execution framework. We will cover how distributed computing has shaped the way machine learning is done, and go through case studies on how three popular open-source ML libraries (Horovod, HuggingFace transformers, and spaCy) benefit from Ray for distributed training.
https://raysummit.anyscale.com/content/Videos/r25YmRrqjKFWrBKMs
6. Software 2.0 Needs Data 2.0: A New Way of Storing and Managing Data for Efficient Deep Learning
4:35am-5:10am CST
Roughly 90% of the data we generate every day is unstructured. However, current solutions for storing the data we create (databases, data lakes, and data warehouses, the Data 1.0 minions) are unfit for unstructured data. As a result, data scientists today work with unstructured data much as developers worked in the pre-database era. This slows down ML cycles, bottlenecks access speed and data transfer, and forces data scientists to wrangle data instead of training models.
Creating Software 2.0 requires a new way of working with unstructured data, which we explore in this session. We present Data 2.0, a framework that brings all types of data under one umbrella, representing them in a unified tensorial form native to deep neural networks. Its streaming approach is used to train and deploy machine learning models for both compute- and data-bottlenecked operations as if the data were local to the machine. It also enables version control and collaboration on petabyte-scale datasets, treated as single numpy-like arrays in the cloud or locally. Lastly, we show how we use Ray to improve our workflows.
https://raysummit.anyscale.com/content/Videos/FEqSNNnimxrJwrPTJ