Ray Summit topic list

1、 Improving Ray for Large-scale Applications(3)

5:20am-5:55am CST

At Ant Group, we have built various kinds of distributed systems on top of Ray and deployed them in production at large scale. In this talk, we'll cover the problems we've met and the improvements we've made to turn Ray into an industry-grade system with high scalability and stability.

https://raysummit.anyscale.com/content/Videos/GNDFYdDsqiMWnbibW

2、Ray Internals: Object Management with the Ownership Model(3)

4:45am-5:20am CST

In this talk, we'll do a deep dive into Ray's distributed object management layer. We'll explain the Ray execution model and the basics behind the Ray distributed object store. Next, we'll describe the challenges with achieving both performance and reliability for object management. We'll present our solution to this problem, which is based on a novel concept called ownership that ensures object metadata consistency with low overhead. Finally, we'll present some exciting upcoming work on how to extend ownership to better support recent use cases.

https://raysummit.anyscale.com/content/Videos/zHgY8mePbxsdB2FTB
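
For readers unfamiliar with the layer this talk dissects, here is a minimal sketch (assuming a local `ray` installation; cluster setup is omitted) of how objects enter Ray's distributed object store and how ObjectRefs are later resolved. The ownership concept in the talk is about tracking exactly this metadata: broadly, the worker that creates an ObjectRef owns it.

```python
import numpy as np
import ray

ray.init()  # start a local Ray instance; on a cluster this would connect instead

# ray.put copies the array into the shared-memory object store and returns an
# ObjectRef; the calling worker keeps (owns) the metadata for that object.
big_array = np.ones((1000, 1000))
array_ref = ray.put(big_array)

@ray.remote
def column_sums(arr):
    # The argument arrives as a reference; Ray resolves it from the object
    # store (zero-copy on the same node) before the task body runs.
    return arr.sum(axis=0)

# Passing the ObjectRef instead of the array avoids re-serializing the data
# for every task that uses it.
result_ref = column_sums.remote(array_ref)

# ray.get blocks until the object is available, then fetches the value.
print(ray.get(result_ref)[:5])
```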

3、Scaling Interactive Data Science with Modin and Ray(3)

4:00am-4:35am CST

This talk covers interactive data science at scale with Modin, including comparisons and benchmarks against existing systems and solutions.

https://raysummit.anyscale.com/content/Videos/PYuhGZgrgR4HgwtjB
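
As a point of reference for what "interactive data science with Modin" looks like in code, here is a minimal sketch, assuming `modin` and `ray` are installed; the CSV path and column names are placeholders. Modin exposes a pandas-compatible API and partitions the work across a Ray cluster behind the scenes.

```python
import ray
ray.init()  # Modin can use the already-running Ray instance as its engine

# Drop-in replacement for `import pandas as pd`: the API stays the same,
# but reads, groupbys, etc. are partitioned across Ray workers.
import modin.pandas as pd

df = pd.read_csv("large.csv")  # hypothetical file path
summary = df.groupby("user_id").agg({"amount": "sum"})  # hypothetical columns
print(summary.head())
```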

4、 Mars-on-Ray: Accelerating Large-scale Tensor and DataFrame Workloads(3)

4:35am-5:15am CST

Mars is a tensor-based unified framework for large-scale data computation that scales numpy, pandas, scikit-learn, and Python functions. Ray provides a general-purpose task and actor API that is extremely expressive and delivers high performance for distributed Python applications. Mars-on-Ray combines the two: Mars contributes familiar data science APIs and highly optimized, locality-aware, fine-grained task scheduling, while running on Ray preserves that scheduling strategy and lets the same workloads scale out on Ray clusters.

https://raysummit.anyscale.com/content/Videos/pKBTjMturGLDLGfTR
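
To make the "familiar data science APIs" concrete, below is a minimal sketch of Mars tensor and DataFrame code. The Ray-backed session call is an assumption on my part (the exact entry point varies across Mars releases), so treat the setup line as illustrative rather than canonical.

```python
import mars
import mars.tensor as mt
import mars.dataframe as md

# Assumed entry point for running Mars on a Ray cluster; check your Mars
# version's documentation for the exact session/cluster creation API.
mars.new_ray_session()

# Mars mirrors the numpy/pandas APIs but builds a lazy, chunked task graph
# that is only run when .execute() is called.
t = mt.random.rand(10000, 10000, chunk_size=1000)
print((t + 1).sum().execute())

df = md.DataFrame(t[:, :4], columns=list("abcd"))
print(df.describe().execute())
```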

5、A Growing Ecosystem of Scalable ML Libraries on Ray(3)

4:35am-5:10am CST

The open-source Python ML ecosystem has seen rapid growth in recent years. As these libraries mature, there is increased demand for distributed execution frameworks that allow programmers to handle large amounts of data and coordinate computational resources. In this talk, we discuss our experiences collaborating with the open source Python ML ecosystem as maintainers of Ray, a popular distributed execution framework. We will cover how distributed computing has shaped the way machine learning is done, and go through case studies on how three popular open source ML libraries (Horovod, HuggingFace transformers, and spaCy) benefit from Ray for distributed training.

https://raysummit.anyscale.com/content/Videos/r25YmRrqjKFWrBKMs
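
The case studies themselves are in the talk; as a small illustration of the coordination pattern such libraries lean on, the sketch below fans Hugging Face `transformers` inference out across Ray actors. The worker count, batch contents, and default pipeline model are placeholders, and this is not the integration code any of the three projects ship.

```python
import ray
from transformers import pipeline

ray.init()

@ray.remote
class SentimentWorker:
    def __init__(self):
        # Each actor loads its own copy of the model once and reuses it
        # across calls (downloads a default sentiment model on first use).
        self.pipe = pipeline("sentiment-analysis")

    def predict(self, texts):
        return self.pipe(texts)

# Spread batches of text across a small pool of actors.
workers = [SentimentWorker.remote() for _ in range(2)]
batches = [["Ray makes scaling easy."], ["Distributed debugging is hard."]]
futures = [w.predict.remote(b) for w, b in zip(workers, batches)]
print(ray.get(futures))
```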

6、Software 2.0 Needs Data 2.0: A New Way of Storing and Managing Data for Efficient Deep Learning(3)

4:35am-5:10am CST

Ninety percent of the data we generate every day is unstructured. However, the current solutions for storing the data we create (databases, data lakes, and data warehouses, or the "Data 1.0" minions) are unfit for storing unstructured data. As a result, data scientists today work with unstructured data the way developers worked in the pre-database era. This slows down ML cycles, bottlenecks access speed and data transfer, and forces data scientists to wrangle data instead of training models.

Creating Software 2.0 requires a new way of working with unstructured data, which we explore in this session. We present Data 2.0: a framework that brings all types of data under one umbrella and represents them in a unified tensorial form that is native to deep neural networks. Its streaming mechanism lets machine learning models be trained and deployed on both compute-bound and data-bound workloads as if the data were local to the machine. In addition, it allows version-controlling and collaborating on petabyte-scale datasets as single numpy-like arrays, on the cloud or locally. Lastly, we use Ray to improve our workflows.

https://raysummit.anyscale.com/content/Videos/FEqSNNnimxrJwrPTJ
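
The core idea of the session, streaming a large dataset in tensorial chunks so training can proceed as if the data were local, can be illustrated without the presented system's own API (which is not named in the abstract). The sketch below is a generic stand-in that uses Ray tasks to fetch chunks in parallel; the chunk shape, loader function, and dataset layout are all assumptions for illustration only.

```python
import numpy as np
import ray

ray.init()

@ray.remote
def load_chunk(start, stop):
    # Stand-in for fetching one chunk of a remote, tensor-shaped dataset;
    # a real implementation would read from object storage instead.
    return np.random.rand(stop - start, 224, 224, 3).astype("float32")

def stream_chunks(num_rows, chunk_size=256):
    # Launch all chunk fetches up front so the consumer rarely waits on I/O,
    # then yield results in order as they become available.
    refs = [load_chunk.remote(i, min(i + chunk_size, num_rows))
            for i in range(0, num_rows, chunk_size)]
    for ref in refs:
        yield ray.get(ref)

for batch in stream_chunks(1024):
    # A training step would consume `batch` here.
    print(batch.shape)
```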
