A Simple Tutorial on Distributed Computing with Ray (2)

求则得之，舍则失之

Last modified: 2022-01-25 15:08:07

First published: 2022-01-25 10:00:25

Article link: https://blog.csdn.net/weixin_43229348/article/details/122678561

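This tutorial introduces Ray's core features.

Ray provides Python and Java APIs. To use Ray in Python, first install it with: pip install ray. To use Ray in Java, first add the ray-api and ray-runtime dependencies to your project. You can then use Ray to parallelize your program. Only the Python API is used here.

1. Parallelizing Python functions with Ray tasks

First, import ray and initialize the Ray runtime. Then decorate your function with @ray.remote to declare that you want it to run remotely. Finally, call the function with .remote() instead of calling it normally. The remote call returns a future, also called an ObjectRef, whose value you can then fetch with ray.get.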

import ray
ray.init()

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

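2. Parallelizing Python classes with Ray actors

Ray provides actors, which let you parallelize instances of a class in Python. When you instantiate a class that is a Ray actor, Ray starts a remote instance of that class in the cluster. The actor can then execute remote method calls and maintain its own internal state.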

import ray
ray.init()  # Only call this once

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

if __name__ == "__main__":
    # Each Counter.remote() call starts a separate actor with its own state
    counters = [Counter.remote() for i in range(4)]
    [c.increment.remote() for c in counters]
    futures = [c.read.remote() for c in counters]
    print(ray.get(futures))  # [1, 1, 1, 1]

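3. Overview of the Ray libraries

Ray has a rich ecosystem of libraries and frameworks built on top of it. The main ones are:

Tune: scalable hyperparameter tuning
RLlib: industrial-grade reinforcement learning
Ray Train: distributed deep learning
Serve: scalable and programmable serving

3.1 Tune quick start

Tune is a library for hyperparameter tuning at any scale. With Tune, you can launch a multi-node distributed hyperparameter sweep in fewer than 10 lines of code. Tune supports any deep learning framework, including PyTorch, TensorFlow, and Keras.

To run this example, install the following: pip install "ray[tune]"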

#!/usr/bin/env python
# coding=utf-8
from ray import tune


def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100) ** (-1) + beta * 0.1


def training_function(config):
    # Hyperparameters
    alpha, beta = config["alpha"], config["beta"]
    for step in range(10):
        # Iterative training function - can be arbitrary training procedure
        intermediate_score = objective(step, alpha, beta)
        # Feed the score back to Tune
        tune.report(mean_loss=intermediate_score)


if __name__ == '__main__':
    analysis = tune.run(
        training_function,
        config={
            "alpha": tune.grid_search([0.001, 0.01, 0.1]),
            "beta": tune.choice([1, 2, 3])
        }
    )
    print("Best config: ", analysis.get_best_config(
        metric="mean_loss", mode="min"
    ))
    # get a dataframe for analyzing trial results
    df = analysis.results_df

If TensorBoard is installed, you can visualize all trial results automatically:

tensorboard --logdir ~/ray_results
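
To get a feel for what the Tune example is minimizing, the toy objective can also be evaluated by hand. The sketch below is plain Python and is not how Tune searches: as far as I understand the example, with the default num_samples=1 the grid_search over alpha yields three trials, and tune.choice draws one random beta per trial. Here every (alpha, beta) pair is scored exhaustively at the last reported step, purely for illustration.

```python
from itertools import product


def objective(step, alpha, beta):
    # Same toy objective as in the Tune example
    return (0.1 + alpha * step / 100) ** (-1) + beta * 0.1


alphas = [0.001, 0.01, 0.1]
betas = [1, 2, 3]
last_step = 9  # training_function reports steps 0..9

# Score every (alpha, beta) pair at the final reported step
scores = {
    (alpha, beta): objective(last_step, alpha, beta)
    for alpha, beta in product(alphas, betas)
}

best = min(scores, key=scores.get)
print("Best (alpha, beta):", best)  # Best (alpha, beta): (0.1, 1)
```

A larger alpha increases the denominator and a smaller beta reduces the additive term, which is why the largest alpha and the smallest beta give the lowest loss.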

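3.2 RLlib quick start

RLlib is an open-source library for reinforcement learning built on top of Ray. It offers high scalability and a unified API for a variety of applications.
To run this example, install the following: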
pip install tensorflow # or tensorflow-gpu
pip install "ray[rllib]"

import gym
from gym.spaces import Discrete, Box
from ray import tune

class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1 if done else 0, done, {}

tune.run(
    "PPO",
    config={
        "env": SimpleCorridor,
        "num_workers": 4,
        "env_config": {"corridor_length": 5}})
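
Before handing an environment to RLlib, it can help to step through its logic by hand. The sketch below uses a gym-free copy of SimpleCorridor's reset and step logic (the class name CorridorLogic is made up for this illustration) and always moves right, which shows when the episode ends and how much reward it yields.

```python
class CorridorLogic:
    """gym-free copy of SimpleCorridor's reset/step logic (illustration only)."""

    def __init__(self, corridor_length):
        self.end_pos = corridor_length
        self.cur_pos = 0

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        # Action 0 moves left (never below 0), action 1 moves right
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        # Reward is 1 only on the step that reaches the end of the corridor
        return [self.cur_pos], 1 if done else 0, done, {}


env = CorridorLogic(corridor_length=5)
obs = env.reset()
total_reward, steps, done = 0, 0, False
while not done:
    obs, reward, done, info = env.step(1)  # always move right
    total_reward += reward
    steps += 1

print(steps, total_reward, obs)  # 5 1 [5]
```

Because the only reward arrives at the end, the agent has to learn that moving right is what eventually pays off.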

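3.3 Ray Serve quick start

Ray Serve is a scalable model-serving library built on Ray. It is:

Framework agnostic: use the same toolkit to serve everything from deep learning models built with frameworks such as PyTorch or TensorFlow & Keras, to Scikit-Learn models or arbitrary business logic.
Python first: configure model serving declaratively in pure Python, with no YAML or JSON configuration.
Composition native: lets you create "model pipelines" that combine multiple models to drive a single prediction.
Horizontally scalable: Serve can scale linearly as you add more machines, so your ML-powered service can handle growing traffic.

To run this example, install the following: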

pip install scikit-learn
pip install "ray[serve]"

This example serves a scikit-learn gradient boosting classifier.

from ray import serve

import pickle
import json
import numpy as np
import requests
import os
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error
# Load data
iris_dataset = load_iris()
data, target, target_names = iris_dataset["data"], iris_dataset[
    "target"], iris_dataset["target_names"]

# Instantiate model
model = GradientBoostingClassifier()

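# Training and validation split
# Shuffle data and target with the same permutation so each sample keeps its label
perm = np.random.permutation(len(data))
data, target = data[perm], target[perm]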
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]

# Train and evaluate models
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(model.predict(val_x), val_y))

# Save the model and label to file
MODEL_PATH = os.path.join(tempfile.gettempdir(),
                          "iris_model_logistic_regression.pkl")
LABEL_PATH = os.path.join(tempfile.gettempdir(), "iris_labels.json")

with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(target_names.tolist(), f)

@serve.deployment(route_prefix="/regressor")
class BoostingModel:
    def __init__(self):
        with open(MODEL_PATH, "rb") as f:
            self.model = pickle.load(f)
        with open(LABEL_PATH) as f:
            self.label_list = json.load(f)

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

serve.start()
BoostingModel.deploy()

sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get(
    "http://localhost:8000/regressor", json=sample_request_input)
print(response.text)
# Result:
# {
#  "result": "versicolor"
# }
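
The __call__ handler above builds the feature vector by reading four keys from the JSON payload, so a request with a missing key fails with a bare KeyError. As a hedged sketch, a small helper could validate the payload first. The names FEATURE_KEYS and payload_to_vector are hypothetical and not part of Ray Serve.

```python
# Keys the deployment expects in the request body, in model input order
FEATURE_KEYS = ["sepal length", "sepal width", "petal length", "petal width"]


def payload_to_vector(payload):
    """Turn a request payload (dict) into the feature vector for the model."""
    missing = [key for key in FEATURE_KEYS if key not in payload]
    if missing:
        raise ValueError(f"missing feature(s): {', '.join(missing)}")
    return [float(payload[key]) for key in FEATURE_KEYS]


sample = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
vector = payload_to_vector(sample)
print(vector)  # [1.2, 1.0, 1.1, 0.9]
```

Inside the deployment, such a helper would replace the hand-written input_vector list, and the ValueError message could be returned to the client.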

4. A simple example

This program filters a collection of photos and keeps the ones that contain a specific person's face.

import os, json
import random
from glob import glob
from pprint import pprint

import cv2
import face_recognition
import numpy as np
from tqdm import tqdm
import ray
ray.init()

def get_file_paths(root=r'D:\PHOTOS\*'):
    """Collect every image file ending in JPG under root (a glob pattern)."""
    queue = [root]
    files = []
    while len(queue) > 0:
        pattern = queue.pop(0)
        for path in glob(pattern):
            if os.path.isfile(path) and path.endswith('JPG'):
                files.append(path)
            elif os.path.isdir(path):
                # Descend into subdirectories breadth-first
                queue.append(os.path.join(path, '*'))
    return files


def get_my_image(db_dir=os.path.abspath('database')):
    """Load the reference images in db_dir (file names starting with 'david')
    and compute a face encoding for each one."""
    db_imgs = []
    db_enc = []
    for file in glob(os.path.join(db_dir, '*')):
        name = os.path.split(file)[-1]
        if name.startswith('david'):
            img = cv2.imread(file)
            # Assumes every reference image contains at least one detectable face
            encoding = face_recognition.face_encodings(img)[0]
            db_imgs.append(img)
            db_enc.append(encoding)

    return db_imgs, np.array(db_enc)

@ray.remote
def get_similar_imgs(files):
    """The function to run in a distributed fashion: match faces in a chunk of files."""
    my_imgs, my_enc = get_my_image()
    matches = []
    pbar = tqdm(total=len(files))
    tc = 1
    for file in files:
        # file = r'D:\PHOTOS\DAY 2\5 BARAT\_IMK5917.JPG'
        img = cv2.imread(file) # r'D:\PHOTOS\1 day 1 haldi mehandi\_IMK4982.JPG' r'D:\PHOTOS\DAY 2\5 BARAT\_IMK5917.JPG'
        img = cv2.resize(img, (0, 0), fx=0.6, fy=0.6)

        face_locations = face_recognition.face_locations(img)
        encoding = face_recognition.face_encodings(img)
        # print(face_locations)
        # print("==> Matching ", len(encoding), "Faces")
        # print(my_enc)

        idx2 = 0
        for enc in encoding:
            results = face_recognition.compare_faces(my_enc, enc, tolerance=0.3)
            # print("results=>", results)
            idx = 0
            for res in results:
                if res:
                    matches.append(file)
                    # print("Found match in", file)
                    y, xx, yy, x = face_locations[idx2]
                    # cv2.imshow(f'img1_{random.randint(1, 1000)}', my_imgs[idx])
                    # cv2.imshow(f'img2_{random.randint(1, 1000)}', img[y: yy, x: xx])
                    # cv2.imshow('img3', img)
                    # cv2.waitKey(1)
                idx += 1
            idx2 += 1
        if tc%10 == 0:
            cv2.destroyAllWindows()
        pbar.update(1)  # update() takes an increment, not the running total
        tc += 1
        # break
    pbar.close()
    # print("Matched files=>", matches)
    print(f"[INFO] found {len(matches)} matches in {len(files)} images")
    return matches


def main():
    files = get_file_paths(root=r'G:\PHOTOS\*')
    print("files count: ", len(files))
    pprint(files)
    process_count = 8
    chunk = len(files)//process_count
    futures = []
    for i in range(process_count):
        start = i*chunk
        # The last chunk also takes the remainder so no file is skipped
        end = (i+1)*chunk if i < process_count - 1 else len(files)
        print(start, end)
        fs = files[start: end]
        futures.append(get_similar_imgs.remote(fs))

    # ray.get returns one list of matches per chunk
    matches = ray.get(futures)
    with open('matches_2.json', 'w') as f:
        json.dump(matches, f)


if __name__ == "__main__":
    main()
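
One detail of the program above: ray.get(futures) returns one list per chunk, and a file path can be appended more than once when several comparisons succeed for the same image. If a flat list of distinct paths is wanted, a post-processing step along these lines could be used. The helper name flatten_unique and the sample file names are made up for illustration.

```python
from itertools import chain


def flatten_unique(chunks):
    """Merge per-chunk result lists into one list, dropping repeats but keeping order."""
    # dict.fromkeys keeps the first occurrence of each path in insertion order
    return list(dict.fromkeys(chain.from_iterable(chunks)))


# Shape of the data returned by ray.get(futures): one list per chunk
per_chunk = [["a.JPG", "a.JPG", "b.JPG"], [], ["c.JPG", "b.JPG"]]
merged = flatten_unique(per_chunk)
print(merged)  # ['a.JPG', 'b.JPG', 'c.JPG']
```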