Lecture 2: Infrastructure: RPC and threads

未闻小然桑

Category: 6.824 Distributed Systems. Tags: 6-824, distributed systems, RPC, threads

Article link: https://blog.csdn.net/okingniko/article/details/51079246

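1. Preface

The most commonly asked question: why does this course use Go?
6.824 used to use C++, and students often spent a lot of time fixing bugs unrelated to distributed systems design. For example, they freed objects that were still in use.
Go lets you concentrate on distributed systems problems:

Type safety
Garbage collection (no use-after-free problems)
Good support for concurrency
Good support for RPC

We like programming in Go because it is a simple language.
You can get started in about 30 minutes with A Tour of Go, then learn more from Effective Go.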

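2. Remote Procedure Call

2.1 Introduction to RPC

Remote Procedure Call (RPC):
A key piece of distributed system machinery; all the labs use RPC
Goal: easy-to-program network communication

Hides most of the details of client/server communication
The client call looks much like an ordinary procedure call
The server handlers look much like ordinary procedures

RPC is widely used.

Ideally, RPC makes network communication look just like a function call:
Client: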

z = fn(x, y)

Server:

fn(x, y) {
    compute
    return z
}

RPC aims for this level of transparency.

RPC examples in Go

RPC message diagram:

  Client             Server
    request--->
       <---response

Software structure:

  client app         handlers
    stubs           dispatcher
   RPC lib           RPC lib
     net  ------------ net

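A few details:

- Which server function (handler) should be called?
- Marshalling: format data into packets
  Can pass arrays, pointers, objects, etc.
  Go's RPC library is pretty powerful!
  Some things you cannot pass: e.g., channels and functions.
- Binding: how does the client know whom to talk to?
  Maybe the client supplies the server host name
  Maybe a name service maps service names to the best server host
- Threads:
  The client may have many threads, so more than one call can be outstanding and replies must be matched to requests
  Handlers may be slow, so the server often runs each handler in its own thread.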
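
The marshalling step above can be sketched with Go's encoding/gob package, which is also what Go's RPC library uses by default. This is only an illustration: PutArgs, roundTrip and encodeChan are hypothetical names, and a bytes.Buffer stands in for the network.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// PutArgs is a hypothetical RPC argument type, used only for illustration.
type PutArgs struct {
	Key    string
	Values []int
}

// roundTrip marshals args into bytes, as a client stub would, and
// unmarshals them again, as the server-side dispatcher would.
func roundTrip(in PutArgs) (PutArgs, error) {
	var buf bytes.Buffer // stands in for the network
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return PutArgs{}, err
	}
	var out PutArgs
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// encodeChan tries to marshal a channel at the top level, which gob rejects.
func encodeChan() error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(make(chan int))
}

func main() {
	out, err := roundTrip(PutArgs{Key: "k", Values: []int{10, 20}})
	fmt.Println(out, err)
	fmt.Println("channel rejected:", encodeChan() != nil)
}
```

Structs, slices and strings survive the round trip, while a top-level channel produces an encoding error, matching the "some things you cannot pass" point.
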
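The problem with RPC: what to do about failures?
For example: lost packets, a broken network, a slow server, a crashed server
What does a failure look like to the client RPC library?
The client never receives a reply from the server
The client does not know whether the server saw the request
Maybe the server or network failed just before the reply was sent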

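2.2 A simple scheme: "at least once"

The RPC library waits for a reply for a while
If none arrives, it re-sends the request
It repeats this several times
If there is still no reply, it returns an error to the application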

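Q: Is "at least once" easy for applications to cope with?
A simple problem with at-least-once:
the client sends "deduct $10 from bank account", and a re-sent request may be executed twice

Q: What can go wrong with this client program?
Put("k", 10) -- an RPC to set a key's value in a DB server
Put("k", 20) -- the client then does a second Put to the same key
A delayed retransmission of the first Put may arrive after the second, leaving k at 10 instead of 20.

Q: Is at-least-once ever OK?
Yes, if it is OK to repeat operations, e.g. read-only operations
Yes, if the application has its own plan for coping with duplicates, which you will need for Lab 1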

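2.3 A better RPC behavior: "at most once"

The server RPC code detects duplicate requests and returns the previous reply instead of re-running the handler.

Q: How to detect a duplicate request?
The client includes a unique ID (XID) with each request
It uses the same XID when re-sending

Server: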

 if seen[xid]:
     r = old[xid]
 else:
     r = handler()
     old[xid] = r
     seen[xid] = true

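At-most-once has some complexities; these will come up in Lab 2.

How to ensure the XID is unique?
  A big random number?
  Combine a unique client ID (IP address?) with a sequence number?
The server must eventually discard information about old RPCs
  When is it safe to discard?
  Option 1:
    Unique client IDs
    Per-client RPC sequence numbers
    The client includes "seen all replies <= X" with every RPC
    Much like TCP (sequence numbers, acknowledgments, sliding window)
  Option 2:
    Only allow each client one outstanding RPC at a time
    The arrival of seq+1 allows the server to discard all RPC information <= seq
  Option 3:
    The client agrees to keep retrying for less than 5 minutes
    The server discards old RPC information after 5 minutes
How to handle a duplicate request while the original is still executing?
  The server does not know the reply yet, and does not want to run the handler twice
  Idea: a "pending" flag per executing RPC; duplicates wait or are ignored.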

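2.4 More issues

What if an at-most-once server crashes and restarts?
  If the at-most-once duplicate information is kept in memory, the server forgets it and accepts duplicate requests after the restart.
  Maybe it should write the duplicate information to disk?
  Maybe a replica server should also replicate the duplicate information?
What about "exactly once"?
  At-most-once plus unbounded retries plus a fault-tolerant service
Go RPC is "at-most-once":
- Open a TCP connection
- Write the request to the TCP connection
- TCP may retransmit, but the server's TCP will filter out duplicates
- There is no retry in the Go code (i.e. it will NOT create a second TCP connection)
- Go RPC code returns an error if it does not get a reply
  - Perhaps after a timeout (from TCP)
  - Perhaps the server did not see the request
  - Perhaps the server processed the request but the server or network failed before the reply came back
Go RPC's at-most-once is not enough for Lab 1.
  It only applies to a single RPC call
  If a worker does not respond, the master re-sends the request to another worker, but the original worker may not have failed and may still be working on it.
  Go RPC cannot detect this kind of duplicate
  No problem in Lab 1, which handles this at the application level
  Lab 2 will explicitly detect duplicates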
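
To make the "returns an error if it does not get a reply" behavior concrete, here is a small sketch using Go's net/rpc package. To stay self-contained it uses an in-memory net.Pipe rather than a real TCP connection, so it does not show TCP retransmission; Arith, Args, callAdd and callDeadServer are illustrative names.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args and Arith are hypothetical types used only for this sketch.
type Args struct{ A, B int }

type Arith struct{}

// Add is an RPC handler: an exported method whose second argument is a
// pointer to the reply and whose result is an error.
func (Arith) Add(args Args, reply *int) error {
	*reply = args.A + args.B
	return nil
}

// callAdd serves Arith on one end of an in-memory pipe and calls it
// from the other end.
func callAdd(a, b int) (int, error) {
	srv := rpc.NewServer()
	if err := srv.Register(new(Arith)); err != nil {
		return 0, err
	}
	cliConn, srvConn := net.Pipe()
	go srv.ServeConn(srvConn)

	client := rpc.NewClient(cliConn)
	defer client.Close()
	var sum int
	err := client.Call("Arith.Add", Args{a, b}, &sum)
	return sum, err
}

// callDeadServer issues a call on a connection whose server side is
// already gone. Call returns an error and does not retry.
func callDeadServer() error {
	cliConn, srvConn := net.Pipe()
	srvConn.Close() // simulate a crashed server
	client := rpc.NewClient(cliConn)
	defer client.Close()
	var sum int
	return client.Call("Arith.Add", Args{1, 2}, &sum)
}

func main() {
	sum, err := callAdd(1, 2)
	fmt.Println(sum, err)
	fmt.Println("dead server gives error:", callDeadServer() != nil)
}
```

From the error alone, the caller cannot tell whether the server executed the request, which is why the application still needs its own plan for retries and duplicates.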

3. Threads

threads are a fundamental server structuring tool
you’ll use them a lot in the labs
they can be tricky
useful with RPC
Go calls them goroutines; everyone else calls them threads

Thread = “thread of control”
threads allow one program to (logically) do many things at once
the threads share memory
each thread includes some per-thread state:
program counter, registers, stack

Threading challenges:
  sharing data
    two threads modify the same variable at the same time?
    one thread reads data that another thread is changing?
    these problems are often called races
    need to protect invariants on shared data
    use Go sync.Mutex
  coordination between threads
    e.g. wait for all Map threads to finish
    use Go channels
  deadlock
    thread 1 is waiting for thread 2
    thread 2 is waiting for thread 1
    easily detectable (unlike races)
  lock granularity
    coarse-grained -> simple, but little concurrency/parallelism
    fine-grained -> more concurrency, more races and deadlocks
let’s look at labrpc RPC package to illustrate these problems

look at today’s handout – labrpc.go
it is similar to Go's RPC system, but with a simulated network
the network delays requests and replies
the network loses requests and replies
the network re-orders requests and replies
useful for testing labs 2 etc.
illustrates threads, mutexes, channels
complete RPC package is written in Go itself

structure

struct Network
description of network
servers
client endpoints
mutex per network

RPC overview
  many examples in test_test.go
    e.g., TestBasic()
  application calls Call()
    ok := end.Call("Raft.AppendEntries", args, &reply) -- send an RPC, wait for reply
  server side:
    srv := MakeServer()
    srv.AddService(svc) -- a server can have multiple services, e.g. Raft and k/v
      pass srv to net.AddServer()
    svc := MakeService(receiverObject) -- obj's methods will handle RPCs
      much like Go's rpc.Register()
      pass svc to srv.AddService()

struct Server
a server supports many services

AddService
add a service name
Q: why a lock?
Q: what is defer?

Dispatch
dispatch a request to the right service
Q: why hold a lock?
Q: why not hold lock to end of function?
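
The locking questions above can be explored with a simplified sketch. This is not the labrpc source: the types below are hypothetical stand-ins with string arguments and replies, meant only to show defer for unlocking and why a lock might be released before a handler runs.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Service is a hypothetical stand-in: a name plus a fixed set of methods.
type Service struct {
	name    string
	methods map[string]func(arg string) string
}

// Server holds several services; mu protects services and count.
type Server struct {
	mu       sync.Mutex
	services map[string]*Service
	count    int // number of requests dispatched
}

func NewServer() *Server { return &Server{services: make(map[string]*Service)} }

// AddService may race with Dispatch on the map, hence the lock.
// defer runs Unlock when the function returns, on every return path.
func (s *Server) AddService(svc *Service) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.services[svc.name] = svc
}

// Dispatch holds the lock only while touching shared state, then
// releases it so a slow handler does not block other requests.
func (s *Server) Dispatch(svcMeth, arg string) (string, bool) {
	dot := strings.LastIndex(svcMeth, ".")
	if dot < 0 {
		return "", false
	}
	s.mu.Lock()
	s.count++
	svc, ok := s.services[svcMeth[:dot]]
	s.mu.Unlock()
	if !ok {
		return "", false
	}
	m, ok := svc.methods[svcMeth[dot+1:]]
	if !ok {
		return "", false
	}
	return m(arg), true
}

func main() {
	srv := NewServer()
	srv.AddService(&Service{
		name:    "Raft",
		methods: map[string]func(string) string{"Ping": func(a string) string { return "pong:" + a }},
	})
	fmt.Println(srv.Dispatch("Raft.Ping", "x"))
	fmt.Println(srv.Dispatch("KV.Get", "x"))
}
```

Holding the lock to the end of Dispatch would be simpler but would serialize every handler on the server, and a handler that itself calls back into the server could deadlock.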

Call():
Use reflect to find type of argument
Use gob to marshal the argument
e.ch is the channel to the network to send request
Make a channel to receive reply from network ( <- req.replyCh)

MakeEnd():
has a thread/goroutine that simulates the network
reads from e.ch and processes requests
each request is processed in a separate goroutine
Q: can an end point have many outstanding requests?
Q: why rn.mu.Lock()?
Q: what does lock protect?

ProcessReq():
finds server endpoint
if the network is unreliable, may delay or drop requests
dispatch request to server in a new thread
waits on reply by reading ech or until 100 msec has passed
100 msec just to see if server is dead
then return reply
Q: who will read the reply?
Q: is it OK that ProcessReq doesn't hold the rn lock?

Service.dispatch():
find method for a request
unmarshal arguments
call method
marshal reply
return reply

Go's "memory model" requires explicit synchronization to communicate!
This code is not correct:
var x int
done := false
go func() { x = f(…); done = true }()
for done == false { }
it's very tempting to write, but the Go spec says it's undefined
use a channel or sync.WaitGroup instead
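
Two correct alternatives to the busy-wait loop, as a minimal sketch. f is a hypothetical placeholder computation; the point is only that the channel close and wg.Done/wg.Wait create the synchronization the memory model requires before x is read.

```go
package main

import (
	"fmt"
	"sync"
)

// f is a hypothetical stand-in for the real computation.
func f() int { return 42 }

// viaChannel waits for the goroutine by receiving from a channel.
// The write to x happens before close(done), which happens before
// the receive returns, so reading x afterwards is safe.
func viaChannel() int {
	var x int
	done := make(chan struct{})
	go func() {
		x = f()
		close(done)
	}()
	<-done
	return x
}

// viaWaitGroup does the same with sync.WaitGroup.
func viaWaitGroup() int {
	var x int
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		x = f()
	}()
	wg.Wait()
	return x
}

func main() {
	fmt.Println(viaChannel(), viaWaitGroup())
}
```

Both versions also block without spinning, unlike the loop on done.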

Study the Go tutorials on goroutines and channels
Use Go’s race detector:
https://golang.org/doc/articles/race_detector.html
go test -race mypkg
