Creation
Again in the main function:
int main() {
...
FwmarkServer fwmarkServer(&gCtls->netCtrl, &gCtls->eventReporter, &gCtls->trafficCtrl);
if (fwmarkServer.startListener()) {
ALOGE("Unable to start FwmarkServer (%s)", strerror(errno));
exit(1);
}
...
}
First, let's look at its constructor:
FwmarkServer::FwmarkServer(NetworkController* networkController, EventReporter* eventReporter,
TrafficController* trafficCtrl)
: SocketListener(SOCKET_NAME, true),
mNetworkController(networkController),
mEventReporter(eventReporter),
mTrafficCtrl(trafficCtrl) {}
When the base-class SocketListener constructor is called here, listen is set to true, so after initialization the members take the values shown in the comments below:
SocketListener::SocketListener(const char *socketName, bool listen) {
init(socketName, -1, listen, false);
}
SocketListener::SocketListener(int socketFd, bool listen) {
init(nullptr, socketFd, listen, false);
}
SocketListener::SocketListener(const char *socketName, bool listen, bool useCmdNum) {
init(socketName, -1, listen, useCmdNum);
}
void SocketListener::init(const char *socketName, int socketFd, bool listen, bool useCmdNum) {
mListen = listen; // true
mSocketName = socketName; // "fwmarkd"
mSock = socketFd; // -1
mUseCmdNum = useCmdNum; // false
pthread_mutex_init(&mClientsLock, nullptr);
}
Here the members of gCtls are used to initialize the corresponding members of FwmarkServer.
Starting the listener
Immediately afterwards, the SocketListener member function startListener is called to start listening:
int SocketListener::startListener() {
return startListener(4);
}
int SocketListener::startListener(int backlog) {
if (!mSocketName && mSock == -1) {
SLOGE("Failed to start unbound listener");
errno = EINVAL;
return -1;
} else if (mSocketName) {
/* 从此处进入执行 */
if ((mSock = android_get_control_socket(mSocketName)) < 0) {
SLOGE("Obtaining file descriptor socket '%s' failed: %s",
mSocketName, strerror(errno));
return -1;
}
SLOGV("got mSock = %d for %s", mSock, mSocketName);
fcntl(mSock, F_SETFD, FD_CLOEXEC);
}
if (mListen && listen(mSock, backlog) < 0) {
SLOGE("Unable to listen on socket (%s)", strerror(errno));
return -1;
} else if (!mListen)
mClients[mSock] = new SocketClient(mSock, false, mUseCmdNum);
if (pipe2(mCtrlPipe, O_CLOEXEC)) {
SLOGE("pipe failed (%s)", strerror(errno));
return -1;
}
if (pthread_create(&mThread, nullptr, SocketListener::threadStart, this)) {
SLOGE("pthread_create (%s)", strerror(errno));
return -1;
}
return 0;
}
Execution first enters the else if (mSocketName) branch: mSock is obtained via android_get_control_socket from an environment variable. The socket is created by init while it starts the service, and init places the fd into the environment; the detailed analysis can be found in the section on the creation of the netd socket of the Netd service. The configuration is as follows:
service netd /system/bin/netd
class main
socket netd stream 0660 root system
socket dnsproxyd stream 0660 root inet
socket mdns stream 0660 root system
socket fwmarkd stream 0660 root inet
onrestart restart zygote
onrestart restart zygote_secondary
# b/121354779: netd itself is not updatable, but on startup it dlopen()s the resolver library
# from the DNS resolver APEX. Mark it as updatable so init won't start it until all APEX
# packages are ready.
updatable
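As a side note, android_get_control_socket only looks up the fd number that init exported through an environment variable named after the socket. Below is a minimal sketch of that mechanism, assuming the simplified behavior described here; the real libcutils implementation additionally verifies that the fd really is a socket bound to /dev/socket/<name>:
#include <cstdlib>
#include <string>
// Simplified sketch only -- not the actual libcutils source. init creates the
// socket declared in the rc file, then exports its fd number as the environment
// variable "ANDROID_SOCKET_<name>" (e.g. ANDROID_SOCKET_fwmarkd) before
// exec()ing netd, so the daemon can recover it by name.
static int get_control_socket_sketch(const char* name) {
    std::string key = std::string("ANDROID_SOCKET_") + name;
    const char* value = getenv(key.c_str());
    if (value == nullptr) return -1;
    return atoi(value);  // the fd number is stored as a decimal string
}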
The second if is entered next: because mListen is true, listen is called so the socket starts listening and waiting for connections to be established;
A worker thread is then created that enters SocketListener::threadStart to handle listening and connections.
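startListener also creates the control pipe mCtrlPipe with pipe2 right before spawning the thread; this pipe is what stopListener later uses to shut the thread down, which is why runListener watches fds[0] for CtrlPipe_Shutdown. Here is a hedged sketch of stopListener, based on the libsysutils sources (details may differ between releases; the real code also closes mSock and releases all SocketClients):
int SocketListener::stopListener() {
    char c = CtrlPipe_Shutdown;
    // Writing into the control pipe wakes up the poll() in runListener(),
    // which then reads CtrlPipe_Shutdown from fds[0] and breaks the loop.
    if (TEMP_FAILURE_RETRY(write(mCtrlPipe[1], &c, 1)) != 1) {
        SLOGE("Error writing to control pipe (%s)", strerror(errno));
        return -1;
    }
    void* ret = nullptr;
    if (pthread_join(mThread, &ret)) {
        SLOGE("Error joining listener thread (%s)", strerror(errno));
        return -1;
    }
    close(mCtrlPipe[0]);
    close(mCtrlPipe[1]);
    mCtrlPipe[0] = -1;
    mCtrlPipe[1] = -1;
    return 0;
}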
Processing in the worker thread
The worker thread actually calls SocketListener's runListener method:
void *SocketListener::threadStart(void *obj) {
SocketListener *me = reinterpret_cast<SocketListener *>(obj);
me->runListener();
pthread_exit(nullptr);
return nullptr;
}
The runListener processing flow is analyzed in detail below:
void SocketListener::runListener() {
while (true) {
std::vector<pollfd> fds;
pthread_mutex_lock(&mClientsLock);
fds.reserve(2 + mClients.size());
fds.push_back({.fd = mCtrlPipe[0], .events = POLLIN});
if (mListen) fds.push_back({.fd = mSock, .events = POLLIN});
for (auto pair : mClients) {
// NB: calling out to an other object with mClientsLock held (safe)
const int fd = pair.second->getSocket();
if (fd != pair.first) SLOGE("fd mismatch: %d != %d", fd, pair.first);
fds.push_back({.fd = fd, .events = POLLIN});
}
pthread_mutex_unlock(&mClientsLock);
SLOGV("mListen=%d, mSocketName=%s", mListen, mSocketName);
int rc = TEMP_FAILURE_RETRY(poll(fds.data(), fds.size(), -1));
if (rc < 0) {
SLOGE("poll failed (%s) mListen=%d", strerror(errno), mListen);
sleep(1);
continue;
}
if (fds[0].revents & (POLLIN | POLLERR)) {
char c = CtrlPipe_Shutdown;
TEMP_FAILURE_RETRY(read(mCtrlPipe[0], &c, 1));
if (c == CtrlPipe_Shutdown) {
break;
}
continue;
}
if (mListen && (fds[1].revents & (POLLIN | POLLERR))) {
int c = TEMP_FAILURE_RETRY(accept4(mSock, nullptr, nullptr, SOCK_CLOEXEC));
if (c < 0) {
SLOGE("accept failed (%s)", strerror(errno));
sleep(1);
continue;
}
pthread_mutex_lock(&mClientsLock);
mClients[c] = new SocketClient(c, true, mUseCmdNum);
pthread_mutex_unlock(&mClientsLock);
}
// Add all active clients to the pending list first, so we can release
// the lock before invoking the callbacks.
std::vector<SocketClient*> pending;
pthread_mutex_lock(&mClientsLock);
const int size = fds.size();
for (int i = mListen ? 2 : 1; i < size; ++i) {
const struct pollfd& p = fds[i];
if (p.revents & (POLLIN | POLLERR)) {
auto it = mClients.find(p.fd);
if (it == mClients.end()) {
SLOGE("fd vanished: %d", p.fd);
continue;
}
SocketClient* c = it->second;
pending.push_back(c);
c->incRef();
}
}
pthread_mutex_unlock(&mClientsLock);
for (SocketClient* c : pending) {
// Process it, if false is returned, remove from the map
SLOGV("processing fd %d", c->getSocket());
if (!onDataAvailable(c)) {
release(c, false);
}
c->decRef();
}
}
}
① Via if (mListen) fds.push_back({.fd = mSock, .events = POLLIN}); the listening socket is added to the pollfd vector;
② poll is then called to wait for socket events until it returns;
③ The branch if (mListen && (fds[1].revents & (POLLIN | POLLERR))) picks out the event on the listening socket for handling. Note the ordering here: in the fds vector, fds[0] is always occupied by the control-pipe fd; when mListen is true, as in the current case, fds[1] is occupied by the fd of the socket in the listening state, and the SocketClient fds are appended starting at fds[2];
④ With mListen being true, the listen event is handled next by calling accept4:
ACCEPT(2)                           Linux Programmer's Manual                           ACCEPT(2)

NAME
       accept, accept4 - accept a connection on a socket

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

       #define _GNU_SOURCE             /* See feature_test_macros(7) */
       #include <sys/socket.h>

       int accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags);

DESCRIPTION
       The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call.

       The argument sockfd is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connections after a listen(2).

       The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the peer socket, as known to the communications layer. The exact format of the address returned addr is determined by the socket's address family (see socket(2) and the respective protocol man pages). When addr is NULL, nothing is filled in; in this case, addrlen is not used, and should also be NULL.

       The addrlen argument is a value-result argument: the caller must initialize it to contain the size (in bytes) of the structure pointed to by addr; on return it will contain the actual size of the peer address. The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.

       If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK.

       In order to be notified of incoming connections on a socket, you can use select(2) or poll(2). A readable event will be delivered when a new connection is attempted and you may then call accept() to get a socket for that connection. Alternatively, you can set the socket to deliver SIGIO when activity occurs on a socket; see socket(7) for details.

       For certain protocols which require an explicit confirmation, such as DECnet, accept() can be thought of as merely dequeuing the next connection request and not implying confirmation. Confirmation can be implied by a normal read or write on the new file descriptor, and rejection can be implied by closing the new socket. Currently only DECnet has these semantics on Linux.

       If flags is 0, then accept4() is the same as accept(). The following values can be bitwise ORed in flags to obtain different behavior:

       SOCK_NONBLOCK   Set the O_NONBLOCK file status flag on the new open file description. Using this flag saves extra calls to fcntl(2) to achieve the same result.

       SOCK_CLOEXEC    Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. See the description of the O_CLOEXEC flag in open(2) for reasons why this may be useful.
The difference between accept4 and accept here is that accept4 lets the caller specify flags, bitwise ORed together, that are applied directly to the returned fd, so no extra fcntl system call is needed. O_CLOEXEC is described in the open manual:
O_CLOEXEC (since Linux 2.6.23)
       Enable the close-on-exec flag for the new file descriptor. Specifying this flag permits a program to avoid additional fcntl(2) F_SETFD operations to set the FD_CLOEXEC flag. Additionally, use of this flag is essential in some multithreaded programs since using a separate fcntl(2) F_SETFD operation to set the FD_CLOEXEC flag does not suffice to avoid race conditions where one thread opens a file descriptor at the same time as another thread does a fork(2) plus execve(2).
The point here is avoiding a race: the flag is applied atomically with the creation of the fd, which is what guarantees correct behavior in multithreaded programs. For a more detailed analysis, see the article "open函数O_CLOEXEC模式和fcntl函数FD_CLOEXEC选项".
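To make the race concrete, here is a minimal illustration (not netd code) of the two ways to obtain a close-on-exec fd from an accepted connection; the two-step variant leaves a window between accept and fcntl during which another thread's fork(2) plus execve(2) would leak the new fd into the child:
#include <fcntl.h>
#include <sys/socket.h>
// Race-prone variant: the fd exists without FD_CLOEXEC until fcntl() runs.
int accept_then_set_cloexec(int listen_fd) {
    int c = accept(listen_fd, nullptr, nullptr);
    if (c >= 0) fcntl(c, F_SETFD, FD_CLOEXEC);
    return c;
}
// Atomic variant, as used by runListener() above: the flag is applied at the
// moment the fd is created, so no such window exists.
int accept_cloexec_atomic(int listen_fd) {
    return accept4(listen_fd, nullptr, nullptr, SOCK_CLOEXEC);
}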
⑤ A new SocketClient object is created from the fd returned by accept4 and stored in the mClients map;
⑥ Next, the client portion of fds is processed, starting at fds[2]: each active client is added to the pending list. Note in particular that a client accepted in this iteration does not show up in this iteration's pending list; but since the whole process runs inside while (true) {...}, when the next iteration begins, poll watches both the newly added client's socket and the listening socket at the same time;
⑦ As analyzed in step ⑥, as the loop continues, once a poll detects data arriving on the socket of the SocketClient created above, that client is added to the pending list, and the following for loop over pending invokes the callback, namely onDataAvailable:
void SocketListener::runListener() {
while (true) {
...
for (SocketClient* c : pending) {
// Process it, if false is returned, remove from the map
SLOGV("processing fd %d", c->getSocket());
if (!onDataAvailable(c)) {
release(c, false);
}
c->decRef();
}
}
}
onDataAvailable is a pure virtual function declared by SocketListener:
protected:
virtual bool onDataAvailable(SocketClient *c) = 0;
It is implemented in the FwmarkServer class:
bool FwmarkServer::onDataAvailable(SocketClient* client) {
int socketFd = -1;
int error = processClient(client, &socketFd);
if (socketFd >= 0) {
close(socketFd);
}
// Always send a response even if there were connection errors or read errors, so that we don't
// inadvertently cause the client to hang (which always waits for a response).
client->sendData(&error, sizeof(error));
// Always close the client connection (by returning false). This prevents a DoS attack where
// the client issues multiple commands on the same connection, never reading the responses,
// causing its receive buffer to fill up, and thus causing our client->sendData() to block.
return false;
}
Overall: the client is first handled by processClient (analyzed below); the return value of processClient is then sent back via the client's sendData; and finally false is returned, so that runListener in SocketListener releases the client via release.
Message handling
The detailed implementation of processClient is as follows:
int FwmarkServer::processClient(SocketClient* client, int* socketFd) {
FwmarkCommand command;
FwmarkConnectInfo connectInfo;
iovec iov[2] = {
{ &command, sizeof(command) },
{ &connectInfo, sizeof(connectInfo) },
};
msghdr message;
memset(&message, 0, sizeof(message));
message.msg_iov = iov;
message.msg_iovlen = ARRAY_SIZE(iov);
union {
cmsghdr cmh;
char cmsg[CMSG_SPACE(sizeof(*socketFd))];
} cmsgu;
memset(cmsgu.cmsg, 0, sizeof(cmsgu.cmsg));
message.msg_control = cmsgu.cmsg;
message.msg_controllen = sizeof(cmsgu.cmsg);
int messageLength = TEMP_FAILURE_RETRY(recvmsg(client->getSocket(), &message, MSG_CMSG_CLOEXEC));
if (messageLength <= 0) {
return -errno;
}
if (!((command.cmdId != FwmarkCommand::ON_CONNECT_COMPLETE && messageLength == sizeof(command))
|| (command.cmdId == FwmarkCommand::ON_CONNECT_COMPLETE
&& messageLength == sizeof(command) + sizeof(connectInfo)))) {
return -EBADMSG;
}
Permission permission = mNetworkController->getPermissionForUser(client->getUid());
if (command.cmdId == FwmarkCommand::QUERY_USER_ACCESS) {
if ((permission & PERMISSION_SYSTEM) != PERMISSION_SYSTEM) {
return -EPERM;
}
return mNetworkController->checkUserNetworkAccess(command.uid, command.netId);
}
if (command.cmdId == FwmarkCommand::SET_COUNTERSET) {
return mTrafficCtrl->setCounterSet(command.trafficCtrlInfo, command.uid, client->getUid());
}
if (command.cmdId == FwmarkCommand::DELETE_TAGDATA) {
return mTrafficCtrl->deleteTagData(command.trafficCtrlInfo, command.uid, client->getUid());
}
cmsghdr* const cmsgh = CMSG_FIRSTHDR(&message);
if (cmsgh && cmsgh->cmsg_level == SOL_SOCKET && cmsgh->cmsg_type == SCM_RIGHTS &&
cmsgh->cmsg_len == CMSG_LEN(sizeof(*socketFd))) {
memcpy(socketFd, CMSG_DATA(cmsgh), sizeof(*socketFd));
}
if (*socketFd < 0) {
return -EBADF;
}
int family;
socklen_t familyLen = sizeof(family);
if (getsockopt(*socketFd, SOL_SOCKET, SO_DOMAIN, &family, &familyLen) == -1) {
return -errno;
}
if (!FwmarkCommand::isSupportedFamily(family)) {
return -EAFNOSUPPORT;
}
Fwmark fwmark;
socklen_t fwmarkLen = sizeof(fwmark.intValue);
if (getsockopt(*socketFd, SOL_SOCKET, SO_MARK, &fwmark.intValue, &fwmarkLen) == -1) {
return -errno;
}
switch (command.cmdId) {
case FwmarkCommand::ON_ACCEPT: {
// Called after a socket accept(). The kernel would've marked the NetId and necessary
// permissions bits, so we just add the rest of the user's permissions here.
permission = static_cast<Permission>(permission | fwmark.permission);
break;
}
case FwmarkCommand::ON_CONNECT: {
// Called before a socket connect() happens. Set an appropriate NetId into the fwmark so
// that the socket routes consistently over that network. Do this even if the socket
// already has a NetId, so that calling connect() multiple times still works.
//
// But if the explicit bit was set, the existing NetId was explicitly preferred (and not
// a case of connect() being called multiple times). Don't reset the NetId in that case.
//
// An "appropriate" NetId is the NetId of a bypassable VPN that applies to the user, or
// failing that, the default network. We'll never set the NetId of a secure VPN here.
// See the comments in the implementation of getNetworkForConnect() for more details.
//
// If the protect bit is set, this could be either a system proxy (e.g.: the dns proxy
// or the download manager) acting on behalf of another user, or a VPN provider. If it's
// a proxy, we shouldn't reset the NetId. If it's a VPN provider, we should set the
// default network's NetId.
//
// There's no easy way to tell the difference between a proxy and a VPN app. We can't
// use PERMISSION_SYSTEM to identify the proxy because a VPN app may also have those
// permissions. So we use the following heuristic:
//
// If it's a proxy, but the existing NetId is not a VPN, that means the user (that the
// proxy is acting on behalf of) is not subject to a VPN, so the proxy must have picked
// the default network's NetId. So, it's okay to replace that with the current default
// network's NetId (which in all likelihood is the same).
//
// Conversely, if it's a VPN provider, the existing NetId cannot be a VPN. The only time
// we set a VPN's NetId into a socket without setting the explicit bit is here, in
// ON_CONNECT, but we won't do that if the socket has the protect bit set. If the VPN
// provider connect()ed (and got the VPN NetId set) and then called protect(), we
// would've unset the NetId in PROTECT_FROM_VPN below.
//
// So, overall (when the explicit bit is not set but the protect bit is set), if the
// existing NetId is a VPN, don't reset it. Else, set the default network's NetId.
if (!fwmark.explicitlySelected) {
if (!fwmark.protectedFromVpn) {
fwmark.netId = mNetworkController->getNetworkForConnect(client->getUid());
} else if (!mNetworkController->isVirtualNetwork(fwmark.netId)) {
fwmark.netId = mNetworkController->getDefaultNetwork();
}
}
break;
}
case FwmarkCommand::ON_CONNECT_COMPLETE: {
// Called after a socket connect() completes.
// This reports connect event including netId, destination IP address, destination port,
// uid, connect latency, and connect errno if any.
// Skip reporting if connect() happened on a UDP socket.
int socketProto;
socklen_t intSize = sizeof(socketProto);
const int ret = getsockopt(*socketFd, SOL_SOCKET, SO_PROTOCOL, &socketProto, &intSize);
if ((ret != 0) || (socketProto == IPPROTO_UDP)) {
break;
}
android::sp<android::net::metrics::INetdEventListener> netdEventListener =
mEventReporter->getNetdEventListener();
if (netdEventListener != nullptr) {
char addrstr[INET6_ADDRSTRLEN];
char portstr[sizeof("65536")];
const int ret = getnameinfo((sockaddr*) &connectInfo.addr, sizeof(connectInfo.addr),
addrstr, sizeof(addrstr), portstr, sizeof(portstr),
NI_NUMERICHOST | NI_NUMERICSERV);
netdEventListener->onConnectEvent(fwmark.netId, connectInfo.error,
connectInfo.latencyMs,
(ret == 0) ? String16(addrstr) : String16(""),
(ret == 0) ? strtoul(portstr, nullptr, 10) : 0, client->getUid());
}
break;
}
case FwmarkCommand::SELECT_NETWORK: {
fwmark.netId = command.netId;
if (command.netId == NETID_UNSET) {
fwmark.explicitlySelected = false;
fwmark.protectedFromVpn = false;
permission = PERMISSION_NONE;
} else {
if (int ret = mNetworkController->checkUserNetworkAccess(client->getUid(),
command.netId)) {
return ret;
}
fwmark.explicitlySelected = true;
fwmark.protectedFromVpn = mNetworkController->canProtect(client->getUid());
}
break;
}
case FwmarkCommand::PROTECT_FROM_VPN: {
if (!mNetworkController->canProtect(client->getUid())) {
return -EPERM;
}
// If a bypassable VPN's provider app calls connect() and then protect(), it will end up
// with a socket that looks like that of a system proxy but is not (see comments for
// ON_CONNECT above). So, reset the NetId.
//
// In any case, it's appropriate that if the socket has an implicit VPN NetId mark, the
// PROTECT_FROM_VPN command should unset it.
if (!fwmark.explicitlySelected && mNetworkController->isVirtualNetwork(fwmark.netId)) {
fwmark.netId = mNetworkController->getDefaultNetwork();
}
fwmark.protectedFromVpn = true;
permission = static_cast<Permission>(permission | fwmark.permission);
break;
}
case FwmarkCommand::SELECT_FOR_USER: {
if ((permission & PERMISSION_SYSTEM) != PERMISSION_SYSTEM) {
return -EPERM;
}
fwmark.netId = mNetworkController->getNetworkForUser(command.uid);
fwmark.protectedFromVpn = true;
break;
}
case FwmarkCommand::TAG_SOCKET: {
// If the UID is -1, tag as the caller's UID:
// - TrafficStats and NetworkManagementSocketTagger use -1 to indicate "use the
// caller's UID".
// - xt_qtaguid will see -1 on the command line, fail to parse it as a uint32_t, and
// fall back to current_fsuid().
if (static_cast<int>(command.uid) == -1) {
command.uid = client->getUid();
}
return mTrafficCtrl->tagSocket(*socketFd, command.trafficCtrlInfo, command.uid,
client->getUid());
}
case FwmarkCommand::UNTAG_SOCKET: {
// Any process can untag a socket it has an fd for.
return mTrafficCtrl->untagSocket(*socketFd);
}
default: {
// unknown command
return -EPROTO;
}
}
fwmark.permission = permission;
if (setsockopt(*socketFd, SOL_SOCKET, SO_MARK, &fwmark.intValue,
sizeof(fwmark.intValue)) == -1) {
return -errno;
}
return 0;
}
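The switch above reads and writes fields of the Fwmark value fetched from the socket via SO_MARK. For reference, here is a rough sketch of the Fwmark union; this layout is an assumption based on netd's include/Fwmark.h around this release, and the exact set of bitfields differs between Android versions:
#include <cstdint>
// Sketch only -- see system/netd/include/Fwmark.h for the authoritative layout.
union Fwmark {
    uint32_t intValue;                    // raw value stored in the socket's SO_MARK
    struct {
        unsigned netId          : 16;     // network the socket routes over
        bool explicitlySelected :  1;     // set by SELECT_NETWORK
        bool protectedFromVpn   :  1;     // set by PROTECT_FROM_VPN / SELECT_FOR_USER
        unsigned permission     :  2;     // PERMISSION_* bits of the app owning the socket
        bool uidBillingDone     :  1;     // may be absent in older releases
    };
    constexpr Fwmark() : intValue(0) {}
};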
processClient first calls recvmsg to receive the message and then processes it; for recvmsg and the related msghdr and cmsghdr structures, see the analysis below.
The specific handling then depends on the definition of each particular command.
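To make the msghdr/cmsghdr handling in processClient concrete, here is a hedged sketch of the client side. The real client code lives in libnetd_client; the struct below is a stand-in whose layout would have to mirror netd's actual FwmarkCommand, and the socket path follows from the fwmarkd entry in the rc file shown earlier. The client connects to the fwmarkd UNIX socket, sends the command as ordinary data plus the target socket fd as SCM_RIGHTS ancillary data, and then blocks until the int result arrives (which is why the server always sends a response):
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>
// Illustrative stand-in; must match the layout netd expects in recvmsg().
struct FwmarkCommandSketch {
    int cmdId;
    unsigned netId;
    uid_t uid;
    uint32_t trafficCtrlInfo;
};
int send_fwmark_command(const FwmarkCommandSketch& cmd, int targetFd) {
    int sock = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (sock < 0) return -errno;
    sockaddr_un addr = {};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/dev/socket/fwmarkd", sizeof(addr.sun_path) - 1);
    if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(sock);
        return -errno;
    }
    // The command travels in the normal data part (matching iov[0] on the
    // server side); the fd travels as SCM_RIGHTS ancillary data, which the
    // kernel turns into a new fd in netd's fd table.
    iovec iov = { const_cast<FwmarkCommandSketch*>(&cmd), sizeof(cmd) };
    char cmsgbuf[CMSG_SPACE(sizeof(targetFd))] = {};
    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsgbuf;
    msg.msg_controllen = sizeof(cmsgbuf);
    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(targetFd));
    memcpy(CMSG_DATA(cmsg), &targetFd, sizeof(targetFd));
    int error = -1;
    if (TEMP_FAILURE_RETRY(sendmsg(sock, &msg, 0)) < 0 ||
        TEMP_FAILURE_RETRY(recv(sock, &error, sizeof(error), 0)) <= 0) {
        error = -errno;
    }
    close(sock);
    return error;  // 0 on success, a negative errno-style value otherwise
}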