Android Policy Routing

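Android manages networking through the netd daemon. How does an operating system manage its networks, and how is that implemented? This article works through it.

In my view, the most important part of Android's network management is policy routing. Before looking at how Android does it, let's first get familiar with routing in general.

We all know routers: they are layer 3 devices that forward packets from one network to another. How does a router decide which next-hop device a packet goes to? It consults the routing table, which you can view with the ip route show command.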

#ip route show
61.135.165.183 via 192.168.1.1 dev wlp5s0 proto unspec 
default via 192.168.1.1 dev wlp5s0 proto dhcp metric 600 

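The above is the routing table on my desktop machine, trimmed to two entries. The first entry means that data destined for the IP address 61.135.165.183 is sent out through the wlp5s0 interface, with 192.168.1.1 as the next hop. 192.168.1.1 is the address of the router on my LAN. The second entry is the default route: if the destination address matches no other entry, the packet is sent out through the interface named in the default route, to the next-hop address it specifies.

So, to sum up, a routing table does two things:
1 Selects the network interface.
2 Specifies the next-hop address (the gateway, i.e. the router's address).

Is a routing table really that simple? Yes. Precisely because it is so simple, operating systems support multiple routing tables to allow more flexible configuration, and add a configurable policy layer on top that decides which routing table to consult for a given packet. That layer is policy routing. ip rule show lists its rules.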

#ip rule show 
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

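This is again from my desktop. There are three rules, each in the format:

priority: from <match condition> lookup <routing table>

Linux traditionally provides 255 routing tables by default; what ip route show displayed earlier is the main table. The routing policy matches each packet against the rules to decide which table to consult, going through the rules in priority order (a smaller number means a higher priority). When a packet matches a rule, the table named by that rule is searched for a next hop and interface. If the table contains a matching route, the packet is forwarded; otherwise the search continues with the next rule.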
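
The lookup order just described can be modeled in a few lines of C++. This is only a sketch for intuition, not kernel code: the Rule and RouteTable types and the lookup function are hypothetical names, every rule is treated as "from all", and route matching is reduced to an exact destination string or a default entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one policy rule: a priority and the table it names.
struct Rule {
    unsigned priority;   // smaller value = consulted earlier
    std::string table;   // routing table to look up
};

// Hypothetical routing table: destination -> "via <gateway> dev <interface>".
using RouteTable = std::map<std::string, std::string>;

// Walk the rules in priority order. The first table that has a route for the
// destination (or a default route) decides; otherwise fall through to the
// next rule. An empty string means no rule produced a route.
std::string lookup(const std::vector<Rule>& rulesByPriority,
                   const std::map<std::string, RouteTable>& tables,
                   const std::string& destination) {
    for (const Rule& rule : rulesByPriority) {
        auto tableIt = tables.find(rule.table);
        if (tableIt == tables.end()) continue;  // table has no entries at all
        const RouteTable& table = tableIt->second;
        auto routeIt = table.find(destination);
        if (routeIt == table.end()) routeIt = table.find("default");
        if (routeIt != table.end()) return routeIt->second;
        // No match in this table: continue with the next rule.
    }
    return "";
}
```

With the three desktop rules (local, main, default) and the two main-table entries shown earlier, a lookup for an arbitrary Internet address finds nothing in local, falls through to main, and is answered by its default route.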

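With that background in place, we still haven't seen any real match conditions, so let's look at something more complex: the routing rules on Android, taking the 6.0 emulator as an example: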

#ip rule show 
0:	from all lookup local 
10000:	from all fwmark 0xc0000/0xd0000 lookup legacy_system 
10500:	from all oif eth0 uidrange 0-0 lookup eth0 
13000:	from all fwmark 0x10063/0x1ffff lookup local_network 
13000:	from all fwmark 0x10066/0x1ffff lookup eth0 
14000:	from all oif eth0 lookup eth0 
15000:	from all fwmark 0x0/0x10000 lookup legacy_system 
16000:	from all fwmark 0x0/0x10000 lookup legacy_network 
17000:	from all fwmark 0x0/0x10000 lookup local_network 
19000:	from all fwmark 0x66/0x1ffff lookup eth0 
22000:	from all fwmark 0x0/0xffff lookup eth0 
23000:	from all fwmark 0x0/0xffff uidrange 0-0 lookup main 
32000:	from all unreachable

There are 13 rules here, listed from highest priority to lowest.

Rule 1: every packet first consults the local routing table. Let's look at the local table:

root@houdini:/ # ip route show table local                                     
broadcast 10.0.2.0 dev eth0  proto kernel  scope link  src 10.0.2.15 
local 10.0.2.15 dev eth0  proto kernel  scope host  src 10.0.2.15 
broadcast 10.0.2.255 dev eth0  proto kernel  scope link  src 10.0.2.15 
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1 
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1 
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1 

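These are all routes for broadcast, local, and loopback addresses.

Notice that rule 2 contains an fwmark. Before explaining that rule, a word on fwmark, which is what policy routing is built on here: a mark is a tag attached to traffic, so that routing rules can make finer-grained decisions based on the mark value. This is how Android implements policy routing. In a rule, the fwmark has two parts, written as value/mask, and the rule only looks at the bits whose mask bit is 1. Android 6 uses a 20-bit fwmark, defined as follows: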

union Fwmark {
    uint32_t intValue;
    struct {
        unsigned netId          : 16;
        bool explicitlySelected :  1;
        bool protectedFromVpn   :  1;
        Permission permission   :  2;
    };
    Fwmark() : intValue(0) {}
};

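The lowest 16 bits hold the network ID. Bit 17 says whether the network was explicitly selected. Bit 18 says whether the traffic is protected from the VPN (1 means the traffic does not go out through the VPN). Bits 19-20 hold the permission of the app that owns the packet.

The permissions are defined as follows: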

// This enum represents the permissions we care about for networking. When applied to an app, it's
// the permission the app (UID) has been granted. When applied to a network, it's the permission an
// app must hold to be allowed to use the network. PERMISSION_NONE means "no special permission is
// held by the app" or "no special permission is required to use the network".
//
// Permissions are flags that can be OR'ed together to represent combinations of permissions.
//
// PERMISSION_NONE is used for regular networks and apps, such as those that hold the
// android.permission.INTERNET framework permission.
//
// PERMISSION_NETWORK is used for privileged networks and apps that can manipulate or access them,
// such as those that hold the android.permission.CHANGE_NETWORK_STATE framework permission.
//
// PERMISSION_SYSTEM is used for system apps, such as those that are installed on the system
// partition, those that hold the android.permission.CONNECTIVITY_INTERNAL framework permission and
// those whose UID is less than FIRST_APPLICATION_UID.
enum Permission {
    PERMISSION_NONE    = 0x0,
    PERMISSION_NETWORK = 0x1,
    PERMISSION_SYSTEM  = 0x3,  // Includes PERMISSION_NETWORK.
};

The comments here are clear enough, so I won't go through them in detail.
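
To make the bit layout concrete, here is a small C++ sketch that decodes a mark with explicit shifts and masks and checks it against a rule's value/mask pair. It assumes the bit-fields of the union are laid out from the least significant bit upward (as GCC and Clang do on little-endian targets); the constant names, DecodedFwmark, decode, and ruleMatches are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions, counted from the least significant bit.
constexpr uint32_t kNetIdMask = 0x0000ffffu;     // bits 1-16: netId
constexpr uint32_t kExplicitBit = 0x00010000u;   // bit 17: explicitlySelected
constexpr uint32_t kProtectedBit = 0x00020000u;  // bit 18: protectedFromVpn
constexpr unsigned kPermissionShift = 18;        // bits 19-20: permission

struct DecodedFwmark {
    unsigned netId;
    bool explicitlySelected;
    bool protectedFromVpn;
    unsigned permission;  // 0x0 NONE, 0x1 NETWORK, 0x3 SYSTEM
};

// Split a raw 32-bit mark into its four fields.
DecodedFwmark decode(uint32_t mark) {
    DecodedFwmark out;
    out.netId = mark & kNetIdMask;
    out.explicitlySelected = (mark & kExplicitBit) != 0;
    out.protectedFromVpn = (mark & kProtectedBit) != 0;
    out.permission = (mark >> kPermissionShift) & 0x3u;
    return out;
}

// A rule "fwmark VALUE/MASK" matches when the bits selected by MASK agree.
bool ruleMatches(uint32_t packetMark, uint32_t value, uint32_t mask) {
    return (packetMark & mask) == (value & mask);
}
```

For example, 0x10066 decodes to netId 0x66 (102) with the explicit bit set, and a packet marked 0x30066 still matches 0x10066/0x1ffff because the protected bit lies outside the mask.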

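Back to the Android ip rules.
Rule 2: packets from apps with the system permission that have not explicitly selected a network consult the legacy_system table. In other words, apps with the system networking permission can add routes to legacy_system to configure their own routing. legacy_system holds routes set on behalf of system apps and has a relatively high priority.

Rule 3: packets whose output interface is eth0 (oif eth0) and whose UID is 0 (root) consult the eth0 table. The eth0 table holds the routes that use eth0 as the egress interface. According to the comments in the netd source shown later, this rule covers forwarded packets and system daemons.
Rule 4: packets that explicitly select the local network (netId 0x63, i.e. 99) consult the local_network table.
Rule 5: packets that explicitly select the eth0 network (netId 0x66, i.e. 102) consult the eth0 table.
Rule 6: packets whose output interface is already known to be eth0, for example from a socket bound to that interface, consult the eth0 table.
Rule 7: packets that have not explicitly selected a network consult the legacy_system table (only system apps holding CONNECTIVITY_INTERNAL can add routes to it).
Rule 8: packets that have not explicitly selected a network consult the legacy_network table (other apps with network-management permission can add routes to it as well).
Rule 9: packets that have not explicitly selected a network consult the local_network table.
Rule 10: packets that have not explicitly selected a network but whose mark carries the netId of the eth0 network (0x66, i.e. 102) consult the eth0 table.
Rule 11: packets with no netId in their mark consult the eth0 table, which belongs to the default network here.
Rule 12: packets with no netId in their mark and a UID of 0 consult the main table.
Rule 13: anything that gets this far is unreachable.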

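To summarize:
The local table comes first
System network-management apps consult legacy_system first
Packets that explicitly select a network consult that network's table
Packets that do not explicitly select a network consult the legacy tables
Then the table matching the netId in the mark
Then the table of the default network
Then the main table
Finally, unreachable

The priorities of Android's routing rules are defined in system/netd/server/RouteController.cpp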

const uint32_t RULE_PRIORITY_VPN_OVERRIDE_SYSTEM = 10000;
const uint32_t RULE_PRIORITY_VPN_OVERRIDE_OIF    = 10500;
const uint32_t RULE_PRIORITY_VPN_OUTPUT_TO_LOCAL = 11000;
const uint32_t RULE_PRIORITY_SECURE_VPN          = 12000;
const uint32_t RULE_PRIORITY_EXPLICIT_NETWORK    = 13000;
const uint32_t RULE_PRIORITY_OUTPUT_INTERFACE    = 14000;
const uint32_t RULE_PRIORITY_LEGACY_SYSTEM       = 15000;
const uint32_t RULE_PRIORITY_LEGACY_NETWORK      = 16000;
const uint32_t RULE_PRIORITY_LOCAL_NETWORK       = 17000;
const uint32_t RULE_PRIORITY_TETHERING           = 18000;
const uint32_t RULE_PRIORITY_IMPLICIT_NETWORK    = 19000;
const uint32_t RULE_PRIORITY_BYPASSABLE_VPN      = 20000;
const uint32_t RULE_PRIORITY_VPN_FALLTHROUGH     = 21000;
const uint32_t RULE_PRIORITY_DEFAULT_NETWORK     = 22000;
const uint32_t RULE_PRIORITY_DIRECTLY_CONNECTED  = 23000;
const uint32_t RULE_PRIORITY_UNREACHABLE         = 32000;

Let's see what is in the eth0 routing table

#ip route show table eth0
default via 10.0.2.2 dev eth0  proto static 

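It is a static default route. In other words, any packet that ends up being matched against the eth0 table is sent out through eth0, with 10.0.2.2 as the next hop.

With the above, we now have a basic picture of Android policy routing. In short, Android combines multiple routing tables with fwmark to build its overall routing policy and decide which interface a network request goes out on. When several interfaces are present (mobile data, Wi-Fi, Ethernet), the network-management code can flexibly configure which network carries the data. It also makes VPNs straightforward: which apps have their traffic carried over the VPN is just another piece of configuration.

Next, let's go through the code that implements Android policy routing.

We saw that Android manages networks by network ID to implement the routing policy and select an interface. So how is a network created?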

frameworks/base/services/core/java/com/android/server/ConnectivityService.java

private void updateNetworkInfo(NetworkAgentInfo networkAgent, NetworkInfo newInfo) {
               ......
               if (networkAgent.isVPN()) {
                    mNetd.createVirtualNetwork(networkAgent.network.netId,
                            !networkAgent.linkProperties.getDnsServers().isEmpty(),
                            (networkAgent.networkMisc == null ||
                                !networkAgent.networkMisc.allowBypass));
                } else {
                    mNetd.createPhysicalNetwork(networkAgent.network.netId,
                            networkAgent.networkCapabilities.hasCapability(
                                    NET_CAPABILITY_NOT_RESTRICTED) ?
                                    null : NetworkManagementService.PERMISSION_SYSTEM);
                }
                ......
    updateLinkProperties(networkAgent, null);
}

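When a network registers with ConnectivityService, a network is created according to its type. For a VPN, mNetd.createVirtualNetwork() creates a virtual network. Anything else is a physical network, created with mNetd.createPhysicalNetwork(). If a physical network is unrestricted, meaning ordinary apps may use it, it requires no permission; otherwise only system network-management apps may use it, and the system permission (NetworkManagementService.PERMISSION_SYSTEM) is set. (A VPN network essentially opens a tun device, which creates a tun interface. Traffic that the routing rules direct to that interface is delivered, as IP packets, through the open tun device to the app that opened it, which gives user space the ability to process network traffic. Because the tun interface is virtual, the network is called a virtual network.)

After the network is created, updateLinkProperties is called to update the link properties.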

    private void updateLinkProperties(NetworkAgentInfo networkAgent, LinkProperties oldLp) {
        LinkProperties newLp = networkAgent.linkProperties;
        int netId = networkAgent.network.netId;
        ......
        updateInterfaces(newLp, oldLp, netId);
        ......
    }
    
    private void updateInterfaces(LinkProperties newLp, LinkProperties oldLp, int netId) {
        CompareResult<String> interfaceDiff = new CompareResult<String>();
        if (oldLp != null) {
            interfaceDiff = oldLp.compareAllInterfaceNames(newLp);
        } else if (newLp != null) {
            interfaceDiff.added = newLp.getAllInterfaceNames();
        }
        for (String iface : interfaceDiff.added) {
            try {
                if (DBG) log("Adding iface " + iface + " to network " + netId);
                mNetd.addInterfaceToNetwork(iface, netId);
            } catch (Exception e) {
                loge("Exception adding interface: " + e);
            }
        }
        for (String iface : interfaceDiff.removed) {
            try {
                if (DBG) log("Removing iface " + iface + " from network " + netId);
                mNetd.removeInterfaceFromNetwork(iface, netId);
            } catch (Exception e) {
                loge("Exception removing interface: " + e);
            }
        }
    }

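updateLinkProperties calls updateInterfaces, which in turn calls mNetd.addInterfaceToNetwork(iface, netId) to add the interface to the network. So we'll continue the analysis in two steps, creating the network and adding the interface, which comes down to the following three key functions.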

mNetd.createVirtualNetwork(networkAgent.network.netId,
                            !networkAgent.linkProperties.getDnsServers().isEmpty(),
                            (networkAgent.networkMisc == null ||
                                !networkAgent.networkMisc.allowBypass));
                                
mNetd.createPhysicalNetwork(networkAgent.network.netId,
                            networkAgent.networkCapabilities.hasCapability(
                                    NET_CAPABILITY_NOT_RESTRICTED) ?
                                    null : NetworkManagementService.PERMISSION_SYSTEM);
                                
mNetd.addInterfaceToNetwork(iface, netId)                             

Creating a virtual network is much like creating a physical one, so we'll take the physical network as the example.
First, the createPhysicalNetwork function:
frameworks/base/services/core/java/com/android/server/NetworkManagementService.java

    @Override
    public void createPhysicalNetwork(int netId, String permission) {
        mContext.enforceCallingOrSelfPermission(CONNECTIVITY_INTERNAL, TAG);

        try {
            if (permission != null) {
                mConnector.execute("network", "create", netId, permission);
            } else {
                mConnector.execute("network", "create", netId);
            }
        } catch (NativeDaemonConnectorException e) {
            throw e.rethrowAsParcelableException();
        }
    }

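This function calls NativeDaemonConnector.execute() to create the network. NativeDaemonConnector simply sends the command over a Unix domain socket to the netd process. netd handles commands from NativeDaemonConnector in the runCommand function in system/netd/server/CommandListener.cpp. The mechanics of sending and receiving commands are not the focus of this article, so I won't go into them.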

system/netd/server/CommandListener.cpp

int CommandListener::NetworkCommand::runCommand(SocketClient* client, int argc, char** argv) {
  ......
  
  if (!strcmp(argv[1], "create")) {
        if (argc < 3) {
            return syntaxError(client, "Missing argument");
        }
        unsigned netId = stringToNetId(argv[2]);
        if (argc == 6 && !strcmp(argv[3], "vpn")) {
            bool hasDns = atoi(argv[4]);
            bool secure = atoi(argv[5]);
            if (int ret = sNetCtrl->createVirtualNetwork(netId, hasDns, secure)) {
                return operationError(client, "createVirtualNetwork() failed", ret);
            }
        } else if (argc > 4) {
            return syntaxError(client, "Unknown trailing argument(s)");
        } else {
            Permission permission = PERMISSION_NONE;
            if (argc == 4) {
                permission = stringToPermission(argv[3]);
                if (permission == PERMISSION_NONE) {
                    return syntaxError(client, "Unknown permission");
                }
            }
            if (int ret = sNetCtrl->createPhysicalNetwork(netId, permission)) {
                return operationError(client, "createPhysicalNetwork() failed", ret);
            }
        }
        return success(client);
    }
  ......
  
}


int NetworkController::createPhysicalNetwork(unsigned netId, Permission permission) {
    if (!((MIN_NET_ID <= netId && netId <= MAX_NET_ID) ||
          (MIN_OEM_ID <= netId && netId <= MAX_OEM_ID))) {
        ALOGE("invalid netId %u", netId);
        return -EINVAL;
    }

    if (isValidNetwork(netId)) {
        ALOGE("duplicate netId %u", netId);
        return -EEXIST;
    }

    PhysicalNetwork* physicalNetwork = new PhysicalNetwork(netId, mDelegateImpl);
    if (int ret = physicalNetwork->setPermission(permission)) {
        ALOGE("inconceivable! setPermission cannot fail on an empty network");
        delete physicalNetwork;
        return ret;
    }

    android::RWLock::AutoWLock lock(mRWLock);
    mNetworks[netId] = physicalNetwork;
    return 0;
}

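Adding a physical network ends up in NetworkController::createPhysicalNetwork(). The implementation is simple: it creates a PhysicalNetwork instance and sets the network's permission.

Now for addInterfaceToNetwork, which adds an interface to a network.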
frameworks/base/services/core/java/com/android/server/NetworkManagementService.java

    @Override
    public void addInterfaceToNetwork(String iface, int netId) {
        modifyInterfaceInNetwork("add", "" + netId, iface);
    }

  private void modifyInterfaceInNetwork(String action, String netId, String iface) {
        mContext.enforceCallingOrSelfPermission(CONNECTIVITY_INTERNAL, TAG);
        try {
            mConnector.execute("network", "interface", action, netId, iface);
        } catch (NativeDaemonConnectorException e) {
            throw e.rethrowAsParcelableException();
        }
    }

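That is, it sends netd the command network interface add <netId> <iface>. For example, to add the eth0 interface to a network whose ID is 102, the command is:
network interface add 102 eth0
The command is sent to the netd process over the Unix domain socket and handled by the following code: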
system/netd/server/CommandListener.cpp

int CommandListener::NetworkCommand::runCommand(SocketClient* client, int argc, char** argv) {
     ......
      //    0        1       2       3         4
    // network interface  add   <netId> <interface>
    // network interface remove <netId> <interface>
    if (!strcmp(argv[1], "interface")) {
        if (argc != 5) {
            return syntaxError(client, "Missing argument");
        }
        unsigned netId = stringToNetId(argv[3]);
        if (!strcmp(argv[2], "add")) {
            if (int ret = sNetCtrl->addInterfaceToNetwork(netId, argv[4])) {
                return operationError(client, "addInterfaceToNetwork() failed", ret);
            }
        } else if (!strcmp(argv[2], "remove")) {
            if (int ret = sNetCtrl->removeInterfaceFromNetwork(netId, argv[4])) {
                return operationError(client, "removeInterfaceFromNetwork() failed", ret);
            }
        } else {
            return syntaxError(client, "Unknown argument");
        }
        return success(client);
    }
    ......
         
}

The interface is added through NetworkController::addInterfaceToNetwork.

int NetworkController::addInterfaceToNetwork(unsigned netId, const char* interface) {
    if (!isValidNetwork(netId)) {
        ALOGE("no such netId %u", netId);
        return -ENONET;
    }

    unsigned existingNetId = getNetworkForInterface(interface);
    if (existingNetId != NETID_UNSET && existingNetId != netId) {
        ALOGE("interface %s already assigned to netId %u", interface, existingNetId);
        return -EBUSY;
    }

    android::RWLock::AutoWLock lock(mRWLock);
    return getNetworkLocked(netId)->addInterface(interface);
}

This looks up the network for the given netId and calls its addInterface method to add the interface.

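Network has three implementations. PhysicalNetwork represents a physical network, one backed by a real interface such as an Ethernet or WLAN adapter. VirtualNetwork represents a virtual network, one backed by a virtual interface such as a tun or tap device. LocalNetwork corresponds to the local network (the one behind the local_network table seen earlier). Here we only look at PhysicalNetwork.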

int PhysicalNetwork::addInterface(const std::string& interface) {
    if (hasInterface(interface)) {
        return 0;
    }
    if (int ret = RouteController::addInterfaceToPhysicalNetwork(mNetId, interface.c_str(),
                                                                 mPermission)) {
        ALOGE("failed to add interface %s to netId %u", interface.c_str(), mNetId);
        return ret;
    }
    if (mIsDefault) {
        if (int ret = addToDefault(mNetId, interface, mPermission, mDelegate)) {
            return ret;
        }
    }
    mInterfaces.insert(interface);
    return 0;
}

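addInterface first calls RouteController::addInterfaceToPhysicalNetwork to add the traffic-matching rules for the interface's routing table, i.e. the rules for that interface's table that we saw in ip rule show. In addition, if this network is the default network, addToDefault is called to do the default-network setup. Android picks one network as the default, which is the network used when an app does not explicitly select one.

First, the implementation of RouteController::addInterfaceToPhysicalNetwork()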
system/netd/server/RouteController.cpp

int RouteController::addInterfaceToPhysicalNetwork(unsigned netId, const char* interface,
                                                   Permission permission) {
    if (int ret = modifyPhysicalNetwork(netId, interface, permission, ACTION_ADD)) {
        return ret;
    }
    updateTableNamesFile();
    return 0;
}

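Simple enough: addInterfaceToPhysicalNetwork calls modifyPhysicalNetwork to add the interface, then calls updateTableNamesFile to write the routing table names out to a file. As far as I can tell, that file is what lets tools such as ip print table names like eth0 instead of table numbers.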

WARN_UNUSED_RESULT int modifyPhysicalNetwork(unsigned netId, const char* interface,
                                             Permission permission, bool add) {
    uint32_t table = getRouteTableForInterface(interface);
    if (table == RT_TABLE_UNSPEC) {
        return -ESRCH;
    }

    if (int ret = modifyIncomingPacketMark(netId, interface, permission, add)) {
        return ret;
    }
    if (int ret = modifyExplicitNetworkRule(netId, table, permission, INVALID_UID, INVALID_UID,
                                            add)) {
        return ret;
    }
    if (int ret = modifyOutputInterfaceRules(interface, table, permission, INVALID_UID, INVALID_UID,
                                            add)) {
        return ret;
    }
    return modifyImplicitNetworkRule(netId, table, permission, add);
}

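As you can see, adding a new interface sets up four pieces of routing policy: the mark for packets arriving on the interface, the explicit-network rule, the output-interface rules, and the implicit-network rule. First, the mark for incoming packets.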

// An iptables rule to mark incoming packets on a network with the netId of the network.
//
// This is so that the kernel can:
// + Use the right fwmark for (and thus correctly route) replies (e.g.: TCP RST, ICMP errors, ping
//   replies, SYN-ACKs, etc).
// + Mark sockets that accept connections from this interface so that the connection stays on the
//   same interface.
WARN_UNUSED_RESULT int modifyIncomingPacketMark(unsigned netId, const char* interface,
                                                Permission permission, bool add) {
    Fwmark fwmark;

    fwmark.netId = netId;
    fwmark.explicitlySelected = true;
    fwmark.protectedFromVpn = true;
    fwmark.permission = permission;

    char markString[UINT32_HEX_STRLEN];
    snprintf(markString, sizeof(markString), "0x%x", fwmark.intValue);

    if (execIptables(V4V6, "-t", "mangle", add ? "-A" : "-D", "INPUT", "-i", interface, "-j",
                     "MARK", "--set-mark", markString, NULL)) {
        ALOGE("failed to change iptables rule that sets incoming packet mark");
        return -EREMOTEIO;
    }

    return 0;
}

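modifyIncomingPacketMark runs two iptables commands (one for IPv4, one for IPv6) to create the rules:
/system/bin/iptables -t mangle -A INPUT -i $interface -j MARK --set-mark 0x<fwmark.intValue>
/system/bin/ip6tables -t mangle -A INPUT -i $interface -j MARK --set-mark 0x<fwmark.intValue>

That is, all traffic arriving on the interface is marked with fwmark.intValue: the low 16 bits are the netId, bit 17 is 1 (network explicitly selected), bit 18 is 1 (protected from VPN), and bits 19-20 carry the permission. Traffic coming in on this interface is thus marked.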
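
As a worked example of that value, the sketch below packs the four fields with explicit shifts and formats the resulting IPv4 command line. The bit positions assume the bit-field layout discussed earlier, and packFwmark and incomingMarkCommand are hypothetical helper names, not functions from netd.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Pack the four fwmark fields into one 32-bit value (assumed layout:
// netId in the low 16 bits, then explicit, protected, and 2 permission bits).
uint32_t packFwmark(unsigned netId, bool explicitlySelected, bool protectedFromVpn,
                    unsigned permission) {
    return (netId & 0xffffu) |
           (explicitlySelected ? 0x10000u : 0u) |
           (protectedFromVpn ? 0x20000u : 0u) |
           ((permission & 0x3u) << 18);
}

// Build the iptables command that marks packets arriving on iface.
// Incoming packets are always marked as explicit and protected.
std::string incomingMarkCommand(const std::string& iface, unsigned netId,
                                unsigned permission) {
    char mark[16];
    std::snprintf(mark, sizeof(mark), "0x%x",
                  static_cast<unsigned>(packFwmark(netId, true, true, permission)));
    return "iptables -t mangle -A INPUT -i " + iface + " -j MARK --set-mark " + mark;
}
```

For eth0 with netId 102 (0x66) and no required permission, the mark works out to 0x30066: 0x66 for the netId, 0x10000 for the explicit bit, and 0x20000 for the protected bit.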

Next, the explicit-network rule

// A rule to route traffic based on an explicitly chosen network.
//
// Supports apps that use the multinetwork APIs to restrict their traffic to a network.
//
// Even though we check permissions at the time we set a netId into the fwmark of a socket, we need
// to check it again in the rules here, because a network's permissions may have been updated via
// modifyNetworkPermission().
WARN_UNUSED_RESULT int modifyExplicitNetworkRule(unsigned netId, uint32_t table,
                                                 Permission permission, uid_t uidStart,
                                                 uid_t uidEnd, bool add) {
    Fwmark fwmark;
    Fwmark mask;

    fwmark.netId = netId;
    mask.netId = FWMARK_NET_ID_MASK;

    fwmark.explicitlySelected = true;
    mask.explicitlySelected = true;

    fwmark.permission = permission;
    mask.permission = permission;

    return modifyIpRule(add ? RTM_NEWRULE : RTM_DELRULE, RULE_PRIORITY_EXPLICIT_NETWORK, table,
                        fwmark.intValue, mask.intValue, IIF_NONE, OIF_NONE, uidStart, uidEnd);
}

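That is, packets that explicitly select this network ID look up their route in the table for this interface. This is the rule from all fwmark 0x10066/0x1ffff lookup eth0 that we saw earlier.

Next, how the rules for traffic leaving through this interface are configured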

// A rule to route traffic based on a chosen outgoing interface.
//
// Supports apps that use SO_BINDTODEVICE or IP_PKTINFO options and the kernel that already knows
// the outgoing interface (typically for link-local communications).
WARN_UNUSED_RESULT int modifyOutputInterfaceRules(const char* interface, uint32_t table,
                                                  Permission permission, uid_t uidStart,
                                                  uid_t uidEnd, bool add) {
    Fwmark fwmark;
    Fwmark mask;

    fwmark.permission = permission;
    mask.permission = permission;

    // If this rule does not specify a UID range, then also add a corresponding high-priority rule
    // for UID. This covers forwarded packets and system daemons such as the tethering DHCP server.
    if (uidStart == INVALID_UID && uidEnd == INVALID_UID) {
        if (int ret = modifyIpRule(add ? RTM_NEWRULE : RTM_DELRULE, RULE_PRIORITY_VPN_OVERRIDE_OIF,
                                   table, fwmark.intValue, mask.intValue, IIF_NONE, interface,
                                   UID_ROOT, UID_ROOT)) {
            return ret;
        }
    }

    return modifyIpRule(add ? RTM_NEWRULE : RTM_DELRULE, RULE_PRIORITY_OUTPUT_INTERFACE, table,
                        fwmark.intValue, mask.intValue, IIF_NONE, interface, uidStart, uidEnd);
}

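These rules only set the permission bits, so they apply to traffic from apps that hold the required permission. They match on the output interface rather than on the netId.

Finally, the implicit-network rule: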

// A rule to route traffic based on the chosen network.
//
// This is for sockets that have not explicitly requested a particular network, but have been
// bound to one when they called connect(). This ensures that sockets connected on a particular
// network stay on that network even if the default network changes.
WARN_UNUSED_RESULT int modifyImplicitNetworkRule(unsigned netId, uint32_t table,
                                                 Permission permission, bool add) {
    Fwmark fwmark;
    Fwmark mask;

    fwmark.netId = netId;
    mask.netId = FWMARK_NET_ID_MASK;

    fwmark.explicitlySelected = false;
    mask.explicitlySelected = true;

    fwmark.permission = permission;
    mask.permission = permission;

    return modifyIpRule(add ? RTM_NEWRULE : RTM_DELRULE, RULE_PRIORITY_IMPLICIT_NETWORK, table,
                        fwmark.intValue, mask.intValue);
}

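This covers the case where the network was not explicitly selected but the mark carries this network's ID; such traffic also uses this table. This is the rule from all fwmark 0x66/0x1ffff lookup eth0 that we saw earlier.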
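
The value/mask pairs in the ip rule show output can be reproduced by hand. The sketch below recomputes them for the three kinds of rule, assuming FWMARK_NET_ID_MASK is 0xffff and the bit layout used throughout this article; MarkAndMask and the three function names are illustrative, not netd's.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kNetIdMask = 0xffffu;     // assumed value of FWMARK_NET_ID_MASK
constexpr uint32_t kExplicitBit = 0x10000u;  // explicitlySelected
constexpr unsigned kPermissionShift = 18;    // permission occupies the next 2 bits

struct MarkAndMask {
    uint32_t mark;
    uint32_t mask;
};

// Explicit rule: netId must match and the explicit bit must be set.
MarkAndMask explicitNetworkRule(unsigned netId, unsigned permission) {
    uint32_t perm = (permission & 0x3u) << kPermissionShift;
    return {(netId & kNetIdMask) | kExplicitBit | perm, kNetIdMask | kExplicitBit | perm};
}

// Implicit rule: netId must match and the explicit bit must be clear.
MarkAndMask implicitNetworkRule(unsigned netId, unsigned permission) {
    uint32_t perm = (permission & 0x3u) << kPermissionShift;
    return {(netId & kNetIdMask) | perm, kNetIdMask | kExplicitBit | perm};
}

// Output-interface rule: only the permission bits are compared.
MarkAndMask outputInterfaceRule(unsigned permission) {
    uint32_t perm = (permission & 0x3u) << kPermissionShift;
    return {perm, perm};
}
```

For netId 0x66 with PERMISSION_NONE this gives 0x10066/0x1ffff for the explicit rule and 0x66/0x1ffff for the implicit one, matching the listing. The output-interface rule comes out as 0x0/0x0, which is consistent with the oif eth0 rules in the listing showing no fwmark at all.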

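That completes the routing rules set up for one network interface. The remaining thing to note is the rule priorities involved: RULE_PRIORITY_VPN_OVERRIDE_OIF, RULE_PRIORITY_EXPLICIT_NETWORK, RULE_PRIORITY_OUTPUT_INTERFACE, and RULE_PRIORITY_IMPLICIT_NETWORK. In other words, an explicitly selected network takes priority over an implicit one.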

const uint32_t RULE_PRIORITY_VPN_OVERRIDE_SYSTEM = 10000;
const uint32_t RULE_PRIORITY_VPN_OVERRIDE_OIF    = 10500;
const uint32_t RULE_PRIORITY_VPN_OUTPUT_TO_LOCAL = 11000;
const uint32_t RULE_PRIORITY_SECURE_VPN          = 12000;
const uint32_t RULE_PRIORITY_EXPLICIT_NETWORK    = 13000;
const uint32_t RULE_PRIORITY_OUTPUT_INTERFACE    = 14000;
const uint32_t RULE_PRIORITY_LEGACY_SYSTEM       = 15000;
const uint32_t RULE_PRIORITY_LEGACY_NETWORK      = 16000;
const uint32_t RULE_PRIORITY_LOCAL_NETWORK       = 17000;
const uint32_t RULE_PRIORITY_TETHERING           = 18000;
const uint32_t RULE_PRIORITY_IMPLICIT_NETWORK    = 19000;
const uint32_t RULE_PRIORITY_BYPASSABLE_VPN      = 20000;
const uint32_t RULE_PRIORITY_VPN_FALLTHROUGH     = 21000;
const uint32_t RULE_PRIORITY_DEFAULT_NETWORK     = 22000;
const uint32_t RULE_PRIORITY_DIRECTLY_CONNECTED  = 23000;
const uint32_t RULE_PRIORITY_UNREACHABLE         = 32000;

For the routing to take effect, traffic also has to be marked. Having seen how the rules are set up, let's look at how traffic gets its mark.
bionic/libc/bionic/libc_init_dynamic.cpp

// We flag the __libc_preinit function as a constructor to ensure
// that its address is listed in libc.so's .init_array section.
// This ensures that the function is called by the dynamic linker
// as soon as the shared library is loaded.
__attribute__((constructor)) static void __libc_preinit() {
  // Read the kernel argument block pointer from TLS.
  void** tls = __get_tls();
  KernelArgumentBlock** args_slot = &reinterpret_cast<KernelArgumentBlock**>(tls)[TLS_SLOT_BIONIC_PREINIT];
  KernelArgumentBlock* args = *args_slot;

  // Clear the slot so no other initializer sees its value.
  // __libc_init_common() will change the TLS area so the old one won't be accessible anyway.
  *args_slot = NULL;

  __libc_init_common(*args);

  // Hooks for various libraries to let them know that we're starting up.
  malloc_debug_init();
  netdClientInit();
}

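When libc.so is loaded, functions marked __attribute__((constructor)) are run (they are listed in the ELF .init_array section). So in any program that loads libc (nearly every process depends on it), __libc_preinit runs at load time and does some libc initialization. The last step is the call to netdClientInit(), which sets up traffic marking.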
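
The constructor attribute is easy to try out in isolation. Here is a minimal sketch for GCC or Clang; earlyInit, gInitializedBeforeMain, and initializedBeforeMain are made-up names, and the flag merely stands in for real initialization work.

```cpp
#include <cassert>

// Zero-initialized at load time, before any constructor function runs.
static bool gInitializedBeforeMain = false;

// GCC and Clang run functions carrying this attribute when the object is
// loaded, before main() starts.
__attribute__((constructor)) static void earlyInit() {
    gInitializedBeforeMain = true;
}

// Lets other code observe whether the constructor function has run.
bool initializedBeforeMain() {
    return gInitializedBeforeMain;
}
```

Any code that runs from main() onward sees initializedBeforeMain() return true, even though nothing in the program calls earlyInit explicitly.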

bionic/libc/bionic/NetdClient.cpp

extern "C" __LIBC_HIDDEN__ void netdClientInit() {
    if (pthread_once(&netdClientInitOnce, netdClientInitImpl)) {
        __libc_format_log(ANDROID_LOG_ERROR, "netdClient", "Failed to initialize netd_client");
    }
}

static void netdClientInitImpl() {
    void* netdClientHandle = dlopen("libnetd_client.so", RTLD_NOW);
    if (netdClientHandle == NULL) {
        // If the library is not available, it's not an error. We'll just use
        // default implementations of functions that it would've overridden.
        return;
    }
    netdClientInitFunction(netdClientHandle, "netdClientInitAccept4",
                           &__netdClientDispatch.accept4);
    netdClientInitFunction(netdClientHandle, "netdClientInitConnect",
                           &__netdClientDispatch.connect);
    netdClientInitFunction(netdClientHandle, "netdClientInitNetIdForResolv",
                           &__netdClientDispatch.netIdForResolv);
    netdClientInitFunction(netdClientHandle, "netdClientInitSocket", &__netdClientDispatch.socket);
}

static void netdClientInitFunction(void* handle, const char* symbol, FunctionType* function) {
    typedef void (*InitFunctionType)(FunctionType*);
    InitFunctionType initFunction = reinterpret_cast<InitFunctionType>(dlsym(handle, symbol));
    if (initFunction != NULL) {
        initFunction(function);
    }
}

bionic/libc/bionic/NetdClientDispatch.cpp

// This structure is modified only at startup (when libc.so is loaded) and never
// afterwards, so it's okay that it's read later at runtime without a lock.
__LIBC_HIDDEN__ NetdClientDispatch __netdClientDispatch __attribute__((aligned(32))) = {
    __accept4,
    __connect,
    __socket,
    fallBackNetIdForResolv,
};

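netdClientInit calls netdClientInitImpl through pthread_once, which guarantees that netdClientInitImpl runs only once. netdClientInitImpl first loads libnetd_client.so, then calls netdClientInitFunction to hook four functions: accept4, connect, socket, and the netId lookup used by the resolver. The hooks are installed by netdClientInitAccept4, netdClientInitConnect, netdClientInitNetIdForResolv, and netdClientInitSocket in libnetd_client.so. The four are implemented in much the same way, so we'll only look at netdClientInitConnect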

system/netd/client/NetdClient.cpp


extern "C" void netdClientInitConnect(ConnectFunctionType* function) {
    if (function && *function) {
        libcConnect = *function;
        *function = netdClientConnect;
    }
}

netdClientInitConnect sets __netdClientDispatch.connect to netdClientConnect, and saves in libcConnect the __connect system-call wrapper that __netdClientDispatch.connect originally pointed to.
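
The same save-and-replace pattern can be shown with a toy dispatch table. Everything in this sketch is hypothetical: the connect signature is reduced to a single descriptor argument, plainConnect stands in for __connect, and a counter stands in for the request the real hook sends to netd.

```cpp
#include <cassert>

// Simplified stand-in for the connect() signature.
using ConnectFn = int (*)(int sockfd);

// Stand-in for the raw system-call wrapper (__connect in bionic).
static int plainConnect(int sockfd) {
    return sockfd >= 0 ? 0 : -1;
}

// Dispatch table: the public entry point always calls through this pointer.
struct Dispatch {
    ConnectFn connect;
};
static Dispatch gDispatch = {plainConnect};

static ConnectFn gOriginalConnect = nullptr;  // saved original implementation
static int gMarkedSockets = 0;                // counts calls that went through the hook

// The hook: do the extra work, then forward to the saved original.
static int hookedConnect(int sockfd) {
    ++gMarkedSockets;  // the real hook would ask netd to mark the socket here
    return gOriginalConnect(sockfd);
}

// Save whatever the slot points to and install the hook in its place.
void installConnectHook(ConnectFn* slot) {
    if (slot && *slot) {
        gOriginalConnect = *slot;
        *slot = hookedConnect;
    }
}

// Public entry point, analogous to libc's connect().
int publicConnect(int sockfd) {
    return gDispatch.connect(sockfd);
}
```

Before installConnectHook(&gDispatch.connect) is called, publicConnect goes straight to plainConnect; afterwards every call passes through hookedConnect first, and callers of publicConnect do not have to change.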

Now the implementation of libc's connect function
bionic/libc/bionic/connect.cpp

int connect(int sockfd, const sockaddr* addr, socklen_t addrlen) {
    return __netdClientDispatch.connect(sockfd, addr, addrlen);
}

libc's connect actually calls __netdClientDispatch.connect, so when an app calls connect it is really calling netdClientConnect in system/netd/client/NetdClient.cpp. At this point you can guess that netdClientConnect is where the traffic gets marked. Let's take a look

int netdClientConnect(int sockfd, const sockaddr* addr, socklen_t addrlen) {
    if (sockfd >= 0 && addr && FwmarkClient::shouldSetFwmark(addr->sa_family)) {
        FwmarkCommand command = {FwmarkCommand::ON_CONNECT, 0, 0};
        if (int error = FwmarkClient().send(&command, sockfd)) {
            errno = -error;
            return -1;
        }
    }
    return libcConnect(sockfd, addr, addrlen);
}

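The implementation is simple: it first calls FwmarkClient().send(&command, sockfd) to ask the fwmark server to mark the file descriptor, then calls libc's __connect to make the system call. So the marking hinges on FwmarkClient().send(&command, sockfd).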

system/netd/client/FwmarkClient.cpp

int FwmarkClient::send(FwmarkCommand* data, int fd) {
    mChannel = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (mChannel == -1) {
        return -errno;
    }

    if (TEMP_FAILURE_RETRY(connect(mChannel, reinterpret_cast<const sockaddr*>(&FWMARK_SERVER_PATH),
                                   sizeof(FWMARK_SERVER_PATH))) == -1) {
        // If we are unable to connect to the fwmark server, assume there's no error. This protects
        // against future changes if the fwmark server goes away.
        return 0;
    }

    iovec iov;
    iov.iov_base = data;
    iov.iov_len = sizeof(*data);

    msghdr message;
    memset(&message, 0, sizeof(message));
    message.msg_iov = &iov;
    message.msg_iovlen = 1;

    union {
        cmsghdr cmh;
        char cmsg[CMSG_SPACE(sizeof(fd))];
    } cmsgu;

    if (data->cmdId != FwmarkCommand::QUERY_USER_ACCESS) {
        memset(cmsgu.cmsg, 0, sizeof(cmsgu.cmsg));
        message.msg_control = cmsgu.cmsg;
        message.msg_controllen = sizeof(cmsgu.cmsg);

        cmsghdr* const cmsgh = CMSG_FIRSTHDR(&message);
        cmsgh->cmsg_len = CMSG_LEN(sizeof(fd));
        cmsgh->cmsg_level = SOL_SOCKET;
        cmsgh->cmsg_type = SCM_RIGHTS;
        memcpy(CMSG_DATA(cmsgh), &fd, sizeof(fd));
    }

    if (TEMP_FAILURE_RETRY(sendmsg(mChannel, &message, 0)) == -1) {
        return -errno;
    }

    int error = 0;

    if (TEMP_FAILURE_RETRY(recv(mChannel, &error, sizeof(error), 0)) == -1) {
        return -errno;
    }

    return error;
}
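
The control-message part of FwmarkClient::send is what lets a socket descriptor travel to another process. Below is a minimal sketch of that SCM_RIGHTS pattern using only POSIX calls; sendFd and recvFd are hypothetical helpers written for this article, not part of netd, and error handling is kept to a bare minimum.

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Send one payload byte plus the descriptor fdToSend over a Unix domain
// socket. Returns 0 on success, -1 on failure.
int sendFd(int channel, int fdToSend) {
    char payload = 'F';
    iovec iov;
    iov.iov_base = &payload;
    iov.iov_len = sizeof(payload);

    union {
        cmsghdr align;  // forces suitable alignment for the buffer
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    std::memset(&control, 0, sizeof(control));

    msghdr message;
    std::memset(&message, 0, sizeof(message));
    message.msg_iov = &iov;
    message.msg_iovlen = 1;
    message.msg_control = control.buf;
    message.msg_controllen = sizeof(control.buf);

    // The descriptor rides in an SCM_RIGHTS control message.
    cmsghdr* header = CMSG_FIRSTHDR(&message);
    header->cmsg_len = CMSG_LEN(sizeof(int));
    header->cmsg_level = SOL_SOCKET;
    header->cmsg_type = SCM_RIGHTS;
    std::memcpy(CMSG_DATA(header), &fdToSend, sizeof(int));

    return sendmsg(channel, &message, 0) == -1 ? -1 : 0;
}

// Receive a descriptor sent by sendFd. Returns the new descriptor or -1.
int recvFd(int channel) {
    char payload = 0;
    iovec iov;
    iov.iov_base = &payload;
    iov.iov_len = sizeof(payload);

    union {
        cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    std::memset(&control, 0, sizeof(control));

    msghdr message;
    std::memset(&message, 0, sizeof(message));
    message.msg_iov = &iov;
    message.msg_iovlen = 1;
    message.msg_control = control.buf;
    message.msg_controllen = sizeof(control.buf);

    if (recvmsg(channel, &message, 0) <= 0) return -1;

    cmsghdr* header = CMSG_FIRSTHDR(&message);
    if (!header || header->cmsg_level != SOL_SOCKET || header->cmsg_type != SCM_RIGHTS ||
        header->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    int received = -1;
    std::memcpy(&received, CMSG_DATA(header), sizeof(int));
    return received;
}
```

The descriptor number on the receiving side is usually different from the sender's, but both refer to the same open file or socket. That is why the server can call setsockopt on the descriptor it receives and have the mark apply to the client's socket.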

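FwmarkClient::send is also simple: it sends a message to the peer over a Unix domain socket and reads back the reply. It uses sendmsg, which can pass a file descriptor to another process. APUE covers these "flying descriptors" in chapter 17 if you are interested; I won't expand on it here. The peer of this Unix domain socket is also in the netd process. The code is as follows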
system/netd/server/FwmarkServer.cpp

int FwmarkServer::processClient(SocketClient* client, int* socketFd) {
    FwmarkCommand command;
    ......
    // Get the permission for the client's UID
    Permission permission = mNetworkController->getPermissionForUser(client->getUid());
    ......
    // Get the socket file descriptor
    cmsghdr* const cmsgh = CMSG_FIRSTHDR(&message);
    if (cmsgh && cmsgh->cmsg_level == SOL_SOCKET && cmsgh->cmsg_type == SCM_RIGHTS &&
        cmsgh->cmsg_len == CMSG_LEN(sizeof(*socketFd))) {
        memcpy(socketFd, CMSG_DATA(cmsgh), sizeof(*socketFd));
    }

    if (*socketFd < 0) {
        return -EBADF;
    }
    // Read the socket's current fwmark value
    Fwmark fwmark;
    socklen_t fwmarkLen = sizeof(fwmark.intValue);
    if (getsockopt(*socketFd, SOL_SOCKET, SO_MARK, &fwmark.intValue, &fwmarkLen) == -1) {
        return -errno;
    }

    switch (command.cmdId) {
        case FwmarkCommand::ON_ACCEPT: {
            ......
            break;
        }

        case FwmarkCommand::ON_CONNECT: {
            // Called before a socket connect() happens. Set an appropriate NetId into the fwmark so
            // that the socket routes consistently over that network. Do this even if the socket
            // already has a NetId, so that calling connect() multiple times still works.
            //
            // But if the explicit bit was set, the existing NetId was explicitly preferred (and not
            // a case of connect() being called multiple times). Don't reset the NetId in that case.
            //
            // An "appropriate" NetId is the NetId of a bypassable VPN that applies to the user, or
            // failing that, the default network. We'll never set the NetId of a secure VPN here.
            // See the comments in the implementation of getNetworkForConnect() for more details.
            //
            // If the protect bit is set, this could be either a system proxy (e.g.: the dns proxy
            // or the download manager) acting on behalf of another user, or a VPN provider. If it's
            // a proxy, we shouldn't reset the NetId. If it's a VPN provider, we should set the
            // default network's NetId.
            //
            // There's no easy way to tell the difference between a proxy and a VPN app. We can't
            // use PERMISSION_SYSTEM to identify the proxy because a VPN app may also have those
            // permissions. So we use the following heuristic:
            //
            // If it's a proxy, but the existing NetId is not a VPN, that means the user (that the
            // proxy is acting on behalf of) is not subject to a VPN, so the proxy must have picked
            // the default network's NetId. So, it's okay to replace that with the current default
            // network's NetId (which in all likelihood is the same).
            //
            // Conversely, if it's a VPN provider, the existing NetId cannot be a VPN. The only time
            // we set a VPN's NetId into a socket without setting the explicit bit is here, in
            // ON_CONNECT, but we won't do that if the socket has the protect bit set. If the VPN
            // provider connect()ed (and got the VPN NetId set) and then called protect(), we
            // would've unset the NetId in PROTECT_FROM_VPN below.
            //
            // So, overall (when the explicit bit is not set but the protect bit is set), if the
            // existing NetId is a VPN, don't reset it. Else, set the default network's NetId.
            if (!fwmark.explicitlySelected) { // Reset the netId only if no network was explicitly selected
                if (!fwmark.protectedFromVpn) { // Not protected from VPN: any network may be used
                    fwmark.netId = mNetworkController->getNetworkForConnect(client->getUid());
                } else if (!mNetworkController->isVirtualNetwork(fwmark.netId)) { // Protected from VPN and not already on a virtual network: bypass the VPN and use the default network
                    fwmark.netId = mNetworkController->getDefaultNetwork();
                }
            } // An explicitly selected network is never overwritten
            break;
        }

        case FwmarkCommand::SELECT_NETWORK: {
            ......
            break;
        }

        case FwmarkCommand::PROTECT_FROM_VPN: {
            ......
            break;
        }

        case FwmarkCommand::SELECT_FOR_USER: {
            ......
            break;
        }

        default: {
            // unknown command
            return -EPROTO;
        }
    }

    fwmark.permission = permission;
    // Set the fwmark on the socket file descriptor
    if (setsockopt(*socketFd, SOL_SOCKET, SO_MARK, &fwmark.intValue,
                   sizeof(fwmark.intValue)) == -1) {
        return -errno;
    }

    return 0;
}

The function is simple and well commented: in the end it sets the fwmark on the file descriptor with setsockopt.
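
The ON_CONNECT branch boils down to a small decision function. The sketch below restates it as pure C++ so the cases are easy to check; SocketMark, NetworkView, and netIdOnConnect are hypothetical names, and the NetworkView fields stand in for the answers the real code gets from NetworkController.

```cpp
#include <cassert>

// The parts of the socket's existing fwmark that the decision depends on.
struct SocketMark {
    unsigned netId;
    bool explicitlySelected;
    bool protectedFromVpn;
};

// Hypothetical snapshot of what NetworkController would report.
struct NetworkView {
    unsigned defaultNetId;   // stands in for getDefaultNetwork()
    unsigned netIdForUser;   // stands in for getNetworkForConnect(uid)
    bool existingIsVirtual;  // stands in for isVirtualNetwork(mark.netId)
};

// Returns the netId the socket should carry after connect() is hooked.
unsigned netIdOnConnect(const SocketMark& mark, const NetworkView& view) {
    if (mark.explicitlySelected) {
        return mark.netId;         // an explicit choice is never overwritten
    }
    if (!mark.protectedFromVpn) {
        return view.netIdForUser;  // a bypassable VPN for this user, else the default
    }
    if (!view.existingIsVirtual) {
        return view.defaultNetId;  // protected socket: go around the VPN
    }
    return mark.netId;             // protected and already on a VPN netId: keep it
}
```

An unmarked socket from an ordinary app therefore ends up with the netId returned for its user, while a socket that was explicitly bound to a network keeps whatever netId it already had.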

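Note: for every packet the host itself sends, the iif is initially the lo interface (this may not hold for forwarded packets; that is what the documentation says, and it still needs to be confirmed against the kernel source).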
