文章目录
Calico Felix
Felix是一个守护程序,在每个 endpoints 的节点上运行。Felix 负责编制路由和 ACL 规则等,以便为该主机上的 endpoints 资源正常运行提供所需的网络连接
主要实现以下工作
- 管理网络接口,Felix 将有关接口的一些信息写到内核,以使内核能够正确处理该 endpoint 发出的流量。 特别是,它将确保主机正确响应来自每个工作负载的ARP请求,并将为其管理的接口启用IP转发支持。它还监视网络接口的出现和消失,以便确保针对这些接口的编程得到了正确的应用。
- 编写路由,Felix负责将到其主机上endpoints的路由编写到Linux内核FIB(转发信息库)中。 这可以确保那些发往目标主机的endpoints的数据包被正确地转发。
- 编写 ACLs,Felix还负责将ACLs编程到Linux内核中。 这些ACLs用于确保只能在endpoints之间发送有效的网络流量,并确保endpoints无法绕过Calico的安全措施。
- 报告状态,Felix 负责提供有关网络健康状况的数据。 特别是,它将报告配置其主机时发生的错误和问题。 该数据会被写入etcd,以使其对网络中的其他组件和操作才可见
代码分析
关于 bpf 的内容在 calico ebpf dataplane 章节详细介绍,本文只描述同样内容
进程
查看系统进程,启动允许 calico-node -felix
root 208521 2.0 0.7 2337856 61444 ? Sl Feb21 60:33 calico-node -felix
对应启动
} else if *runFelix {
logrus.SetFormatter(&logutils.Formatter{Component: "felix"})
felix.Run("/etc/calico/felix.cfg", buildinfo.GitVersion, buildinfo.BuildDate, buildinfo.GitRevision)
加载配置
- 加载配置
启动配置 /etc/calico/felix.cfg
[global]
MetadataAddr = None
LogFilePath = None
LogSeverityFile = None
LogSeveritySys = None
- 从 env 加载配置
DatastoreType: Kubernetes // DatastoreType 有 kubernetes,etcd;分别是通过 k8s api 或者直接通过 etcd 获取数据
如果配置了 calico-typha,则启动相应 client,在节点数比较多的情况下,Felix 可通过 Typha 直接和 ETCD 进行数据交互
-
获取 ippoolKVPList,然后判断是否 enable ipip 或 vxlan 等 encap 。
-
判断是否开启 bpf,主要通过 kernel 版本,bpf 是否 mount 决定
开启 http 服务
上报健康状态供 liveness and readiness 查询 /bin/calico-node -felix-live
startDataplaneDriver
dpDriver, dpDriverCmd = dp.StartDataplaneDriver(
configParams.Copy(), // Copy to avoid concurrent access.
healthAggregator,
configChangedRestartCallback, // 回调 channel 收 config changed 信息
fatalErrorCallback, // 回调channel,收 fatal error 信息
k8sClientSet)
-
根据 kube-ipvs0 是否存在和是否开启 bpf 判断 ipvs 是否开启
-
iptables mark 划分,0xffff0000,每个 mark 左移 32位进行分割
获取到可用 mark 范围 iptables mark bits acceptMark=0x10000 endpointMark=0xfff00000 endpointMarkNonCali=0x100000 passMark=0x20000 scratch0Mark=0x40000 scratch1Mark=0x80000
即:ACCEPT 标记 0x10000,endpoint 标记为 0xfff00000 等 -
初始化 RouteTable 分配器,维护 routeTable 1-250,有获取,释放等操作,如果开了 wireguard,则需要为 wireguard 分配一个 routeTable
启动 intDP
intDP := intdataplane.NewIntDataplaneDriver(dpConfig)
intDP.Start()
NewIntDataplaneDriver 第一个任务就是创建 iptables,定义一个 inputAcceptActions 列表
ruleRenderer = rules.NewRenderer
- 根据 config.IptablesFilterDenyAction,如果未配置为 DROP
- DefaultEndpointToHostAction endpoint 到 host 的动作,Accept,加到列表中
- IptablesFilterAllowAction Accept,filter 表允许数据包立刻通过
- IptablesMangleAllowAction:“ACCEPT”, mangle 表允许数据包立刻通过
- ServiceLoopPrevention:“Drop”,去未知 service 段 ip 的包 drop
根据环境算出 workload 使用的 mtu,保存到文件 /var/lib/calico/mtu
InternalDataplane
dp := &InternalDataplane{
toDataplane: make(chan interface{}, msgPeekLimit),
fromDataplane: make(chan interface{}, 100),
ruleRenderer: ruleRenderer,
ifaceMonitor: ifacemonitor.New(config.IfaceMonitorConfig, config.FatalErrorRestartCallback), // iface 监控,地址或状态等变化,回调
ifaceUpdates: make(chan *ifaceUpdate, 100),
ifaceAddrUpdates: make(chan *ifaceAddrsUpdate, 100),
config: config,
applyThrottle: throttle.New(10),
loopSummarizer: logutils.NewSummarizer("dataplane reconciliation loops"),
}
iptables 初始化
支持不同 backendMode,legacy 还是 nft
backendMode := environment.DetectBackend(config.LookPathOverride, cmdshim.NewRealCmd, config.IptablesBackend)
iptablesOptions := iptables.TableOptions{
HistoricChainPrefixes: rules.AllHistoricChainNamePrefixes,
InsertMode: config.IptablesInsertMode, // insert 模式
RefreshInterval: config.IptablesRefreshInterval, // 90s 刷新
PostWriteInterval: config.IptablesPostWriteCheckInterval,
LockTimeout: config.IptablesLockTimeout,
LockProbeInterval: config.IptablesLockProbeInterval,
BackendMode: backendMode, // 上面获取的 mode
LookPathOverride: config.LookPathOverride,
OnStillAlive: dp.reportHealth,
OpRecorder: dp.loopSummarizer,
}
如果开启 bpf,则清除 kube-proxy 的 rule
实例化 mangle nat raw filter 表,加到 table 中
mangleTableV4 := iptables.NewTable(
"mangle",
4,
rules.RuleHashPrefix,
iptablesLock,
featureDetector,
iptablesOptions)
natTableV4 := iptables.NewTable(
"nat",
4,
rules.RuleHashPrefix,
iptablesLock,
featureDetector,
iptablesNATOptions,
)
rawTableV4 := iptables.NewTable(
"raw",
4,
rules.RuleHashPrefix,
iptablesLock,
featureDetector,
iptablesOptions)
filterTableV4 := iptables.NewTable(
"filter",
4,
rules.RuleHashPrefix,
iptablesLock,
featureDetector,
iptablesOptions)
dp.iptablesNATTables = append(dp.iptablesNATTables, natTableV4)
dp.iptablesRawTables = append(dp.iptablesRawTables, rawTableV4)
dp.iptablesMangleTables = append(dp.iptablesMangleTables, mangleTableV4)
dp.iptablesFilterTables = append(dp.iptablesFilterTables, filterTableV4)
vxlan
如果配置 vxlan,则为他分配一个 routetable,初始化一个 vxlanManager,每 10s 检查 vtep 和 路由信息等
endpointStatus组合器
dp.endpointStatusCombiner = newEndpointStatusCombiner(dp.fromDataplane, config.IPv6Enabled)
从 Dataplane 上报 workloadendpoint 和 hostendpoint 状态,通常 remove 或 update
manager 初始化
初始化一系列的 manager
ipsetsManager,EndpointManager,FloatingIPManager,MasqManager,IPIPManager,WireguardManager,serviceLoopManager
非 bpf 的还有:
hostIPManager,PolicyManager(raw,mangle,filter,rule)
bpf 的有 RawEgressPolicyManager ,用来做 raw egress policy
bpf 的情况下,额外 为 ipsetsManager 加 BPFIPSets,加 BPFRouteManager,failsafeMgr,BPFEndpointManager,conntrackScanner,bpfproxy 等,在 calico ebpf dataplane 章节详细讲
intDP start
func (d *InternalDataplane) Start() {
// Do our start-of-day configuration.
d.doStaticDataplaneConfig() // 配置 kernel,如 ip_forward,设置 iptables,如果 ipip,循环保持 ipip 的 ip 和 mtu 配置,
// Then, start the worker threads.
go d.loopUpdatingDataplane()
go d.loopReportingStatus()
go d.ifaceMonitor.MonitorInterfaces()
go d.monitorHostMTU()
}
loopUpdatingDataplane
dataplane driver 的循环
processMsgFromCalcGraph
定义一个 calcGraph 更新通知所有注册 manager 的方法,每一个注册 manager 都有 OnUpdate 方法,当收到 calcGraph 的更新信息,通过 manager 的 OnUpdate 通知所有已注册 manager
processMsgFromCalcGraph := func(msg interface{}) {
log.WithField("msg", proto.MsgStringer{Msg: msg}).Infof(
"Received %T update from calculation graph", msg)
d.recordMsgStat(msg)
for _, mgr := range d.allManagers {
mgr.OnUpdate(msg)
}
switch msg.(type) {
case *proto.InSync:
log.WithField("timeSinceStart", time.Since(processStartTime)).Info(
"Datastore in sync, flushing the dataplane for the first time...")
datastoreInSync = true
}
}
processIfaceUpdate
和上面一样,定义一个 iface 更新通知所有注册 manager 和 routetableManager 的方法
processIfaceUpdate := func(ifaceUpdate *ifaceUpdate) {
log.WithField("msg", ifaceUpdate).Info("Received interface update")
if ifaceUpdate.Name == KubeIPVSInterface && !d.config.BPFEnabled {
d.checkIPVSConfigOnStateUpdate(ifaceUpdate.State)
return
}
for _, mgr := range d.allManagers {
mgr.OnUpdate(ifaceUpdate)
}
for _, mgr := range d.managersWithRouteTables {
for _, routeTable := range mgr.GetRouteTableSyncers() {
routeTable.OnIfaceStateChanged(ifaceUpdate.Name, ifaceUpdate.State)
}
}
}
processAddrsUpdate
和上面一样,定义一个 地址更新通知所有注册 manager 的方法
processAddrsUpdate := func(ifaceAddrsUpdate *ifaceAddrsUpdate) {
log.WithField("msg", ifaceAddrsUpdate).Info("Received interface addresses update")
for _, mgr := range d.allManagers {
mgr.OnUpdate(ifaceAddrsUpdate)
}
}
循环操作
定义上面几种方法后,开始循环从 toDataplane,ifaceUpdates,ifaceAddrUpdates channel 获取然后执行对应上面三种 manager 的 OnUpdate。
除以上,循环还会从 channel 收到定时更新的事件。
如果从 FromCalcGraph 收到 sync 且 dataplaneNeedsSync,则执行 apply 进行一次同步信息到 dataplane 的操作,所有的 manager 都会同步与刷新。
MonitorInterfaces
监控接口状态和路由的变化
go d.ifaceMonitor.MonitorInterfaces()
...
for {
log.WithFields(log.Fields{
"updates": filteredUpdates,
"routeUpdates": filteredRouteUpdates,
"resyncC": m.resyncC,
}).Debug("About to select on possible triggers")
select {
case update, ok := <-filteredUpdates:
log.WithField("update", update).Debug("Link update")
if !ok {
log.Warn("Failed to read a link update")
break readLoop
}
m.handleNetlinkUpdate(update)
case routeUpdate, ok := <-filteredRouteUpdates:
log.WithField("addrUpdate", routeUpdate).Debug("Address update")
if !ok {
log.Warn("Failed to read an address update")
break readLoop
}
m.handleNetlinkRouteUpdate(routeUpdate)
case <-m.resyncC:
log.Debug("Resync trigger")
err := m.resync()
if err != nil {
m.fatalErrCallback(fmt.Errorf("failed to read from netlink (resync): %w", err))
}
}
}
其他协程
go d.loopReportingStatus() //上报状态
go d.monitorHostMTU() // 监控主机 mtu,变化了则调配置变化的回调函数
Syncer
calcGraphClientChannels
先初始化一个 calcGraphClientChannels
// Now create the calculation graph, which receives updates from the
// datastore and outputs dataplane updates for the dataplane driver.
//
// The Syncer has its own thread and we use an extra thread for the
// Validator, just to pipeline that part of the calculation then the
// main calculation graph runs in a single thread for simplicity.
// The output of the calculation graph arrives at the dataplane
// connection via channel.
//
// Syncer -chan-> Validator -chan-> Calc graph -chan-> dataplane
// KVPair KVPair protobufs
创建一个 计算图,从数据库和 dpdriver dataplane 的 更新 接受更新
Syncer 将更新的内容通过 channel 发送给 Validator 协程。Validator 只需将计算的那部分进行 pipeline 处理,将其发送给 Calc graph 协程。Calc graph 完成计算后,发送给dataplane 协程。最后 dataplane 完成数据平面处理
计算图表 client channel 收 dpConnector 发到 Dataplane 的 msg;
calcGraphClientChannels := []chan<- interface{}{dpConnector.ToDataplane}
初始化后,如果有 PolicySyncPathPrefix,将 toPolicySync channel 加到 calcGraphClientChannels 里去,该 toPolicySync channel 通过 PolicySyncPathPrefix 下 socket 接收 grpc 请求来的 policySync 事件。
felixsyncer
从 database,remote sync deamon 或 Typha 初始化 syncer;
syncer = felixsyncer.New(backendClient, datastoreConfig.Spec, syncerToValidator, configParams.IsLeader())
syncer watch GlobalNetworkPolicy GlobalNetworkSet IPPool Node Profile WorkloadEndpoint NetworkPolicy NetworkSet HostEndpoint BGPConfiguration 和 k8s 的 KubernetesNetworkPolicy KubernetesEndpointSlice KubernetesService 资源
syncerToValidator
syncerToValidator 是回调函数,当 watch 的资源变化时调用;
func NewSyncerCallbacksDecoupler() *SyncerCallbacksDecoupler {
return &SyncerCallbacksDecoupler{
c: make(chan interface{}),
}
}
type SyncerCallbacksDecoupler struct {
c chan interface{}
}
func (a *SyncerCallbacksDecoupler) OnStatusUpdated(status api.SyncStatus) {
a.c <- status
}
func (a *SyncerCallbacksDecoupler) OnUpdates(updates []api.Update) {
a.c <- updates
}
func (a *SyncerCallbacksDecoupler) SendTo(sink api.SyncerCallbacks) {
for obj := range a.c {
switch obj := obj.(type) {
case api.SyncStatus:
sink.OnStatusUpdated(obj)
case []api.Update:
sink.OnUpdates(obj)
}
}
}
SyncerCallbacksDecoupler 维护了一个 channel, 有三种方法,分别是
- OnStatusUpdated,将 status 发到 channel
- OnUpdates 将 updates 放到 channel
- SendTo 将上面 channel 里的 msg 发到回调函数
AsyncCalcGrap
实例化 AsyncCalcGrap
// Create the ipsets/active policy calculation graph, which will
// do the dynamic calculation of ipset memberships and active policies
// etc.
asyncCalcGraph := calc.NewAsyncCalcGraph(
configParams.Copy(), // Copy to avoid concurrent access.
calcGraphClientChannels,
healthAggregator)
从 syncer 收到所有的更新事件,然后提供给其他 dispatcher 注册,当收到更新事件,分发给注册的接收方处理
allUpdDispatcher := dispatcher.NewDispatcher()
本节点的 endpoint 处理注册,主要有 WorkloadEndpoint 和 HostEndpoint。
localEndpointDispatcher := dispatcher.NewDispatcher()
(*localEndpointDispatcherReg)(localEndpointDispatcher).RegisterWith(allUpdDispatcher)
localEndpointFilter := &endpointHostnameFilter{hostname: hostname}
localEndpointFilter.RegisterWith(localEndpointDispatcher)
// 接收 allUpdDispatcher workloadEnpoint 和 HostEndpoint 的 OnUpdate
func (l *localEndpointDispatcherReg) RegisterWith(disp *dispatcher.Dispatcher) {
led := (*dispatcher.Dispatcher)(l)
disp.Register(model.WorkloadEndpointKey{}, led.OnUpdate)
disp.Register(model.HostEndpointKey{}, led.OnUpdate)
disp.RegisterStatusHandler(led.OnDatamodelStatus)
}
注册 activeRule
根据本节点生效的 policies,profiles 和 本节点的 endpoint 计算在 host 上生效的 rule
activeRulesCalc := NewActiveRulesCalculator()
activeRulesCalc.RegisterWith(localEndpointDispatcher, allUpdDispatcher)
注册 ruleScanner
根据 tags/selectors/named ports 过滤 rule
ruleScanner := NewRuleScanner()
// Wire up the rule scanner's inputs.
activeRulesCalc.RuleScanner = ruleScanner
// Send IP set added/removed events to the dataplane. We'll hook up the other outputs
// below.
ruleScanner.RulesUpdateCallbacks = callbacks
注册 ServiceIndex
匹配 Service,对应 member 变化
serviceIndex := serviceindex.NewServiceIndex()
serviceIndex.RegisterWith(allUpdDispatcher)
// Send the Service IP set member index's outputs to the dataplane.
serviceIndex.OnMemberAdded = func(ipSetID string, member labelindex.IPSetMember) {
if log.GetLevel() >= log.DebugLevel {
log.WithFields(log.Fields{
"ipSetID": ipSetID,
"member": member,
}).Debug("Member added to service IP set.")
}
callbacks.OnIPSetMemberAdded(ipSetID, member)
}
serviceIndex.OnMemberRemoved = func(ipSetID string, member labelindex.IPSetMember) {
if log.GetLevel() >= log.DebugLevel {
log.WithFields(log.Fields{
"ipSetID": ipSetID,
"member": member,
}).Debug("Member removed from service IP set.")
}
callbacks.OnIPSetMemberRemoved(ipSetID, member)
}
serviceIndex.OnAlive = liveCallback
注册 ipsetMemberIndex
计算在每一个 IP set 中的 IP 集合和 named ports,主要是通过 active selectors/tags/named ports 进行匹配对,比如之前的 service 下加减 endpoint
Ipset 变化发送到 dataplane
ipsetMemberIndex := labelindex.NewSelectorAndNamedPortIndex()
ipsetMemberIndex.OnAlive = liveCallback
// Wire up the inputs to the IP set member index.
ipsetMemberIndex.RegisterWith(allUpdDispatcher)
ruleScanner.OnIPSetActive = func(ipSet *IPSetData) {
log.WithField("ipSet", ipSet).Info("IPSet now active")
callbacks.OnIPSetAdded(ipSet.UniqueID(), ipSet.DataplaneProtocolType())
if ipSet.Service != "" {
serviceIndex.UpdateIPSet(ipSet.UniqueID(), ipSet.Service)
} else {
ipsetMemberIndex.UpdateIPSet(ipSet.UniqueID(), ipSet.Selector, ipSet.NamedPortProtocol, ipSet.NamedPort)
}
gaugeNumActiveSelectors.Inc()
}
...
// Send the IP set member index's outputs to the dataplane.
ipsetMemberIndex.OnMemberAdded = func(ipSetID string, member labelindex.IPSetMember) {
if log.GetLevel() >= log.DebugLevel {
log.WithFields(log.Fields{
"ipSetID": ipSetID,
"member": member,
}).Debug("Member added to IP set.")
}
callbacks.OnIPSetMemberAdded(ipSetID, member)
}
ipsetMemberIndex.OnMemberRemoved = func(ipSetID string, member labelindex.IPSetMember) {
if log.GetLevel() >= log.DebugLevel {
log.WithFields(log.Fields{
"ipSetID": ipSetID,
"member": member,
}).Debug("Member removed from IP set.")
}
callbacks.OnIPSetMemberRemoved(ipSetID, member)
}
**polResolver **
汇总本地 endpoints 的 active 策略,计算完整、有序的策略集,然后应用于每个端点;为 host IP 更新进行注册
polResolver := NewPolicyResolver()
// Hook up the inputs to the policy resolver.
activeRulesCalc.PolicyMatchListener = polResolver
polResolver.RegisterWith(allUpdDispatcher, localEndpointDispatcher)
// And hook its output to the callbacks.
polResolver.Callbacks = callbacks
hostIPPassthru := NewDataplanePassthru(callbacks)
hostIPPassthru.RegisterWith(allUpdDispatcher)
其他
如果 开启 vxlan,计算 vxlan 路由
if conf.Encapsulation.VXLANEnabled || conf.Encapsulation.VXLANEnabledV6 {
vxlanResolver := NewVXLANResolver(hostname, callbacks, conf.UseNodeResourceUpdates())
vxlanResolver.RegisterWith(allUpdDispatcher)
}
包括配置检查,profile 检查,encap 配置检查
asyncCalcGraph start
func (acg *AsyncCalcGraph) Start() {
log.Info("Starting AsyncCalcGraph")
acg.flushTicks = time.NewTicker(tickInterval).C
acg.healthTicks = time.NewTicker(healthInterval).C
go acg.loop() // 循环从 channel 里拿同步事件
}
ValidationFilte
初始化 Validator,将 syncer msg 发到 asyncCalcGraph
validator := calc.NewValidationFilter(asyncCalcGraph, configParams)
go syncerToValidator.SendTo(validator)
实例化 DpConnector
dpConnector 主要在 felix 和 dataplane 传递消息,传递是双向的;
dpConnector := newConnector(
configParams.Copy(), // Copy to avoid concurrent access.
connToUsageRepUpdChan,
backendClient,
v3Client,
dpDriver,
failureReportChan)
传递消息
func (fc *DataplaneConnector) Start() {
// Start a background thread to write to the dataplane driver.
go fc.sendMessagesToDataplaneDriver() // 发送到 dataplane
// Start background thread to read messages from dataplane driver.
go fc.readMessagesFromDataplane() // 从 dataplane 获取消息
// Start a background thread to handle Wireguard update to Node.
go fc.handleWireguardStatUpdateFromDataplane()
}