Work is described by a work-queue entry, which is created either by a hardware unit or by core software. The CN58XX integrated input-packet-processing hardware creates a work-queue entry and submits work for each arriving packet. Core software can also create work-queue entries and submit work as desired. The CN58XX PKO, PCI, TIM, and DFA hardware units can also submit work-queue entries created by core software when they complete operations and commands.
The POW implements eight input work queues. Different work queues can be used to provide different levels of service. The POW hardware implements static and weighted-round-robin priorities among the input queues, programmably on a core-by-core basis.
Work Scheduling / Descheduling
Core software requests work from the POW. The POW selects work for the core and returns a pointer to the work-queue entry that describes the work to the core. This removes a considerable scheduling burden from the cores and reduces traffic on the coherent memory bus.
With POW hardware group support, not all work is equal. Each piece of work has an associated group identifier. A configuration variable for each core specifies the groups that the associated core will accept when it requests work. This configuration variable is a 16-bit bitmask, one bit per group, so any combination of groups can be selected. The POW does not schedule a piece of work to a core if the core does not accept the group associated with the work.
Groups provide a means to execute different functions on different cores, even though all cores share the same POW. For example, packet processing can be pipelined from one group of cores to another group of cores, with the first group performing the first stage of the work and the next group performing the next stage of the work.
A core can deschedule a piece of work. When a core deschedules the work it is currently executing, the POW reschedules the work later. The POW hardware reschedules previously descheduled work at higher priority than it schedules new work from an input queue. Deschedule can be useful in a number of circumstances:
● It can transfer work from one core group to another. This is one mechanism to implement “work pipelining”.
● It can avoid consuming a core for work that requires a large synchronization delay.
● It can make work interruptible.
Ordering and Synchronization of Work
The POW hardware associates a 32-bit tag value and a tag type with each piece of work. The work-queue entry and the request to add work to an input work queue contain the initial tag value. (This initial tag value may be created by either the centralized input packet processing hardware or by core software.) Core software can also later switch the tag/type as the work progresses through different phases of the application. If the same tag value sequences are used by two packets, the packets are ordered.
There are three different tag types:
● ORDERED - Ordering is guaranteed with this tag type. (Atomicity is not.)
● ATOMIC - Ordering and atomicity are guaranteed with this tag type. Two pieces of work holding the same ATOMIC tag cannot be scheduled simultaneously.
● NULL - No ordering is guaranteed with this tag type, and work cannot be in-flight (with respect to the POW hardware) with this tag type.
The POW hardware, in combination with core software, uses these tag/type values to order and synchronize related work, and allows unrelated work to remain unordered and unsynchronized. This is essential for efficient multi-core execution. Two pieces of work are related, and will be ordered and synchronized, when they share the same tag value and tag type. Two pieces of work are unrelated, and will execute entirely in parallel, when they have different tag or tag type values.
For example, the tag value may be a hash of the standard TCP five-tuple (IP source address, IP destination address, IP protocol, TCP source port, TCP destination port) defining a “flow”. The same flow will have the same tag value, so it may be ordered and synchronized. Different flows will likely have different tag values, so will likely not be ordered and synchronized, and can be executed completely in parallel on different cores.
At different code phases, core software can change the tag value via a tag switch transaction with separated switch request and switch completion wait operations. The POW hardware completes a requested switch when the required ordering and atomicity constraints for the work are met. This separated switch transaction allows software to overlap the switch request latency with other profitable work and also allows software the option to deschedule the work while a tag switch is pending, thus avoiding long synchronization delays.
1.1 POW Work Flow, Operations, and Ordering
Figure 5–1 shows an abstracted view of the POW unit, focusing on the states of work as it flows through the POW Unit, and touching on the most important operations that cores can execute to transform work. Often, work flows through the states in the figure from top to bottom—work is first in the input queues, then in-flight, and finally descheduled or completed.
At any given time, only one piece of work can be scheduled to a particular core. This is shown in the center of Figure 5–1 (see “in unit, sched”). Clearly, the number of scheduled items is limited to the number of cores (up to 16). A core is not scheduled if it is executing unscheduled work or if it completes scheduled work without requesting new work. Scheduled work is a subset of the POW in-flight work. Any core may deschedule a scheduled item at any point (see “in unit, desched” in Figure 5–1). Descheduled work remains in-flight, and will be rescheduled later, but is not currently executing on a core. Work that is scheduled remains scheduled after the completion of a tag switch transaction (SWTAG), unless the switch has a next tag state of NULL. A tag switch with a next tag type of NULL causes work to immediately become unscheduled on the core. Figure 5–1 abstracts SWTAG transactions by showing them as a single arrow; in reality, a deschedule operation can occur after a SWTAG transaction starts, but before it completes.
Work typically enters the POW unit through one of the input queues. The top of Figure 5–1 shows the eight POW input work queues. The POW unit internal entries are shared by in-flight work and work in input queues. Both software and hardware can add input queue entries (ADDWQ). Though the POW unit size is limited, the POW hardware maintains the illusion of an infinite input work queue. When space is not available in the POW unit, the POW hardware adds the input queue entries to an L2/DRAM list maintained by hardware. If space is available in the POW unit when work is added, the POW hardware buffers the work internally immediately and avoids the overhead of the memory list. If the POW hardware puts work in a memory list, it later automatically (and in the background) moves the work from L2/DRAM into the unit as soon as space becomes available in the unit, in the order that the work was originally added.
Work is typically scheduled to a core when core software executes a GET_WORK transaction to request new work. The POW hardware can schedule in-unit input queue entries to cores in response to these requests. The POW hardware can also schedule descheduled work to cores in response to GET_WORKs. The POW hardware prioritizes descheduled work above input queue work. The POW scheduler never schedules a descheduled item that has a pending tag switch, and never schedules an input queue entry with an ATOMIC tag unless it can immediately have the tag. In other words, POW only schedules work when it can make forward progress. Input work-queue entries with the NULL tag type are a special case. The POW hardware immediately unschedules NULL type input queue work returned for a GET_WORK.
Work also enters the POW unit when an unscheduled (from the perspective of the POW hardware) core executes a SWTAG transaction. This is shown by the upward arrow in the bottom of Figure 5–1. Work that enters the POW hardware unit this way is immediately scheduled, and is then not distinguishable from other scheduled work.
Figure 5–1 also shows, on the right side, another interesting aspect of the POW hardware: the ordering guarantees for work as it flows through the POW unit.
First, if work is in an input queue in memory (at the top in Figure 5–1), POW keeps it strictly in-order on a per queue basis.
Second, when work is in-flight (either scheduled or descheduled), ordering is strictly based on tag and tag type values. POW does not force any ordering nor synchronize in-flight work that uses different tag values or different tag type values. This inflight work freely executes in parallel.
NOTE: The group identifier of work does not affect the ordering of in-flight work, it only affects the cores to which a descheduled item can be rescheduled.
Third, when work is both in an input queue and in the POW unit (i.e. between the memory input queues and the in-flight work), the work ordering guarantees are mixed. The POW hardware work scheduler skips past in-unit input queue entries that cannot be immediately scheduled when it searches for schedulable work. The POW scheduler never skips past ORDERED and NULL input queue work, so the POW scheduler schedules work with these types (and the same group) strictly in per input queue order. The POW scheduler skips input queue work with the ATOMIC tag type and a tag that cannot immediately be scheduled, and so only guarantees tag order for ATOMIC input queue work (that has the same group). The POW work scheduler skips over input queue work that is not in the desired group, so no ordering is implied between two input queue entries in different groups. Finally, at the bottom of Figure 5–1, unscheduled work is not synchronized by POW hardware and so is completely unordered.
The POW hardware maintains order across tag switches. Any in-flight work that executes the identical series of tag switches (each with the same tag/type values) while in-flight will be ordered identically through each switch. With proper configuration and software support, the POW hardware can totally order the processing of all packets in a flow. CN58XX provides total per-flow work ordering support for input packets (perhaps all the way to output) as long as the following conditions are true: (1) All packets from the same flow enter POW via the same input queue with the same initial tag value and group, and (2) The software processes packets from the same flow with the same sequence of non-NULL tag switches.
Figure 5–2 depicts an abstracted view of the POW core state (i.e. the POW states visible to a core) and the operations that affect it, focusing on the legal major operations in each state. A state has an arc tagged with a particular operation when it is legal to issue the operation in the state.
NOTE: The POW ADDWQ and NOP commands do not affect POW core state and are legal at any time. The POW CLR_NSCHED command does not affect POW core state and has its own issue rules.
The abstracted states in Figure 5–2 closely mirror the tag types available in the tag switch operation. A new work request that receives work with an ORDERED, ATOMIC, or NULL tag puts the core into the ORDERED, ATOMIC, or NULL POW core state, respectively. A tag switch to an ORDERED, ATOMIC, or NULL tag type puts the core in the ORDERED, ATOMIC or NULL POW core state, respectively. These operations are depicted by arcs entering these states.
Figure 5–2 introduces the NULL_NULL state and the NULL_RD transaction. NULL_NULL is a special state entered only after a deschedule or reset.
NULL_NULL and NULL_RD are required because a deschedule operation detaches an internal POW entry from a core and there may not be another entry available (see “POW Internal Architecture” on page 231 for more description of the POW hardware internals). NULL_NULL is similar to NULL, with the clear difference that it is illegal to SWTAG when in the NULL_NULL state. NULL_RD causes the POW hardware to attempt to convert the state to NULL when it is in the NULL_NULL state. (NULL_RD will fail when there are no more internal POW entries – see “Forward Progress Constraints” on page 244 for forward-progress implications.)
The GET_WORK arcs exiting the ORDERED and ATOMIC states are special and marked with “#”. These transactions implement multiple functions to release the prior work and schedule new work for this core. The POW hardware actually executes an implicit switch to NULL before executing the GET_WORK in these two cases. This implicit switch to NULL releases the prior work, so that the hardware always starts a GET_WORK from the NULL or NULL_NULL states. Note the implication that a GET_WORK from ORDERED or ATOMIC that does not successfully return work will change to the NULL state. A GET_WORK transaction from NULL_NULL that does not successfully return work will stay in the NULL_NULL state.
As in Figure 5–1, Figure 5–2 abstracts SWTAG, GET_WORK, and NULL_RD transactions as a single arc, though all these operations can have separate request and completion times. This is because the initial request solely determines the legal operations that can follow. The only question is whether the next legal transaction can start before the POW hardware completes the previous transaction. A tag switch transaction has explicitly-separated request and completion operations, but the get work and Null Rd transactions are separated only with core IOBDMA operations. IOBDMA operations are described in Section 4.7. Here are the rules regarding transaction start time for the POW transactions that affect POW core state:
● SWTAG_DESCHED, DESCHED, UPD_WQP_GRP, and NOP transactions do not have separated start and completion times, so can be followed immediately by any legal command.
● The transactions marked “*” can start before the prior SWTAG is complete.
● In all other cases in Figure 5–2, a following transaction must not start before the prior transaction is complete.
The hardware behavior is unpredictable when the rules evident in Figure 5–2 are
violated by core software. Note some specific restrictions:
● It IS NOT LEGAL to initiate a deschedule from the NULL or NULL_NULL POW core state.
● It IS NOT LEGAL to initiate any tag switch from the NULL_NULL state.
● It IS NOT LEGAL to initiate a tag switch with tag type of NULL from the NULL POW core state.
● It IS NOT LEGAL to issue any tag switch or get work operation while there is a pending switch with an ORDERED or ATOMIC tag type.
● It IS NOT LEGAL to initiate any transaction while a get work transaction is pending.
● It IS NOT LEGAL to initiate any transaction while a Null Rd operation is pending.
● It IS NOT LEGAL to initiate a SWTAG_FULL or SWTAG_DESCHED transaction with tag type of NULL.
ADDWQ(tag_type, tag, wqp, grp, qos)
This adds work to the input queue selected by the QOS. Tag_type can legally be ATOMIC, ORDERED, or NULL. QOS is a 3-bit value, grp is a 4-bit value, and tag is a 32-bit value.
The work-queue pointer (wqp) must be a 64-bit aligned pointer into L2/DRAM and must point to a legal work-queue entry. See “Work-Queue Entry Format” on page 234. Furthermore, the work-queue entry group, tag type, and tag fields in the work-queue entry in L2/DRAM must exactly match the corresponding values supplied with the ADDWQ or the POW hardware may produce unpredictable results.
The input work-queues are infinite, so this transaction never fails.
GET_WORK (wait)
This transaction attempts to get work for the requesting core. The value of the POW Core Group Mask Registers[Core][GRP_MSK] CSR for the core at the time of the GET_WORK specifies the groups that are acceptable. The wait option causes the POW hardware to delay responding to the request until either work becomes available or the request times out. In any case, the POW hardware returns a failure response if it was unable to find work for the core, or a pointer to the work-queue entry if it successfully found work for the core.
NOTE: It is possible, though unlikely, for a time-out to occur when the wait bit is clear, as well as when the wait bit is set, if the work search takes too long.
The POW_PP_GRP_MSK that specifies the acceptable groups for a core must not be written between the start and completion of the GET_WORK, or unpredictable results may occur. Otherwise, the CSR can be written at any time.
The POW_NW_TIM[NW_TIM] CSR specifies the configurable time-out counter interval that controls a single counter used for all cores. The POW hardware times out a GET_WORK request after two interval timer expirations, so the effective time-out interval varies between one and two times the configured interval.
NULL_RD
This transaction attempts to change to the NULL state when in the NULL_NULL state. It is a NOP from all other states. The POW hardware will return NULL_NULL when it could not successfully allocate an internal POW entry.
Successful NULL_RDs always leave the core in the NULL state. Unsuccessful NULL_RDs or ones converted to NOPs do not change the core state.
SWTAG (new_tag_type, new_tag)
This starts a tag switch transaction. The POW hardware completes the tag switch transaction later, when it clears the pending switch bit for the core (refer to Section 5.5 for more information on POW tag switch-pending indications). An exception is a SWTAG to NULL, whose completion the POW hardware never transmits. A SWTAG must not be used when switching from the NULL state.
● A SWTAG from an ATOMIC tag releases the ATOMIC tag immediately once the tag switch transaction starts, perhaps long before the SWTAG transaction completes.
● A SWTAG to an ATOMIC tag completes when the work acquires the new ATOMIC tag. At most one piece of work holds an ATOMIC tag at any time. The FIFO order is the acquisition order for the tag.
● A SWTAG from an ORDERED tag cannot complete until all work ordered earlier in the old tag's FIFO start a SWTAG transaction.
● A SWTAG to an ORDERED tag occurs immediately when switching from an ATOMIC or NULL tag, and occurs once the ordering constraints of the old tag are met when switching from an ORDERED tag.
SWTAG_FULL(new_tag_type,new_tag, new_wqp, new_grp)
This is identical to SWTAG, except that the transaction additionally updates the work-queue pointer (new_wqp) and 4-bit group identifier (new_grp) for the work that is held in the POW. SWTAG_FULL must be used for all switches from the NULL state.
The POW hardware never interprets or uses the work-queue pointer supplied in this transaction, but it may deliver it to software later to complete a GET_WORK.
The POW hardware stores <35:3> of the work-queue pointer. SWTAG_FULL must not be used for switches to NULL.
DESCHED (no_sched)
This executes a deschedule transaction. When the no_sched bit is set on DESCHED (or SWTAG_DESCHED) operations, the POW hardware does not schedule the work to a core until a subsequent CLR_NSCHED operation clears the no_sched bit for the POW entry. The POW entry can be determined with a POW status load with get_cur=1 prior to the DESCHED (refer to Section 5.11.1); the index field in <50:40> identifies the POW entry.
Note that it is recommended that the core be in ATOMIC state rather than ORDERED state at the time of the DESCHED. (See the “POW Performance Considerations” on page 243, below.)
SWTAG_DESCHED(new_tag_type, new_tag, new_grp, no_sched)
This is identical to a SWTAG followed by a DESCHED, except that it also updates the group identifier. It must follow the same start rules as does SWTAG (shown in Figure 5–2 on page 224). SWTAG_DESCHED is well-suited for transferring work from one group to another - work pipelining.
UPD_WQP_GRP (new_wqp, new_grp)
Update the work-queue pointer (new_wqp) and group identifier (new_grp) for the work that is held in the POW.
CLR_NSCHED (wqp, index)
Clears the no_sched bit for the POW entry selected by index. CLR_NSCHED is a NOP under any of the following conditions:
● the POW entry is not on a deschedule list, or
● the wqp in the POW entry does not match the supplied wqp.
Before initiating a CLR_NSCHED operation, software must guarantee that all *DESCHEDs and CLR_NSCHEDs are complete. Software can read the pend_desched and pend_nosched_clr bits via POW status loads to determine when these conditions are true. (Refer to Sections 5.8 and 5.11.1 for more details on POW status loads.)
After a CLR_NSCHED operation, software must guarantee that the CLR_NSCHED is complete before issuing any subsequent POW operations. It can do this by checking the pend_nosched_clr via POW status reads.
Note also that index will typically be determined by POW status loads prior to the *DESCHED that set the no_sched bit. A POW status load with get_cur=1 returns the index field in <50:40>.