Hardware Manual: POW (Part 1)

1.    POW

 

The POW (Packet Order/Work) unit is the processor that provides the following important functions.

Work Queueing

Work is described by a work-queue entry, which is created either by a hardware unit or by core software. The CN58XX integrated packet-input hardware creates a work-queue entry and submits work for each arriving packet. Core software can also create work-queue entries and submit work as desired. The CN58XX PKO, PCI, TIM, and DFA hardware units can also submit work-queue entries created by core software when they complete operations and instructions.

       The POW implements 8 input work queues. Different work queues can be used for different levels of service. The POW hardware implements static and weighted-round-robin priorities, programmable on a core-by-core basis.

       When necessary, input work queues can grow beyond the POW unit into DRAM, so they are effectively infinite in size.

Work Scheduling / Descheduling

       Core software requests work from the POW; the POW selects work for the core and returns a pointer to a work-queue entry that describes the work to the core. This removes a significant burden from the cores and the coherent memory bus.

       The POW hardware supports groups, so not all work is equal. Each piece of work has an associated group identifier. A configuration variable for each core specifies the groups that the associated core will accept when it requests work. This configuration variable is a 16-bit bitmask, one bit per group, so any combination of groups can be selected. The POW will not schedule a piece of work to a core if the core does not accept the group associated with that work.
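For illustration, here is a minimal sketch of how the per-core group mask might be built. The helper `pow_set_core_group_mask()` is a hypothetical stand-in for writing the core's group-mask CSR (the real register is described with GET_WORK below), not an actual SDK call.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for programming a core's 16-bit POW group-mask CSR.
 * On real hardware this would be a CSR write; here it only reports the mask. */
static void pow_set_core_group_mask(unsigned core, uint16_t mask)
{
    printf("core %u accepts groups with mask 0x%04x\n", core, mask);
}

int main(void)
{
    /* One bit per group (16 groups total). Core 0 handles the first
     * pipeline stage (groups 0-1), core 1 the second stage (groups 2-3). */
    uint16_t stage1_groups = (1u << 0) | (1u << 1);
    uint16_t stage2_groups = (1u << 2) | (1u << 3);

    pow_set_core_group_mask(0, stage1_groups);
    pow_set_core_group_mask(1, stage2_groups);
    return 0;
}
```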

       Even though all cores use the same POW, groups provide a means to execute different functions on different cores. For example, packet processing can be pipelined from one group of cores to another group of cores, with the first group performing the first stage of the work and the next group performing the next stage.

       A core can deschedule a piece of work. When a core deschedules the work it is currently executing, the POW will reschedule it later. The POW hardware reschedules previously descheduled work at higher priority than it schedules new work from an input queue. Deschedule can be useful in a number of circumstances:

● It can transfer work from one core group to another. This is one mechanism to implement work pipelining.

● It can avoid consuming a core for work that requires a large synchronization delay.

● It can make work interruptible.

Ordering and Synchronization of Work

The POW hardware associates a 32-bit tag value and a tag type with each piece of work. The work-queue entry and the request to add work to an input work queue contain the initial tag value. (This initial tag value may be created by either the centralized input packet processing hardware or by core software.) Core software can also later switch the tag/type as the work progresses through different phases of the application. If the same tag value sequences are used by two packets, the packets are ordered.

There are three different tag types:

● ORDERED - Ordering is guaranteed with this tag type. (Atomicity is not.)

● ATOMIC - Ordering and atomicity are guaranteed with this tag type. Two pieces of work holding the same ATOMIC tag cannot be scheduled simultaneously.

● NULL - No ordering is guaranteed with this tag type, and work cannot be in-flight (with respect to the POW hardware) with this tag type.

The POW hardware, in combination with core software, uses these tag/type values to order and synchronize related work, and allows unrelated work to be unordered and unsynchronized. This is essential for efficient multi-core execution. Two pieces of work are related, and will be ordered and synchronized, when they share the same tag value and tag type. Two pieces of work are unrelated, and can execute entirely in parallel, when they have different tag values or tag types.

For example, the tag value may be a hash of the standard TCP five-tuple (IP source address, IP destination address, IP protocol, TCP source port, TCP destination port) defining a “flow”. The same flow will have the same tag value, so it may be ordered and synchronized. Different flows will likely have different tag values, so will likely not be ordered and synchronized, and can be executed completely in parallel on different cores.
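A minimal sketch of deriving such a flow tag in software follows; the struct layout and the FNV-style hash are illustrative assumptions, not the hash used by the CN58XX input-packet hardware.

```c
#include <stdint.h>

/* Illustrative 5-tuple describing a TCP flow. */
struct five_tuple {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint8_t  ip_proto;
    uint16_t tcp_sport;
    uint16_t tcp_dport;
};

/* Simple FNV-1a style mixing step; any hash works as long as packets of
 * the same flow always produce the same 32-bit tag. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 16777619u;
    return h;
}

/* Derive a 32-bit POW tag from the flow identity. Same flow, same tag:
 * ordered and synchronized. Different flows almost always differ: parallel. */
uint32_t flow_tag(const struct five_tuple *ft)
{
    uint32_t h = 2166136261u;
    h = mix(h, ft->ip_src);
    h = mix(h, ft->ip_dst);
    h = mix(h, ft->ip_proto);
    h = mix(h, ((uint32_t)ft->tcp_sport << 16) | ft->tcp_dport);
    return h;
}
```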

At different code phases, core software can change the tag value via a tag switch transaction with separated switch request and switch completion wait operations. The POW hardware completes a requested switch when the required ordering and atomicity constraints for the work are met. This separated switch transaction allows software to overlap the switch request latency with other profitable work and also allows software the option to deschedule the work while a tag switch is pending, thus avoiding long synchronization delays.
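The separated request/wait pattern might look like the following sketch; `pow_swtag_request()` and `pow_swtag_wait()` are hypothetical wrappers around the switch request and the completion poll, not actual SDK calls.

```c
#include <stdint.h>
#include <stdio.h>

/* Tag types from the text. */
enum tag_type { TAG_ORDERED, TAG_ATOMIC, TAG_NULL };

/* Hypothetical stand-ins: on hardware the request is issued to the POW and
 * the wait polls the core's pending-switch indication; here they only trace. */
static void pow_swtag_request(enum tag_type type, uint32_t tag)
{
    printf("SWTAG request: type=%d tag=0x%08x\n", (int)type, tag);
}
static void pow_swtag_wait(void)
{
    printf("SWTAG complete\n");
}

static void do_unrelated_work(void)  { /* work that needs no tag held */ }
static void update_flow_state(void)  { /* per-flow critical section */ }

void process_phase(uint32_t flow_tag)
{
    /* Start the switch into the ATOMIC phase for this flow... */
    pow_swtag_request(TAG_ATOMIC, flow_tag);

    /* ...and overlap the switch latency with work that does not depend on
     * holding the ATOMIC tag (parsing, prefetching, statistics, ...). */
    do_unrelated_work();

    /* Block only when the critical section is actually reached. */
    pow_swtag_wait();
    update_flow_state();   /* safe: this core now holds the ATOMIC tag */
}
```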

1.1      POW Work Flow, Operations, and Ordering

Figure 5–1 shows an abstracted view of the POW unit, focusing on the states of work as it flows through the POW unit, and touching on the most important operations that cores can execute to transform work. Often, work flows through the states in the figure from top to bottom: work is first in the input queues, then in-flight, and finally descheduled or completed.

At any given time, only one piece of work can be scheduled to a particular core. This is shown in the center of Figure 5–1 (see in unit, sched). Clearly, the number of scheduled items is limited to the number of cores (up to 16). A core is not scheduled if it is executing unscheduled work or if it completes scheduled work without requesting new work. Scheduled work is a subset of the POW in-flight work. Any core may deschedule a scheduled item at any point (see in unit, desched in Figure 5–1). Descheduled work remains in-flight and will be rescheduled later, but is not currently executing on a core. Work that is scheduled remains scheduled after the completion of a tag switch transaction (SWTAG), unless the switch has a next tag type of NULL. A tag switch to NULL causes the work to immediately become unscheduled on the core. Figure 5–1 abstracts SWTAG transactions by showing them as a single arrow; in reality, a deschedule operation can occur after a SWTAG transaction starts, but before it completes.

Work typically enters the POW unit through one of the input queues. The top of Figure 5–1 shows the eight POW input work queues. The POW unit internal entries are shared by in-flight work and work in input queues. Both software and hardware can add input queue entries (ADDWQ). Though the POW unit size is limited, the POW hardware maintains the illusion of an infinite input work queue. When space is not available in the POW unit, the POW hardware adds the input queue entries to an L2/DRAM list maintained by hardware. If space is available in the POW unit when work is added, the POW hardware buffers the work internally immediately and avoids the overhead of the memory list. If the POW hardware puts work in a memory list, it later automatically (and in the background) moves the work from L2/DRAM into the unit as soon as space becomes available in the unit, in the order that the work was originally added.

Work is typically scheduled to a core when core software executes a GET_WORK transaction to request new work. The POW hardware can schedule in-unit input queue entries to cores in response to these requests. The POW hardware can also schedule descheduled work to cores in response to GET_WORKs. The POW hardware prioritizes descheduled work above input queue work. The POW scheduler never schedules a descheduled item that has a pending tag switch, and never schedules an input queue entry with an ATOMIC tag unless it can immediately have the tag. In other words, POW only schedules work when it can make forward progress. Input work-queue entries with the NULL tag type are a special case. The POW hardware immediately unschedules NULL type input queue work returned for a GET_WORK. (In short: the core requests work via GET_WORK, but the POW decides which piece of work to hand out, following these rules:

● ATOMIC entries: scheduled only if the tag can be acquired immediately; otherwise not scheduled.

● NULL entries: immediately unscheduled after being returned.

● Descheduled work: scheduled at higher priority, but not while a tag switch is pending.

● In-unit input queue entries: scheduled normally.)

Work also enters the POW unit when an unscheduled (from the perspective of the POW hardware) core executes a SWTAG transaction. This is shown by the upward arrow at the bottom of Figure 5–1. Work that enters the POW hardware unit this way is immediately scheduled, and is then not distinguishable from other scheduled work.

Figure 5–1 also shows another interesting aspect of the POW hardware, on the right side: the ordering guarantees for work as it flows through the POW unit.

First, if work is in an input queue in memory (at the top of Figure 5–1), POW keeps it strictly in order on a per-queue basis.

Second, when work is in-flight (either scheduled or descheduled), ordering is strictly based on tag and tag type values. POW does not force any ordering nor synchronize in-flight work that uses different tag values or different tag type values. This in-flight work executes freely in parallel.

NOTE: The group identifier of work does not affect the ordering of in-flight work; it only affects the cores to which a descheduled item can be rescheduled.

Third, when work is both in an input queue and in the POW unit (i.e. between the memory input queues and the in-flight work), the work ordering guarantees are mixed. The POW hardware work scheduler skips past in-unit input queue entries that cannot be immediately scheduled when it searches for schedulable work. The POW scheduler never skips past ORDERED and NULL input queue work, so the POW scheduler schedules work with these types (and the same group) strictly in per input queue order. The POW scheduler skips input queue work with the ATOMIC tag type and a tag that cannot immediately be scheduled, and so only guarantees tag order for ATOMIC input queue work (that has the same group). The POW work scheduler skips over input queue work that is not in the desired group, so no ordering is implied between two input queue entries in different groups. Finally, at the bottom of Figure 5–1, unscheduled work is not synchronized by POW hardware and so is completely unordered.

The POW hardware maintains order across tag switches. Any in-flight work that executes the identical series of tag switches (each with the same tag/type values) while in-flight will be ordered identically through each switch. With proper configuration and software support, the POW hardware can totally order the processing of all packets in a flow. CN58XX provides total per-flow work ordering support for input packets (perhaps all the way to output) as long as the following conditions are true: (1) All packets from the same flow enter POW via the same input queue with the same initial tag value and group, and (2) The software processes packets from the same flow with the same sequence of non-NULL tag switches.
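A sketch of the kind of per-packet phase sequence that satisfies condition (2): every packet of a flow performs the same sequence of non-NULL tag switches, so the POW keeps the flow ordered through every phase. The `pow_swtag()` wrapper and the phase breakdown are illustrative assumptions, not the actual SDK API.

```c
#include <stdint.h>
#include <stdio.h>

enum tag_type { TAG_ORDERED, TAG_ATOMIC, TAG_NULL };

/* Hypothetical blocking tag switch (request plus completion wait). */
static void pow_swtag(enum tag_type type, uint32_t tag)
{
    printf("switch to type=%d tag=0x%08x\n", (int)type, tag);
}

/* Every packet of the flow executes this same sequence of non-NULL tag
 * switches, so the POW keeps the flow's packets in arrival order through
 * every phase, potentially all the way to output. */
void handle_packet(uint32_t flow_tag)
{
    /* Phase 1: stateless work (classification, checksums) - ORDERED is
     * enough because no shared per-flow state is touched. */
    pow_swtag(TAG_ORDERED, flow_tag);
    /* ... parse and classify the packet ... */

    /* Phase 2: per-flow state update - ATOMIC serializes packets of the
     * same flow while leaving other flows fully parallel. */
    pow_swtag(TAG_ATOMIC, flow_tag);
    /* ... update sequence numbers, counters, reassembly state ... */

    /* Phase 3: queue for output, still in flow order. */
    pow_swtag(TAG_ORDERED, flow_tag);
    /* ... hand the packet to the output path ... */
}
```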

Figure 5–2 depicts an abstracted view of the POW core state (i.e. the POW states visible to a core) and the operations that affect it, focusing on the legal major operations in each state. A state has an arc tagged with a particular operation when it is legal to issue the operation in that state.

NOTE: The POW ADDWQ and NOP commands do not affect POW core state and are legal at any time. The POW CLR_NSCHED command does not affect POW core state and has its own issue rules.

The abstracted states in Figure 5–2 closely mirror the tag types available in the tag switch operation. A new work request that receives work with an ORDERED, ATOMIC, or NULL tag puts the core into the ORDERED, ATOMIC, or NULL POW core state, respectively. A tag switch to an ORDERED, ATOMIC, or NULL tag type puts the core in the ORDERED, ATOMIC or NULL POW core state, respectively. These operations are depicted by arcs entering these states.

Figure 5–2 introduces the NULL_NULL state and the NULL_RD transaction. NULL_NULL is a special state entered only after a deschedule or reset.

NULL_NULL and NULL_RD are required because a deschedule operation detaches an internal POW entry from a core and there may not be another entry available (see POW Internal Architecture on page 231 for more description of the POW hardware internals). NULL_NULL is similar to NULL, with the clear difference that it is illegal to SWTAG when in the NULL_NULL state. NULL_RD causes the POW hardware to attempt to convert the state to NULL when it is in the NULL_NULL state. (NULL_RD will fail when there are no more internal POW entries – see Forward Progress Constraints on page 244 for forward-progress implications.)

The GET_WORK arcs exiting the ORDERED and ATOMIC states are special and marked with #. These transactions implement multiple functions to release the prior work and schedule new work for this core. The POW hardware actually executes an implicit switch to NULL before executing the GET_WORK in these two cases. This implicit switch to NULL releases the prior work, so that the hardware always starts a GET_WORK from the NULL or NULL_NULL states. Note the implication that a GET_WORK from ORDERED or ATOMIC that does not successfully return work will change to the NULL state. A GET_WORK transaction from NULL_NULL that does not successfully return work will stay in the NULL_NULL state.

As in Figure 5–1, Figure 5–2 abstracts SWTAG, GET_WORK, and NULL_RD transactions as a single arc, though all these operations can have separate request and completion times. This is because the initial request solely determines the legal operations that can follow. The only question is whether the next legal transaction can start before the POW hardware completes the previous transaction. A tag switch transaction has explicitly-separated request and completion operations, but the get work and Null Rd transactions are separated only with core IOBDMA operations. IOBDMA operations are described in Section 4.7. Here are the rules regarding transaction start time for the POW transactions that affect POW core state:

● SWTAG_DESCHED, DESCHED, UPD_WQP_GRP, and NOP transactions do not have separated start and completion times, so can be followed immediately by any legal command.

● The transactions marked * can start before the prior SWTAG is complete.

● In all other cases in Figure 5–2, a following transaction must not start before the prior transaction is complete.

The hardware behavior is unpredictable when the rules evident in Figure 5–2 are violated by core software. Note some specific restrictions:

● It IS NOT LEGAL to initiate a deschedule from the NULL or NULL_NULL POW core state.

● It IS NOT LEGAL to initiate any tag switch from the NULL_NULL state.

● It IS NOT LEGAL to initiate a tag switch with tag type of NULL from the NULL POW core state.

● It IS NOT LEGAL to issue any tag switch or get work operation while there is a pending switch with an ORDERED or ATOMIC tag type.

● It IS NOT LEGAL to initiate any transaction while a get work transaction is pending.

● It IS NOT LEGAL to initiate any transaction while a Null Rd operation is pending.

● It IS NOT LEGAL to initiate a SWTAG_FULL or SWTAG_DESCHED transaction with tag type of NULL.

POW Operations

ADDWQ(tag_type, tag, wqp, grp, qos)

This adds work to the input queue selected by the QOS. Tag_type can legally be ATOMIC, ORDERED, or NULL. QOS is a 3-bit value, grp is a 4-bit value, and tag is a 32-bit value.

The work-queue pointer (wqp) must be a 64-bit aligned pointer into L2/DRAM and must point to a legal work-queue entry. See Work-Queue Entry Format on page 234. Furthermore, the work-queue entry group, tag type, and tag fields in the work-queue entry in L2/DRAM must exactly match the corresponding values supplied with the ADDWQ or the POW hardware may produce unpredictable results.

The input work-queues are infinite, so this transaction never fails.
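A sketch of software-submitted work under the rules above; `pow_addwq()` and `struct work_queue_entry` are hypothetical stand-ins (the real entry format is defined in Work-Queue Entry Format on page 234).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum tag_type { TAG_ORDERED, TAG_ATOMIC, TAG_NULL };

/* Sketch only: the real work-queue entry layout is defined by the manual. */
struct work_queue_entry {
    uint32_t tag;
    uint8_t  tag_type;
    uint8_t  grp;
    void    *data;
};

/* Hypothetical wrapper for the ADDWQ transaction. */
static void pow_addwq(enum tag_type type, uint32_t tag,
                      struct work_queue_entry *wqp,
                      unsigned grp, unsigned qos)
{
    (void)wqp;
    printf("ADDWQ qos=%u grp=%u tag=0x%08x type=%d\n", qos, grp, tag, (int)type);
}

void submit_software_work(void *payload, uint32_t tag, unsigned grp)
{
    /* The entry must live in L2/DRAM and be 64-bit aligned. */
    struct work_queue_entry *wqe = aligned_alloc(8, sizeof(*wqe));
    if (wqe == NULL)
        return;

    /* The entry's grp/tag_type/tag fields must exactly match the ADDWQ
     * arguments, or the POW may produce unpredictable results. */
    wqe->tag = tag;
    wqe->tag_type = TAG_ORDERED;
    wqe->grp = (uint8_t)grp;
    wqe->data = payload;

    pow_addwq(TAG_ORDERED, tag, wqe, grp, /* qos */ 0);
}
```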

GET_WORK(wait)

This transaction attempts to get work for the requesting core. The value of the core's POW group-mask CSR (POW_PP_GRP_MSK[GRP_MSK]) at the time of the GET_WORK specifies the groups that are acceptable. The wait option causes the POW hardware to delay responding to the request until either work becomes available or the request times out. In any case, the POW hardware returns a failure response if it was unable to find work for the core, or a pointer to the work-queue entry if it successfully found work for the core.

NOTE: It is possible, though unlikely, for a time-out to occur when the wait bit is clear, as well as when the wait bit is set, if the work search takes too long.

The POW_PP_GRP_MSK that specifies the acceptable groups for a core must not be written between the start and completion of the GET_WORK, or unpredictable results may occur. Otherwise, the CSR can be written at any time.

The POW_NW_TIM[NW_TIM] CSR specifies the configurable time-out counter interval that controls a single counter used for all cores. The POW hardware times out a GET_WORK request after two interval timer expirations, so the effective time-out interval varies between one and two times the configured interval.
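A typical work loop built on GET_WORK might look like the following sketch; `pow_get_work()` is a hypothetical wrapper that returns the work-queue entry pointer on success, or NULL on the failure/time-out response.

```c
#include <stdbool.h>
#include <stddef.h>

struct work_queue_entry;   /* format defined by the manual */

/* Hypothetical wrapper for GET_WORK: returns a pointer to the scheduled
 * work-queue entry, or NULL if the POW responded with failure (no work
 * found, or the request timed out). */
static struct work_queue_entry *pow_get_work(bool wait)
{
    (void)wait;
    return NULL;   /* stand-in: no hardware behind this sketch */
}

static void process(struct work_queue_entry *wqe) { (void)wqe; }

void work_loop(void)
{
    for (;;) {
        /* wait=true: the POW delays the response until work becomes
         * available or the POW_NW_TIM interval(s) expire. */
        struct work_queue_entry *wqe = pow_get_work(true);
        if (wqe == NULL)
            continue;      /* time-out or no schedulable work: try again */
        process(wqe);
    }
}
```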

NULL_RD

This transaction attempts to change to the NULL state when in the NULL_NULL state. It is a NOP from all other states. The POW hardware will return NULL_NULL when it could not successfully allocate an internal POW entry.

Successful NULL_RDs always leave the core in the NULL state. Unsuccessful NULL_RDs or ones converted to NOPs do not change the core state.

SWTAG(new_tag_type, new_tag)

 This starts a tag switch transaction. The POW hardware completes the tag switch transaction later when it clears the pending switch bit for the core (refer to Section 5.5 for more information on POW tag switch-pending indications). An exception is a SWTAG to NULL, whose completion POW hardware never transmits. A SWTAG must not be used when switching from the NULL state.

● A SWTAG from an ATOMIC tag releases the ATOMIC tag immediately once the tag switch transaction starts, perhaps long before the SWTAG transaction completes.

● A SWTAG to an ATOMIC tag completes when the work acquires the new ATOMIC tag. At most one piece of work holds an ATOMIC tag at any time. The FIFO order is the acquisition order for the tag.

● A SWTAG from an ORDERED tag cannot complete until all work ordered earlier in the old tag's FIFO start a SWTAG transaction.

● A SWTAG to an ORDERED tag occurs immediately when switching from an ATOMIC or NULL tag, and occurs once the ordering constraints of the old tag are met when switching from an ORDERED tag.
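These rules give the usual critical-section pattern: complete a switch to an ATOMIC tag to serialize per-flow updates, then switch away to release the tag as early as possible. A minimal sketch, again with a hypothetical `pow_swtag()` wrapper:

```c
#include <stdint.h>

enum tag_type { TAG_ORDERED, TAG_ATOMIC, TAG_NULL };

/* Hypothetical blocking tag switch (request plus completion wait), as in
 * the earlier sketches; on hardware this would issue SWTAG and poll the
 * pending-switch bit. */
static void pow_swtag(enum tag_type type, uint32_t tag)
{
    (void)type; (void)tag;
}

void atomic_update(uint32_t flow_tag)
{
    /* Completing a switch to ATOMIC means this core is the only in-flight
     * holder of flow_tag's ATOMIC tag: effectively a per-flow lock. */
    pow_swtag(TAG_ATOMIC, flow_tag);

    /* ... modify per-flow shared state; no other locking is needed ... */

    /* The ATOMIC tag is released as soon as the next switch starts, so
     * keep the critical section short and switch away promptly. */
    pow_swtag(TAG_ORDERED, flow_tag);
}
```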

SWTAG_FULL(new_tag_type,new_tag, new_wqp, new_grp)

This is identical to SWTAG, except that the transaction additionally updates the work-queue pointer (new_wqp) and 4-bit group identifier (new_grp) for the work that is held in the POW. SWTAG_FULL must be used for all switches from the NULL state.

The POW hardware never interprets or uses the work-queue pointer supplied in this transaction, but it may deliver it to software later to complete a GET_WORK.

The POW hardware stores <35:3> of the work-queue pointer. SWTAG_FULL must not be used for switches to NULL.

DESCHED(no_sched)

 This executes a deschedule transaction. When the no_sched bit is set on DESCHED (or SWTAG_DESCHED) operations, the POW hardware does not schedule the packet to a core until a subsequent CLR_NSCHED operation clears the no_sched bit for the POW entry. The POW entry can be determined with a POW status load with get_cur=1 prior to the DESCHED (refer to Section 5.11.1). The index field in <50:40> identifies the POW ID.

Note that it is recommended that the core be in ATOMIC state rather than ORDERED state at the time of the DESCHED. (See the POW Performance Considerations on page 243, below.)

SWTAG_DESCHED(new_tag_type, new_tag, new_grp, no_sched)

This is identical to a SWTAG followed by a DESCHED, except that it also updates the group identifier. It must follow the same start rules as does SWTAG (shown in Figure 5–2 on page 224). SWTAG_DESCHED is well-suited for transferring work from one group to another - work pipelining.
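A sketch of that pipelining use: the first-stage core finishes its phase and hands the work to the second-stage group in a single SWTAG_DESCHED. The wrapper name and group number are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum tag_type { TAG_ORDERED, TAG_ATOMIC, TAG_NULL };

/* Hypothetical wrapper for SWTAG_DESCHED: switches the tag, updates the
 * group, and deschedules the work in a single transaction. */
static void pow_swtag_desched(enum tag_type type, uint32_t tag,
                              unsigned new_grp, bool no_sched)
{
    printf("SWTAG_DESCHED type=%d tag=0x%08x grp=%u nosched=%d\n",
           (int)type, tag, new_grp, (int)no_sched);
}

#define STAGE2_GROUP 2   /* assumed group served only by second-stage cores */

void finish_stage1(uint32_t flow_tag)
{
    /* ... stage-1 processing on this core is done ... */

    /* Hand the work to the stage-2 cores: it stays in-flight, keeps its
     * ordering, and the POW reschedules it only to cores whose group mask
     * accepts STAGE2_GROUP. */
    pow_swtag_desched(TAG_ORDERED, flow_tag, STAGE2_GROUP, false);
}
```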

UPD_WQP_GRP (new_wqp, new_grp)

Update the work-queue pointer (new_wqp) and group identifier (new_grp) for the work that is held in the POW.

CLR_NSCHED(wqp, index)

Clears the nosched bit for the POW entry selected by index. CLR_NSCHED is a NOP under any of the following conditions:

● the POW entry is not on a deschedule list, or

● the wqp in the POW entry does not match the supplied wqp.

Before initiating a CLR_NSCHED operation, software must guarantee that all *DESCHEDs and CLR_NSCHEDs are complete. Software can read the pend_desched and pend_nosched_clr bits via POW status loads to determine when these conditions are true. (Refer to Sections 5.8 and 5.11.1 for more details on POW status loads.)

After a CLR_NSCHED operation, software must guarantee that the CLR_NSCHED is complete before issuing any subsequent POW operations. It can do this by checking the pend_nosched_clr via POW status reads.

Note also that index will typically be determined by POW status loads prior to the *DESCHED that set the no_sched bit. A POW status load with get_cur=1 returns the index field in <50:40>.
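Putting DESCHED(no_sched) and CLR_NSCHED together, the hand-off described above might be sketched as follows; all wrapper names are hypothetical, and the index extraction uses the <50:40> field position quoted in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the POW status load and commands. */
static uint64_t pow_status_load_get_cur(void)            { return 0; }
static void     pow_desched(bool no_sched)               { (void)no_sched; }
static void     pow_clr_nsched(void *wqp, unsigned idx)  { (void)wqp; (void)idx; }
static bool     pow_desched_or_clr_pending(void)         { return false; }

void park_work(unsigned *saved_index)
{
    /* The POW entry index is in bits <50:40> of a status load with
     * get_cur=1, taken before the DESCHED. */
    uint64_t status = pow_status_load_get_cur();
    *saved_index = (unsigned)((status >> 40) & 0x7ff);

    /* Deschedule with no_sched set: the POW keeps the entry but will not
     * reschedule it until the nosched bit is cleared. */
    pow_desched(true);
}

void release_work(void *wqp, unsigned saved_index)
{
    /* Software must first guarantee that all *DESCHEDs and CLR_NSCHEDs
     * are complete (pend_desched / pend_nosched_clr via status loads). */
    while (pow_desched_or_clr_pending())
        ;
    pow_clr_nsched(wqp, saved_index);
}
```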
