8. The genattr tool
8.1. Overview
genattr generates insn-attr.h from the machine description file. Besides define_insn, define_expand, define_split, define_peephole and define_peephole2, which describe optimization opportunities for combinations of rtx instructions, the machine description file also contains patterns that describe the hardware architecture: define_delay, define_function_unit, define_insn_reservation, define_cpu_unit, and so on. This architecture information is kept in separate machine description files in the directory “gcc-3.4.6/gcc/config/i386”, where we can find two such files: athlon.md and pentium.md.
This tool deals with three of the hardware patterns: define_delay, define_function_unit, and define_insn_reservation. However, neither athlon.md nor pentium.md contains a define_delay.
Let’s look at them one by one.
8.1.1. Overview of DEFINE_DELAY pattern
The following is extracted from gccinfo.
An instruction is said to require a "delay slot" if some instructions that are physically after the instruction are executed as if they were located before it. Classic examples are branch and call instructions, which often execute the following instruction before the branch or call is performed.
On some machines, conditional branch instructions can optionally "annul" instructions in the delay slot. This means that the instruction will not be executed for certain branch outcomes. Both instructions that annul if the branch is true and instructions that annul if the branch is false are supported.
Delay slot scheduling differs from instruction scheduling in that determining whether an instruction needs a delay slot is dependent only on the type of instruction being generated, not on data flow between the instructions.
The requirement of an insn needing one or more delay slots is indicated via the `define_delay' expression. It has the following form:
(define_delay TEST
[DELAY-1 ANNUL-TRUE-1 ANNUL-FALSE-1
DELAY-2 ANNUL-TRUE-2 ANNUL-FALSE-2
...])
TEST is an attribute test that indicates whether this `define_delay' applies to a particular insn. If so, the number of required delay slots is determined by the length of the vector specified as the second argument. An insn placed in delay slot N must satisfy attribute test DELAY-N. ANNUL-TRUE-N is an attribute test that specifies which insns may be annulled if the branch is true. Similarly, ANNUL-FALSE-N specifies which insns in the delay slot may be annulled if the branch is false. If annulling is not supported for that delay slot, `(nil)' should be coded.
For example, in the common case where branch and call insns require a single delay slot, which may contain any insn other than a branch or call, the following would be placed in the `md' file:
(define_delay (eq_attr "type" "branch,call")
[(eq_attr "type" "!branch,call") (nil) (nil)])
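As a minimal sketch of how these attribute tests read, the Python below mimics `eq_attr` for a single attribute, including the leading `!` that negates the value list, and applies the one-slot `define_delay` shown above. The function names `eq_attr` and `may_fill_slot` and the attribute value `load` are illustrative assumptions, not part of GCC.

```python
def eq_attr(spec):
    """Return a predicate mimicking (eq_attr "type" SPEC).

    SPEC is a comma-separated list of attribute values; a leading '!'
    turns the test into "value is NOT in the list".
    """
    negate = spec.startswith("!")
    values = set((spec[1:] if negate else spec).split(","))

    def test(attr_value):
        return (attr_value in values) != negate

    return test


# (define_delay (eq_attr "type" "branch,call")
#   [(eq_attr "type" "!branch,call") (nil) (nil)])
needs_delay = eq_attr("branch,call")   # TEST
slot_ok = eq_attr("!branch,call")      # DELAY-1


def may_fill_slot(owner_type, candidate_type):
    """True if an insn of candidate_type may fill the delay slot of owner_type."""
    return needs_delay(owner_type) and slot_ok(candidate_type)
```

For instance, `may_fill_slot("branch", "load")` holds, while `may_fill_slot("branch", "call")` does not, because a call may not sit in the slot.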
Multiple `define_delay' expressions may be specified. In this case, each such expression specifies different delay slot requirements and there must be no insn for which tests in two `define_delay' expressions are both true.
For example, if we have a machine that requires one delay slot for branches but two for calls, no delay slot can contain a branch or call insn, and any valid insn in the delay slot for the branch can be annulled if the branch is true, we might represent this as follows:
(define_delay (eq_attr "type" "branch")
[(eq_attr "type" "!branch,call")
(eq_attr "type" "!branch,call")
(nil)])
(define_delay (eq_attr "type" "call")
[(eq_attr "type" "!branch,call") (nil) (nil)
(eq_attr "type" "!branch,call") (nil) (nil)])
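The rule that the slot count follows from the vector length can be sketched as below. This is only an illustration under simplifying assumptions: each TEST is reduced to a set of `type` values, attribute tests inside the vector are kept as plain strings, and `None` stands for `(nil)`. The names `DEFINE_DELAYS`, `delay_slots` and `annul_true_supported` are hypothetical.

```python
# Each entry: (types matched by TEST, flat vector of
#              DELAY-n, ANNUL-TRUE-n, ANNUL-FALSE-n triples).
DEFINE_DELAYS = [
    ({"branch"}, ["!branch,call", "!branch,call", None]),
    ({"call"}, ["!branch,call", None, None,
                "!branch,call", None, None]),
]


def matching_vector(insn_type):
    """Return the vector of the single define_delay that applies, or None."""
    matches = [vec for types, vec in DEFINE_DELAYS if insn_type in types]
    if len(matches) > 1:
        # The text requires that no insn satisfies two define_delay tests.
        raise ValueError("more than one define_delay matches " + insn_type)
    return matches[0] if matches else None


def delay_slots(insn_type):
    """Number of delay slots: one per (DELAY, ANNUL-TRUE, ANNUL-FALSE) triple."""
    vec = matching_vector(insn_type)
    return len(vec) // 3 if vec else 0


def annul_true_supported(insn_type, slot):
    """True if ANNUL-TRUE for the 1-based slot is something other than (nil)."""
    vec = matching_vector(insn_type)
    return vec is not None and vec[3 * (slot - 1) + 1] is not None
```

With the two expressions above, a branch gets one slot with annul-if-true support and a call gets two slots without annulling.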
8.1.2. Overview of DEFINE_FUNCTION_UNIT pattern
The following is extracted from gccinfo.
On most RISC machines, there are instructions whose results are not available for a specific number of cycles. Common cases are instructions that load data from memory. On many machines, a pipeline stall will result if the data is referenced too soon after the load instruction.
In addition, many newer microprocessors have multiple function units, usually one for integer and one for floating point, and often will incur pipeline stalls when a result that is needed is not yet ready.
A machine is divided into "function units", each of which executes a specific class of instructions in first-in-first-out order. Function units that accept one instruction each cycle and allow a result to be used in the succeeding instruction (usually via forwarding) need not be specified. Classic RISC microprocessors will normally have a single function unit, which we can call `memory'. The newer "superscalar" processors will often have function units for floating point operations, usually at least a floating point adder and multiplier.
Each usage of a function unit by a class of insns is specified with a `define_function_unit' expression, which looks like this:
(define_function_unit NAME MULTIPLICITY SIMULTANEITY
TEST READY-DELAY ISSUE-DELAY
[CONFLICT-LIST])
NAME is a string giving the name of the function unit.
MULTIPLICITY is an integer specifying the number of identical units in the processor. If more than one unit is specified, they will be scheduled independently. Only truly independent units should be counted; a pipelined unit should be specified as a single unit. (The only common example of a machine that has multiple function units for a single instruction class that are truly independent and not pipelined are the two multiply and two increment units of the CDC 6600.)
SIMULTANEITY specifies the maximum number of insns that can be executing in each instance of the function unit simultaneously or zero if the unit is pipelined and has no limit.
All `define_function_unit' definitions referring to function unit NAME must have the same name and values for MULTIPLICITY and SIMULTANEITY.
TEST is an attribute test that selects the insns we are describing in this definition. Note that an insn may use more than one function unit and a function unit may be specified in more than one `define_function_unit'.
READY-DELAY is an integer that specifies the number of cycles after which the result of the instruction can be used without introducing any stalls.
ISSUE-DELAY is an integer that specifies the number of cycles after the instruction matching the TEST expression begins using this unit until a subsequent instruction can begin. A cost of N indicates an N-1 cycle delay. A subsequent instruction may also be delayed if an earlier instruction has a longer READY-DELAY value. This blocking effect is computed using the SIMULTANEITY, READY-DELAY, ISSUE-DELAY, and CONFLICT-LIST terms. For a normal non-pipelined function unit, SIMULTANEITY is one, the unit is taken to block for the READY-DELAY cycles of the executing insn, and smaller values of ISSUE-DELAY are ignored.
CONFLICT-LIST is an optional list giving detailed conflict costs for this unit. If specified, it is a list of condition test expressions to be applied to insns chosen to execute in NAME following the particular insn matching TEST that is already executing in NAME. For each insn in the list, ISSUE-DELAY specifies the conflict cost; for insns not in the list, the cost is zero. If not specified, CONFLICT-LIST defaults to all instructions that use the function unit. Typical uses of this vector are where a floating point function unit can pipeline either single- or double-precision operations, but not both, or where a memory unit can pipeline loads, but not stores, etc.
As an example, consider a classic RISC machine where the result of a load instruction is not available for two cycles (a single "delay" instruction is required) and where only one load instruction can be executed simultaneously. This would be specified as:
(define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)
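To make the READY-DELAY arithmetic concrete, here is a small sketch. It assumes cycles are counted from the cycle in which the load issues and that the consumer would otherwise issue at the given cycle; `stall_cycles` is a hypothetical helper, not a GCC function.

```python
READY_DELAY = 2  # from (define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)


def stall_cycles(load_issue_cycle, consumer_issue_cycle):
    """Cycles a consumer of the loaded value must wait beyond its planned issue cycle."""
    ready_cycle = load_issue_cycle + READY_DELAY
    return max(0, ready_cycle - consumer_issue_cycle)
```

A consumer placed directly after the load (`stall_cycles(0, 1)`) waits one cycle, which is the single "delay" instruction mentioned above; with one unrelated insn in between (`stall_cycles(0, 2)`) there is no stall.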
For the case of a floating point function unit that can pipeline either single or double precision, but not both, the following could be specified:
(define_function_unit
"fp" 1 0 (eq_attr "type" "sp_fp") 4 4 [(eq_attr "type" "dp_fp")])
(define_function_unit
"fp" 1 0 (eq_attr "type" "dp_fp") 4 4 [(eq_attr "type" "sp_fp")])
*Note:* The scheduler attempts to avoid function unit conflicts and uses all the specifications in the `define_function_unit' expression. It has recently been discovered that these specifications may not allow modeling of some of the newer "superscalar" processors that have insns using multiple pipelined units. These insns will cause a potential conflict for the second unit used during their execution and there is no way of representing that conflict.
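The CONFLICT-LIST rule from the floating point example (ISSUE-DELAY is the cost for insns in the list, zero for the others) can be sketched as follows. The table layout and the function `conflict_cost` are assumptions made for illustration, and each attribute test is reduced to a single `type` value.

```python
# (NAME, type matched by TEST, READY-DELAY, ISSUE-DELAY, types in CONFLICT-LIST)
FP_UNIT = [
    ("fp", "sp_fp", 4, 4, ["dp_fp"]),
    ("fp", "dp_fp", 4, 4, ["sp_fp"]),
]


def conflict_cost(executing_type, next_type):
    """Conflict cost for next_type entering the unit while executing_type runs."""
    for _name, test_type, _ready, issue_delay, conflicts in FP_UNIT:
        if test_type == executing_type:
            # Insns in the conflict list pay ISSUE-DELAY; all others pay zero.
            return issue_delay if next_type in conflicts else 0
    return 0  # the executing insn does not use this unit at all
```

Here a double-precision insn following a single-precision one costs 4, while two single-precision insns in a row cost 0, so the unit pipelines one precision but not a mix of both.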
8.1.3. Overview of DEFINE_INSN_RESERVATION pattern
This pattern is closely related to the processor pipeline. The following is extracted from gccinfo.
This section describes constructions of the automaton based processor pipeline description. The order of constructions within the machine description file is not important.
The following optional construction describes names of automata generated and used for the pipeline hazards recognition. Sometimes the generated finite state automaton used by the pipeline hazard recognizer is large. If we use more than one automaton and bind functional units to the automata, the total size of the automata is usually less than the size of the single automaton. If there is no such construction, only one finite state automaton is generated.
(define_automaton AUTOMATA-NAMES)
AUTOMATA-NAMES is a string giving names of the automata. The names are separated by commas. All the automata should have unique names. The automaton name is used in the constructions `define_cpu_unit' and `define_query_cpu_unit'.
Each processor functional unit used in the description of instruction reservations should be described by the following construction.
(define_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
UNIT-NAMES is a string giving the names of the functional units separated by commas. Don't use name `nothing', it is reserved for other goals.
AUTOMATON-NAME is a string giving the name of the automaton with which the unit is bound. The automaton should be described in construction `define_automaton'. You should give "automaton-name", if there is a defined automaton.
The assignment of units to automata is constrained by the uses of the units in insn reservations. The most important constraint is: if a unit reservation is present on a particular cycle of an alternative for an insn reservation, then some unit from the same automaton must be present on the same cycle for the other alternatives of the insn reservation. The rest of the constraints are mentioned in the description of the subsequent constructions.
The following construction describes CPU functional units analogously to `define_cpu_unit'. The reservation of such units can be queried for an automaton state. The instruction scheduler never queries reservation of functional units for given automaton state. So as a rule, you don't need this construction. This construction could be used for future code generation goals (e.g. to generate VLIW insn templates).
(define_query_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
UNIT-NAMES is a string giving names of the functional units separated by commas.
AUTOMATON-NAME is a string giving the name of the automaton with which the unit is bound.
Following construction is the major one to describe pipeline characteristics of an instruction.
(define_insn_reservation INSN-NAME DEFAULT_LATENCY
CONDITION REGEXP)
DEFAULT_LATENCY is a number giving latency time of the instruction. There is an important difference between the old description and the automaton based pipeline description. The latency time is used for all dependencies when we use the old description. In the automaton based pipeline description, the given latency time is only used for true dependencies. The cost of anti-dependencies is always zero and the cost of output dependencies is the difference between latency times of the producing and consuming insns (if the difference is negative, the cost is considered to be zero). You can always change the default costs for any description by using the target hook `TARGET_SCHED_ADJUST_COST' (*note Scheduling::).
INSN-NAME is a string giving the internal name of the insn. The internal names are used in constructions `define_bypass' and in the automaton description file generated for debugging. The internal name has nothing in common with the names in `define_insn'. It is a good practice to use insn classes described in the processor manual.
CONDITION defines what RTL insns are described by this construction. You should remember that you will be in trouble if CONDITION for two or more different `define_insn_reservation' constructions is TRUE for an insn. In this case what reservation will be used for the insn is not defined. Such cases are not checked during generation of the pipeline hazards recognizer because in general recognizing that two conditions may have the same value is quite difficult (especially if the conditions contain `symbol_ref'). It is also not checked during the pipeline hazard recognizer work because it would slow down the recognizer considerably.
REGEXP is a string describing the reservation of the cpu's functional units by the instruction. The reservations are described by a regular expression according to the following syntax:
regexp = regexp "," oneof
| oneof
oneof = oneof "|" allof
| allof
allof = allof "+" repeat
| repeat
repeat = element "*" number
| element
element = cpu_function_unit_name
| reservation_name
| result_name
| "nothing"
| "(" regexp ")"
`,' is used for describing the start of the next cycle in the reservation.
`|' is used for describing a reservation described by the first regular expression *or* a reservation described by the second regular expression *or* etc.
`+' is used for describing a reservation described by the first regular expression *and* a reservation described by the second regular expression *and* etc.
`*' is used for convenience and simply means a sequence in which the regular expression is repeated NUMBER times with cycle advancing (see `,').
`cpu_function_unit_name' denotes reservation of the named functional unit.
`reservation_name' -- see description of construction `define_reservation'.
`nothing' denotes no unit reservations.
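To see how the operators combine, the following sketch expands a reservation string into its alternatives, cycle by cycle. It is a simplified illustration and not how genautomata works internally: every name is treated as a functional unit (names from `define_reservation' are not looked up), `+' aligns its operands at their first cycle, and the helper names `tokenize` and `expand` are hypothetical.

```python
import re
from itertools import zip_longest


def tokenize(text):
    """Split a reservation string into names, numbers and operators."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9-]*|\d+|[,|+*()]", text)


def expand(text):
    """Return a list of alternatives; each is a list of per-cycle frozensets."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def sequence(left, right):
        # "," : every alternative of left followed by every alternative of right
        return [x + y for x in left for y in right]

    def regexp():  # regexp = regexp "," oneof | oneof
        result = oneof()
        while peek() == ",":
            take()
            result = sequence(result, oneof())
        return result

    def oneof():  # oneof = oneof "|" allof | allof
        result = allof()
        while peek() == "|":
            take()
            result = result + allof()
        return result

    def allof():  # allof = allof "+" repeat | repeat
        result = repeat()
        while peek() == "+":
            take()
            right = repeat()
            # "+" : reserve both operands at once, cycle by cycle
            result = [[a | b for a, b in zip_longest(x, y, fillvalue=frozenset())]
                      for x in result for y in right]
        return result

    def repeat():  # repeat = element "*" number | element
        once = element()
        result = once
        if peek() == "*":
            take()
            for _ in range(int(take()) - 1):
                result = sequence(result, once)
        return result

    def element():
        tok = take()
        if tok == "(":
            inner = regexp()
            take()  # consume the closing ")"
            return inner
        if tok == "nothing":
            return [[frozenset()]]
        return [[frozenset([tok])]]

    result = regexp()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

For a string such as "a, nothing*2, (b | c)" with hypothetical units a, b and c, the result has two alternatives of four cycles each: unit a in the first cycle, two empty cycles, and then either b or c.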
Sometimes unit reservations for different insns contain common parts. In such case, you can simplify the pipeline description by describing the common part by the following construction
(define_reservation RESERVATION-NAME REGEXP)
RESERVATION-NAME is a string giving name of REGEXP. Functional unit names and reservation names are in the same name space. So the reservation names should be different from the functional unit names and can not be the reserved name `nothing'.
The following construction is used to describe exceptions in the latency time for a given instruction pair. These are so-called bypasses.
(define_bypass NUMBER OUT_INSN_NAMES IN_INSN_NAMES
[GUARD])
NUMBER defines when the result generated by the instructions given in string OUT_INSN_NAMES will be ready for the instructions given in string IN_INSN_NAMES. The instructions in the string are separated by commas.
GUARD is an optional string giving the name of a C function which defines an additional guard for the bypass. The function will get the two insns as parameters. If the function returns zero the bypass will be ignored for this case. The additional guard is necessary to recognize complicated bypasses, e.g. when the consumer is only an address of insn `store' (not a stored value).
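The latency rules given for DEFAULT_LATENCY, together with the bypass exception, can be summarized in a short sketch. The reservation names and numbers are hypothetical, the guard here receives reservation names rather than real insns, and applying a bypass only to true dependencies is an assumption of this sketch.

```python
# Hypothetical reservation names and latencies, for illustration only.
LATENCY = {"load": 3, "alu": 1}
# (NUMBER, OUT_INSN_NAMES, IN_INSN_NAMES, GUARD); GUARD is None or a callable.
BYPASSES = [(2, "load", "alu", None)]


def names(text):
    """Split a comma-separated name string into a list of names."""
    return [name.strip() for name in text.split(",")]


def dependency_cost(kind, producer, consumer, latency=LATENCY, bypasses=BYPASSES):
    """Cost of a dependency of the given kind: 'true', 'anti' or 'output'."""
    if kind == "anti":
        return 0
    if kind == "output":
        # Difference of the two latencies, never below zero.
        return max(0, latency[producer] - latency[consumer])
    if kind != "true":
        raise ValueError("unknown dependency kind: " + kind)
    for number, outs, ins, guard in bypasses:
        if producer in names(outs) and consumer in names(ins):
            # A guard that returns zero makes the bypass be ignored.
            if guard is None or guard(producer, consumer):
                return number
    return latency[producer]
```

With these tables, a true dependency from `load` to `alu` costs 2 because of the bypass, while a true dependency from `load` to another `load` falls back to the default latency of 3.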
The following five constructions are usually used to describe VLIW processors, or more precisely, to describe a placement of small instructions into VLIW instruction slots. They can be used for RISC processors, too.
(exclusion_set UNIT-NAMES UNIT-NAMES)
(presence_set UNIT-NAMES PATTERNS)
(final_presence_set UNIT-NAMES PATTERNS)
(absence_set UNIT-NAMES PATTERNS)
(final_absence_set UNIT-NAMES PATTERNS)
UNIT-NAMES is a string giving names of functional units separated by commas.
PATTERNS is a string giving patterns of functional units separated by commas. Currently a pattern is one unit or several units separated by white space.
The first construction (`exclusion_set') means that each functional unit in the first string can not be reserved simultaneously with a unit whose name is in the second string and vice versa. For example, the construction is useful for describing processors (e.g. some SPARC processors) with a fully pipelined floating point functional unit which can execute simultaneously only single floating point insns or only double floating point insns.
The second construction (`presence_set') means that each functional unit in the first string can not be reserved unless at least one of the patterns of units whose names are in the second string is reserved. This is an asymmetric relation. For example, it is useful for description that VLIW `slot1' is reserved after `slot0' reservation.
We could describe it by the following construction
(presence_set "slot1" "slot0")
Or `slot1' is reserved only after `slot0' and unit `b0' reservation. In this case we could write
(presence_set "slot1" "slot0 b0")
The third construction (`final_presence_set') is analogous to `presence_set'. The difference between them is when checking is done. When an instruction is issued in given automaton state reflecting all current and planned unit reservations, the automaton state is changed. The first state is a source state, the second one is a result state. Checking for `presence_set' is done on the source state reservation, checking for `final_presence_set' is done on the result reservation. This construction is useful to describe a reservation which is actually two subsequent reservations. For example, if we use
(presence_set "slot1" "slot0")
the following insn will never be issued (because `slot1' requires `slot0' which is absent in the source state).
(define_reservation "insn_and_nop" "slot0 + slot1")
but it can be issued if we use analogous `final_presence_set'.
The fourth construction (`absence_set') means that each functional unit in the first string can be reserved only if each pattern of units whose names are in the second string is not reserved. This is an asymmetric relation (actually `exclusion_set' is analogous to this one but it is symmetric). For example, it is useful for description that VLIW `slot0' can not be reserved after `slot1' or `slot2' reservation. We could describe it by the following construction
(absence_set "slot0" "slot1, slot2")
Or `slot2' can not be reserved if `slot0' and unit `b0' are reserved or `slot1' and unit `b1' are reserved. In this case we could write
(absence_set "slot2" "slot0 b0, slot1 b1")
All functional units mentioned in a set should belong to the same automaton.
The last construction (`final_absence_set') is analogous to `absence_set' but checking is done on the result (state) reservation. See comments for `final_presence_set'.
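The difference between checking the source state and the result state can be sketched with plain sets. This is a simplified model that looks at a single cycle only and treats the result state as the source state plus the units the insn wants; the helper names are hypothetical.

```python
def parse_patterns(text):
    """'slot0 b0, slot1 b1' -> [{'slot0', 'b0'}, {'slot1', 'b1'}]"""
    return [set(pattern.split()) for pattern in text.split(",")]


def presence_allows(patterns, state):
    """presence_set: at least one pattern must be fully reserved in state."""
    return any(p <= state for p in parse_patterns(patterns))


def absence_allows(patterns, state):
    """absence_set: no pattern may be fully reserved in state."""
    return not any(p <= state for p in parse_patterns(patterns))


def state_to_check(source, wanted, final):
    """Plain sets are checked on the source state, final_* sets on the result state."""
    return source | wanted if final else source
```

For an insn that wants both `slot0` and `slot1` from an empty source state, `presence_allows("slot0", state_to_check(set(), {"slot0", "slot1"}, False))` fails, whereas the same check with `final=True` succeeds, which mirrors the "insn_and_nop" case described above.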
You can control the generator of the pipeline hazard recognizer with the following construction.
(automata_option OPTIONS)
OPTIONS is a string giving options which affect the generated code. Currently there are the following options:
* "no-minimization" makes no minimization of the automaton. This is only worth to do when we are debugging the description and need to look more accurately at reservations of states.
* "time" means printing additional time statistics about generation of automata.
* "v" means a generation of the file describing the result automata. The file has suffix `.dfa' and can be used for the description verification and debugging.
* "w" means a generation of warning instead of error for non-critical errors.
* "ndfa" makes nondeterministic finite state automata. This affects the treatment of operator `|' in the regular expressions. The usual treatment of the operator is to try the first alternative and, if the reservation is not possible, the second alternative. The nondeterministic treatment means trying all alternatives, some of them may be rejected by reservations in the subsequent insns. You can not query functional unit reservations in nondeterministic automaton states.
* "progress" means output of a progress bar showing how many states were generated so far for automaton being processed. This is useful during debugging a DFA description. If you see too many generated states, you could interrupt the generator of the pipeline hazard recognizer and try to figure out a reason for generation of the huge automaton.
As an example, consider a superscalar RISC machine which can issue three insns (two integer insns and one floating point insn) on the cycle but can finish only two insns. To describe this, we define the following functional units.
(define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
(define_cpu_unit "port0, port1")
All simple integer insns can be executed in any integer pipeline and their result is ready in two cycles. The simple integer insns are issued into the first pipeline unless it is reserved, otherwise they are issued into the second pipeline. Integer division and multiplication insns can be executed only in the second integer pipeline and their results are ready correspondingly in 8 and 4 cycles. The integer division is not pipelined, i.e. the subsequent integer division insn can not be issued until the current division insn finished. Floating point insns are fully pipelined and their results are ready in 3 cycles. Where the result of a floating point insn is used by an integer insn, an additional delay of one cycle is incurred. To describe all of this we could specify
(define_cpu_unit "div")
(define_insn_reservation "simple" 2 (eq_attr "type" "int")
"(i0_pipeline | i1_pipeline), (port0 | port1)")
(define_insn_reservation "mult" 4 (eq_attr "type" "mult")
"i1_pipeline, nothing*2, (port0 | port1)")
(define_insn_reservation "div" 8 (eq_attr "type" "div")
"i1_pipeline, div*7, div + (port0 | port1)")
(define_insn_reservation "float" 3 (eq_attr "type" "float")
"f_pipeline, nothing, (port0 | port1)")
(define_bypass 4 "float" "simple,mult,div")
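The effect of the "div" reservation (division is not pipelined, multiplication is) can be checked with a small sketch. The per-cycle reservations are written out by hand from the "mult" and "div" strings above, always choosing `port0` from `(port0 | port1)` to keep things short; that choice and the helper `earliest_issue` are assumptions of this sketch.

```python
# Per-cycle unit reservations, with port0 picked from (port0 | port1).
RESERVATIONS = {
    "mult": [frozenset({"i1_pipeline"}), frozenset(), frozenset(),
             frozenset({"port0"})],
    "div": [frozenset({"i1_pipeline"})] + [frozenset({"div"})] * 7
           + [frozenset({"div", "port0"})],
}


def earliest_issue(first, second):
    """First cycle >= 1 at which `second` can issue after `first` issued at cycle 0."""
    busy = dict(enumerate(RESERVATIONS[first]))  # absolute cycle -> reserved units
    start = 1
    # Slide the second insn forward until none of its cycles shares a unit.
    while any(units & busy.get(start + offset, frozenset())
              for offset, units in enumerate(RESERVATIONS[second])):
        start += 1
    return start
```

Under these assumptions a second multiplication can issue one cycle after the first, whereas a second division has to wait until cycle 8 because the `div` unit stays reserved for the whole operation.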
To simplify the description we could describe the following reservation
(define_reservation "finish" "port0|port1")
and use it in all `define_insn_reservation' as in the following construction
(define_insn_reservation "simple" 2 (eq_attr "type" "int")
"(i0_pipeline | i1_pipeline), finish")