9.5.8.4. Build finite state automaton
A finite state automaton (FSA) is either deterministic (DFA) or nondeterministic (NDFA). In a deterministic automaton, the start state and the triggering event determine the target state uniquely, while in a nondeterministic one the same pair may lead to several targets. In the machine description file, a unit reservation of an instruction that contains alternatives (the “|” operator) introduces a nondeterministic FSA.
However, we can control how these alternatives are handled with the “-ndfa” option when producing the automata. By default, without this option, alternatives are treated deterministically: only the transition for the first qualified alternative is considered. With the option, all qualified alternatives are used to form a nondeterministic automaton, and afterwards an NDFA to DFA transformation turns it into a DFA. The automaton generated with ndfa_flag of 0 is less efficient than the one generated with ndfa_flag of 1, but it can be generated faster, and the efficiency loss is not as large as one might imagine.
6401 static void
6402 build_automaton (automaton_t automaton) in genautomata.c
6403 {
6404 int states_num;
6405 int arcs_num;
6406
6407 ticker_on (&NDFA_time );
6408 if (progress_flag )
6409 {
6410 if (automaton->corresponding_automaton_decl == NULL)
6411 fprintf (stderr , "Create anonymous automaton");
6412 else
6413 fprintf (stderr , "Create automaton `%s'",
6414 automaton->corresponding_automaton_decl->name);
6415 fprintf (stderr , " (1 dot is 100 new states):");
6416 }
6417 make_automaton (automaton);
make_automaton is invoked to build the automaton. When alternatives are present (operator “|” in define_insn_reservation), it builds a nondeterministic or a deterministic automaton depending on the “-ndfa” option.
5693 static void
5694 make_automaton (automaton_t automaton) in genautomata.c
5695 {
5696 ainsn_t ainsn;
5697 struct insn_reserv_decl *insn_reserv_decl;
5698 alt_state_t alt_state;
5699 state_t state;
5700 state_t start_state;
5701 state_t state2;
5702 ainsn_t advance_cycle_ainsn;
5703 arc_t added_arc;
5704 vla_ptr_t state_stack;
5705 int states_n;
5706 reserv_sets_t reservs_matter = form_reservs_matter (automaton);
5707
5708 VLA_PTR_CREATE (state_stack, 150, "state stack");
5709 /* Create the start state (empty state). */
5710 start_state = insert_state (get_free_state (1, automaton));
5711 automaton->start_state = start_state;
5712 start_state->it_was_placed_in_stack_for_NDFA_forming = 1;
5713 VLA_PTR_ADD (state_stack, start_state);
5714 states_n = 1;
Here, form_reservs_matter again fills a bitmap of unit reservations, but this time the bitmap is per automaton. As 8.1.3. Overview of DEFINE_INSN_RESERVATION pattern shows, units are distributed among automata (by the define_cpu_unit pattern; every unit belongs to one automaton), and units of the same automaton may have dependence or exclusion relations (defined by the EXCLUSION_SET pattern etc.).
Besides, define_insn_reservation describes the units reserved by every instruction class (strictly speaking, insn here means instruction class). Summarizing the information of these classes gives an approximate picture of the reservation of each unit as a whole, which is done in the previous section.
The per-cycle bitmaps of units’ reservation kept for each instruction class cannot show such information, so here it has to be collected into the bitmap for the automaton.
5669 static reserv_sets_t
5670 form_reservs_matter (automaton_t automaton) in genautomata.c
5671 {
5672 int cycle, unit;
5673 reserv_sets_t reservs_matter = alloc_empty_reserv_sets();
5674
5675 for (cycle = 0; cycle < max_cycles_num ; cycle++)
5676 for (unit = 0; unit < description ->units_num; unit++)
5677 if (units_array [unit]->automaton_decl
5678 == automaton->corresponding_automaton_decl
5679 && (cycle >= units_array [unit]->min_occ_cycle_num
5680 /* We can not remove queried unit from reservations. */
5681 || units_array [unit]->query_p
5682 /* We can not remove units which are used
5683 `exclusion_set', `presence_set',
5684 `final_presence_set', `absence_set', and
5685 `final_absence_set'. */
5686 || units_array [unit]->in_set_p))
5687 set_unit_reserv (reservs_matter, cycle, unit);
5688 return reservs_matter;
5689 }
At line 5679, min_occ_cycle_num is the minimum cycle at which the unit occurs in any reservation. The unit is treated as mattering in that cycle and in every cycle after it, whether or not some reservation really uses it there. This is conservative, but simple enough and certainly correct.
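The selection rule at lines 5677 to 5686 can be sketched with one machine word per cycle. This is a minimal illustration, not the real data layout: struct toy_unit, build_matter_mask and MAX_CYCLES are hypothetical names, at most 32 units are assumed, and the test that a unit belongs to the automaton is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CYCLES 4   /* assumed toy value, stands in for max_cycles_num */

/* Hypothetical, simplified stand-in for a unit declaration.  */
struct toy_unit
{
  int min_occ_cycle_num;  /* first cycle in which any reservation uses the unit */
  int query_p;            /* nonzero if the unit is queried */
  int in_set_p;           /* nonzero if the unit appears in an exclusion,
                             presence or absence set */
};

/* Fill mask[cycle] with one bit per unit whose reservation matters in
   that cycle.  The condition mirrors the one in form_reservs_matter,
   minus the automaton ownership check.  */
void
build_matter_mask (const struct toy_unit *units, int units_num,
                   uint32_t mask[MAX_CYCLES])
{
  int cycle, unit;

  assert (units_num >= 0 && units_num <= 32);
  for (cycle = 0; cycle < MAX_CYCLES; cycle++)
    {
      mask[cycle] = 0;
      for (unit = 0; unit < units_num; unit++)
        if (cycle >= units[unit].min_occ_cycle_num
            || units[unit].query_p
            || units[unit].in_set_p)
          mask[cycle] |= (uint32_t) 1 << unit;
    }
}
```

A unit that first occurs in cycle 2 and is neither queried nor used in any set gets no bit in cycles 0 and 1, so its reservation there is masked out when states are combined or shifted.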
Then in make_automaton, at line 5710, a new start_state is created, and at line 5713 it is pushed onto state_stack. It is the starting point of the automaton being built, so the condition at line 5715 is satisfied initially.
make_automaton (continued)
5715 while (VLA_PTR_LENGTH (state_stack) != 0)
5716 {
5717 state = VLA_PTR (state_stack, VLA_PTR_LENGTH (state_stack) - 1);
5718 VLA_PTR_SHORTEN (state_stack, 1);
5719 advance_cycle_ainsn = NULL;
5720 for (ainsn = automaton->ainsn_list;
5721 ainsn != NULL;
5722 ainsn = ainsn->next_ainsn)
5723 if (ainsn->first_insn_with_same_reservs)
5724 {
5725 insn_reserv_decl = ainsn->insn_reserv_decl;
5726 if (insn_reserv_decl != DECL_INSN_RESERV (advance_cycle_insn_decl ))
5727 {
5728 /* We process alt_states in the same order as they are
5729 present in the description. */
5730 added_arc = NULL;
5731 for (alt_state = ainsn->alt_states;
5732 alt_state != NULL;
5733 alt_state = alt_state->next_alt_state)
5734 {
5735 state2 = alt_state->state;
5736 if (!intersected_state_reservs_p (state, state2))
5737 {
5738 state2 = states_union (state, state2, reservs_matter);
5739 if (!state2->it_was_placed_in_stack_for_NDFA_forming)
5740 {
5741 state2->it_was_placed_in_stack_for_NDFA_forming
5742 = 1;
5743 VLA_PTR_ADD (state_stack, state2);
5744 states_n++;
5745 if (progress_flag && states_n % 100 == 0)
5746 fprintf (stderr , ".");
5747 }
5748 added_arc = add_arc (state, state2, ainsn, 1);
5749 if (!ndfa_flag )
5750 break ;
5751 }
5752 }
5753 if (!ndfa_flag && added_arc != NULL)
5754 {
5755 added_arc->state_alts = 0;
5756 for (alt_state = ainsn->alt_states;
5757 alt_state != NULL;
5758 alt_state = alt_state->next_alt_state)
5759 {
5760 state2 = alt_state->state;
5761 if (!intersected_state_reservs_p (state, state2))
5762 added_arc->state_alts++;
5763 }
5764 }
5765 }
5766 else
5767 advance_cycle_ainsn = ainsn;
5768 }
5769 /* Add transition to advance cycle. */
5770 state2 = state_shift (state, reservs_matter);
5771 if (!state2->it_was_placed_in_stack_for_NDFA_forming)
5772 {
5773 state2->it_was_placed_in_stack_for_NDFA_forming = 1;
5774 VLA_PTR_ADD (state_stack, state2);
5775 states_n++;
5776 if (progress_flag && states_n % 100 == 0)
5777 fprintf (stderr , ".");
5778 }
5779 if (advance_cycle_ainsn == NULL)
5780 abort ();
5781 add_arc (state, state2, advance_cycle_ainsn, 1);
5782 }
5783 VLA_PTR_DELETE (state_stack);
5784 }
In this automaton, states track resource usage, and a state transition indicates the issue of an instruction. There are two types of state transition. First, a certain instruction can be issued under the state (that is, the instruction does not compete for resources with those already issued). Second, if no more instructions can be issued, we can do nothing but wait for the CPU to advance one cycle.
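The overall shape of this exploration can be sketched as a worklist over packed reservation bitmaps. The sketch below rests on strong simplifying assumptions: a state is a single 32-bit word holding 4 cycles of 8 units each, every insn has exactly one reservation, and exclusion, presence and absence sets, reservs_matter and the arcs themselves are ignored. The names toy_explore and TOY_MAX_STATES are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_STATES 64

/* A state packs the reservations of 4 cycles, 8 units each: cycle c
   lives in bits 8*c .. 8*c+7.  Advancing one cycle is a right shift
   by 8.  Returns the number of reachable states, stored in states[].  */
int
toy_explore (const uint32_t *insns, int insns_num,
             uint32_t states[TOY_MAX_STATES])
{
  int stack[TOY_MAX_STATES];
  int stack_len = 0, states_num = 0, i, j;

  states[states_num++] = 0;           /* empty start state */
  stack[stack_len++] = 0;
  while (stack_len != 0)
    {
      uint32_t state = states[stack[--stack_len]];

      /* One candidate target per issuable insn, then the cycle advance.  */
      for (i = 0; i <= insns_num; i++)
        {
          uint32_t next;

          if (i < insns_num)
            {
              if (state & insns[i])
                continue;             /* unit conflict: insn cannot issue */
              next = state | insns[i];
            }
          else
            next = state >> 8;        /* advance one cycle */
          /* Push the target only if it has not been seen before.  */
          for (j = 0; j < states_num; j++)
            if (states[j] == next)
              break;
          if (j == states_num)
            {
              assert (states_num < TOY_MAX_STATES);
              states[states_num] = next;
              stack[stack_len++] = states_num++;
            }
        }
    }
  return states_num;
}
```

The linear search plays the role of the it_was_placed_in_stack_for_NDFA_forming flag: each state is pushed at most once, which is what makes the loop terminate.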
The previous section shows that if first_insn_with_same_reservs is set, the ainsn is the first define_insn_reservation pattern in the machine description file that uses this units’ reservation. At that point the units may be available, so the reservation is worth trying. If first_insn_with_same_reservs is 0, another instruction class with the same reservation has already been handled, so the ainsn is skipped, and after the loop the state simply advances to the next cycle (line 5770).
Above, at lines 5736 and 5761, state stands for the source state and state2 for the reservation of the candidate alternative. Both describe the same cycles, so a transition is possible only if they do not compete for CPU units. intersected_state_reservs_p checks this: it returns nonzero if the two states compete for the same CPU unit in some cycle, and zero if no such competition is found, which means the specified instruction can be issued.
4168 static int
4169 intersected_state_reservs_p (state_t state1, state_t state2) in genautomata.c
4170 {
4171 if (state1->automaton != state2->automaton)
4172 abort ();
4173 return reserv_sets_are_intersected (state1->reservs, state2->reservs);
4174 }
Of course, these two states must belong to the same automaton. The reservs field of a state is the bitmap recording the usage of units in every cycle (the number of cycles is determined by the largest cycle number of all define_insn_reservation patterns), so if the condition at line 3881 is satisfied, the states conflict.
3867 static int
3868 reserv_sets_are_intersected (reserv_sets_t operand_1, in genautomata.c
3869 reserv_sets_t operand_2)
3870 {
3871 set_el_t *el_ptr_1;
3872 set_el_t *el_ptr_2;
3873 set_el_t *cycle_ptr_1;
3874 set_el_t *cycle_ptr_2;
3875
3876 if (operand_1 == NULL || operand_2 == NULL)
3877 abort ();
3878 for (el_ptr_1 = operand_1, el_ptr_2 = operand_2;
3879 el_ptr_1 < operand_1 + els_in_reservs ;
3880 el_ptr_1++, el_ptr_2++)
3881 if (*el_ptr_1 & *el_ptr_2)
3882 return 1;
3883 reserv_sets_or (temp_reserv , operand_1, operand_2);
3884 for (cycle_ptr_1 = operand_1, cycle_ptr_2 = operand_2;
3885 cycle_ptr_1 < operand_1 + els_in_reservs ;
3886 cycle_ptr_1 += els_in_cycle_reserv , cycle_ptr_2 += els_in_cycle_reserv )
3887 {
3888 for (el_ptr_1 = cycle_ptr_1, el_ptr_2 = get_excl_set (cycle_ptr_2);
3889 el_ptr_1 < cycle_ptr_1 + els_in_cycle_reserv ;
3890 el_ptr_1++, el_ptr_2++)
3891 if (*el_ptr_1 & *el_ptr_2)
3892 return 1;
3893 if (!check_presence_pattern_sets (cycle_ptr_1, cycle_ptr_2, FALSE))
3894 return 1;
3895 if (!check_presence_pattern_sets (temp_reserv + (cycle_ptr_2
3896 - operand_2),
3897 cycle_ptr_2, TRUE))
3898 return 1;
3899 if (!check_absence_pattern_sets (cycle_ptr_1, cycle_ptr_2, FALSE))
3900 return 1;
3901 if (!check_absence_pattern_sets (temp_reserv + (cycle_ptr_2 - operand_2),
3902 cycle_ptr_2, TRUE))
3903 return 1;
3904 }
3905 return 0;
3906 }
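The first loop of reserv_sets_are_intersected (lines 3878 to 3882) amounts to a word-by-word AND of the two bitmaps. A minimal sketch, assuming the bitmaps are plain arrays of 32-bit words; toy_reservs_intersect is a hypothetical name and words_num plays the role of els_in_reservs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the two reservation bitmaps share a set bit in any word,
   i.e. some unit is claimed by both in the same cycle; 0 otherwise.  */
int
toy_reservs_intersect (const uint32_t *a, const uint32_t *b, size_t words_num)
{
  size_t i;

  assert (a != NULL && b != NULL);
  for (i = 0; i < words_num; i++)
    if (a[i] & b[i])
      return 1;
  return 0;
}
```

This covers only the direct conflict; the exclusion, presence and absence checks in the second loop are separate steps.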
If these states don’t compete for resources, it is still too early to draw a conclusion; we need to check further whether there is a conflict due to units’ dependence or exclusion.
First check exclusion among units. In initiate_excl_sets, unit_excl_set_table is set according to the EXCLUSION_SET patterns present (see 9.5.8.2. Data initialization). It is a bitmap too, in effect a two-dimensional array of unit_num * unit_num bits (as exclusion is a symmetric relation, it is a symmetric matrix); if units 1 and 2 exclude each other, the bits at positions [1][2] and [2][1] are 1.
4582 static reserv_sets_t
4583 get_excl_set (reserv_sets_t in_set)
4584 {
4585 int excl_char_num;
4586 int chars_num;
4587 int i;
4588 int start_unit_num;
4589 int unit_num;
4590
4591 chars_num = els_in_cycle_reserv * sizeof (set_el_t);
4592 memset (excl_set , 0, chars_num);
4593 for (excl_char_num = 0; excl_char_num < chars_num; excl_char_num++)
4594 if (((unsigned char *) in_set) [excl_char_num])
4595 for (i = CHAR_BIT - 1; i >= 0; i--)
4596 if ((((unsigned char *) in_set) [excl_char_num] >> i) & 1)
4597 {
4598 start_unit_num = excl_char_num * CHAR_BIT + i;
4599 if (start_unit_num >= description ->units_num)
4600 return excl_set ;
4601 for (unit_num = 0; unit_num < els_in_cycle_reserv ; unit_num++)
4602 {
4603 excl_set [unit_num]
4604 |= unit_excl_set_table [start_unit_num] [unit_num];
4605 }
4606 }
4607 return excl_set ;
4608 }
At lines 4594 and 4596, only reserved units are searched for excluding units. The rows of unit_excl_set_table for those units are OR-ed into excl_set, which therefore becomes the bitmap of all units excluded by the units reserved in in_set. If any of these excluded units appears in the other state, the two states cannot be combined.
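The exclusion check can be sketched for a single cycle held in one word. This is an illustration under assumptions, not the real layout: TOY_UNITS, toy_get_excl_set and toy_excl_conflict are hypothetical names, and excl_table is a one-word-per-unit analogue of unit_excl_set_table.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_UNITS 4   /* assumed toy unit count */

/* excl_table[u] is the bitmap of units that exclude unit u.  Return the
   union of the rows of every unit reserved in in_set.  */
uint32_t
toy_get_excl_set (uint32_t in_set, const uint32_t excl_table[TOY_UNITS])
{
  uint32_t excl_set = 0;
  int unit;

  assert ((in_set >> TOY_UNITS) == 0);   /* no bits beyond the known units */
  for (unit = 0; unit < TOY_UNITS; unit++)
    if ((in_set >> unit) & 1)
      excl_set |= excl_table[unit];
  return excl_set;
}

/* Nonzero if some unit reserved in a is excluded by a unit reserved in b.  */
int
toy_excl_conflict (uint32_t a, uint32_t b,
                   const uint32_t excl_table[TOY_UNITS])
{
  return (a & toy_get_excl_set (b, excl_table)) != 0;
}
```

With units 0 and 1 declared mutually exclusive, a state reserving unit 1 conflicts with a state reserving unit 0 even though their reservation bitmaps do not overlap.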
Besides the exclusion relation, dependence must be checked. At line 3883, the combination of the two states is first stored in temp_reserv. From 8.1.3. Overview of DEFINE_INSN_RESERVATION pattern, we know that presence and absence relationships have final and non-final variants. The final variant is checked against the target state, while the non-final variant is checked against the source state, so temp_reserv serves as the assumed target state. The expression “temp_reserv + (cycle_ptr_2 - operand_2)” above gives the bitmap of units’ reservation of the assumed target state in the specified cycle.
The task of check_presence_pattern_sets is quite clear: for every unit reserved in origional_set, it looks up the dependent units in unit_final_presence_set_table or unit_presence_set_table. If, for each such unit, all the units of at least one of its presence patterns are present in checked_set, the two states may be compatible.
4692 static int
4693 check_presence_pattern_sets (reserv_sets_t checked_set,
4694 reserv_sets_t origional_set,
4695 int final_p)
4696 {
4697 int char_num;
4698 int chars_num;
4699 int i;
4700 int start_unit_num;
4701 int unit_num;
4702 int presence_p;
4703 pattern_reserv_t pat_reserv;
4704
4705 chars_num = els_in_cycle_reserv * sizeof (set_el_t);
4706 for (char_num = 0; char_num < chars_num; char_num++)
4707 if (((unsigned char *) origional_set) [char_num])
4708 for (i = CHAR_BIT - 1; i >= 0; i--)
4709 if ((((unsigned char *) origional_set) [char_num] >> i) & 1)
4710 {
4711 start_unit_num = char_num * CHAR_BIT + i;
4712 if (start_unit_num >= description ->units_num)
4713 break ;
4714 if ((final_p
4715 && unit_final_presence_set_table [start_unit_num] == NULL)
4716 || (!final_p
4717 && unit_presence_set_table [start_unit_num] == NULL))
4718 continue ;
4719 presence_p = FALSE;
4720 for (pat_reserv = (final_p
4721 ? unit_final_presence_set_table [start_unit_num]
4722 : unit_presence_set_table [start_unit_num]);
4723 pat_reserv != NULL;
4724 pat_reserv = pat_reserv->next_pattern_reserv)
4725 {
4726 for (unit_num = 0; unit_num < els_in_cycle_reserv ; unit_num++)
4727 if ((checked_set [unit_num] & pat_reserv->reserv [unit_num])
4728 != pat_reserv->reserv [unit_num])
4729 break ;
4730 presence_p = presence_p || unit_num >= els_in_cycle_reserv ;
4731 }
4732 if (!presence_p)
4733 return FALSE;
4734 }
4735 return TRUE;
4736 }
The process in check_absence_pattern_sets is similar: the two states may be compatible only if no absence pattern of a reserved unit has all of its units present in checked_set.
4741 static int
4742 check_absence_pattern_sets (reserv_sets_t checked_set,
4743 reserv_sets_t origional_set,
4744 int final_p)
4745 {
4746 int char_num;
4747 int chars_num;
4748 int i;
4749 int start_unit_num;
4750 int unit_num;
4751 pattern_reserv_t pat_reserv;
4752
4753 chars_num = els_in_cycle_reserv * sizeof (set_el_t);
4754 for (char_num = 0; char_num < chars_num; char_num++)
4755 if (((unsigned char *) origional_set) [char_num])
4756 for (i = CHAR_BIT - 1; i >= 0; i--)
4757 if ((((unsigned char *) origional_set) [char_num] >> i) & 1)
4758 {
4759 start_unit_num = char_num * CHAR_BIT + i;
4760 if (start_unit_num >= description ->units_num)
4761 break ;
4762 for (pat_reserv = (final_p
4763 ? unit_final_absence_set_table [start_unit_num]
4764 : unit_absence_set_table [start_unit_num]);
4765 pat_reserv != NULL;
4766 pat_reserv = pat_reserv->next_pattern_reserv)
4767 {
4768 for (unit_num = 0; unit_num < els_in_cycle_reserv ; unit_num++)
4769 if ((checked_set [unit_num] & pat_reserv->reserv [unit_num])
4770 != pat_reserv->reserv [unit_num]
4771 && pat_reserv->reserv [unit_num])
4772 break ;
4773 if (unit_num >= els_in_cycle_reserv )
4774 return FALSE;
4775 }
4776 }
4777 return TRUE;
4778 }
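The two pattern checks differ only in what a fully matched pattern means. A minimal sketch for the pattern list of one reserved unit, assuming one cycle fits in a single word; toy_presence_ok and toy_absence_ok are hypothetical names, and the outer loop over reserved units is omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Presence: at least one pattern must be entirely contained in checked.
   A unit without presence patterns imposes no requirement.  */
int
toy_presence_ok (uint32_t checked, const uint32_t *patterns, int patterns_num)
{
  int i;

  assert (patterns_num >= 0);
  if (patterns_num == 0)
    return 1;
  for (i = 0; i < patterns_num; i++)
    if ((checked & patterns[i]) == patterns[i])
      return 1;
  return 0;
}

/* Absence: no pattern may be entirely contained in checked.  */
int
toy_absence_ok (uint32_t checked, const uint32_t *patterns, int patterns_num)
{
  int i;

  assert (patterns_num >= 0);
  for (i = 0; i < patterns_num; i++)
    if ((checked & patterns[i]) == patterns[i])
      return 0;
  return 1;
}
```

A pattern with several units matches only when all of them are reserved, so a presence pattern is satisfied by full containment and an absence pattern is violated by it.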
If the states can be combined, states_union at line 5738 combines them into a new state. If this state has not been seen before, it is pushed onto state_stack as a starting point for a later step.
4179 static state_t
4180 states_union (state_t state1, state_t state2, reserv_sets_t reservs)
4181 {
4182 state_t result;
4183 state_t state_in_table;
4184
4185 if (state1->automaton != state2->automaton)
4186 abort ();
4187 result = get_free_state (1, state1->automaton);
4188 reserv_sets_or (result->reservs, state1->reservs, state2->reservs);
4189 reserv_sets_and (result->reservs, result->reservs, reservs);
4190 state_in_table = insert_state (result);
4191 if (result != state_in_table)
4192 {
4193 free_state (result);
4194 result = state_in_table;
4195 }
4196 return result;
4197 }
The parameter reservs above is built by form_reservs_matter and indicates the units’ reservations that matter for the automaton. The union of the reservations of the two states may exceed it, hence the bitwise AND at line 4189.
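The combination of OR, mask and table lookup can be sketched as follows. It is an illustration under assumptions: states are identified by their index in a small array, a linear search stands in for the table lookup behind insert_state, and all toy_ names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CYCLES 3
#define TOY_MAX_STATES 16

struct toy_state { uint32_t reservs[TOY_CYCLES]; };

struct toy_table
{
  struct toy_state states[TOY_MAX_STATES];
  int states_num;
};

/* Return the index of the state with these reservations, adding it to
   the table if it is not there yet.  */
int
toy_insert_state (struct toy_table *table, const uint32_t reservs[TOY_CYCLES])
{
  const size_t size = sizeof (uint32_t) * TOY_CYCLES;
  int i;

  for (i = 0; i < table->states_num; i++)
    if (memcmp (table->states[i].reservs, reservs, size) == 0)
      return i;
  assert (table->states_num < TOY_MAX_STATES);
  memcpy (table->states[table->states_num].reservs, reservs, size);
  return table->states_num++;
}

/* OR the reservations of two states, drop the bits that do not matter,
   and return the unique table entry for the result.  */
int
toy_states_union (struct toy_table *table, int s1, int s2,
                  const uint32_t matter[TOY_CYCLES])
{
  uint32_t result[TOY_CYCLES];
  int c;

  assert (s1 >= 0 && s1 < table->states_num);
  assert (s2 >= 0 && s2 < table->states_num);
  for (c = 0; c < TOY_CYCLES; c++)
    result[c] = (table->states[s1].reservs[c]
                 | table->states[s2].reservs[c]) & matter[c];
  return toy_insert_state (table, result);
}
```

Because the result is looked up before being added, two different paths that lead to the same reservations end in the same state, which keeps the number of states finite.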
Next we need to record this transition to this new state.
4319 static arc_t
4320 add_arc (state_t from_state, state_t to_state, ainsn_t ainsn, in genautomata.c
4321 int state_alts)
4322 {
4323 arc_t new_arc;
4324
4325 new_arc = find_arc (from_state, to_state, ainsn);
4326 if (new_arc != NULL)
4327 return new_arc;
4328 if (first_free_arc == NULL)
4329 {
4330 #ifndef NDEBUG
4331 allocated_arcs_num ++;
4332 #endif
4333 new_arc = create_node (sizeof (struct arc ));
4334 new_arc->to_state = NULL;
4335 new_arc->insn = NULL;
4336 new_arc->next_out_arc = NULL;
4337 }
4338 else
4339 {
4340 new_arc = first_free_arc ;
4341 first_free_arc = first_free_arc ->next_out_arc;
4342 }
4343 new_arc->to_state = to_state;
4344 new_arc->insn = ainsn;
4345 ainsn->arc_exists_p = 1;
4346 new_arc->next_out_arc = from_state->first_out_arc;
4347 from_state->first_out_arc = new_arc;
4348 new_arc->next_arc_marked_by_insn = NULL;
4349 new_arc->state_alts = state_alts;
4350 return new_arc;
4351 }
arc is defined below. All arcs leaving the same state are linked by next_out_arc, and to_state indicates the target of the transition.
1140 struct arc in genautomata.c
1141 {
1142 /* The following field refers for the state into which given arc
1143 enters. */
1144 state_t to_state;
1145 /* The following field describes that the insn issue (with cycle
1146 advancing for special insn `cycle advancing' and without cycle
1147 advancing for others) makes transition from given state to
1148 another given state. */
1149 ainsn_t insn;
1150 /* The following field value is the next arc output from the same
1151 state. */
1152 arc_t next_out_arc;
1153 /* List of arcs marked given insn is formed with the following
1154 field. The field is used in transformation NDFA -> DFA. */
1155 arc_t next_arc_marked_by_insn;
1156 /* The following field is defined if NDFA_FLAG is zero. The member
1157 value is number of alternative reservations which can be used for
1158 transition for given state by given insn. */
1159 int state_alts;
1160 };
Arc instances must have a one-to-one relationship with the triple (source state, target state, insn). This decision has a profound impact on automata minimization later. The uniqueness is guaranteed by find_arc.
4305 static arc_t
4306 find_arc (state_t from_state, state_t to_state, ainsn_t insn)
4307 {
4308 arc_t arc;
4309
4310 for (arc = first_out_arc (from_state); arc != NULL; arc = next_out_arc (arc))
4311 if (arc->to_state == to_state && arc->insn == insn)
4312 return arc;
4313 return NULL;
4314 }
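The find-then-add pattern that keeps arcs unique can be sketched with a plain singly linked list. In this illustration states and insns are reduced to integers, the free-list reuse of add_arc is replaced by malloc, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_arc
{
  int to_state;                   /* target of the transition */
  int insn;                       /* insn that labels the transition */
  struct toy_arc *next_out_arc;   /* next arc leaving the same state */
};

struct toy_state
{
  struct toy_arc *first_out_arc;
};

/* Return the arc from FROM to TO_STATE labelled INSN, or NULL.  */
struct toy_arc *
toy_find_arc (const struct toy_state *from, int to_state, int insn)
{
  struct toy_arc *arc;

  for (arc = from->first_out_arc; arc != NULL; arc = arc->next_out_arc)
    if (arc->to_state == to_state && arc->insn == insn)
      return arc;
  return NULL;
}

/* Return the existing arc if there is one; otherwise create a new arc
   and link it at the head of the out-arc list of FROM.  */
struct toy_arc *
toy_add_arc (struct toy_state *from, int to_state, int insn)
{
  struct toy_arc *arc = toy_find_arc (from, to_state, insn);

  if (arc != NULL)
    return arc;
  arc = malloc (sizeof *arc);
  assert (arc != NULL);
  arc->to_state = to_state;
  arc->insn = insn;
  arc->next_out_arc = from->first_out_arc;
  from->first_out_arc = arc;
  return arc;
}
```

Adding the same (target, insn) pair twice returns the same arc, while a new pair is prepended, so the list order is the reverse of the creation order.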
Above, at line 5749, ndfa_flag, if nonzero, indicates that the “-ndfa” option is in use (alternatives are handled in the nondeterministic way). In the nondeterministic way, every possible transition is recorded for all alternatives of one define_insn_reservation; in the deterministic way, only the first qualified alternative is recorded, together with the number of all possible transitions (transitions other than the first are never taken). For example, assume that a define_insn_reservation has three alternatives: A, B, and C. From the start state S, the nondeterministic automaton gets the following arcs:
S → A (can transit, create state SA, push in state_stack)
S → B (can transit, create state SB, push in state_stack)
S → C (can transit, create state SC, push in state_stack)
While in a deterministic automaton, an arc is created only for the first possible transition. For the example above, we get:
S → A (can transit, create state SA, push in state_stack), with state_alts of the arc being 3.
Assume that the next define_insn_reservation has two alternatives, E and F. For the nondeterministic automaton we get:
S → E (can transit, create state SE, push in state_stack)
S → F (can transit, create state SF, push in state_stack)
And for the deterministic automaton we get:
S → E (can transit, create state SE, push in state_stack), with state_alts of the arc being 2.
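The difference between the two modes can be sketched as a selection over the alternatives of one reservation. This is a simplified illustration: a state and each alternative are single-word bitmaps, compatibility is a plain AND test, and toy_pick_alternatives with its parameters is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Store in picked[] the indexes of the alternatives that get an arc
   from STATE and return how many there are.  With ndfa_flag zero only
   the first compatible alternative is picked.  *compatible_num receives
   the number of compatible alternatives, which is the value the
   deterministic branch keeps in state_alts of its single arc.  */
int
toy_pick_alternatives (uint32_t state, const uint32_t *alts, int alts_num,
                       int ndfa_flag, int *picked, int *compatible_num)
{
  int i, picked_num = 0;

  assert (alts_num >= 0);
  *compatible_num = 0;
  for (i = 0; i < alts_num; i++)
    if ((state & alts[i]) == 0)       /* no unit conflict */
      {
        (*compatible_num)++;
        if (ndfa_flag || picked_num == 0)
          picked[picked_num++] = i;
      }
  return picked_num;
}
```

For an empty state and three non-conflicting alternatives, the nondeterministic mode picks all three, while the deterministic mode picks only the first and reports a count of 3, matching the S → A case above.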
So the FOR loop at line 5720 tries to find the possible transitions from the current state. Then, at line 5770 in make_automaton, state_shift advances the current state by one cycle by shifting the content of the reservs field; of course, a new state and arc may have to be created for this transition. Once created, the state (here state2) is put in state_stack to serve as a source state later. (Now state_stack holds SA, SB, SC, SE, SF for the nondeterministic automaton, and SA, SE for the deterministic automaton. Notice that the cycle advance adds no new state here: shifting the empty start state gives the start state again, which is already marked as placed in the stack.)
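The cycle advance can be sketched on an array with one word per cycle. The sketch assumes that shifting drops cycle 0, moves every later cycle one slot earlier and leaves the last cycle empty, and that the result is then restricted to the bits that matter; toy_state_shift and TOY_CYCLES are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CYCLES 3

/* Advance OPERAND by one cycle into RESULT: the reservation of cycle
   c + 1 becomes the reservation of cycle c, masked by MATTER, and the
   last cycle becomes empty.  */
void
toy_state_shift (uint32_t result[TOY_CYCLES],
                 const uint32_t operand[TOY_CYCLES],
                 const uint32_t matter[TOY_CYCLES])
{
  int c;

  assert (result != operand);
  for (c = 0; c + 1 < TOY_CYCLES; c++)
    result[c] = operand[c + 1] & matter[c];
  result[TOY_CYCLES - 1] = 0;
}
```

Shifting an all-zero state yields an all-zero state, which is why the cycle advance from the start state leads back to the start state.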
In the next run of the WHILE loop at line 5715, SF is fetched from state_stack. For the nondeterministic automaton we get the following transitions (remember that in a valid automaton declaration the alternatives must share certain CPU units):
F → E (NOT possible, intersected_state_reservs_p returns 1)
F → A (create FA, push in state_stack, assuming no intersection of CPU units)
F → B (create FB, push in state_stack, assuming no intersection of CPU units)
F → C (create FC, push in state_stack, assuming no intersection of CPU units)
F → F1 (advancing one cycle, create F1, push in state_stack)
While for the deterministic automaton we get the following transitions:
F → E (NOT possible, intersected_state_reservs_p returns 1)
F → A (create FA, push in state_stack, assuming no intersection of CPU units)
F → F1 (advancing one cycle, create F1, push in state_stack)
Then in the next run, F1 is fetched, and we get the following transitions:
F1 → A (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → B (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → C (create F1C, push in state_stack, assuming no intersection of CPU units)
F1 → E (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → F (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → F2 (advancing F1 one cycle, create F2, push in state_stack)
And for the deterministic automaton, F1 is fetched as well, and we get the following transitions:
F1 → A (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → B (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → C (create F1C, push in state_stack, assuming no intersection of CPU units)
F1 → E (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → F (NOT possible, assume intersected_state_reservs_p returns 1)
F1 → F2 (advancing F1 one cycle, create F2, push in state_stack)
At that point, the arcs have been linked into the states by add_arc, giving the following figure. In the figure, arc* records the information of a transition, field next_out_arc links all transitions departing from the same state, and field to_state refers to the destination state.
figure 73 : building DFA, stage 1
In this way, all possible transitions are found and recorded as arcs. In the ordinary sense, the automaton has been built. However, if alternatives are present, an NDFA may have been created by the “-ndfa” option. An NDFA is not what we want; it has to be transformed into a DFA, as described next.