Introduction
This paper can only give a rough summary of the text formatting component of StarWriter. Many important concepts and aspects are not yet discussed here, but maybe they will be one day, since this paper is work in progress. I hope this will be useful for somebody.
Text Formatting - Main Objectives
Calculation of Paragraph Sizes
The main objective of the text formatting process is the calculation of paragraph sizes. Depending on the direction of writing (left to right in Western countries, top to bottom in the Asian world), either a width or a height is given by the environment; the other value results from the number and sizes of the lines in the paragraph. For this, line breaks have to be calculated. If there is not sufficient space for a paragraph, it has to be split into several parts. The information obtained from the text formatting process is cached to improve performance. More on this objective can be found in section 9.1.
Visualization
In addition, the text formatting is responsible for repainting those parts of a paragraph that overlap with a given rectangle. The rectangular areas to be repainted are obtained from the layout. For this, an output device (usually a monitor or printer) is selected. The output device knows about different fonts and how to paint them. A more detailed description of repaint events is given in section 9.2.
Calculation of Character Positions
A distinction has to be made between document coordinates and paragraph positions. While document coordinates refer to the more physical aspects of pages, paragraph positions refer to the logical position in the string representing a paragraph. Basically, a paragraph is a string containing Unicode characters. The text formatting maintains both values and must be able to convert document coordinates into paragraph positions and vice versa. For example, when the user clicks into the document with the mouse, the corresponding position in the paragraph has to be calculated, enabling the user to modify the paragraph at the chosen position. Conversely, paragraph positions are converted into document coordinates each time the cursor position has to be updated. Both conversions are combined when the cursorUp (cursorDown) key is pressed. First, the current paragraph position is transformed into document coordinates. Next, the y-coordinate is shifted to the line above (below) the current line, and finally the resulting point is converted back to a paragraph position. This is also discussed in section 9.3.
Invalidation
As mentioned above, formatting information is cached for performance reasons. Changing text, modifying attributes, or moving a paragraph within the document invalidates this information, and it has to be recalculated.
Prerequisites – Where do we start?
Document Coordinates
A paragraph has access to its x- and y-position, which refer to the physical aspect of a window. (0,0) is the upper left corner of the output window; the default value for the upper left text border on the first page of a document is (1702, 1702). Note that these coordinates do not depend on the current resolution. They are measured in twips, i.e., 1/20 pt.
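To make the unit concrete, the following sketch converts twips into points and millimeters. The helper names are chosen for illustration only and are not part of the StarWriter sources; the conversion relies on the usual definition of 72 points per inch.

```cpp
// Illustrative helpers (hypothetical names, not StarWriter API).
// 1 twip = 1/20 pt, and 1 pt = 1/72 inch, so 1 inch = 1440 twips.
constexpr long kTwipsPerPoint = 20;
constexpr long kTwipsPerInch = 72 * kTwipsPerPoint;

constexpr double twipsToPoints(long twips)
{
    return static_cast<double>(twips) / kTwipsPerPoint;
}

constexpr double twipsToMillimeters(long twips)
{
    return static_cast<double>(twips) * 25.4 / kTwipsPerInch;
}
```

With these helpers, the default text border of 1702 twips corresponds to 85.1 pt, or roughly 30 mm.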
Width and Height of the Environment
A paragraph has to fit into its environment. If a fixed width is given, a paragraph can make a request for more height, if necessary. If this cannot be granted, the paragraph has to be split.
Character Attributes
Usually, character attributes (e.g., font, color, size...) have start and end values referring to paragraph positions. Each paragraph has an array storing character attributes, which determine how different words or characters of the paragraph are displayed or printed.
Paragraph Attributes
Paragraph attributes (e.g., margins, line spacing, hyphenation...) are associated with a paragraph. These are the default attributes for the whole paragraph.
Output Device and Font
Formatting information refers to a reference device, usually the installed printer. HTML documents require the current window to be the reference device.
To calculate line breaks, mainly width and height of words are required. For this, font objects can be used. Font objects store a font family, font size, font style, etc. A font object can be passed to an output device, which returns the width and height of a given word.
Break Iterator
A BreakIterator is used to find possible break positions in a given string. The string and the index of the first character that does not fit into the line are handed over to the BreakIterator. If hyphenation is disabled, the returned value is the end of the last word that fits into this line. If hyphenation is enabled, the end of the last syllable that fits into the line can be obtained from the result.
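The contract for the non-hyphenating case can be sketched as follows. This is a simplified stand-in, not the real BreakIterator interface: the function name `lastWordEnd` is hypothetical, only the space character is treated as a delimiter, and hyphenation is not modelled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the break iterator (hyphenation disabled).
// `firstNotFitting` is the index of the first character that does not fit.
// Returns the index one past the last complete word that fits,
// or 0 if not even one complete word fits.
std::size_t lastWordEnd(const std::string& text, std::size_t firstNotFitting)
{
    std::size_t pos = firstNotFitting < text.size() ? firstNotFitting : text.size();

    // If the overflow position lies inside a word, step back to the word start.
    if (pos < text.size() && text[pos] != ' ') {
        while (pos > 0 && text[pos - 1] != ' ')
            --pos;
    }
    // Skip the delimiters in front of it to reach the end of the previous word.
    while (pos > 0 && text[pos - 1] == ' ')
        --pos;
    return pos;
}
```

For the string "hello brave world", an overflow at index 9 (inside "brave") yields 5, the end of "hello", while an overflow at index 11 (the space after "brave") yields 11.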
Portions - The Basic Concept of Text Formatting
The starting point for text formatting is an SwTxtFrm object, which receives calls when user interventions take place or parts of a document have to be repainted. The SwTxtFrm object initiates the construction of a data structure that efficiently supports the repaint, reformat, and invalidation processes. This data structure mainly consists of so-called text portions and is accessible via an SwParaPortion object.
A text frame has access to an SwTxtNode object, which basically represents the text string (XubString) and the attribute array of this paragraph (illustration 1). Once it has been generated, the SwParaPortion data structure with the formatting information is stored in a cache memory for efficiency reasons.
An SwParaPortion object itself consists of several further text portions. Lines are represented by SwLineLayout objects; each line consists of different derivatives of the SwLinePortion class (illustration 2). Note that text portions do not store words or characters. They only represent a formatted part of the text. For this, they store their width, height, and the maximum ascent of characters, all referring to document coordinates. They also store the number of characters (length) they represent. Note that there are portions with width = 0 and length = 1 (e.g., hole portions used to swallow spaces at the end of a line) and portions with width ≠ 0 and length = 0 (e.g., portions used to represent notes).
Every portion derives from SwLinePortion, which in turn derives from SwPosSize. The main types of portions and their attributes are introduced in the following sections.
SwPosSize
|
---|
USHORT nWidth // portion width in document coordinates |
USHORT nHeight // portion height in document coordinates |
SwLinePortion
This class represents a basic text portion.
|
---|
SwLinePortion pPortion // pointer to next portion |
xub_StrLen nLineLength // # characters and spaces represented by this portion |
USHORT nAscent // maximum ascent |
insert(), append() // insert and append other portions |
USHORT GetWhichPortion() // portion identification |
virtual void Paint(SwTxtPaintInfo) = 0 // paint the portion |
virtual sal_Bool Format(SwTxtFormatInfo) // format the portion |
SwTxtPortion
These portions represent parts of the paragraph string. They provide functionality for calculating line breaks with respect to the given environment.
|
---|
BreakLine(SwTxtFormatInfo, SwTxtGuess) // break a line |
SwLineLayout
In addition, this class has a pointer to the next line in the paragraph. It can be regarded as representing one line of text.
|
---|
SwLineLayout pNext // pointer to next line |
USHORT nRealHeight // height of this line including line spacing |
SwParaPortion
This portion represents the paragraph text. The SwRepaint and SwCharRange objects are updated when the corresponding events are triggered.
|
---|
SwRepaint aRepaint // region to repaint |
SwCharRange aReformat // paragraph position to reformat |
Other Portions
There are many other portions, which serve very special purposes. Among these are portions representing
- fields (SwExpandPortion)
- enumeration (SwNumberPortion)
- line breaks (SwBreakPortion)
- tabs (SwTabPortion)
and many others. Have a look at their specifications in the *.hxx files.
Attributes
Within one paragraph it is (of course) possible to apply different formatting attributes to different parts of the text (illustration 3). The next sections explain the handling of attributes.
Attribute Arrays
Each attribute has a start value, pointing to a position in the paragraph string. Usually attributes also have end values, indicating at which position they become invalid. Portion borders are affected by attribute changes, i.e., a change of attributes always requires the beginning of a new portion. Illustration 4 depicts portion borders and attribute changes with respect to the example given in illustration 3. Note that the shown portions can be smaller than in the illustration, but never overlap with an attribute change. Attributes are organized in an attribute array, sorted by their start value.
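The rule that every attribute change forces a portion border can be illustrated with a small sketch. The `Attr` struct and the function name are assumptions made for this example; the real attribute array stores considerably more information, and real portions may be split further for other reasons (line ends, script changes).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified attribute: valid for paragraph positions [start, end).
struct Attr {
    std::size_t start;
    std::size_t end;
};

// Returns the sorted positions at which a new portion has to begin or end
// because an attribute starts or ends there (plus paragraph start and end).
std::vector<std::size_t> portionBorders(const std::vector<Attr>& attrs,
                                        std::size_t textLen)
{
    std::vector<std::size_t> borders{0, textLen};
    for (const Attr& a : attrs) {
        borders.push_back(std::min(a.start, textLen));
        borders.push_back(std::min(a.end, textLen));
    }
    std::sort(borders.begin(), borders.end());
    borders.erase(std::unique(borders.begin(), borders.end()), borders.end());
    return borders;
}
```

Two overlapping attributes covering [3, 8) and [5, 12) in a paragraph of 20 characters yield the borders 0, 3, 5, 8, 12, 20, i.e., at least five portions.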
Attribute Handler and Attribute Stacks
Attributes influence the painting and formatting process. During painting, the portions representing the text to print are traversed, collecting for each portion the attributes being set. These attributes are used to generate a font object, which in turn is responsible for printing the text represented by this portion during output mode. During formatting, the portions are set up and their width, height and number of characters are determined considering the font generated for this portion.
To generate the appropriate font for a given set of attributes, an attribute handler is used. An attribute handler consists of a number of attribute stacks, one stack for each kind of attribute. When the start of an attribute is reached during the traversal of the attribute array, the attribute is pushed onto the appropriate stack and the current font is changed according to this attribute. When the end of an attribute is reached in the attribute array, the attribute is popped from its stack and the remaining top attribute on this stack is used to change the font again. The state of the attribute stack collection during the traversal of the text represents the state of the current font. The attribute stacks are initialized with default attributes, which are specified for the whole paragraph.
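The push/pop behaviour can be sketched as below. This is a minimal illustration, not the actual attribute handler: the class and member names are hypothetical, only two attribute kinds are modelled, attribute values are plain strings, and attributes of the same kind are assumed to be properly nested so that the ending attribute is always on top of its stack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Two attribute kinds are enough for the illustration.
enum class AttrKind : std::size_t { Weight = 0, Color = 1 };
constexpr std::size_t kAttrKindCount = 2;

// Simplified "font": just the values controlled by the attributes.
struct Font {
    std::string weight;
    std::string color;
};

class AttrHandler {
public:
    // The bottom entry of each stack is the paragraph default; it is never popped.
    AttrHandler(const std::string& defaultWeight, const std::string& defaultColor)
    {
        stack(AttrKind::Weight).push_back(defaultWeight);
        stack(AttrKind::Color).push_back(defaultColor);
    }

    // Start of an attribute reached: push it onto the stack of its kind.
    void push(AttrKind kind, const std::string& value) { stack(kind).push_back(value); }

    // End of an attribute reached: pop it, uncovering the previous value.
    void pop(AttrKind kind)
    {
        assert(stack(kind).size() > 1 && "default attribute must stay on the stack");
        stack(kind).pop_back();
    }

    // The current font is determined by the top entry of every stack.
    Font currentFont() const { return Font{stacks_[0].back(), stacks_[1].back()}; }

private:
    std::vector<std::string>& stack(AttrKind kind)
    {
        return stacks_[static_cast<std::size_t>(kind)];
    }

    std::array<std::vector<std::string>, kAttrKindCount> stacks_;
};
```

After pushing a "bold" weight onto a handler initialized with "normal" and "black", the current font is bold and black; popping the weight attribute restores "normal" without touching the color stack.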
Font Objects
Font objects are generated during the traversal of the set of portions representing the paragraph. The current font object is attached to the output device and destroyed when it becomes outdated. The hierarchy of different font classes is shown in illustration 5.
To take Asian or other languages into consideration, an SwFont object consists of three SwSubFonts (Latin, CJK, and CTL[1]). The SwFont::nActual field indicates the current script, i.e., the currently active subfont.
Font objects are modified by character attributes. The attribute handler (see section 5.2) changes the current font object for each attribute pushed or popped from the stacks.
Font objects are able to calculate the width (in document coordinates) of a given string. This is used for the determination of line breaks and portion sizes.
The Script Type Data Structure
It is possible to define different fonts for different scripts, e.g., to have a Times New Roman, 12 pt. font for "Latin" characters, while "Asian" characters are shown using an "Andale UT", 20 pt. font. For this, it is essential to know the ranges of the different script types. The SwScriptInfo class is a data structure maintaining this information.
Internally, two arrays are used, one for the position of the next script change, another for the type of script (illustration 6).
An SwScriptInfo object is part of each paragraph and has to be updated when a new character is entered. Referring to the example in illustration 6, entering a character at position 39 invalidates the array at positions 1, 2, and 3. A change of script type necessarily means a portion change, since different fonts are used for different script types.
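A minimal sketch of such a two-array structure is given below. The exact layout of SwScriptInfo is not reproduced here; the struct, its member names, and the convention that entry i of the first array holds the exclusive end position of run i are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Script { Latin, Asian, Complex };

// Hypothetical two-array script table:
// changePos[i] is the position at which run i ends (exclusive),
// type[i] is the script type of run i.
struct ScriptInfo {
    std::vector<std::size_t> changePos;
    std::vector<Script> type;

    // Index of the run containing `pos`; positions past the end map to the last run.
    std::size_t runAt(std::size_t pos) const
    {
        assert(!type.empty() && type.size() == changePos.size());
        const auto it = std::upper_bound(changePos.begin(), changePos.end(), pos);
        const std::size_t run = static_cast<std::size_t>(it - changePos.begin());
        return run < type.size() ? run : type.size() - 1;
    }

    Script scriptAt(std::size_t pos) const { return type[runAt(pos)]; }

    // Position of the next script change at or after `pos`.
    std::size_t nextChange(std::size_t pos) const { return changePos[runAt(pos)]; }
};
```

Because the change positions are sorted, a binary search is sufficient to find both the script at a position and the next position at which a portion border is required.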
Attribute Iterators
Two associated kinds of objects are involved in all processes that operate on portions (format, paint, cursor positioning):
- Attribute Iterators (SwAttrIter)
- Text Information (SwTxtInfo)
During one of these processes, the SwTxtFrm object generates an iterator and an info object. Depending on the current action, the iterator can be an SwTxtPainter, an SwTxtFormatter, or an SwTxtCursor iterator. These iterators traverse the portions of a paragraph; at the same time they search the attribute array for attribute changes. SwTxtInfo objects are used to communicate information between iterators and portions. The SwTxtInfo class is introduced in chapter 8.
These are the most frequently used iterator classes:
SwAttrIter
Base class for all iterators.
|
---|
SwFont pFnt // font object, results from evaluating attributes |
SwAttrSet pAttrSet // attribute set |
OutputDevice pLastOut |
xub_StrLen nStartIndex, nEndIndex // indices to attribute array |
xub_StrLen nPos // index to string, last position that has been looked up |
void Chg(SwTxtAttr pHt) // pushes the attribute onto the appropriate stack and changes the font |
void Rst(SwTxtAttr pHt) // pops the attribute from its stack |
xub_StrLen GetNextAttr( ) // next attribute change position |
sal_Bool Seek(xub_StrLen nPos) // changes font member, considering attributes at position nPos |
sal_Bool SeekAndChg(xub_StrLen nPos, OutputDevice pOut) // changes font member and changes font at output device according to attributes at position nPos |
SwTxtIter
The SwTxtIter class is derived from SwAttrIter. It can be regarded as an iterator with two objectives: iterating over attributes in the attribute array and iterating over lines of a paragraph.
|
---|
xub_StrLen nStart // start position of current line, updated during iteration |
SwLineLayout pCurr // current line |
SwLineLayout pPrev // previous line |
SwLineLayout GetNext() // pCurr->GetNext() |
void CharToLine(xub_StrLen) // sets line iterator to first line intersecting a specified position in text string |
SwLineLayout TwipsToLine(SwTwips) // sets line iterator to first line intersecting a specified position in document coordinates |
void CalcRealHeight(sal_Bool bNewLine) |
void CalcAscentAndHeight(KSHORT rAscent, KSHORT rHeight) |
SwTxtCursor
This iterator is used for cursor positioning purposes. It is generated by the text frame in case a repositioning of the cursor is necessary. Have a look at section 9.3 for more details.
|
---|
sal_Bool GetCharRect(SwRect, xub_StrLen) // converts paragraph position to document coordinates |
xub_StrLen GetCrsrOfst(SwPosition pPos, Point rPoint) // converts document coordinates to paragraph position, result in pPos |
SwTxtPainter
During a repaint event (of a rectangular area), the text frame generates an SwTxtPainter object. The DrawTextLine method of the painter is called by the text frame for each line intersecting the repaint rectangle. Within this method, the SwTxtPainter redirects the painting task to the portions in the current line by calling their paint methods. For this, the information collected from the attribute array is encapsulated in the appropriate SwTxtInfo object and passed to the portions. In fact, the actual painting is done by font objects, which are called by the info structures. Have a look at section 9.2 for a more detailed view of this iterator.
|
---|
void DrawTextLine(SwRect) // draws current line |
SwTxtFormatter
Each time reformatting has to be performed (e.g., after insertion or deletion of text), the text frame generates an SwTxtFormatter object. The procedure is similar to the painting process and is discussed in detail in section 9.1.
|
---|
xub_StrLen FormatLine(xub_StrLen nStart) |
Text Information
Different tasks require different sets of information. These are the most frequently used classes for encapsulating information:
SwTxtInfo
This is the base class for text information classes.
|
---|
SwParaPortion pPara // information always refers to a paragraph |
SwTxtSizeInfo
SwTxtSizeInfo objects are able to calculate the width of a given string. This is used for calculating line breaks and the number of characters fitting into a portion. A call of the GetTxtSize method is redirected to the current font object.
|
---|
OutputDevice pOut // the output device |
SwFont pFnt // a font object |
SwTxtFrm pFrm // the text frame |
xub_StrLen nIdx, nLen // start index and length of current portion |
SwPosSize GetTxtSize(OutputDevice, XubString, xub_StrLen) |
SwTxtPaintInfo
The SwTxtPaintInfo objects are not only responsible for encapsulating information for the paint process; they are an active part of it. The DrawText method is called by the portions to be painted, and it redirects the painting task to the font.
|
---|
Point aPos // output position |
SwRect aPaintRect // the update rectangle |
void DrawText(SwLinePortion, xub_StrLen) |
SwTxtFormatInfo
The SwTxtFormatInfo class maintains all important information for the formatting process. In addition, it calls the external word hyphenation tool.
|
---|
SwLineLayout pRoot // the root of the current line (pCurr) |
SwLinePortion pLast // the last portion |
xub_StrLen nLineStart // current line start in rTxt |
USHORT nLeft // left margin |
USHORT nRight // right margin |
USHORT nFirst // left margin of first line |
USHORT nRealWidth // "real" line width |
USHORT nWidth // "virtual" line width |
USHORT nLineHeight // height after CalcLine |
sal_Bool bInterHyph // interactive hyphenation |
sal_Bool bAutoHyph // automatic hyphenation |
..... |
xub_StrLen FormatLine(xub_StrLen nStart) |
Main Objectives - A closer Look
For easier understanding, the main objectives of text formatting, as mentioned in chapter 2, can be examined independently, although for example, the text formatting process always involves a repaint event and makes it necessary to calculate a new cursor position. The basic operations initiated by the iterators are shown in illustration 7:
The main sections (9.1, 9.2, 9.3) of this chapter represent the main tasks of the text formatting process.
Text Formatting
Text formatting is one of the main tasks of a word processor. Formatting information has to correspond to the attributes defined by the user. The following flow roughly shows the actions leading from a user intervention to the resulting formatting information.
1. The user inserts two characters by copy and paste at position 75 of a paragraph.
2. The text frame is notified via its SwTxtFrm::Modify method.
3. The invalid range (75-76) is stored in an SwCharRange object within the appropriate SwParaPortion object.
4. The layout calls the SwTxtFrm::Format method. This method checks whether reformatting is necessary because of an invalid range in the paragraph.
5. An SwTxtFormatter and an SwTxtFormatInfo object are generated.
6. The SwTxtFormatter::FormatLine method is called for each line that has to be reformatted.
7. The SwTxtFormatter::BuildPortions method is called and builds as many portions as fit into the current line.
8. A first guess at how many characters are represented by the next portion to be built is the number of characters up to the next change of attributes or script type. This is done in the SwTxtFormatter::FormatLine method.
9. The SwLinePortion::Format method calculates the width and height of the current portion. In case it does not fit into the current line, the break iterator returns a suitable line break position. A more detailed description of this is given in section 9.1.1.
10. The current portion is appended to the portion list of this line, and the information structure is updated according to the new situation. Steps 8 to 10 are repeated as long as there is sufficient space for more portions in the current line.
11. Finally, the height and "real" height (considering line spacing) are determined.
Illustration 8 shows the corresponding function calls for this procedure.
The following sections give a short introduction to common tasks that have to be considered during text formatting.
Line Breaks and the Break Iterator
In section 9.1 the text formatting process has been discussed. The instance responsible for finding suitable line breaks is the break iterator. Breaks can be text delimiters (like spaces, tabs...) or, in hyphenation mode, hyphenation positions. User-defined soft hyphens are also supported. The process described in this section takes place between steps 9 and 10 of section 9.1.
The functionality of the break iterator is wrapped in the SwTxtGuess class. The operations performed during text formatting are as follows:
- For each text portion an SwTxtGuess object is created, which is responsible for determining whether the current text portion still fits into the current line. For a portion that fits into the line, nothing has to be done.
- Depending on the current font, the SwTxtGuess object determines, which paragraph position would be the last one to fit into the current line. For this, the output device has to sum up the widths of the characters, comparing the result with the given line width. In hyphenation mode, the possible hyphen character at the end of a line is also considered during this calculation.
- If the character at this position is a text delimiter, no line break has to be performed. This has to be eliminated in upcoming versions, because in some cases a text delimiter is not necessarily a possible line break.
- Otherwise, the break iterator is called to find a suitable line break position. In hyphenation mode, this is the end of the last syllable fitting to the current line. We have to make sure that soft hyphens defined by the user are also considered as possible line breaks.
- The line break is stored within the SwTxtGuess object and is required during the further text formatting process. The portion widths and length have to be adjusted according to these line breaks. For hyphenated words, an additional hyphen portion representing the hyphen (which is of course not part of the paragraph string) has to be generated and added to the end of the line.
Some actions and results occurring during this process are depicted in illustration 9.
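The width summation in the second step above can be sketched as follows. The function name and the per-character width array are assumptions for illustration; in the real code the widths come from the output device and the current font, and the width of a possible hyphen character would also have to be taken into account in hyphenation mode.

```cpp
#include <cstddef>
#include <vector>

// Sums up character widths (in twips) until the line width is exceeded.
// Returns the index of the first character that no longer fits,
// or charWidths.size() if the whole portion fits.
std::size_t firstNotFitting(const std::vector<long>& charWidths, long lineWidth)
{
    long used = 0;
    for (std::size_t i = 0; i < charWidths.size(); ++i) {
        used += charWidths[i];
        if (used > lineWidth)
            return i;
    }
    return charWidths.size();
}
```

The returned index is exactly the kind of position that is handed to the break iterator when it is not already a text delimiter.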
Line Break Handling
During text formatting, the SwTxtPortion::Format method evaluates the information obtained from an SwTxtGuess object. Five different cases are distinguished:
- The current portion still fits to the current line.
- The current portion does not fit to the current line but a valid hyphenation position has been found within the portion.
- The current portion does not fit to the current line but a valid word end has been found within the portion.
- The current portion does not fit to the current line and the current portion does not have a valid line break position, but a valid line break position has been found within the current line.
- The current portion does not fit to the current line, and neither the current portion nor the current line has a valid line break position.
The handling of the first three cases is discussed in section 9.1.1. The fifth case has to be handled, for example, if you insert a word that is wider than a line and you do not allow hyphenation, or the word does not have a hyphenation position. In this situation, a BreakCut is performed: the word is cut at the end of the current line.
The fourth case demonstrates a situation that requires departing from the straightforward, front-to-back formatting of the portions. Imagine the following case:
What happens if this word has to be hyphenated because the "e" does not fit into the current line? The example consists of two text portions with different character attributes ("Attribu" and "te"). During the formatting process, the first portion has already been appended to the current line. Getting the correct hyphenation position ("Attrib-ute") requires splitting that first portion and inserting a hyphenation portion between the parts. Because a portion that has already been appended has to be formatted again, this is called an underflow event. These are the actions performed in this case:
- The SwTxtGuess::Guess method determines the hyphenation position.
- The SwTxtPortion::Format method triggers an underflow event, because the hyphenation position is not in or at the beginning of the current portion.
- During an underflow event, the SwTxtFormatter::BuildPortions method does not try to generate a new portion. It makes the previous portion ("Attribu") the current portion and adjusts the current line width to a value that is 1 twip smaller than the previous portion would require.
- The current portion is formatted with the new line width, forcing the break iterator to calculate a new line break. The values for the current portion (width, length, etc.) are set, and a hyphenation portion (with length 0 but with the width of the hyphen character) is appended to the current portion.
- The rest of the string ("ute") is formatted in the usual way.
Repaint Events
Repaint events are handled in much the same way as format events, except that the formatting information is usually already available.
- The layout notifies the text frame, that a repaint for certain areas has to take place.
- The SwTxtFrm::Paint(SwRect) method is called.
- An SwTxtPainter and an SwTxtPaintInfo object are generated.
- The SwTxtPainter::DrawTextLine method is called for each line that has a non-empty intersection with the repaint area.
- For each portion that is affected by the repaint event, the appropriate font is generated by examining the attribute array. This font is part of an SwTxtPaintInfo object.
- The virtual SwLinePortion::Paint(SwTxtPaintInfo) method is called, having the portions paint themselves by using the underlying rendering engine. For this, the SwTxtPaintInfo object is passed to the portion, communicating the font and output device.
Cursor Positioning
Cursor positioning requires the conversion from paragraph positions to document coordinates and vice versa. The main methods for this conversion are:
- SwTxtFrm::GetCharRect(SwRect, SwPosition, SwCrsrMoveState): This method determines a rectangular area covering the character at a specified paragraph position.
- SwTxtFrm::GetCrsrOfst(SwPosition, Point, SwCrsrMoveState): This method is responsible for finding the right position in a paragraph, for example when using the mouse to place the cursor.
For example, a paragraph position is converted into a cursor position by executing the following steps:
- The frame containing the paragraph is determined.
- An SwTxtCursor and an SwTxtSizeInfo object are constructed.
- SwTxtCursor::GetCharRect is called. The line containing the character results from iterating over the lines and considering the number of characters per line.
- The y-coordinate results from summing over the heights of the skipped lines.
- The portion containing the character is determined by summing over the number of characters of each portion in this line.
- The x-coordinate results from summing over the widths of the skipped portions and the widths of the remaining characters up to the required character. For this, the appropriate font object is used to calculate their size.
- The width of the rectangle corresponds to the width of the character.
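The summation steps above can be sketched with plain structs. All type and function names below are hypothetical. As a simplifying assumption, all characters within a portion are taken to have the same width, whereas the real code asks the appropriate font object for the widths of the remaining characters.

```cpp
#include <cstddef>
#include <vector>

struct Portion {
    std::size_t length;  // number of characters represented
    long width;          // width in twips
};

struct Line {
    long height;                    // line height in twips
    std::vector<Portion> portions;  // portions of this line, in order
};

struct CharRect {
    long x, y, width, height;
};

// Converts a paragraph position into a rectangle relative to the paragraph's
// upper left corner. Returns false if `pos` lies beyond the formatted text.
bool charRect(const std::vector<Line>& lines, std::size_t pos, CharRect& out)
{
    long y = 0;
    for (const Line& line : lines) {
        std::size_t lineLen = 0;
        for (const Portion& p : line.portions)
            lineLen += p.length;
        if (pos >= lineLen) {  // character is in a later line: skip this one
            pos -= lineLen;
            y += line.height;
            continue;
        }
        long x = 0;
        for (const Portion& p : line.portions) {
            if (pos >= p.length) {  // skip whole portion (also zero-length ones)
                pos -= p.length;
                x += p.width;
                continue;
            }
            // Assumption: equal character widths inside one portion.
            const long charWidth = p.width / static_cast<long>(p.length);
            out = CharRect{x + static_cast<long>(pos) * charWidth, y,
                           charWidth, line.height};
            return true;
        }
    }
    return false;
}
```

Portions with length 0 but nonzero width (such as note portions) are skipped automatically while still contributing their width to the x-coordinate.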
Both methods are combined, when using the cursorUp (cursorDown) key:
- GetCharRect is called with the current position in the paragraph.
- The resulting rectangle is "shifted" to the previous line.
- The new position is converted into a position in the paragraph.
Cursor Positions inside Fields
Usually, traveling into fields is not allowed. There is one exception, however: accessibility. For accessibility, it is important to obtain the position of each character inside a paragraph, including positions inside fields. A field is represented by one special character in the paragraph string, so it must be possible to specify positions inside a field. This is done by using the SwCrsrMoveState structure defined in sw/inc/crstate.hxx. This structure has a pointer to an SwSpecialPos struct:
|
---|
xub_StrLen nCharOfst // the position inside the field |
USHORT nLineOfst // this is used for fields which cover more than one line |
BYTE nExtendRange // this is used for special positions ( < 0 or > string length ) |
So if you want to get the position of the second character inside a field, which has the position 5 in the paragraph string, you simply call the GetCharRect function with the SwPosition encapsulating the string position 5 and pass a SwSpecialPos structure with nCharOfst = 2. This also works for fields with follow fields. A follow field is a part of a field, which doesn't have a representation in the paragraph string because the original field has been split into several pieces (e.g., if there are script changes inside the field).
Details on selected Topics
Fly Formatting
Everything you put into your document except plain text or tables is called a fly. Examples are frames or drawing objects. It is a task of the text formatting to consider the space required for these objects, i.e., we insert fly portions into our lines to indicate that this space is reserved for a fly. Intersections of flys and a given rectangle can be determined by the SwTxtFly::GetFrame(SwRect) method. The algorithm for fly positioning is explained with regard to illustration 10, showing the common case where the wrap option is set to "parallel", i.e., the text is supposed to flow around the fly:
Note, that in our example, a line spacing of "double" is set. Two flys are potentially overlapping the text (denoted by dotted rectangles). These are the steps performed for calculating fly positions during text formatting:
1. When formatting the second line ("this String"), we make a first guess at its final height by assuming that the whole line has the same font.
2. We already consider line spacing at this point and "scan" the light gray region, which is supposed to be the final placement of the line, for collisions with flys. If any collisions are found, we reserve space for the flys by inserting fly portions into the line. In our example, we find fly 1 intersecting the light gray area. This means the second line has three portions: the first is a text portion representing the string "this", the second is a fly portion reserving space for the fly, and the third represents the string "String".
3. The line height is calculated, based on our first assumption that "this" and "String" are the text portions in this line.
4. Having calculated this "real" line height, we can make a more precise scan for fly portions within the darker gray area. If flys found during step 2 no longer intersect the line, or if new flys are found intersecting the line, the line has to be formatted again, starting with the line height calculated in step 3. In our example, fly 1 no longer intersects the line, while fly 2 does. We proceed with step 2, considering the dark gray area while scanning for flys.
Note that our example would cause an endless loop in the algorithm as described so far. While executing step 2 with the darker gray area, a collision with fly 2 is detected, resulting in two portions for the line: one text portion "this" and one fly portion. A new calculation of the line height yields the same result as our first execution of step 1. For this reason, we abort the algorithm if a new iteration would scan an area that lies higher (with respect to the y-coordinate of its upper border) than the area of the last scan. The final result is that "this String" is distributed over two lines.
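The scan in step 2 can be illustrated with a small sketch that splits the horizontal extent of a line into text segments and reserved fly segments. All names are hypothetical, rectangles are treated as half-open intervals in twips, and the iteration over changing line heights (steps 3 and 4, including the abort condition) is deliberately left out.

```cpp
#include <algorithm>
#include <vector>

struct FlyRect {
    long left, top, right, bottom;  // half-open: [left, right) x [top, bottom)
};

struct Segment {
    long left, right;
    bool isFly;  // true: space reserved for a fly, false: available for text
};

// Scans the area [left, right) x [top, bottom) of a line for colliding flys
// and returns the resulting sequence of text and fly segments.
std::vector<Segment> splitLine(long left, long right, long top, long bottom,
                               std::vector<FlyRect> flys)
{
    std::sort(flys.begin(), flys.end(),
              [](const FlyRect& a, const FlyRect& b) { return a.left < b.left; });

    std::vector<Segment> segments;
    long x = left;
    for (const FlyRect& f : flys) {
        if (f.bottom <= top || f.top >= bottom)
            continue;  // no vertical overlap with the scanned area
        if (f.right <= x || f.left >= right)
            continue;  // outside the remaining part of the line
        const long flyLeft = std::max(f.left, x);
        const long flyRight = std::min(f.right, right);
        if (flyLeft > x)
            segments.push_back(Segment{x, flyLeft, false});
        segments.push_back(Segment{flyLeft, flyRight, true});
        x = flyRight;
    }
    if (x < right)
        segments.push_back(Segment{x, right, false});
    return segments;
}
```

A single fly overlapping the middle of the scanned area produces three segments (text, fly, text), which corresponds to the three portions described for the second line in step 2.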
Kerning Portions
When using different scripts in one document, we offer the feature “Apply spacing between different scripts”. This additional space between two text portions representing text in different scripts is realized by inserting an SwKernPortion between them. You can see this in the SwTxtFormatter::BuildPortions function. In order to cover all situations, we have to be able to append or prepend a kerning portion. Most cases are covered by the code that appends the kerning portion, but when dealing with fields, it can be easier to prepend the kerning portion in front of the current portion. We do not want this feature to interfere with the “Allow hanging punctuation” feature used for Asian languages, see section 10.3. For this reason, we only add an additional gap between two characters of different scripts if neither of them is a punctuation character.
Hanging Punctuation
Hanging punctuation is used in some Asian languages. Some characters, especially punctuation characters, are allowed to overlap the borders. The characters which are allowed to be hanging characters are those defined as “Not at start of line” in the options dialog. In order to have the break iterator recognize Latin characters as possible hanging punctuation characters for Asian languages, we do this: If a new portion does not fit into the current line and its script is different from the one of the last portion, we temporarily change the language passed to the break iterator to the language set for the last portion. This way, a Latin dot behind a Japanese text portion is treated as Japanese, and the break iterator reports that the Latin dot is allowed to lie outside the boundaries.
Text Output
The WYSIWYG mode is a combination of online and printed layout. On the one hand, we do not want the result on the screen to differ too much from the printed output; on the other hand, the result on the screen should not depend entirely on the selected printer and printer driver. The algorithm for computing output positions for the screen works like this:
- We make a first assumption about each character position on the screen by calling the OutputDevice::GetTextArray function for the selected printer. The result is a so-called kerning array, which contains the positions of the characters of the string relative to the first character.
- The width of the current character is calculated by calling OutputDevice::GetCharWidth for the current screen font.
- The screen position (nScreenPos) of the next character equals the last screen position plus the width of the current character.
- We calculate the output position of the next character by using this formula: outputPosition = ( 3 * nScreenPos + printed position of next character ) / 4
- The width of the current character is subtracted from the resulting value, and the difference is stored in the kerning array as the output position for the current character. We proceed by taking the next character and repeating steps 2 to 5.
We have a special treatment for blanks. When we find a blank during step 2, we let the output position for this blank be the position obtained from the printer during step 1. This allows us to have fixed positions in our screen output that match the printer.
We also consider the previous character during this algorithm. If the previous character is a blank, the output position of the current character is its position as it occurs in the kerning array.
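The steps above, including the treatment of blanks, can be sketched as follows. This is a minimal illustration and not the actual code: screenOutputPositions and its parameters are hypothetical names, the printer positions and screen widths are passed in as plain arrays instead of being queried from an OutputDevice, and nScreenPos is assumed to be a running sum of screen widths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// printerPos[i]  : printer position of character i relative to the first
//                  character; printerPos[n] is the position behind the last one.
// screenWidth[i] : width of character i in the screen font.
std::vector<long> screenOutputPositions(const std::string& text,
                                        const std::vector<long>& printerPos,
                                        const std::vector<long>& screenWidth) {
    const std::size_t n = text.size();
    assert(printerPos.size() == n + 1 && screenWidth.size() == n);

    std::vector<long> out(n);
    long nScreenPos = 0;  // screen position of the next character
    for (std::size_t i = 0; i < n; ++i) {
        nScreenPos += screenWidth[i];
        const bool blankHere = text[i] == ' ';
        const bool blankBefore = i > 0 && text[i - 1] == ' ';
        if (blankHere || blankBefore) {
            // Blanks, and characters following a blank, snap to the printer position.
            out[i] = printerPos[i];
        } else {
            // Weighted mix: three parts screen, one part printer.
            const long next = (3 * nScreenPos + printerPos[i + 1]) / 4;
            out[i] = next - screenWidth[i];
        }
    }
    return out;
}
```

With screen widths of 10 per character and printer positions 0, 12, 24 for the string "ab", the second character ends up at position 11, between the pure screen position (10) and the printer position (12).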
The nDelta Member of SwParaPortion
During text formatting, we only want to format lines which have been changed because of an input event. Making changes to a line can influence the whole paragraph because new line breaks may be introduced. The paragraph has a member of the type SwCharRange, which represents the range that has been changed due to a user interaction. Simply typing a character results in a range starting at the current input position with length = 1. The nDelta member of the SwParaPortion is the sum of all added or deleted characters. This value is considered during a reformat process of a paragraph. A value of -2 means that two characters have been deleted from this paragraph. During formatting of the paragraph, we first skip each line which does not lie inside the SwCharRange. The new length of each line which is formatted is compared to its old length. The difference between these two values is subtracted from the nDelta value. If the new end of a line no longer lies inside the reformatting range and nDelta equals 0, we have reached a stable situation and do not have to reformat any following lines.
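The stop condition can be illustrated with a small sketch. It assumes that the old and new length of every line are already known, which is of course not the case in the real formatter, where the new length is the result of formatting the line. LineChange and countReformattedLines are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old and new length of one line, in characters (hypothetical helper type).
struct LineChange { long oldLen; long newLen; };

// Walks over the lines of a paragraph and returns how many of them have to be
// formatted before a stable situation is reached. rangeStart/rangeEnd describe
// the changed character range, nDelta the number of added (positive) or
// deleted (negative) characters.
std::size_t countReformattedLines(const std::vector<LineChange>& lines,
                                  long rangeStart, long rangeEnd, long nDelta) {
    assert(rangeStart <= rangeEnd);
    long pos = 0;               // start position of the current line
    std::size_t formatted = 0;
    for (const LineChange& line : lines) {
        // Skip leading lines which end before the changed range starts.
        if (formatted == 0 && pos + line.oldLen <= rangeStart) {
            pos += line.oldLen;
            continue;
        }
        ++formatted;
        nDelta -= line.newLen - line.oldLen;  // account for the length change
        pos += line.newLen;                   // new end of this line
        // Stable: the line ends behind the range and all changes are absorbed.
        if (pos >= rangeEnd && nDelta == 0) break;
    }
    return formatted;
}
```

If two characters are typed into the second of four lines and the third line absorbs the two characters pushed down, only the second and third lines are formatted; the first line is skipped and the fourth is never touched.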
Font Caches
Font caches are used for faster construction of fonts and faster access to them.
SwFontCache
The SwFontCache is used for faster construction of a font. The SwFontCache stores SwFontObj objects, each of which encapsulates an SwFont object. The keys for the cache are attribute sets, the values are the fonts. You need a SwFontAccess object to obtain the appropriate font for a given attribute set. If the SwFontAccess::Get method does not find a font object for the attribute set, it generates a new entry for the cache.
SwFntCache
The SwFntCache is used to find the appropriate output font for a given font set by the user. The font set by the user does not have to be the font used for the output. It is passed over to the printer, which returns the font, which is used for printing. These two fonts (the user font and the printer font) are compared and the result is the font which is best suited for screen output. Since this is an expensive operation, the result of the comparison is cached. The key for the cache is a font, the result is the font for the screen. For faster access to the cached objects, each font in the cache has a magic number, which points to a position in the array of the stored output fonts. You need to have an SwFntAccess object to access the cache. If the key font you want to ask the cache for does not have a magic number, the key font is compared with all the other key fonts in the cache.
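The lookup strategy with magic numbers can be sketched like this. The sketch is not the SwFntCache implementation: FontKey, ScreenFontCache and chooseScreenFont are hypothetical, fonts are reduced to a name and a height, and the expensive comparison with the printer font is replaced by a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical key font; magic == 0 means "no magic number assigned yet".
struct FontKey {
    std::string name;
    int height = 0;
    std::size_t magic = 0;
};

class ScreenFontCache {
public:
    // Returns the screen font for the key font and assigns a magic number.
    std::string get(FontKey& key) {
        if (key.magic != 0) {                       // fast path: direct index
            assert(key.magic <= entries_.size());
            return entries_[key.magic - 1].screenFont;
        }
        for (std::size_t i = 0; i < entries_.size(); ++i) {  // slow path: compare keys
            if (entries_[i].name == key.name && entries_[i].height == key.height) {
                key.magic = i + 1;
                return entries_[i].screenFont;
            }
        }
        ++expensiveCalls_;                          // cache miss: do the expensive work once
        entries_.push_back({key.name, key.height, chooseScreenFont(key)});
        key.magic = entries_.size();
        return entries_.back().screenFont;
    }
    int expensiveCalls() const { return expensiveCalls_; }

private:
    struct Entry { std::string name; int height; std::string screenFont; };
    // Placeholder for the comparison of user font and printer font.
    static std::string chooseScreenFont(const FontKey& key) { return key.name + "@screen"; }
    std::vector<Entry> entries_;
    int expensiveCalls_ = 0;
};
```

A second key font with the same properties but without a magic number is found by comparison with the stored keys, so the expensive step runs only once.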
Drop Caps
Calculating drop caps can lead to some difficulties with endless loops, quite similar to the situation described in section 10.1. A drop cap portion is built by the SwTxtFormatter::NewDropCapPortion function. A first guess for the height of the drop cap is made by guessing the line height (which has not been calculated yet, because the line is still unformatted) and multiplying it by the number of lines the drop cap should cover.
A drop cap portion can consist of several parts, one part for each attribute or script change:
| SwDropPortion member | Description |
|---|---|
| SwDropPortionPart* pPart | several parts due to script / attribute changes |
| USHORT nLines | number of lines |
| USHORT nDropHeight | |
| USHORT nDropDescent | distance to next line |
| KSHORT nDistance | distance to the next portion |
| short nY | Y offset of the baseline for text output |

| SwDropPortionPart member | Description |
|---|---|
| SwDropPortionPart* pFollow | the next drop portion part |
| SwFont* pFnt | the font used for output |
| xub_StrLen nLen | the length of the part |
| USHORT nWidth | the width of the part |
The nLen, pFollow and pFnt fields of the SwDropPortionParts are assigned during the building of the portion.
The widths of the drop cap parts are calculated in the SwDropPortion::Format function. We even want to allow different font sizes inside a drop cap. For this we have to calculate a common scaling factor for all parts of the drop cap, in order to:
- Let them have the same baseline within the drop cap
- Let them have a height that comes quite close to the drop cap's height.
The scaling factor and the common baseline for the drop portion parts are calculated in the SwTxtFormatter::CalcFontSize function (see also illustration 11):
- For a first guess for the scaling factor, we divide the wished drop cap height by the biggest font height.
- The common scaling factor is applied to all fonts used within the drop cap portion.
- We calculate a boundary rectangle for all glyphs of the same part.
- The rectangle is shifted to a common baseline by subtracting the ascent of the font used from the rectangle's top.
- The union of these shifted rectangles is used to determine the ascent and height of the whole drop cap text. The ascent is the distance from the union's top to the common baseline, the descent is the distance from the common baseline to the bottom of the union.
- A new and better scaling factor can be obtained by dividing the wished height by the height of the union. We continue with step 2 until we get a factor which brings the union's height quite close to the wished height.
- The final scaling factor is applied to the fonts of all drop portion parts and the descent of the final union is stored in the SwDropPortion as nY.
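The iteration above can be sketched as follows. This is a simplified model, not the code of SwTxtFormatter::CalcFontSize: the glyph extents of each part are given relative to the baseline at the unscaled font size (so the shift to the common baseline is already done), scaled extents are rounded to whole units, and the new factor is assumed to be the old factor multiplied by wished height / union height. DropPart, DropScaling and calcDropScaling are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// One drop cap part: font height plus glyph extents above and below the
// baseline, all measured at the unscaled font size.
struct DropPart { long fontHeight; long glyphTop; long glyphBottom; };

struct DropScaling { double factor; long ascent; long descent; };

DropScaling calcDropScaling(const std::vector<DropPart>& parts, long wishedHeight,
                            int maxPasses = 8, long tolerance = 1) {
    assert(!parts.empty() && wishedHeight > 0 && maxPasses > 0);
    long biggest = 0;
    for (const DropPart& p : parts) biggest = std::max(biggest, p.fontHeight);
    assert(biggest > 0);

    // Step 1: first guess from the biggest font height.
    DropScaling r{static_cast<double>(wishedHeight) / static_cast<double>(biggest), 0, 0};
    for (int pass = 0;; ++pass) {
        // Steps 2 to 5: scale all parts and unite their extents on the baseline.
        r.ascent = 0;
        r.descent = 0;
        for (const DropPart& p : parts) {
            r.ascent = std::max(r.ascent, std::lround(p.glyphTop * r.factor));
            r.descent = std::max(r.descent, std::lround(p.glyphBottom * r.factor));
        }
        const long height = r.ascent + r.descent;
        const bool closeEnough = height <= 0 || std::labs(height - wishedHeight) <= tolerance;
        if (closeEnough || pass + 1 >= maxPasses) break;
        // Step 6: improve the factor and try again.
        r.factor *= static_cast<double>(wishedHeight) / static_cast<double>(height);
    }
    return r;
}
```

The limit on the number of passes is an assumption of this sketch; it guarantees termination even if rounding keeps the union height from ever matching the wished height.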
Returning from SwTxtFormatter::CalcFontSize, we can now calculate the widths of the drop portion parts. For each part, the font for the part is set at the SwTxtFormatInfo object, and the text represented by this part is formatted with respect to its font. The sum of the widths is the width of the drop portion.
We continue with the formatting process until all lines the drop cap should cover are formatted. Then we compare the final size of the lines with the height of the drop cap portion. If it differs, we have to format the drop cap and the lines once more; the new wished height for the drop cap portion is now the size of the lines. This can lead to endless loops, therefore we only allow the drop portion either to grow or to shrink continuously.
Spellchecking
There are two modes for spell checking: Online spell checking and interactive spell checking. The online mode shows red wave lines under wrong words while typing the text. The interactive mode can be accessed by pressing F7.
Online (Auto) Spell Checking
During online mode, a so-called wrong list is built and updated each time the user modifies the text. The wrong list contains the words which have been identified as wrong words. The invalidate range for the wrong list is set in the SwTxtFrm::Modify function. This range has to be checked again (SwTxtFrm::_AutoSpell). Each word inside the range is spell-checked again: wrong words remain in the wrong list, new wrong words are added, and others are removed from the wrong list. The words for auto completion are collected as well. Finally a rectangle is returned which indicates the area of change, so that the red wavy lines are repainted correctly.
Interactive Spell Checking
The interactive spell checking (SwTxtNode::Spell) works on a given range in the text. You can specify the range by selecting some text, otherwise the whole text is assumed to be checked. The interactive spell checking comes up with a dialog if a wrong word has been found. If we trigger an interactive spell checking while the online mode is enabled, the interactive spell checking only considers the words listed in the wrong list. Otherwise it checks all words in the given range.
Vertical Formatting
Some languages require a vertical formatting of text, e.g., Chinese or Japanese. For them a vertical text formatting and layout has to be integrated. The basic idea for the text formatting is to swap frames, i.e., the width and height of a frame are swapped. The three main tasks of the text formatting (formatting paragraphs, painting of text and cursor travelling) are performed on swapped frames and afterwards the results are translated back.
The advantage is that most of the code inside the text formatting does not have to be adapted to vertical formatting. There are many functions at the SwTxtFrm class to do the conversion from horizontal to vertical formatting. As an example consider the calculation of the cursor position. First the frame which currently contains the cursor is determined. Before calculating the correct screen coordinates of the cursor inside the text frame, the frame is swapped. Now we assume that we deal with a common horizontally formatted text frame. The usual functions for calculating the cursor position are called and finally the swapped frame and the resulting rectangle for the cursor position are rotated back.
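The swap and the rotation back can be sketched with plain rectangles. This sketch assumes text running top to bottom with lines advancing from right to left; Rect, swapFrame and rotateBack are hypothetical names and do not reproduce the conversion functions of SwTxtFrm.

```cpp
#include <cassert>

struct Rect { long x, y, width, height; };

// Swapping a frame exchanges its width and height; the position is kept.
Rect swapFrame(const Rect& frame) {
    return Rect{frame.x, frame.y, frame.height, frame.width};
}

// Translates a rectangle computed inside the swapped (horizontal) frame back
// into the coordinates of the original vertical frame.
Rect rotateBack(const Rect& r, const Rect& verticalFrame) {
    assert(r.width >= 0 && r.height >= 0);
    const long offsetX = r.x - verticalFrame.x;  // distance along the line
    const long offsetY = r.y - verticalFrame.y;  // distance across the lines
    Rect out;
    // Lines stack from the right border towards the left.
    out.x = verticalFrame.x + verticalFrame.width - (offsetY + r.height);
    // The writing direction of a line runs downwards.
    out.y = verticalFrame.y + offsetX;
    out.width = r.height;
    out.height = r.width;
    return out;
}
```

A thin cursor rectangle of size 1 x 12 found in the first horizontal line therefore becomes a flat 12 x 1 rectangle near the right border of the vertical frame.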
Asian Grid Mode
In some Asian countries, people use a grid layout for writing. Usually a page has 10 rows and 20 columns. Above (or optionally below) the main cells there is a line reserved for ruby characters. An Asian character should snap to the grid, i.e., it is centered inside a cell. Punctuation characters are an exception: they are aligned to the right or left inside a cell. The main functionality is provided inside the txtnode/fntcache.cxx file, especially in the GetTxtSize(), GetCrsrOfst() and DrawText() functions. The GetTxtSize() function returns only multiples of a cell width as the widths of Asian text. The DrawText() function centers the characters inside their cells by considering their real width and height.
Western text should be centered inside as many cells as are needed for it. This is achieved by inserting SwKernPortions between Asian and Western portions (SwTxtFormatter::BuildPortions). First a new SwKernPortion with width = 0 is inserted and a pointer to it is stored before formatting a western text segment. When the end of the western text segment is reached, another SwKernPortion is appended at the end of the western text. The number of required grid cells for the western text is determined and the remaining space inside these cells is distributed to the SwKernPortions in front of and behind the western text. Attention: Due to an underflow event the first SwKernPortion could have been deleted.
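The distribution of the remaining space can be sketched with a little arithmetic. GridKerning and centerInGrid are hypothetical names, and the sketch assumes that the remaining space is split as evenly as possible between the two kerning portions, with any odd unit going to the trailing one.

```cpp
#include <cassert>

// Number of grid cells used by a western text segment and the widths given
// to the kerning portions in front of and behind it.
struct GridKerning { long cells; long before; long after; };

GridKerning centerInGrid(long textWidth, long cellWidth) {
    assert(cellWidth > 0 && textWidth >= 0);
    // Round up to the number of whole cells needed for the text.
    const long cells = (textWidth + cellWidth - 1) / cellWidth;
    const long remaining = cells * cellWidth - textWidth;
    const long before = remaining / 2;
    return GridKerning{cells, before, remaining - before};
}
```

A western segment of width 250 on a grid with cell width 100 occupies three cells and gets a gap of 25 on each side.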
Alphabetic Index Sorting
Entries for an alphabetic index can be inserted via the Insert – Indexes and Tables – Entry dialog. Some languages require additional information on how the entries should be sorted. Asian languages, for example, use additional phonetic strings for sorting. Each entry is represented by an SwTOXIndex object. Illustration 14 shows the class model for the main components of an alphabetic index. The SwTOXInternational class has a Compare function, which compares two SwTOXIndex objects considering language-specific rules.
Files in the Star Writer Project
inc/atrhndl.hxx
The attribute handler, which maintains the collection of attribute stacks used to change font objects is specified here (see section 5.2).
- class SwAttrHandler
inc/drawfont.hxx
The script type data structure (see section 6.1) is defined here.
- class SwScriptInfo
- class SwDrawTextInfo
text/inftxt.hxx
This file contains specifications for classes encapsulating information required during the formatting/painting/cursor positioning processes (see chapter 8).
- class SwLineInfo
- class SwTxtInfo
- class SwTxtSizeInfo : public SwTxtInfo
- class SwTxtPaintInfo : public SwTxtSizeInfo
- class SwTxtFormatInfo : public SwTxtPaintInfo
text/itratr.hxx
The base class of all iterators (see chapter 7):
- class SwAttrIter
text/itrform2.hxx
Iterator class for text formatting purposes (see chapter 7 and section 9.1).
- class SwTxtFormatter : public SwTxtPainter
text/itrpaint.hxx
Iterator class controlling the painting process (see chapter 7 and section 9.2).
- class SwTxtPainter : public SwTxtCursor
text/itrtxt.hxx
Some other iterator classes for more special operations.
- class SwTxtIter : public SwAttrIter
- class SwTxtMargin : public SwTxtIter
- class SwTxtAdjuster : public SwTxtMargin
- class SwTxtCursor : public SwTxtAdjuster
text/pordrop.hxx
Special portion used for initials.
- class SwDropPortion : public SwTxtPortion
text/porexp.hxx
Expanding portions for fields, blanks, and notes.
- class SwExpandPortion : public SwTxtPortion
- class SwBlankPortion : public SwExpandPortion
- class SwPostItsPortion : public SwExpandPortion
text/porfld.hxx
Different kinds of field portions are defined here.
- class SwFldPortion : public SwExpandPortion
- class SwHiddenPortion : public SwFldPortion
- class SwNumberPortion : public SwFldPortion
- class SwBulletPortion : public SwNumberPortion
- class SwGrfNumPortion : public SwNumberPortion
- class SwCombinedPortion : public SwFldPortion
text/porfly.hxx
Portions used for frames.
- class SwFlyPortion : public SwFixPortion
- class SwFlyCntPortion : public SwLinePortion
text/porfnt.hxx
Footnote portions and portions required for widows/orphans handling are defined here.
- class SwFtnPortion : public SwExpandPortion
- class SwFtnNumPortion : public SwNumberPortion
- class SwQuoVadisPortion : public SwFldPortion
- class SwErgoSumPortion : public SwFldPortion
text/porhyph.hxx
Portions introduced during hyphenation are defined in this file.
- class SwHyphPortion : public SwExpandPortion
- class SwHyphStrPortion : public SwHyphPortion
- class SwSoftHyphPortion : public SwHyphPortion
- class SwSoftHyphStrPortion : public SwHyphStrPortion
text/porlay.hxx
General text structuring portions, see also chapter 4.
- class SwLineLayout : public SwTxtPortion
- class SwParaPortion : public SwLineLayout
text/porlin.hxx
Abstract base class for all portions.
- class SwLinePortion: public SwPosSize
text/pormulti.hxx
Portions used for multi line style.
- class SwMultiPortion : public SwLinePortion
- class SwDoubleLinePortion : public SwMultiPortion
- class SwRubyPortion : public SwMultiPortion
- class SwRotatedPortion : public SwMultiPortion
text/porref.hxx
References are represented by these portions.
- class SwRefPortion : public SwTxtPortion
- class SwIsoRefPortion : public SwRefPortion
text/portab.hxx
Portions used for different types of tabulators.
- class SwTabPortion : public SwFixPortion
- class SwTabLeftPortion : public SwTabPortion
- class SwTabRightPortion : public SwTabPortion
- class SwTabCenterPortion : public SwTabPortion
- class SwTabDecimalPortion : public SwTabPortion
text/portox.hxx
These portions are used for tables of contents.
- class SwToxPortion : public SwTxtPortion
- class SwIsoToxPortion : public SwToxPortion
text/portxt.hxx
Simple text portions, which are the most frequently used portions.
- class SwTxtPortion : public SwLinePortion
- class SwHolePortion : public SwLinePortion
text/possize.hxx
Base class of all portions, stores the width and height of a portion (in document coordinates).
- class SwPosSize
- Complex Text Layout: Term for languages whose writing system needs complex transformations in order to visualize the text stored in memory. Examples are bidirectional scripts like Arabic or Hebrew, languages using clustered characters like Thai or languages with characters, whose visual representation depends on their context (e.g., ligatures).
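Reading alone does not give a deep understanding; I still do not know how to use concepts such as LinePortion and SwtxtFormatInfo.
For a simple hands-on exercise, look at the implementation logic of the mergecharacterborder() function in sw/source/core/text/Itrform2.cxx, which merges character borders. One known problem: if a piece of text mixing Chinese, English and digits is given a character border, the border is broken into pieces. You can try to fix this.
Stepping through the code in a debugger a few times gives a somewhat deeper understanding of the concepts.
Link: https://wiki.openoffice.org/wiki/Writer/Text_Formatting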