Uniscribe in Detail

Latest recommended article published 2019-04-12 14:08:32

GreenArrowMan


Uniscribe

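Uniscribe is a set of APIs that gives fine-grained control over complex script processing. Because characters and glyphs are not laid out in a simple way, a complex script needs special handling for display and editing. The rules that govern the shaping and positioning of glyphs are specified in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company.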

This topic discusses the different aspects of processing complex scripts, listed below.

About Uniscribe

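Uniscribe is one of several ways to process complex scripts. To put it in context, we begin with a brief description of complex scripts and their particular problems, followed by a discussion of the other standard ways of processing complex scripts.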

About Complex Scripts

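A complex script has at least one of the following characteristics:

· Allows bidirectional rendering

· Has contextual shaping

· Has combining characters

· Has specialized word-breaking and justification rules

· Filters out illegal character combinations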

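Bidirectional rendering refers to a script's ability to handle text that reads both left-to-right and right-to-left. For example, in the bidirectional rendering of Arabic, the default reading direction for text is right-to-left, but for some numbers it is left-to-right. Processing a complex script must account for the difference between the logical order and the visual order of the glyphs. In addition, caret movement and hit testing must be handled properly: mapping between a screen position and a character index, for example for text selection or caret display, requires knowledge of the layout algorithms.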

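Contextual shaping occurs when a script's characters change shape depending on the characters that surround them. This occurs in English cursive writing, for example, when a lowercase "l" changes shape depending on the character that precedes it, such as an "a" (which connects low to the "l") or an "o" (which connects high). Arabic is a script that exhibits contextual shaping.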

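Combined characters, or ligatures, are characters that join into one character when placed together. One example is the "ae" combination in English, which is sometimes represented by a single character. Arabic is a script that has many ligatures.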

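Specialized word break and justification refers to scripts that have complex rules for dividing words between lines or for justifying text on a line.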

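Filtering out illegal character combinations occurs when a language does not allow certain character combinations. Thai is such a script.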

Built on Tuesday, May 09, 2000

Uniscribe

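Uniscribe allows very fine-grained processing of complex scripts. It supports the complex rules found in scripts such as Arabic, Indic scripts, and Thai. It also handles scripts written from right to left, such as Arabic and Hebrew, and it supports mixing scripts.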

OpenType Font Format

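The Unicode-based Microsoft OpenType font format extends the TrueType font file format. OpenType fonts allow mapping between characters and glyphs, which enables support for ligatures, positional forms, alternates, and other substitutions. OpenType fonts may also include information to support two-dimensional glyph positioning and glyph attachment, and may contain either TrueType or PostScript outlines.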

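Layout features within OpenType fonts are organized by script and language, which allows a single font to support multiple writing systems, even within the same text. To ensure consistency of text layout operations across applications and to avoid unnecessary overhead, most text layout and language-semantics algorithms are contained in Uniscribe. This relieves developers of the burden of having to define generic script rules inside a font.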

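Applications may apply their own knowledge or preferences about text layout. OpenType layout fonts may even contain layout rules that duplicate or override those applied by operating system services. The layered structure of the operating system services that support text layout lets a client choose which layout information to use and how to apply it.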

Using Uniscribe

The following sections show code examples that use Uniscribe.

Determining If a Script Requires Glyph Shaping

The following code sample calls ScriptGetProperties to check if the script requires glyph shaping.

const SCRIPT_PROPERTIES **g_ppScriptProperties;  // table of script properties, indexed by script
int g_iMaxScript;                                // number of entries in the table

ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);

hResult = ScriptItemize( … , pItems, &cItems);

for (i = 0; i < cItems; i++) {
    if (g_ppScriptProperties[pItems[i].a.eScript]->fComplex) {
        // Item [i] is complex script text
        // requiring glyph shaping
    }
}

ScriptGetProperties

The ScriptGetProperties function returns information about the current scripts.

HRESULT WINAPI ScriptGetProperties(
  const SCRIPT_PROPERTIES ***ppSp,
  int *piNumScripts
);

Parameters

ppSp

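[out] Receives a pointer to an array of pointers to SCRIPT_PROPERTIES structures, indexed by script.

piNumScripts

[out] Receives the number of scripts. The valid range for script values is 0 through NumScripts-1.

Return Values

If the function succeeds, the return value is zero.

If the function fails, it returns a nonzero value. If any unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.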

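Converting a Mouse Hit "x" Offset to a Caret Position

By convention, a caret position (cp) can be selected by clicking either on the trailing half of the preceding character or on the leading half of the character at cp. This can be implemented as follows: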

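int iCharPos;   // character hit by the mouse
int iCaretPos;  // resulting caret position
int fTrailing;  // nonzero if the hit was in the trailing half

ScriptXtoCP(iMouseX, ..., &iCharPos, &fTrailing);
iCaretPos = iCharPos + fTrailing;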

For scripts that snap the caret to cluster boundaries, ScriptXtoCP returns fTrailing set to either 0 or the width of the cluster in code points.
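The half-character convention can be illustrated without calling Uniscribe at all. The following is a minimal sketch, assuming a simple left-to-right run with exactly one glyph per character; the helper names x_to_cp and caret_from_x and the advance-width array are hypothetical and are not part of the Uniscribe API. The out-of-range results are chosen so that the sum of the two outputs stays between 0 and the run length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Uniscribe function): hit-tests an x offset
 * against per-character advance widths of a simple left-to-right run. */
static void x_to_cp(int x, const int *advance, size_t count,
                    int *char_pos, int *trailing)
{
    int left = 0;  /* x offset of the leading edge of character i */
    assert(advance != NULL && char_pos != NULL && trailing != NULL);

    if (x < 0) {               /* before the run: caret lands at position 0 */
        *char_pos = -1;
        *trailing = 1;
        return;
    }
    for (size_t i = 0; i < count; i++) {
        if (x < left + advance[i]) {
            *char_pos = (int)i;
            /* 1 when the hit is in the trailing half of the character */
            *trailing = (x - left) * 2 >= advance[i];
            return;
        }
        left += advance[i];
    }
    *char_pos = (int)count;    /* past the run: caret lands at the end */
    *trailing = 0;
}

/* Caret position = character index + trailing flag, as in the text above. */
static int caret_from_x(int x, const int *advance, size_t count)
{
    int cp, trailing;
    x_to_cp(x, advance, count, &cp, &trailing);
    return cp + trailing;
}
```

With three characters of width 10, a click at x = 7 falls in the trailing half of character 0, so the caret goes to position 1; a click at x = 12 falls in the leading half of character 1 and gives the same caret position.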

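Displaying the Caret in Bidirectional Strings

In unidirectional text there is no ambiguity about where the caret goes, because the leading edge of a character is at the same position as the trailing edge of the previous character. In bidirectional text, however, the caret position between runs of opposing direction is ambiguous. For example, in the left-to-right (LTR) paragraph "helloMAALAS", the last letter of "hello" immediately precedes the first letter of "salaam". The best position to display the caret in this string depends on whether it is considered to follow the "o" of "hello" or to precede the "s" of "salaam".

Uniscribe uses the following caret placement conventions.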

Situation	Visual caret placement
Typing	Trailing edge of last character typed.
Pasting	Trailing edge of last character pasted.
Caret advancing	Trailing edge of last character passed over.
Caret retiring	Leading edge of last character passed over.
Home key	Leading edge of line.
End key	Trailing edge of line.

The caret may be positioned as follows:

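if (advancing) {
    // trailing edge of the character just passed over
    ScriptCPtoX(iCharPos-1, TRUE, ..., &iCaretX);
} else {
    // leading edge of the character just passed over
    ScriptCPtoX(iCharPos, FALSE, ..., &iCaretX);
}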

Or, more simply, given an fAdvancing BOOL restricted to TRUE or FALSE:

ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ..., &iCaretX);

ScriptCPtoX handles out-of-range positions logically: it returns the leading edge of the run for iCharPos < 0, and the trailing edge of the run for iCharPos = length.

Processing Complex Scripts with Uniscribe

Uniscribe provides APIs to support the display and editing of international text, including the complex rules of Middle Eastern and Asian scripts. Uniscribe provides low level routines for handling fully formatted text, and an easier ScriptString API set for unformatted text.

Using Uniscribe, applications need only manage a backing store of Unicode character codes. Text layout applications do not need to maintain any other buffer or mapping table to track character order. An application only needs to store and manage the order in which the characters were entered by the user, which is the same logical order as defined by Unicode. The application's backing store never changes as a result of layout operations. Uniscribe maintains an index from the reordered clusters to the original character boundaries passed by the application. The following topics are covered in this section.

· Shaping Engines

· Caching

· Displaying Text with Uniscribe

· The ScriptString Functions

· Related Processing for Complex Scripts

· Caret Placement and Hit Testing

· Word Break Points

· Character Clusters

· Relationship Between Caret Positions, Justification Points, and Clusters

· Notes on ScriptXtoCP and ScriptCPtoX

Shaping Engines

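Uniscribe uses multiple shaping engines that contain the layout knowledge for particular scripts. It also takes advantage of the Microsoft OpenType layout shaping engine for handling font-specific script features such as glyph generation, extent measurement, and word-breaking support. Uniscribe uses the Unicode bidirectional algorithm to manage bidirectional character reordering, and it understands non-OpenType layout font formats for Arabic, Hebrew, and Thai.

The exact code points assigned to each shaping engine may vary, so script numbers are not published, with the exception of SCRIPT_UNDEFINED. However, you can test the properties of a script by calling the ScriptGetProperties function, which accesses the global script properties table. Applications can use the global script properties to help combine their own layout rules with those of the shaping engines.

All complex script shaping engines, the numeric shaping engine, and the ASCII shaping engine validate the font in the hdc before shaping, and return USP_E_SCRIPT_NOT_IN_FONT if the font does not contain sufficient glyphs or shaping tables. Only scripts that have the fComplex property need to be shaped with the script returned by the ScriptItemize function. All other runs may be merged and shaped with SCRIPT_UNDEFINED specified in the SCRIPT_ANALYSIS structure. Note that SCRIPT_UNDEFINED does not fail with USP_E_SCRIPT_NOT_IN_FONT if the font does not support the characters; missing glyphs are usually displayed as an empty rectangle. An application can determine whether a code point is supported by a font by calling the ScriptGetFontProperties function to obtain the default glyph index and the ScriptGetCMap function to look up the font's glyphs for the Unicode code points. However, some code points can be rendered by a combination of glyphs, for example 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if a font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap will report 00C9 as unsupported.

To reliably determine font support for a string that contains these kinds of code points, call ScriptShape. If it returns S_OK, check the output for missing glyphs.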

Caching

Uniscribe caches Unicode-to-glyph mappings (cmap), glyph widths, and OpenType script shaping tables. The handle to the tables for a particular font at a particular size is called a script cache. Many Uniscribe functions take both an HDC and a SCRIPT_CACHE parameter. These functions look for information in the script cache first, and use the device context only when the required tables are not already cached. When calling the ScriptShape, ScriptPlace, or ScriptTextOut functions, you must supply a pointer to a SCRIPT_CACHE variable, which must initially be set to NULL.

An application can free a script cache at any time. Uniscribe maintains reference counts in its font and shaper caches, and frees font data only when all sizes of the font are free. When you are finished using a style, that is, a typical set of attributes including font, size, and color, call the ScriptFreeCache function to free the script cache for that style.

For ScriptShape and ScriptPlace, it is valid to pass a NULL device context. Most often the call will succeed, because the required tables will already be cached. If the shaping or placement requires access to a device context, ScriptShape or ScriptPlace returns immediately with the E_PENDING error code. The application must then select the font into the device context and call the function again. This eliminates most calls to the SelectObject function.


Displaying Text with Uniscribe

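An application that uses complex scripts runs into several problems with a simple approach to formatting and display.

First, the width of a complex script character depends on its context. It is not possible to save widths in simple tables.

Second, breaking between words in scripts such as Thai requires dictionary support, because Thai has no separator character between words.

Third, Arabic, Hebrew, Farsi, Urdu, and other bidirectional scripts require reordering before display.

Finally, some form of font association is often needed to use complex scripts easily.

To deal with these issues, Uniscribe uses the paragraph as the unit of display. Note that this means Uniscribe must be used for the entire paragraph, even if sections of the paragraph are not complex script.

Before using Uniscribe, an application divides the paragraph into runs, that is, strings of characters with the same style. The style depends on what the application has implemented, but typically includes attributes such as font, size, and color. Uniscribe divides the paragraph into items, which are strings of characters with the same script and direction. The application applies the item information to produce runs that are unique in style, script, and direction.

Uniscribe identifies clusters within each run and determines the size of each cluster. A cluster is a script-defined, indivisible group of characters. For European languages a cluster is a single character, but in languages such as Thai it is a grouping of glyphs. Uniscribe sums the clusters to determine the size of a run. The application then sums the lengths of the runs until they overflow a line (or reach the margin), and divides the run that overflows the line between the current line and the next line.

For each line, a map from visual position to run is established. For each run, the code points are shaped into glyphs, which are then positioned and rendered.

With this overview, we can look at the process in detail and at how Uniscribe fits in. An application does text layout, or formatting, once. Then it either saves the glyphs and positions for display, or generates them each time it displays the text. Typically, an application generates the glyphs and positions each time it displays, so the process is presented as a layout procedure and a display procedure.

To Lay Out Text Using Uniscribe

This procedure assumes that the application has already divided the paragraph into runs.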

1. Call ScriptRecordDigitSubstitution only when the application starts, or when receiving a WM_SETTINGCHANGE message.

2. (optional) Call ScriptIsComplex to determine if the paragraph requires complex processing.

3. For automatic digit substitution, call ScriptApplyDigitSubstitution to prepare the SCRIPT_CONTROL and SCRIPT_STATE structures in ScriptItemize. If the application does its own reordering and layout, it must substitute the proper digits for Unicode U+0030 through U+0039 (the Western digits).

4. Call ScriptItemize to divide the paragraph into items. If an application already knows the bidirectional order -- for example, because of the keyboard layout used to enter the character -- it can call ScriptItemize with NULL for the SCRIPT_CONTROL and SCRIPT_STATE parameters. This generates items only by shaping engine. The application can then reorder the items using its information.

5. Merge the item information with the run information to produce runs with a single style, script, and direction.

6. Call ScriptGetCMap to assign a font to a run and get glyphs. If some glyphs are not supported by the font, either substitute another font or set the eScript member to SCRIPT_UNDEFINED. Note that if a font renders a code point by a combination of glyphs instead of a single glyph, this method may indicate that the code point is unsupported. In this case, call ScriptShape, check for an S_OK return code, and then check the output for missing glyphs.

7. Call ScriptShape to identify clusters and generate glyphs.

8. Call ScriptPlace to generate advance widths and x and y positions for the run width.

9. Sum the run widths until the line overflows.

10. Break the run on a word boundary by using the fSoftBreak and fWhiteSpace members in the logical attributes. To break a single character cluster off the run, use the information returned by calling ScriptBreak.

This completes layout of the line. Repeat steps 6 through 10 for each line in the paragraph. However, if the application needed to break the last run on the line, call ScriptShape to reshape the remaining part of the run as the first run on the next line.
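Step 9 of the procedure above (summing run widths until the line overflows) can be sketched in a few lines. This is only an illustration: first_overflowing_run is a hypothetical helper, and the run widths are assumed to have been measured already, for example by summing the advance widths that step 8 produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first run that does not
 * fit within line_width, or count if every run fits on the line. */
static size_t first_overflowing_run(const int *run_width, size_t count,
                                    int line_width)
{
    int used = 0;  /* width consumed by the runs accepted so far */
    assert(run_width != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (used + run_width[i] > line_width)
            return i;          /* this run must be broken or moved down */
        used += run_width[i];
    }
    return count;
}
```

The run at the returned index is the one that step 10 would then break on a word or cluster boundary.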

To Display Text Using Uniscribe

This procedure is done for each line. It assumes that the text has already been laid out using Uniscribe, and that the glyphs and positions from the layout process were not saved. If speed is a concern, an application can save the glyphs and positions from the layout procedure and start at #2.

1. For each run, in logical order.

a. If the style has changed since the last run, update the hdc.

b. Call ScriptShape to generate glyphs for the run.

c. Call ScriptPlace to generate an advance width and an x,y offset for each glyph.

2. Establish the correct visual order for the runs in this line:

a. Extract an array of bidi embedding levels, one per run, from the merged item and run information. The embedding level is given by (SCRIPT_ITEM) si.(SCRIPT_ANALYSIS) a. (SCRIPT_STATE) s.uBidiLevel.

b. Pass this array to ScriptLayout to generate a map of visual to logical positions.

3. (optional) To justify the text, either call ScriptJustify or use specialized knowledge of the text. For more information, see Related Processing by Uniscribe.

4. Use the visual to logical map to display the runs in visual order. Starting at the left end of the line, call ScriptTextOut to display the run given by the first entry in the visual to logical map. For each subsequent entry in the visual to logical map, call ScriptTextOut to display the indicated run to the right of the previously displayed run.

Note that step 2 may be omitted if the text contains no characters from right-to-left scripts, contains no bidi control characters, and the base embedding level is left-to-right. In this case, step 4 becomes: start at the left end of the line and call ScriptTextOut to display the first logical run and then to display each logical run to the right of the previous run.
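The reordering in step 2 can be pictured with a small, self-contained sketch. It applies the reversal rule of the Unicode bidirectional algorithm (rule L2) to one embedding level per run and produces a visual-to-logical map. The function visual_order is a hypothetical illustration of that rule written for this article; it is not the ScriptLayout implementation.

```c
#include <assert.h>

/* Hypothetical sketch of bidi rule L2 on per-run embedding levels.
 * On return, vis_to_log[v] is the logical index of the run shown at
 * visual position v, counting from the left end of the line. */
static void visual_order(const unsigned char *level, int count, int *vis_to_log)
{
    int max_level = 0;
    int min_odd = 255;  /* lowest odd level seen; stays 255 if none */
    assert(count >= 0);

    for (int i = 0; i < count; i++) {
        vis_to_log[i] = i;                 /* start in logical order */
        if (level[i] > max_level)
            max_level = level[i];
        if ((level[i] & 1) && level[i] < min_odd)
            min_odd = level[i];
    }
    /* From the highest level down to the lowest odd level, reverse every
     * maximal sequence of runs whose level is at least the current one. */
    for (int lv = max_level; lv >= min_odd; lv--) {
        int i = 0;
        while (i < count) {
            int j;
            if (level[vis_to_log[i]] < lv) {
                i++;
                continue;
            }
            j = i;
            while (j + 1 < count && level[vis_to_log[j + 1]] >= lv)
                j++;
            for (int a = i, b = j; a < b; a++, b--) {
                int t = vis_to_log[a];
                vis_to_log[a] = vis_to_log[b];
                vis_to_log[b] = t;
            }
            i = j + 1;
        }
    }
}
```

For levels {0, 1, 1, 0} the two right-to-left runs swap places, giving the map {0, 2, 1, 3}. When no level is odd the loop never runs and the map stays in logical order, which is why step 2 can be skipped for purely left-to-right text.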

ScriptGetCMap

The ScriptGetCMap function takes a string and returns the glyph indices of the Unicode characters according to the TrueType cmap table or the standard cmap table implemented for old style fonts.

HRESULT WINAPI ScriptGetCMap(
  HDC hdc,
  SCRIPT_CACHE *psc,
  const WCHAR *pwcInChars,
  int cChars,
  DWORD dwFlags,
  WORD *pwOutGlyphs
);

Parameters

hdc

[in] Handle to the device context. This parameter is optional.

psc

[in/out] Pointer to a SCRIPT_CACHE structure.

pwcInChars

[in] Pointer to a string of Unicode characters.

cChars

[in] Number of Unicode characters in pwcInChars.

dwFlags

[in] This parameter can be the following value.

Value	Meaning
SGCM_RTL	Indicates the glyph array pwOutGlyphs should contain mirrored glyphs for those glyphs that have a mirrored equivalent.

pwOutGlyphs

[out] Pointer to an array that receives the glyph indices.

Return Values

If all Unicode code points are present in the font, the return value is S_OK.

If the function fails, it may return one of the following nonzero values:

Return value	Meaning
E_HANDLE	The font or the system does not support glyph indices.
S_FALSE	Some of the Unicode code points were mapped to the default glyph.

If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.

Remarks

ScriptGetCMap may be used to determine which characters in a run are supported by the selected font. The caller may scan the returned glyph buffer looking for the default glyph to determine which characters are not available. The default glyph index for the selected font should be determined by calling ScriptGetFontProperties.

The return value indicates the presence of any missing glyphs.

Note that some code points can be rendered by a combination of glyphs as well as by a single glyph, for example 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap will show 00C9 as unsupported. To determine the font support for a string that contains these kinds of code points, call ScriptShape. If it returns S_OK, check the output for missing glyphs.
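Scanning the returned glyph buffer for the default glyph is a plain array walk. The sketch below assumes the glyph indices and the font's default glyph index have already been obtained (the text suggests ScriptGetCMap and ScriptGetFontProperties for that); count_missing_glyphs is a hypothetical helper name, and unsigned short stands in for the Win32 WORD type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts entries in a glyph buffer that equal the
 * font's default (missing) glyph index. */
static size_t count_missing_glyphs(const unsigned short *glyphs, size_t count,
                                   unsigned short default_glyph)
{
    size_t missing = 0;
    assert(glyphs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (glyphs[i] == default_glyph)
            missing++;         /* character i has no glyph in this font */
    }
    return missing;
}
```

As the remark above warns, a nonzero count from a cmap lookup does not prove the font cannot render the text, since some code points are drawn from a combination of glyphs.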

ScriptGetGlyphABCWidth

The ScriptGetGlyphABCWidth function returns the ABC width of a given glyph.

HRESULT WINAPI ScriptGetGlyphABCWidth(
  HDC hdc,
  SCRIPT_CACHE *psc,
  WORD wGlyph,
  ABC *pABC
);

Parameters

hdc

[in] Handle to the device context. It is optional, depending on psc.

psc

[in/out] Pointer to a SCRIPT_CACHE structure.

wGlyph

[in] Glyph to be analyzed.

pABC

[out] ABC width of wGlyph.

Return Values

If the ABC width of the glyph is returned, the function returns S_OK.

If the font or system does not support glyph indices, the function returns E_HANDLE. If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.

Remarks

The ScriptGetGlyphABCWidth function may be useful for drawing glyph charts. It should not be used for ordinary complex script text formatting.

ABC

The ABC structure contains the width of a character in a TrueType font.

typedef struct _ABC {
  int     abcA;
  UINT    abcB;
  int     abcC;
} ABC, *PABC;

Members

abcA

Specifies the A spacing of the character. The A spacing is the distance to add to the current position before drawing the character glyph.

abcB

Specifies the B spacing of the character. The B spacing is the width of the drawn portion of the character glyph.

abcC

Specifies the C spacing of the character. The C spacing is the distance to add to the current position to provide white space to the right of the character glyph.

Remarks

The total width of a character is the summation of the A, B, and C spaces. Either the A or the C space can be negative to indicate underhangs or overhangs.

SCRIPT_CACHE

SCRIPT_CACHE is an opaque pointer to a Uniscribe font metric cache structure.

typedef void *SCRIPT_CACHE;

Remarks

The client must allocate and retain one SCRIPT_CACHE variable for each font used. It must be initialized by the client to NULL.

Many script functions take a combination of HDC and SCRIPT_CACHE. Uniscribe will first attempt to access font data by using the SCRIPT_CACHE and will only inspect the HDC if the required data is not already cached.

The HDC may be passed as NULL. If data required by Uniscribe is already cached, the HDC won't be accessed, and the operation continues normally.

If the HDC is passed as NULL, and Uniscribe needs to access it for any reason, Uniscribe will return E_PENDING.

E_PENDING is returned quickly, allowing the client to avoid time-consuming SelectObject calls. The following example applies to all functions that take a SCRIPT_CACHE and an optional HDC.

hr = ScriptShape(NULL, &sc, ...);        // try the cache first, no device context
if (hr == E_PENDING) {
    ... select font into hdc ...
    hr = ScriptShape(hdc, &sc, ...);     // retry with a real device context
}
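The retry pattern is easier to follow with the Win32 types stripped away. The following is a minimal simulation, not Uniscribe code: FakeCache, FakeDc, fake_shape, and the SKETCH_* codes are all hypothetical stand-ins for SCRIPT_CACHE, HDC, ScriptShape, S_OK, and E_PENDING. It only demonstrates why the second and later calls never need the device context.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_OK = 0, SKETCH_PENDING = 1 };  /* stand-ins for S_OK, E_PENDING */

typedef struct { int tables_loaded; } FakeCache;  /* stand-in for a script cache */
typedef struct { int selects; } FakeDc;           /* counts simulated font selections */

/* Simulated shaping call: uses the cache first, the device context only
 * when the tables are not cached yet. */
static int fake_shape(FakeDc *dc, FakeCache *cache)
{
    assert(cache != NULL);
    if (cache->tables_loaded)
        return SKETCH_OK;          /* served from the cache, dc untouched */
    if (dc == NULL)
        return SKETCH_PENDING;     /* tables missing and no dc to read them */
    cache->tables_loaded = 1;      /* load tables through the dc */
    return SKETCH_OK;
}

/* The pattern from the text: try without a dc, retry with one if needed. */
static int shape_with_retry(FakeDc *dc, FakeCache *cache)
{
    int hr = fake_shape(NULL, cache);
    if (hr == SKETCH_PENDING) {
        dc->selects++;             /* simulate selecting the font into the dc */
        hr = fake_shape(dc, cache);
    }
    return hr;
}
```

Calling shape_with_retry twice on the same cache performs the simulated font selection only once, which is the saving the text attributes to E_PENDING.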

ScriptStringAnalyse

The ScriptStringAnalyse function analyzes a plain text string.

HRESULT WINAPI ScriptStringAnalyse(
  HDC hdc,
  const void *pString,
  int cString,
  int cGlyphs,
  int iCharset,
  DWORD dwFlags,
  int iReqWidth,
  SCRIPT_CONTROL *psControl,
  SCRIPT_STATE *psState,
  const int *piDx,
  SCRIPT_TABDEF *pTabdef,
  const BYTE *pbInClass,
  SCRIPT_STRING_ANALYSIS *pssa
);

Parameters

hdc

[in] Handle to the device context. If dwFlags is set to SSA_GLYPHS, hdc is required. If dwFlags is set to SSA_BREAK, hdc is optional.

If the hdc is present, the current font in the hdc is inspected. If the current font is a symbolic font, the character string is treated as a single neutral SCRIPT_UNDEFINED item.

pString

[in] Pointer to the string to analyze. It must contain at least one character.

cString

[in] Length of the string. It must be at least 1.

cGlyphs

[in] Size of the glyph buffer. This parameter is required; the default size is cString * 3/2 + 1.

iCharset

[in] Character set descriptor. If this is an ANSI string, this is the character set. If this is a Unicode string, this value is -1.

dwFlags

[in] Flags that indicate the analysis required. This parameter can be one or more of the following values.

Value	Meaning
SSA_BREAK	Returns break flags, that is, character and word stops.
SSA_CLIP	Clips the string at iReqWidth.
SSA_DZWG	Provides representation glyphs for control characters.
SSA_FALLBACK	Uses fallback fonts.
SSA_FIT	Justifies string to iReqWidth.
SSA_GLYPHS	Generates glyphs, positions, and attributes.
SSA_GCP	Returns missing glyphs and pwLogClust with GetCharacterPlacement conventions.
SSA_HIDEHOTKEY	Removes the first '&' from displayed string.
SSA_HOTKEY	Replaces '&' with underline on subsequent codepoint.
SSA_HOTKEYONLY	Displays underline only.
SSA_LINK	Applies East Asian font linking and association to noncomplex text.
SSA_METAFILE	Writes items with ExtTextOutW calls, not with glyphs.
SSA_PASSWORD	Input string contains a single character to be duplicated iLength times.
SSA_RTL	Uses base embedding level 1.
SSA_TAB	Expands tabs.

iReqWidth

[in] Required width for fitting or clipping.

psControl

[in] Pointer to a SCRIPT_CONTROL structure.

psState

[in] Pointer to a SCRIPT_STATE structure. The uBidiLevel member of SCRIPT_STATE is ignored. The value used is derived from the SSA_RTL flag in combination with the layout of the hdc.

piDx

[in] Pointer to the requested logical dx array.

pTabdef

[in] Pointer to a SCRIPT_TABDEF structure.

pbInClass

[in] Pointer to a BYTE that indicates GetCharacterPlacement character classifications.

pssa

[out] Pointer to a SCRIPT_STRING_ANALYSIS structure.

Return Values

If the function succeeds, it returns S_OK.

If there is an invalid parameter, it returns E_INVALIDARG.

If SSA_FALLBACK was not specified, or if a standard fallback font is missing, it returns USP_E_SCRIPT_NOT_IN_FONT. It will also return any Win32 error (converted to an HRESULT by the HRESULT_FROM_WIN32 macro), such as those from lack of memory or GDI calls using the hdc.

Remarks

The ScriptStringAnalyse function is the first step in handling plain text strings. Plain text is a string that has only one font, one style, one size, one color, and so forth. ScriptStringAnalyse allocates temporary buffers for item analyses, glyphs, advance widths, and so forth. Then it automatically runs ScriptItemize, ScriptShape, ScriptPlace, and ScriptBreak. The results are then available through all the other ScriptString functions.

Although the functionality of ScriptStringAnalyse can be implemented by direct calls to other functions, ScriptStringAnalyse drastically reduces the amount of code required in the application for plain text handling.

Note that the uBidiLevel member in the initial SCRIPT_STATE value is ignored; the uBidiLevel that is used is derived from the SSA_RTL flag in combination with the layout of the hdc.

  Windows NT/2000: Requires Windows 2000.
  Header: Declared in Usp10.h.
  Library: Use Usp10.lib.

See Also

Uniscribe Overview, Uniscribe Structures, ExtTextOut, GetCharacterPlacement, GetTextExtentExPoint, ScriptGetProperties, ScriptItemize, ScriptPlace, ScriptShape, ScriptTextOut, SCRIPT_STATE

SCRIPT_STATE

The SCRIPT_STATE structure is used both to initialize the Unicode algorithm state as an input parameter to ScriptItemize, and as a component of the analysis (SCRIPT_ANALYSIS) returned by ScriptItemize.

typedef struct tag_SCRIPT_STATE {
  WORD uBidiLevel :5;
  WORD fOverrideDirection :1;
  WORD fInhibitSymSwap :1;
  WORD fCharShape :1;
  WORD fDigitSubstitute :1;
  WORD fInhibitLigate :1;
  WORD fDisplayZWG :1;
  WORD fArabicNumContext :1;
  WORD fGcpClusters :1;
  WORD fReserved :1;
  WORD fEngineReserved :2;
} SCRIPT_STATE;

Members

uBidiLevel

The embedding level associated with all characters in this run according to the Unicode bidirectional algorithm. When passed to ScriptItemize, it should be initialized to zero for an LTR base embedding level, or to 1 for RTL.

fOverrideDirection

TRUE if this level is an override level (LRO/RLO). In an override level, characters are laid out in one direction only, either left-to-right or right-to-left. No reordering of digits or strong characters of opposing direction takes place. Note that this initial value is reset by LRE, RLE, LRO or RLO codes in the string.

fInhibitSymSwap

TRUE if the shaping engine is to bypass mirroring of Unicode mirrored glyphs such as brackets. Set by Unicode character ISS, cleared by ASS.

fCharShape

TRUE if character codes in the Arabic Presentation Forms areas of Unicode should be shaped. (Not implemented).

fDigitSubstitute

TRUE if character codes U+0030 through U+0039 (European digits) are to be substituted by national digits. Set by Unicode NADS, cleared by NODS.

fInhibitLigate

TRUE if ligatures are not to be used in the shaping of Arabic or Hebrew characters.

fDisplayZWG

TRUE if control characters are to be shaped as representational glyphs. (Typically, control characters are shaped to the blank glyph and given a width of zero).

fArabicNumContext

TRUE indicates prior strong characters were Arabic for the purposes of rule P0 as discussed in The Unicode Standard, version 2.0. This should normally be set to TRUE before itemizing an RTL paragraph in an Arabic language, and FALSE otherwise.

fGcpClusters

For GetCharacterPlacement legacy support only. Initialize to TRUE to request ScriptShape to generate the pwLogClust array the same way as GetCharacterPlacement does in Arabic and Hebrew Windows 95. Affects only Arabic and Hebrew items.

fReserved

Reserved. Always initialize to zero.

fEngineReserved

Reserved. Always initialize to zero.

SCRIPT_ITEM

The SCRIPT_ITEM structure includes a SCRIPT_ANALYSIS with the string offset of the first character of the item.

typedef struct tag_SCRIPT_ITEM {
  int iCharPos;
  SCRIPT_ANALYSIS a;
} SCRIPT_ITEM;

Members

iCharPos

Specifies the offset from the beginning of the itemized string to the first character of this item, counted in Unicode code points (that is, in WORDs).

a

Specifies a SCRIPT_ANALYSIS structure containing analysis specific to this item.

SCRIPT_VISATTR

The SCRIPT_VISATTR structure contains the visual (glyph) attribute buffer generated by ScriptShape that identifies clusters and justification points.

typedef struct tag_SCRIPT_VISATTR {
  WORD uJustification :4;
  WORD fClusterStart :1;
  WORD fDiacritic :1;
  WORD fZeroWidth :1;
  WORD fReserved :1;
  WORD fShapeReserved :8;
} SCRIPT_VISATTR;

Members

uJustification

Justification class for this glyph. See SCRIPT_JUSTIFY.

fClusterStart

Set for the logical first glyph in every cluster, even for clusters containing just one glyph.

fDiacritic

Set for glyphs that combine with base characters.

fZeroWidth

Set by the shaping engine for some, but not all, zero-width characters, such as ZWJ and ZWNJ.

fReserved

Reserved. Always initialize to zero.

fShapeReserved

Reserved for use by the shaping engines.

GOFFSET

The GOFFSET structure contains the x and y offsets of the combining glyph.

typedef struct tagGOFFSET {
  LONG du;
  LONG dv;
} GOFFSET;

Members

du

x offset, in logical units, for the combining glyph.

dv

y offset, in logical units, for the combining glyph.

ScriptShape

The ScriptShape function takes a Unicode run and generates glyphs and visual attributes.

HRESULT WINAPI ScriptShape(
  HDC hdc,
  SCRIPT_CACHE *psc,
  const WCHAR *pwcChars,
  int cChars,
  int cMaxGlyphs,
  SCRIPT_ANALYSIS *psa,
  WORD *pwOutGlyphs,
  WORD *pwLogClust,
  SCRIPT_VISATTR *psva,
  int *pcGlyphs
);

Parameters

hdc

[in] Handle to the device context.

psc

[in/out] Pointer to a SCRIPT_CACHE structure.

pwcChars

[in] Pointer to a Unicode string containing the run.

cChars

[in] Number of characters in the Unicode run.

cMaxGlyphs

[in] Maximum number of glyphs to generate.

psa

[in/out] Pointer to the SCRIPT_ANALYSIS structure for the run, containing the results from an earlier call to ScriptItemize.

pwOutGlyphs

[out] Pointer to an array that receives the glyphs.

pwLogClust

[out] Pointer to an array that receives the logical cluster information. Each element in *pwLogClust corresponds to a character in *pwcChars; the value of each element is the offset from the first glyph in the run to the first glyph in the cluster containing the corresponding character. For an example, see Remarks. Note that when *psa.fRTL is TRUE, the elements in *pwLogClust decrease when reading the array.

psva

[out] Pointer to an array of SCRIPT_VISATTR structures that receives visual attributes information. There is one visual attribute per glyph, so *psva has cMaxGlyphs elements.

pcGlyphs

[out] Pointer to an integer that receives a count of the number of glyphs written to pwOutGlyphs.

Return Values

If the function succeeds, it returns zero.

If the function fails, it returns a nonzero value. The function returns E_OUTOFMEMORY if the output buffer length, cMaxGlyphs, is insufficient. Note that in all error cases the content of all output parameters is undefined.

If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.

Remarks

If ScriptShape returns E_OUTOFMEMORY, pcGlyphs is undefined; you may need to call ScriptShape repeatedly until a large enough buffer is found. The number of glyphs generated by a code point varies according to the script and the font. For a simple script, a Unicode code point may generate a single glyph. However, a complex script font might construct characters from components, and thus generate several times as many glyphs as characters. Also, there are special cases like invalid character representations, where extra glyphs are added to represent the invalid sequence. Therefore, a reasonable guess for the buffer pointed to by pwOutGlyphs might be 1.5 times the length of the character buffer, plus an additional 16 glyphs for rare cases like invalid sequence representation.


If ScriptShape returns E_OUTOFMEMORY it will be necessary to call ScriptShape again, and possibly more than once, until a large enough buffer is found.

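The sizing guess and the retry loop described above can be sketched as follows. This is an illustration under stated assumptions, not Uniscribe code: initial_glyph_capacity applies the 1.5 times plus 16 rule of thumb from the remarks, shape_growing doubles the buffer on each failure (the doubling factor and the retry limit are arbitrary choices), and needs_at_least is a fake shaper included only so the example is self-contained. A real caller would invoke ScriptShape and retry only on E_OUTOFMEMORY.

```c
#include <assert.h>
#include <stddef.h>

/* Rule of thumb from the remarks: 1.5 * character count + 16 glyphs. */
static int initial_glyph_capacity(int char_count)
{
    assert(char_count >= 0);
    return char_count + char_count / 2 + 16;
}

/* Hypothetical shaper callback: returns 0 on success, nonzero when the
 * glyph buffer of max_glyphs entries is too small. */
typedef int (*shape_fn)(int max_glyphs, void *ctx);

/* Retries with a doubled capacity until shaping succeeds or the attempt
 * limit is reached. Returns 0 and stores the capacity on success. */
static int shape_growing(shape_fn shape, void *ctx, int char_count,
                         int *capacity_out)
{
    int cap = initial_glyph_capacity(char_count);
    for (int attempt = 0; attempt < 8; attempt++) {
        if (shape(cap, ctx) == 0) {
            *capacity_out = cap;
            return 0;
        }
        cap *= 2;              /* buffer too small: grow and try again */
    }
    return -1;
}

/* Fake shaper for demonstration: succeeds once the buffer holds at least
 * *(int *)ctx glyphs. */
static int needs_at_least(int max_glyphs, void *ctx)
{
    return max_glyphs >= *(int *)ctx ? 0 : 1;
}
```

For a 10-character run the first guess is 31 glyphs; if the fake shaper needs 100, the loop succeeds on the third attempt with a capacity of 124.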

Clusters are sequenced uniformly within the run, as are glyphs within the cluster. The fRTL item flag (from ScriptItemize) identifies whether sequencing is left-to-right, or right-to-left.


ScriptShape may set the fNoGlyphIndex flag in psa if the font or operating system cannot support glyph indices.


To determine if a font supports the characters in a given string, call ScriptShape. If it returns S_OK, check the output for missing glyphs.

If fLogicalOrder is requested in psa, glyphs will always be generated in the same order as the original Unicode characters. If fLogicalOrder is not set, RTL items are generated in reverse order, so ScriptTextOut does not need to reverse them before calling ExtTextOut.

If SCRIPT_ANALYSIS.eScript is set to SCRIPT_UNDEFINED, shaping is disabled and ScriptShape displays whatever glyph is in the font cmap table; if none, it displays the missing glyph.


The following example shows how ScriptShape would generate the logical cluster array ( *pwLogClust) from a character array (* pwcChars) and a glyph array (* pwOutGlyphs). The run has four clusters:

1st cluster: one character represented by one glyph
2nd cluster: one character represented by three glyphs
3rd cluster: three characters represented by one glyph
4th cluster: two characters represented by three glyphs

下面的例子展示了ScriptShape如何用一个字符数组(* pwcChars)生成逻辑cluster数组(* pwLogClust)和一个字形数组(* pwOutGlyphs),有四个clusters返回:

1. 第一个cluster：由一个字形代表的一个字符。

2. 第二个cluster：由三个字形代表的一个字符。

3. 第三个cluster：由一个字形代表的三个字符。

4. 第四个cluster：由三个字形代表的二个字符。

Character array (where c<n>u<m> means cluster n, Unicode codepoint m):

| c1u1 | c2u1 | c3u1 c3u2 c3u3 | c4u1 c4u2 |

字符数组(在这儿 c<n>u<m> 的意思是 cluster n , Unicode 编码点 m):

Glyph array (where c<n>g<m> means cluster n, glyph m):

| c1g1 | c2g1 c2g2 c2g3 | c3g1 | c4g1 c4g2 c4g3 |

Cluster array (the offset, in glyphs, to the cluster containing the character):

| 0 | 1 | 4 4 4 | 5 5 |
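
The cluster array above follows mechanically from the number of characters and glyphs in each cluster. A minimal sketch in portable C, assuming a left-to-right run; build_log_clust is a hypothetical helper written for this illustration, not a Uniscribe function:

```c
#include <assert.h>
#include <stddef.h>

/* Fills logClust[] for a left-to-right run: every character of a cluster
   receives the index of that cluster's first glyph.
   Returns the number of entries written (one per character). */
size_t build_log_clust(const int *charsPerCluster, const int *glyphsPerCluster,
                       size_t clusterCount,
                       unsigned short *logClust, size_t logClustCap)
{
    size_t ch = 0;
    unsigned short firstGlyph = 0;
    size_t c;
    int k;

    for (c = 0; c < clusterCount; c++) {
        assert(charsPerCluster[c] > 0 && glyphsPerCluster[c] > 0);
        for (k = 0; k < charsPerCluster[c]; k++) {
            assert(ch < logClustCap);
            logClust[ch++] = firstGlyph;
        }
        /* the next cluster starts after this cluster's glyphs */
        firstGlyph = (unsigned short)(firstGlyph + glyphsPerCluster[c]);
    }
    return ch;
}
```

Feeding it the four clusters of the example (1, 1, 3, 2 characters and 1, 3, 1, 3 glyphs) reproduces the array 0 1 4 4 4 5 5.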

SCRIPT_VISATTR

The SCRIPT_VISATTR structure contains the visual (glyph) attribute buffer generated by ScriptShape that identifies clusters and justification points.

SCRIPT_VISATTR结构包含了由ScriptShape生成的可视(字形)属性缓冲区，用以标识clusters和对齐点。

typedef struct tag_SCRIPT_VISATTR
{
  WORD uJustification :4;
  WORD fClusterStart :1;
  WORD fDiacritic :1;
  WORD fZeroWidth :1;
  WORD fReserved :1;
  WORD fShapeReserved :8;
} SCRIPT_VISATTR;

Members

uJustification

Justification class for this glyph. See SCRIPT_JUSTIFY.

对于这个字形的调整类型。

fClusterStart

Set for the logical first glyph in every cluster, even for clusters containing just one glyph.

设置在每个cluster中的逻辑首字形,即使clusters只包含一个字形。

fDiacritic

Set for glyphs that combine with base characters.

设置与基本字符联合的字形。

fZeroWidth

Set by the shaping engine for some, but not all, zero-width characters, such as ZWJ and ZWNJ.

由图形引擎设置用于一些，但不是所有，zero-width(宽度为零)，如ZWJ和ZWNJ。

fReserved

Reserved. Always initialize to zero.

保留始终初始化为0。

fShapeReserved

Reserved for use by the shaping engines.

由图形化引擎为用户保留。
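
As a small illustration of how the fClusterStart flag might be used, the sketch below counts the clusters in a glyph run by counting the glyphs that have the flag set. VisAttr is a portable stand-in with the same field layout as SCRIPT_VISATTR (declared with unsigned bit-fields so it compiles without the Windows headers), and count_cluster_starts is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for SCRIPT_VISATTR (same fields, same bit widths). */
typedef struct {
    unsigned uJustification : 4;
    unsigned fClusterStart  : 1;
    unsigned fDiacritic     : 1;
    unsigned fZeroWidth     : 1;
    unsigned fReserved      : 1;
    unsigned fShapeReserved : 8;
} VisAttr;

/* Every cluster has exactly one glyph with fClusterStart set,
   so counting those glyphs counts the clusters. */
size_t count_cluster_starts(const VisAttr *attrs, size_t glyphCount)
{
    size_t clusters = 0;
    size_t g;

    assert(attrs != NULL || glyphCount == 0);
    for (g = 0; g < glyphCount; g++)
        if (attrs[g].fClusterStart)
            clusters++;
    return clusters;
}
```

For the eight-glyph ScriptShape example, the flag would be set on glyphs 0, 1, 4 and 5, giving four clusters.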

Design and Implementation of a Win32 Text Editor

Part 14 - Drawing styled text with Uniscribe

Download the UspLib Demo (40Kb)

Introduction

The last tutorial saw the completion of the UspAnalyze function, one of the main APIs of the new UspLib text-rendering engine. We will now switch our attention to the implementation of the UspTextOut function. Our goal is to divide up the glyph-lists we created in the last tutorial, and apply colour information prior to display with ScriptTextOut. The method we will use to identify which colour belongs to each glyph is the central theme of this tutorial.

Now, there is a lot of very specific information in this tutorial related to Uniscribe, and you are only going to find it interesting if you have also been trying to understand how to draw styled text. So feel free to skip to the next tutorial if you want to see UspLib in action.

The image above shows another small utility I wrote whilst working with Uniscribe. The purpose of this app is to demonstrate (and test) the UspLib library. You can download the demo, and also the UspLib sourcecode, at the top of this article.

6. Drawing styled text

At this point we could quite simply call ScriptTextOut with a whole run of glyphs and be done with it. It would display correctly and we would have succeeded in our goal to display Unicode text. However the text would only be drawn in a single font and colour, and it would have been much simpler to use the ScriptString API instead! Remember, the entire reason we are looking at Uniscribe is because we need to apply font and colour information in a very fine-grained manner.

Back in part#10 of this series I proposed a new method for rendering text in Neatpad, using three separate passes. I have implemented this rendering scheme with the UspTextOut function:

void UspTextOut(USPDATA * uspData,
                HDC       hdc,
                int       xpos,
                int       ypos,
                RECT    * bounds
    )
{
    //
    // 1. Draw all background colours, including selection-highlights;
    //    selected areas are added to the HDC clipping region which prevents
    //    step#2 (below) from drawing over them
    //
    PaintBackground(uspData, hdc, xpos, ypos, bounds);

    //
    // 2. Draw the text normally. Selected areas are left untouched
    //    because of the clipping-region created in step#1
    //
    SetBkMode(hdc, TRANSPARENT);
    PaintForeground(uspData, hdc, xpos, ypos, bounds, FALSE);

    //
    // 3. Redraw the text using a single text-selection-colour (i.e. white)
    //    in the same position, directly over the top of the text drawn in step#2
    //    Before we do this, the HDC clipping-region is inverted,
    //    so only selection areas are modified this time
    //
    PaintForeground(uspData, hdc, xpos, ypos, bounds, TRUE);
}

UspTextOut is quite similar to ScriptStringOut, in that it requires a string to be analyzed prior to display. It takes as input the USPDATA object that contains the information generated by UspAnalyze. Although there are three passes involved, there are only two functions that need to be implemented: PaintBackground, and PaintForeground, which will be used to draw both regular (styled) and 'selected' text. We will take a look at the implementation of these functions a little further down.

Characters vs Glyphs vs Clusters

The major problem with Uniscribe is understanding how to decipher the results of the ScriptShape and ScriptPlace calls. There is just so much information returned about each run of text that it takes a fair amount of time and effort to understand it all. Hopefully by the end of this tutorial you will have a little more insight into how all of the Uniscribe functions hang together.

使用 Uniscribe 主要的问题是如何解释 ScriptShape 和 ScriptPlace 调用的结果。正是返回如此多关于每个文本的 run 的信息 ……, 在这个指南的结尾你将对所有的 Uniscribe 函数是如何结合在一起的有一些较多的了解。

The key detail to understand about Uniscribe (and computer typography in general) is the difference between characters and glyphs. Up until this point the main focus with Neatpad has been logical Unicode character sequences. However, once Uniscribe is involved the focus is very much on glyphs. The thing to understand here is that there is no fixed one-to-one relationship between characters and glyphs.

For simple scripts such as English a font usually contains one glyph per Unicode character. However for more complex scripts this relationship can change. Sometimes a single Unicode character can result in more than one glyph. The opposite is also true - there can also be multiple Unicode characters resulting in just a single glyph. This behaviour varies depending on what font is being used. This separation between characters and glyphs presents a problem because our attribute style-runs are all character based, and we somehow need to translate this styling information onto specific glyphs.

对于一个简单的文字如英文一种字体对于一个 Unicode 字符通常包含一个字形。然而对于更多复杂的文字这个关系可能改变。有时一个单一的 Unicode 字符能够产生多于一个字形。相反也成立 ---- 也可能多个 Unicode 字符只生成一个字形。这个不同的特性取决于被使用的字体。这个字符与字形之间的区别存在一个问题因为我们的属性风格 ----runs 全部都是基于字符的。并且我们由于某种原因需要转换这个格式信息到具体字形。

To make things even more complicated, the concept of glyph clusters must be understood. A cluster is basically a grouping of glyphs which must be treated as a single selectable unit. Whilst this is not a problem in itself, it does make rendering glyph sequences a little more complicated because cluster boundaries must be respected.

Understanding the Logical Cluster List

The logical-cluster list is key to establishing a relationship between characters and glyphs. This list is returned by ScriptShape in the pwLogClust[] array. It provides the mapping between logical character-positions and glyph-cluster positions. UspLib stores each run's logical-cluster information inside the clusterList[] field of the USPDATA object.

对于在字符和字形之间建立一个联系 logical-cluster 列表是关键。这个列表由 ScriptShaped 在 pwLogClust( 参数 ) 中返回。它提供了逻辑 character-position 和 glyph-cluster 之间的映射。

To support this idea of character-to-glyph mapping, the logical-cluster list must represent two important concepts:

· Firstly, it identifies the cluster boundaries in the original Unicode string - that is, the offsets in logical character units (WCHARs) of each cluster. Each entry in the clusterList corresponds exactly to a single character in the original string - so clusterList is always the same length as the Unicode string we are processing.

· Secondly, this same array also identifies the offsets of each glyph-cluster, within the glyph-buffers generated by ScriptShape and ScriptPlace .

支持这种字符 - 到 - 字形的观念， logical-cluster 列表必须表示两个重要的概念：

首先，在原始 Unicode 字符串中它标识了 cluster 界限，换句话说，每个 cluster 的以逻辑字符单位 (WCHARs) 的偏移。在 clusterList 中每个条目正好对应在原始字符串中的单一字符 — 这样 clusterList 始终是与我们要处理的 Unicode 字符串相同的长度。

第二，这个数组也标识了每个 glyph-cluster 在由 ScriptShape 和 ScriptPlace 生成的 glyph-buffer 内部的偏移。

In other words, the individual element values (the content) of the clusterList define the glyph-clusters, whilst the positions of the array elements represent the clusters in logical character terms.

As an example we will use the same Arabic string "يُساوِي" we were looking at previously. This string of seven Unicode characters results in the following logical cluster information being generated by ScriptShape. Note that the logical array-index positions are listed across the top of the table.

Array	[0]	[1]	[2]	[3]	[4]	[5]	[6]
WCHAR wszText[]	U+064A	U+064F	U+0633	U+0627	U+0648	U+0650	U+064A
WORD clusterList[]	6	6	4	3	2	2	0

Whole clusters are identified by grouping together any identical numbers in the logical-cluster list. As you can see from the cluster-list in the table above, there are two 6's and two 2's (in addition to the other singular numbers), resulting in a total of five whole clusters altogether. The image below illustrates this grouping concept.
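
The grouping rule can be written as a short loop: a new cluster starts whenever the value in the cluster-list changes. The sketch below is a portable illustration only; chars_per_cluster is a hypothetical helper and is not part of UspLib or Uniscribe.

```c
#include <assert.h>
#include <stddef.h>

/* Groups identical neighbouring values of clusterList[] into clusters.
   Writes the number of WCHARs in each cluster to charCounts[] (in logical
   order) and returns the number of clusters found. */
size_t chars_per_cluster(const unsigned short *clusterList, size_t len,
                         size_t *charCounts, size_t cap)
{
    size_t clusters = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* a change of value marks the first character of a new cluster */
        if (i == 0 || clusterList[i] != clusterList[i - 1]) {
            assert(clusters < cap);
            charCounts[clusters++] = 0;
        }
        charCounts[clusters - 1]++;
    }
    return clusters;
}
```

Applied to the cluster-list 6 6 4 3 2 2 0 from the table, this finds five clusters containing 2, 1, 1, 2 and 1 characters.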

Notice that cluster-list is always stored in logical order, whilst the glyph-list is always in visual order. This means that for right-to-left scripts (such as the Arabic string above), the cluster-list elements will decrease when reading the array. As a result of this, the first glyph that must be drawn will be at the very end of the glyph-list. Bearing this in mind, the breakdown of the clusters is as follows:

注意 cluster-list 始终以逻辑顺序存储。同时 glyph-list 始终以可视顺序存储。这意味着对于从右到左的文本 ( 如上面的阿拉伯字符串 ) ，在读取数组时 cluster-list 元素将递减。作为结果，必须被绘制的首字形将是在 glyph-list 的最末尾 . 记住这点， clusters 的分解在下面列出：

· 1st cluster: two characters represented by two glyphs.

· 2nd cluster: one character represented by one glyph.

· 3rd cluster: one character represented by one glyph.

· 4th cluster: two characters represented by two glyphs.

· 5th cluster: one character represented by one glyph.

· 第一：两个字形代表两个字符。

· 第二：一个字形代表一个字符

· 第三：一个字形代表一个字符

· 第四：两个字形代表两个字符

· 第五：一个字形代表一个字符

Hopefully it should be fairly obvious how the logical clusters were identified - with the number of WCHARs in each cluster calculated by the number of characters in each grouping. Calculating the number of glyphs in each cluster is less obvious. The key here is looking at the difference between the cluster values. This is how the identification of each cluster occurred:

逻辑 clusters 如何被标识是相当清楚的 ---- 由在每个组中的字符数目以在每个 cluster 中 WCHARs 的数目计算。在每个 cluster 中计算字形的数目不太清楚。这儿主要是看 cluster 值之间的不同。这是每个出现的 cluster 如何标识。

1. The first two 6's identify cluster#1, comprising two WCHARs (in character-positions 0 and 1). This value of 6 points to the end of the glyphList which contains the glyphs for this cluster. We know that this cluster is represented by two glyphs (#5 and #6) because of the next value in the cluster-list:

头两个 6 标识 1 号 cluster, 组成两个 WCHARS 字符 ( 在字符中位置是 0 和 1) 。这个为 6 的值指向了包含对应于这个 cluster 字形的 glyphList 的末尾。我们知道这个 cluster 通过两个字形 (#5 和 #6) 表示。

2. The next value in the cluster-list (4) tells us two things. Obviously this cluster starts at glyph #4 in the glyph-list. However this also means that there were 2 (two) glyphs in the previous cluster (6-4 = 2).

下一个在 cluster-list 中的值 (4) 告诉我们两件事。很明显这个 cluster 开始于 glyph-list 中的字形 #4(4 号字形 ) ，然而它也意味着在上一次 ( 最前面 ) 的 cluster 中有两个字形 (6-4=2).

3. The third cluster is comprised of the single glyph#3, and a single WCHAR.

第三个 cluster 由一个单一的 3 号字形 (#3) 组成 , 对应一个单一的 WCHAR.

4. The fourth cluster is comprised of two WCHARs again, which are represented by glyphs #1 and #2.

第四个 cluster 又是由两个 WCHARs 组成。它由字形 #1 和 #2 表示。

5. The fifth and final cluster is again a single WCHAR, represented by glyph#0 in the glyph-list.

第五个也是最后一个 cluster 也是一个单一的 WCHAR, 由 glyph-list 中的 #0 字形表示。

As you can see the key detail here is looking at the difference between glyph-indices in order to count the number of glyphs in each cluster. Special consideration must also be taken with right-to-left scripts because of the way that glyphs are stored in reverse order. The way I handled this was to advance the x-coordinate to the end of the run, call SetTextAlign(TA_RIGHT) , and then move the output-location to the left each time, resulting in the glyphs being output in logical (right-to-left) order.

The important thing to understand is that we always follow the cluster-list in logical order, even for right-to-left scripts. We rely on the ordering of the element values to locate each glyph-cluster as it should be drawn.

你能够领会这一块的关键细节是观察在 glyph-indices 之间的不同以便计算在每个 cluster 中的字形数量。必须特殊考虑的是从右到左的文字，因为字形是以反序存储。我处理的方法是 x- 坐标向前直到 run 的末尾，调用 SetTextAlign(TA_RIGHT), 然后每次向左移动 output-location( 输出位置 ) ，致使字形以逻辑顺序 ( 从右到左 ) 输出。

Another example

The previous example was of course a right-to-left script and highlighted the unique way in which these scripts are represented by Uniscribe. The example shown next is based on the example in MSDN under the ScriptShape documentation, and highlights how complex left-to-right text is represented by Uniscribe.

先前的例子是一个从右到左文字并突出了在这些文字中以 Uniscribe 表示的唯一方法，在这种情况下唯一的方向以 Uniscirbe 来表现。展示的下一个例子是基于 MSDN 中 ScriptShape 文本中的例子。并突出复杂的从左到右文本是如何用 Uniscribe 来表示的。

U+920, U+911, U+915, U+94D, U+937, U+91D, U+949

The string this time is from the Devanagari script. I have no idea what it means because I just strung a sequence of code-points together which happened to have the right "look". If anyone can supply me with a Unicode phrase of 7 characters, which results in the glyph+cluster properties shown below, then please get in touch!

这次的这个字符串是来自于梵文文字。我不知道它意味着什么意思，因为我只排列一个 code-points( 编码点 ) 序列同时产生可以看的图形。如果任何人能够提供给我一个 7 个字符的 Unicode 短语，它生成 glyph+cluster 特性在下面给出。于是请进入触觉。

Array	[0]	[1]	[2]	[3]	[4]	[5]	[6]
Unicode string	U+0920	U+0911	U+0915	U+094D	U+0937	U+091D	U+0949
clusterList[]	0	1	4	4	4	5	5

The key difference here is how the cluster-list elements increase when reading the array. For left-to-right runs, the glyphs are stored in the same order as the original Unicode characters. This is the ordering that many Western readers will find most natural.

这儿关键的区别是当读取数组时 cluster-list 元素如何递增。对于从左到右的 runs ，字形以与最初的 Unicode 字符相同的顺序存储。这是许多西文阅读器的顺序。

图表又形象的说明了逻辑字符与 glyph-clusters 之间的关系。这次用于一个从左到右文本的 run. 这个例子完全是虚构的，因为我不能找到任何对它所需的 glyph+cluster 感到满意的短语、字体与文字。再一次，每个 cluster 的字符数量通过 cluster-list 元素之间的不同被计算。

The diagram hopefully again illustrates the relationship between logical characters and glyph-clusters, this time for a left-to-right run of text. This example is purely fictitious as I couldn't find any phrase, font & script which satisfied the required glyph+cluster properties. Again, the number of glyphs per cluster is calculated by the difference between the cluster-list elements.

· 1st cluster: one character represented by one glyph (1-0=1)

· 2nd cluster: one character represented by three glyphs (4-1=3)

· 3rd cluster: three characters represented by one glyph (5-4=1)

· 4th cluster: two characters represented by three glyphs (8-5=3)

第一个cluster:一个字形表示一个字符(1-0 = 1)

第二个 cluster : 三个字形表示一个字符(4-1=3)

第三个cluster:一个字形表示三个字符(5-4=1)

第四个cluster:三个字形表示二个字符(8-5=3)

The number of glyphs for the last cluster was calculated because we knew how many glyphs (8 in total) were generated for this run by ScriptShape .

最后一个 cluster 字形数量被计算因为我们知道 ScriptShape 为这个 run 生成了多少个字形 ( 总共 8 个 ) 。
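
The "difference between cluster values" rule can be sketched for both directions in one function. This is an illustration under assumptions taken from the two examples in this tutorial: for a left-to-right run each value is the first glyph of its cluster, and for a right-to-left run each value is the highest glyph index of its cluster, with the last cluster ending at glyph 0. glyphs_per_cluster is a hypothetical helper, not part of UspLib.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the number of glyphs in each cluster (in logical order) to
   glyphCounts[] and returns the number of clusters. glyphCount is the total
   number of glyphs ScriptShape produced for the run. */
size_t glyphs_per_cluster(const unsigned short *clusterList, size_t len,
                          int glyphCount, int rtl,
                          int *glyphCounts, size_t cap)
{
    size_t clusters = 0;
    size_t start = 0;      /* first character of the current cluster */
    size_t i;

    for (i = 1; i <= len; i++) {
        int count;

        if (i < len && clusterList[i] == clusterList[start])
            continue;      /* still inside the same cluster */

        if (rtl) {
            /* glyphs run from (next cluster's value + 1) up to this value */
            int lowest = (i < len) ? clusterList[i] + 1 : 0;
            count = clusterList[start] - lowest + 1;
        } else {
            /* glyphs run from this value up to the next cluster's value */
            int end = (i < len) ? clusterList[i] : glyphCount;
            count = end - clusterList[start];
        }
        assert(clusters < cap);
        glyphCounts[clusters++] = count;
        start = i;
    }
    return clusters;
}
```

For the Devanagari table (0 1 4 4 4 5 5, eight glyphs) this gives 1, 3, 1, 3; for the earlier Arabic table (6 6 4 3 2 2 0, seven glyphs) it gives 2, 1, 1, 2, 1.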

Interpolation is the key

The one thing to understand about Uniscribe is the separation between characters and glyphs. So looking now at another example, what happens when we have three characters comprising two glyphs? The problem we face is, how do we distribute the colour information for each character across the glyphs, and which glyph takes which colour?

要理解的是关于 Uniscribe 在字符和字形之间的区别。因此，现在看另一个例子，当我们有两个字形表示三个字符时发生了什么？表面问题是，我们如何为每个穿越字形的字符分配颜色信息，并且哪个字形使用哪种颜色？

U+0635 U+0651 U+0650

For some scripts where the glyphs typically order horizontally you can almost infer the colour relationships. However when glyphs stack vertically on top of each other within a cluster, or when there is an unequal number of characters-to-glyphs, there is no easy way to associate a colour with a particular glyph.

对于一些典型字形水平顺序的文字，你总能够推断颜色关系。但是在一个 cluster 内部当字形垂直的堆叠在另一个的上面。或当有一个字符到字形的数量不相等。没有容易的方法把一个特殊字形与一种颜色联系。

With UspLib I solved this problem in two ways - using one method for drawing the background, and another when drawing the actual glyphs themselves (the foreground). Drawing the foreground was easy: I decided simply to paint all glyphs in a cluster as a single colour. Should there be multiple colour-attributes for the cluster, only the first is chosen and the rest are ignored. This is by far the easiest method and in reality you wouldn't expect individual glyphs to have their own colours when in a cluster.

使用 UspLib 我以两个方法解决了这个问题 ---- 使用一种方法绘制底色，并且另一种方法当绘制实际字形本身 ( 前景 ) 时。绘制前景是容易的：我决定以单一颜色简单的绘制在一个 cluster 中所以字形。对于这个 cluster 应当是多重颜色属性，只是首先的选择并忽略其它的。到目前为止这是非常容易的方法并且在一个 cluster 中实际上你不希望单个字形有它们自己的颜色。

Painting the background is rather different, because the inversion-highlighting scheme must be taken into account. The strategy I have used here is to interpolate the colours across the width of each cluster when drawing the background. This method is hinted at by Microsoft in the following quote from MSDN, in the section "notes on ScriptXtoCP and ScriptCPtoX":

"Cluster information in the logical cluster array is used to share the width of a cluster of glyphs equally among the logical characters they represent."

It took me a while to make the logical leap that this strategy could also be used for text-rendering, however after implementing it I realised it is the exact same method used by the ScriptString API.

The process is very simple. We know how many characters make up each cluster, and also how many glyphs make up each cluster. We therefore sum the width of these glyphs to calculate the total width of the cluster. We then divide the cluster-width by the number of characters in the cluster - this number tells us how wide each colour band should be.

advanceWidth = clusterWidth / charCount;

For some scripts, dividing the clusters this way makes a lot of sense, especially for Arabic because the caret is conventionally positioned at character boundaries rather than glyph-cluster boundaries. However for most scripts this would be viewed as incorrect. Rather than having a special-case just for Arabic, I have instead written UspTextOut so that it always interpolates over glyph-clusters, should any colour-attributes happen to be this fine-grained. We will rely on the fact that ScriptCPtoX will only allow the caret (and therefore selection-highlights) to be placed in the middle of clusters when appropriate.

Lastly, using integer math for the cluster-division will result in potential rounding errors. Whilst this is not a massive problem, we need to have the exact same results that ScriptCPtoX produces when it does its own presumed division-operations (otherwise we could be out by a pixel occasionally). Presumably ScriptCPtoX uses MulDiv in its calculations because this appears to give the correct results, and is what I have used for UspLib.
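
One way to keep the rounding error from accumulating is to compute each colour-band edge directly from the cluster width, rather than adding a rounded advanceWidth repeatedly. The sketch below illustrates that idea in portable C. It is an assumption for illustration, not necessarily what UspLib or ScriptCPtoX does: mul_div is a stand-in for the Win32 MulDiv function restricted to non-negative inputs, and cluster_band_edges is a hypothetical helper.

```c
#include <assert.h>

/* Portable stand-in for Win32 MulDiv, for non-negative inputs only:
   computes a * b / c rounded to the nearest integer. */
int mul_div(int a, int b, int c)
{
    assert(a >= 0 && b >= 0 && c > 0);
    return (int)(((long long)a * b + c / 2) / c);
}

/* Splits a cluster of clusterWidth pixels into charCount colour bands.
   Writes charCount + 1 edges (relative to the cluster's left side) to edges[];
   band k covers the pixels from edges[k] up to edges[k + 1]. */
void cluster_band_edges(int clusterWidth, int charCount, int *edges)
{
    int k;

    assert(charCount > 0);
    for (k = 0; k <= charCount; k++)
        edges[k] = mul_div(clusterWidth, k, charCount);
}
```

A 10-pixel cluster shared by three characters gets edges 0, 3, 7, 10, so the bands are 3, 4 and 3 pixels wide and always add up to the full cluster width.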

Drawing the background

As mentioned above, drawing the background is a little different because of the use of interpolation. We will start by looking at the PaintBackground routine:

void PaintBackground(USPDATA * uspData, HDC hdc, int xpos, int ypos, RECT * bounds)
{
    int        i;
    ITEM_RUN * itemRun;

    // Process the item-runs in visual-order
    for(i = 0; i < uspData->itemRunCount; i++)
    {
        itemRun = GetItemRun(uspData, i);

        // paint the background of the specified item-run
        PaintItemRunBackground(uspData, itemRun, hdc, xpos, bounds);

        xpos += itemRun->width;
    }
}

As you can see this function is very simple. It merely processes the item-runs in visual-order and advances the x-coordinate by the item-width for each run. Each item-run background is rendered individually by the PaintItemRunBackground function.

void PaintItemRunBackground(USPDATA *uspData, ITEM_RUN *itemRun, HDC hdc, int xpos, RECT *bounds)
{
    int i, lasti;

    // locate the item-run buffers
    WORD * clusterList = uspData->clusterList + itemRun->charPos;
    ATTR * attrList    = uspData->attrList    + itemRun->charPos;
    int  * widthList   = uspData->widthList   + itemRun->glyphPos;

    // i runs one past the last character so that the final cluster is processed too
    for(lasti = 0, i = 0; i <= itemRun->len; i++)
    {
        // search for a logical cluster boundary (or end of run)
        if(i == itemRun->len || clusterList[lasti] != clusterList[i])
        {
            << process cluster >>
        }
    }
}

The primary task is to identify the logical-cluster positions. The two loop-indices (lasti and i) represent these cluster positions in the original text-string. The number of WCHARs in each cluster is therefore (i-lasti). Because we always iterate in logical order, this is true for both LTR and RTL texts.

<< process cluster >>

        int glyphIdx1, glyphIdx2;

        // locate glyph-positions for the cluster
        GetGlyphClusterIndices(uspData, itemRun, i, lasti, &glyphIdx1, &glyphIdx2);

        // measure width of this group of glyphs
        for(runWidth = 0; glyphIdx1 <= glyphIdx2; )
            runWidth += widthList[glyphIdx1++];

        // divide the cluster-width by the number of code-points that cover it
        advanceWidth = MulDiv(runWidth, 1, i-lasti);

Once a cluster has been identified, GetGlyphClusterIndices is called. This function inspects the clusterList and returns the corresponding glyph-index positions for i and lasti.

The width of the glyph-cluster is computed next, by simply iterating between glyphIdx1 and glyphIdx2 , before dividing the cluster-width by the number of characters (WCHARs). We now know how far to advance each time we paint a bit of background.

    for(a = lasti; a <= i; a++)
    {
        // look for change in attribute background
        if(a == itemRun->len           ||
           attr.bg  != attrList[a].bg  ||
           attr.sel != attrList[a].sel )
        {
            PaintRectBG(uspData, itemRun, hdc, xpos, &rect, &attr);
            rect.left = rect.right;
        }
    }

The final task is to interpolate the colour-attributes over the cluster. We only ever paint the background if we detect a change in colour, so most of the time an item-run background is painted with just one operation. Missing from the code listing above is the small detail of correcting for rounding errors in the (integer) division - however this is not necessary for understanding the code.

I won't bother including the code for the PaintRectBG function - suffice to say it is not really very interesting, other than the fact that it calls ExcludeClipRect after drawing any selection-highlight background area.

void GetGlyphClusterIndices( USPDATA  * uspData,
                             ITEM_RUN * itemRun,
                             int        clusterIdx1,   // one past the cluster's last character (i)
                             int        clusterIdx2,   // the cluster's first character (lasti)
                             int      * glyphIdx1,
                             int      * glyphIdx2
    )
{
    WORD *clusterList = uspData->clusterList + itemRun->charPos;

    // locate glyph-positions for the cluster
    if(itemRun->analysis.fRTL)
    {
        // RTL scripts
        *glyphIdx1 = clusterIdx1 < itemRun->len ? clusterList[clusterIdx1] + 1 : 0;
        *glyphIdx2 = clusterList[clusterIdx2];
    }
    else
    {
        // LTR scripts
        *glyphIdx1 = clusterList[clusterIdx2];
        *glyphIdx2 = clusterIdx1 < itemRun->len ? clusterList[clusterIdx1] - 1 : itemRun->glyphCount - 1;
    }
}

Above is the GetGlyphClusterIndices function. Note the two distinct cases for LTR and RTL scripts - this is required because the cluster-elements decrease when reading the cluster array (for RTL scripts), but increase for LTR scripts.

Drawing the Foreground

The process for drawing the text is so similar to that of the background that I won't bother including too much code this time. We'll jump straight in to the start of the DrawForegroundItemRun function:

 // right-left runs can be drawn backwards for simplicity
 if(itemRun->analysis.fRTL)
 {
     oldMode = SetTextAlign(hdc, TA_RIGHT);
     xpos += itemRun->width;
     runDir = -1;
 }

The first thing we do is set the text-alignment to TA_RIGHT for any right-to-left string, and advance the x-coordinate to the end of the run. This will allow us to draw the text in logical order (as we walk the logical-cluster-list). This is important because apart from this one detail, it means we can maintain a single function for drawing both LTR and RTL texts.

 // loop over all the logical character-positions
 for(lasti = 0, i = 0; i <= itemRun->len; i++)
 {
     // find a change in attribute
     if(i == itemRun->len || attrList[i].fg != attrList[lasti].fg )
     {
         // scan forward to locate end of cluster (we must always
         // handle whole-clusters because the attr[] might fall in the middle)
         for( ; i < itemRun->len; i++)
             if(clusterList[i - 1] != clusterList[i])
                 break;

         // locate glyph-positions for the cluster [i,lasti]
         GetGlyphClusterIndices(uspData, itemRun, i, lasti, &glyphIdx1, &glyphIdx2);

         << display text >>
     }
 }

The next difference between foreground and background rendering is how we identify cluster-boundaries. This time we look for changes in colour first of all. Once a new colour is found we scan forward to locate the end of the cluster. This means we can paint the whole cluster in one colour and not worry about interpolation.
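
The colour-then-cluster scan can be tried out without any Windows dependencies. The sketch below mirrors the loop above using plain arrays; split_foreground_segments is a hypothetical helper, and the colour values are arbitrary numbers standing in for COLORREFs.

```c
#include <assert.h>
#include <stddef.h>

/* Splits a run into draw segments. A segment ends where the foreground colour
   changes, extended forward to the end of the cluster containing the change.
   Writes the exclusive end position of each segment to segEnds[] and returns
   the number of segments. Each segment is drawn in the colour of its first
   character. */
size_t split_foreground_segments(const unsigned short *clusterList,
                                 const unsigned long *fg, size_t len,
                                 size_t *segEnds, size_t cap)
{
    size_t count = 0, lasti = 0, i;

    for (i = 0; i <= len; i++) {
        if (i == len || fg[i] != fg[lasti]) {
            /* never split a cluster: move forward to its end */
            while (i > 0 && i < len && clusterList[i - 1] == clusterList[i])
                i++;
            if (i > lasti) {
                assert(count < cap);
                segEnds[count++] = i;
            }
            lasti = i;
        }
    }
    return count;
}
```

With the Arabic cluster-list 6 6 4 3 2 2 0 and a colour change at character 1 (in the middle of the first cluster), the segments end at positions 2 and 7: the whole first cluster is drawn in the colour of character 0.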

<< display text >>

    // measure the width (in pixels) of the run
    for(runWidth = 0, g = glyphIdx1; g <= glyphIdx2; g++)
        runWidth += widthList[g];

    // only need the text colour as we are drawing transparently
    SetTextColor(hdc, forcesel ? uspData->selFG : attrList[lasti].fg);

    //
    // Finally output the run of glyphs
    //
    hr = ScriptTextOut(
        hdc,
        &uspFont->scriptCache,
        xpos,
        ypos,
        0,
        NULL,
        &itemRun->analysis,
        NULL,
        0,
        glyphList + glyphIdx1,
        glyphIdx2 - glyphIdx1 + 1,
        widthList + glyphIdx1,
        NULL,
        offsetList + glyphIdx1
    );

    // +ve/-ve depending on run direction
    xpos     += runWidth * runDir;
    lasti     = i;

Once the text-colour has been set, ScriptTextOut is called with the range of glyphs which fall within the cluster. Once again, we only output any text should there be a change in colour so usually there would only be one call to ScriptTextOut .
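
The effect of runDir on the x-coordinate is easiest to see with numbers. The sketch below computes the x value handed to each draw call for a run split into segments (in logical order). It assumes, as the tutorial describes, that right-to-left runs are drawn with TA_RIGHT so that each x is the right-hand edge of its segment; segment_draw_positions is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the x-coordinate passed to the draw call for each segment.
   segWidths[] holds the pixel width of each segment in logical order.
   LTR: x is the left edge of the segment and moves right.
   RTL: x starts at the right edge of the run and moves left. */
void segment_draw_positions(int xpos, const int *segWidths, size_t segCount,
                            int rtl, int *drawX)
{
    int runDir = 1;
    size_t s;

    if (rtl) {
        int runWidth = 0;
        for (s = 0; s < segCount; s++)
            runWidth += segWidths[s];
        xpos += runWidth;      /* advance to the end of the run */
        runDir = -1;
    }
    for (s = 0; s < segCount; s++) {
        drawX[s] = xpos;
        xpos += segWidths[s] * runDir;
    }
}
```

For a run starting at x = 100 with segments 10, 20 and 30 pixels wide, a left-to-right run draws at 100, 110 and 130, while a right-to-left run draws at 160, 150 and 130.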

7. ScriptTextOut

For the sake of completeness here's the prototype for ScriptTextOut :

HRESULT WINAPI ScriptTextOut(
   HDC                hdc,
   SCRIPT_CACHE     * psc,
   int                x,
   int                y,
   UINT               fuOptions,    // ExtTextOut options
   RECT             * rect,
   SCRIPT_ANALYSIS  * analysis,
   WCHAR            * pwcReserved,
   int                iReserved,
   WORD             * pwGlyphs,     // in - results of ScriptShape
   int                cGlyphs,
   int              * piAdvance,    // in - results of ScriptPlace
   int              * piJustify,
   GOFFSET          * pGoffset      // in - results of ScriptPlace
);

That's a pretty intimidating function by anyone's standards! The parameters of note are:

· fuOptions can be one of ETO_CLIPPED, ETO_OPAQUE, or zero. These are the standard ExtTextOut flags. Because we have drawn the background ourselves there is no need to use these options. Note that if ETO_OPAQUE is specified, whole clusters of glyphs must be passed to this function.

· pwGlyphs and cGlyphs identify the list of glyph values returned by ScriptShape .

· piAdvance and pGoffset point to the glyph-placement buffers returned by ScriptPlace .

· piJustify points to an optional array of justified advance values.

ScriptTextOut is basically a wrapper around ExtTextOut - however you will notice that there is no WCHAR* parameter to this function. This is because ScriptTextOut calls ExtTextOut with the ETO_GLYPH_INDEX option, and passes the buffer of glyphs we specified.

ScriptTextOut may perform additional processing (such as glyph-reordering) before calling into GDI, so don't be tempted to bypass ScriptTextOut by calling ExtTextOut directly.

Uniscribe Limitations

One of the drawbacks of Uniscribe is the very thing it does best - the breaking up of a string into individually shapable items. The problem is that some strings containing a lot of whitespace or punctuation result in a large number of item-runs. Whilst this is not bad in itself, it does present a problem when it comes to rendering the line of text. The sheer number of calls to ScriptTextOut has a performance penalty in comparison to calling ExtTextOut once with the same line of text.

For complex-scripts there is no alternative but to break up the string using ScriptItemize . However it would be nice if for non-complex (i.e. English) scripts we could somehow re-combine the item-runs and reduce the potential number of calls to ScriptTextOut . I haven't ventured too far down this path yet, but it is certainly possible to identify if an item-run is complex or not by inspecting the SCRIPT_ANALYSIS::eScript field.

struct SCRIPT_ANALYSIS
{
    WORD eScript : 10;
    WORD fRTL    : 1;
    ...
};

Now, the eScript field is 'opaque' which means we shouldn't make any assumptions about its value. However it can be used as an index into the "global script table", which contains information about the specific script-shaping engines installed in a system.

HRESULT WINAPI ScriptGetProperties(const SCRIPT_PROPERTIES ***ppSp, int *piNumScripts);

The ScriptGetProperties function returns a pointer to this global-script-table, and each entry in the table is a pointer to a SCRIPT_PROPERTIES structure:

struct SCRIPT_PROPERTIES
{
    DWORD langid   : 16;
    DWORD fNumeric : 1;
    DWORD fComplex : 1;
    ...
};

There are many information-fields in this structure, however the interesting one for us is the fComplex flag. Drawing all this together results in the following function, which returns a boolean indicating if an item-run is complex or not:

BOOL IsRunComplex(ITEM_RUN *itemRun)
{
    const SCRIPT_PROPERTIES ** propList;
    int                        propCount;
    int                        scriptIndex;

    // get pointer to the global script table
    ScriptGetProperties(&propList, &propCount);

    // the SCRIPT_ANALYSIS::eScript is an index to the global script table
    scriptIndex = itemRun->analysis.eScript;

    // locate the script from the script-index
    return propList[scriptIndex]->fComplex;
}

Any non-complex item-runs could theoretically be identified and then merged together into a single run, with the SCRIPT_ANALYSIS::eScript field set to SCRIPT_UNDEFINED. All this should happen before ScriptShape is called.
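The merging idea can be sketched in plain C. This is only an illustration under stated assumptions: SimpleRun is a hypothetical, simplified stand-in for the tutorial's ITEM_RUN, the isComplex field is assumed to have been filled in beforehand by a check such as IsRunComplex, and runs are only merged when they are adjacent and share the same direction.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the tutorial's ITEM_RUN. */
typedef struct {
    int charPos;    /* offset of the run's first character in the string */
    int len;        /* number of characters in the run */
    int script;     /* stand-in for SCRIPT_ANALYSIS::eScript */
    int isComplex;  /* result of a check such as IsRunComplex() */
    int rtl;        /* stand-in for SCRIPT_ANALYSIS::fRTL */
} SimpleRun;

#define SIMPLE_SCRIPT_UNDEFINED 0   /* SCRIPT_UNDEFINED is defined as 0 in usp10.h */

/* Merge adjacent non-complex runs in place; returns the new run count. */
size_t MergeSimpleRuns(SimpleRun *runs, size_t count)
{
    size_t out = 0;

    for (size_t i = 0; i < count; i++) {
        if (out > 0 &&
            !runs[out - 1].isComplex && !runs[i].isComplex &&
            runs[out - 1].rtl == runs[i].rtl &&
            runs[out - 1].charPos + runs[out - 1].len == runs[i].charPos) {
            /* extend the previous run and mark its script as undefined */
            runs[out - 1].len += runs[i].len;
            runs[out - 1].script = SIMPLE_SCRIPT_UNDEFINED;
        } else {
            /* complex run, or nothing to merge with: keep it as a separate run */
            runs[out++] = runs[i];
        }
    }
    return out;
}
```

Complex runs are never touched, so they still reach the shaping engine with the analysis that ScriptItemize produced for them.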

Coming up in Part 15

Every time I post a new tutorial I promise that there'll be another update to Neatpad, and of course it hasn't happened (again!). Uniscribe is just so damn complicated it has taken me far more time to document than I first anticipated. For now you can download the UspLib demo at the top of this tutorial, and next time we really will be seeing a new-and-improved Neatpad.

Please send any comments or suggestions to: james@catch22.net

Last modified: 20 May 2006 12:44:09

2 Character to Glyph Mapping (‘cmap’)

The ‘cmap’ table is used to convert from an outside encoding (such as Unicode) to internal glyph ids. The rendering system uses the ‘cmap’ to convert the Unicode code points in a string to glyph ids and then renders the appropriate glyph shapes at the proper positions on the screen or printer. Glyph ids are used exclusively to reference glyphs in all other font tables.

The ‘cmap’ table consists of a set of mapping subtables for different technologies and architectures. This allows the same font to work on multiple operating systems. For example, most TrueType fonts have three subtables within the ‘cmap’, two for Apple and one for Microsoft.

Each mapping subtable has two numbers associated with it: the platform id and the encoding id. The platform id indicates what architecture the subtable is designed for. For example, a platform id of 0 indicates Apple Unicode, 1 indicates Apple Script Manager, 2 indicates ISO, and 3 indicates Microsoft. The meaning of the encoding id depends on the platform. For example, some older fonts have only two subtables: Platform 1, encoding 0 (Mac Roman 8-bit simple 256 glyph encoding) and Platform 3, encoding 1 (Windows Unicode).

The mapping subtables use various formats to reduce their size, but all formats map from a character code to a glyph id. Multiple character codes can map to the same glyph id. For example, space (U+0020) and no-break space (U+00A0) often map to the same glyph id.

There are some special glyph ids which are reserved by convention, as described in the table below. It is advisable to follow these conventions when modifying fonts. Glyph 0 is normally an open rectangle and is used by rendering systems to substitute for characters that are not present in the font.

Glyph Id	Character
0	unknown glyph
1	null
2	carriage return
3	space
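The platform id and encoding id pair is what a renderer looks at when it decides which subtable to use. The following is a minimal sketch of that choice for Unicode text. CmapRecord and PickUnicodeSubtable are hypothetical names, and the preference order (Windows Unicode first, then any Apple Unicode subtable) is an assumption made for illustration, not a rule stated by the text above.

```c
#include <stddef.h>

/* Hypothetical in-memory summary of one 'cmap' subtable header. */
typedef struct {
    unsigned short platformId;  /* 0 = Apple Unicode, 1 = Apple Script Manager,
                                   2 = ISO, 3 = Microsoft */
    unsigned short encodingId;  /* meaning depends on platformId */
} CmapRecord;

/*
 * Return the index of the subtable to use for Unicode text,
 * or -1 if the font has no Unicode subtable.
 */
int PickUnicodeSubtable(const CmapRecord *records, size_t count)
{
    int appleUnicode = -1;

    for (size_t i = 0; i < count; i++) {
        if (records[i].platformId == 3 && records[i].encodingId == 1)
            return (int)i;              /* Windows Unicode: first choice */

        if (records[i].platformId == 0 && appleUnicode < 0)
            appleUnicode = (int)i;      /* remember the first Apple Unicode subtable */
    }
    return appleUnicode;
}
```

For the older two-subtable fonts mentioned above (platform 1 encoding 0 plus platform 3 encoding 1), this picks the Windows Unicode subtable.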

ScriptPlace

The ScriptPlace function takes the output of a ScriptShape call and generates glyph advance width and two-dimensional offset information.

HRESULT WINAPI ScriptPlace(
    HDC hdc,
    SCRIPT_CACHE *psc,
    const WORD *pwGlyphs,
    int cGlyphs,
    const SCRIPT_VISATTR *psva,
    SCRIPT_ANALYSIS *psa,
    int *piAdvance,
    GOFFSET *pGoffset,
    ABC *pABC
);

Parameters

hdc

[in] Handle to a device context.

psc

[in/out] Pointer to a SCRIPT_CACHE structure.

pwGlyphs

[in] Pointer to a glyph buffer obtained from an earlier call to the ScriptShape function.

cGlyphs

[in] Count of glyphs in the glyph buffer.

psva

[in] Pointer to an array of SCRIPT_VISATTR structures.

psa

[in/out] Pointer to the SCRIPT_ANALYSIS structure for the run, obtained from a previous call to ScriptItemize.

piAdvance

[out] Pointer to an array that receives the advance width information. (Translator's note: this is presumably the width of each glyph, that is, each glyph's abcA+abcB+abcC.)

pGoffset

[out] Pointer to an array of GOFFSET structures that receives the x and y offsets of combining glyphs.

pABC

[out] Pointer to an ABC structure that receives the ABC widths for the entire run.

Return Values

If the function succeeds, the return value is zero.

If the function fails, it returns a nonzero value. If any other unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.

Remarks

The composite ABC width for the whole item identifies how much the glyphs overhang to the left of the start position and to the right of the length implied by the sum of the advance widths. The total advance width of the line is exactly abcA+abcB+abcC. The abcA and abcC values are maintained as proportions of the cell height represented in 8 bits and are thus roughly +/-1%. The total width returned, which is the sum of the abcA+abcB+abcC values pointed to by piAdvance, is accurate to the resolution of the TrueType shaping engine.

All arrays are in visual order unless the fLogicalOrder member is set in the psa parameter.
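To make the relationship between advance widths and offsets concrete, here is a small sketch of how a caller might turn them into drawing positions. It is not a Uniscribe call: GlyphOffset is a hypothetical stand-in for GOFFSET, the arrays are assumed to be in visual order, and only the horizontal offset is applied.

```c
/* Hypothetical stand-in for GOFFSET: per-glyph x (du) and y (dv) offsets. */
typedef struct {
    long du;
    long dv;
} GlyphOffset;

/*
 * Walk a run's advance widths from left to right and record the x position
 * at which each glyph would be drawn. Returns the total advance of the run.
 */
int ComputeGlyphX(const int *advance, const GlyphOffset *offset,
                  int glyphCount, int startX, int *drawX)
{
    int penX = startX;

    for (int i = 0; i < glyphCount; i++) {
        /* the offset shifts this glyph only; it does not move the pen */
        drawX[i] = penX + (int)offset[i].du;

        /* the advance width moves the pen to the next glyph cell */
        penX += advance[i];
    }
    return penX - startX;
}
```

A combining glyph typically has an advance width of zero and a non-zero offset, so it is drawn over the cell of its base glyph while the pen stays where it was.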

ScriptLayout

The ScriptLayout function converts an array of run-embedding levels to a map of visual-to-logical position and/or logical-to-visual position.

HRESULT WINAPI ScriptLayout(
    int cRuns,
    const BYTE *pbLevel,
    int *piVisualToLogical,
    int *piLogicalToVisual
);

Parameters

cRuns

[in] Number of runs to process.

pbLevel

[in] Pointer to an array of run-embedding levels.

piVisualToLogical

[out] Pointer to an array that receives the run levels reordered to visual order.

piLogicalToVisual

[out] Pointer to an array that receives the visual run positions.

Return Values

If the function succeeds, the return value is zero.

If the function fails, it returns a nonzero value. If any other unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.

Remarks

The run-embedding levels, defined in the Unicode bidirectional algorithm, describe the direction of a run, the direction of any runs it is embedded in, and the direction of the paragraph.

Level	Meaning
0	A left-to-right run in a left-to-right paragraph.
1	A right-to-left run embedded in a left-to-right run in a left-to-right paragraph, or a right-to-left run (not embedded in another run) in a right-to-left paragraph.
2	A left-to-right run embedded in a right-to-left run of type 1.
3	A right-to-left run embedded in a left-to-right run of type 2.
etc.	The embedding levels can continue as far as necessary.

The logical position refers to the placement of a run relative to other runs. It is the position in backing store, and corresponds to the order in which you would read the text aloud. The visual position of a run is the way the run is visually displayed on the line, taking into account the possible directions of each run.

The pbLevel parameter must contain the embedding levels for all runs on the line, ordered logically.

Translator's note: the runs are carved out of the SCRIPT_ITEM array filled in by ScriptItemize. If the runs are kept in a list ordered by ascending SCRIPT_ITEM index, then walking that list from head to tail visits them in logical order. Collecting the uBidiLevel value from the SCRIPT_STATE member of each run's SCRIPT_ANALYSIS, in that same order, gives the array of embedding levels in logical order. pbLevel points to exactly such an array.

On output, piVisualToLogical[0] is the logical index of the run to display at the far left. Subsequent entries should be displayed progressing from left to right. Put another way, this array is indexed by display position, and each element is the index of the logical run to show at that position.

The piLogicalToVisual[0] parameter is the relative visual position where the first logical run should be displayed, the leftmost display position being zero. Positions count from left to right: the leftmost is 0, the one to its right is 1, and so on. This array is therefore indexed by logical run, and each element is the display position of that run.

The two output arrays are inverses of each other:

piVisualToLogical is indexed by display position; each element is the index of the run to display there.

piLogicalToVisual is indexed by run; each element is the display position of that run.

The caller may request either piLogicalToVisual or piVisualToLogical, or both.

Note that no other input is required because the embedding levels give all necessary information for layout.
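The mapping from embedding levels to display order can be sketched without Uniscribe at all. The code below is not the ScriptLayout implementation; it is a small illustration, under the assumption that the reordering follows rule L2 of the Unicode bidirectional algorithm (from the highest level down to the lowest odd level, reverse every contiguous sequence of runs at that level or higher). LayoutRuns and ReverseInts are hypothetical names, and unlike ScriptLayout this sketch requires the visual-to-logical array.

```c
#include <stddef.h>

/* Reverse n integers in place. */
static void ReverseInts(int *a, int n)
{
    for (int i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/*
 * levels: embedding level of each run, in logical order.
 * v2l:    receives, for each display position, the logical run index.
 * l2v:    optional (may be NULL); receives, for each logical run,
 *         its display position.
 * Returns 0 on success, -1 on bad arguments.
 */
int LayoutRuns(const unsigned char *levels, int cRuns, int *v2l, int *l2v)
{
    int highest = 0, lowestOdd = 256, i, level;

    if (levels == NULL || v2l == NULL || cRuns <= 0)
        return -1;

    for (i = 0; i < cRuns; i++) {
        v2l[i] = i;                              /* start in logical order */
        if (levels[i] > highest)
            highest = levels[i];
        if ((levels[i] & 1) && levels[i] < lowestOdd)
            lowestOdd = levels[i];
    }

    /* reverse sequences of runs at each level, highest level first */
    for (level = highest; level >= lowestOdd; level--) {
        i = 0;
        while (i < cRuns) {
            if (levels[v2l[i]] >= level) {
                int start = i;
                while (i < cRuns && levels[v2l[i]] >= level)
                    i++;
                ReverseInts(v2l + start, i - start);
            } else {
                i++;
            }
        }
    }

    /* the logical-to-visual map is the inverse permutation */
    if (l2v != NULL) {
        for (i = 0; i < cRuns; i++)
            l2v[v2l[i]] = i;
    }
    return 0;
}
```

For the levels 1, 1, 2, 2, 1 (a right-to-left paragraph with a left-to-right sequence inside it), the display order from left to right is run 4, run 2, run 3, run 1, run 0: the two level-2 runs keep their relative order while everything else is mirrored.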

ScriptGetLogicalWidths

The ScriptGetLogicalWidths function converts the glyph advance widths for a specific font into logical widths.

HRESULT WINAPI ScriptGetLogicalWidths(
    const SCRIPT_ANALYSIS *psa,
    int cChars,
    int cGlyphs,
    const int *piGlyphWidth,
    const WORD *pwLogClust,
    const SCRIPT_VISATTR *psva,
    int *piDx
);

Parameters

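psa

[in] Pointer to a SCRIPT_ANALYSIS structure.

cChars

[in] Count of the logical code points in the run.

cGlyphs

[in] Count of the glyphs in the run.

piGlyphWidth

[in] Pointer to an array of glyph advance widths.

pwLogClust

[in] Pointer to an array of logical clusters.

psva

[in] Pointer to a SCRIPT_VISATTR structure.

piDx

[out] Pointer to an array of logical widths.

Return Values

Currently, ScriptGetLogicalWidths always returns S_OK.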

Remarks

ScriptGetLogicalWidths is useful for recording widths in a font-independent manner. ScriptGetLogicalWidths converts the glyph advance widths calculated for a specific font into logical widths, one per code point, in the same order as the code points. If the same string is then displayed on a different device using a different font, the logical widths may be applied, by using ScriptApplyLogicalWidth, to approximate the original placement. This would be useful when implementing print preview: on the preview screen it is important to match the layout and placement of the final printed result.

Ligature glyph widths are divided evenly among the characters they represent.
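The even division of a ligature's width can be illustrated with a simplified sketch. It is not the Uniscribe implementation and it makes several assumptions: the run is left-to-right, logClust[i] holds the index of the first glyph of the cluster that contains character i (as ScriptShape produces for such a run), and any pixels left over after the division are handed to the first characters of the cluster. LogicalWidthsLtr is a hypothetical name.

```c
/*
 * Convert per-glyph advance widths into per-character (logical) widths
 * for a left-to-right run. dx must have room for cChars entries.
 */
void LogicalWidthsLtr(const int *glyphWidth, int cGlyphs,
                      const unsigned short *logClust, int cChars, int *dx)
{
    int c = 0;

    while (c < cChars) {
        int end = c + 1;
        int firstGlyph, nextGlyph, total = 0, n, g, k;

        /* find the end of the cluster that starts at character c */
        while (end < cChars && logClust[end] == logClust[c])
            end++;

        /* glyphs of this cluster: from its first glyph up to the next cluster's */
        firstGlyph = logClust[c];
        nextGlyph = (end < cChars) ? logClust[end] : cGlyphs;
        for (g = firstGlyph; g < nextGlyph; g++)
            total += glyphWidth[g];

        /* share the cluster width evenly among its characters */
        n = end - c;
        for (k = 0; k < n; k++)
            dx[c + k] = total / n + (k < total % n ? 1 : 0);

        c = end;
    }
}
```

For a three-character ligature drawn as one 14-pixel glyph, this yields widths of 5, 5 and 4, so the logical widths still add up to the width of the glyph.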

ScriptBreak

The ScriptBreak function returns information for determining line breaks.

HRESULT WINAPI ScriptBreak(
    const WCHAR *pwcChars,
    int cChars,
    const SCRIPT_ANALYSIS *psa,
    SCRIPT_LOGATTR *psla
);

Parameters

pwcChars

[in] Pointer to the Unicode characters to process.

cChars

[in] Number of Unicode characters to process.

psa

[in] Pointer to the SCRIPT_ANALYSIS structure obtained from an earlier call to the ScriptItemize function.

psla

[out] Pointer to a buffer that receives the character attributes as a SCRIPT_LOGATTR structure.

Return Values

If the function succeeds, the return value is zero.

Remarks

The ScriptBreak function returns cursor movement and formatting break positions for an item in an array of SCRIPT_LOGATTR structures. To support mixed formatting within a single word correctly, ScriptBreak should be passed whole items as returned by ScriptItemize and not the finer formatting runs.

ScriptBreak does not require an hdc and does not perform shaping.

The SCRIPT_LOGATTR structure, pointed to by psla, identifies valid caret positions and line breaks. The SCRIPT_LOGATTR.fCharStop flag marks cluster boundaries for those scripts where it is conventional to restrict the caret from moving inside clusters. The same boundaries could also be inferred by inspecting the pwLogClust array returned by ScriptShape; however, ScriptBreak is considerably faster in implementation and does not require an hdc to be prepared. The fWordStop, fSoftBreak, and fWhiteSpace flags in SCRIPT_LOGATTR are only available through ScriptBreak.

Most shaping engines that identify invalid sequences do so by setting the fInvalid flag in ScriptBreak. The fInvalidLogAttr flag in SCRIPT_PROPERTIES identifies which scripts do this.

SCRIPT_LOGATTR

The SCRIPT_LOGATTR structure describes attributes of logical characters that are useful when editing and formatting text.

typedef struct tag_SCRIPT_LOGATTR
{
    BYTE fSoftBreak  :1;
    BYTE fWhiteSpace :1;
    BYTE fCharStop   :1;
    BYTE fWordStop   :1;
    BYTE fInvalid    :1;
    BYTE fReserved   :3;
} SCRIPT_LOGATTR;

Members

fSoftBreak

It is valid to break the line in front of this character. This member is set on the first character of Southeast Asian words.

fWhiteSpace

This character is one of the many Unicode characters that are classified as breakable white space. Breakable white space can break words; that is, it is all white space except NBSP (nonbreaking space) and ZWNBSP (zero-width nonbreaking space).

fCharStop

Valid caret position. Set on most characters, but not on code points inside Indian and Southeast Asian character clusters. May be used to implement LEFT ARROW and RIGHT ARROW operations in editors.

fWordStop

Valid caret position. It is the correct place to show the caret when you use a word movement keyboard action such as CTRL+LEFT ARROW and CTRL+RIGHT ARROW. May be used to implement the CTRL+LEFT ARROW and CTRL+RIGHT ARROW operations in editors.

fInvalid

Marks characters which form an invalid or undisplayable combination. Scripts which can set this flag have the flag fInvalidLogAttr set in their SCRIPT_PROPERTIES structure.
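As an illustration of how an editor might use fCharStop and fWordStop for arrow-key handling, here is a minimal sketch. LogAttr is a hypothetical stand-in for SCRIPT_LOGATTR so that the example compiles without the Windows headers, and CaretLeft and CaretRight are illustrative names. The sketch works on logical character positions only and ignores the visual direction of right-to-left runs.

```c
/* Hypothetical stand-in for SCRIPT_LOGATTR (same flags, plain C bit-fields). */
typedef struct {
    unsigned int fSoftBreak  : 1;
    unsigned int fWhiteSpace : 1;
    unsigned int fCharStop   : 1;
    unsigned int fWordStop   : 1;
    unsigned int fInvalid    : 1;
} LogAttr;

/* Is position pos a valid stop for the requested kind of movement? */
static int IsStop(const LogAttr *attr, int pos, int byWord)
{
    return byWord ? attr[pos].fWordStop : attr[pos].fCharStop;
}

/* Move forward to the next stop; position count means "end of text". */
int CaretRight(const LogAttr *attr, int count, int pos, int byWord)
{
    if (pos >= count)
        return count;
    pos++;
    while (pos < count && !IsStop(attr, pos, byWord))
        pos++;
    return pos;
}

/* Move backward to the previous stop; position 0 is always allowed. */
int CaretLeft(const LogAttr *attr, int count, int pos, int byWord)
{
    (void)count;    /* unused, kept so both functions share a signature */
    if (pos <= 0)
        return 0;
    pos--;
    while (pos > 0 && !IsStop(attr, pos, byWord))
        pos--;
    return pos;
}
```

Passing byWord as zero gives plain arrow-key movement that skips code points inside a cluster; passing a non-zero value gives the CTRL+arrow behaviour.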

ScriptBreak

ScriptBreak works alongside ScriptItemize to identify the logical attributes of each character in a string. ScriptBreak must be called once for each individual item-run in the string (as identified by ScriptItemize) and returns an array of SCRIPT_LOGATTR structures. Each entry in the array represents a single WCHAR in the Unicode string, and the array must be allocated by the caller to have the same number of elements as there are WCHARs in the run.

HRESULT WINAPI ScriptBreak(
    const WCHAR *pwcChars,
    int cChars,
    const SCRIPT_ANALYSIS *psa,
    SCRIPT_LOGATTR *psla
);

The individual attributes for each character are held within the SCRIPT_LOGATTR structure, shown below:

struct SCRIPT_LOGATTR
{
    BYTE fSoftBreak  : 1;
    BYTE fWhiteSpace : 1;
    BYTE fCharStop   : 1;
    BYTE fWordStop   : 1;
    BYTE fInvalid    : 1;
    BYTE fReserved   : 3;
};

Although each field of the SCRIPT_LOGATTR structure has a specific purpose, the information returned by ScriptBreak is generally useful for two purposes: word-wrapping and keyboard navigation:

· fSoftBreak indicates the positions within a string where word-wrapping can take place. In other words, these are the positions where the string can be broken into smaller units suitable for display over multiple lines.

· fWhiteSpace indicates that the corresponding character should be treated as white-space. This could potentially be set for many more characters than just tabs and spaces.

· fCharStop and fWordStop identify valid caret positions within the string. These positions can be used to support single character-based navigation and 'word' navigation.
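The word-wrapping use of fSoftBreak can be sketched as follows. This is a simplified illustration, not code from UspLib: LogAttr is a hypothetical stand-in for SCRIPT_LOGATTR, charWidth is assumed to hold one width per character (for example, logical widths), and trailing white space is not treated specially.

```c
/* Hypothetical stand-in for SCRIPT_LOGATTR (same flags, plain C bit-fields). */
typedef struct {
    unsigned int fSoftBreak  : 1;
    unsigned int fWhiteSpace : 1;
    unsigned int fCharStop   : 1;
    unsigned int fWordStop   : 1;
    unsigned int fInvalid    : 1;
} LogAttr;

/*
 * Given a line that starts at character `start`, return the index of the
 * first character of the next line (or `count` if the rest of the text fits).
 */
int FindLineBreak(const LogAttr *attr, const int *charWidth,
                  int count, int start, int maxWidth)
{
    int width = 0;
    int lastBreak = -1;

    for (int i = start; i < count; i++) {
        /* a soft break means the line may be broken in front of character i */
        if (i > start && attr[i].fSoftBreak)
            lastBreak = i;

        if (width + charWidth[i] > maxWidth) {
            if (lastBreak > start)
                return lastBreak;               /* break at the last soft break */
            return (i > start) ? i : start + 1; /* no soft break: break mid-word */
        }
        width += charWidth[i];
    }
    return count;
}
```

Calling FindLineBreak repeatedly, each time passing the previous return value as start, splits a paragraph into lines. When a single word is wider than the line, the sketch falls back to breaking in the middle of the word, and it always consumes at least one character so that the loop makes progress.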

Don't underestimate just how much work ScriptBreak is doing on our behalf. The identification of character and word positions alone saves us a tremendous amount of effort. Added to this is the fact that ScriptBreak supports all of the various Unicode scripts, so for languages such as Thai (which require dictionary support to identify 'soft breaks'), all of the hard work is already done.

The task of calling ScriptBreak for each item-run is handled by the UspAnalyze function, which we looked at in previous tutorials. The SCRIPT_LOGATTR buffer is allocated and stored inside the USPDATA object's breakList member. A simple loop is then used to iterate over each item-run, and the results of ScriptBreak are stored inside the USPDATA::breakList array. The array holds the results for all item-runs, concatenated together:

<< UspLib.c - UspAnalyze(...) >>

// allocate memory for SCRIPT_LOGATTR structures
uspData->breakList = malloc(wlen * sizeof(SCRIPT_LOGATTR));

// Generate the word-break information for each item-run
for(i = 0; i < uspData->itemRunCount; i++)
{
    ITEM_RUN *itemRun = &uspData->itemRunList[i];

    ScriptBreak(
        wstr + itemRun->charPos,
        itemRun->len,
        &itemRun->analysis,
        uspData->breakList + itemRun->charPos
    );
}

Any string (or paragraph) of text analyzed with UspAnalyze will therefore automatically have its SCRIPT_LOGATTR information stored inside the USPDATA object. Because the information for each run has been concatenated into the same buffer, individual item-runs do not need to be taken into account when inspecting the logical attributes for each character in the string.

Let's look at a quick example and see how the string "Hello ??????? World" would be treated by ScriptBreak. Note that there are two spaces in the string, one on either side of the Arabic phrase:

SCRIPT_LOGATTR	H	E	L	L	O		?	?	?	?	?	?	?		W	O	R	L	D
SoftBreak	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0
WhiteSpace	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0
CharStop	1	1	1	1	1	1	1	0	1	1	1	0	1	1	1	1	1	1	1
WordStop	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0
Invalid	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

ScriptXtoCP

The ScriptXtoCP function converts an x offset from the left end (!fLogical) or leading edge (fLogical) of a run to a logical character position and a flag that indicates whether the x position fell in the leading or the trailing half of the character.

HRESULT WINAPI ScriptXtoCP(
    int iX,
    int cChars,
    int cGlyphs,
    const WORD *pwLogClust,
    const SCRIPT_VISATTR *psva,
    const int *piAdvance,
    const SCRIPT_ANALYSIS *psa,
    int *piCP,
    int *piTrailing
);

Parameters

iX

[in] Offset, in logical units, from the left of the run.

cChars

[in] Count of the logical code points in the run.

cGlyphs

[in] Count of the glyphs in the run.

pwLogClust

[in] Pointer to an array of logical clusters.

psva

[in] Pointer to an array of SCRIPT_VISATTR structures containing the visual attributes of the glyphs.

piAdvance

[in] Pointer to an array of advance widths.

psa

[in] Pointer to a SCRIPT_ANALYSIS structure.

piCP

[out] Pointer to an integer to receive the character position.

piTrailing

[out] Pointer to a flag to receive information about whether the position is the leading or trailing edge of the character.

Return Values

If the function succeeds, the return value is zero.

Remarks

For scripts where the caret may conventionally be placed into the middle of clusters (for example, Arabic and Hebrew), the returned character position may be for any code point in the line, and piTrailing will be either zero or one.

For scripts in which the caret is conventionally snapped to the boundaries of a cluster, the returned character position is always the position of the first code point in a cluster (considered logically), and piTrailing is either zero or the number of code points in the cluster.

Thus the appropriate caret position for a mouse hit is always the returned character position plus the value of piTrailing.

If the x position passed is not in the item at all, the resulting position will be the trailing edge of character -1 (for x positions before the item), or the leading edge of character "cChars" (for x positions following the item).
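The leading/trailing convention is easiest to see in a stripped-down hit test. The sketch below is not ScriptXtoCP: it assumes a left-to-right run in which every character is its own cluster, takes one width per character, and uses the hypothetical name XToCpLtr. It does follow the out-of-range behaviour described above.

```c
/*
 * Map an x offset (measured from the left edge of the run) to a character
 * position and a trailing flag, for a simple left-to-right run.
 */
void XToCpLtr(int x, const int *charWidth, int cChars, int *cp, int *trailing)
{
    int left = 0;

    if (x < 0) {                /* before the run: trailing edge of character -1 */
        *cp = -1;
        *trailing = 1;
        return;
    }

    for (int i = 0; i < cChars; i++) {
        int w = charWidth[i];
        if (x < left + w) {
            *cp = i;
            /* trailing half if x is at or past the middle of the cell */
            *trailing = ((x - left) * 2 >= w) ? 1 : 0;
            return;
        }
        left += w;
    }

    *cp = cChars;               /* after the run: leading edge of character cChars */
    *trailing = 0;
}
```

In every case the caret belongs at cp + trailing: a click in the right half of character 0 puts the caret at position 1, and a click to the left of the run puts it at position 0.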

ScriptTextOut

The ScriptTextOut function takes the output of both ScriptShape and ScriptPlace calls and calls the operating system ExtTextOut function appropriately.

All arrays are in display order unless the fLogicalOrder member is set in the SCRIPT_ANALYSIS structure pointed to by psa.

HRESULT WINAPI ScriptTextOut(
    const HDC hdc,
    SCRIPT_CACHE *psc,
    int x,
    int y,
    UINT fuOptions,
    const RECT *lprc,
    const SCRIPT_ANALYSIS *psa,
    const WCHAR *pwcReserved,
    int iReserved,
    const WORD *pwGlyphs,
    int cGlyphs,
    const int *piAdvance,
    const int *piJustify,
    const GOFFSET *pGoffset
);

Parameters

hdc

[in] Handle to a device context.

psc

[in/out] Pointer to a SCRIPT_CACHE structure.

x

[in] Value of the x-coordinate of the first glyph.

y

[in] Value of the y-coordinate of the first glyph.

fuOptions

[in] Options equivalent to the fuOptions parameter of ExtTextOut. It may contain ETO_CLIPPED or ETO_OPAQUE (or neither of them, or both of them).

lprc

[in] Pointer to a rectangle used to clip the display. This parameter is optional and can be NULL.

psa

[in] Pointer to a SCRIPT_ANALYSIS structure obtained from a previous call to ScriptItemize.

pwcReserved

Reserved, must be NULL.

iReserved

Reserved, must be zero.

pwGlyphs

[in] Pointer to an array of glyphs obtained from a previous call to ScriptShape.

cGlyphs

[in] Count of the glyphs in the pwGlyphs array.

piAdvance

[in] Pointer to an array of advance widths obtained from a previous call to ScriptPlace.

piJustify

[in] Pointer to an array of justified advance widths. This parameter is optional and can be NULL.

pGoffset

[in] Pointer to a GOFFSET structure containing the x and y offsets for the combining glyph.

Return Values

If the function succeeds, the return value is zero.

Remarks

For any run that is rendered right-to-left (fRTL) and was generated in logical order by forcing the fLogicalOrder flag, call SetTextAlign(hdc, TA_RIGHT) and give the right-side coordinate before calling ScriptTextOut.

The piJustify array provides requested cell widths for each glyph. When the piJustify width of a glyph differs from the unjustified width (in piAdvance), space is added to or removed from the glyph cell at its trailing edge. The glyph is always aligned with the leading edge of its cell. (This rule applies even in visual order.)

When a glyph cell is extended the extra space is usually made up by the addition of white space, however for Arabic scripts, the extra space is made up by one or more kashida glyphs, unless the extra space is insufficient for the shortest kashida glyph in the font. (The width of the shortest kashida is available by calling ScriptGetFontProperties.)

The piJustify parameter should be passed only if the string must be justified again. Normally, pass NULL to this parameter.
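To show what a piJustify array might contain, here is a minimal sketch that widens only white-space glyph cells. It is an illustration under stated assumptions, not the way Uniscribe computes justification (Uniscribe provides ScriptJustify for that, and uses kashida glyphs for Arabic as noted above). BuildJustifiedAdvances is a hypothetical name, and isSpace is assumed to be a caller-supplied array that flags which glyphs are white space.

```c
/*
 * Fill justify[] with advance widths that make the run `extra` pixels wider.
 * The extra space is shared among white-space glyphs only.
 * Returns the number of pixels distributed (0 if nothing could be added).
 */
int BuildJustifiedAdvances(const int *advance, const unsigned char *isSpace,
                           int glyphCount, int extra, int *justify)
{
    int spaces = 0;
    int seen = 0;
    int i;

    /* start from the unjustified widths and count the white-space glyphs */
    for (i = 0; i < glyphCount; i++) {
        justify[i] = advance[i];
        if (isSpace[i])
            spaces++;
    }

    if (spaces == 0 || extra <= 0)
        return 0;

    /* widen each white-space cell; leftover pixels go to the first cells */
    for (i = 0; i < glyphCount; i++) {
        if (isSpace[i]) {
            justify[i] += extra / spaces + (seen < extra % spaces ? 1 : 0);
            seen++;
        }
    }
    return extra;
}
```

The resulting array has the same length as piAdvance, and the widened cells grow at their trailing edge, which matches the rule that a glyph stays aligned with the leading edge of its cell.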

The pwcinChars and cChars parameters are required only if output is to a metafile DC. If hdc is not a metafile, these parameters may be passed as NULL and zero.

Do not use ScriptTextOut to write to a metafile unless you are sure that the metafile will be played back without any font substitution. ScriptTextOut records glyph numbers in the metafile, and since glyph numbers vary considerably from one font to another, such a metafile is unlikely to play back correctly when different fonts are substituted. For example, when a metafile is played back at a different scale, a CreateFont request that is recorded in the metafile may resolve to a bitmap instead of a TrueType font. Likewise, if the metafile is played back on a different machine, the requested fonts may not be installed. To write complex scripts in a metafile in a font-independent manner, use ExtTextOut to write the logical characters directly, so that glyph generation and placement do not occur until the text is played back.
