ICU FAQs

Contents

  1 Introduction to ICU
    1.1 What is ICU?
    1.2 Where can I get ICU?
    1.3 How do I install the binary versions of ICU?
    1.4 What is the ICU binary compatibility policy?
    1.5 How is the ICU licensed?
    1.6 Can I use ICU from other languages besides C/C++ and Java?
    1.7 How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version, or a new CLDR version?
  2 Building and Testing ICU
    2.1 How do I build ICU?
    2.2 How do I get 32- or 64-bit versions of the ICU libraries?
    2.3 How do I build an optimized, non debug ICU?
    2.4 Why am I getting so many test failures when I use "gmake check"?
    2.5 How can I reduce the size of the ICU data library?
    2.6 Why am I seeing a small (only a few K) instead of a large (several megabytes) data shared library (icudt)? Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure.
    2.7 Can I add or remove a converter from ICU?
    2.8 Why don't the makefiles work?
    2.9 What version of the C iostream is used in ICU4C?
    2.10 I only want to use the C APIs, do I need a C++ compiler?
  3 Features of ICU
    3.1 What computer languages does ICU support?
    3.2 How are the APIs documented for deprecation?
    3.3 What version of Unicode standard does ICU support?
    3.4 Does ICU support UTF-16 surrogates and Unicode supplementary characters?
    3.5 Does Java support UTF-16 surrogates and Unicode supplementary characters?
    3.6 How does ICU relate to Java's java.text.* package?
  4 Using ICU
    4.1 Can I use any of the features of ICU without Unicode strings?
    4.2 How do I declare a Unicode string in ICU?
    4.3 How is a Unicode string represented in ICU4C?
    4.4 How do I index into a UTF-16 string?
    4.5 What is the performance difference between UTF-8 and UTF-16?
    4.6 How do the converters work?
    4.7 What does a locale look like in ICU?
    4.8 How is ICU versioned?
    4.9 What is the relationship between ICU locale data and system locale data?
    4.10 How are errors handled in ICU?
    4.11 With calendar classes, why are months 0-based?
    4.12 Is there a guideline for COBOL programs that want to use ICU?
    4.13 Where can I get more information about using ICU?

Introduction to ICU

What is ICU?

ICU is a cross-platform, Unicode-based globalization library. It includes support for locale-sensitive string comparison, date/time/number/currency/message formatting, text boundary detection, character set conversion and so on.

Where can I get ICU?

You can get ICU4C and ICU4J from http://www.icu-project.org/download/

Why don't you build binaries for my platform?

There are so many versions of compilers on so many platforms that we cannot build them all and guarantee compatibility between them, even on the same platform. Due to these restrictions, we only distribute a limited number of binary versions of ICU, but we will assist in building other versions from source.

Why don't you provide project files for my MSVC version (MSVC 2008, etc)?

You can use the Cygwin build environment to build ICU from source against the MSVC compiler. See the ICU4C Readme.

How do I install the binary versions of ICU?
  • Windows:
    • The DLLs you may need for your application are located in bin\icuXX##.dll, where "XX" are two letters (such as "uc" for the "common" library, "in" for the "i18n" library, etc.) and ## is the major and the minor version number (such as 42 for 4.2 / 4.2.0.1 or 4.2.4 ).
    • Either place the DLLs in the same directory as your application's .EXE files, or set the PATH variable to point to the directory containing the ICU DLLs.
    • For compiling applications, add the "include" directory (the parent of the "unicode" and "layout" directories) to the include search path.
    • For linking applications, add the "lib" directory to the appropriate path.
  • Other Platforms:
    • For other platforms, the .tgz file unpacks to a "/usr/local" type hierarchy. For system-wide installation, you can unpack all of the files into /usr/local/bin, /usr/local/include, etc.
    • The configuration script /usr/local/bin/icu-config or the similar Makefile include fragment /usr/local/lib/icu/current/Makefile.inc can be used in building applications.

What is the ICU binary compatibility policy?

Please see the section on binary compatibility (§) in the design chapter.

How is the ICU licensed?

The ICU projects since ICU 1.8.1 and ICU4J 1.3.1 are covered by the ICU license, a simple, permissive non-copyleft free software license, compatible with the GNU GPL. The ICU license is identical to the version of the X license that was formerly available at http://www.x.org/Downloads_terms.html. (This site no longer exists, but can still be retrieved through internet archive services.)

The ICU license is intended to allow ICU to be included both in free software projects and in proprietary or commercial products.

Can I use ICU from other languages besides C/C++ and Java?

There are a number of wrappers available; please see the Related Projects page.

How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version, or a new CLDR version?

Our goal is for ICU upgrades to go smoothly. Here are some steps you can take to prepare for an upgrade, or to make sure that your usage of ICU is upgrade-friendly.
  • API: Ensure that you are not using draft APIs, which may change in a future release. See the section on API compatibility (§) in the design chapter.
  • Unicode: See the release notes for particular versions of Unicode to ensure that your code is not affected by property changes or other specification changes.
  • CLDR: If your application has test cases which depend on specific translations, these assumptions may become invalid if the translation of an item changes, new support is added, or a country changes its currency. Try not to depend on specific translations, or be prepared to change test cases. Also, a newer version may support additional translations, currencies, and types of calendars.
  • Building/Deploying your Application (ICU4C): ICU4C usually builds with symbol renaming (see binary compatibility (§) in the design chapter). Be sure that you build your application with the updated ICU header files, so that it will link against the current ICU. Also, don't hard-code the names of ICU libraries in your build scripts and projects. Where possible, link against just the 'base name' such as libicuuc.so or icuuc.lib rather than a name containing the version number such as libicuuc.so.46 or icuuc46.dll.

Building and Testing ICU

How do I build ICU?

See the readme.html that is included with ICU.

How do I get 32- or 64-bit versions of the ICU libraries?

From ICU version 4.2 on, the configure script builds with the default bit width of your platform. You can request 64 or 32 bits with the --with-library-bits= option (e.g. runConfigureICU Linux --with-library-bits=64 or runConfigureICU MacOSX --with-library-bits=32). To attempt a 64-bit build where possible, use --with-library-bits=64else32.

How do I build an optimized, non debug ICU?

On Win32, choose the 'Release' configuration from the drop down menu. On other platforms, use the runConfigureICU script, which uses the configure script. The runConfigureICU script uses the safest level of optimization for the ICU libraries. If your platform is not specified, set the following environment variables before running configure or runConfigureICU: CFLAGS=-O CXXFLAGS=-O

Why am I getting so many test failures when I use "gmake check"?

Please view the readme that is included with ICU. It has all the details on how to build and test ICU, and it usually answers most problems.

If you are using a compiler that hasn't been tested with ICU before, you may have encountered an optimization bug with the compiler. On Unix platforms you can specify --disable-release when you are using runConfigureICU (e.g. runConfigureICU --disable-release LinuxRedHat). If this fixes your problem, it is recommended that you report the optimization bug to the compiler manufacturer.

If neither of these fixes your problem, please send an e-mail to the ICU4C Support List.

How can I reduce the size of the ICU data library?

Use the Data Customizer or see "Customizing ICU's Data Library" (§) in the ICU Data Management chapter of this User's Guide.

Why am I seeing a small (only a few K) instead of a large (several megabytes) data shared library (icudt)? Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure.

ICU libraries always must link with the ICU data library. However, so that ICU can bootstrap itself, it first builds a 'stub' data library, in icu\source\stubdata, so that the tools can function. You should only use this in production if you are NOT using DLL-mode data access, in which case you are accessing ICU data as individual files, as an archive (.dat) file, or by some other means. Normally, you should be using the larger library built from icu\source\data. If you see this issue after ICU has completed building, re-run 'make' in icu\source\data, or the 'makedata' project in Visual Studio.

Can I add or remove a converter from ICU?

Yes. Please see "Customizing ICU's Data Library" (§) in the ICU Data Management chapter of this User's Guide. You can also get extra converters from http://www.icu-project.org/charts/charset/ or use the ICU Data Customizer tool.

Why don't the makefiles work?

You need GNU's make program version 3.8 or later, and you need to run the runConfigureICU script, which is located in the icu/source directory. You may be using a platform that ICU does not support. If the first two answers do not apply to you, then you should send an e-mail to the ICU4C Support List.

Here are some places you can find gmake:

  1. GNU: http://www.gnu.org/software/make/

  2. Sun® Source/Binaries: http://www.sunfreeware.com

  3. z/OS (OS/390) Source/Binaries: http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc

  4. IBM i (OS/400) Source/Binaries: http://www.ibm.com/servers/enable/site/porting/iseries/overview/gnu_utilities.html

Due to differences in every platform's make program, we will not support other versions of our make files.

What version of the C iostream is used in ICU4C?

ICU4C uses the latest available version of the iostream on the target platform. Only the io library uses iostream.

I only want to use the C APIs, do I need a C++ compiler?

Large portions of ICU4C were always implemented in C++, and over time we are moving more into that direction. We continue to support and add C APIs, in order to provide binary-compatible APIs. For the implementation, C++ is much better: It is generally easier to work with, which reduces bugs and maintenance. It is closer to Java, which is important for porting between ICU4C and ICU4J. We use RAII (e.g., LocalPointer) to reduce opportunities for memory leaks, we use inline functions and type-safe constants instead of #define, etc. However, we do not use exceptions, and we do not use the Standard Template Library (STL), so ICU4C's dependencies on the C++ library are minimal. See the new dependencies.txt and search for "group: cplusplus".

As ICU does not use exceptions, the GCC option -fno-exceptions will reduce or remove the dependencies on the standard C++ library. In GCC 4.5 there is an option -static-libstdc++ which will remove C++ library dependencies. (Also see this article.) Visual Studio has the /MT option, and other compilers may have similar options. See the How To Use ICU page for related information on this topic.

Features of ICU

What computer languages does ICU support?

ICU4C (ICU) is written in C and C++, and ICU4J is written in Java™.

How are the APIs documented for deprecation?

Please read the API lifecycle section in the ICU Design chapter.

What version of Unicode standard does ICU support?

ICU version 4.0 supports Unicode version 5.1.

The Unicode versions for older versions of ICU are listed on the ICU download page, http://www.icu-project.org/download/

Does ICU support UTF-16 surrogates and Unicode supplementary characters?

Yes.

Does Java support UTF-16 surrogates and Unicode supplementary characters?

Java 5 introduced support for Unicode supplementary characters. Java 1.4 and earlier do not directly support them.

How does ICU relate to Java's java.text.* package?

The International Components for Unicode is available both as a C/C++ library and a Java class library. ICU provides internationalization utilities for writing global applications in C, C++ or Java programming languages. ICU was originally developed by the Unicode group at the IBM Globalization Center of Competency in Cupertino, and ICU was contributed to Sun for inclusion into the JDK 1.1. ICU4J includes enhanced versions of some of these contributed classes plus additional classes that complement the classes in the JDK.

ICU4C started as a C++ port of the original Java Internationalization classes. These classes are now partially implemented in C, with largely parallel C and C++ APIs. ICU4C and ICU4J continue to leapfrog each other with features and bug fixes. Over time, features from ICU4J get added to the JDK as well.

Both versions of ICU have a goal to implement the latest Unicode standard, maintain a single portable source code base, and to make it easier for software developers to create global applications.

Using ICU

Can I use any of the features of ICU without Unicode strings?

No. In order to use the collation, text boundary analysis, formatting or other ICU APIs, you must use Unicode strings. In order to get Unicode strings from your native codepage, you can use the conversion API.

How do I declare a Unicode string in ICU?

Use the U_STRING_DECL and U_STRING_INIT macros or use the UnicodeString class for C++. Strings are represented as UChar * as the base string type.

Even though most platforms declare wide strings as wchar_t * or L"" as the base string type, that declaration is not portable because sizeof(wchar_t) can be 1, 2 or 4, and the encoding may not even be Unicode. On the platforms where sizeof(wchar_t) is 2 bytes, UChar is defined as wchar_t. In that case you can use ICU's strings with 3rd party legacy functions; however, we do not suggest using Unicode strings without the U_STRING_DECL and U_STRING_INIT macros or the UnicodeString class, because those are platform independent implementations.
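As a rough illustration of why a fixed 16-bit unit type is more portable than wchar_t, the sketch below widens an ASCII literal into an array of 16-bit units. It is plain C and does not use ICU: Unit16 and widen_ascii are hypothetical names invented for this example, not ICU's UChar type or its U_STRING_* macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit string unit: always 2 bytes, unlike wchar_t. */
typedef uint16_t Unit16;

/*
 * Copy an ASCII, NUL-terminated C string into a buffer of 16-bit units.
 * Returns the number of units written, not counting the terminating 0,
 * or -1 if the buffer is too small or a byte is outside ASCII.
 */
static int widen_ascii(const char *src, Unit16 *dest, size_t capacity)
{
    size_t i = 0;
    if (capacity == 0) {
        return -1;  /* no room even for the terminator */
    }
    for (; src[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)src[i];
        /* Slot i holds this unit; slot i + 1 must remain for the terminator. */
        if (c > 0x7F || i + 1 >= capacity) {
            return -1;
        }
        dest[i] = (Unit16)c;
    }
    dest[i] = 0;
    return (int)i;
}
```

Because the unit type has the same width on every platform, code built on it does not change meaning when sizeof(wchar_t) does.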

How is a Unicode string represented in ICU4C?

A Unicode string is currently represented as UTF-16. The endianness of UTF-16 is platform dependent. You can guarantee the endianness of UTF-16 by using a converter. UTF-16 strings can be converted to other Unicode forms by using a converter or with the UTF conversion macros.

ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not support surrogates, and UTF-16 does support surrogates. This means that UCS-2 only supports UTF-16's Base Multilingual Plane (BMP). The notion of UCS-2 is deprecated and dead. Unicode 2.0 in 1996 changed its default encoding to UTF-16.
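To make the UCS-2 versus UTF-16 difference concrete, here is a minimal sketch of the standard UTF-16 encoding rule in plain C. The function name encode_utf16 is made up for this example and is not part of ICU; ICU supplies its own macros for this purpose.

```c
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 units to out and returns the count, or 0 if cp is
 * above U+10FFFF or is itself a surrogate value.
 */
static int encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* BMP: one unit, the only case UCS-2 covers */
        return 1;
    }
    cp -= 0x10000;  /* 20 bits remain for supplementary code points */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* lead surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* trail surrogate: low 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+00E9 stays a single unit.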

If you need to do a quick and easy conversion between UTF-16 and UTF-8, UTF-32 or an encoding in wchar_t, you should take a look at unicode/ustring.h. In that header file you will find u_strToWCS, u_strFromWCS, u_strToUTF8, u_strFromUTF8, u_strToUTF32 and u_strFromUTF32 functions. These functions are provided for your convenience instead of using the ucnv_* API.

You can also take a look at the UTF_*, UTF8_*, UTF16_* and UTF32_* macros, which are defined in unicode/utf.h, unicode/utf8.h, unicode/utf16.h and unicode/utf32.h. These macros are helpful for programmers that need to manipulate and process Unicode strings.

How do I index into a UTF-16 string?

Typically, indexes and offsets in strings count string units, not characters (although in C and Java they have a char type).

For example, in old-fashioned MBCS strings, you would count indexes and offsets by bytes, not by the variable-width character count. In UTF-16, you do the same, just count 16-bit units (in ICU: UChar).
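The distinction between unit offsets and character counts can be sketched as follows. This is plain C with a hypothetical helper name (count_code_points); it is not ICU code, and it treats an unpaired surrogate as one code point, which is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Count code points in s[0..length), where length is given in 16-bit units. */
static size_t count_code_points(const uint16_t *s, size_t length)
{
    size_t count = 0;
    size_t i = 0;
    while (i < length) {
        uint16_t unit = s[i++];
        /* A lead surrogate followed by a trail surrogate forms one code point. */
        if (unit >= 0xD800 && unit <= 0xDBFF && i < length &&
            s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
            ++i;
        }
        ++count;
    }
    return count;
}
```

A string holding 'a', U+1F600 and 'b' has four units but three code points, so the 'b' sits at unit index 3 even though it is the third character.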

What is the performance difference between UTF-8 and UTF-16?

Most of the time, the memory throughput of the hard drive and RAM is the main performance constraint. UTF-8 is 50% smaller than UTF-16 for US-ASCII, but UTF-8 is 50% larger than UTF-16 for East and South Asian scripts. There is no memory difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic.
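The size figures follow directly from the encoding rules, as this small plain C sketch shows. The function names utf8_bytes and utf16_bytes are hypothetical and chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one code point in UTF-8 (0 if cp is not encodable). */
static size_t utf8_bytes(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
    if (cp < 0x80) return 1;       /* US-ASCII */
    if (cp < 0x800) return 2;      /* Latin extensions, Greek, Cyrillic, Hebrew, Arabic */
    if (cp < 0x10000) return 3;    /* rest of the BMP, including most CJK */
    if (cp <= 0x10FFFF) return 4;  /* supplementary planes */
    return 0;
}

/* Bytes needed to store one code point in UTF-16 (0 if cp is not encodable). */
static size_t utf16_bytes(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 2;    /* one 16-bit unit */
    if (cp <= 0x10FFFF) return 4;  /* surrogate pair */
    return 0;
}
```

An ASCII letter takes 1 byte against 2, a Greek letter takes 2 against 2, and a CJK ideograph such as U+4E2D takes 3 against 2, which is where the 50% figures come from.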

For processing Unicode data, UTF-16 is much easier to handle. You get a choice between either one or two units per character, not a choice among four lengths. UTF-16 also does not have illegal 16-bit unit values, while you might want to check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are less important and more benign. If you want to quickly convert small strings between the different UTF encodings or get a UChar32 value, you can use the macros provided in utf.h and its siblings utf8.h and utf16.h. For larger or partial strings, please use the conversion API.

How do the converters work?

The converters act like a data stream. This means that the state of the last character is saved in the converter after each call to the ucnv_fromUnicode() and ucnv_toUnicode() functions. So if the source buffer ends with part of a surrogate Unicode character pair, the next call to ucnv_toUnicode() will write out the equivalent character to the destination buffer. Please see the Conversion chapter of the User's Guide for details.
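The idea of saving state between calls can be sketched in plain C as below. This is only an illustration of the streaming principle, not ICU's converter implementation: StreamState and feed_chunk are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter state: a lead surrogate seen at the end of a chunk. */
typedef struct {
    uint16_t pending_lead;  /* 0 when nothing is pending */
} StreamState;

/*
 * Consume one chunk of UTF-16 units and write complete code points to out.
 * Returns the number of code points written. out must have room for
 * length + 1 entries. Unpaired surrogates become U+FFFD.
 */
static size_t feed_chunk(StreamState *st, const uint16_t *src, size_t length,
                         uint32_t *out)
{
    size_t n = 0;
    size_t i;
    for (i = 0; i < length; ++i) {
        uint16_t unit = src[i];
        if (st->pending_lead != 0) {
            uint32_t lead = st->pending_lead;
            st->pending_lead = 0;
            if (unit >= 0xDC00 && unit <= 0xDFFF) {
                /* The pair is complete, possibly across two calls. */
                out[n++] = 0x10000 + ((lead - 0xD800) << 10) + (unit - 0xDC00u);
                continue;
            }
            out[n++] = 0xFFFD;  /* lead without a trail; unit is handled below */
        }
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            st->pending_lead = unit;  /* wait for the next unit or the next call */
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            out[n++] = 0xFFFD;  /* trail without a lead */
        } else {
            out[n++] = unit;
        }
    }
    return n;
}
```

If the first chunk ends with 0xD83D and the second begins with 0xDE00, the first call emits nothing for the lead surrogate and the second call emits U+1F600.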

What does a locale look like in ICU?

ICU locales are lightweight, and they are represented by just a string. Lightweight means that there is just a string to represent a locale and nothing more. Many platforms have numbers and other data structures to represent a locale, but ICU has one simple platform independent string to represent a locale.

ICU locales usually contain an ISO-639 language name (2-3 characters), an ISO-3166 country name (2-3 characters), and a variant name which is user specified. When a language or country is not represented by these standards, ICU uses 3 characters to represent that part of the locale. All three parts are separated by an underscore "_". For example, US English is "en_US", and German in Germany with the Euro symbol is represented as "de_DE_EURO". Traditionally the language part of the locale is lowercase, the country is uppercase and the variant is uppercase. More details are available from the Locale Chapter of this User's Guide.
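The three-part layout described above can be illustrated with a small plain C splitter. This is not how ICU parses locale IDs, and it handles only the simple language_COUNTRY_VARIANT shape; LocaleParts, copy_field and split_locale are hypothetical names, and silently truncating over-long fields is a simplification made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the three parts of an ID such as "de_DE_EURO". */
typedef struct {
    char language[16];
    char country[16];
    char variant[32];
} LocaleParts;

/* Copy text up to the next '_' or the end into dest; return a pointer past the separator. */
static const char *copy_field(const char *p, char *dest, size_t capacity)
{
    size_t n = 0;
    while (*p != '\0' && *p != '_') {
        if (n + 1 < capacity) {
            dest[n++] = *p;  /* characters beyond the capacity are dropped */
        }
        ++p;
    }
    dest[n] = '\0';
    return (*p == '_') ? p + 1 : p;
}

/* Split id into language, country and variant; missing parts become "". */
static void split_locale(const char *id, LocaleParts *parts)
{
    const char *p = copy_field(id, parts->language, sizeof parts->language);
    p = copy_field(p, parts->country, sizeof parts->country);
    /* Everything that remains is the variant. */
    strncpy(parts->variant, p, sizeof parts->variant - 1);
    parts->variant[sizeof parts->variant - 1] = '\0';
}
```

Splitting "de_DE_EURO" yields "de", "DE" and "EURO"; splitting "en_US" leaves the variant empty.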

How is ICU versioned?

Please read the ICU Design chapter of the User's Guide.

What is the relationship between ICU locale data and system locale data?

There is no relationship. ICU is not dependent on the operating system for the locale data.

This also means that uloc_setDefault() does not affect the operating system. The function uloc_setDefault() only sets ICU's default locale. Normally the default locale for ICU is whatever the operating system says is the default locale.

How are errors handled in ICU?

Since not all compilers can handle exceptions, we return an error from functions with a UErrorCode parameter. The UErrorCode parameter of a function will return any errors that occurred while it was executing. It's usually a good idea to check for errors after calling a function by using the U_SUCCESS and U_FAILURE macros. U_SUCCESS returns true when the function did run properly, and U_FAILURE returns true when the function did NOT run properly. You may handle specific errors from a function by checking the exact value of error. The possible values of UErrorCode are located in utypes.h of the common project. Before any function is called with a UErrorCode, it must be initialized to U_ZERO_ERROR.

Here is an example of UErrorCode being used.

    UErrorCode err = U_ZERO_ERROR;
    callMyFunction(&err);
    if (U_FAILURE(err)) {
        puts("callMyFunction() Failed!");
    }

Please see the ICU Design chapter for details.
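To show why the error code must be initialized before the first call, the sketch below imitates the pattern with stand-in definitions. Everything prefixed with Demo or DEMO_ is hypothetical and exists only for this example; the real type and macros come from ICU's utypes.h. The early return on an already-failed code is an assumption of this sketch about how such functions are chained.

```c
#include <stddef.h>

/* Stand-in error type for illustration only; not ICU's UErrorCode. */
typedef enum {
    DEMO_WARNING = -1,          /* informational, still counts as success */
    DEMO_ZERO_ERROR = 0,        /* no error */
    DEMO_ILLEGAL_ARGUMENT = 1   /* a failure */
} DemoErrorCode;

#define DEMO_SUCCESS(e) ((e) <= DEMO_ZERO_ERROR)
#define DEMO_FAILURE(e) ((e) > DEMO_ZERO_ERROR)

/* Returns the length of s, or 0 with *err set to a failure if s is NULL. */
static size_t demo_length(const char *s, DemoErrorCode *err)
{
    size_t n = 0;
    if (DEMO_FAILURE(*err)) {
        return 0;  /* an earlier call already failed, so do nothing */
    }
    if (s == NULL) {
        *err = DEMO_ILLEGAL_ARGUMENT;
        return 0;
    }
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

With this pattern, several calls can share one error variable and be checked once at the end. An uninitialized variable could hold a failure value by accident and make the first call do nothing.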

With calendar classes, why are months 0-based?

"I have been using ICU for its calendar classes, and have found it to be excellent. That said, I am wondering why the decision was made to keep months 0-based while almost all the other calendrical units (years, weeks of year, weeks of month, date, days of year, days of week, days of week in month) are 1-based? This has been the source of several bugs whenever the mind is slightly less than razor sharp." --Contributor

This was not our choice. We inherited it from the Java Calendar API, unfortunately.

Is there a guideline for COBOL programs that want to use ICU?

There is a COBOL/ICU guideline available since ICU 2.2. For more details, please refer to the COBOL section of this User's Guide.

Where can I get more information about using ICU?

Please send an e-mail to the ICU4C Support List.
