[Reading Notes] An Introduction to Unicode (Chapter 2)

Chapter 2: An Introduction to Unicode

    ·Unicode is an extension of the ASCII character encoding.

    ·Strict ASCII uses 7 bits per character, and 8 bits per character (one byte) has become common on computers, whereas Unicode uses a full 16 bits per character.

    ·This lets Unicode represent all the letters, ideographs, and other symbols of the world's written languages that are likely to be used in computer communication.

    ·Unicode is intended initially to supplement ASCII and, with any luck, eventually replace it.

    ·The C programming language as standardized by ANSI supports Unicode through its support of wide characters.

 

A Brief History of Character Sets

Morse code (telegraph), developed between 1838 and 1854

·Each letter in the alphabet corresponds to a series of short and long pulses (dots and dashes).

·There is no distinction between uppercase and lowercase letters, but numbers and punctuation marks have their own codes.

Braille code, developed between 1821 and 1824

·Essentially a 6-bit code that encodes letters, common letter combinations, common words, and punctuation.

Telex codes, standardized in 1931

·5-bit codes that include letter shifts and figure shifts.

BCDIC

·Binary-Coded Decimal Interchange Code.

EBCDIC, 1960s

·8-bit Extended BCDIC.

ASCII, origins in the late 1950s, finalized in 1967

·A total of 128 codes (7 bits).

·The 26 letter codes are contiguous (uppercase and lowercase each form one run).

·The codes for the 10 digits are easily derived from the values of the digits.

ANSI character set, 1985

Double-Byte Character Sets (DBCS)

·Used for languages with large sets of ideographs, such as Chinese, Japanese, and Korean.

·Rely on code pages to indicate which character set is in use.

·Some characters take 1 byte and others 2 bytes, so code written for ASCII, which assumes one byte per character, does not handle them correctly.

·Insufficient and awkward.

Unicode

·16 bits per character, allowing the representation of 65,536 characters. (Unicode has since been extended beyond this 16-bit range; UTF-16 encodes the additional characters as surrogate pairs.)

·Intended to be sufficient for all the characters and ideographs of the world's written languages.

·Compatible with ASCII: codes 0x0000 through 0x007F are the ASCII characters.

·Only one character set, so there is no ambiguity (unlike code pages).
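The two ASCII properties listed above can be checked with a short sketch. It assumes the compiler's execution character set is ASCII (as it is on Windows), and the helper names are hypothetical. The C standard itself guarantees that '0' through '9' are contiguous in any character set, but contiguous letters are an ASCII property: EBCDIC, for example, has gaps in its alphabet.

```c
/* Letters 'A'..'Z' (0x41..0x5A) and 'a'..'z' (0x61..0x7A) are contiguous
   in ASCII, so a letter's position in the alphabet is a subtraction. */
int ascii_letter_index(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a';
    return -1;                      /* not an ASCII letter */
}

/* Uppercase and lowercase letters differ by exactly 0x20:
   'q' (0x71) - 0x20 == 'Q' (0x51). */
char ascii_to_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - 0x20) : c;
}

/* Digits '0'..'9' are 0x30..0x39, so the value is the code minus '0'. */
int ascii_digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```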

 

Wide Characters and C

Wide characters aren't necessarily Unicode; Unicode is one possible wide-character encoding. ANSI C also supports multibyte character sets, such as DBCS.

 

The char Data Type

A char occupies one byte. For example:

char c = 'A';                     /* c occupies 1 byte */

char * p = "Hello, World!";       /* p is a pointer (4 bytes on 32-bit Windows); the string occupies 14 bytes */

char a[] = "Hello, World!";       /* sizeof (a) is 14: 13 characters plus the terminating '\0' */

char b[10];                       /* sizeof (b) is 10 */
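A minimal sketch for checking these sizes with your own compiler. The names are hypothetical, and the pointer size is reported rather than assumed, because it is 4 bytes on 32-bit Windows (as in the book) but typically 8 bytes on 64-bit systems.

```c
#include <string.h>

/* A string literal stored in an array: the array includes the '\0'. */
static const char hello[] = "Hello, World!";

/* Number of characters before the terminator. */
size_t hello_length(void) { return strlen(hello); }

/* Bytes occupied by the array, terminator included. */
size_t hello_array_size(void) { return sizeof(hello); }

/* A pointer's size does not depend on the string it points to. */
size_t pointer_size(void)
{
    const char *p = hello;
    return sizeof(p);
}
```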

 

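Wide Characters

Wide characters in C are based on the wchar_t data type. In Microsoft's C (as used in the book), WCHAR.H defines it like this:

typedef unsigned short wchar_t ;

so a wchar_t is 16 bits there. The C standard leaves the size to the implementation; GCC on Linux, for example, uses a 32-bit wchar_t. The following statements define wide characters (sizes are for 16-bit wchar_t and 32-bit Windows):

wchar_t c = 'A';                  /* 2 bytes, value 0x0041; equivalent to wchar_t c = L'A'; */

wchar_t * p = L"Hello, World!";   /* p is a 4-byte pointer; the string occupies 28 bytes (2 bytes per character) */

wchar_t a[] = L"Hello, World!";   /* sizeof (a) is 28: 14 two-byte characters, including the terminating L'\0' */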
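Because the size of wchar_t is implementation-defined, a portable way to check the numbers above is to express them in terms of sizeof (wchar_t). This sketch uses hypothetical helper names; where wchar_t is 16 bits, the byte size works out to 28, as in the text.

```c
#include <stddef.h>
#include <wchar.h>

/* An L"..." literal is an array of wchar_t: 13 characters plus L'\0'. */
static const wchar_t whello[] = L"Hello, World!";

/* Element count, terminator included: 14 on every platform. */
size_t whello_elements(void) { return sizeof(whello) / sizeof(whello[0]); }

/* Byte size: 14 * sizeof (wchar_t), i.e. 28 with a 16-bit wchar_t
   (Windows) and 56 with a 32-bit wchar_t (e.g. GCC on Linux). */
size_t whello_bytes(void) { return sizeof(whello); }

/* Storing 'A' in a wchar_t gives the same value as L'A': 0x0041. */
wchar_t wide_a(void)
{
    wchar_t c = 'A';
    return c;
}
```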

 

Wide-Character Library Functions

The regular string functions work on char strings:

char * pc = "Hello!" ;

wchar_t * pw = L"Hello!" ;

int iLength = strlen (pc) ;       /* iLength is 6 */

Writing iLength = strlen (pw) is a type mismatch: strlen is declared as strlen (const char *), while pw is a wchar_t * (an unsigned short * in Microsoft C). The compiler reports this as a warning (incompatible types) or an error.

 

How the string is stored in memory:

The 6 characters of the string "Hello!" have these 16-bit values:

0x0048 0x0065 0x006C 0x006C 0x006F 0x0021

Intel processors are little-endian, so the low byte of each value comes first in memory:

48 00 65 00 6C 00 6C 00 6F 00 21 00

If iLength = strlen (pw) is compiled anyway, strlen stops at the first zero byte, which is the high byte of the first character, so iLength is assigned 1.
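The strlen result of 1 can be reproduced without a Windows compiler by writing out the little-endian bytes explicitly, as in this sketch (the array and function names are hypothetical). It also shows that counting 16-bit units, rather than bytes, gives the expected 6.

```c
#include <string.h>

/* The bytes of L"Hello!" as stored by a little-endian CPU with a 16-bit
   wchar_t: low byte first, then the 0x00 high byte. Written out by hand
   so the result does not depend on the host's own wchar_t. */
static const char hello_utf16le[] = {
    0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00,
    0x6F, 0x00, 0x21, 0x00, 0x00, 0x00      /* L'\0' terminator */
};

/* strlen stops at the first zero byte: the high byte of 'H'. */
size_t strlen_on_wide_bytes(void)
{
    return strlen(hello_utf16le);
}

/* Count 16-bit units instead, reassembling each little-endian pair
   and stopping at the 0x0000 terminator. */
size_t utf16le_units(const char *bytes)
{
    size_t n = 0;
    while (((unsigned char)bytes[2 * n] |
            ((unsigned char)bytes[2 * n + 1] << 8)) != 0)
        n++;
    return n;
}
```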

      

Wide-Character Functions in C

Each function that works on 1-byte characters has a wide-character alternative for wchar_t strings. These are declared both in WCHAR.H and in the header file where the normal function is declared:

1-byte char functions            wide char functions
strlen (const char *)            wcslen (const wchar_t *)
printf (const char *, ...)       wprintf (const wchar_t *, ...)
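A sketch of the str/wcs pairing using standard C functions. It assumes a C99 library, where swprintf takes the buffer size (in wchar_t units) as its second argument; older Microsoft compilers shipped a swprintf without that parameter, so check your compiler's documentation. The wrapper names are hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* The same job on each kind of string; lengths are in characters. */
size_t narrow_len(void) { return strlen("Hello!"); }
size_t wide_len(void)   { return wcslen(L"Hello!"); }

/* Format into a wide buffer; returns the number of wide characters
   written, not counting the terminating L'\0'. */
int wide_format(wchar_t *buf, size_t count, int a, int b)
{
    return swprintf(buf, count, L"%d + %d = %d", a, b, a + b);
}
```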

Maintaining a Single Source

    ·One obvious approach is to keep two versions of the source code: one compiled for ASCII and the other for wide characters.

    ·A better approach is to maintain a single version of the source code with the TCHAR.H header file. TCHAR.H is supplied by Microsoft with Visual C++ and is not part of the ANSI C standard.

    How to use TCHAR.H

    TCHAR.H contains some very useful definitions:

    #ifdef _UNICODE
    typedef wchar_t TCHAR ;
    #define __T(x) L##x
    #define _tcslen wcslen
    #else
    typedef char TCHAR ;
    #define __T(x) x
    #define _tcslen strlen
    #endif      /* _UNICODE */

    #define _T(x) __T(x)
    #define _TEXT(x) __T(x)

So you can call _tcslen on TCHAR strings whether TCHAR is char or wchar_t. The preprocessor maps _tcslen to wcslen or strlen automatically. To use the wide-character functions in your program, define the _UNICODE identifier, for example by passing the option "-D_UNICODE" to the compiler.

You can then make declarations like this:

TCHAR * pstr = _TEXT ("Hello, World!") ;
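The TCHAR.H mechanism can be imitated in portable C to see how it works. This is a sketch, not the real header: MY_UNICODE, MY_TCHAR, MY_T_IMPL, MY_T, and my_tcslen are hypothetical stand-ins for _UNICODE, TCHAR, __T, _T, and _tcslen, renamed so they collide neither with TCHAR.H nor with identifiers the C standard reserves.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR.H; define MY_UNICODE (e.g. with
   -DMY_UNICODE) to get the wide-character build. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T_IMPL(x) L##x       /* paste L onto the literal: L"..." */
#define my_tcslen wcslen
#else
typedef char MY_TCHAR;
#define MY_T_IMPL(x) x          /* leave the literal unchanged */
#define my_tcslen strlen
#endif

/* Two levels, like _T and __T: the argument is macro-expanded here
   before MY_T_IMPL pastes L onto it. */
#define MY_T(x) MY_T_IMPL(x)

/* One source line; the character type follows MY_UNICODE. */
size_t generic_len(void)
{
    const MY_TCHAR *pstr = MY_T("Hello, World!");
    return my_tcslen(pstr);
}
```

Compiling with -DMY_UNICODE switches generic_len to wcslen on a wide literal without touching its source. The extra macro level matters because the operands of ## are not macro-expanded, so a macro that names a string literal must be expanded before L is pasted onto it.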

 

Wide Characters and Windows

Windows NT supports both the ASCII and the Unicode character sets, so its functions accept both 8-bit and 16-bit character strings.

Windows 98 has much less support for Unicode than Windows NT: only a few Windows 98 function calls accept wide-character strings.

 

Windows Header File Types

Windows program includes the header file WINDOWS.H. This file includes a number of other header files, including WINDEF.H, which has many of the basic type definitions used in Windows and which itself includes WINNT.H. WINNT.H handles the basic Unicode support.

WINNT.H defines some new data types and useful macros. These definitions let you mix ASCII and Unicode character strings in the same program, or write a single program that can be compiled for either ASCII or Unicode:

typedef char CHAR ;
typedef wchar_t WCHAR ;     // wc
typedef CHAR * PCHAR, * LPCH, * PCH, * NPSTR, * LPSTR, * PSTR ;
typedef CONST CHAR * LPCCH, * PCCH, * LPCSTR, * PCSTR ;

typedef WCHAR * PWCHAR, * LPWCH, * PWCH, * NWPSTR, * LPWSTR, * PWSTR ;
typedef CONST WCHAR * LPCWCH, * PCWCH, * LPCWSTR, * PCWSTR ;

#ifdef  UNICODE
typedef WCHAR TCHAR, * PTCHAR ;
typedef LPWSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCWSTR LPCTSTR ;
#define __TEXT(quote) L##quote
#else
typedef char TCHAR, * PTCHAR ;
typedef LPSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCSTR LPCTSTR ;
#define __TEXT(quote) quote
#endif

#define TEXT(quote) __TEXT(quote)
 

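Which types to use:

·For 8-bit character variables and strings, use CHAR, PCHAR (or one of the others).

·For explicit 16-bit character variables and strings, use WCHAR, PWCHAR, and put an L before the quotation marks.

·For variables and strings that are 8 bits or 16 bits depending on whether the UNICODE identifier is defined, use TCHAR, PTCHAR, and the TEXT macro.

·WINNT.H tests the UNICODE identifier (no underscore), while TCHAR.H tests _UNICODE, so a Unicode build normally defines both.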

 

 

Windows' String Functions

Microsoft C includes wide-character and generic versions of all C run-time library functions that require character string arguments. Windows also duplicates some of these with its own string functions:

iLength = lstrlen (pString) ;
pString = lstrcpy (pString1, pString2) ;
pString = lstrcpyn (pString1, pString2, iCount) ;
pString = lstrcat (pString1, pString2) ;
iComp = lstrcmp (pString1, pString2) ;
iComp = lstrcmpi (pString1, pString2) ;

These work much the same as their C library equivalents. They accept wide-character strings if the UNICODE identifier is defined and regular strings if not.

 

 

Using printf in Windows

You cannot use printf in a Windows GUI program, because such a program has no standard output (console) for printf to write to.

You can still use fprintf to write formatted output to files.

You can also use sprintf to format text into a string, and then pass that string to MessageBox:

char szBuffer [100] ;
sprintf (szBuffer, "The sum of %i and %i is %i", 5, 3, 5+3) ;
puts (szBuffer) ;

 

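sprintf itself can be written in terms of vsprintf, which takes a pointer to the argument list instead of the arguments themselves:

int sprintf (char * szBuffer, const char * szFormat, ...)
{
     int     iReturn ;
     va_list pArgs ;

     va_start (pArgs, szFormat) ;
     iReturn = vsprintf (szBuffer, szFormat, pArgs) ;
     va_end (pArgs) ;

     return iReturn ;
}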
The va_start macro sets pArgs to point to the variable on the stack right above the szFormat argument, which is the first of the variable arguments. vsprintf then reads the arguments through pArgs.
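The same va_start, v...printf, va_end pattern can be sketched portably with the C99 vsnprintf, which additionally limits the output to the buffer size. format_into is a hypothetical name, not a library function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Same shape as the sprintf above, but bounded by size so it cannot
   overrun buf. Returns the length the full output would have. */
int format_into(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int result;

    va_start(args, format);     /* args now refers to the first argument after format */
    result = vsnprintf(buf, size, format, args);
    va_end(args);               /* must be called before returning */
    return result;
}
```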
         
         
        

 

 

The sprintf family comes in these versions:

Variable number of arguments:
                        ASCII         Wide-Character    Generic
Standard version        sprintf       swprintf          _stprintf
Max-length version      _snprintf     _snwprintf        _sntprintf
Windows version         wsprintfA     wsprintfW         wsprintf

Pointer to array of arguments:
                        ASCII         Wide-Character    Generic
Standard version        vsprintf      vswprintf         _vstprintf
Max-length version      _vsnprintf    _vsnwprintf       _vsntprintf
Windows version         wvsprintfA    wvsprintfW        wvsprintf
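The max-length versions exist to prevent buffer overruns. This sketch uses the standard C99 snprintf rather than Microsoft's _snprintf, because the two differ on truncation: C99 snprintf always terminates the output when the size is nonzero and returns the length the full text would have had, while _snprintf returns a negative value and may leave the buffer unterminated. The function names are hypothetical.

```c
#include <stdio.h>

/* Writes at most size bytes, including the terminating '\0'. */
int truncated_sum(char *buf, size_t size)
{
    return snprintf(buf, size, "The sum of %i and %i is %i", 5, 3, 5 + 3);
}

/* Nonzero if the output did not fit: an error (negative result) or a
   full length that is not smaller than the buffer. */
int was_truncated(int result, size_t size)
{
    return result < 0 || (size_t)result >= size;
}
```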

 

A Formatting Message Box

SCRNSIZE.C

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int CDECL MessageBoxPrintf (TCHAR * szCaption, TCHAR * szFormat, ...)
{
     TCHAR   szBuffer [1024] ;
     va_list pArgList ;

          // The va_start macro (defined in STDARG.H) is usually equivalent to:
          // pArgList = (char *) &szFormat + sizeof (szFormat) ;

     va_start (pArgList, szFormat) ;

          // The last argument to _vsntprintf points to the arguments

     _vsntprintf (szBuffer, sizeof (szBuffer) / sizeof (TCHAR),
                  szFormat, pArgList) ;

          // The va_end macro just zeroes out pArgList for no good reason

     va_end (pArgList) ;

     return MessageBox (NULL, szBuffer, szCaption, 0) ;
}

int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                    PSTR szCmdLine, int iCmdShow)
{
     int cxScreen, cyScreen ;

     cxScreen = GetSystemMetrics (SM_CXSCREEN) ;
     cyScreen = GetSystemMetrics (SM_CYSCREEN) ;

     MessageBoxPrintf (TEXT ("ScrnSize"),
                       TEXT ("The screen is %i pixels wide by %i pixels high."),
                       cxScreen, cyScreen) ;
     return 0 ;
}

      
      
       
       
      
      
 