Why standardize on a common Chinese language environment
There are many coding standards today that define how a letter
or character of a language is represented in a computer.
Each standard typically assigns a unique numerical value to
every letter or character of a language (including the special
symbols that accompany the language) and specifies the number
of bytes needed to represent all the numerical values in the
character set. The main Chinese language coding standards of
the past, such as BIG-5 and GB, were designed to support only
a single language, while more recent ones like Unicode support
multiple languages. To complicate matters further, even under
the same coding standard the same character can be represented
by byte sequences of different lengths, depending on whether a
fixed-length or variable-length encoding is used.
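The point about differing numerical values and byte lengths can be illustrated with a minimal Python sketch. The sample character and the list of encodings are chosen here only for illustration; they are not taken from any University procedure.

```python
# Encode one Chinese character under several coding standards and
# compare the resulting byte sequences.
CHAR = "中"  # sample character, chosen only for illustration

# Two legacy Chinese standards plus three Unicode encoding forms.
ENCODINGS = ["big5", "gb2312", "utf-8", "utf-16-le", "utf-32-le"]

encoded = {name: CHAR.encode(name) for name in ENCODINGS}

for name, raw in encoded.items():
    # Show the byte count and the bytes in hexadecimal.
    print(f"{name:10s} {len(raw)} bytes  {raw.hex()}")
```

BIG-5 and GB assign different two-byte values to the same character, while the three Unicode encodings represent one and the same code point in three, two and four bytes respectively.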
Besides, these coding standards generally define no mandatory
"lead-in" characters that tell every software component
(namely the operating system, database, application software,
etc., running on both the server and client sides) which
coding scheme (i.e. the name of the coding standard, the number
of bytes that make up the numerical value of a character, etc.)
the subsequent character stream employs. The coding scheme
therefore has to be explicitly agreed or made known beforehand
so that the software can correctly interpret and handle each
character during input, processing and output. Any mishandling
of one character in any one of the software components (e.g.
improperly inserting a "line feed" character after a "carriage
return" character, or improperly removing a "null" character
or a parity bit) or incorrect translation of one character
(e.g. using an incompatible version or the wrong coding
standard) can render the subsequent characters unreadable
or incomprehensible.
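Both failure modes can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the sample text, the choice of GB18030 as the "wrong" standard, and the choice of UTF-16 for the lost-byte case are all illustrative.

```python
text = "中文編碼"  # sample text, chosen only for illustration

# Case 1: the reader assumes the wrong coding standard.
# The bytes were written as BIG-5 but are interpreted as GB18030.
big5_bytes = text.encode("big5")
misread = big5_bytes.decode("gb18030", errors="replace")

# Case 2: a single byte is lost from a UTF-16 stream.
# Every following 16-bit unit is now read from the wrong boundary.
utf16_bytes = text.encode("utf-16-le")
damaged = utf16_bytes[1:]  # the first byte has been dropped
shifted = damaged.decode("utf-16-le", errors="replace")

print(repr(misread))  # not the original text
print(repr(shifted))  # none of the original characters survive
```

In the second case nothing after the lost byte is recoverable by the decoder alone, which is why a dropped "null" byte in a two-byte encoding is so damaging.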
As such, it is easy to see that the complexity of ensuring
language coding compatibility among software components is
proportional to the number of different coding standards
multiplied by the number of operating systems supported at
the client and server sides. Thus, it is desirable, if
possible, to at least standardize on a common coding standard
(e.g. Unicode) in order to avoid confusion, incompatibilities,
and encoding/decoding errors during information exchange, and
to eliminate unnecessary coding conversions among software on
the same or different machines. Moreover, it shortens software
development time and reduces support and maintenance effort
when only a single coding standard needs to be tested,
implemented and supported across all software at both the
client and server sides.
Users' experience without standardization of a common Chinese environment
The main reasons why users encounter Chinese language problems
are (1) the software they are using employs different or
incompatible Chinese coding standards without their knowledge,
and (2) one or more characters are improperly inserted or
removed during processing, again without their knowledge, by
the operating system, database software, or other application
software. When users cannot read a Chinese document or an
email, and do not know which coding standard was used or which
characters have been changed, added or removed, they often
have to make the viewer or reader software try each coding
standard in turn until either the content becomes readable or
the coding standards are exhausted. Where the unreadable
content has been improperly changed or translated without any
hint, it will be difficult, if not impossible, to recover the
original content. Thus it is always desirable for the author
and the reader of a Chinese document to agree on (1) a common
coding standard, and (2) a common set of system software,
installed on the client and/or server, of identical or
compatible versions and running on a common operating system
of the same version, to create and read the document.
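The trial-and-error procedure described above can be sketched in Python as follows. The function name `try_decodings` and the candidate list are hypothetical, introduced only for this illustration. Note that decoding without an error does not prove the result is readable; a person still has to judge each candidate.

```python
# Assumed list of coding standards to try; adjust to local needs.
CANDIDATES = ["utf-8", "big5hkscs", "gb18030", "utf-16"]


def try_decodings(raw, candidates=CANDIDATES):
    """Return (encoding, text) for each candidate that decodes raw without error."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # This coding standard cannot interpret the bytes at all.
            continue
    return results


# Sample bytes whose coding standard the reader does not know.
sample = "中文電郵".encode("utf-8")

for name, decoded in try_decodings(sample):
    print(name, repr(decoded))
```

Several candidates may decode the same bytes without complaint yet produce different text, which is exactly why users are left guessing when no coding standard has been agreed.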
The need for standardizing on Chinese environment for office PCs
However, standardizing on a common coding standard, system
software and database software is not sufficient to support a
language environment. We also have to define a standard set-up
for an environment under which (1) a "selected" set of
enterprise software, including system software and application
software (especially legacy software) on both the server and
client sides, is capable of processing and exchanging
characters correctly according to the chosen coding standard,
(2) every client machine has the necessary character fonts to
output the characters properly to the screen and printer, and
(3) a standard set of input methods that correctly encode
input characters using the chosen coding standard is in place.
Given the complexity arising from the number of operating
systems and application software involved, and the
compatibility problems among different versions of the same
coding standard, we need to define a standard computer
environment under which we can carry out our information
exchange systematically, quickly, and effectively. However,
current technology still cannot ensure that all Chinese
language applications that run perfectly on a native Chinese
system run equally well on a native English system, and most
of our staff must work or prefer to work on a native English
system. It is therefore necessary to define a single common
standard Chinese environment for client computers within the
University, namely the native English Windows XP SP2 for
running centralized University software such as Blackboard,
AIMS, Email, etc., which can meet the Chinese language
requirements of most users. Users who have special Chinese
language needs, or whose application software cannot run under
this common Chinese environment, will have to be self-supported
by running the software on dedicated workstations that may
have their own unique character sets and fonts. In the latter
case, users will not need to follow any standards imposed by
the common Chinese environment on either the client or the
server side.
Summary of standards chosen under the University Chinese language environment
We are happy to report here that the University has already
adopted Unicode as the language coding standard, Sun Solaris &
Microsoft (MS) Windows server 2003 (or the latest version)
as the operating system standard for central servers, Oracle
& MS-SQLServer as the database standard, and UTF-8 (or
later UTF-16) as the encoding method for the content of the
University's web pages, e-learning, and databases. By adopting
the standards within the common Chinese environment of client
PCs described below, we hope to eliminate most, if not all, of
the language problems currently experienced by our users. Even
when language problems do arise, they will be easier to
troubleshoot, as there is only a single language environment
implemented at the University.
The following summarizes the set-up for the client PCs'
general Chinese environment. The set-up should be reviewed
regularly to ensure that it keeps pace with advances in
technology and with the University's needs.
- Microsoft English Windows XP Professional SP2 (or the latest
upgrade) will be adopted as the operating system standard for
the client PCs to run the University's administrative systems
and communication software such as Email.
- The Hong Kong Special Administrative Region's Supplementary
Chinese Character Set (HKSCS) will be adopted as the add-on
support for Chinese characters specific to the Hong Kong
environment.
- In general, the University will not create any local add-on
character set.
- 細明體 (or MingLiU), which comes with MS Windows XP, will be
the standard font supported for general output of Chinese
content. Users may need to acquire other fonts for specialized
output requirements.
- MS Office and Adobe Acrobat Reader with Chinese Language
Packs will be the standard office tools for the preparation
and manipulation of Chinese text content.
- The input methods provided with MS Windows XP, 九方 (Q9),
and 縱橫輸入法 will be the Chinese input methods used on all
client PCs.
- A standard procedure will be devised for converting required
data and content that are not compatible with the current
coding standard.
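As an illustration of what such a conversion step might look like, here is a minimal Python sketch. It is not the University's procedure: the function name `convert_to_utf8` is hypothetical, and the default source encoding (BIG-5 with the HKSCS extension) is an assumption made only for this example.

```python
def convert_to_utf8(raw, source_encoding="big5hkscs"):
    """Convert bytes in a legacy coding standard to UTF-8.

    Decoding is strict, so invalid or truncated input raises
    UnicodeDecodeError instead of silently producing damaged text.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Sample legacy content, chosen only for illustration.
legacy = "中文".encode("big5hkscs")
converted = convert_to_utf8(legacy)

print(converted.decode("utf-8"))
```

Failing loudly on bad input is a deliberate choice here: content that has been silently mistranslated is, as noted earlier, difficult if not impossible to recover.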