Appendix 1. Representing composite sequences with USIs
L2/00-150R
Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC JTC 1/SC 2/WG 2
N _______
Date: 2000-04-27
Source: US national body (Author: V.S. Umamaheswaran)
Title: Proposal for Unique Sequence Identifiers (USI-s) and repertoire specifications including these USI-s.
Action: For consideration and adoption by WG 2
Status: New proposal
Distribution: ISO/IEC JTC 1/SC 2 and WG 2
Summary
This document proposes a new identifier, the Unique Sequence Identifier (USI),
and its use in repertoire specifications. It is an enhanced response to the
proposal for a composition identifier in document SC2/WG2 N2189 and to other
similar requirements.
Requirement
Outside the context of the Unicode (and ISO/IEC 10646) standard, there
is a need for expressing collections of entities that are required by a
specific application -- for example, to state all the letters (including
accented letters), digits, symbols etc. of a given national language such as
Lithuanian.
While most of the entities of such a collection may have a single code
position allocated to them, others can be represented only as sequences of code
positions. Such sequences could be either combining character sequences (for
example, accented Latin letters), or sequences of coded characters (such as
Philippino NG, or Swiss Ch). At present, such sequences do not have a
standardized unique identifier in the same sense as the characters that are
encoded in the standard.
This requirement is expressed in the contribution from Finland and
Germany (in document L2/00-89). When one examines the reasons behind the
Lithuanian proposal, again, one of the driving requirements is the desire to be
able to state uniquely the repertoire required by Lithuania. While the elements
of such sequences -- such as the combining accent marks that are needed -- do
have coded representations (and hence unique identifiers) that could be
referenced in a repertoire, the specific sequences cannot be assigned a
standardized unique identifier.
Background
One of the principles of encoding fully composed characters in the
Unicode Standard (and in ISO/IEC 10646) has been to include such a character
only when it can be shown that a decomposed representation is not acceptable.
A set of fully composed characters that could be decomposed was included in the
first version of the standard for reasons of compatibility with the
then-existing international, national or industry standards. There have been
some recent proposals for adding fully composed characters, for example from
Lithuania (see L2/99-349). These have not been accepted by the UTC or by
ISO/IEC JTC1/SC2/WG2 for several reasons, the primary one being the
implications of Normalization (see UTR #22, and L2/00-078).
Clause 6.5 of ISO/IEC 10646-1: 2000 contains a short identification mechanism
to reference characters that are encoded in the standard. It can be summarized
as:
The full syntax of the notation of a short identifier, in Backus-Naur form, is:
{ U | u } [ {+}xxxx | {-}xxxxxxxx ]
where "x" represents one hexadecimal digit (0 to 9, A to F, or a to f).
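The short identifier syntax can be sketched as a small parser. This is only an illustration, assuming that the braces in the notation mark optional parts, so that the prefix letter and the sign may each be omitted; the function name `parse_short_identifier` is hypothetical and not part of any standard.

```python
import re

# Assumption: braces in the notation mark optional parts, so "0304", "+0304",
# "U+0304" and "U-00000304" are all accepted forms.
_SHORT_ID = re.compile(
    r"[Uu]?(?:\+?(?P<four>[0-9A-Fa-f]{4})|-?(?P<eight>[0-9A-Fa-f]{8}))"
)


def parse_short_identifier(text: str) -> int:
    """Return the code position named by a short identifier (hypothetical helper)."""
    match = _SHORT_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a short identifier: {text!r}")
    # Exactly one of the two named groups matched.
    return int(match.group("four") or match.group("eight"), 16)


if __name__ == "__main__":
    for sample in ("U+0304", "u-00000D36", "0075"):
        print(sample, "->", hex(parse_short_identifier(sample)))
```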
Annex A of 10646-1 contains identified collections of graphic
characters for subsets of 10646. A collection is identified either by
enumerating individual code positions or ranges of code positions (one form of
the short identifier) within the standard, or as a union of other identified
collections, given individually or as ranges, for example:
24 MALAYALAM 0D00 - 0D7F
2000C 200D
250 GENERAL FORMAT CHARACTERS Collections 200 - 203
However, such enumerations are at present constrained to characters
defined in the standard.
Proposal
Paragraphs along the following lines are proposed for addition to an
appropriate clause (such as clause 6.5), to Annex A, or to a separate Annex of
the standard.
Note: This proposal is worded towards amending 10646-1: 2000. However,
equivalent paragraphs should be considered for the Unicode standard also.
Unique Identification of Sequences
An entity that is represented by a sequence of 'n' code positions from
the standard, is identified in the following form:
<UID1, UID2, UID3, .. UIDn>
where, UID1, UID2, etc. represent the unique identifiers of the
corresponding characters from the standard, in the same sequence as needed to
represent the identified entity. The syntax for UID1, UID2, … is specified in
clause 6.5. A Comma (optionally followed by a Space character) separates the
UIDs, and a pair of Angle Brackets encloses the whole sequence of UIDs.
Examples of such sequences are:
a composite sequence containing a base character plus one or more
combining characters
a sequence of characters representing a conjunct
a sequence of characters representing a digraph or ligature
a sequence of standalone characters, or
a sequence of any mix of the above.
When multiple sequences may be used to represent the same entity, each such
sequence will be considered a separate USI. Either one of these sequences must
be chosen, or distinguishing entity names should be assigned to differentiate
between them.
Some examples:
Entity name | Sequence | USI
Philippino NG | (N G) | <004E, 0047>
Latin Small Letter U With Macron And Tilde | (u combining macron combining tilde) | <0075, 0304, 0303>
Malayalam SHWA | (sha virama va) | <0D36, 0D4D, 0D35>
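The USI notation in the examples above can be read and written mechanically. The following is a minimal sketch, not a normative definition; the helper names `parse_usi`, `format_usi` and `describe` are hypothetical, and accepting an optional "U+" prefix on each element is an assumption made for convenience.

```python
import unicodedata


def parse_usi(usi: str) -> list[int]:
    """Parse a USI such as '<0075, 0304, 0303>' into a list of code positions."""
    usi = usi.strip()
    if not (usi.startswith("<") and usi.endswith(">")):
        raise ValueError(f"USI must be enclosed in angle brackets: {usi!r}")
    # Assumption: each element may carry an optional "U+" prefix.
    return [int(part.strip().removeprefix("U+"), 16) for part in usi[1:-1].split(",")]


def format_usi(text: str) -> str:
    """Write the code positions of a string in USI notation."""
    return "<" + ", ".join(f"{ord(ch):04X}" for ch in text) + ">"


def describe(usi: str) -> list[str]:
    """List the standardized character names of the elements of a USI."""
    return [unicodedata.name(chr(cp)) for cp in parse_usi(usi)]


if __name__ == "__main__":
    print(parse_usi("<004E, 0047>"))
    print(format_usi("u\u0304\u0303"))
    print(describe("<0075, 0304, 0303>"))
```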
Repertoires including Uniquely Identified Sequences:
In addition to the unique character identifiers from the standard, a
repertoire definition may include entities represented by unique sequence
identifiers as defined above -- for example to specify a Lithuanian repertoire.
Such a repertoire can be defined in any document, for example in a National
Standard, or a standard that defines all the possible sequences to represent
all of Devanagari or Thai (including the specific valid conjuncts). When
sufficient justification exists, such a repertoire may be proposed to be
included in ISO/IEC 10646 as "an identified collection". To be able
to accommodate such a request, the definition of "collections" in the
standard should be enhanced to specifically recognize the possibility of
inclusion of Uniquely Identified Sequences in a collection.
Note: We have to keep in mind that 10646 collections are the only
current standardized means of being able to identify repertoires which are
subsets of 10646.
The above proposals should meet the stated requirements for repertoire
definitions in document L2/00-89, and other such requirements.
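A repertoire that mixes single code positions and USIs can be modelled as a set of code-position tuples. The sketch below is an assumption-laden illustration: the tiny `REPERTOIRE` is hypothetical and not an actual Lithuanian repertoire, and the segmentation only recognizes combining sequences (a base followed by combining marks), not digraphs such as NG.

```python
import unicodedata


def combining_sequences(text: str) -> list[str]:
    """Split text into a base character plus the combining marks that follow it."""
    sequences: list[str] = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences


# Hypothetical, deliberately tiny repertoire for illustration only.
REPERTOIRE = {
    (0x0061,),          # a
    (0x0105,),          # a with ogonek
    (0x0105, 0x0301),   # USI <0105, 0301>
    (0x0105, 0x0303),   # USI <0105, 0303>
}


def outside_repertoire(text: str, repertoire=REPERTOIRE) -> list[str]:
    """Return the combining sequences of text that the repertoire does not list."""
    return [
        seq for seq in combining_sequences(text)
        if tuple(map(ord, seq)) not in repertoire
    ]


if __name__ == "__main__":
    print(outside_repertoire("a\u0105\u0105\u0301"))   # everything is covered
    print(outside_repertoire("a\u0105\u0300"))         # a-ogonek + grave is not
```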
Naming of entities
Single names (as opposed to a sequence of names) of entities which are
represented by sequences will remain outside the scope of Unicode (and ISO/IEC
10646). A sequence of standardized names, corresponding to the elements of a
Unique Sequence Identifier, may be used to reference a single name that the
referencing document assigns to make the correspondence unique.
Note: Situations may arise when an entity may be represented using more
than one sequence -- for example, a multiple-accented character may be
expressed as a sequence of an already encoded composed character and another
combining accent, or as a completely decomposed sequence. The USIs for these
sequences will be different. Different entity names will be needed to
reference the correct USI.
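The situation in the note can be illustrated with Python's standard `unicodedata` module, using small letter u with macron and tilde as the entity. This is a sketch only; the helper names `usi` and `same_entity` are hypothetical, and comparing NFD forms is one possible way to test that two sequences denote the same entity.

```python
import unicodedata


def usi(text: str) -> str:
    """Write a string's code positions in USI notation."""
    return "<" + ", ".join(f"{ord(ch):04X}" for ch in text) + ">"


def same_entity(a: str, b: str) -> bool:
    """Treat two sequences as the same entity if they are canonically equivalent."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


partly_composed = "\u016B\u0303"     # u-macron followed by combining tilde
fully_decomposed = "u\u0304\u0303"   # u, combining macron, combining tilde

if __name__ == "__main__":
    print(usi(partly_composed), usi(fully_decomposed))
    print(same_entity(partly_composed, fully_decomposed))
```

The two USIs differ as strings even though normalization maps one sequence to the other, which is why a referencing document has to pick one or name both.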
Reference Documents:
L2/99-349 Proposal to add Lithuanian accented letters to ISO/IEC 10646-1, SC2/WG2 N2075R, 1999-09-09
L2/00-089 Identification of decomposed characters in ISO/IEC 10646-1, Kolehmainen, Küster, SC2/WG2 N2189, 2000-03-14
L2/00-078 Implications of Normalization on Character Encoding (for addition to principles and procedures); Mark Davis, SC2/WG2 N2176, 2000-03-07
Clause 6.5 Short identifiers for characters, ISO/IEC 10646-1: 2000
ISO/IEC 10646-1: 2000 Annex A, Collections of Graphic Characters for Subsets
L2/01-191R
Dotting the i’s
Kent Karlsson and Vladas Tumasonis
2001-05-05
This is a proposal to update the SpecialCasing.txt data file in the
Unicode Character Database. The current handling of dots above for lowercase
i’s and j’s in SpecialCasing.txt for case mapping is not sufficient, in
particular for Lithuanian where an explicit dot above sometimes needs to be
introduced. This proposal also attempts a somewhat more systematic treatment of
dots above lowercase i’s and j’s for other languages too.
The dots above lowercase i and lowercase j are 'soft' in the sense that
they usually disappear upon uppercasing as well as when accents are placed
above the i or j. There are, however, exceptions to this.
For these exceptions, where the dot is not 'soft', a 'hard dot above'
(U+0307) is the best way to deal with the matter. For Turkish, the soft dot
must be “hardened” for uppercasing (when there are no accents above; otherwise
the soft dot is already gone). For Lithuanian it must be “hardened” before
accenting above, but not for uppercasing.
The tables in the exposition are not complete. The formal tables in the update to
SpecialCasing.txt are, however, intended to be complete.
to upper and to title
Normal
Any lowercase variant of i or j with an unblocked extra dot above, if there are
no more accents above on that base letter: remove the extra dot, then
uppercase. This removes any spurious dot above, a dot that is not recommended
to be there in the first place.
i+dot (no more accents above) | I
i-ogonek+dot (no more accents above) | I-ogonek [etc.]
j+dot (no more accents above) | J
Lithuanian
Any lowercase variant of i or j with an unblocked extra dot above, even if
there are more accents above on that base letter: remove the extra dot, then
uppercase.
i+dot | I
j+dot | J
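The Lithuanian uppercasing rule above can be sketched as follows. This is an illustrative approximation and not the proposed SpecialCasing.txt mechanism: the function name `lt_upper` is hypothetical, the text is processed in NFD so that i-ogonek is seen as i plus a combining ogonek, and the result is returned in NFC by choice.

```python
import unicodedata

DOT_ABOVE = "\u0307"


def lt_upper(text: str) -> str:
    """Uppercase text, dropping an explicit dot above that follows i or j."""
    nfd = unicodedata.normalize("NFD", text)
    out = []
    after_soft_dotted = False  # True while a dot would still be "unblocked"
    for ch in nfd:
        ccc = unicodedata.combining(ch)
        if ccc == 0:
            # A new base character starts a new combining sequence.
            after_soft_dotted = ch in ("i", "j")
        elif ch == DOT_ABOVE and after_soft_dotted:
            # Remove the explicit dot, even if more accents above follow.
            after_soft_dotted = False
            continue
        elif ccc == 230:
            # Another mark above has intervened; a later dot is kept.
            after_soft_dotted = False
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out).upper())


if __name__ == "__main__":
    for sample in ("i\u0307", "i\u0307\u0301", "j\u0307\u0303", "\u012F\u0307\u0303"):
        result = lt_upper(sample)
        print([f"{ord(c):04X}" for c in sample], "->", [f"{ord(c):04X}" for c in result])
```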
Turkish
An i with an unblocked extra dot above, if there are no more accents above on
that base letter: keep the extra dot, but don’t add another one (for the cases
below), then uppercase.
This, again, takes care of the spurious case where
i (no more accents above) | I-dot
i+dot (no more accents above) | I-dot
to lower
Normal
Any lowercase or uppercase variant of i or j with an unblocked extra dot above,
if there are no more accents above on that base letter: remove the extra dot.
i+dot (no more accents above) | i
i-ogonek+dot (no more accents above) | i-ogonek
... | ...
j+dot (no more accents above) | j
I-dot (if more accents above) | i -dot
I-dot (if no more accents above) | i (already in UniData.txt)
I -dot (if more accents above) | i -dot (for NFD-NFC consistency; already in UniData)
I -dot (if no more accents above) | i (for NFD-NFC consistency)
J -dot (if no more accents above) | j (some degree of systematic...)
Lithuanian
Any lowercase variant of i or j with an unblocked extra dot above, if there are
no more accents above on that base letter: remove the extra dot. Uppercase I’s
and J’s that have extra accents above must get an extra dot above inserted.
I (if more accents above) | i -dot
J (if more accents above) | j -dot
I-ogonek (if more accents above) | i-ogonek -dot
I-grave | i -dot -grave
I-acute | i -dot -acute
I-tilde | i -dot -tilde
For NFD-NFC consistency a number of “I-letters” that are not used
in Lithuanian must be handled too.
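The dot-insertion part of the Lithuanian lowercasing rule can be sketched as below. It is a hedged illustration only: `lt_lower` and `needs_explicit_dot` are hypothetical names, the text is processed in NFD and returned in NFC by choice, and the removal of a spurious dot when no more accents follow is deliberately left out.

```python
import unicodedata

DOT_ABOVE = "\u0307"


def needs_explicit_dot(nfd: str, start: int) -> bool:
    """True if the marks from `start` include an accent above that is not already a dot."""
    for ch in nfd[start:]:
        ccc = unicodedata.combining(ch)
        if ccc == 0:
            return False            # next base character reached
        if ccc == 230:
            return ch != DOT_ABOVE  # an explicit dot is already present
    return False


def lt_lower(text: str) -> str:
    """Lowercase text, inserting a dot above after I or J that carries accents above."""
    nfd = unicodedata.normalize("NFD", text)
    out = []
    for index, ch in enumerate(nfd):
        out.append(ch.lower())
        if ch in ("I", "J") and needs_explicit_dot(nfd, index + 1):
            out.append(DOT_ABOVE)
    # NFC reorders the inserted dot after marks below, such as the ogonek.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    for sample in ("\u00CC", "\u00CD", "\u0128", "J\u0303", "\u012E\u0301", "I"):
        result = lt_lower(sample)
        print([f"{ord(c):04X}" for c in sample], "->", [f"{ord(c):04X}" for c in result])
```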
Turkish
Any lowercase variant of i or j with an unblocked extra dot above, if there are
no more accents above on that base letter: remove the extra dot.
Turkish and Azeri (at least) use a dotless i as the lowercase of I. It
should not be used if there are more accents above (then use an ordinary i,
which then loses the dot...).
I (no more accents above) | i-dotless
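The Turkish and Azeri pairing of I with dotless i, and of dotted capital I with i, can be sketched as follows. The function names `tr_lower` and `tr_upper` are hypothetical, the handling of marks is an assumption based on the exposition tables, and this is not a complete implementation of the proposed rules.

```python
import unicodedata

DOT_ABOVE = "\u0307"
DOTLESS_I = "\u0131"


def _sequences(text: str):
    """Yield (base, marks) pairs from the NFD form of text."""
    pairs = []
    for ch in unicodedata.normalize("NFD", text):
        if pairs and unicodedata.combining(ch) != 0:
            pairs[-1][1].append(ch)
        else:
            pairs.append((ch, []))
    return pairs


def _above(marks):
    """Return the marks of combining class 230 (above)."""
    return [m for m in marks if unicodedata.combining(m) == 230]


def tr_lower(text: str) -> str:
    out = []
    for base, marks in _sequences(text):
        above = _above(marks)
        if base == "I" and not above:
            base = DOTLESS_I                       # I -> dotless i
        elif base in ("I", "i", "j") and above == [DOT_ABOVE]:
            marks = [m for m in marks if m != DOT_ABOVE]  # drop the lone dot
        out.append(base.lower() + "".join(marks))
    return unicodedata.normalize("NFC", "".join(out))


def tr_upper(text: str) -> str:
    out = []
    for base, marks in _sequences(text):
        if base == "i" and not _above(marks):
            marks = marks + [DOT_ABOVE]            # harden the soft dot
        out.append(base.upper() + "".join(marks))
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(tr_lower("ISTANBUL"), tr_lower("\u0130STANBUL"), tr_upper("istanbul"))
```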
Suggested changes to SpecialCasing.txt regarding dotting i’s and j’s
The exposition tables above were not intended to be complete. The formal tables below are intended to be
complete enough to cover the orthographic requirements and also be such that
NFD and NFC are handled consistently. Cases like barred i or j-crosstail are
not covered. Review and comments are welcome.
The intent is for these modifications to be included in Unicode 3.2, or
if possible, in an update to Unicode 3.1.
Old lines (to remove)
1st-------------------
# characters where they are 1-1, and does not have locale-specific mappings.)
2nd-------------------
# The <condition_list> is optional. Where present, it consists of one or more locales or contexts,
# separated by spaces.
3rd-------------------
# A locale is defined as:
# <locale> := <ISO_639_code> ( "_" <ISO_3166_code> ( "_" <variant> )? )?
# <ISO_3166_code> := 2-letter ISO country code,
# <ISO_639_code> := 2-letter ISO language code
4th-------------------
# A context is one of the following choices:
5th-------------------
# AFTER_i: The last base character was "i" 0069
6th-------------------
7th-------------------
# ================================================================================
# Locale-sensitive mappings
# ================================================================================
# Lithuanian
0307; 0307; ; ; lt AFTER_i; # Remove DOT ABOVE after "i" with upper or titlecase
# Turkish, Azeri
0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0049; 0131; 0049; 0049; az; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
# Note: the following cases are already in the UnicodeData file.
# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
# 0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
end-------------------
New lines (to insert, replacing the old ones listed above)
1st-------------------
# characters where they are 1-1, and does not have language-specific mappings.)
#
# Note that when case mapping a string in a normal form,
# the result need not be in any normal form.
#
2nd-------------------
# The <condition_list> is optional. Where present, it consists of one or more
# contexts, one of which may be a language code, separated by spaces.
3rd-------------------
# A _subset_ of RFC 3066 conforming language codes, _sufficient for this file_,
# can be described as:
# <langcode> := two-letter ISO 639-1 language code
4th-------------------
# A context is a <langcode> or one of the following choices (test on original string):
5th-------------------
# AFTER_i: The last preceding base character was "i" (0069), "j" (006A),
# or has a canonical decomposition that begins with an "i" or "j" but has no
# combining characters above (i.e., i-ogonek (012F), i-tilde-below (1E2D),
# or i-dot-below (1ECB)); AND no combining character class 230 (above) has
# intervened. (Neither i-stroke (0268) nor j-crosstailed (029D) need be
# specially handled below, while they also have a soft dot above that
# is lost on normal uppercase or accenting above.)
#
# AFTER_CAP_I: The last preceding base character was "I" (0049), "J" (004A),
# or has a canonical decomposition that begins with an "I" or "J" but has no
# combining characters above (i.e., I-ogonek (012E), I-tilde-below (1E2C),
# or I-dot-below (1ECA)); AND no combining character class 230 (above) has
# intervened. (I-stroke (0197) need not be specially handled below, while
# it also has a soft dot above in lowercase form.)
#
# MORE_ACCENTS_ABOVE: The current combining sequence has at least one class 230
# (above) combining character after the currently considered character.
6th-------------------[no old text]
#-----
# Normal dotting/undotting of i's and j's (capital and small):
#-----
# Remove spurious explicit dot above small i or j when case mapping,
# if no more accents above:
0307; ; ; ; AFTER_i NON_MORE_ACCENTS_ABOVE # COMBINING DOT ABOVE
# Remove explicit dot above capital i or j when lowercasing,
# if no more accents above (mainly for NFC-NFD consistency for i--I-dot):
0307; ; 0307; 0307; AFTER_CAP_I NON_MORE_ACCENTS_ABOVE # COMBINING DOT ABOVE
# For NFC-NFD consistency for I-dot--i:
0130; 0069 0307; 0130; 0130; MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I WITH DOT
# Note: the following cases are already in the UnicodeData file.
# 0131; 0131; 0049; 0049; # LATIN SMALL LETTER DOTLESS I
# 0130; 0069; 0130; 0130; [NON_MORE_ACCENTS_ABOVE] # LATIN CAPITAL LETTER I WITH DOT ABOVE
7th-------------------
# ================================================================================
# Language-sensitive mappings
# ================================================================================
#
# Lithuanian:
#
# Remove dot above small i's or j's when uppercasing,
# even if there are more accents above:
0307; 0307; ; ; lt AFTER_i # COMBINING DOT ABOVE
# Introduce an explicit dot above when lowercasing capital I's and J's
# if there are more accents above (grave, acute, tilde above, and ogonek
# occur in Lithuanian; the rest are just for consistency between NFC and NFD):
0049; 0069 0307; 0049; 0049; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I WITH OGONEK
00CC; 0069 0307 0300; 00CC; 00CC; lt # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt # LATIN CAPITAL LETTER I WITH TILDE
1E2C; 1E2D 0307; 1E2C; 1E2C; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I WITH TILDE BELOW
1ECA; 1ECB 0307; 1ECA; 1ECA; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I WITH DOT BELOW
00CE; 0049 0307 0302; 00CE; 00CE; lt # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0134; 004A 0307 0302; 0134; 0134; lt # LATIN CAPITAL LETTER J WITH CIRCUMFLEX
0128; 0049 0307 0303; 0128; 0128; lt # LATIN CAPITAL LETTER I WITH TILDE
012A; 0049 0307 0304; 012A; 012A; lt # LATIN CAPITAL LETTER I WITH MACRON
012C; 0049 0307 0306; 012C; 012C; lt # LATIN CAPITAL LETTER I WITH BREVE
01CF; 0049 0307 030C; 01CF; 01CF; lt # LATIN CAPITAL LETTER I WITH CARON
0208; 0049 0307 030F; 0208; 0208; lt # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE
020A; 0049 0307 0311; 020A; 020A; lt # LATIN CAPITAL LETTER I WITH INVERTED BREVE
1E2E; 0049 0307 0308 0301; 1E2E; 1E2E; lt # LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE
1EC8; 0049 0307 0309; 1EC8; 1EC8; lt # LATIN CAPITAL LETTER I WITH HOOK ABOVE
#
# Turkish, Azeri:
#
# Remove spurious dot above small i's when lowercasing, if no more accents above:
0307; ; 0307; 0307; tr AFTER_i NON_MORE_ACCENTS_ABOVE # COMBINING DOT ABOVE
0307; ; 0307; 0307; az AFTER_i NON_MORE_ACCENTS_ABOVE # COMBINING DOT ABOVE
# I--i-dotless and I-dot--i-with-soft-dot are case pairs in Turkish and Azeri,
# when there are no more accents above (otherwise use the ordinary casing rules):
0069; 0069; 0130; 0130; tr NON_MORE_ACCENTS_ABOVE # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az NON_MORE_ACCENTS_ABOVE # LATIN SMALL LETTER I
0049; 0131; 0049; 0049; tr NON_MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az NON_MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I
end-------------
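The data lines quoted above share one layout: a code position, then the lowercase, titlecase and uppercase mappings, then an optional condition list, then a comment. A minimal reader for that layout might look like the sketch below. The names `CaseRule` and `parse_rule` are hypothetical, and accepting the condition list with or without a trailing semicolon is an assumption made so that both the old and the new lines parse.

```python
from typing import NamedTuple, Optional


class CaseRule(NamedTuple):
    code: int
    lower: tuple
    title: tuple
    upper: tuple
    conditions: tuple


def _codes(field: str) -> tuple:
    """Convert a space-separated list of hex code positions; empty means 'remove'."""
    return tuple(int(item, 16) for item in field.split())


def parse_rule(line: str) -> Optional[CaseRule]:
    """Parse one data line; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 4:
        raise ValueError(f"expected at least four fields: {line!r}")
    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return CaseRule(
        int(fields[0], 16),
        _codes(fields[1]),
        _codes(fields[2]),
        _codes(fields[3]),
        conditions,
    )


if __name__ == "__main__":
    print(parse_rule('0307; 0307; ; ; lt AFTER_i; # Remove DOT ABOVE after "i"'))
    print(parse_rule("0049; 0069 0307; 0049; 0049; lt MORE_ACCENTS_ABOVE # LATIN CAPITAL LETTER I"))
```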
Appendix 3. HTML document with accented letters
<!DOCTYPE HTML
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>USI</TITLE>
<META
http-equiv=Content-Type content="text/html; charset=utf-8">
<META
content="MSHTML 5.50.4134.600" name=GENERATOR></HEAD>
<BODY>
<H1>Lithuanian
USI</H1>
<TABLE
border=1>
<TBODY>
<TR>
<TH>#</TH>
<TH>Graphic symbol</TH>
<TH>Name</TH>
<TH>USI</TH>
<TH>Composed glyph</TH>
<TH>Without COMBINING DOT ABOVE
(U+0307)</TH></TR>
<TR>
<TD>1</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER A WITH
OGONEK AND ACUTE</TD>
<TD>&lt;U+0104, U+0301&gt;</TD>
<TD>Ą́</TD></TR>
<TR>
<TD>2</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER A WITH OGONEK
AND ACUTE</TD>
<TD>&lt;U+0105, U+0301&gt;</TD>
<TD>ą́</TD></TR>
<TR>
<TD>3</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER A WITH OGONEK
AND TILDE</TD>
<TD>&lt;U+0104, U+0303&gt;</TD>
<TD>Ą̃</TD></TR>
<TR>
<TD>4</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER A WITH OGONEK
AND TILDE</TD>
<TD>&lt;U+0105, U+0303&gt;</TD>
<TD>ą̃</TD></TR>
<TR>
<TD>5</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER E WITH
OGONEK AND ACUTE</TD>
<TD>&lt;U+0118, U+0301&gt;</TD>
<TD>Ę́</TD></TR>
<TR>
<TD>6</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER E WITH OGONEK
AND ACUTE</TD>
<TD>&lt;U+0119, U+0301&gt;</TD>
<TD>ę́</TD></TR>
<TR>
<TD>7</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER E WITH
OGONEK AND TILDE</TD>
<TD>&lt;U+0118, U+0303&gt;</TD>
<TD>Ę̃</TD></TR>
<TR>
<TD>8</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER E WITH OGONEK
AND TILDE</TD>
<TD>&lt;U+0119, U+0303&gt;</TD>
<TD>ę̃</TD></TR>
<TR>
<TD>9</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER E WITH DOT
ABOVE AND ACUTE</TD>
<TD>&lt;U+0116, U+0301&gt;</TD>
<TD>Ė́</TD></TR>
<TR>
<TD>10</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER E WITH DOT
ABOVE AND ACUTE</TD>
<TD>&lt;U+0117, U+0301&gt;</TD>
<TD>ė́</TD></TR>
<TR>
<TD>11</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER E WITH DOT
ABOVE AND TILDE</TD>
<TD>&lt;U+0116, U+0303&gt;</TD>
<TD>Ė̃</TD></TR>
<TR>
<TD>12</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER E WITH DOT
ABOVE AND TILDE</TD>
<TD>&lt;U+0117, U+0303&gt;</TD>
<TD>ė̃</TD></TR>
<TR>
<TD>13</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER I WITH DOT
ABOVE AND GRAVE</TD>
<TD>&lt;U+0069, U+0307, U+0300&gt;</TD>
<TD>i̇̀</TD>
<TD>ì</TD></TR>
<TR>
<TD>14</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER I WITH DOT
ABOVE AND ACUTE</TD>
<TD>&lt;U+0069, U+0307, U+0301&gt;</TD>
<TD>i̇́</TD>
<TD>í</TD></TR>
<TR>
<TD>15</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER I WITH DOT
ABOVE AND TILDE</TD>
<TD>&lt;U+0069, U+0307, U+0303&gt;</TD>
<TD>i̇̃</TD>
<TD>ĩ</TD></TR>
<TR>
<TD>16</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER I WITH
OGONEK AND ACUTE</TD>
<TD>&lt;U+012E, U+0301&gt;</TD>
<TD>Į́</TD></TR>
<TR>
<TD>17</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER I WITH OGONEK
AND DOT ABOVE AND ACUTE</TD>
<TD>&lt;U+012F, U+0307, U+0301&gt;</TD>
<TD>į̇́</TD>
<TD>į́</TD></TR>
<TR>
<TD>18</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER I WITH
OGONEK AND TILDE</TD>
<TD>&lt;U+012E, U+0303&gt;</TD>
<TD>Į̃</TD></TR>
<TR>
<TD>19</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER I WITH OGONEK
AND DOT ABOVE AND TILDE</TD>
<TD>&lt;U+012F, U+0307, U+0303&gt;</TD>
<TD>į̇̃</TD>
<TD>į̃</TD></TR>
<TR>
<TD>20</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER J WITH
TILDE</TD>
<TD>&lt;U+004A, U+0303&gt;</TD>
<TD>J̃</TD></TR>
<TR>
<TD>21</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER J WITH
TILDE</TD>
<TD>&lt;U+006A, U+0307, U+0303&gt;</TD>
<TD>j̇̃</TD>
<TD>j̃</TD></TR>
<TR>
<TD>22</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER L WITH
TILDE</TD>
<TD>&lt;U+004C, U+0303&gt;</TD>
<TD>L̃</TD></TR>
<TR>
<TD>23</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER L WITH
TILDE</TD>
<TD>&lt;U+006C, U+0303&gt;</TD>
<TD>l̃</TD></TR>
<TR>
<TD>24</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER M WITH
TILDE</TD>
<TD>&lt;U+004D, U+0303&gt;</TD>
<TD>M̃</TD></TR>
<TR>
<TD>25</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER M WITH
TILDE</TD>
<TD>&lt;U+006D, U+0303&gt;</TD>
<TD>m̃</TD></TR>
<TR>
<TD>26</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER R WITH
TILDE</TD>
<TD>&lt;U+0052, U+0303&gt;</TD>
<TD>R̃</TD></TR>
<TR>
<TD>27</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER R WITH
TILDE</TD>
<TD>&lt;U+0072, U+0303&gt;</TD>
<TD>r̃</TD></TR>
<TR>
<TD>28</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER U WITH
OGONEK AND ACUTE</TD>
<TD>&lt;U+0172, U+0301&gt;</TD>
<TD>Ų́</TD></TR>
<TR>
<TD>29</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER U WITH OGONEK
AND ACUTE</TD>
<TD>&lt;U+0173, U+0301&gt;</TD>
<TD>ų́</TD></TR>
<TR>
<TD>30</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER U WITH
OGONEK AND TILDE</TD>
<TD>&lt;U+0172, U+0303&gt;</TD>
<TD>Ų̃</TD></TR>
<TR>
<TD>31</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER U WITH OGONEK
AND TILDE</TD>
<TD>&lt;U+0173, U+0303&gt;</TD>
<TD>ų̃</TD></TR>
<TR>
<TD>32</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER U WITH
MACRON AND ACUTE</TD>
<TD>&lt;U+016A, U+0301&gt;</TD>
<TD>Ū́</TD></TR>
<TR>
<TD>33</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER U WITH MACRON
AND ACUTE</TD>
<TD>&lt;U+016B, U+0301&gt;</TD>
<TD>ū́</TD></TR>
<TR>
<TD>34</TD>
<TD> </TD>
<TD>LATIN CAPITAL LETTER U WITH
MACRON AND TILDE</TD>
<TD>&lt;U+016A, U+0303&gt;</TD>
<TD>Ū̃</TD></TR>
<TR>
<TD>35</TD>
<TD> </TD>
<TD>LATIN SMALL LETTER U WITH MACRON
AND TILDE</TD>
<TD>&lt;U+016B, U+0303&gt;</TD>
<TD>ū̃</TD></TR></TBODY></TABLE><!--
------------------------- --><!-- END OF CONVERTED OUTPUT --><!--
------------------------- -->
<P><FONT
size=2><I><BR>Last updated on 2001.12.09 <BR>By Vladas
Tumasonis
<BR>Email:
</I></FONT><A
href="mailto:vladas.tumasonis@maf.vu.lt"><FONT
size=2><I>vladas.tumasonis@maf.vu.lt</I></FONT></A><FONT
size=2><I>
</I></FONT></P></BODY></HTML>
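Because a USI is written between angle brackets, it has to be escaped when placed in an HTML table cell, while the glyph cell simply holds the characters themselves. The sketch below shows one way to generate both cells with Python's standard `html` module; the helper names `usi_cell` and `glyph_cell` are hypothetical and were not used to produce the document above.

```python
import html


def usi_cell(code_points) -> str:
    """Build a table cell holding the USI, with angle brackets escaped."""
    usi = "<" + ", ".join(f"U+{cp:04X}" for cp in code_points) + ">"
    return f"<TD>{html.escape(usi)}</TD>"


def glyph_cell(code_points) -> str:
    """Build a table cell holding the character sequence itself."""
    return "<TD>" + "".join(chr(cp) for cp in code_points) + "</TD>"


if __name__ == "__main__":
    sequence = [0x0104, 0x0301]  # capital A with ogonek, combining acute
    print(usi_cell(sequence))
    print(glyph_cell(sequence))
```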
[1] The ISO/IEC 10646 standard is a superset of Unicode. It defines character coding with 32 bits (4 bytes). All Unicode characters are in ISO/IEC 10646; the first 16 bits of all their codes are zero, and in the remaining 16 bits the Unicode codes coincide with the ISO/IEC 10646 codes. There are therefore no essential differences between the two encodings. Unicode is developed by the Unicode Consortium, which is not part of the International Organization for Standardization, so Unicode is not regarded as an international standard. The two organizations nevertheless cooperate closely, and as a result character coding in ISO/IEC 10646 and in Unicode is harmonized.