This page contains examples accompanying Alan Flavell’s “I18n – text direction”. Examples that are expected to display incorrectly (i.e. not as intended) in Mozilla Firefox 2 or Internet Explorer 6 are shown in red. Read the source text to understand how it’s done!
You can specify text direction by (paired) Unicode control characters,
by (paired) control characters written as numeric references, by HTML markup, or by CSS properties.
Control characters are restricted to plain text and are
not suitable for use with markup languages
(except &lrm; and &rlm;).
The preferred method for HTML is to use HTML markup.
Use control characters written as numeric references only in places where no markup is possible,
such as attribute values (alt, title, etc.).
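Here is a minimal sketch of writing a right-to-left `title` attribute with paired references instead of markup. It is in Python, and the helper name `title_attr` is hypothetical:

```python
from html import escape

# Embedding controls written as numeric character references, so that no raw
# (invisible) control character ends up in the HTML source.
LRE_REF = "&#x202A;"  # LEFT-TO-RIGHT EMBEDDING
RLE_REF = "&#x202B;"  # RIGHT-TO-LEFT EMBEDDING
PDF_REF = "&#x202C;"  # POP DIRECTIONAL FORMATTING, closes the embedding


def title_attr(text: str, rtl: bool) -> str:
    """Return a title="..." attribute whose value is a paired embedding."""
    opener = RLE_REF if rtl else LRE_REF
    # escape() handles &, <, > and quotes in the text itself; the references
    # are added afterwards so that their own "&" is not escaped again.
    return f'title="{opener}{escape(text, quote=True)}{PDF_REF}"'


print(title_attr("שבת", rtl=True))  # title="&#x202B;שבת&#x202C;"
```

The text passed in is escaped, but the references are not, so they reach the browser as intended.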
Occasionally it may be convenient to specify
text direction via CSS;
for example, to set the direction of columns in tables
rather than putting a dir attribute into each and every <td>.
In the following table, div represents any block-level element,
and span represents any inline element.
| Plain text control chars | HTML 4 control chars | HTML 4 markup | CSS 2 properties |
|---|---|---|---|
| not applicable | not applicable | <div dir=ltr>...... </div> | direction: ltr; unicode-bidi: normal |
| not applicable | not applicable | <div dir=rtl>...... </div> | direction: rtl; unicode-bidi: normal |
| U+202A ...... U+202C | &#x202A; ...... &#x202C; | <span dir=ltr>...... </span> | direction: ltr; unicode-bidi: embed |
| U+202B ...... U+202C | &#x202B; ...... &#x202C; | <span dir=rtl>...... </span> | direction: rtl; unicode-bidi: embed |
| U+202D ...... U+202C | &#x202D; ...... &#x202C; | <bdo dir=ltr>...... </bdo> | direction: ltr; unicode-bidi: bidi-override |
| U+202E ...... U+202C | &#x202E; ...... &#x202C; | <bdo dir=rtl>...... </bdo> | direction: rtl; unicode-bidi: bidi-override |
| U+200E | &#x200E; | not applicable | not applicable |
| U+200F | &#x200F; | not applicable | not applicable |
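Python's standard `unicodedata` module can double-check the code points in the table by reporting each character's official name and bidirectional class. The helper `numeric_ref` is hypothetical; it writes a character as an HTML 4 numeric reference, as in the second column:

```python
import unicodedata

# Code points from the table above; U+202C closes every embedding or override,
# while the marks U+200E and U+200F stand alone.
CONTROLS = ["\u202A", "\u202B", "\u202D", "\u202E", "\u202C", "\u200E", "\u200F"]


def numeric_ref(ch: str) -> str:
    """Write one character as an HTML hexadecimal numeric reference."""
    return f"&#x{ord(ch):04X};"


for ch in CONTROLS:
    # e.g. "U+202B  &#x202B;  RLE  RIGHT-TO-LEFT EMBEDDING"
    print(f"U+{ord(ch):04X}", numeric_ref(ch),
          unicodedata.bidirectional(ch), unicodedata.name(ch), sep="  ")
```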
If the line below is displayed as
“( 12 11 10 9 8 7 6 5 4 3 2 1 0 )”,
then your browser recognizes the dir attribute
and it is probably ready for right-to-left text.
Preferably, the line should be right-aligned.
( 0 1 2 3 4 5 6 7 8 9 10 11 12 )
The control (formatting) characters U+202A to U+202E are not suitable for use with HTML. If they are written directly into the source text, they interfere with the left-to-right markup and make editing, or even viewing, the source a nightmare. Furthermore, the bidirectional algorithm stops at newlines, so the source text could no longer be structured by newlines: a newline might separate, for example, a U+202B from its closing U+202C.
The closing U+202C or &#x202C; is sometimes implied and
may be omitted, like the closing </p> and </td> in HTML.
Nevertheless, it is safer always to close explicitly.
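Below is a sketch of a checker for plain text. It assumes, for simplicity, that only the newline character ends a paragraph, and its function name is hypothetical. It reports lines where an opening U+202A, U+202B, U+202D or U+202E is left without its U+202C, i.e. where the closing is merely implied:

```python
# Openers: LRE, RLE, LRO, RLO. Each should be closed by PDF (U+202C).
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}
PDF = "\u202C"


def unclosed_embeddings(text: str) -> list[int]:
    """Return 1-based numbers of lines that end with an embedding still open.

    Simplification: only the newline character is treated as a paragraph end.
    """
    result = []
    for number, line in enumerate(text.split("\n"), start=1):
        depth = 0
        for ch in line:
            if ch in OPENERS:
                depth += 1
            elif ch == PDF and depth > 0:
                depth -= 1  # a PDF with nothing open is ignored
        if depth > 0:
            result.append(number)
    return result


sample = "\u202Bשבת\u202C\n\u202Bשבת\nשבת\u202C"
print(unclosed_embeddings(sample))  # [2]
```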
To write “שבת [שאבעס]”, you can use HTML markup with <span dir=rtl> or, exceptionally, write the control characters &#x202B; and &#x202C; as numeric references.
Inserting the control characters U+202B and U+202C directly results in a mess
when viewing the source.
‫<B lang="he">שבת</b>[<I>שאבעס</i>]‬
<B lang="he">שבת</b>[<I>שאבעס</i>]
Never use raw (UTF-8-encoded) control characters;
use only character references
like &#x202B; and &#x200F;.
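Raw control characters that have crept into a source file can be turned into references mechanically. This minimal sketch covers only the seven characters from the table at the top of the page, and the helper name is hypothetical:

```python
import re

# Raw bidi formatting characters: the two marks, plus embeddings, overrides and pop.
RAW_BIDI = re.compile("[\u200E\u200F\u202A-\u202E]")


def references_for_controls(source: str) -> str:
    """Replace every raw bidi control character with a hex numeric reference."""
    return RAW_BIDI.sub(lambda m: f"&#x{ord(m.group()):04X};", source)


print(references_for_controls("\u202B<b>שבת</b>\u202C"))  # &#x202B;<b>שבת</b>&#x202C;
```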
dir attribute
Three or more directional levels (here: Latin > Hebrew > Latin)
must be defined by control characters or, preferably, by HTML markup.
The third line has no dir markup and is thus displayed as having only two directional levels.
The words mean “Congratulations!”
The words “מזל טוב” mean “Congratulations!”
The words “מזל [mazl] טוב [tov]” mean “Congratulations!”
The words “מזל [mazl] טוב [tov]” mean “Congratulations!”
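The sketch below generates nested markup for three directional levels. It assumes that each Hebrew word is followed by a bracketed Latin gloss. The function name is hypothetical, and the markup is one possibility, not necessarily the author's:

```python
from html import escape


def rtl_with_ltr_glosses(pairs: list[tuple[str, str]]) -> str:
    """Level 1: the surrounding left-to-right sentence (not generated here).
    Level 2: <span dir=rtl> for the Hebrew phrase.
    Level 3: <span dir=ltr> for each Latin gloss, brackets included."""
    inner = " ".join(
        f"{escape(word)} <span dir=ltr>[{escape(gloss)}]</span>" for word, gloss in pairs
    )
    return f"<span dir=rtl>{inner}</span>"


print(rtl_with_ltr_glosses([("מזל", "mazl"), ("טוב", "tov")]))
```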
Numbers, which are always written from left to right, can easily upset the order of right-to-left text.
For example, in a right-to-left context, “12 345” denotes two numbers and
should be displayed as “345 12”.
On the other hand, “12 345” denotes a single number and
should always be displayed as “12 345”.
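The source does not say which character separates the digit groups in the single number. This sketch assumes it is the no-break space U+00A0, whose bidirectional class is CS (common number separator), whereas the ordinary space is WS (whitespace). Under rule W4 of the Unicode Bidirectional Algorithm, a single CS between two European numbers (EN) becomes part of the number, and whitespace does not:

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Bidirectional class of each character, from the Unicode database."""
    return [unicodedata.bidirectional(ch) for ch in text]


two_numbers = "12 345"      # ordinary SPACE (U+0020)
one_number = "12\u00A0345"  # NO-BREAK SPACE (U+00A0), assumed separator

print(bidi_classes(two_numbers))  # ['EN', 'EN', 'WS', 'EN', 'EN', 'EN']
print(bidi_classes(one_number))   # ['EN', 'EN', 'CS', 'EN', 'EN', 'EN']
```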
The first line is from Google’s Urdu interface with overall dir=rtl; the second line has proper dir markup.
(Both lines are written in the restricted MacUrdu character set.)
© 2004 Google — 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google — 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google — 9 000 000 veb safahāt kī talāš ho rahī hai
Always specify the dir attribute for each piece of text, starting with
<body dir=ltr> or <body dir=rtl>.
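A script that generates pages still has to decide between dir=ltr and dir=rtl. One simple heuristic is rule P2 of the Unicode Bidirectional Algorithm: the first strong character (class L, R or AL) decides. This minimal sketch uses a hypothetical function name and does not skip isolates. Writing the attribute explicitly remains the point; the sketch only helps choose its value:

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong character, or None if none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):  # AL = Arabic letter
            return "rtl"
    return None  # only neutrals and numbers, e.g. "12 345"
```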
bdo element
To write Arabic or Hebrew letters from left to right,
you need the bdo element in addition to the attribute dir=ltr.
The vowels α ε η ι ο derive from
א ה ח י ע, resp.
The vowels α ε η ι ο derive from
א ה ח י ע, resp.
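The bdo example corresponds to a row of the table at the top of the page. In HTML it is written <bdo dir=ltr>...</bdo>; in plain text, U+202D ... U+202C. Here is a sketch with hypothetical helper names, assuming the text is already HTML-safe:

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def bdo_ltr_html(text: str) -> str:
    """HTML: <bdo dir=ltr> overrides the letters' own right-to-left direction."""
    return f"<bdo dir=ltr>{text}</bdo>"


def bdo_ltr_plain(text: str) -> str:
    """Plain text: the equivalent override U+202D ... U+202C."""
    return f"{LRO}{text}{PDF}"


print(bdo_ltr_html("א ה ח י ע"))  # <bdo dir=ltr>א ה ח י ע</bdo>
```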
The next examples assume a right-to-left context (dir=rtl)
such as an Arabic-language page.
The date 31 December 1999 is to be shown in
all-numeric form:
1999-12-31.
The first line in each example is the one where Internet Explorer 6 fails.
The ASCII hyphen is a
European number separator (bidirectional class ES).
Therefore, no special markup should be necessary.
However, Internet Explorer 6 needs dir=ltr.
1999-12-31
1999-12-31
The en dash (–), by contrast, belongs to the bidirectional class
Other Neutrals (ON).
Therefore, markup with <bdo dir=ltr> is necessary for all browsers.
١٩٩٩–١٢–٣١
١٩٩٩–١٢–٣١
The traditional Arabic date format calls for the slash as separator
and the suffix م (mīlād = birth), meaning “AD”.
The slash is a
common number separator (bidirectional class CS).
Therefore, no special markup should be necessary.
However, Internet Explorer 6 needs <bdo dir=ltr>.
١٩٩٩/١٢/٣١ م
١٩٩٩/١٢/٣١ م
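The three separators behave differently because of their bidirectional classes. Rule W4 of the Unicode Bidirectional Algorithm joins a single ES only between European digits (EN), and a single CS between two numbers of the same type (EN or AN); an ON character is never absorbed. Python's `unicodedata` shows the classes (the helper name is hypothetical):

```python
import unicodedata

# Separators and digits from the date examples above.
CHARS = ["-", "\u2013", "/", "1", "\u0661"]


def classes(chars: list[str]) -> dict[str, str]:
    """Map each character's Unicode name to its bidirectional class."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


for name, cls in classes(CHARS).items():
    print(f"{name:24} {cls}")
```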
Use the attribute dir=ltr with European digits
and the tag <bdo dir=ltr> with Arabic-Indic digits.
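The rule above can be applied when generating dates, as in this sketch for a right-to-left page. The function name is hypothetical. Using span to carry dir=ltr is also an assumption; any inline element would do:

```python
from datetime import date

# European digits 0-9 -> Arabic-Indic digits U+0660-U+0669.
TO_ARABIC_INDIC = str.maketrans("0123456789", "".join(chr(0x0660 + i) for i in range(10)))


def iso_date_markup(d: date, arabic_indic: bool = False, sep: str = "-") -> str:
    """All-numeric year-month-day date, marked up for a right-to-left page."""
    text = f"{d.year:04d}{sep}{d.month:02d}{sep}{d.day:02d}"
    if arabic_indic:
        # Arabic-Indic digits: the order must be forced with bdo.
        return f"<bdo dir=ltr>{text.translate(TO_ARABIC_INDIC)}</bdo>"
    # European digits: the dir attribute is sufficient.
    return f"<span dir=ltr>{text}</span>"


print(iso_date_markup(date(1999, 12, 31)))                                    # <span dir=ltr>1999-12-31</span>
print(iso_date_markup(date(1999, 12, 31), arabic_indic=True, sep="\u2013"))  # <bdo dir=ltr>١٩٩٩–١٢–٣١</bdo>
```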
lrm and rlm characters
The left-to-right mark
(&lrm; = &#x200E;)
and the right-to-left mark
(&rlm; = &#x200F;)
are alternative ways to specify the direction of neutral characters such as punctuation marks or spaces.
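The marks work because they are invisible format characters (category Cf) with a strong bidirectional class (L or R), just like a Latin or Hebrew letter. The helper `between_lrms` below is hypothetical, and where exactly to place the marks is this sketch's assumption. With an LRM on both sides of a run, the neutrals at its edges resolve as left-to-right, and so do European digits after the leading mark (rule W7):

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK (&lrm;)
RLM = "\u200F"  # RIGHT-TO-LEFT MARK (&rlm;)

for mark in (LRM, RLM):
    # Prints name, general category (Cf = format, no glyph) and bidi class.
    print(unicodedata.name(mark), unicodedata.category(mark), unicodedata.bidirectional(mark))


def between_lrms(text: str) -> str:
    """Hypothetical helper: enclose a run in left-to-right marks."""
    return f"{LRM}{text}{LRM}"
```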
The above examples are rewritten here using &lrm;.
The vowels α ε η ι ο derive from
א ה ח י ע, resp.
The vowels α ε η ι ο derive from
א ה ח י ע, resp.
1999-12-31
1999-12-31
١٩٩٩–١٢–٣١
١٩٩٩–١٢–٣١
١٩٩٩/١٢/٣١ م
١٩٩٩/١٢/٣١ م
© 2004 Google — 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google — 90 00 000 ويب صفحات كى تلاش هو رهى هے
© 2004 Google — 9000000 ويب صفحات كى تلاش هو رهى هے
The second line did not work in Internet Explorer 5, which needed a number without spaces.
This example shows that explicit markup with the dir attribute is more reliable
than the implicit &lrm; and &rlm; marks.
zwnj character
The zero-width non-joiner
(&zwnj; = &#x200C;)
is necessary for writing Persian
where certain affixes and compound words do not join.
It is shown by a hyphen in the transliterated words below.
| Persian | Transliteration | Meaning |
|---|---|---|
| هفته | hafteh | week |
| هفتهها | hafteh-hā | weeks |
| هفتهها | haftehhā | wrong |
| موزه | mūzeh | museum |
| موزهها | mūzeh-hā | museums |
| موزهها | mūzehhā | wrong |
| سه | seh | three |
| سهشنبه | seh-šanbeh | Tuesday |
| سهشنبه | sehšanbeh | wrong |
| راه | rāh | way, road |
| راهآهن | rāh-āhan | railway |
| راهآهن | rāh’āhan | wrong |
| نرم | narm | soft |
| نرمافزار | narm-afzār | software |
| نرمافزار | narmāfzār | wrong |
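The correct forms in the table can be produced by joining the parts with U+200C. This is a minimal sketch with hypothetical helper names. It assumes the word boundaries are already known and only illustrates where the ZWNJ goes; it does not decide when Persian orthography requires one:

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER (&zwnj;)


def join_zwnj(*parts: str) -> str:
    """Join word parts with ZWNJ so that they are written without joining."""
    return ZWNJ.join(parts)


def as_html(text: str) -> str:
    """Show each ZWNJ as the named reference &zwnj; for HTML source."""
    return text.replace(ZWNJ, "&zwnj;")


# Parts taken from the table above.
for parts in [("هفته", "ها"), ("موزه", "ها"), ("سه", "شنبه"), ("راه", "آهن"), ("نرم", "افزار")]:
    print(as_html(join_zwnj(*parts)))
```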
zwj character
The zero-width joiner
(&zwj; = &#x200D;)
is necessary to show the joining forms of Arabic letters in isolation, i.e. letter by letter.
At least Mozilla Firefox 2 needs it when Arabic letters are separated by HTML markup.
(The zero-width joiner did not work with earlier browser versions such as Netscape 7.0
or Internet Explorer 5.)
ن · س · ت · ع · ل · ي · ق ← ن س ت ع ل ي ق ← نستعليق
ن · س · ت · ع · ل · ي · ق ← ن س ت ع ل ي ق ← نستعليق
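To show a letter's joining form on its own, a ZWJ is placed on the side(s) where it would join. This sketch spaces out the letters of a word in that way, using a hypothetical function name. It assumes every letter is dual-joining, which holds for نستعليق. It would give wrong shapes next to letters such as ا, د, ر, و, which never join to the following letter:

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER (&zwj;)


def show_joining_forms(word: str) -> str:
    """Space-separate the letters of `word` but keep their joined shapes."""
    parts = []
    for i, ch in enumerate(word):
        before = ZWJ if i > 0 else ""            # joins to the previous letter
        after = ZWJ if i < len(word) - 1 else ""  # joins to the next letter
        parts.append(before + ch + after)
    return " ".join(parts)


print(show_joining_forms("نستعليق"))
```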
On the other hand, Internet Explorer 6 joins letters even when they are separated by markup.
Therefore you still need an additional &zwnj; if the letters should not join.
سههزار ،
دههزار
سههزار ،
دههزار
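When part of a word is wrapped in an element, writing &zwnj; at the element boundary keeps the parts unjoined even in browsers that join across markup. This is a sketch with a hypothetical function name. The element name `b` is an arbitrary assumption, as is placing the reference after the closing tag:

```python
def highlight_prefix(prefix: str, rest: str, tag: str = "b") -> str:
    """Wrap `prefix` in an element and keep it unjoined from `rest`."""
    return f"<{tag}>{prefix}</{tag}>&zwnj;{rest}"


print(highlight_prefix("سه", "هزار"))  # <b>سه</b>&zwnj;هزار
```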
The zero-width joiner can also be used to write Urdu text in, and for, the restricted
MacUrdu character set, in which the
two-eyed hē (ھ) is not available.
| Urdu | Transliteration | Meaning |
|---|---|---|
| هفته | haftah | week |
| هاته | hāth | wrong |
| هاته | hāth | hand |
| ديده | dīdah | eye |
| دوده | dūdh | wrong |
| دوده | dūdh | milk |
The sequence &zwj;&zwnj; is needed for Sindhi, where the initial form
of the letter hē (ﻫ)
is used as a consonant, while the connecting form (ﻬ) is reserved for aspiration.
| Sindhi | Transliteration | Meaning |
|---|---|---|
| جهنگل | jhangalu | jungle |
| گهر | gharu | house |
| منهن | munhun | wrong |
| منهن | munhun | mouth |
| ويه | vīha | wrong |
| ويه | vīha | twenty |
Persian word processing / ZWNJ – ZWJ
Andreas Prilop
10 October 2012