Indian Script Code for Information Interchange

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Assamese, Bengali (Bangla), Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India based on Arabic, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Arabic-based writing systems were subsequently encoded in the PASCII encoding.

The Brahmi-derived writing systems are mostly rather similar in structure, but have different letter shapes. So ISCII encodes letters with the same phonetic value at the same codepoint, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This will be rendered as कि in Devanagari, as ਕਿ in Gurmukhi, and as கி in Tamil. The writing system can be selected in rich text by markup or in plain text by means of the ATR code described below.
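The overlay described above can be illustrated with a minimal Python sketch. It is not a full decoder: it covers only the two bytes of the [ki] example, and it assumes that each byte maps to the same offset inside every script's Unicode block (0x15 for ka, 0x3F for the vowel sign i), with the blocks for Devanagari, Gurmukhi and Tamil starting at U+0900, U+0A00 and U+0B80. The names `SCRIPT_BASE`, `ISCII_TO_OFFSET` and `decode_iscii` are hypothetical.

```python
# Start of each script's Unicode block (three scripts only, for illustration).
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "gurmukhi": 0x0A00,
    "tamil": 0x0B80,
}

# ISCII byte -> offset inside the script's Unicode block.
# Deliberately partial: only the two bytes of the [ki] example.
ISCII_TO_OFFSET = {
    0xB3: 0x15,  # consonant ka
    0xDB: 0x3F,  # vowel sign i
}


def decode_iscii(data: bytes, script: str) -> str:
    """Decode ISCII bytes for one script; ASCII bytes pass through unchanged."""
    base = SCRIPT_BASE[script]
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            # Raises KeyError for bytes outside the partial table above.
            out.append(chr(base + ISCII_TO_OFFSET[byte]))
    return "".join(out)


ki = bytes([0xB3, 0xDB])
for script in SCRIPT_BASE:
    print(script, decode_iscii(ki, script))
```

The same two bytes yield a different string for each script name passed in, which is the sense in which the scripts are overlaid on one set of codepoints.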

One motivation for the use of a single encoding is the idea that it will allow easy transliteration from one writing system to another. However, there are enough incompatibilities that this is not really a practical idea. See About ISCII.

ISCII is an 8-bit encoding. The lower 128 codepoints are plain ASCII; the upper 128 codepoints are ISCII-specific. In addition to the codepoints representing characters, ISCII uses a codepoint with the mnemonic ATR, which indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or the end of the line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
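The ATR rule can be sketched in Python as a function that splits a byte string into runs, each tagged with the writing system in effect. This is an illustration under stated assumptions: the values in `SCRIPT_CODES` are placeholders that should be checked against the ISCII standard, the function name `split_runs` is hypothetical, and display-mode values are simply skipped. Because ISCII has no way to indicate the default writing system, the caller has to supply it.

```python
ATR = 0xEF  # attribute codepoint (decimal 239)

# Assumed values for the byte that follows ATR; placeholders for illustration,
# to be verified against the ISCII standard before any real use.
SCRIPT_CODES = {0x42: "devanagari", 0x43: "bengali"}


def split_runs(data: bytes, default_script: str):
    """Split ISCII bytes into (script, bytes) runs, honouring ATR switches."""
    runs = []
    script = default_script
    current = bytearray()

    def flush():
        # Close the run collected so far, if any.
        if current:
            runs.append((script, bytes(current)))
            current.clear()

    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ATR and i + 1 < len(data):
            code = data[i + 1]
            if code in SCRIPT_CODES:
                flush()
                script = SCRIPT_CODES[code]
            # Any other value (e.g. a display mode) is skipped in this sketch.
            i += 2
            continue
        # A trailing ATR with no following byte is kept as ordinary data.
        current.append(byte)
        if byte == 0x0A:
            # The switch lasts only until the end of the line.
            flush()
            script = default_script
        i += 1
    flush()
    return runs
```

For example, `split_runs(bytes([0xB3, 0xEF, 0x43, 0xB3, 0x0A, 0xB3]), "devanagari")` produces three runs: one in the default script, one in the switched script that ends with the line feed, and one back in the default script.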

ISCII has not been widely used outside of certain government institutions and has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each Indic writing system, and largely preserves the ISCII layout within each block.

Codepage layout

The following table shows the character set for Devanagari. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system. Each character is shown with its decimal code and its Unicode equivalent.

ISCII Devanagari
Row · _0 · _1 · _2 · _3 · _4 · _5 · _6 · _7 · _8 · _9 · _A · _B · _C · _D · _E · _F
0_ · NUL 0000 0 · SOH 0001 1 · STX 0002 2 · ETX 0003 3 · EOT 0004 4 · ENQ 0005 5 · ACK 0006 6 · BEL 0007 7 · BS 0008 8 · HT 0009 9 · LF 000A 10 · VT 000B 11 · FF 000C 12 · CR 000D 13 · SO 000E 14 · SI 000F 15
1_ · DLE 0010 16 · DC1 0011 17 · DC2 0012 18 · DC3 0013 19 · DC4 0014 20 · NAK 0015 21 · SYN 0016 22 · ETB 0017 23 · CAN 0018 24 · EM 0019 25 · SUB 001A 26 · ESC 001B 27 · FS 001C 28 · GS 001D 29 · RS 001E 30 · US 001F 31
2_ · SP 0020 32 · ! 0021 33 · " 0022 34 · # 0023 35 · $ 0024 36 · % 0025 37 · & 0026 38 · ' 0027 39 · ( 0028 40 · ) 0029 41 · * 002A 42 · + 002B 43 · , 002C 44 · - 002D 45 · . 002E 46 · / 002F 47
3_ · 0 0030 48 · 1 0031 49 · 2 0032 50 · 3 0033 51 · 4 0034 52 · 5 0035 53 · 6 0036 54 · 7 0037 55 · 8 0038 56 · 9 0039 57 · : 003A 58 · ; 003B 59 · < 003C 60 · = 003D 61 · > 003E 62 · ? 003F 63
4_ · @ 0040 64 · A 0041 65 · B 0042 66 · C 0043 67 · D 0044 68 · E 0045 69 · F 0046 70 · G 0047 71 · H 0048 72 · I 0049 73 · J 004A 74 · K 004B 75 · L 004C 76 · M 004D 77 · N 004E 78 · O 004F 79
5_ · P 0050 80 · Q 0051 81 · R 0052 82 · S 0053 83 · T 0054 84 · U 0055 85 · V 0056 86 · W 0057 87 · X 0058 88 · Y 0059 89 · Z 005A 90 · [ 005B 91 · \ 005C 92 · ] 005D 93 · ^ 005E 94 · _ 005F 95
6_ · ` 0060 96 · a 0061 97 · b 0062 98 · c 0063 99 · d 0064 100 · e 0065 101 · f 0066 102 · g 0067 103 · h 0068 104 · i 0069 105 · j 006A 106 · k 006B 107 · l 006C 108 · m 006D 109 · n 006E 110 · o 006F 111
7_ · p 0070 112 · q 0071 113 · r 0072 114 · s 0073 115 · t 0074 116 · u 0075 117 · v 0076 118 · w 0077 119 · x 0078 120 · y 0079 121 · z 007A 122 · { 007B 123 · | 007C 124 · } 007D 125 · ~ 007E 126 · DEL 007F 127
8_ · (all cells empty)
9_ · (all cells empty)
A_ · (empty) · ँ 0901 161 · ं 0902 162 · ः 0903 163 · अ 0905 164 · आ 0906 165 · इ 0907 166 · ई 0908 167 · उ 0909 168 · ऊ 090A 169 · ऋ 090B 170 · ऎ 090E 171 · ए 090F 172 · ऐ 0910 173 · ऍ 090D 174 · ऒ 0912 175
B_ · ओ 0913 176 · औ 0914 177 · ऑ 0911 178 · क 0915 179 · ख 0916 180 · ग 0917 181 · घ 0918 182 · ङ 0919 183 · च 091A 184 · छ 091B 185 · ज 091C 186 · झ 091D 187 · ञ 091E 188 · ट 091F 189 · ठ 0920 190 · ड 0921 191
C_ · ढ 0922 192 · ण 0923 193 · त 0924 194 · थ 0925 195 · द 0926 196 · ध 0927 197 · न 0928 198 · ऩ 0929 199 · प 092A 200 · फ 092B 201 · ब 092C 202 · भ 092D 203 · म 092E 204 · य 092F 205 · य़ 095F 206 · र 0930 207
D_ · ऱ 0931 208 · ल 0932 209 · ळ 0933 210 · ऴ 0934 211 · व 0935 212 · श 0936 213 · ष 0937 214 · स 0938 215 · ह 0939 216 · INV n/a 217 · ा 093E 218 · ि 093F 219 · ी 0940 220 · ु 0941 221 · ू 0942 222 · ृ 0943 223
E_ · ॆ 0946 224 · े 0947 225 · ै 0948 226 · ॅ 0945 227 · ॊ 094A 228 · ो 094B 229 · ौ 094C 230 · ॉ 0949 231 · ् 094D 232 · ़ 093C 233 · । 0964 234 · (empty) · (empty) · (empty) · (empty) · ATR n/a 239
F_ · EXT n/a 240 · ० 0966 241 · १ 0967 242 · २ 0968 243 · ३ 0969 244 · ४ 096A 245 · ५ 096B 246 · ६ 096C 247 · ७ 096D 248 · ८ 096E 249 · ९ 096F 250 · (empty) · (empty) · (empty) · (empty) · (empty)

Special code points

INV character—code point D9 (217)
The INV character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क् (half ka). The Unicode equivalent is the no-break space (00A0) or the dotted circle ◌ (25CC).
ATR character—code point EF (239)
The ATR character followed by a byte code is used to switch to a different font attribute (such as bold) or writing system (such as Bengali), until the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.
EXT character—code point F0 (240)
The EXT character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
Halant character ्—code point E8 (232)
The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्त.
ISCII · Unicode
single halant E8 · halant 094D
halant + halant E8 E8 · halant + ZWNJ 094D 200C
halant + nukta E8 E9 · halant + ZWJ 094D 200D
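The three halant cases in the table can be expressed as a small Python helper. This is a sketch rather than a complete converter: the function name `halant_to_unicode` and its return convention (the Unicode text plus the number of ISCII bytes consumed) are hypothetical choices made for illustration.

```python
HALANT = 0xE8
NUKTA = 0xE9


def halant_to_unicode(data: bytes, i: int):
    """Return (unicode_text, bytes_consumed) for the halant at data[i]."""
    if data[i] != HALANT:
        raise ValueError("data[i] is not a halant")
    following = data[i + 1] if i + 1 < len(data) else None
    if following == HALANT:
        # halant + halant: explicit halant, i.e. halant + ZWNJ
        return "\u094D\u200C", 2
    if following == NUKTA:
        # halant + nukta: half consonant, i.e. halant + ZWJ
        return "\u094D\u200D", 2
    # single halant
    return "\u094D", 1
```

A caller walking through an ISCII byte string would advance its index by the returned count, so that the second byte of a two-byte sequence is not also processed as a halant or nukta in its own right.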
Nukta character ़—code point E9 (233)
The nukta character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
ISCII code point · Original character · Character with nukta · Unicode code point
A1 (161) · ँ · ॐ · 0950
A6 (166) · इ · ऌ · 090C
A7 (167) · ई · ॡ · 0961
AA (170) · ऋ · ॠ · 0960
B3 (179) · क · क़ · 0958
B4 (180) · ख · ख़ · 0959
B5 (181) · ग · ग़ · 095A
BA (186) · ज · ज़ · 095B
BF (191) · ड · ड़ · 095C
C0 (192) · ढ · ढ़ · 095D
C9 (201) · फ · फ़ · 095E
DB (219) · ि · ॢ · 0962
DC (220) · ी · ॣ · 0963
DF (223) · ृ · ॄ · 0944
EA (234) · । · ऽ · 093D
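As a rough illustration, the nukta table can be turned into a Python lookup that maps an ISCII base byte followed by the nukta byte to the corresponding Unicode character. The names `NUKTA_FORMS` and `compose_nukta` are hypothetical, and the sketch handles only the pairs listed in the table.

```python
from typing import Optional

NUKTA = 0xE9

# ISCII base byte -> Unicode character for "base + nukta", per the table.
NUKTA_FORMS = {
    0xA1: "\u0950", 0xA6: "\u090C", 0xA7: "\u0961", 0xAA: "\u0960",
    0xB3: "\u0958", 0xB4: "\u0959", 0xB5: "\u095A", 0xBA: "\u095B",
    0xBF: "\u095C", 0xC0: "\u095D", 0xC9: "\u095E", 0xDB: "\u0962",
    0xDC: "\u0963", 0xDF: "\u0944", 0xEA: "\u093D",
}


def compose_nukta(base: int, following: Optional[int]) -> Optional[str]:
    """Return the Unicode character for base + nukta, or None if not listed."""
    if following == NUKTA:
        return NUKTA_FORMS.get(base)
    return None
```

For instance, `compose_nukta(0xB3, 0xE9)` returns the character at 0958 (qa), while a base byte that is not in the table, or one not followed by a nukta, returns `None` and would be handled by the ordinary per-byte mapping.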

Code pages for ISCII conversion

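To convert from Unicode (UTF-8) to an ISCII / ANSI coding, the following Windows code pages may be used:
57002: Devanagari
57003: Bengali
57004: Tamil
57005: Telugu
57006: Assamese
57007: Oriya
57008: Kannada
57009: Malayalam
57010: Gujarati
57011: Punjabi (Gurmukhi)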

Code points for all languages

Each alphabet is listed in the order of its ISCII code point. Code points with asterisks (*) indicate the code point followed by nukta, e.g. क (ka) + ़ = क़ (qa); इ (i) + ़ = ऌ (ḷ). Each character is listed along with its Unicode code point.

This article is issued from Wikipedia - version of the Wednesday, April 13, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.