Tamil Script Code for Information Interchange

Tamil Script Code for Information Interchange (TSCII) is an 8-bit coding scheme for representing the Tamil script. The lower 128 code points are plain ASCII; the upper 128 code points are TSCII-specific. After many years of use on the Internet by private agreement only, it was registered with the IANA in 2007.[1]

TSCII encodes characters in visual (written) order, paralleling the Tamil typewriter. A vowel sign that is written to the left of its consonant is therefore stored before that consonant.

Unicode uses logical-order encoding for Tamil, following ISCII. Thai is the contrasting case: there Unicode adopted the visual-order encoding inherited from TIS-620.
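The difference between the two orders can be illustrated with a minimal Python sketch. It assumes the input is a list of "cells", one Unicode string per TSCII byte, still in visual order, and it moves the left-side vowel signs (U+0BC6, U+0BC7, U+0BC8) behind their consonant, merging two-part vowel signs on the way. The names visual_to_logical, PREFIX_SIGNS, TWO_PART_SIGNS and is_bare_consonant are hypothetical and chosen for illustration; this is not the algorithm of iconv or of any particular converter, and it ignores malformed input.

```python
# Vowel signs that are written to the LEFT of their consonant.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # signs E, EE, AI

# Two-part vowel signs: (left part, right part) -> single Unicode vowel sign.
TWO_PART_SIGNS = {
    ("\u0bc6", "\u0bbe"): "\u0bca",  # sign O
    ("\u0bc7", "\u0bbe"): "\u0bcb",  # sign OO
    ("\u0bc6", "\u0bd7"): "\u0bcc",  # sign AU
}


def is_bare_consonant(cell):
    """True if the cell starts and ends with a Tamil consonant (U+0B95..U+0BB9)."""
    return bool(cell) and all("\u0b95" <= ch <= "\u0bb9" for ch in (cell[0], cell[-1]))


def visual_to_logical(cells):
    """Reorder visual-order cells into a logical-order Unicode string."""
    out = []
    i = 0
    while i < len(cells):
        cell = cells[i]
        if cell in PREFIX_SIGNS and i + 1 < len(cells) and is_bare_consonant(cells[i + 1]):
            base = cells[i + 1]
            i += 2
            # A right-hand part may follow the consonant (two-part vowel sign).
            if i < len(cells) and (cell, cells[i]) in TWO_PART_SIGNS:
                out.append(base + TWO_PART_SIGNS[(cell, cells[i])])
                i += 1
            else:
                out.append(base + cell)
        else:
            out.append(cell)
            i += 1
    return "".join(out)
```

For instance, the visual-order cells U+0BC7, U+0BA4, U+0BA9 U+0BCD come out as U+0BA4 U+0BC7 U+0BA9 U+0BCD, with the vowel sign following its consonant as Unicode expects.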

The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the Web.

The free e-text collection at Project Madurai uses the TSCII encoding but has started to provide Unicode versions as well.

History

The need for a common encoding for Tamil was felt by members of various mailing-list-based forums in the mid-1990s, as multiple custom-coded fonts were prevalent in those forums. While some of the commercial encodings were more popular than others, they were not accepted by the wider community because of conflicting commercial interests. While Unicode was accepted by most as the future standard, most desktop systems at that time were still not capable of handling Unicode for the Tamil language, and an interim 8-bit encoding was required.

A separate mailing list for discussion of such encodings (webmasters@tamil.net) was created in 1997 to initiate this discussion, starting with an email written by Dr. K. Kalyanasundaram to the popular Tamil author Sujatha, who headed the committee for standardization of the Tamil keyboard.[2] This forum quickly attracted enthusiastic participants from across the globe, including several prominent Tamil scholars. Archives of these discussions are maintained by INFITT.[3]

After TSCII was published, most of the members of the webmasters@tamil.net mailing list became part of INFITT, a wider initiative to bring standardization and continued development to various areas of Tamil computing.

Codepage layout

TSCII

Hex  Dec  Unicode              Character
80   128  0BE6                 ௦
81   129  0BE7                 ௧
82   130  0BB8 0BCD 0BB0 0BC0  ஸ்ரீ
83   131  0B9C                 ஜ
84   132  0BB7                 ஷ
85   133  0BB8                 ஸ
86   134  0BB9                 ஹ
87   135  0B95 0BCD 0BB7       க்ஷ
88   136  0B9C 0BCD            ஜ்
89   137  0BB7 0BCD            ஷ்
8A   138  0BB8 0BCD            ஸ்
8B   139  0BB9 0BCD            ஹ்
8C   140  0B95 0BCD 0BB7 0BCD  க்ஷ்
8D   141  0BE8                 ௨
8E   142  0BE9                 ௩
8F   143  0BEA                 ௪
90   144  0BEB                 ௫
91   145  2018                 ‘
92   146  2019                 ’
93   147  201C                 “
94   148  201D                 ”
95   149  0BEC                 ௬
96   150  0BED                 ௭
97   151  0BEE                 ௮
98   152  0BEF                 ௯
99   153  0B99 0BC1            ஙு
9A   154  0B9E 0BC1            ஞு
9B   155  0B99 0BC2            ஙூ
9C   156  0B9E 0BC2            ஞூ
9D   157  0BF0                 ௰
9E   158  0BF1                 ௱
9F   159  0BF2                 ௲
A0   160  00A0                 NBSP
A1   161  0BBE                 ா
A2   162  0BBF                 ி
A3   163  0BC0                 ீ
A4   164  0BC1                 ு
A5   165  0BC2                 ூ
A6   166  0BC6                 ெ
A7   167  0BC7                 ே
A8   168  0BC8                 ை
A9   169  00A9                 ©
AA   170  0BD7                 ௗ
AB   171  0B85                 அ
AC   172  0B86                 ஆ
AD   173  (unassigned)
AE   174  0B88                 ஈ
AF   175  0B89                 உ
B0   176  0B8A                 ஊ
B1   177  0B8E                 எ
B2   178  0B8F                 ஏ
B3   179  0B90                 ஐ
B4   180  0B92                 ஒ
B5   181  0B93                 ஓ
B6   182  0B94                 ஔ
B7   183  0B83                 ஃ
B8   184  0B95                 க
B9   185  0B99                 ங
BA   186  0B9A                 ச
BB   187  0B9E                 ஞ
BC   188  0B9F                 ட
BD   189  0BA3                 ண
BE   190  0BA4                 த
BF   191  0BA8                 ந
C0   192  0BAA                 ப
C1   193  0BAE                 ம
C2   194  0BAF                 ய
C3   195  0BB0                 ர
C4   196  0BB2                 ல
C5   197  0BB5                 வ
C6   198  0BB4                 ழ
C7   199  0BB3                 ள
C8   200  0BB1                 ற
C9   201  0BA9                 ன
CA   202  0B9F 0BBF            டி
CB   203  0B9F 0BC0            டீ
CC   204  0B95 0BC1            கு
CD   205  0B9A 0BC1            சு
CE   206  0B9F 0BC1            டு
CF   207  0BA3 0BC1            ணு
D0   208  0BA4 0BC1            து
D1   209  0BA8 0BC1            நு
D2   210  0BAA 0BC1            பு
D3   211  0BAE 0BC1            மு
D4   212  0BAF 0BC1            யு
D5   213  0BB0 0BC1            ரு
D6   214  0BB2 0BC1            லு
D7   215  0BB5 0BC1            வு
D8   216  0BB4 0BC1            ழு
D9   217  0BB3 0BC1            ளு
DA   218  0BB1 0BC1            று
DB   219  0BA9 0BC1            னு
DC   220  0B95 0BC2            கூ
DD   221  0B9A 0BC2            சூ
DE   222  0B9F 0BC2            டூ
DF   223  0BA3 0BC2            ணூ
E0   224  0BA4 0BC2            தூ
E1   225  0BA8 0BC2            நூ
E2   226  0BAA 0BC2            பூ
E3   227  0BAE 0BC2            மூ
E4   228  0BAF 0BC2            யூ
E5   229  0BB0 0BC2            ரூ
E6   230  0BB2 0BC2            லூ
E7   231  0BB5 0BC2            வூ
E8   232  0BB4 0BC2            ழூ
E9   233  0BB3 0BC2            ளூ
EA   234  0BB1 0BC2            றூ
EB   235  0BA9 0BC2            னூ
EC   236  0B95 0BCD            க்
ED   237  0B99 0BCD            ங்
EE   238  0B9A 0BCD            ச்
EF   239  0B9E 0BCD            ஞ்
F0   240  0B9F 0BCD            ட்
F1   241  0BA3 0BCD            ண்
F2   242  0BA4 0BCD            த்
F3   243  0BA8 0BCD            ந்
F4   244  0BAA 0BCD            ப்
F5   245  0BAE 0BCD            ம்
F6   246  0BAF 0BCD            ய்
F7   247  0BB0 0BCD            ர்
F8   248  0BB2 0BCD            ல்
F9   249  0BB5 0BCD            வ்
FA   250  0BB4 0BCD            ழ்
FB   251  0BB3 0BCD            ள்
FC   252  0BB1 0BCD            ற்
FD   253  0BA9 0BCD            ன்
FE   254  0B87                 இ
FF   255  (unassigned)

In the table above, 80 is U+0BE6 TAMIL DIGIT ZERO, which was accepted in Unicode version 4.1. A0 is the NO-BREAK SPACE. The codes AD and FF are unassigned.
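As a rough illustration of how the table is used, the following Python sketch decodes TSCII bytes with a lookup table. Only a handful of entries are copied from the table above; a real decoder would need all of them. The names TSCII_TO_UNICODE and decode_tscii are hypothetical, and mapping unknown bytes to U+FFFD is an assumption of this sketch rather than part of TSCII. The output stays in visual order, so left-side vowel signs would still have to be moved behind their consonant to obtain valid Unicode text.

```python
# Partial TSCII -> Unicode table; entries copied from the code page layout.
TSCII_TO_UNICODE = {
    0xA0: "\u00a0",        # NO-BREAK SPACE
    0xA1: "\u0bbe",        # vowel sign AA
    0xA2: "\u0bbf",        # vowel sign I
    0xAB: "\u0b85",        # letter A
    0xB8: "\u0b95",        # letter KA
    0xBE: "\u0ba4",        # letter TA
    0xC1: "\u0bae",        # letter MA
    0xCC: "\u0b95\u0bc1",  # KA + vowel sign U (one byte, two code points)
    0xEC: "\u0b95\u0bcd",  # KA + pulli
    0xFA: "\u0bb4\u0bcd",  # LLLA + pulli
}


def decode_tscii(data):
    """Decode TSCII bytes to a Unicode string (visual order is preserved)."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))  # lower half is plain ASCII
        else:
            # Assumption of this sketch: unmapped bytes become U+FFFD.
            out.append(TSCII_TO_UNICODE.get(byte, "\ufffd"))
    return "".join(out)
```

With these entries, decode_tscii(b"\xbe\xc1\xa2\xfa") yields the four-letter word தமிழ் (U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD): four bytes become five code points because FA stands for a consonant together with its pulli.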

Conversion tools

You can convert UTF-8 encoded documents to TSCII using the GNU iconv tool as follows:

$ iconv -f utf-8 -t tscii hello.utf8 > hello.tscii

Conversion from TSCII to UTF-8 is done by interchanging the encoding names given to the -f and -t flags.
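The same conversion can be scripted. The Python sketch below only wraps the iconv command line; it assumes an iconv binary with TSCII support is available on the PATH, and the helper names iconv_command and convert_file, as well as the file names in the usage note, are hypothetical.

```python
import subprocess


def iconv_command(src_encoding, dst_encoding, path):
    """Build the iconv argument list for converting one file."""
    return ["iconv", "-f", src_encoding, "-t", dst_encoding, path]


def convert_file(src_encoding, dst_encoding, in_path, out_path):
    """Run iconv and write its standard output to out_path.

    Raises subprocess.CalledProcessError if iconv exits with an error,
    for example when the installed iconv does not know the encoding.
    """
    cmd = iconv_command(src_encoding, dst_encoding, in_path)
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Calling convert_file("tscii", "utf-8", "hello.tscii", "hello.utf8") would then perform the TSCII to UTF-8 direction.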

References

  1. ↑ http://www.iana.org/assignments/charset-reg/TSCII
  2. ↑ http://www.infitt.org/tscii/archives/msg00001.html
  3. ↑ http://www.infitt.org/tscii/archives/maillist.html

This article is issued from Wikipedia - version of the Tuesday, May 05, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.