Sunday, 23 February 2020

Uses of Unicode

Today the Internet dominates the whole world, and a country like India has not remained untouched by it either. Because of this globalisation, the Internet and computer technology have had to adapt to the needs of a growing market. That is why computers, which until about fifteen years ago were limited to ABC (the Latin alphabet), can today display the languages of the whole world. For Indian languages in particular this is like a boon. We could never have imagined that Indian languages would gain as much ground in computer technology as they have today. Today you can see Hindi everywhere, on Facebook and WhatsApp. Previously, to write Hindi and other Indian languages on a computer, fonts had to be installed on every computer, and only a very limited group of people used them, mostly those involved in publishing or printing. Today, however, it has become very easy for any ordinary person to read and write their own language on a computer or mobile phone.
The purpose of this presentation is to raise awareness of recent developments in computing with respect to Indian languages. Although all the facts given here refer to the Hindi language and the Devanagari script, they apply equally to all other Indian languages and scripts, and even to many other languages of the world that do not use the Latin script. Some of the important points to be discussed are:
  • How to write Hindi on a computer or mobile phone without old-fashioned fonts, how to search for Hindi text in a search engine, and how to rename files and folders in Hindi
  • How to spell-check Hindi text for mistakes
  • How to let a computer or mobile phone read Hindi text aloud (text-to-speech synthesis)
  • How to write Hindi on a computer by speaking instead of typing (speech-to-text, i.e. speech recognition)
  • How to scan printed Hindi text into an editable format (OCR, optical character recognition)
  • Examples of the use of Unicode Hindi in databases, electronic hardware, software, file systems and so on

Google has contributed a lot to bringing these sophisticated facilities to Indian languages. All these technologies have developed during the last 15 years with the spread of Unicode. Unicode is a character encoding standard that gives every character of every script its own number (code point). Traditional encodings used 8-bit characters, which allow at most 256 characters; these are easily used up by Latin letters, Greek letters, numerals and some special characters. Unicode was originally designed around 16-bit characters, which allow 65,536 characters, and its code space has since been extended to more than a million code points, enough for the scripts of virtually all the world's languages. In short, every world language now has its own identity in the computer world. Computers and the Internet are no longer a monopoly of Latin characters; Hindi, Tamil, Bengali, Punjabi and so on belong there too. That is why we now see many publications and media houses maintaining their websites in Hindi, and a lot of Hindi text on Facebook, WhatsApp and other social media platforms. This is all because of Unicode.
However, this new technology is still largely restricted to the online world. It has not really been able to penetrate the Hindi print market, because typical page layout software such as QuarkXPress and Adobe InDesign still cannot render Unicode Hindi reliably, owing to some inherent complexities of the Devanagari script. One such complexity is the halant (हलन्त, ्), which removes the inherent vowel of a consonant and so produces a "half" consonant. This half consonant can be combined with other consonants to write complex sounds (for example, क + ् + ष gives क्ष). Another complexity is the short i vowel sign (छोटी इ की मात्रा, ि), which is displayed to the left of its consonant although in Unicode it is stored after the consonant. This also plays a big role in sorting text (you will see examples in point 2). For these reasons, the typesetters employed by Hindi print media houses are used to typing fast in their favourite ASCII-based fonts; typing fast in Unicode needs some extra practice.
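The 8-bit versus 16-bit arithmetic and the two Devanagari complexities can be made concrete with Python's standard `unicodedata` module. This is a small sketch; the sample strings and the names `KI` and `KSSA` are illustrative choices. It lists the stored code points of कि and क्ष: the short i sign is stored after क even though it is drawn to its left, and क्ष is क + halant (virama) + ष.

```python
import unicodedata

# 8-bit and 16-bit code capacities, and the size of today's Unicode code space
print(2 ** 8, 2 ** 16, 0x10FFFF + 1)  # 256 65536 1114112

def code_points(text: str) -> list:
    """List each character of `text` in stored (logical) order with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

KI = "\u0915\u093F"          # कि : KA first, then VOWEL SIGN I (drawn to the LEFT of KA)
KSSA = "\u0915\u094D\u0937"  # क्ष: KA + VIRAMA (halant) + SSA form one conjunct

for word in (KI, KSSA):
    print(word, code_points(word))

# Devanagari code points take 3 bytes each in UTF-8 and 2 bytes each in UTF-16
print(len(KSSA.encode("utf-8")), len(KSSA.encode("utf-16-le")))  # 9 6
```

The byte counts show why "16-bit" is no longer the whole story: the same Unicode text can be stored in different encoding forms such as UTF-8 or UTF-16.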
Below are some online utilities for Hindi. Of course they are not the only solutions; the purpose here is to give at least one example in each category:
  1. You can write Hindi on a computer simply by opening this online page and typing in the box. No installation is needed, which means you can use it on any computer. When you are finished, just copy and paste the text into your main document. It is a small (20 KB) JavaScript-based HTML page that you can also store on a pen drive for offline use. It works best with Internet Explorer. For other languages and special symbols, this online editor can be used, which anyone can even customise. The main characteristic of these editors is that keys are mapped one to one to characters, so with time you can write really fast. They don't give any suggestions and they don't predict your typing behaviour; they just do what you type.
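The "mapped one to one" behaviour can be sketched in a few lines of Python. The key map below is a small hypothetical example, not the layout of the online editor mentioned above: each keystroke is replaced by exactly one character, and keys are typed in Unicode's storage order, so the short i sign (ि) is typed after its consonant.

```python
# Hypothetical key map for illustration; the online editor's real layout may differ.
KEYMAP = {
    "k": "क", "t": "त", "p": "प", "r": "र", "s": "स", "l": "ल",
    "a": "\u093E",  # vowel sign aa (ा)
    "i": "\u093F",  # vowel sign i (ि), typed AFTER its consonant
    "I": "\u0940",  # vowel sign ii (ी)
    "/": "\u094D",  # halant / virama (्)
}

def type_keys(keys: str) -> str:
    """Translate each keystroke into exactly one character: no suggestions, no prediction."""
    return "".join(KEYMAP.get(key, key) for key in keys)  # unmapped keys such as space pass through

print(type_keys("sIta kIr/ti"))  # सीता कीर्ति
```

Because nothing is predicted, the same keys always give the same output, which is what makes fast touch-typing possible with practice.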
  2. Text in a legacy (font-based) encoding can be converted to Unicode using this free online application. Unicode Hindi text can be sorted much better than its ASCII-font counterpart. See the examples below.
    Sorting in Unicode
    किशोर
    कीर्ति
    पिपली
    पिस्ता
    पीसना
    बिहारी
    बीहड़
    सिमरदीप
    सीता
    
    Sorting in Kruti Dev font (sorted font codes → the word they display)
    chgM+ → बीहड़
    dhfRkZ → कीर्ति
    fCkgkjh → बिहारी
    fd'kksj → किशोर
    fIkiyh → पिपली
    fIkLrk → पिस्ता
    fLkejnhi → सिमरदीप
    ihlUkk → पीसना
    lhrk → सीता
    
    Sorting in Chanakya font (sorted font codes → the word they display)
    •èçÌü → कीर्ति
    âèPææ → सीता
    ç•àææðÚ → किशोर
    çâ×Ú¼èŒæ → सिमरदीप
    çÂŒæÜè → पिपली
    çÂSPææ → पिस्ता
    çÕãæÚè → बिहारी
    ÕèãǸ → बीहड़
    ŒæèâÙæ → पीसना
    
    Sorting in Shusha font (sorted font codes → the word they display)
    baIhD, → बीहड़
    ibaharI → बिहारी
    ikSaoar → किशोर
    ipplaI → पिपली
    ips%aa → पिस्ता
    isamardIp → सिमरदीप
    kIit¹ → कीर्ति
    pIsanaa → पीसना
    saI%aa → सीता
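Both the conversion and the sorting behaviour in point 2 can be reproduced with a toy converter. This is a minimal sketch, assuming a glyph table reconstructed only from the nine Kruti Dev examples above (the real font has many more codes, and the online application mentioned above is not necessarily implemented this way). It maps font codes to Unicode, moves the short i sign (typed as `f` before the consonant in Kruti Dev) after the consonant, and moves the reph (`Z`, typed after the syllable) in front of its consonant. Python compares strings by code point, so sorting the converted words gives the Unicode order, while a case-insensitive sort of the raw font codes reproduces the scrambled Kruti Dev order.

```python
# Toy Kruti Dev -> Unicode converter. The glyph table is reconstructed ONLY from the
# nine example words above; the real font uses many more codes.
VIRAMA = "\u094D"  # halant
I_SIGN = "\u093F"  # short i sign: stored AFTER its consonant in Unicode
KRUTI_TO_UNICODE = {
    "'k": "श", "Ck": "ब", "Ik": "प", "Lk": "स", "Uk": "न", "Rk": "त",  # half form + stroke
    "ks": "\u094B",        # vowel sign o (ो)
    "M+": "\u0921\u093C",  # ड + nukta = ड़
    "L": "स" + VIRAMA,     # half sa (स्)
    "c": "ब", "d": "क", "e": "म", "g": "ह", "i": "प", "j": "र",
    "l": "स", "n": "द", "r": "त", "y": "ल",
    "h": "\u0940",         # vowel sign ii (ी)
    "k": "\u093E",         # vowel sign aa (ा)
}
VOWEL_SIGNS = {"\u093E", I_SIGN, "\u0940", "\u094B"}
KEYS = sorted(KRUTI_TO_UNICODE, key=len, reverse=True)  # longest match first

def kruti_to_unicode(text: str) -> str:
    out, pos, pending_i = [], 0, False
    while pos < len(text):
        ch = text[pos]
        if ch == "f":  # Kruti Dev types the short i sign BEFORE its consonant
            pending_i, pos = True, pos + 1
            continue
        if ch == "Z":  # reph, typed after its syllable; Unicode puts र् before the consonant
            j = len(out)
            while j > 0 and out[j - 1] in VOWEL_SIGNS:
                j -= 1
            out.insert(max(j - 1, 0), "र" + VIRAMA)
            pos += 1
            continue
        for key in KEYS:
            if text.startswith(key, pos):
                out.append(KRUTI_TO_UNICODE[key])
                pos += len(key)
                break
        else:
            out.append(ch)  # unknown code: copied unchanged
            pos += 1
        if pending_i:
            out.append(I_SIGN)  # reorder: the sign follows the consonant just emitted
            pending_i = False
    return "".join(out)

KRUTI_WORDS = ["chgM+", "dhfRkZ", "fCkgkjh", "fd'kksj", "fIkiyh",
               "fIkLrk", "fLkejnhi", "ihlUkk", "lhrk"]
unicode_words = [kruti_to_unicode(w) for w in KRUTI_WORDS]
print(sorted(unicode_words))  # code point order: matches the Unicode list above
print([kruti_to_unicode(w) for w in sorted(KRUTI_WORDS, key=str.lower)])  # Kruti Dev order
```

The reordering step is the essential difference between the two encodings: in the font the codes follow visual order, in Unicode they follow spoken order. This toy only moves ि past a single glyph sequence; a complete converter would also have to handle ि before multi-consonant conjuncts.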
    
    
  3. Unicode Hindi text can be spell-checked just as you are used to spell-checking English or German text in MS Word. Google Docs and OpenOffice offer this facility without problems; with Microsoft products it is still a bit tricky. See the picture below as an example.
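A dictionary-based spell check works on Unicode Hindi exactly as on English text, because each word is an ordinary string of code points. Below is a minimal sketch using Python's standard `difflib` module for suggestions. The tiny word list is hypothetical, and this is not how Google Docs or OpenOffice implement their checkers.

```python
import difflib

# Hypothetical mini word list; a real checker would load a large Hindi dictionary.
DICTIONARY = ["किशोर", "कीर्ति", "पिपली", "पिस्ता", "पीसना", "बिहारी", "सीता"]

def spell_check(text: str) -> dict:
    """Map each word not in DICTIONARY to up to three close suggestions."""
    known = set(DICTIONARY)
    return {
        word: difflib.get_close_matches(word, DICTIONARY, n=3, cutoff=0.6)
        for word in text.split()
        if word not in known
    }

# "कीर्ती" is misspelled: it ends with the long ी instead of the short ि
print(spell_check("सीता कीर्ती"))  # {'कीर्ती': ['कीर्ति']}
```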
  4. Google offers APIs that let other websites use its text-to-speech synthesizer; one such website is text2speech.org. Google has also built its text-to-speech synthesizer into Google Books, which makes it easy simply to listen to Hindi books on Google. Here is an example of one such book.
  5. Google Docs also converts speech to text.
  6. The Devanagari script is ideal for learning other languages because it is read largely as it is written: the pronunciation rules belong to the script, not to the language. Devanagari is used for many languages, such as Hindi, Sanskrit, Marathi, Nepali and Rajasthani, but (unlike with European languages) the pronunciation rules don't change with the language. That means a Marathi speaker will read a Nepali word in exactly the same way, even without knowing its meaning. One such effort to convert German text to Devanagari is given here; it is still under development. There are many grammatical and vocabulary similarities between Hindi and German. For example, German infinitives end in -en or -n (laufen, gehen, etc.), and all Hindi infinitives end in ना (दौड़ना, चलना, etc.). Many German words even bear an unbelievable resemblance to Hindi words, for example:
    Ananas, अनानास
    bedrohen, द्रोहकाल
    besser, better, बेहतर
    eng, तंग
    fangen, फ़ंसना, फ़ांसना
    Hansa (Lufthansa), goose, हंस
    Leiche, लाश
    
  7. Files and folders can now be named in Hindi. It is especially satisfying to see the titles of Hindi songs in Devanagari. Some examples are below.
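A quick way to check that a system accepts Devanagari file and folder names is to create some from a script. Below is a minimal sketch with Python's `pathlib` in a temporary directory. The names हिन्दी गाने ("Hindi songs") and गाना.mp3 ("song") are illustrative only, and the file is an empty placeholder rather than real audio.

```python
import tempfile
from pathlib import Path

def create_and_list() -> list:
    """Create a Devanagari-named folder and file in a temp dir, then list the folder."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "हिन्दी गाने"  # "Hindi songs" (illustrative name)
        folder.mkdir()
        (folder / "गाना.mp3").touch()       # empty placeholder, not real audio
        return [p.name for p in folder.iterdir()]

print(create_and_list())  # ['गाना.mp3']
```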






  8. Hindi text can now also be stored directly in various databases such as MySQL (example below).
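The same idea can be sketched with Python's built-in `sqlite3` module standing in for MySQL (in MySQL the table would normally use the `utf8mb4` character set). Text columns store Unicode Hindi as it is. SQLite's default BINARY collation compares the UTF-8 bytes, and UTF-8 byte order matches code point order, so `ORDER BY` returns the same order as the Unicode sort in point 2. The table name `words` and the function name are illustrative.

```python
import sqlite3

def store_and_query(words):
    """Store Hindi words, then return (all words sorted, words starting with पि)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT NOT NULL)")
        conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
        ordered = [r[0] for r in conn.execute("SELECT word FROM words ORDER BY word")]
        prefix = [r[0] for r in conn.execute(
            "SELECT word FROM words WHERE word LIKE ? ORDER BY word", ("\u092A\u093F%",))]  # पि%
        return ordered, prefix
    finally:
        conn.close()

WORDS = ["सीता", "पीसना", "किशोर", "पिस्ता", "पिपली"]
print(store_and_query(WORDS))
```

Note that the `LIKE` prefix search is exact for Devanagari: पीसना (with the long ी) is not returned for the prefix पि.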


  9. Many hardware devices can by now also display Unicode Hindi text, for example the following DJ console from Native Instruments.


  10. Google Translate has already become a standard tool for translating English or German text into Hindi. Although its translations are not exact, it at least suggests Hindi words that are no longer in common use.