Chinese input methods for Emacs

Sunday, 8th February, 2009

From the llaisdy blog archives.

This all feels a bit antiquated now that I’m working on a Mac (I use the Mac input methods instead of the Emacs input methods), but it’s still useful whenever I have to work on a Windows machine. The notes below are not complete and I’d appreciate any comments to help fill in the gaps.

Introduction

Emacs provides 25 input methods for Chinese. Although each input method has its own describe-input-method page, these pages can be rather terse. There is also no overview or comparison of the different input methods, and I haven’t been able to find one on the web.

Here I have gathered together the information I’ve been able to find. I’d be pleased to hear about any errors I’ve made, and where I can find further information to correct my omissions. I’ll keep this page up-to-date.

I’m learning Mandarin Chinese, I’m interested in simplified script, and for the moment I find a pinyin-based approach to the written language easiest. For my own current requirements, chinese-tonepy is fitting the bill, but I’m interested in learning a structural input method (i.e., not based on pronunciation). See the Conclusion for further discussion.

Methods overview

Table: Chinese input methods provided by Emacs

Method name            Method name
chinese-4corner        chinese-array30
chinese-b5-quick       chinese-b5-tsangchi
chinese-ccdospy
chinese-cns-quick      chinese-cns-tsangchi
chinese-ctlau          chinese-ctlaub
chinese-ecdict         chinese-etzy
chinese-punct          chinese-punct-b5
chinese-py             chinese-py-b5
chinese-py-punct       chinese-py-punct-b5
chinese-qj             chinese-qj-b5
chinese-sisheng        chinese-sw
chinese-tonepy         chinese-tonepy-punct
chinese-ziranma        chinese-zozy

The basic idea common to all these input methods is that you type a key sequence, which brings up a menu of options, from which you choose the character you want.

For example, in chinese-tonepy, typing in ‘hao3’ brings up this menu in the minibuffer window:

[Screenshot: the chinese-tonepy candidate menu for hao3 in the minibuffer]

If you want hao3 = ‘good’, choose option 2 (by typing 2 or clicking on the menu). Character 好 appears at point.
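The select-by-number step can be modelled in a few lines of Python. This is a toy sketch, not how Emacs implements it: the table is invented for illustration, and the only detail taken from the example above is that 好 is option 2 for hao3 (the first entry is just filler).

```python
# Toy model of "type a key sequence, then pick a numbered candidate".
# TOY_TABLE is invented; apart from 好 being option 2 for "hao3",
# it does not reproduce Emacs's real candidate lists.
TOY_TABLE = {
    "ni3": ["你"],
    "hao3": ["郝", "好"],
    "ma5": ["吗"],
}


def menu(key_seq: str) -> str:
    """Render the candidates for a key sequence as a numbered menu."""
    options = TOY_TABLE.get(key_seq, [])
    return " ".join(f"{i}.{char}" for i, char in enumerate(options, start=1))


def choose(key_seq: str, option: int) -> str:
    """Return the character for a 1-based menu option."""
    options = TOY_TABLE.get(key_seq, [])
    if not 1 <= option <= len(options):
        raise ValueError(f"no option {option} for {key_seq!r}")
    return options[option - 1]


print(menu("hao3"))       # 1.郝 2.好
print(choose("hao3", 2))  # 好
```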

At any point in any of the input methods you can press <tab> for a full tree-list of the options available to you from that point, e.g.:

[Screenshot: the <tab> tree-list of options]
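The <tab> listing behaves like walking a prefix tree (trie) keyed by successive keystrokes. Here is a minimal Python sketch of that idea, with an invented table; I’m not claiming Emacs stores its tables in exactly this form.

```python
from typing import Dict, List

# Invented example table: key sequence -> candidate characters.
TOY_TABLE = {
    "ha1": ["哈"],
    "hai2": ["孩"],
    "hao3": ["郝", "好"],
    "hao4": ["号"],
}


class Node:
    """One node of a trie keyed by successive keystrokes."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.candidates: List[str] = []


def build_trie(table: Dict[str, List[str]]) -> Node:
    root = Node()
    for keys, chars in table.items():
        node = root
        for key in keys:
            node = node.children.setdefault(key, Node())
        node.candidates.extend(chars)
    return root


def tree_listing(root: Node, prefix: str) -> List[str]:
    """List every completion reachable from `prefix`, depth first."""
    node = root
    for key in prefix:
        if key not in node.children:
            return []  # nothing in the table starts with this prefix
        node = node.children[key]
    lines: List[str] = []

    def walk(current: Node, path: str) -> None:
        if current.candidates:
            lines.append(f"{path}: {' '.join(current.candidates)}")
        for key in sorted(current.children):
            walk(current.children[key], path + key)

    walk(node, prefix)
    return lines


trie = build_trie(TOY_TABLE)
for line in tree_listing(trie, "ha"):
    print(line)
```

Typing more keys simply moves the starting node deeper into the tree, which is why the listing shrinks as you type.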

punct

Some input methods come in versions with and without -punct (see the table above). The -punct versions support proper Chinese punctuation characters. However, (a) although chinese-punct works, chinese-py-punct (and possibly others) doesn’t seem to; (b) the versions without -punct use ASCII punctuation, which meets my needs for the moment.

In the discussion below I ignore the -punct versions.

b5

Some input methods have versions with or without -b5. The -b5 input methods generate characters in the Big Five character set. Big Five is a Taiwanese character set for traditional characters. Other input methods support GB2312, which is a character set from the People’s Republic of China, for simplified characters. For example:

Input Method Input Output
chinese-py-b5 guo2/1 yu3/2 國 語
chinese-tonepy guo2/1 yu3/7 国 语
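The difference between the two character sets is easy to check outside Emacs, since Python ships gb2312 and big5 codecs. This small sketch simply tries to encode each form of 国语/國語 in each character set; a character missing from a set makes the whole encode fail.

```python
from typing import Optional


def encode_or_none(text: str, charset: str) -> Optional[bytes]:
    """Encode text in the given character set, or return None if it cannot be."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


for text in ("国语", "國語"):
    for charset in ("gb2312", "big5"):
        encoded = encode_or_none(text, charset)
        shown = encoded.hex(" ") if encoded is not None else "not representable"
        print(f"{text} in {charset}: {shown}")
```

Both character sets use two bytes per Hanzi, so each successful encoding here is four bytes long.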

For the full skinny on character sets see Lunde (1999).

I presume that, apart from chinese-b5-quick and chinese-b5-tsangchi, the -b5 input methods work the same way as the non-b5 methods, but just output Big Five (i.e., traditional script) instead of GB2312 (i.e., simplified script). Consequently, in the following discussion I ignore the -b5 versions (apart from chinese-b5-quick and chinese-b5-tsangchi).

Methods in detail

For each method I’ll give the character set (if not GB2312) and, as a simple illustration of use, how to generate 你好吗 (nǐ hǎo ma, ‘are you well?’).

chinese-4corner

No description on the describe-input-method page!

Wikipedia provides a description (http://en.wikipedia.org/wiki/Four_corner_method). You use four digits to describe the four corners of a character (top left, top right, bottom left, bottom right).

Quoting the Wikipedia article:

Digit Meaning
1 a horizontal stroke,
2 a vertical or diagonal stroke,
3 a dot stroke,
4 two strokes in a cross shape,
5 three or more strokes in which one stroke intersects all others,
6 a box-shape,
7 where a stroke turns a corner,
8 the shape of the Chinese character 八 and its inverted form, and
9 the shape of the Chinese character 小 and its inverted form.
0 where there is either nothing in a corner, the part in a corner is already represented by a previous corner, or where a corner has a dot stroke followed by a horizontal stroke

Usage example:

Char  Digits   Interpretation
你    2729/2   Diagonal; Corner; Vertical; 小; ‘2’?
好    47447/1  Cross; Corner; Cross; Cross?; option 1
吗    ???      ???

To be honest, I cheated here: Wiktionary gives four-corner codes for Chinese characters (e.g., http://en.wiktionary.org/wiki/好).
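Reading a code back into shapes is a mechanical lookup against the digit table above. A minimal Python sketch follows; the digit meanings are paraphrased from that table, and treating an optional fifth digit as a ‘supplementary’ corner is my assumption about the extended form of the code, not something Emacs documents.

```python
from typing import List, Tuple

# Digit meanings paraphrased from the four-corner table above.
SHAPES = {
    "0": "nothing / already counted / dot over horizontal",
    "1": "horizontal stroke",
    "2": "vertical or diagonal stroke",
    "3": "dot stroke",
    "4": "two strokes crossing",
    "5": "three+ strokes, one crossing all others",
    "6": "box shape",
    "7": "stroke turning a corner",
    "8": "八 shape (or inverted)",
    "9": "小 shape (or inverted)",
}
CORNERS = ("top left", "top right", "bottom left", "bottom right")


def decode_four_corner(code: str) -> List[Tuple[str, str]]:
    """Describe each digit of a four-corner code such as '2729' or '47447'."""
    digits = code.replace(".", "")  # also accept a dotted form like '2729.2'
    if len(digits) not in (4, 5) or any(d not in SHAPES for d in digits):
        raise ValueError(f"expected 4 or 5 digits, got {code!r}")
    # zip stops after the four corners, even if a fifth digit is present.
    described = [(corner, SHAPES[d]) for corner, d in zip(CORNERS, digits)]
    if len(digits) == 5:
        # Assumption: the optional fifth digit is a supplementary corner.
        described.append(("supplementary", SHAPES[digits[4]]))
    return described


for corner, shape in decode_four_corner("2729"):  # the code given for 你 above
    print(f"{corner}: {shape}")
```

For 2729 this prints diagonal/vertical, corner, diagonal/vertical, 小, which matches my reading in the table above.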

chinese-array30

Outputs Big Five.

Some docs:

Seems to be quite popular, and the MS Windows Chinese input method seems to use it. However, I can’t make head or tail of it. Also, it outputs Big Five.

chinese-b5-tsangchi

This is a Taiwanese method based on geometrical decomposition of characters. See: http://en.wikipedia.org/wiki/Cangjie_method.

Char  Input  Interpretation
你    onf    O = 人 (LHS); N = hook (top of RHS); F = 火 (bottom of RHS)
好    vnd    V = 女 (LHS); N = hook (top of RHS); D = 木 wood (?)
嗎    rsqf   R = 口 (LHS); S = 尸 (L & top of RHS); Q = 手 (next top RHS); F = 火 (bottom of RHS)

Again, I got these from Wiktionary, but I can understand how they were made up. Notice that ‘ma’ is the traditional 嗎 and not the simplified 吗.
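Each Cangjie key stands for one basic shape, so a code can be read back with a simple table lookup. The key-to-radical table below is the standard Cangjie assignment as I understand it (it is not taken from Emacs); in it, N is 弓, which is the ‘hook’ shape in my table above. This is only a reader for codes, not a decomposition engine.

```python
from typing import List

# Standard Cangjie radical for each key (as I understand the layout).
CANGJIE_RADICALS = {
    "a": "日", "b": "月", "c": "金", "d": "木", "e": "水",
    "f": "火", "g": "土", "h": "竹", "i": "戈", "j": "十",
    "k": "大", "l": "中", "m": "一", "n": "弓", "o": "人",
    "p": "心", "q": "手", "r": "口", "s": "尸", "t": "廿",
    "u": "山", "v": "女", "w": "田", "x": "難", "y": "卜",
    "z": "重",
}


def decode_cangjie(code: str) -> List[str]:
    """Map each key of a Cangjie code to its radical, e.g. 'onf' -> 人 弓 火."""
    return [CANGJIE_RADICALS[key] for key in code.lower()]


for char, code in (("你", "onf"), ("好", "vnd"), ("嗎", "rsqf")):
    print(char, code, " ".join(decode_cangjie(code)))
```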

chinese-b5-quick

This looks like it should be a ‘quick’ version of b5-tsangchi. Indeed it takes fewer keystrokes to get to an end character – but I can’t find the end character I want. In other words, I don’t know how it uses the Tsangchi system.

chinese-ccdospy

From the description:

This input method works almost the same way as chinese-py.  The
difference is that you type a single key for these Pinyin spelling.
    Pinyin:  zh  en  eng ang ch  an  ao  ai  ong sh  ing  yu(ü)
    keyseq:   a   f   g   h   i   j   k   l   s   u   y   v

Char  Input  Interpretation
你    ni7    7th option
好    hk6    6th option
吗    ma9    9th option
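The shorthand in the description can be expanded back to full pinyin with two small tables. This is a simplified Python model, not Emacs’s code: it assumes the first key is an initial (so a, i, u mean zh, ch, sh there) and every later key is a final, and it ignores syllables with no initial, which this naive rule would misread.

```python
# Single-key abbreviations, taken from the chinese-ccdospy description above.
INITIALS = {"a": "zh", "i": "ch", "u": "sh"}
FINALS = {
    "f": "en", "g": "eng", "h": "ang", "j": "an", "k": "ao",
    "l": "ai", "s": "ong", "y": "ing",
    "v": "ü",  # the description's yu(ü)
}


def expand(keys: str) -> str:
    """Expand a ccdospy-style key sequence to full pinyin (simplified model)."""
    if not keys:
        return ""
    first = INITIALS.get(keys[0], keys[0])  # initial position
    rest = "".join(FINALS.get(k, k) for k in keys[1:])  # final position
    return first + rest


print(expand("hk"))  # hao
print(expand("ah"))  # zhang
```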

chinese-cns-tsangchi

Probably the same as chinese-b5-tsangchi, but outputting a CNS (Chinese National Standard) character set instead of Big Five (possibly CNS 11643-1992). My Emacs doesn’t have the fonts for it.

chinese-cns-quick

See chinese-cns-tsangchi and chinese-b5-quick.

chinese-ctlau

An input method based on Sidney Lau’s Romanisation system. (a) it’s for Cantonese; (b) it generates Big5; (c) I can’t get it to work.

chinese-ctlaub

See chinese-ctlau.

chinese-ecdict

Pretty impressive. You type in the English word and the Chinese (Big5) word appears! Wow!

Char Input Interpretation
香蕉 banana banana
冰箱 refrigerator refrigerator
you you
良好的 good good

Impressive but:

  • notice that ‘good’ doesn’t actually give us hao3 (好), it gives us liang2 hao3 de (良好的). 良好 still means ‘good’ but 的 is a connector making this adjective ready to add on to a noun.
  • I don’t know how I would get a grammatical particle like 吗 (ma).
  • This will be handy for emergencies but I think I’ll keep a dictionary around too.

chinese-etzy

A Zhuyin input method with Big5 output.

chinese-py

Type in pinyin, without tones.

Char  Input  Interpretation
你    ni1    1st menu option
好    hao    guessed correctly
吗    ma2    2nd option

chinese-qj

Possibly another Zhuyin input method. The description has virtually no information.

chinese-sisheng

This will be quite useful. This input method does not generate Hanzi, but it transforms pinyin with tone numbers into ‘proper’ pinyin with diacritics.

Char  Input  Interpretation
nǐ    ni3
hǎo   hao3
ma    ma
nǚ    nv3
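The conversion itself is easy to sketch. The Python below applies the conventional tone-mark placement rule (a or e takes the mark; in ou the o does; otherwise the last vowel) and treats v as ü. I haven’t checked that chinese-sisheng follows exactly this rule, so read it as an illustration rather than a description of the Emacs code.

```python
import unicodedata

# Combining marks for tones 1-4; tone 5 (or 0) is neutral and unmarked.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiouü"


def mark_syllable(syllable: str) -> str:
    """Turn one tone-numbered syllable such as 'hao3' into 'hǎo'."""
    if not syllable or syllable[-1] not in "012345":
        return syllable  # no tone number: leave unchanged, e.g. 'ma'
    tone = syllable[-1]
    letters = syllable[:-1].replace("v", "ü")  # assumption: v stands for ü
    if tone not in TONE_MARKS:
        return letters  # neutral tone gets no mark
    lower = letters.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        positions = [i for i, c in enumerate(lower) if c in VOWELS]
        if not positions:
            return letters  # no vowel to mark
        idx = positions[-1]
    marked = letters[: idx + 1] + TONE_MARKS[tone] + letters[idx + 1 :]
    # NFC folds base letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", marked)


def to_diacritics(text: str) -> str:
    """Convert space-separated tone-numbered pinyin syllables."""
    return " ".join(mark_syllable(s) for s in text.split())


print(to_diacritics("ni3 hao3 ma"))  # nǐ hǎo ma
print(mark_syllable("nv3"))          # nǚ
```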

chinese-sw

Type in a pair of radicals. Quite clever and fast (and not pronunciation-based), easy to pick up. Usefulness will depend on coverage.

Char  Input  Interpretation
你    kv4    亻 + 小 + 4th option
[好]  ???    can’t find it
吗    fd9    口 + 刀 + 9th option

chinese-tonepy

Type in pinyin, with tones.

Char  Input  Interpretation
你    ni3
好    hao32  2nd menu option
吗    ma5

n.b.: chinese-py-b5 is the same as this method, but outputs Big Five (i.e., traditional characters). It is a useful alternative to chinese-tonepy for when I want to write Trad.

chinese-ziranma

A pinyin-based input method where the pinyin initials and finals are mapped onto the qwerty keyboard (see http://eyegene.ophthy.med.umich.edu/unicode/KeyboardLayouts/ZiRanMaPinYinKeyboard.html).

The basic idea is a four-stroke pattern: initial, final, tone, quote (‘). This may not be that impressive for, say, 好 (hke’), but the pattern extends to cover words, not just characters. That means that two-, three- or four- character words can be input with the same four-stroke pattern. The example from the description is 北京电视台 (bjdt) (Běijīng diànshìtái; Beijing TV station), which is impressive for four keystrokes.

Char  Input  Interpretation
你    n      guessed correctly
好    hke’
吗    mae’5  5th option

chinese-zozy

A Big5 Zhuyin input method.

Conclusion

chinese-tonepy is looking the best for now for most uses (see also chinese-ccdospy, chinese-ziranma).

chinese-sisheng will be useful for writing pinyin.

I think Zhuyin is probably in my future if/when I start learning Chinese seriously, but unfortunately both of the Emacs Zhuyin input methods (chinese-etzy and chinese-zozy) output Big5 (i.e., traditional script). How much is Zhuyin used in the People’s Republic?

It would be useful to know a method not based on pronunciation: I might not always know how to say what I want to write (e.g., “How do you say X?”). Emacs’ structural input methods generating GB characters are:

I’ll explore these and update this page as I find out more.

References

Lunde, K. (1999). CJKV Information Processing. Sebastopol, CA: O’Reilly.

http://en.wikipedia.org/wiki/Chinese_input_methods_for_computers


12 Responses to “Chinese input methods for Emacs”

  1. Max Says:

    Thanks for this thorough survey of available input methods. My personal favorite is ZiRanMa, because it works so quickly when you are familiar with pinyin and when you are comfortable with a qwerty keyboard.

  2. llaisdy Says:

    Dear Max

    Thanks for your comment!

    I like the sound of ZiRanMa, but my Chinese is nowhere near good enough yet for it to stand out. Next year!

  3. llaisdy Says:

    Come to think of it, ZiRanMa does offer more than the Mac input methods.

  4. Max Says:

    About ZiRanMa: It took me a few days to remember the mapping of vowel-endings on a qwerty keyboard (I had to print out the diagram and leave it next to the keyboard for a while). But once I knew them, it became very fast to type. I’m expecting an even higher speed gain once I’m familiar with the mapping for compound words.

    I’m curious about the four-corner method. It sounds intriguing but I have yet to take the time to try it out.

  5. David Says:

    Very useful overview, thx.

    The punct versions chinese-tonepy-punct and chinese-py-punct are working fine for me though. But one has to remember to enter “v” before entering the punctuation.

  6. Michael Hayes Says:

    Dear Ivan,
    a very useful comparison. Thanks for sharing it with the public. I have a question about the chinese-py-b5 IM: how can I access the ü character needed for 女 for example? Some other Chinese IMs e.g. Microsoft’s, accept v for ü but chinese-py-b5 doesn’t. I can’t work it out. Must I switch to something like chinese-py to get the v for ü facility?

