An attempt at a "naive" character dictionary (字典)
mark
November 06, 2010, 05:52 PM posted in General Discussion
There are a number of online dictionaries that one can use to look up Chinese characters that are already in electronic form, but it is not so easy if you see a character printed somewhere and you don't have a background in Chinese calligraphy (e.g. from being raised Chinese). Traditional printed dictionaries require that you count strokes, or they expect you to guess which radical a character is classified under. You can attempt to trace the character on the screen of an e-dict, but the ones with this capability are somewhat expensive, they are timed for use by someone who is facile at writing Chinese characters, and their character recognition algorithms often rely on stroke order.
I have attempted to create a Web page that allows one to look up Simplified Chinese characters based solely on the distinguishing shapes they contain. Unlike traditional radical dictionaries, you can start from any distinguishing shape in the character and still find it.
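For readers curious how a lookup like this might work, here is a minimal sketch of a shape-to-character index. It is only an illustration under assumptions: the decomposition table, the shape inventory, and the function name are hypothetical, and the post does not say how the actual page stores its data.

```python
from collections import defaultdict

# Hypothetical sample data: each character mapped to the shapes it contains.
# The real page's decomposition data is not described in the post.
DECOMPOSITION = {
    "语": ["讠", "五", "口"],
    "话": ["讠", "千", "口"],
    "吾": ["五", "口"],
    "问": ["门", "口"],
}

def build_shape_index(decomposition):
    """Invert the table so every shape points to all characters containing it."""
    index = defaultdict(set)
    for char, shapes in decomposition.items():
        for shape in shapes:
            index[shape].add(char)
    return index

index = build_shape_index(DECOMPOSITION)

# Any shape works as a starting point, not only the official radical.
print(sorted(index["五"]))
print(sorted(index["讠"]))
```

Because every shape is indexed, 语 can be reached from 讠, 五, or 口 alike, which is the property a radical-only dictionary lacks.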
What I don't know is whether or not this page will actually help anyone in the way that I hoped. So, if some of you, my fellow Cpod users, want to give it a try and let me know what you think, I would be delighted. (http://huamake.com/baseindex.htm)
I should give a couple of credits. Fellow Cpod user Baomingguang has given me some very helpful ideas and feedback off-line. Character definition and pronunciation data is from CC-CEDICT (a.k.a. mdbg).
mark
All of the e-dicts I have used allow only a very limited think time between strokes before they decide you are done with the character and start producing results. The time can be adjusted, but even the maximum is more appropriate for someone writing from memory than for someone copying a character they are not familiar with. Then again, maybe you are a better quick-sketch artist than I am.
bababardwan
hehe, no, my writing is nothing to emulate. I've just not struck that problem before and wanted to clarify if that's what you meant, so thanks for your reply.
bodawei
I can slow mine right down, loads of time. But another little trick is to leave your stylus on the screen while you think of the next stroke, if you need to. Have you tried this?
bababardwan
having just seen bodawei's comment, it's made me think. The times that I have used these e-dictionaries for character recognition, they usually present a list of options [supposedly starting with the most likely] from either the first stroke or shortly thereafter [but I've always taken that to be like predictive text], and as you continue with more strokes they keep updating/modifying the list of options. At least from memory I think that's what happens. I'll pay more attention next time. But certainly I've never had to adjust any time setting [don't know where that would be as I've not had to go look for it]. So I was just wondering whether you thought the show was over when the first predictive list popped up, when you actually weren't out of time?
bodawei
And seeing your comment, I have just explored my dictionary and found some cool features. (I have always been aware that I am using it in first gear.) I didn't realise what I could do with radicals - it helps you find a character if you recognise just a radical.
But mine doesn't work exactly as you describe - I don't think I can ADD to the original strokes once I have lifted the stylus. But it certainly brings up all of the most likely characters.
(I wouldn't swear to this; as I say, I am using my dictionary in first gear only. I have not explored all the features.)
bababardwan
ok, well I just tried it on mdbg to verify 'cos it's been quite some time since I've used it [yeah, I generally use the radicals to look up characters I don't know] and yeah, on mdbg at least, it starts making suggestions from the very first stroke and then the list changes with each subsequent stroke. Also, after writing this comment up to the start of this sentence I went back and started adding more strokes and it continued changing its suggestions with each stroke, so on mdbg at least, there doesn't appear to be a time limit. Just tried again and it's still continuing on. ...
ps. I thought the input on my iPhone was the same also...no time limit and progressive suggestions with each stroke, but of that I am less sure as it's also been a while and I use it pretty rarely.
bodawei
I have never tried writing in mdbg, so that is good to know. Definitely a generation beyond the little electronic dictionary I use. If I am faced with a character I don't know and can't find on the electronic dictionary I go back to the paper dictionary.
mark
I must have an older model e-dict. It doesn't allow for more strokes once it pops up a list of suggestions. Another lookup trick that I use is to guess the pronunciation and see if I can get a pinyin IME to suggest the character. However, I still think all of these things require quite a bit of study before you can use them effectively.
daniel70
November 07, 2010, 02:06 AM
Very groovy. For the character 语, I can click a dot, and see 讠, and find it in that list.
When I look at 语, the three groupings of strokes jump out at me. With your database, would it be possible to click dot and grab 讠, then click 二 and grab 五, then click and grab 口 ... (by grab I mean to indicate to the system that this bit is a part of my target character), and thus specify the desired character in chunks? One of the problems I have when looking through a list of characters is trying to keep the three or four groupings in mind while I'm scanning. My short term memory fails me and I find it quite difficult. It would be great if the dictionary could remember the chunks for me.
I see that your page for 语 lists all the bits with plenty of redundancy. If I could say "I'll have a 讠, a 五 and a 口" or even "words,five,mouth" and get a short list with 语 included, that would be great.
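The chunk-by-chunk lookup suggested here could be sketched roughly as follows. This is a hedged illustration, not how the page works: the component sets, the English key names, and the function `find_characters` are all hypothetical.

```python
# Hypothetical sample data: the set of components each character contains.
COMPONENTS = {
    "语": {"讠", "五", "口"},
    "话": {"讠", "千", "口"},
    "吾": {"五", "口"},
    "让": {"讠", "上"},
}

# Hypothetical English keys for components, as asked about in this post.
ENGLISH_KEYS = {"words": "讠", "five": "五", "mouth": "口"}

def find_characters(chunks):
    """Return the characters that contain every requested chunk.

    Each chunk may be a component itself or one of the English keys.
    """
    wanted = {ENGLISH_KEYS.get(chunk, chunk) for chunk in chunks}
    return sorted(
        char for char, parts in COMPONENTS.items() if wanted <= parts
    )

# The more chunks you "grab", the shorter the candidate list becomes.
print(find_characters(["讠"]))
print(find_characters(["讠", "口"]))
print(find_characters(["words", "five", "mouth"]))
```

The dictionary, rather than the reader's short term memory, holds the grabbed chunks, and each added chunk can only narrow the list.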
Do you attach English keys to your components?
Anyway, cool stuff.
bababardwan
November 06, 2010, 10:55 PM
wow, this sounds really interesting. I agree about having to guess which part is the radical. I have also wondered before why you can't look up a character using any of its component parts. Look forward to giving this a go. In the interim, I'm curious when you say:
they are timed for use by someone who is facile at writing Chinese characters
..what you mean by "timed"? Are you saying you only have a certain amount of time to draw the character? I have never encountered this particular problem.