
Progress on a Web Page for exploring 汉字

Posted by mark on December 23, 2009, in the group General Discussion.

Tags: written Chinese; character maps; user contribution

A month or so back, I posted about making a Web page for decomposing Chinese characters. I have since entered the information on 1067 characters. This is basically the first section of the book listed below. It is by no means comprehensive, but it is enough to give users a good feel for how the page will work and to solicit feedback. In this post I will also discuss my intent in putting this page together, some difficulties I encountered, and my plans for it. The page is http://huamake.com/web2_0.htm.

My intentions for the Web page are that it could serve as a study aid, work as a character dictionary for people like myself who were never formally taught how to use traditional character dictionaries, and be fun to play with. I will say more about all of these goals later in this post. Many materials already exist for aiding students of written Chinese, but the ones I know of all follow a tradition that evolved long before the Web existed. The basic method is to pick a set of radicals to use as classifiers and associate each character with its appropriate classifying radical. For example, 饣 appears on the left-hand side of several characters, and these characters can be regarded as belonging to one group for classification and search purposes. With pen-and-paper reference materials, this makes perfect sense. However, the right-hand portion of these characters often has visual elements that are shared with other characters outside the same classification bucket. For example, 饭, 反 and 返 all share the visual element 反. It would be interesting to explore these relationships as well, and the basic hyperlinking mechanism of the Web was designed to express exactly this kind of non-linear association.

OK, so the basic idea is to take each character, break it down into all its component visual elements (radicals), and link each component to all of the characters that make use of that visual element. Then one can happily explore the relationships, jumping from link to link according to one's fancy. It turns out that it is not actually that easy. The main problems are user interface design, identifying which portions of a given character are significant, deciding whether they are the same as or different from similar-looking bits and pieces of other characters, and the fact that there are a lot of characters and therefore a lot of data entry to do. I have an approach to all of these, which is doubtless not perfect, but the result is now there for anyone who wants to take a look.
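For anyone who thinks better in code, the linking idea above can be sketched in Python roughly as follows. This is only an illustration, not how the page is actually built: the tiny `DECOMPOSITION` table and the function names are made up for the example, and a real table would hold one entry per character entered.

```python
# Illustrative sketch only. The table below is a hand-picked sample,
# not the page's real data, and all names here are hypothetical.

# Each character maps to the visual elements it was broken down into.
DECOMPOSITION = {
    "饭": ["饣", "反"],
    "返": ["辶", "反"],
    "饿": ["饣", "我"],
    "我": ["戈"],
}


def build_component_index(decomposition):
    """Invert the table: map each visual element to the characters using it."""
    index = {}
    for char, components in decomposition.items():
        for component in components:
            index.setdefault(component, set()).add(char)
    return index


def related_characters(char, decomposition, index):
    """Return every other character sharing at least one element with char."""
    related = set()
    for component in decomposition.get(char, []):
        related |= index.get(component, set())
    related.discard(char)  # a character is not "related" to itself
    return related


COMPONENT_INDEX = build_component_index(DECOMPOSITION)

if __name__ == "__main__":
    print(sorted(COMPONENT_INDEX["反"]))
    print(sorted(related_characters("饭", DECOMPOSITION, COMPONENT_INDEX)))
```

The inverted table is what makes the "links" cheap: each visual element becomes a key whose value is the list of characters to link to, so the same lookup serves both browsing and dictionary-style searching.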

If the page works, I think it could be used in the following ways.

I think it could be an aid to studying characters.  For example, I know there are several characters that contain 旦 and are pronounced dan4, but keeping straight what to place on the left hand side to differentiate these characters is a bit hard for my poor memory.  This page would give me a handy way to refresh my memory on this and other similar questions.

I think the page could be used as a character dictionary. When I find a character in print that I don't recognize, I could use any character that contains a similar visual element to narrow my search. (I often have trouble because I assume a character is classified under the wrong radical. For example, several bits and pieces could be used to classify 警. It sure would be nice if any path I took would get me there, but with a traditional character dictionary there is only one correct path.)

I think one might be able to play some interesting games with the page, like a variant of "6 Degrees of Separation": given two characters, find a path of visual elements that leads from one to the other. For example, to get from 我 to 同, I might go 我 > 戈 > 咸 > mouth with a line over it > 同. Maybe you can think of a shorter path, or we could have a race. That would be the game.
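The "shortest path" version of this game is a standard breadth-first search, if characters and visual elements are both treated as nodes and each character is connected to the elements it contains. Here is a rough Python sketch under that assumption. The sample `DECOMPOSITION` table and the function names are invented for the example and are not the page's data or code.

```python
from collections import deque

# Illustrative sample only; not the page's real data.
DECOMPOSITION = {
    "饭": ["饣", "反"],
    "返": ["辶", "反"],
    "饿": ["饣", "我"],
    "我": ["戈"],
}


def build_graph(decomposition):
    """Link every character to each of its elements, in both directions."""
    graph = {}
    for char, components in decomposition.items():
        for component in components:
            graph.setdefault(char, set()).add(component)
            graph.setdefault(component, set()).add(char)
    return graph


def shortest_path(start, goal, graph):
    """Breadth-first search; returns a list of nodes, or None if no path."""
    if start == goal:
        return [start]
    previous = {start: None}  # records how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk backwards from the goal to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


GRAPH = build_graph(DECOMPOSITION)

if __name__ == "__main__":
    print(" > ".join(shortest_path("饭", "我", GRAPH)))
```

Because breadth-first search explores all one-step paths before any two-step paths, and so on, the first path it finds is guaranteed to be one of the shortest, which would settle any race.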

So far, my sources of character lists have been:

Reading and Writing Chinese Simplified Character Edition, Third Edition
William McNaughton  -- basically the HSK A, B and C lists.

CC-CEDICT from MDBG (definitions and pronunciation information)

I tried following a suggestion from Andrew to nicely ask MDBG for their data on radicals, but they did not reply to my request. Perhaps they think I am a ridiculous person, like the ones who ask them for advice on tattoos, or they are simply too busy to respond. In any case, I didn't feel comfortable screen-scraping information without permission. So plan B was to make my data entry method as efficient as possible. Explaining the details would be too much of a digression.

As I mentioned, I have covered the first 1067 characters from "Reading and Writing Chinese". I will be adding the 1200+ from that book over time, then I will go looking for the HSK D list. If and when I complete entry of the D list, I will consider the effort more or less complete.

Once the data exists, it can be presented and used in different ways.  I am only putting one of many possible faces on it.
