
Some statistics about the structure of Chinese Characters

Posted by mark October 8, 2010 in the Group General Discussion.

I now have some statistics to illustrate some facts about written Chinese.  For the past year or so, I have been working on a Web site that contains structural data about characters (see http://huamake.com/huamakefaq.htm if you are curious).  From that data, I generated some statistics about character structure based on the characters in the old HSK vocabulary list (a larger set than the new HSK list, and a well-defined sub-universe of Chinese characters).

The first column in the table below is the number of times a character can be broken down into simpler component characters or radicals.  For example, 好 -> 女 + 子.  子 could be broken down into 了 and 一, giving 好 a count of 2. The 11 characters that can't be broken down are characters like 一 and 乙, which are already pretty simple.

The second column is the number of times that a character participates in the formation of a more complex character.  In my example above, 女 and 子 would each get a count for their participation in 好.  Interestingly, while some characters are active joiners, about 78% (2219 of 2842) are stay-at-homes that don't participate in character formation at all, at least not within the sub-universe of characters I chose to work with, and not until someone needs to invent a new character.
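As a rough sketch of how the two counts could be computed (this is not the code behind the Web site, just an illustration), assume a table that maps each character to its immediate components. Only the 好 example comes from the post; the names DECOMP, breakdown_count and participation_counts are made up here, and the breakdown count is assumed to mean the number of decomposition levels below a character.

```python
from collections import Counter

# Hypothetical decomposition table: character -> immediate components.
# Only the 好 example is taken from the post. Characters missing from
# the table are treated as not decomposable (an assumption).
DECOMP = {
    "好": ["女", "子"],
    "子": ["了", "一"],
}


def breakdown_count(ch, decomp):
    """Number of decomposition levels below ch (0 if it cannot be broken down)."""
    parts = decomp.get(ch)
    if not parts:
        return 0
    return 1 + max(breakdown_count(p, decomp) for p in parts)


def participation_counts(decomp):
    """How many characters each component directly helps to form."""
    counts = Counter()
    for parts in decomp.values():
        for p in set(parts):  # count a component once per character it appears in
            counts[p] += 1
    return counts


print(breakdown_count("好", DECOMP))        # 2, matching the example above
print(participation_counts(DECOMP)["女"])   # 1, for its part in 好
```

With a full table, tallying how many characters share each breakdown count and each participation count would give the two columns of figures.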

Count         Characters with this      Characters with this
              breakdown count           participation count
0:                      11                      2219
1:                     533                       218
2:                     845                        94
3:                     951                        69
4:                     432                        60
5:                      66                        38
6:                       4                        23
7:                       0                        15
8:                       0                        13
9:                       0                        15
10 or more:              0                        78
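The figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below copies the rows as given and stores the "10 or more" row as 10, which is a simplification; the variable names are made up for illustration. Both columns add up to 2842 characters, the share of characters that never act as a component comes out at roughly 78%, and the average breakdown count is about 2.5.

```python
# Rows copied from the table:
# (count, characters with that breakdown count, characters with that participation count).
# The "10 or more" row is stored as 10 (a simplification).
ROWS = [
    (0, 11, 2219),
    (1, 533, 218),
    (2, 845, 94),
    (3, 951, 69),
    (4, 432, 60),
    (5, 66, 38),
    (6, 4, 23),
    (7, 0, 15),
    (8, 0, 13),
    (9, 0, 15),
    (10, 0, 78),
]

# Each column should cover the same set of characters.
total_breakdown = sum(b for _, b, _ in ROWS)
total_participation = sum(p for _, _, p in ROWS)

# Share of characters that never take part in forming another character.
non_joiner_share = ROWS[0][2] / total_participation

# Average number of times a character can be broken down.
mean_breakdown = sum(n * b for n, b, _ in ROWS) / total_breakdown

print(total_breakdown, total_participation)
print(f"{non_joiner_share:.1%}", f"{mean_breakdown:.2f}")
```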


While these statistics are only for a limited subset of all Chinese characters, I am confident you would see a similar pattern with any other reasonably sized set of Chinese characters.  Also, another person might decompose characters differently than I have.  However, it is only the last level of decomposition into "simplest" elements that is more of an art than a science.  Most decompositions are from one commonly used character into a couple of other commonly used characters, like my example with 好.  There seem to be two forces operating.  One is combining a relatively small number of fixed elements to make a large number of characters.  The other is a limit on the acceptable complexity of a character.  The typical character has gone through two or three levels of compounding, and is a leaf node in the formation process: it is built from simpler characters but is not itself used to build others.

Well, I don't actually know how characters were formed.  It is just my hypothesis from analyzing their appearance.  Perhaps, it is a useful observation.
