Some statistics about the structure of Chinese Characters (汉字)

Posted by mark October 8, 2010 in the Group General Discussion.

I now have some statistics to illustrate some facts about written Chinese.  For the past year or so, I have been working on a Web site that contains structural data about characters (see if you are curious).  I generated the statistics below from the characters in the old HSK vocabulary list, which is a larger set than the new HSK list and a well-defined sub-universe of Chinese characters.

In the table below, each row is labeled with a count.  The first column is the number of characters that can be broken down that many times into simpler component characters or radicals.  For example, 好 -> 女 + 子.  子 could be broken down into 了 and 一, giving 好 a count of 2.  The 11 characters that can't be broken down at all are characters like 一 and 乙, which are already pretty simple.
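A minimal sketch in Python of how such a breakdown count could be computed.  It assumes the count means the number of levels of decomposition beneath a character; counting every decomposition step instead is another possible reading, and both give 2 for 好.  The `DECOMPOSITION` table and the function name are hypothetical, and only the 好 example is taken from the post.

```python
# Hypothetical decomposition table: each character maps to its immediate
# components. Only the 好 example comes from the post; a character that is
# missing from the table is treated as not decomposable.
DECOMPOSITION = {
    "好": ["女", "子"],
    "子": ["了", "一"],
}


def breakdown_levels(char, table=DECOMPOSITION):
    """Return how many levels of decomposition lie beneath `char`."""
    parts = table.get(char)
    if not parts:
        return 0  # already a simplest element
    # One level for this split, plus the deepest chain among the components.
    return 1 + max(breakdown_levels(part, table) for part in parts)


print(breakdown_levels("好"))  # 2: 好 -> 女 + 子, then 子 -> 了 + 一
print(breakdown_levels("一"))  # 0: cannot be broken down
```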

The second column is the number of characters that participate that many times in the formation of a more complex character.  In my example above, 女 and 子 would each get a count for their participation in 好.  Interestingly, while some characters are active joiners, about 80% are stay-at-homes that don't participate in character formation at all, at least not in the sub-universe of characters I chose to work with, and not until someone needs to invent a new character.
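The participation count can be sketched the same way.  This is an illustration under stated assumptions, not the method used for the Web site: the `DECOMPOSITION` table is hypothetical, only direct components are counted, and a component that appears twice in one character is counted once for that character.

```python
from collections import Counter

# Hypothetical decomposition table; only the 好 example comes from the post.
DECOMPOSITION = {
    "好": ["女", "子"],
    "子": ["了", "一"],
}


def participation_counts(table):
    """Count, for each component, how many characters it directly helps form."""
    counts = Counter()
    for parts in table.values():
        # Assumption: a repeated component counts once per compound character.
        for part in set(parts):
            counts[part] += 1
    return counts


counts = participation_counts(DECOMPOSITION)
print(counts["女"], counts["子"])  # 1 1: each takes part in 好
print(counts["好"])  # 0: 好 forms no other character in this tiny table
```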

Count          Broken down   Used as component
0:                      11                2219
1:                     533                 218
2:                     845                  94
3:                     951                  69
4:                     432                  60
5:                      66                  38
6:                       4                  23
7:                       0                  15
8:                       0                  13
9:                       0                  15
10 or more:              0                  78
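
The figures in the table can be checked with a few lines of Python.  This is only a sanity check on the numbers as posted; the variable names are made up for the illustration.

```python
# Rows of the table: (count, characters broken down that many times,
# characters used as a component that many times). Figures copied from the post.
ROWS = [
    (0, 11, 2219),
    (1, 533, 218),
    (2, 845, 94),
    (3, 951, 69),
    (4, 432, 60),
    (5, 66, 38),
    (6, 4, 23),
    (7, 0, 15),
    (8, 0, 13),
    (9, 0, 15),
    (10, 0, 78),  # the "10 or more" row
]

total_by_breakdown = sum(broken for _, broken, _ in ROWS)
total_by_use = sum(used for _, _, used in ROWS)

# Share of characters that never appear as a component of another character.
share_never_used = ROWS[0][2] / total_by_use

# Average breakdown count. The "10 or more" row holds no characters in the
# first column, so treating it as exactly 10 does not affect the result.
mean_breakdown = sum(n * broken for n, broken, _ in ROWS) / total_by_breakdown

print(total_by_breakdown, total_by_use)
print(f"{share_never_used:.1%}", f"{mean_breakdown:.2f}")
```

Both columns add up to the same 2842 characters.  Of these, 2219 (about 78%) never act as a component, and the average breakdown count is about 2.5.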

While these statistics are only for a limited subset of all Chinese characters, I am confident you would see a similar pattern with any other reasonably sized set of Chinese characters.  Also, another person might decompose characters differently than I have.  However, it is only the last level of decomposition into "simplest" elements that is more of an art than a science.  Most decompositions are from one commonly used character into a couple of other commonly used characters, like my example with 好.  There seem to be two forces operating.  One is combining a relatively small number of fixed elements to make a large number of characters.  The other is a limit on the acceptable complexity of a character.  The typical character has gone through two or three levels of compounding, and is a leaf node in the formation process.

Well, I don't actually know how characters were formed.  It is just my hypothesis from analyzing their appearance.  Perhaps, it is a useful observation.
