Some statistics about the structure of Chinese Characters
mark
October 08, 2010, 03:25 PM posted in General Discussion
I now have some statistics to illustrate some facts about written Chinese. For the past year or so, I have been working on a Web site that contains structural data about characters (see http://huamake.com/huamakefaq.htm if you are curious). Anyway, I generated some statistics about character structure based on the characters in the old HSK vocabulary list (a larger set than the new HSK list, and a well-defined sub-universe of Chinese characters).
The first column in the table below is the number of times a character can be broken down into simpler component characters or radicals. For example, 好 -> 女 + 子. 子 could be broken down into 了 and 一, giving 好 a count of 2. The 11 characters that can't be broken down are characters like 一 and 乙, which are already pretty simple.
The second column is the number of times that a character participates in the formation of a more complex character. In my example above, 女 and 子 would each get a count for their participation in 好. Interestingly, while some characters are active joiners, close to 80% (2,219 of the 2,842 characters) are stay-at-homes that don't participate in character formation at all, at least not within the sub-universe of characters I chose to work with, and not until someone needs to invent a new character.
Count         Characters (column 1: decomposition)   Characters (column 2: participation)
0             11                                     2219
1             533                                    218
2             845                                    94
3             951                                    69
4             432                                    60
5             66                                     38
6             4                                      23
7             0                                      15
8             0                                      13
9             0                                      15
10 or more    0                                      78
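For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of how the two counts could be computed. It is not the code behind huamake.com. The `DECOMP` table is a tiny hypothetical sample, and the sketch makes two assumptions: the first count is the number of nested decomposition levels (so 好 gets 2), and the second count only credits direct components (女 and 子 for 好, but not 了).

```python
from collections import Counter

# Hypothetical toy data: character -> its direct components.
# Characters with no entry are treated as not decomposable.
DECOMP = {
    "好": ["女", "子"],
    "子": ["了", "一"],
    "李": ["木", "子"],
}


def all_chars(decomp):
    """Every character that appears as a whole or as a component."""
    chars = set(decomp)
    for parts in decomp.values():
        chars.update(parts)
    return chars


def depth(char, decomp):
    """Number of nested decomposition levels below char (assumed meaning of column 1)."""
    parts = decomp.get(char)
    if not parts:
        return 0
    return 1 + max(depth(p, decomp) for p in parts)


def usage_counts(decomp):
    """How many characters each component directly helps to form (assumed meaning of column 2)."""
    counts = Counter()
    for parts in decomp.values():
        for p in set(parts):  # count a component once per character it appears in
            counts[p] += 1
    return counts


def histograms(decomp):
    """Return (depth histogram, usage histogram) over every character seen."""
    chars = all_chars(decomp)
    usage = usage_counts(decomp)
    depth_hist = Counter(depth(c, decomp) for c in chars)
    usage_hist = Counter(usage[c] for c in chars)
    return depth_hist, usage_hist


if __name__ == "__main__":
    depth_hist, usage_hist = histograms(DECOMP)
    print(sorted(depth_hist.items()))  # [(0, 4), (1, 1), (2, 2)]
    print(sorted(usage_hist.items()))  # [(0, 2), (1, 4), (2, 1)]
```

Someone who stops decomposing at 子 would simply leave the 子 entry out of the table, and 好 would then get a count of 1 instead of 2.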
While these statistics are only for a limited subset of all Chinese characters, I am confident you would see a similar pattern with any other reasonably sized set of Chinese characters. Also, another person might decompose characters differently than I have. However, it is only the last level of decomposition into "simplest" elements that is more of an art than a science. Most decompositions are from one commonly used character into a couple of other commonly used characters, like my example with 好. There seem to be two forces operating. One is combining a relatively small number of fixed elements to make a large number of characters. The other is a limit on the acceptable complexity of a character. The typical character has gone through two or three levels of compounding, and is a leaf node in the formation process.
Well, I don't actually know how characters were formed. This is just my hypothesis from analyzing their appearance. Perhaps it is a useful observation.
mark
Hi Joyce,
At http://huamake.com/home.htm there is an animated demonstration of where I was trying to go with this. I know my breakdown of characters is non-traditional, but I was trying to find a way for us westerners to have an easier time looking up printed characters. I do use my site myself for that, but as far as I know, it hasn't really caught on with anyone else.
Best Regards,
Mark
joyce_counselor
Great Mark~
I've checked the website~ It's a very good one for learning Chinese characters.
And FYI I've done some of the exams, just for fun~ LOL
It's great to see someone doing this in a western way. Impressive~
joyce_counselor
March 10, 2011, 06:05 AM
"好 = 女 + 子." Correct! But we wouldn't break "子" down into "了" and "一",
because "子" is already a minimal, indivisible character in Chinese.
Another example: "李 = 木 + 子", but "木" would not be broken down into "十" and "八".
Good job though ! *^o^*