
How to find the frequency of words on the Internet?

Posted by xiaophil October 24, 2010 in the Group I Have a Question.

Given the apparent lack of resources telling a waiguoren which chengyu are high-frequency and which are not, I thought perhaps I could work it out myself.  Wikipedia has a list of chengyu that could serve as the raw material.  A technique some Poddies here use to gauge the frequency of Chinese words and phrases is to do a Google search and look at the number of results it returns.  So I thought I could simply google each chengyu, record the number of hits, and then sort the list to get an idea of which chengyu are common.  However, this approach has its drawbacks.

One, the Internet, while generally not as formal as most written literature, is likely to be more formal than spoken Chinese, which is what I am most concerned with.  I thought about it, and I decided I could live with it.  It's a starting point at least. 

Two, I noticed that Baidu's and Google's search results differ significantly.  Which should I trust?  Google, the search engine most people respect, or Baidu, the search engine made specifically for Chinese?

Three, this would be very time-consuming.  So time-consuming that I am not sure I would be willing to try.  I keep wondering whether there is some program out there that can batch-search a list of phrases.  I looked around, but couldn't find one.
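
For what it's worth, the record-the-hits-then-sort idea could be sketched in Python roughly as below. This is only an outline under loud assumptions: `rank_by_hits` and `fake_count_hits` are hypothetical names, the counts in the stand-in are made up rather than real search results, and the part that would actually ask Google or Baidu for a result count is deliberately left as a function you plug in, since how to do that (and whether the search engine's terms allow it) is exactly the open question.

```python
import time
from typing import Callable, Iterable, List, Tuple


def rank_by_hits(
    phrases: Iterable[str],
    count_hits: Callable[[str], int],
    delay_seconds: float = 0.0,
) -> List[Tuple[str, int]]:
    """Look up a hit count for each phrase and return (phrase, hits)
    pairs sorted from most hits to fewest.

    count_hits is whatever lookup you supply; this function does not
    contact any search engine itself.
    """
    results = []
    for phrase in phrases:
        results.append((phrase, count_hits(phrase)))
        if delay_seconds > 0:
            # Pause between lookups so a real service is not hammered.
            time.sleep(delay_seconds)
    # Highest hit count first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


def fake_count_hits(phrase: str) -> int:
    """Stand-in lookup with MADE-UP numbers, for illustration only.
    A real version would query a search engine and read off the
    reported number of results."""
    made_up_counts = {
        "马马虎虎": 300,
        "一石二鸟": 200,
        "守株待兔": 100,
    }
    return made_up_counts.get(phrase, 0)


if __name__ == "__main__":
    chengyu_list = ["守株待兔", "马马虎虎", "一石二鸟"]
    for phrase, hits in rank_by_hits(chengyu_list, fake_count_hits):
        print(phrase, hits)
```

Keeping the lookup as a separate function also means the same list could be run once with a Google-based lookup and once with a Baidu-based one, and the two rankings compared side by side.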

Anyway, I just thought I would throw this out here.  There are a few techies who frequent the community board.  Just maybe one of them will tell me that my issues are no problem and then show me the light.  We'll see.  It seems too ambitious to me.  But if anyone has any comments and/or advice, I would be grateful.
