How to find the frequency of words on the Internet?

xiaophil
October 24, 2010, 06:12 AM posted in I Have a Question

Given the apparent lack of resources out there to tell a waiguoren which chengyus are high frequency and which ones are not, I thought to myself, perhaps I can work it out on my own.  Wikipedia has a list of chengyus that I could analyze.  A technique some Poddies use here to estimate the frequency of Chinese words and phrases is to do a Google search and look at the number of search results it gets. Therefore, I thought I could simply google the chengyus, record the number of hits, and then sort them to get an idea of which chengyus are common.  However, this has its own drawbacks.

One, the Internet, while generally not as formal as most written literature, is likely to be more formal than spoken Chinese, which is what I am most concerned with.  I thought about it, and I decided I could live with it.  It's a starting point at least. 

Two, I noticed that Baidu's and Google's search results differ significantly.  Which should I trust?  Google, the search engine most people respect, or Baidu, the search engine made specifically for Chinese?

Three, this would be very time-consuming.  So time-consuming that I am not sure I would be willing to try.  I keep thinking to myself, is there some program out there that can batch-search a list of phrases?  I looked around, but couldn't find any.

Anyway, I just thought I would throw this out here.  There are a few techies that frequent the community board.  Just maybe one of them will tell me that my issues are no problem and then will show me the light.  We'll see.  Seems too ambitious to me. But if anyone has any comments and/or advice, I would be grateful.
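
The "record the hits, then sort" step described above could look something like the following. This is only a minimal sketch under stated assumptions: the names rank_by_frequency, count_hits, and make_corpus_counter are hypothetical, and because the thread does not name any search API, the example swaps in a plain count of occurrences in a local text as the source of numbers. The toy corpus is invented for illustration and says nothing about real chengyu frequency.

```python
from __future__ import annotations

from typing import Callable, Iterable


def rank_by_frequency(
    phrases: Iterable[str],
    count_hits: Callable[[str], int],
) -> list[tuple[str, int]]:
    """Return (phrase, count) pairs sorted from most to least frequent.

    count_hits is any function that maps a phrase to a number; it is a
    hypothetical hook, not a real search-engine API.
    """
    counts = [(phrase, count_hits(phrase)) for phrase in phrases]
    # sorted() is stable, so phrases with equal counts keep their list order.
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


def make_corpus_counter(corpus: str) -> Callable[[str], int]:
    """Stand-in for a search engine: count occurrences in a local text."""
    return lambda phrase: corpus.count(phrase)


if __name__ == "__main__":
    # Toy text invented for illustration only; not real frequency data.
    sample_corpus = "他做事总是马马虎虎。别马马虎虎的!这件事一举两得。"
    chengyu_list = ["一举两得", "马马虎虎", "画蛇添足"]
    counter = make_corpus_counter(sample_corpus)
    for phrase, count in rank_by_frequency(chengyu_list, counter):
        print(phrase, count)
```

Keeping the counting function separate from the sorting means the same ranking could be reused with whatever source of counts turns out to be available, for example a downloaded collection of subtitles or forum posts, which might also be closer to spoken Chinese than web hit counts.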

bodawei
October 24, 2010, 06:52 AM

Xiaophil

Here's another method you might consider.  You probably know, say, 20 to 30 native speakers you could approach and ask, 'Would you mind listing (completely off the top of your head) the 10 chengyu you use most frequently in everyday speech?'

From this research you would have, say, 150 to 200 chengyu to play with, and ready access to people who would be more than happy to explain their significance.  

Your problem might be that your native speakers will be hard-pressed to come up with ten. I'm guessing they will ring friends for help.  If you are willing to experiment, could you report back? 

xiaophil

Good suggestion. I had another similar idea. Thinking...

suansuanru
October 24, 2010, 02:02 PM

That would be very interesting work. Your post made me realise that Chinese people actually seldom use 成语 in daily life; we use 熟语 more, though both are called "idioms" in English. For example, "吃软饭", "拍马屁", and "抱佛脚" are 熟语, and they are widely used in daily life.

xiaophil

That's crazy. Until bodawei mentioned it a few days ago, nobody had ever told me that Chinese people rarely use these in daily life. I figured they were a little bookish, but I had no idea they were that bookish! It's almost as if Chinese people didn't want to admit to me that they don't use them much. At any rate, I would still like to know some common ones, even if they are only common in written Chinese.

RJ

Oops, I guess we need a short series on 熟语? Maybe John would like that better, hehe.

RJ

Phil,

I remember Henning making this post a couple of years ago:

http://chinesepod.com/community/conversations/post/929

I thought you might find it interesting, and you might find something of an ally in him as well.

xiaophil

Thanks for dusting the cobwebs off and pulling this out of the closet. Well worth the read. I cannot decide if I am happy that my wife doesn't attack me with chengyus!