How to Count Chinese Characters

Posted on: September 2, 2019   in: Projects

What are Chinese characters?

Simple Definition of Chinese characters

A Chinese character (“汉字”, or “Hanzi”, in Chinese) is a written character used to represent meaning. Chinese characters have a very long history and have evolved and spread, forming the basis for many writing systems in Asia (much as many European languages use words derived from Latin or Ancient Greek). They can be found in Vietnam, Korea, Japan and other parts of the world. In some languages, the word for “symbol” or “sign” is used to mean “character”, but in English the correct word is always “character”, not “letter”, “sign”, “symbol” or any other equivalent.

Technical Definition of Chinese characters

For a more precise idea, we can take the very technical definition given in “The Chinese Lexicon: A Comprehensive Survey” by Yip Po-Ching, which states that: “[A character] in Chinese is a graphic form composed of a number of strokes and confined to a square shaped area.” (Page 35). Take a look at the two characters below:

“One” = “一”

“Bird” = “鸟”

The character for one is composed of a single stroke, and the character for bird is composed of five strokes. So it’s tempting to think of strokes as being analogous to letters in English, but they are absolutely not. In English we can simply add the letter s to the word apple, for example, to make a plural referring to more than one apple. This is not true of Chinese. Some characters do look a bit like other characters: for example, 鸣 (ming) looks a lot like the above character for bird but with an extra square, and means the sound which animals (not just birds) make, sometimes translated as “squawk”. However, we cannot randomly or creatively add other parts to an established character to change the meaning as we can with English letters (we can add -ism to the end of almost anything to imply the associated ideology: Marx, Marxism). Chinese characters have been fixed for many years, and cannot be changed without creating confusion.

How to count Chinese characters

How to Count Chinese characters or words

Sometimes a single character represents a particular word: for example, “大”, pronounced “da”, means “big” or “large”. There are also cases where one word is made from several characters. For example, “北京” (“beijing”) is the name of the city of Beijing. There are also characters which don’t generally get used by themselves but only in combination with other characters. An example might be “达” (“da”), which combines with other characters to make words like “达到”, which means “to reach”. These characters are usually referred to as bound morphemes in Chinese linguistics.

That means that calculating how many “words” are in a Chinese document can be difficult. Normally, the easiest solution is to calculate the number of characters instead. Most software programs will make this rather easy. For example in Word, we can easily see the number of Asian characters by selecting the word count tool:

Word Count Graphic

In Word, for Chinese documents, “Asian characters, Korean words” gives the number of Chinese characters. It does not include things like numbers and English words.

Word count in Word for numbers and characters

As we can see, the Asian character count only gives the number of Asian characters, and does not include the numbers. That means that documents like bank statements need different methods to estimate workloads. But generally, for contracts and similar documents, counts from Microsoft Word work extremely well. When specifying the rate for a job, I always give the price in terms of “Chinese characters”. I avoid the word “word” when giving quotes.
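For a rough cross-check outside Word, the same kind of split can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the Unicode ranges, the regular expressions and the name `count_text` are my own choices for illustration, and Word's exact rules (for example, how it treats full-width punctuation) may differ.

```python
import re

# Assumption: "Chinese character" here means the CJK Unified Ideographs
# block plus Extension A. Punctuation is deliberately not counted.
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")
LATIN_WORD = re.compile(r"[A-Za-z]+")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # 12,500.00 counts as one number


def count_text(text):
    """Return separate counts of Chinese characters, Latin-script words and numbers."""
    return {
        "chinese_characters": len(HAN.findall(text)),
        "latin_words": len(LATIN_WORD.findall(text)),
        "numbers": len(NUMBER.findall(text)),
    }


sample = "账户余额为12,500.00元，由Bank of China出具。"
print(count_text(sample))
```

Reporting the three counts separately makes it easier to see when a document, such as a bank statement, carries a lot of work that the character count alone would miss.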

Trados handles Chinese documents depending on the below settings on the Project Page:

Trados Word Count Options

Note the checkbox for “Use word-based tokenization for Asian target text”. This allows two possible approaches. With the box unselected, Trados will just give the count of the number of Asian characters, exactly as Word does above. However, if you select it, Trados will try to use another method: it will look for spaces between characters, and decide that everything between two spaces must be one word. This works great for English, but it’s a horrible idea for Chinese, as good Chinese text generally won’t include any spaces.
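To see why splitting on spaces fails, here is a minimal Python sketch. It is not Trados's actual implementation, just an illustration of the space-based approach described above; the function name and both sample sentences are made up for the example.

```python
def whitespace_word_count(text):
    """Count 'words' as runs of non-space characters, the way a space-based tokenizer would."""
    return len(text.split())


english = "The contract takes effect on the date of signature."
chinese = "本合同自签署之日起生效。"

print(whitespace_word_count(english))  # 9
print(whitespace_word_count(chinese))  # 1
```

The Chinese sentence says roughly the same thing as the English one, yet it is reported as a single “word” because it contains no spaces.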

So we can safely say it’s difficult to give a count of the number of “words” if we define “word” in the English sense. So, most translators try to use the word “character” instead.

Counting Chinese characters per English word

Sometimes, despite the above, clients will prefer to pay based on the target number of words. This is easy to estimate with a simple practical rule: 100 Chinese characters will generally translate to about 70 English words. A good range would be from 60 to 80 English words. This is borne out by many years of experience, and virtually every forum and web page gives the same range. Despite searching, I was unable to find any formal academic research into this subject. Other ratios you will find are 1.5:1 or even 2:1, depending on the nature of the document.

Therefore, we can estimate for example that 2000 characters will translate to about 1400 English words. The estimate doesn’t work well for documents with lots of tables or very short sentences, or which use many technical terms, but it’s the “least worst” option available.
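The arithmetic is simple enough to sketch in Python. The function name and parameter names below are my own; the 60, 70 and 80 per 100 figures are the rule-of-thumb values given above, not measured data.

```python
def estimate_english_words(chinese_chars, low=60, typical=70, high=80):
    """Return (low, typical, high) English word estimates.

    Each ratio is expressed as English words per 100 Chinese characters.
    """
    return tuple(round(chinese_chars * per_100 / 100) for per_100 in (low, typical, high))


print(estimate_english_words(2000))  # (1200, 1400, 1600)
```

Quoting the whole range rather than a single figure is a reminder that the estimate is rough, especially for tables and terminology-heavy documents.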

How many Chinese characters per Chinese word

If we use the term “word” in the English sense, as the smallest unit of meaning (I’m talking about the general sense, not the linguistic sense where there can be even smaller units of meaning), there’s an argument that a Chinese character actually is a word. I mentioned above the number “one”, which is one character in Chinese and one word in English. However, as I also mentioned above, there are many examples where a character could be seen as something closer to a syllable (in bound morphemes or set expressions). Thus, knowing the number of Chinese characters, there is no one-size-fits-all formula to calculate the number of words. It would be possible to do this only by scanning the text character by character and matching the characters together to count the words. Even a manual count would fail in some cases because, surprisingly to non-Chinese speakers, deciding how to clump characters together to make words can be very subjective.

Even the best estimate would be extremely complex to calculate, and the result varies widely depending on the type of text, when it was written and the style of writing, not to mention numerous other factors.
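The character-by-character scan mentioned above can be sketched with a simple technique called forward maximum matching: at each position, take the longest dictionary entry that matches, and fall back to a single character if nothing does. The tiny `LEXICON` below is hypothetical and exists only for this example; a real segmenter needs a very large dictionary, and a different dictionary will give a different word count.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {"北京", "达到", "但是", "虽然", "大"}
MAX_LEN = max(len(word) for word in LEXICON)


def segment(text, lexicon=LEXICON, max_len=MAX_LEN):
    """Forward maximum matching over `text`.

    At each position the longest lexicon entry wins; an unmatched
    character is emitted on its own as a one-character word.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


words = segment("北京很大")
print(words, len(words))  # ['北京', '很', '大'] 3
```

Here four characters come out as three words, but only because of what happens to be in the lexicon, which is exactly the subjectivity problem described above.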

I know of a couple of agencies who will sneak the word “word” into their PO, even though the agreement in the email chain was for payment for a set number of characters. Only if you double check their calculation will you realize they have done some kind of arbitrary division to estimate the number of Chinese words, based on the number of Chinese characters. This inevitably will favor the agency over the translator. Sometimes I’ve been caught out by not checking carefully before accepting a job and subsequently had to deliver a job for less than half of a fair rate of pay. It’s something we have to watch out for.

Calculator

A good start would be to follow the above formula for calculating the number of English words. That means 100 Chinese characters is equal to about 70 English words. Now we need to determine how many “Chinese words” are equivalent to one “English word”. Again, it really depends. For example, the word “but” in English can be written as “但是” in Chinese (two characters, but only one word) but can also be written just as “但” (one character, and also only one word). Then there are expressions like “虽然…但是” (although…but). In English we don’t write the second “but” in the equivalent of that structure: we say “although he was late today, he’s usually punctual”, whereas in Chinese we would say “although he was late today, but he’s usually punctual” (as an aside, that’s a very common issue I find in bad translations done by non-native speakers of English). Here the Chinese “但是” has vanished from the English text altogether (2 characters, but 0 words!). So depending on use, the same connective takes two characters or one in Chinese, and one word or none in English.

Generally I find that you are more likely to find Chinese characters that vanish when we translate into English than vice versa. Chinese tends to be more repetitive than English and has more redundant text (a simple example would be something like “to open the door, we found the key, so that we could open the door”). Thus, as a rule of thumb, I tend to take the following estimates: 100 Chinese characters is about 80 Chinese words, which will translate to about 70 English words.
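One practical use of these ratios is checking a quote that has been restated in “words”. The Python sketch below converts a per-character rate into roughly equivalent per-word rates using the 100:80:70 rule of thumb. The function name is my own, and the 0.08 rate is a made-up figure in no particular currency.

```python
def equivalent_rates(rate_per_char, cn_words_per_100=80, en_words_per_100=70):
    """Convert a per-character rate into roughly equivalent per-word rates.

    Fewer words than characters means each word must be paid more
    for the job total to stay the same.
    """
    return {
        "per_chinese_character": rate_per_char,
        "per_chinese_word": rate_per_char * 100 / cn_words_per_100,
        "per_english_word": rate_per_char * 100 / en_words_per_100,
    }


rates = equivalent_rates(0.08)  # hypothetical rate, for illustration only
for unit, rate in rates.items():
    print(f"{unit}: {rate:.3f}")
```

If a purchase order quotes the per-character figure against a “word” count, the total comes out lower than agreed, so the unit is worth checking before accepting the job.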

Summary

In short:

  1. The only totally accurate way to count Chinese characters is to count the actual characters (marked in Microsoft Word as “Asian characters”).
  2. However, numerals may need to be added to the above total number of characters if the text includes western-style numbers (annoyingly, Chinese numerals are counted as characters while western numerals are not, even though it’s common practice to mix the two in good Chinese writing).
  3. To estimate the number of English words, take a ratio of about 1.5 to 1, or 100:70, depending on the text.
  4. To estimate the number of Chinese words from the number of Chinese characters, take about 100:80 as a rough rule of thumb.