Word count

This page summarizes how Translia calculates the word count for source files.

First, only the text contained in segments is counted. For files that contain a lot of markup, such as HTML, the text to translate is extracted and stored in segments. The markup around this text is not included in the word count.
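As a rough illustration of the extraction step, the Python sketch below pulls the text out of a small HTML fragment and drops the surrounding markup. It is not Translia's implementation: the names SegmentExtractor and extract_segments are hypothetical, and the sketch assumes every tag is a segment boundary, which is simpler than real segmentation.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collects text between tags (hypothetical helper, not Translia's code)."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        # Keep non-empty text runs; the tags themselves never reach this method.
        text = data.strip()
        if text:
            self.segments.append(text)


def extract_segments(html):
    """Return the text runs found in an HTML string, in document order."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


sample = "<div><h1>Word count</h1><p>Only this text is counted.</p></div>"
segments = extract_segments(sample)
```

Here only the heading and paragraph text end up in segments; the div, h1, and p tags contribute nothing to the count.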

Second, languages are divided into two types, and a different word count algorithm is applied to each.

  • For languages that separate words with spaces, a word is a text string delimited by spaces. Examples: English, Indian languages, Indonesian, Farsi, Arabic, Turkish, Hebrew, and Korean.
  • For languages that do not use spaces as word separators, the character count is used instead, although we still call it the word count for simplicity. Examples: Chinese, Japanese, and Thai.

Text that mixes both types of languages is counted separately for each type, and the total word count is the sum of the counts for all languages.
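The two counting rules and their sum can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Translia's actual algorithm: the function name count_words is hypothetical, the Unicode ranges chosen for Chinese, Japanese, and Thai are an assumption, and the treatment of punctuation is left to plain whitespace splitting because this page does not specify it.

```python
import re

# Assumed ranges for scripts counted per character:
# CJK ideographs, Hiragana, Katakana, and Thai.
PER_CHARACTER = re.compile(r"[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\u0e00-\u0e7f]")


def count_words(text):
    """Count per-character scripts by character and everything else by spaces."""
    # Each matching character counts as one "word".
    character_count = len(PER_CHARACTER.findall(text))
    # Blank out those characters so the remaining text splits cleanly on spaces.
    remainder = PER_CHARACTER.sub(" ", text)
    spaced_count = len(remainder.split())
    return character_count + spaced_count


english = count_words("Hello world")         # 2 space-separated words
chinese = count_words("你好世界")             # 4 characters
mixed = count_words("Hello 世界 world")       # 2 words + 2 characters
```

For the mixed string, the English part contributes 2 and the Chinese part contributes 2, so the total is 4.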

A note on markup inside segment strings: these tags are included in the word count. We think this is fair because translators need to handle them carefully, even though they do not need to translate them.
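To show how an inline tag can change the count, the Python sketch below compares a segment counted with its tags against the same segment with the tags stripped. How Translia weighs each tag is not described on this page; the sketch simply assumes plain splitting on spaces, and both function names are hypothetical.

```python
import re


def count_with_tags(segment):
    """Split on whitespace, leaving inline tags in place (assumed rule)."""
    return len(segment.split())


def count_without_tags(segment):
    """Strip anything that looks like a tag first, for comparison only."""
    return len(re.sub(r"<[^>]+>", "", segment).split())


segment = 'Click <a href="/help">here</a> to continue'
with_tags = count_with_tags(segment)        # tag attribute adds a token
without_tags = count_without_tags(segment)  # plain text only
```

Under this assumption the segment counts as 5 with the tag and 4 without it, because the space inside the opening tag creates an extra token.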