Using the Scorpio Translation system

Choosing a storage format

As already mentioned, the Scorpio translation system supports several storage formats. What are the pros and cons of each, and is any one "better" than the others? Having reviewed them all while writing test cases, these are the main pros and cons that we found for each adaptor.

Array

Pros
  • Probably the simplest format
  • Stored in a native PHP array
  • Easy for PHP developers to work with
  • Easily cached by APC or other opcode cache
Cons
  • Not friendly for a non-developer to work with
  • Files must always be stored in UTF-8, which is easy to forget
  • Only allows for a 1:1 mapping (e.g. English to Spanish)
  • No dev tools to aid translation
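
For illustration, here is a minimal sketch of what an array-based translation table and lookup might look like. The table contents and the translate() helper are hypothetical and are not part of Scorpio's API; in practice the array would normally live in its own UTF-8 encoded file that returns it.

```php
<?php
// Hypothetical English-to-Spanish table (a 1:1 mapping).
$translations = [
    'Hello'   => 'Hola',
    'Goodbye' => 'Adiós',
];

/**
 * Look up a string, falling back to the source text when no translation exists.
 */
function translate(array $table, string $text): string
{
    return $table[$text] ?? $text;
}

echo translate($translations, 'Hello'), PHP_EOL;    // Hola
echo translate($translations, 'Welcome'), PHP_EOL;  // Welcome (no entry, falls back)
```

Because the table is plain PHP, values containing quotes must be escaped by hand, which is one reason the format is awkward for non-developers.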

CSV

Pros
  • Second simplest format
  • Minimal processing overhead (PHP natively supports CSV files)
Cons
  • Not friendly for a non-developer to work with
  • Same UTF-8 concerns as with Array
  • No dev tools to aid translation
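
As a rough sketch of the CSV approach, the snippet below parses "source,translation" rows into a lookup array using PHP's built-in str_getcsv(). The column layout and the loadCsvTranslations() helper are assumptions made for this example, not the layout Scorpio requires.

```php
<?php
// Hypothetical CSV content: one "source,translation" pair per line.
$csv = "Hello,Hola\n\"Goodbye, friend\",\"Adiós, amigo\"\n";

/**
 * Parse "source,translation" rows into a lookup array.
 * Note: this simple line split does not handle quoted fields that
 * themselves contain line breaks.
 */
function loadCsvTranslations(string $csv): array
{
    $table = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        // Delimiter, enclosure and escape characters are passed explicitly.
        $row = str_getcsv($line, ',', '"', '\\');
        if (count($row) >= 2) {
            $table[$row[0]] = $row[1];
        }
    }
    return $table;
}

$table = loadCsvTranslations($csv);
echo $table['Goodbye, friend'], PHP_EOL; // Adiós, amigo
```

Fields that contain the delimiter have to be quoted, which is easy to get wrong when the file is edited by hand.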

Gettext

Pros
  • Probably the most widely used format
  • Part of the normal Unix language tools
  • CLI commands for text gathering
  • gettext is a native PHP function (provided by the gettext extension)
Cons
  • Not friendly for a non-developer to work with
  • gettext is not always enabled in PHP
  • Language files must be 'compiled' into a binary format before being used
  • No dev tools to aid translation
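
Since the gettext extension is not always enabled, calling code may want to guard against its absence. The sketch below shows one way of doing that in plain PHP; translateOrFallback() is a hypothetical helper, not a Scorpio function, and the snippet assumes no catalogue has been bound.

```php
<?php
/**
 * Translate via gettext when the extension is loaded; otherwise return the
 * source text unchanged.
 */
function translateOrFallback(string $text): string
{
    if (function_exists('gettext')) {
        // With no catalogue bound for the current locale, gettext() itself
        // returns the original string.
        return gettext($text);
    }
    return $text;
}

echo translateOrFallback('Hello'), PHP_EOL;
```

In a real setup the PO files would first be compiled to binary MO files (for example with the GNU msgfmt tool), and the catalogue selected with setlocale(), bindtextdomain() and textdomain() before any lookup.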

INI

Pros
  • Relatively simple format (some caveats)
  • Minimal processing overhead (PHP natively supports INI files)
Cons
  • Not friendly for a non-developer to work with
  • Same UTF-8 concerns as with Array
  • Only allows for a 1:1 mapping
  • Constraints on source key (e.g. keys cannot contain spaces)
  • No dev tools to aid translation

QT (also known as TS format)

Pros
  • Decent GUI available to aid translating text
Cons
  • Not friendly for a non-developer to work with if the language tools cannot be used
  • Only allows for a 1:1 mapping
  • XML processing overhead

TBX

Pros
Cons
  • Extremely complex XML format for all those relationships
  • Documentation is VERY verbose
  • Not to be used by a non-developer
  • Status of GUI tools is unknown
  • Very verbose XML format requires a hefty amount of processing

TMX

Pros
  • Open standard managed by the Localization Industry Standards Association (LISA)
  • Relatively simple XML format
  • Allows multiple translations in a single file
  • Well documented
Cons
  • Not friendly for a non-developer to work with
  • Status of GUI tools is unknown
  • XML-based format has processing performance penalties
  • Generally used in the computer-aided translation (CAT) industry (according to the LISA:TMX page)

Xliff

Pros
  • Widely used in other PHP frameworks (e.g. Symfony)
  • Relatively simple XML format
  • Several GUIs for editing and creating translations
  • Managed by an open group (OASIS)
Cons
  • Only allows for a 1:1 mapping
  • Must have a stated source and target language
  • XML-based format has processing performance penalties
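
To make the 1:1 mapping and the required language pair concrete, here is a small sketch that reads an XLIFF 1.2 document with PHP's DOM extension. The sample document is made up for this example, and the code is not how Scorpio's own adaptor is implemented.

```php
<?php
// Hypothetical XLIFF 1.2 document with a single English-to-Spanish unit.
$xliff = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="es" datatype="plaintext" original="messages">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Hola</target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML;

$doc = new DOMDocument();
$doc->loadXML($xliff);

// The <file> element states the single source/target language pair.
$file = $doc->getElementsByTagName('file')->item(0);
$pair = $file->getAttribute('source-language') . ' -> ' . $file->getAttribute('target-language');

// Build a source => target lookup from each <trans-unit>.
$table = [];
foreach ($doc->getElementsByTagName('trans-unit') as $unit) {
    $source = $unit->getElementsByTagName('source')->item(0)->textContent;
    $target = $unit->getElementsByTagName('target')->item(0)->textContent;
    $table[$source] = $target;
}

echo $pair, PHP_EOL;            // en -> es
echo $table['Hello'], PHP_EOL;  // Hola
```

Each file carries exactly one source and one target language, so a second target language means a second file.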

XMLTM

Pros
  • Open standard managed by LISA
  • Designed to work with other formats including transformation into them
  • Supports many advanced concepts to aid translation
Cons
  • Relatively new (only standardised in 2007)
  • Complex XML format
  • Not friendly for a non-developer to work with
  • Status of GUI tools is unknown

That covers the available formats. Which one to use depends on the development team and on who will be performing the translations. If your company is already using one of the above, it is a good idea to continue using it.

If you are still not sure after reading and reviewing the documentation, the Scorpio recommendation, in order of preference, is roughly:

QT / Xliff > Gettext > Array

Both QT and Xliff have decent GUIs that aid translating text. This means that if you have to translate in-house, it is easy to train other users, and they do not need to look at raw file data. Gettext is the next preferred option as it is widely used, but be aware that the PO files need to be compiled before they can be used. Gettext supports some fairly advanced features, but these have to be coded into the PO files before compiling. The last option is the plain PHP array: the simplest format with minimal processing, but not ideal for a non-coder to edit, as values need to be correctly escaped (this still applies to the other formats, but is less of an issue with a GUI).

INI files are not recommended, as there are many constraints on the key: it cannot contain spaces, only A-Z, 0-9 and a handful of punctuation marks (,.-_). Scorpio does have an exporter for INI that builds new keys, and these keys must then be used in your templates in place of the actual text strings.
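
To show what replacing text strings with generated keys can look like, here is a rough sketch in plain PHP. The makeIniKey() function and its key scheme are purely hypothetical and are not the algorithm used by Scorpio's INI exporter; parse_ini_string() is PHP's built-in INI parser.

```php
<?php
/**
 * Build an INI-safe key from a source string (hypothetical scheme):
 * lowercase, with every run of characters outside a-z and 0-9 replaced
 * by a single underscore.
 */
function makeIniKey(string $text): string
{
    $key = preg_replace('/[^a-z0-9]+/', '_', strtolower($text));
    return trim($key, '_');
}

$key = makeIniKey('Welcome back, user!');   // welcome_back_user

// The generated key, not the original text, is what the INI file contains.
$ini = $key . ' = "Bienvenido de nuevo"' . "\n";
$table = parse_ini_string($ini);

echo $key, ' => ', $table[$key], PHP_EOL;   // welcome_back_user => Bienvenido de nuevo
```

The cost of this approach is that templates no longer contain readable source text, only keys such as welcome_back_user.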