I am often asked about a wide variety of file formats and whether each one is usable for translation. I have yet to come across a file format that cannot be translated, but some formats are certainly easier to translate than others, or at least require far fewer steps to complete. Here are some of the most common file formats and the steps required to take each from original file to newly translated document:
MS Word is by far the easiest file format to translate. No text needs to be converted; it can simply be translated in its current environment. It also helps that, more often than not, the desired final file format is MS Word. Almost every computer-assisted translation (CAT) tool in use today works directly with Word. Word is the ideal format for translation, so the goal for other file formats is simply to convert them to it.
There are two types of PDF files I will discuss here. The first is PDFs that contain only text, where the text is selectable. By selectable, I mean text that can be highlighted, copied, and pasted into another location. This format is almost as easy as MS Word. If the text is selectable, the file can often be converted directly to a Microsoft Word .docx. Otherwise, the text can simply be copied and pasted into a Word document.
PDFs with unselectable text are trickier than their selectable counterparts. The documents I see most often in this format are handwritten files that have been scanned. There are two options for translating a file like this. The first is to use Optical Character Recognition (OCR) software, which reads the text and tries to replicate the original document in MS Word. The problem is that OCR is not guaranteed to be 100% accurate, so the final results will vary depending on the quality of the original text. The second is to skip converting the source text and type the translation directly into a Word document. The problem with this approach is that a CAT tool cannot be used for the assignment.
These file formats require more steps than any of the previous file types, and they require the expertise of individuals familiar with the software. Text in these file formats must be extracted from the original and placed in an MS Word file. From there, the linguist provides the translation and returns it to be placed back in the original format. While this does not seem especially difficult, the step must be repeated several times during the review process. It is simply more work because each change requires extraction and re-insertion.
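The extract, translate, and re-insert cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular layout program works: the layout is modeled as a plain list of text frames, and the names `extract_segments`, `reinsert_segments`, and the `seg000`-style IDs are hypothetical.

```python
# Hypothetical model: a layout is a list of frames, each with a style and text.
# Real layout software stores far more, but the round trip has the same shape.

def extract_segments(layout):
    """Separate translatable text from the layout, keyed by a segment ID."""
    skeleton, segments = [], {}
    for index, frame in enumerate(layout):
        segment_id = f"seg{index:03d}"
        segments[segment_id] = frame["text"]
        # The skeleton keeps the formatting and remembers where the text goes.
        skeleton.append({"style": frame["style"], "segment_id": segment_id})
    return skeleton, segments


def reinsert_segments(skeleton, translations):
    """Rebuild the layout by placing each translation back in its frame."""
    return [
        {"style": frame["style"], "text": translations[frame["segment_id"]]}
        for frame in skeleton
    ]


layout = [
    {"style": "headline", "text": "Welcome"},
    {"style": "body", "text": "Read the manual."},
]

# Step 1: extract the text (this is what would be handed to the linguist).
skeleton, segments = extract_segments(layout)

# Step 2: the linguist returns translations under the same segment IDs.
translations = {"seg000": "Bienvenido", "seg001": "Lea el manual."}

# Step 3: re-insert. Every review change means repeating steps 2 and 3.
translated_layout = reinsert_segments(skeleton, translations)
```

Keeping the segment IDs stable is what makes the repeated review rounds manageable: a corrected translation replaces one entry and the layout is rebuilt, rather than being edited by hand each time.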