I am often asked whether a particular file format is usable for translation. I have yet to come across a file format that is not, but some are certainly easier to translate than others, or at least require far fewer steps to complete. Here are some of the most common file formats and the steps required to take each from the original file to a newly translated document:
Microsoft Word
MS Word is by far the easiest file format to translate. No text needs to be converted; it can simply be translated in its current environment. It also helps that, more often than not, the desired final file format is MS Word. Almost every computer-assisted translation (CAT) tool in use today works directly with Word files. This makes Word the ideal format for translation, so the goal with other file formats is simply to convert them to Word.
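One reason tools can work directly with Word files is that a .docx is a ZIP package whose main text lives in an XML part named word/document.xml. The sketch below pulls out paragraph text using only the Python standard library. It illustrates the idea and is not how any particular CAT tool is implemented; the names `docx_paragraphs` and `make_sample_docx` are my own, and the sample builder writes only the one part this reader needs, so it does not produce a file that Word itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by word/document.xml inside a .docx package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the non-empty paragraph texts of a .docx file.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def make_sample_docx(paragraphs):
    """Build a minimal in-memory package for demonstration purposes only."""
    body = "".join(
        "<w:p><w:r><w:t>{}</w:t></w:r></w:p>".format(escape(p))
        for p in paragraphs
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>' + body + "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = make_sample_docx(["First paragraph.", "Second paragraph."])
    for segment in docx_paragraphs(sample):
        print(segment)
```

With a real document, `docx_paragraphs("contract.docx")` would take a path instead of the in-memory sample (the file name here is only a placeholder).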
PDF (Text)
I will discuss two types of PDF files here. The first is a PDF that contains only text, and whose text is selectable. By selectable, I mean text that can be highlighted, copied, and pasted into another location. This format is almost as easy to work with as MS Word. Selectable text can often be converted directly to a Microsoft Word .docx file. If the conversion does not work, the text can simply be copied and pasted into a Word document.
PDF (Image/Scan)
PDFs with unselectable text are trickier than their selectable counterparts. The documents I see most often in this format are handwritten files that have been scanned. There are a couple of options for translating a file like this. The first is to use optical character recognition (OCR) software, which reads the text and tries to replicate the original document in MS Word. The problem is that OCR is not guaranteed to be 100% accurate, so the final results will vary depending on the quality of the original text. The second is to skip converting the source text altogether and type the translation directly into a new Word document. The drawback is that a CAT tool cannot be used for the assignment, because there is no editable source text for it to work with.
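Because OCR quality varies so much, it can help to spot-check a short passage before committing to the OCR route. The sketch below compares a hand-typed sample against the OCR output for the same passage using Python's standard difflib module. The function names are my own, and the similarity ratio is only a rough indicator, not a formal OCR accuracy metric.

```python
from difflib import SequenceMatcher


def similarity(reference, ocr_output):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    return matcher.ratio()


def differences(reference, ocr_output):
    """List (expected, found) pairs for every span where the texts differ."""
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    return [
        (reference[i1:i2], ocr_output[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical sample: a hand-checked line and what OCR produced for it.
    typed_by_hand = "Invoice total: 100 dollars"
    from_ocr = "lnvoice tota1: 1OO dollars"
    print("Similarity: {:.2f}".format(similarity(typed_by_hand, from_ocr)))
    for expected, found in differences(typed_by_hand, from_ocr):
        print("expected {!r}, OCR produced {!r}".format(expected, found))
```

A low ratio on a representative sample suggests the scan is too poor for OCR and that translating directly into a new Word document may be the faster option.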
Quark/InDesign
These file formats require more steps than any of the previous file types, as well as the expertise of someone familiar with the software. Text in these formats must be extracted from the original and placed in an MS Word file. The linguist then provides the translation and returns it to be placed back into the original layout. While this does not sound especially difficult, the cycle must be repeated several times during the review process, because each change requires extraction and re-insertion.
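The extract, translate, and re-insert cycle can be pictured with a small sketch. It assumes a made-up layout model in which each text frame has an ID, some text, and formatting. This is not the actual Quark or InDesign file structure or API; it only shows why every review change means another pass through the loop.

```python
def extract_text(layout):
    """Pull translatable text out of a layout as (frame_id, text) pairs."""
    return [(frame_id, frame["text"]) for frame_id, frame in layout.items()]


def reinsert_text(layout, translated):
    """Return a copy of the layout with translated text placed in each frame.

    Formatting is carried over untouched; the original layout is not modified.
    """
    missing = set(layout) - set(translated)
    if missing:
        raise KeyError("No translation for frames: {}".format(sorted(missing)))
    return {
        frame_id: dict(frame, text=translated[frame_id])
        for frame_id, frame in layout.items()
    }


if __name__ == "__main__":
    # Hypothetical two-frame layout standing in for a real design file.
    layout = {
        "headline": {"text": "Welcome", "font": "Helvetica Bold"},
        "body": {"text": "Opening hours", "font": "Helvetica"},
    }
    for frame_id, text in extract_text(layout):
        print(frame_id, "->", text)

    # First pass from the linguist, then a correction from the reviewer:
    # each round goes back through re-insertion.
    first_pass = {"headline": "Bienvenido", "body": "Horario de apertura"}
    after_review = dict(first_pass, headline="Bienvenidos")
    for round_of_changes in (first_pass, after_review):
        print(reinsert_text(layout, round_of_changes)["headline"]["text"])
```

Keeping frame IDs attached to the extracted text is the design choice that matters here: it lets the translation go back into the right place even after several review rounds.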