OCR and Translation


Optical Character Recognition (OCR) is software that converts the text in scanned PDFs, images, and other non-editable documents into an editable, usable format. OCR software is not necessarily a costly investment, and it can save linguists and translation agencies alike a great deal of time.

How does it work?

OCR software works with files whose text cannot be edited in standard word processing applications such as Microsoft Word. This includes scanned PDF files, images, and in some cases handwritten documents. The software uses pattern-recognition algorithms to identify the characters on the page, then exports the recognized text to an editable format such as a word processing document. This output can be used for a variety of purposes.
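To make the idea of detecting patterns and characters concrete, here is a deliberately tiny sketch in Python. It assumes each scanned character has already been cut out as a 3x5 grid of pixels and matches it against stored templates by counting the pixels that differ. The template set and the function names are invented for illustration; commercial OCR engines use far more sophisticated models than this.

```python
# Toy 3x5 glyph templates ("#" = ink, "." = blank). Hypothetical data for
# illustration only; real OCR engines do not work from hand-drawn grids.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_distance(glyph_a, glyph_b):
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize_glyph(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda char: pixel_distance(glyph, TEMPLATES[char]))


def recognize_line(glyphs):
    """Recognize a sequence of glyphs and join the results into text."""
    return "".join(recognize_glyph(glyph) for glyph in glyphs)


# A "T" with one pixel lost in scanning is still closest to the T template.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognize_line([TEMPLATES["L"], TEMPLATES["I"], noisy_t]))
```

Choosing the nearest template rather than requiring an exact match is what lets even this toy version tolerate a small amount of scanning noise.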

For the Translation Agency:

OCR provides great value to translation agencies when creating estimates and quotes for clients. Suppose a translation assignment contains many handwritten documents, as well as several scanned images with large amounts of text. Without OCR software, the translation agency would likely skim the documents and simply estimate the word count. These estimates are often inaccurate, but they are the only practical option, as counting the words manually is unrealistic, especially for large sets of documents. OCR allows a project manager to get a far more accurate word count without spending a large amount of time.
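Once OCR has produced editable text, the word count behind a quote is simple to compute. The sketch below, in Python, assumes the OCR output is already available as one string per page; the function names and the per-word rate are hypothetical. Word-counting conventions vary between agencies and tools, and a whitespace-based count like this one does not suit languages such as Chinese or Japanese.

```python
import re

# A "word" here is a run of letters or digits, optionally joined by
# apostrophes or hyphens (e.g. "client's", "well-known"). This is one
# possible convention, not an industry standard.
WORD_PATTERN = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def count_words(text):
    """Count the words in a block of OCR output text."""
    return len(WORD_PATTERN.findall(text))


def estimate_quote(pages, rate_per_word):
    """Return (total word count, quoted price) for a list of page texts."""
    total_words = sum(count_words(page) for page in pages)
    return total_words, round(total_words * rate_per_word, 2)


# Hypothetical OCR output for a two-page document and a made-up rate.
pages = [
    "The client's well-known contract, dated 2019.",
    "Signed by both parties.",
]
total_words, price = estimate_quote(pages, rate_per_word=0.10)
print(f"{total_words} words, quote: {price:.2f}")
```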

For the Linguist:

Most professional translators use some sort of computer-assisted translation (CAT) tool, such as Trados. If a linguist is handed scanned or handwritten files like those described above, the CAT tool cannot read the text of the documents, so the linguist cannot reuse any previously translated strings and phrases. When OCR is used, the output is an editable text file that the CAT tool can process, saving the linguist a great deal of time on the project.
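The reuse of previously translated strings can be pictured as a lookup against a store of past translations. The Python sketch below uses the standard library's difflib to find the closest earlier segment. It is only an illustration of the idea: the function name, the sample data, and the 0.75 threshold are assumptions, and it does not reflect how Trados or any other CAT tool scores matches internally.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, translation_memory, threshold=0.75):
    """Find the stored source segment most similar to `segment`.

    `translation_memory` maps source segments to their translations.
    Returns (source, translation, score), or None when nothing scores
    at or above the threshold.
    """
    best_source, best_score = None, 0.0
    for source in translation_memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    return best_source, translation_memory[best_source], best_score


# Hypothetical store of earlier translations (English to Spanish).
translation_memory = {
    "The contract is valid for one year.": "El contrato es válido por un año.",
}

# A segment extracted by OCR that closely resembles an earlier one.
match = best_tm_match("The contract is valid for two years.", translation_memory)
if match is not None:
    source, translation, score = match
    print(f"Similar segment found ({score:.0%}): {translation}")
```

None of this is possible while the text is locked inside an image, which is why the OCR step has to come first.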
