[Me and my real life cat tool*]
As a professional legal translator, I am often called upon to translate PDF
versions of official documents, utility notices and bank statements. As these
documents vary significantly in formatting complexity and font
clarity, I have to choose the most efficient way of approaching the
translation, i.e., by hand, using a CAT tool or some combination of the two. I
present a recent project that included some ten such documents that had to be
translated from Hebrew to English, explain my approach and state my personal
conclusions.
Before going into detail, non-translators may need a short explanation
of the methods. Hand translation involves building a text, line by line,
adjusting font size and column widths to create a document that is visually
identical to the original. Even with practice, this method can be quite
time-consuming unless one has a template already (which I did not have in this
case, unfortunately). The more efficient way is to use an OCR application, ABBYY
FineReader on my computer, to convert the PDF into a Word document. The application
first asks for confirmation of any character about which it is uncertain and then
creates the Word file. Such “verifications” can range from a few per page to a quarter
of the page in the worst cases. The factors influencing convertibility include the
complexity of the formatting, type of font and quality of the PDF. Translators
then import the resulting Word document into a Computer Assisted
Translation (CAT) tool, MemoQ here, which splits the text into sentence-level
segments. These are then translated one by one, with numbers and repetitions
entered automatically. Upon export, translators receive the same document in
the target language, but the formatting and font often must be tweaked to produce a
final document. This method is significantly faster in many cases and much more
accurate if numbers are involved.
The project in question involved 10 pages, ranging from a text with a
simple format and clear font (a simple letter) to texts with complex formatting
and poor PDF quality (a government notice and a utility bill), as well as texts
that contained a significant percentage of numbers combined with a short but
complexly formatted upper section (bank statements). I priced the project in terms
of time as if I would do all the documents by hand, with my “profit” being how much
more efficient I could be.
In practice, I immediately excluded three documents from OCR
processing as their formatting and font would not convert well, specifically
the utility bill and two government notices. While processing the remaining
documents in the OCR application, I removed two more as “verifying” the text and then
redoing the formatting would have been more time-consuming than simply
translating them manually. Two of the remaining documents came out almost perfect,
but the bank statements were problematic as the OCR did not produce a
document that properly reflected the complex formatting of the upper part and its
fonts of varying size. However, I chose to complete the scanning process and mark the
bank account entries as a table as it would ensure that there were no errors in
the numbers and reduce my QA time by eliminating the need to double-check
them.
Ultimately, I translated five documents by hand, essentially the
government notices and utility bills, which are quite complex in terms of
formatting. Two of the documents, the simple letter and a simple notice,
required almost no additional work after export from the CAT tool. For the
bank statements, I took a hybrid approach, creating the short upper
part with all the account details by hand but pasting in the table from the CAT tool
export, with a few minor tweaks, to ensure that the numbers were correct.
In terms of time, the CAT tool did improve my efficiency to a certain
degree but not as much as theoretically possible due to the quality of the PDF
image and the type of font in this set of documents. In the future, I will immediately
remove those segments that are currently beyond the capacity of the OCR to
recognize properly in terms of text and format in order to avoid wasting time
on “verifying” text that I will not use. On a positive note, I discovered
that the combination of manual translation and CAT usage on a single document
can be an effective method. Live and learn.
* Use picture captions to allow the blind to fully access the Internet.