
Complex Desktop Publishing: Hieroglyphic PDF File Recreation for Translation

This case study describes how a world-leading language service provider and Palex Group worked together to turn a highly complex PDF file into a “live”, editable document on a tight schedule.

Background

The client urgently needed 14 scanned pages of a 50-year-old medical manuscript in Japanese and English recreated in Word format for subsequent translation. Such a complicated request might seem flatly impossible to another team, but not to Palex.
Source text in PDF

When does one need to perform this kind of job?

There is a non-editable source file (image, video, PDF, etc.) with content that you need to copy, edit or translate freely. Simply saving the file as an editable copy is not an option here. The file might convert to an editable format, but in a very odd way, so you still have to work hard to make the text and formatting match the original file.

What is OCR and preparation for translation?

To make an editable file look just like the source file, you always need to perform two steps.

1. Optical Character Recognition or OCR

OCR software converts the text and content of the source file into a live, editable format. This step introduces many inaccuracies in both content and formatting, which is why human input is still required.

2. Desktop Publishing Preparation or DTPP

During this step, a DTPP professional prepares the editable file for further use or translation. For example, they make sure that all the required text is visible, the necessary formatting is applied consistently, no extra line breaks are left, and so on.

Can I do OCR and DTPP by myself?

Sure, you can. Just remember to take some factors into account: the locale and fonts for the language installed on your PC, appropriate OCR software, and plenty of time for formatting the file and for Quality Assurance of the content and format.

When working on easily formattable European languages, Palex DTP engineers can reach 20-25 pages per hour, plus 1 minute per page for QA.

Difficult Asian and right-to-left languages with sophisticated layouts go at about 7-9 pages per hour, plus more than 2 minutes per page for QA. Production can take even more time depending on the file.
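The throughput figures above can be turned into a rough time estimate. The following is a minimal sketch, not Palex's actual estimation method: the names `Rates`, `RATES` and `estimate_hours` and the labels `"european"` and `"asian_or_rtl"` are hypothetical, and the Asian/RTL QA rate is set to exactly 2 minutes per page even though the text says "more than 2 minutes", so the result is a lower bound.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rates:
    dtp_pages_per_hour: Tuple[float, float]  # (slowest, fastest) DTP speed
    qa_minutes_per_page: float


# Figures quoted in the text; the dictionary keys are illustrative labels.
RATES = {
    "european": Rates(dtp_pages_per_hour=(20, 25), qa_minutes_per_page=1.0),
    # The text says "more than 2 minutes" per page, so 2.0 is a lower bound.
    "asian_or_rtl": Rates(dtp_pages_per_hour=(7, 9), qa_minutes_per_page=2.0),
}


def estimate_hours(pages: int, language_class: str) -> Tuple[float, float]:
    """Return (best_case, worst_case) hours for DTP plus QA."""
    rates = RATES[language_class]
    slowest, fastest = rates.dtp_pages_per_hour
    qa_hours = pages * rates.qa_minutes_per_page / 60
    return (pages / fastest + qa_hours, pages / slowest + qa_hours)


best, worst = estimate_hours(14, "asian_or_rtl")
print(f"Estimated effort: {best:.1f} to {worst:.1f} hours")
```

These are nominal rates only. A file that needs manual character recognition, like the one in this case study, can take far longer than such an estimate suggests.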
If you do not have much time for this job, contact an expert to estimate the turnaround time and budget within 1 business day.

Can I skip the OCR and DTPP steps for translation?

Yes. One option is to produce an “at sight” translation directly in an editable file. This works if you are happy with plain target text without formatting and do not care about translation quality. It hardly works with tables, formulas, flowcharts and other elements.

This solution also precludes the use of CAT tools and comes with a high risk of omissions, additions, inconsistent terminology and other translation issues that could otherwise be easily checked and corrected with a QA tool.

If you then use the target text as the source file for further target languages, the risks and issues multiply by the number of languages.
We recommend that you contact an expert who will estimate all the risks and provide you with the best solution within your budget.
Let us get back to our 14-page medical manuscript and the challenges we enjoyed.

Japanese language facts

  • ~2,800 common characters
  • 50,000+ characters in the total set

A Japanese font should support Kanji characters and both Kana scripts (Hiragana and Katakana), as Japanese uses all three.
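To see which of the three scripts a given text actually uses, and therefore what a font has to cover, characters can be classified by Unicode block. This is a minimal sketch under stated assumptions: the names `SCRIPT_RANGES`, `script_of` and `count_scripts` are hypothetical, and only the basic Hiragana, Katakana and CJK Unified Ideographs blocks are checked (extension blocks and half-width forms are ignored).

```python
from collections import Counter

# Basic Unicode blocks only; CJK extension blocks are not covered in this sketch.
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # CJK Unified Ideographs
}


def script_of(ch: str) -> str:
    """Return the script name for one character, or "other"."""
    code = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code <= high:
            return name
    return "other"


def count_scripts(text: str) -> Counter:
    """Count non-whitespace characters per script."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


counts = count_scripts("日本語のテキスト")
print(dict(counts))
```

A count like this only shows which scripts are present. It does not verify that a particular font contains every glyph, nor that OCR recognized each character correctly.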


Facts about case study file

✅ Text in the file could be highlighted and copied.
✅ All the fonts are available.

❌ Automated OCR by regular software fails: hieroglyphs are corrupted.

The problem is that Japanese OCR is still far from ideal. Methods used for the Latin alphabet do not perform well with Japanese because of the complexity and sheer number of Japanese characters.


The challenge

Having analyzed the file, the Palex DTP team qualified it as “highly complicated”:

  • Medical content must be treated carefully to avoid any issues
  • Columns and an Asian language require additional time for OCR and formatting
  • Automated OCR with regular software fails, which means manual work on the hieroglyphs
  • Japanese DTP resources are not available within the requested budget

Key factors:

  • High quality risk
  • Medical content
  • Columns and mix of languages
  • Asian language
  • Two days' turnaround time
  • Limited budget
  • Failed automated OCR of hieroglyphs
  • Non-native DTP team

The solution

Pre-production

The team is the key factor that affects the time and quality of this project.

Native Japanese DTP&QA team

  • All the steps are outsourced
  • Lower language quality risks
  • Risk of TAT failure, formatting issues due to untested quality, an unprofitable result, and limited time to find the resources

Palex non-native DTP&QA team

  • All the steps are in-house
  • Lower formatting and budget risks
  • High language quality risks and TAT risks due to heavy manual work

💡 HYBRID SOLUTION

Palex DTP&QA team + native Japanese QA

  • In-house OCR, outsourced QA by a native Japanese linguist, then in-house final DTP&QA
  • Lower language quality risks
  • Lower formatting and budget risks
  • Risk of TAT failure due to manual work, but it is not very high
The Project Manager always analyzes the risks, pros and cons of every workflow. Relying on either team alone was too risky, which is how the third variant came about.
Production

Step 1

Automated and Manual OCR

  • Palex DTP engineer
  • 6+ hours of work

Automated OCR and manual recognition. Difficult hieroglyphs and formatting are ignored at this stage.

Step 2

Native QA

  • Native Japanese linguist
  • 2+ hours of work
  • 700+ corrections on 14 pages

Adding missed hieroglyphs. Checking other recognized content.
Step 3

Final DTP and QA

  • Palex DTP engineer & QA specialist
  • 6+ hours of work
  • 100+ comments

Implementation of the linguist's corrections. Quality check and formatting.

Post-production

  • Project delivery to the client
  • Project finalization and update of project references to keep all the information in order
  • Project post-mortem to analyze what went right or wrong and to develop preventive actions for future projects

The results

  • 2-day turnaround time
  • 13 hours of DTP work
  • 2 hours of native checks
  • 5 hours of non-native checks
  • 100% fit in budget

Client benefits

01 Excellent-quality source file for translation, recreated from an uneditable PDF as an editable Word file in a cost-effective way
02 Complex Asian OCR is added to the client's list of services
03 Strong expert reputation and customer loyalty (with the help of Palex)
04 Old manuscripts are revitalized for effective worldwide use
Overall, the work on the file took 1+ hour per page. It is hardly the fastest result, but we are still proud of it. We learned our lessons. The native Asian DTP team was successfully tested, and we are ready for new challenges!

Expert says

There are some languages, such as hieroglyphic and right-to-left ones, that require more time for the Desktop Publishing Preparation and Desktop Publishing steps. When the task is to process these languages as source or target, you need to thoroughly review the content and layout to estimate risks, turnaround time and budget.

The Palex team of Project Managers, Desktop Publishing Engineers and Quality Control Specialists has a deep understanding of the most complicated localization engineering tasks, extensive experience and a few magic tricks.

We take care of the most difficult files that are rejected by other teams and that would seem impossible to process within a short time frame.

Denis Sergeev

Localization Engineering Team Leader

Why PALEX?

Localization Engineering Services

We know everything about recent Desktop Publishing and Multimedia trends and technologies.

We support our clients on their way to conquering the world. We help localize all types of materials, from simple Microsoft Office files to non-editable PDF files and complex e-Learning courses with video, interactive tasks, subtitles, voice-over and questionnaires.

Subject Expertise

Palex has expertise and a dedicated multilingual team for about 50 markets and is ready to support your international expansion.

Reliable Partner

Crystal-clear reputation
Experienced player on the market (since 2002)
Deep understanding of localization solutions
Localization Engineering and QA departments
$3M-liability insurance
Talented team committed to supporting the client’s mission
ISO 17100 and ISO 9001 certified