Garbled text from PDF in Studio 2022
Thread poster: toasty
toasty
Italy
Member (2013)
Italian to English
Mar 6, 2023

Hello all,

Recently, whenever I try to translate PDFs in Studio 2022, the text comes out garbled, as you can see here:
[Image: garbled PDF]

This has happened with two different PDFs from two different clients, so I don't think it's a matter of a corrupted file.

Any way to fix this without converting to Word first?

Thanks in advance!


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
These look like OCR errors Mar 6, 2023

toasty wrote:
Any way to fix this without converting to Word first?

There is no way to fix this without converting to Word. In fact, converting to Word is what Trados does as well. Trados tries to perform OCR (optical character recognition) on the file, but whether it produces useful output depends on how bad the PDF file is to begin with. I'm guessing your PDF files are so poor that a standard OCR process can't convert them to text. This means that you have to convert them to text manually, by typing the text into a Word file.


 
toasty
Italy
Member (2013)
Italian to English
TOPIC STARTER
High-quality PDFs actually Mar 6, 2023

Hi Sam,

The files are high-quality PDFs created from InDesign, not bad scans of old documents or something. In one case I managed to get the InDesign file, but as we all know, Studio is "incompatible" with those now too.

In any case, thanks for confirming. I'll work directly from the PDF.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
SolidPDF Mar 6, 2023

Samuel Murray wrote:

Trados tries to perform OCR (optical character recognition) on the file


Doesn't SolidPDF (the converter) try to convert without OCR first?

The extra markup is likely caused by kerning instructions in InDesign.

Perhaps you can convert the PDF manually to DOCX and then use either TransTools or David's CodeZapper?

BTW: You can send me an example and I will see how my CAT tool's OCR filter handles it.
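
To illustrate the kerning point above: when a converter turns every change in character spacing into its own formatting run, a sentence ends up chopped into fragments separated by tags. Below is a toy Python sketch of the general clean-up idea (merge neighbouring runs that differ only in spacing). The `Run` class, its fields and `merge_runs` are hypothetical names made up for this example. This is not how SolidPDF, TransTools or CodeZapper actually work internally, just the principle as I understand it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A hypothetical text run: a piece of text plus its formatting."""
    text: str
    bold: bool = False
    italic: bool = False
    spacing: int = 0  # character spacing (kerning/tracking), arbitrary units


def _same_format(a: Run, b: Run) -> bool:
    """True if two runs would look the same after normalisation."""
    return (a.bold, a.italic, a.spacing) == (b.bold, b.italic, b.spacing)


def merge_runs(runs, ignore_spacing=True):
    """Merge adjacent runs whose formatting matches.

    With ignore_spacing=True, spacing is reset to 0 first, so runs that
    differ only in kerning are joined and the extra tags disappear.
    """
    merged = []
    for run in runs:
        if not run.text:
            continue  # drop empty runs, they only add tags
        spacing = 0 if ignore_spacing else run.spacing
        current = Run(run.text, run.bold, run.italic, spacing)
        if merged and _same_format(merged[-1], current):
            prev = merged[-1]
            merged[-1] = Run(prev.text + current.text,
                             prev.bold, prev.italic, prev.spacing)
        else:
            merged.append(current)
    return merged


# Made-up example: one plain phrase and one bold phrase, each split by kerning.
fragments = [
    Run("Tr", spacing=-2),
    Run("ados ", spacing=0),
    Run("Studio", bold=True, spacing=1),
    Run(" 2022", bold=True, spacing=0),
]

for run in merge_runs(fragments):
    print(repr(run.text), "bold" if run.bold else "plain")
```

With spacing ignored, the four fragments collapse into two runs (one plain, one bold), which is roughly what you want to see in the editor instead of a tag after every few letters. Real formatting changes such as bold are kept, so the segment still carries the tags that matter.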


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Hans makes Mar 6, 2023

Hans Lenting wrote:
Doesn't SolidPDF (the converter) try to convert without OCR first?

Hans makes a good point: the mistakes I see look like typical OCR errors, but it could be that Trados is just confused by some elements in the files.


 

