A good PDF2JPG converter for Windows, anyone?
Thread poster: Samuel Murray
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Sep 18, 2017

Hello everyone

Can you recommend a PDF to JPG converter that runs on Windows 7 that can convert an editable PDF's individual pages to high quality JPGs of around 4000 x 5500 pixels? BMP or similar format is fine as well. My PDF is 400 pages long and weighs in at 1.4 MB. Free, if possible.

Thanks
Samuel


 
Jean Dimitriadis
English to French
PDFBox Sep 18, 2017

Hello Samuel,

Have you tried PDFBox?

https://pdfbox.apache.org/

PDFBox comes with a series of command-line utilities. They are available as standard Java applications.

Check the PDFToImage utility at https://pdfbox.apache.org/2.0/commandline.html (JPG and PNG output are supported).
You can adjust the output resolution with the -dpi option. Try -dpi 600.

It can also extract images from editable PDFs (see ExtractImages on the above link), among other things.

Note: I’m currently using it on GNU/Linux, but since it is a Java application, it should work just as well on Windows.
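
For anyone who wants to script the PDFToImage call described above, here is a minimal sketch rather than a tested recipe. The jar name `pdfbox-app-2.0.x.jar` and the file name `glossary.pdf` are placeholders, and the `-format` option is an assumption based on the PDFBox 2.0 command-line page, so check both against the version you actually download.

```python
import subprocess


def pdftoimage_command(jar_path, pdf_path, dpi=600, image_format="jpg"):
    """Build the argument list for PDFBox's PDFToImage utility."""
    return [
        "java", "-jar", jar_path,
        "PDFToImage",
        "-format", image_format,  # assumed option name, see the note above
        "-dpi", str(dpi),         # the resolution option mentioned in this post
        pdf_path,
    ]


def convert(jar_path, pdf_path, dpi=600):
    """Run the conversion; needs Java and the PDFBox app jar on disk."""
    subprocess.run(pdftoimage_command(jar_path, pdf_path, dpi), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run it.
    print(" ".join(pdftoimage_command("pdfbox-app-2.0.x.jar", "glossary.pdf")))
```

Building the command as a list avoids quoting problems with Windows paths that contain spaces.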

PS: May I ask what you intend to do with those images? Other utilities can help you further manipulate the resulting images, depending on your purpose.

Jean

[Edited at 2017-09-18 14:21 GMT]

[Edited at 2017-09-18 14:22 GMT]


 
esperantisto
Member (2006)
English to Russian
SITE LOCALIZER
IrfanView Sep 18, 2017

Try IrfanView. It is a graphics viewer with a lot of conversion options.

 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Jean Sep 18, 2017

Jean Dimitriadis wrote:
May I ask what you intend to do with those images?


I have a glossary in PDF format, and the page layout has two columns. My OCR program sometimes recognises such pages as a grid (i.e. converts them to a table), sometimes recognises them as two columns (i.e. writes column 2 under column 1 in the final file), and sometimes recognises no columns at all (i.e. "merges" the two columns line by line, with multiple spaces between them).

I want to convert the PDF to JPGs and then use e.g. XnView to slice the JPGs vertically in half, so that I can feed the halves to the OCR program again. For this to work, the JPGs must be high quality and they must be big (around 2000 x 5500 pixels per half), otherwise the OCR produces errors.
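
The vertical slicing step could also be scripted instead of done in XnView. The sketch below swaps in the third-party Pillow library for that job and assumes the gutter between the two columns sits exactly at the horizontal midpoint of the page, which may not hold for every layout. The function names are made up for illustration.

```python
def half_boxes(width, height):
    """Return (left, upper, right, lower) crop boxes for the left and right halves."""
    mid = width // 2
    return (0, 0, mid, height), (mid, 0, width, height)


def split_page(path):
    """Save the two halves of one page image next to the original (needs Pillow)."""
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(path) as page:
        left_box, right_box = half_boxes(*page.size)
        stem = path.rsplit(".", 1)[0]
        page.crop(left_box).save(stem + "_left.jpg", quality=95)
        page.crop(right_box).save(stem + "_right.jpg", quality=95)


if __name__ == "__main__":
    # Crop boxes for a 4000 x 5500 page: two halves of 2000 x 5500 each.
    print(half_boxes(4000, 5500))
```

Looping `split_page` over all page images would produce the left and right halves for the whole document in one go.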

When the OCR function runs on an editable PDF, it actually extracts the text, but when it runs off JPGs, it does actual OCR.


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Re: IrfanView Sep 18, 2017

esperantisto wrote:
Try IrfanView. It is a graphics viewer with a lot of conversion options.


Thanks. IrfanView (with the PDF plugin installed) does convert individual pages of a PDF file to images, but the user has no control over the size of the images. It converts my PDF to images of 816 x 1056 pixels, which is woefully small.
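
A quick back-of-the-envelope check on those numbers: 816 x 1056 pixels is what an 8.5 x 11 inch (US Letter) page gives at 96 dpi. That the pages are Letter-sized and were rendered at 96 dpi is an inference from the pixel counts, not something confirmed in this thread. Under that assumption, the resolution needed for roughly 4000 x 5500 pixels works out as follows.

```python
import math


def required_dpi(target_px, page_in):
    """Smallest whole DPI that yields at least target_px pixels along one side."""
    return math.ceil(target_px / page_in)


if __name__ == "__main__":
    # Assumed: the 816 x 1056 output was rendered at 96 dpi.
    width_in = 816 / 96    # 8.5 inches
    height_in = 1056 / 96  # 11.0 inches
    print(width_in, height_in)
    print(required_dpi(4000, width_in), required_dpi(5500, height_in))
```

So a converter with a resolution setting would need about 500 dpi to reach the target, and 600 dpi would give 5100 x 6600 pixels on the same page size.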


 
Jean Dimitriadis
English to French
@Samuel Sep 18, 2017

I see, thank you for providing more details.

In this case, I would suggest you try Tabula directly on the PDF.

http://tabula.technology/

Tabula is a tool for liberating data tables locked inside PDF files.

If the columns are well defined, it can help you quickly select the table area and then export it as CSV or other formats.

I have already used it to extract glossaries from PDFs.

It works fine as long as the PDF is editable and the information is in table form.

Jean


 
neilmac
Spain
Spanish to English
The culprit Sep 18, 2017

I hate getting sent PDFs which turn out to be JPGs. Now I know who's behind it all!

But seriously, I've often wondered why people do this and usually assume it's an oversight, especially when they are sending something which has to be modified/edited/translated... It would also be interesting to know if there is some kind of program that does the reverse, i.e. converts JPG back into PDF, or some other more amenable format.


 
esperantisto
Member (2006)
English to Russian
SITE LOCALIZER
OCR Sep 18, 2017

Samuel, I think you're about to go down the wrong path. You would do better to explore the features of your OCR program. ABBYY FineReader has a feature for manually splitting pages or, even better, for manually marking up tables.

 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Re: ABBYY FineReader Sep 18, 2017

esperantisto wrote:
I think you're about to go down the wrong path. You would do better to explore the features of your OCR program. ABBYY FineReader has a feature for manually splitting pages or, even better, for manually marking up tables.


Manually marking up 400 pages is a non-starter, I'm afraid.

The "manually split" option is slightly faster than manually marking up whole pages, but only by a little bit. There is an option to automatically split images, but that is meant for cases where two pages were scanned onto a single PDF page. FineReader can't detect my pages' split point automatically.

You can see a sample page here (I've marked the left column in red, and two glossary entries with blue):
[Image: to split]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Jean and @Mac Sep 18, 2017

Jean Dimitriadis wrote:
In this case, I would suggest you try Tabula directly on the PDF.
http://tabula.technology/


Thanks, I can confirm that Tabula produces useful output. One has to select the two columns manually, but then there is an option to repeat the selections on subsequent pages automatically. It doesn't appear to have an option to preserve formatting such as font colours or bold.

neilmac wrote:
I hate getting sent PDFs which turn out to be JPGs. ... I've often wondered why people do this and usually assume it's an oversight, especially when they are sending something which has to be modified/edited/translated...


Yes, but that is a different topic altogether. Converting an editable PDF to a PDF with embedded images is a one-way conversion. The only way to convert in the other direction is to use an OCR program or a human typist.


 
Lincoln Hui
Hong Kong
Member
Chinese to English
Irfanview Sep 19, 2017

Samuel Murray wrote:

esperantisto wrote:
Try IrfanView. It is a graphics viewer with a lot of conversion options.


Thanks. IrfanView (with the PDF plugin installed) does convert individual pages of a PDF file to images, but the user has no control over the size of the images. It converts my PDF to images of 816 x 1056 pixels, which is woefully small.

As far as individual pages are concerned, it's basically a matter of resizing each one to whatever size you please and then exporting it to a graphics format, just as you would with any other type of image. I don't know if it has a way to deal with multiple pages.


 

