How to check if pdf is PDF/A or not? And is there a way to convert PDF->PDF/A and vice versa? #2169

PasaOpasen · 2023-01-11T14:07:22Z

As I see, mostly PDF/A is PDF + fonts. Is there a way to detect PDF/A and to convert from/to PDF/A?

JorjMcKie · 2023-01-11T14:44:20Z

Maintainer

As I see, mostly PDF/A is PDF + fonts. Is there a way to detect PDF/A and to convert from/to PDF/A?

Well, that's a simplified characterization.
Anyway, PyMuPDF does not support PDF/A on output: it always saves in standard PDF format. It cannot convert to PDF/A, and there is no intention to add this in the foreseeable future.

To determine whether an input file is in one of those formats (yes: plural!), check the catalog. For such a file it will contain an "output intent dictionary"; see page 641 / "Table 365" of the PDF reference ("Document management - Portable document format - Part 1: PDF 1.7") for a description. To access it, get the catalog's xref via xref = doc.pdf_catalog() and then inspect doc.xref_get_key(xref, "OutputIntents"). This should be an array of one or more xrefs pointing to output intent dictionaries.
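The catalog check described above can be sketched as follows. This is a minimal sketch, not an official PyMuPDF recipe: the helper names parse_xrefs, output_intent_subtypes and claims_pdfa are made up for illustration, and it assumes that xref_get_key returns a (type, value) pair of strings. It reads the S entry (the output intent subtype) of each dictionary; PDF/A files use /GTS_PDFA1 there, while other PDF/* variants use other names. A match only shows that the file claims PDF/A, not that it passes validation.

```python
from __future__ import annotations

import re

# Output intent subtype used by PDF/A files (other PDF/* variants use other names).
PDFA_SUBTYPE = "/GTS_PDFA1"


def parse_xrefs(value: str) -> list[int]:
    """Return the xref numbers found in a PDF source string such as '[12 0 R 15 0 R]'."""
    return [int(num) for num in re.findall(r"(\d+)\s+\d+\s+R\b", value)]


def output_intent_subtypes(doc) -> list[str]:
    """Collect the /S (subtype) names of the output intents referenced by the catalog."""
    catalog = doc.pdf_catalog()
    kind, value = doc.xref_get_key(catalog, "OutputIntents")
    if kind == "xref":
        # The array is stored as its own object: read that object's source instead.
        value = doc.xref_object(parse_xrefs(value)[0])
    elif kind != "array":
        return []  # key is missing ("null") or has an unexpected type
    subtypes = []
    for xref in parse_xrefs(value):
        s_kind, s_value = doc.xref_get_key(xref, "S")
        if s_kind == "name":
            subtypes.append(s_value)
    return subtypes


def claims_pdfa(doc) -> bool:
    """True if at least one output intent carries the PDF/A subtype."""
    return PDFA_SUBTYPE in output_intent_subtypes(doc)


# Usage (requires PyMuPDF; "input.pdf" is a placeholder file name):
#   import fitz
#   doc = fitz.open("input.pdf")
#   print(output_intent_subtypes(doc), claims_pdfa(doc))
```

The sketch only follows output intents stored as indirect references, which is the layout described in the answer; dictionaries written inline in the array would need extra handling.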

4 replies

PasaOpasen Jan 11, 2023
Author

@JorjMcKie Regarding "see page 641 / "Table 365" ("Document management - Portable document format - Part 1: PDF 1.7")": which document do you mean?

JorjMcKie Jan 11, 2023
Maintainer

The one referenced in the PyMuPDF documentation: https://pymupdf.readthedocs.io/en/latest/app3.html#adobe-pdf-references

PasaOpasen Jan 11, 2023
Author

@JorjMcKie Thank you! So, to summarize:

There is no way to convert PDF to PDF/A (with fitz).
It is possible to check whether a document looks like PDF/A via OutputIntents.
What about PDF/A -> PDF conversion?

JorjMcKie Jan 11, 2023
Maintainer

As I wrote: whenever a PDF is saved, the result is a standard PDF, not any PDF/* variant.
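As a cross-check on the OutputIntents test discussed in this sub-thread, a PDF/A file also declares its part and conformance level in its XMP metadata, under the pdfaid namespace. This is a different technique from the one in the answer, shown here only as a hedged sketch: the function name pdfa_claim_from_xmp is hypothetical, the regular expressions are a shortcut rather than real XML parsing, and obtaining the XML via doc.get_xml_metadata() assumes a PyMuPDF version that offers that method.

```python
import re


def pdfa_claim_from_xmp(xml: str):
    """Return (part, conformance) declared in XMP metadata, or None if no PDF/A claim is found.

    Handles both the element form (<pdfaid:part>1</pdfaid:part>) and the
    attribute form (pdfaid:part="1"). Conformance may be None when it is absent.
    """
    part = re.search(r"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d+)", xml)
    if part is None:
        return None
    conformance = re.search(r"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z]+)", xml)
    return (part.group(1), conformance.group(1) if conformance else None)


# Usage (requires PyMuPDF; "input.pdf" is a placeholder file name):
#   import fitz
#   doc = fitz.open("input.pdf")
#   print(pdfa_claim_from_xmp(doc.get_xml_metadata()))
```

Like the output intent check, this only reports what the file claims about itself; it does not validate conformance.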

indigoviolet · 2023-12-15T22:03:00Z

@JorjMcKie

Anyway, PyMuPDF does not support PDF/A on output and saves in standard format. Cannot convert to PDF/A - and there is no intention to do so in the foreseeable future.

Do you have any suggestions for how to convert PDFs to PDF/A programmatically? In my experience, Ghostscript conversion doesn't work well (I see missing text). The alternative I've found is Apryse, which does work well for my small sample of texts, but we would ideally like to standardize on one PDF library. We already license PyMuPDF, so it would be a great fit for this problem as well.

1 reply

JorjMcKie Dec 16, 2023
Maintainer

This would have to be enabled by our base library MuPDF.
May I recommend you reach out to the developers in MuPDF's public Discord channel?
