How to check if a PDF is PDF/A or not? And is there a way to convert PDF->PDF/A and vice versa? #2169
-
As far as I can see, PDF/A is mostly PDF plus embedded fonts. Is there a way to detect PDF/A, and to convert to and from PDF/A?
Replies: 2 comments 5 replies
-
Do you have any suggestions for how to convert PDFs to PDF/A programmatically? In my experience, Ghostscript conversion doesn't work well (I see missing text). The alternative I've found is Apryse, which does work well for my small sample of texts, but we would ideally like to standardize on one PDF library. We already license PyMuPDF, so it would be a great fit for this problem as well.
Well, that's a simplified characterization.
Anyway, PyMuPDF does not support PDF/A on output and always saves in standard PDF format. It cannot convert to PDF/A, and there is no intention to add this in the foreseeable future.
To determine whether an input file is in one of those formats (yes: plural!), check the catalog. It will contain an "output intent dictionary"; see page 641 / "Table 365" ("Document management - Portable document format - Part 1: PDF 1.7") for a description. To access it, get the catalog
xref = doc.pdf_catalog()
and then inspect doc.xref_get_key(xref, "OutputIntents"). This should be a…
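The catalog lookup described above can be sketched roughly as follows, assuming an already opened PyMuPDF document. `pdf_catalog()` and `xref_get_key()` are the calls named in the reply; the helper names `output_intents` and `has_output_intents` are hypothetical and only for illustration. `xref_get_key()` returns a `(type, value)` pair, and the type is `"null"` when the key is absent. Keep in mind that this only reports whether the key is present; it does not validate PDF/A conformance.

```python
def output_intents(doc):
    """Return the (type, value) pair for the catalog's OutputIntents key.

    `doc` is assumed to be an opened PyMuPDF Document.
    """
    xref = doc.pdf_catalog()  # xref number of the PDF catalog object
    return doc.xref_get_key(xref, "OutputIntents")


def has_output_intents(doc):
    """True if the catalog has an OutputIntents entry (a hint, not proof, of PDF/A)."""
    kind, _value = output_intents(doc)
    return kind != "null"  # "null" means the key is missing


# Usage (requires PyMuPDF; "input.pdf" is a placeholder path):
#   import fitz
#   doc = fitz.open("input.pdf")
#   print(output_intents(doc))
#   print(has_output_intents(doc))
```

Taking the document as a parameter keeps the helpers independent of how the file was opened, so the same check works for files opened from disk or from memory.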