Replies: 4 comments
-
You need to know which parts of the image are covered by other objects. These are given by the sequence of rectangles that have a non-empty intersection with the image bbox on the page.
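To make the intersection idea concrete, here is a minimal sketch in plain Python. It assumes rectangles are given as `(x0, y0, x1, y1)` tuples in page coordinates; the helper names `intersect` and `covered_parts` are made up for illustration and are not part of PyMuPDF.

```python
from typing import List, Optional, Tuple

# Assumed layout: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1
Rect = Tuple[float, float, float, float]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if it is empty."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:  # rectangles that only touch count as empty
        return None
    return (x0, y0, x1, y1)


def covered_parts(image_bbox: Rect, others: List[Rect]) -> List[Rect]:
    """Collect the parts of image_bbox that overlap a rectangle in others."""
    parts = []
    for rect in others:
        hit = intersect(image_bbox, rect)
        if hit is not None:
            parts.append(hit)
    return parts


# Hypothetical sample data: one image bbox and three other object rectangles
image_bbox = (100, 100, 300, 200)
others = [(250, 150, 400, 400), (0, 0, 50, 50), (90, 90, 120, 110)]
parts = covered_parts(image_bbox, others)
print(parts)  # [(250, 150, 300, 200), (100, 100, 120, 110)]
```

Note that this only tells you which regions overlap. Whether an overlapping object actually hides the image still depends on the drawing order.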
-
Also note that there is
-
With the coming v1.19.0, you can detect things like
For this to work, you must match the images (or other objects) in question against a new "bboxlog". This is a list of rectangles in the same sequence as they are used to build the page appearance. So an image rectangle with a higher index in that list will cover (parts of) every object that appears earlier and has an intersecting rectangle.
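The ordering rule can be sketched as follows. This is plain Python working on a hand-written list, under the assumption that each bboxlog entry looks like `(type, (x0, y0, x1, y1))`. As far as I know, `page.get_bboxlog()` in PyMuPDF v1.19.0+ returns entries of that shape, but please check this against your installed version. The type strings and the helper names `overlaps` and `covering_entries` are illustrative only.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]
# Assumed entry shape: (type, (x0, y0, x1, y1)), listed in drawing order
BboxLog = List[Tuple[str, Rect]]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles have a non-empty intersection."""
    return (max(a[0], b[0]) < min(a[2], b[2])
            and max(a[1], b[1]) < min(a[3], b[3]))


def covering_entries(bboxlog: BboxLog, index: int) -> List[int]:
    """Indices of entries drawn after bboxlog[index] that intersect it."""
    _, target = bboxlog[index]
    return [
        i
        for i in range(index + 1, len(bboxlog))
        if overlaps(target, bboxlog[i][1])
    ]


# Hypothetical log; with PyMuPDF it would presumably come from the page
sample_log = [
    ("fill-image", (100, 100, 300, 200)),
    ("fill-text", (0, 0, 50, 20)),
    ("fill-path", (250, 150, 400, 400)),
    ("fill-image", (90, 90, 120, 110)),
]
print(covering_entries(sample_log, 0))  # [2, 3]
print(covering_entries(sample_log, 3))  # []
```

Here the first image is partly covered by entries 2 and 3 because they come later in the list and intersect it, while the last image is covered by nothing.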
-
My PDF has tons of cropped images and AFAIK PyMuPDF only allows me to extract raw (uncropped) ones.
I was wondering if any of the following is possible?
a. Discard hidden / cropped part of all images (similar to "Redact" -> "Sanitize" in Acrobat, without rasterizing) prior to extracting.
b. Obtain the cropbox of each image so I can crop the extracted raw images using another library.
c. (Preferably) Ignore cropped data during extraction (i.e. extract just the cropped images instead of the raw ones).