Use page.get_text("json", flags=2) to extract the text, and the bbox in the extraction result has a negative number #1104


Negative coordinates are not necessarily a bug. They can occur, and when they do, the PDF's creator is responsible.
By the way: your width and height are both positive, so this is not the problem.
If you only want the text that has positive coordinates, specify a clip rectangle when extracting text: page.get_text(..., clip=rect).
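
A minimal sketch of that suggestion, assuming the page's own rectangle (page.rect) is the clip you want and reusing the "json" output and flags=2 from the question. The helper names extract_on_page_text and blocks_with_nonnegative_bbox are hypothetical, not part of PyMuPDF, and "input.pdf" in the usage comment is a placeholder file name. The second helper is an alternative that filters an already extracted result by its block bboxes.

```python
import json


def extract_on_page_text(page, flags=2):
    """Extract text as a dict, restricted to the page rectangle.

    `page` is expected to be a PyMuPDF page object. Passing clip=page.rect
    asks get_text() to return only text inside that rectangle.
    """
    return json.loads(page.get_text("json", flags=flags, clip=page.rect))


def blocks_with_nonnegative_bbox(page_dict):
    """Alternative: keep only blocks whose bbox (x0, y0, x1, y1) has no negative value."""
    return [
        block
        for block in page_dict.get("blocks", [])
        if all(value >= 0 for value in block["bbox"])
    ]


# Usage (assumes PyMuPDF is installed and "input.pdf" exists):
#   import fitz  # PyMuPDF
#   doc = fitz.open("input.pdf")
#   data = extract_on_page_text(doc[0])
#   for block in data["blocks"]:
#       print(block["bbox"])
```

Clipping at extraction time is usually the simpler choice. Filtering afterwards may be useful if you also want to inspect which blocks ended up with negative coordinates.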

Answer selected by JorjMcKie

This discussion was converted from issue #1103 on June 23, 2021 05:51.