pymupdf4llm for multi-page table #3954

Answered by JorjMcKie
bjmvercelli asked this question in Q&A

No, we don't. This request goes beyond purely syntactic extraction logic. We currently produce MD text page by page.
There is no attempt to detect content that crosses page boundaries. This applies not only to tables but also to, for example, text paragraphs.
Detecting that a table on some page actually continues the last table on the previous page would turn the existing (page-wise) logic on its head. In addition, if a table has no header row, how would we even ensure that it continues an earlier table?

  • Number of columns? No safe indicator!
  • Equal column widths as well? Still not safe. What is more, a table with the same column count but different column widths may still be a continuation.
  • So remains checkin…
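
To make the points about column count and column widths concrete, here is a minimal Python sketch of such a check. It is not pymupdf4llm's logic: `TableInfo`, `same_column_count`, `similar_widths` and `might_continue` are hypothetical names invented for illustration, and the widths are made-up numbers. The sketch only shows that both signals can point the wrong way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical summary of one detected table (not a PyMuPDF type)."""
    page: int                 # 0-based page number the table was found on
    col_widths: list[float]   # column widths in points, left to right
    has_header: bool          # whether a header row was detected


def same_column_count(a: TableInfo, b: TableInfo) -> bool:
    return len(a.col_widths) == len(b.col_widths)


def similar_widths(a: TableInfo, b: TableInfo, tol: float = 2.0) -> bool:
    # Stricter check: same column count AND each width within `tol` points.
    return same_column_count(a, b) and all(
        abs(x - y) <= tol for x, y in zip(a.col_widths, b.col_widths)
    )


def might_continue(prev: TableInfo, cur: TableInfo) -> bool:
    """Naive guess only: next page, no header, same column count.

    A True result does not prove a continuation, and a False result
    from the stricter width check does not rule one out.
    """
    return (
        cur.page == prev.page + 1
        and not cur.has_header
        and same_column_count(prev, cur)
    )


# Case 1: two tables that may be unrelated, yet every signal matches.
t1 = TableInfo(page=3, col_widths=[100.0, 150.0, 150.0], has_header=False)
t2 = TableInfo(page=4, col_widths=[100.0, 150.0, 150.0], has_header=False)

# Case 2: a layout that could be a real continuation, with different widths.
t3 = TableInfo(page=5, col_widths=[120.0, 140.0, 140.0], has_header=False)

print(might_continue(t1, t2), similar_widths(t1, t2))
print(might_continue(t2, t3), similar_widths(t2, t3))
```

In the first case both checks agree even though nothing guarantees the tables belong together; in the second, the width check rejects a pair that the column count alone would accept. Neither signal settles the question on its own.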

Answer selected by bjmvercelli