Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug/OCRAgentGoogleVision takes 1 positional argument but 2 were given #3659

Open
pprados opened this issue Sep 24, 2024 · 2 comments
Open

bug/OCRAgentGoogleVision takes 1 positional argument but 2 were given #3659

pprados opened this issue Sep 24, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@pprados
Copy link

pprados commented Sep 24, 2024

Describe the bug
Try to parse a pdf with OCR_AGENT=unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision.

To Reproduce
Provide a code snippet that reproduces the issue.

import os

os.environ[
    "OCR_AGENT"] = "unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision"

from unstructured.partition.pdf import partition_pdf

partition_pdf("fake-memo.pdf",
              strategy="hi_res",
              )

Expected behavior
No error

Environment Info
OS version: Linux-6.8.0-45-generic-x86_64-with-glibc2.39
Python version: 3.11.4
unstructured version: 0.15.14.dev1
unstructured-inference version: 0.7.36
pytesseract is not installed
Torch version: 2.4.1
Detectron2 is not installed
PaddleOCR version: None
Libmagic version: file-5.45
magic file from /etc/magic:/usr/share/misc/magic

Additional context
Add any other context about the problem here.

@pprados pprados added the bug Something isn't working label Sep 24, 2024
@DavidBlore
Copy link
Contributor

I’m experiencing the same issue. The issue arises in the OCRAgent's get_instance method as it expects all OCRAgents to have a language parameter in their constructor. See below for clarity:

@staticmethod
@functools.lru_cache(maxsize=None)
def get_instance(ocr_agent_module: str, language: str) -> "OCRAgent":
    module_name, class_name = ocr_agent_module.rsplit(".", 1)
    if module_name not in OCR_AGENT_MODULES_WHITELIST:
        raise ValueError(
            f"Environment variable OCR_AGENT module name {module_name} must be set to a "
            f"whitelisted module part of {OCR_AGENT_MODULES_WHITELIST}."
        )

    try:
        module = importlib.import_module(module_name)
        loaded_class = getattr(module, class_name)
        return loaded_class(language) # <--- This is where the issue occurs
    except (ImportError, AttributeError) as e:
        logger.error(f"Failed to get OCRAgent instance: {e}")
        raise RuntimeError(
            "Could not get the OCRAgent instance. Please check the OCR package and the "
            "OCR_AGENT environment variable."
        )

However, the OCRAgentGoogleVision class's constructor does not take in a language parameter in its constructor. Thus, the exception of OCRAgentGoogleVision takes 1 positional argument but 2 were given is thrown.

I'm willing to submit a PR to address this but want to know what the desired approach to solving this would be. Some possible options are:

  • Standardize Constructor Parameters: Ensure all OCRAgent classes have a language parameter by defining it in the OCRAgent abstract base class (ABC).
    • make this optional
    • don't make this optional
  • Modify OCRAgentGoogleVision: Add a language parameter to its constructor.
  • Conditional Instantiation: Modify ocr_interface.py to check if language is a parameter in the constructor. If it is, call loaded_class(language), otherwise call loaded_class().

@christinestraub
Copy link
Collaborator

christinestraub commented Sep 30, 2024

Hi @DavidBlore, Thank you for your willingness to submit a PR to address this issue. After considering the options you've presented, I believe the most suitable approach would be:
Modify OCRAgentGoogleVision: Add a language parameter to its constructor.
This approach offers several advantages:

  • Consistency: It aligns OCRAgentGoogleVision with other OCR agents like OCRAgentTesseract and OCRAgentPaddle, which already have language parameters.
  • Flexibility: By adding a language parameter with a default value, we maintain backwards compatibility while allowing for language specification when needed.
  • Simplicity: This change is straightforward to implement and doesn't require modifying the abstract base class or the ocr_interface.py file.

Implementation suggestion:

class OCRAgentGoogleVision(OCRAgent):
    def __init__(self, language='en'):
        super().__init__()
        self.language = language
        # ... rest of the constructor

This change allows users to specify a language if needed, but defaults to English ('en') if not provided, similar to other OCR agents.

Next steps:

  • Implement this change in the OCRAgentGoogleVision class.
  • Update any relevant documentation or type hints.
  • Add tests to ensure the new parameter works as expected.

github-merge-queue bot pushed a commit that referenced this issue Oct 14, 2024
This PR addresses issue #3659 by adding an optional `language` parameter
to the `OCRAgentGoogleVision` class constructor.

This parameter serves as a "language hint" for the
`document_text_detection` method in the `ImageAnnotatorClient`. For more
information on language hints, refer to the [Google Cloud Vision
documentation](https://cloud.google.com/vision/docs/languages).


**Default Behavior**: 
The language parameter defaults to None, allowing Google Cloud Vision to
auto-detect the language, as recommended in their documentation.

**Purpose**: 
This change is necessary because the `OCRAgent`'s `get_instance` method
expects all `OCRAgent`s to include a language parameter in their
constructors.

**Context on Issue:**
When trying to parse a PDF with
`OCR_AGENT=unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision`,
an error occurs in the `get_instance` method. The method expects a
`language` parameter, which the current `OCRAgentGoogleVision`
constructor does not support, leading to a positional argument error.

---------

Co-authored-by: Christine Straub <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants