
Adding chat completion task to endpoint models #281

Open · wants to merge 11 commits into base: main
Conversation


sadra-barikbin (Contributor) commented Aug 27, 2024

  • Package into a PR
  • Add tests
  • Adapt to the huggingface_hub change in ChatCompletionInputMessage
  • Fix an issue in tgi_model
  • Fix tiny bugs
  • Adapt the integration test to the new Pipeline
  • Adapt the PR to the new PromptManager

Hi there!

This PR addresses the need to evaluate endpoint models on chat completion tasks, i.e. using chat templating. BaseModel and NanotronModel already support this through FewshotManager.fewshot_context(), which applies the chat template to the few-shot and query examples. For endpoint models we could use either the plain InferenceClient.text_generation() or the native InferenceClient.chat_completion() API; this PR uses the latter.

More generally, could it be fruitful for Lighteval to make extensive use of huggingface_hub types? At a minimum, GenerativeResponse's result attribute could be of type ChatCompletionOutput | TextGenerationOutput, with metrics accepting inputs of these types as well, so that we could easily evaluate function calling and tool use. Likewise, GreedyUntilRequest's context attribute could be of type Conversation: TypeAlias = List[ChatCompletionInputMessage] so that tools params can be passed in.
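For context, here is a minimal sketch of the two huggingface_hub APIs in question; the endpoint URL and prompts are placeholders:

from huggingface_hub import InferenceClient

client = InferenceClient(model="http://localhost:8080")  # placeholder endpoint URL

# text_generation(): takes a plain string, so the caller must apply
# the chat template themselves before sending the prompt.
text = client.text_generation("Question: 2+2=?\nAnswer:", max_new_tokens=16)

# chat_completion(): takes a list of messages; the server applies the
# model's own chat template.
out = client.chat_completion(
    messages=[{"role": "user", "content": "Question: 2+2=?"}],
    max_tokens=16,
)
print(out.choices[0].message.content)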

@@ -181,35 +182,33 @@ def init_fewshot_sampling_balanced(
def get_examples_with_chat_template(
Contributor Author:

I had to change this method to return List[ChatCompletionInputMessage], as InferenceClient.chat_completion() doesn't accept a string. I made corresponding changes to BaseModel and NanotronModel so they handle conversational contexts.
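A rough sketch of the shape the method now returns; the few-shot pairs here are invented for illustration, the real method builds them from the task's docs:

from typing import List

from huggingface_hub import ChatCompletionInputMessage

# Hypothetical few-shot pairs and query, purely illustrative.
fewshot_examples = [("Q: 1+1=?", "2"), ("Q: 2+2=?", "4")]
query = "Q: 3+3=?"

messages: List[ChatCompletionInputMessage] = []
for question, answer in fewshot_examples:
    messages.append(ChatCompletionInputMessage(role="user", content=question))
    messages.append(ChatCompletionInputMessage(role="assistant", content=answer))
messages.append(ChatCompletionInputMessage(role="user", content=query))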

Contributor Author:

This comment is now relevant to PromptManager.get_examples().

@@ -220,7 +219,7 @@ def get_examples(
return instruction + labeled_examples + example

def create_multi_turn_contexts(
Contributor Author:

I will create a follow-up PR to make multi-turn contexts work with ChatCompletionInputMessage instead of str in FewshotManager and BaseModel.

Contributor Author:

This comment is now relevant to PromptManager._multi_turn_contexts().

)
from lighteval.utils.utils import EnvConfig, as_list


EndpointInput: TypeAlias = TextGenerationInput | ChatCompletionInput
EndpointOutput: TypeAlias = TextGenerationOutput | ChatCompletionOutput
Contributor Author:

The changes I made to the endpoint model pave the way for the day Lighteval might add evaluation of commercial models, or of other base tasks such as visual question answering, reusing most of the logic in the parent endpoint model. The endpoint model methods are organized as follows (a schematic sketch follows the list):

  • greedy_until(), loglikelihood(), loglikelihood_rolling(): public APIs of the model that can be reused by inheriting endpoint models. They call _process_batch() or _async_process_batch().
  • _process_batch() and _async_process_batch(): handle batch processing and can be reused by inheriting endpoint models. They call _prepare_request() and then _process_request().
  • _prepare_request(): converts the incoming request into an EndpointInput, one of the predefined huggingface_hub.InferenceType types. This can also be reused across endpoint classes.
  • _process_request(): given the EndpointInput, creates the EndpointOutput using the client. This is somewhat endpoint-specific.
  • _process_generate_response() and _process_logprob_response(): endpoint-specific methods that build a ModelResponse from the EndpointOutput. Previously, these were part of greedy_until() and loglikelihood().
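A schematic sketch of that layering; the method names match the list above, while the bodies and signatures are illustrative only:

from typing import List, Union

from huggingface_hub import (
    ChatCompletionInput,
    ChatCompletionOutput,
    TextGenerationInput,
    TextGenerationOutput,
)

EndpointInput = Union[TextGenerationInput, ChatCompletionInput]
EndpointOutput = Union[TextGenerationOutput, ChatCompletionOutput]

class EndpointModel:
    def greedy_until(self, requests: List) -> List:
        # Public API, reusable by inheriting endpoint models.
        return self._process_batch(requests)

    def _process_batch(self, requests: List) -> List[EndpointOutput]:
        # Shared batch machinery: prepare each request, then process it.
        return [self._process_request(self._prepare_request(r)) for r in requests]

    def _prepare_request(self, request) -> EndpointInput:
        # Convert a lighteval request into a huggingface_hub input type.
        raise NotImplementedError

    def _process_request(self, endpoint_input: EndpointInput) -> EndpointOutput:
        # Endpoint-specific: call the client with the prepared input.
        raise NotImplementedError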

Specifically, I wanted to propose this directory structure for endpoint models:

lighteval/
    models/
        endpoints/
            endpoint_model.py
            inference_endpoint_model.py
            tgi_model.py
            anthropic_model.py
            openai_model.py

in which endpoint_model.py holds most of the logic and the child models override some methods if necessary.

from lighteval.utils.imports import NO_TGI_ERROR_MSG, is_tgi_available


if is_tgi_available():
Contributor Author (Aug 27, 2024):

TGI recommends using huggingface_hub over the text-generation client:
https://github.com/huggingface/text-generation-inference/tree/main/clients/python

@@ -38,6 +44,9 @@ class RequestType(Enum):
GREEDY_UNTIL_MULTI_TURN = auto()


Context: TypeAlias = object
Contributor Author (Aug 27, 2024):

I introduced this type to account for both str and Conversation, but in the future it could be, for example, huggingface_hub.DocumentQuestionAnsweringInputData for document question answering.

  • We could also put additional types like Conversation, Context, etc. in a lighteval/types.py (sketched below).
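A sketch of what that hypothetical lighteval/types.py could contain; none of this is in the PR:

from typing import List

from typing_extensions import TypeAlias

from huggingface_hub import ChatCompletionInputMessage

Conversation: TypeAlias = List[ChatCompletionInputMessage]

# Catch-all for anything a model can take as context: a plain prompt string,
# a Conversation, or future input types such as document question answering payloads.
Context: TypeAlias = object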

Contributor Author (Aug 27, 2024):

An idea: currently task.fewshot_sampler.fewshot_context() is ultimately responsible for creating the context for a doc, even when the task has no few-shot setting. We could imagine the task having a context_augmenters attribute, passed to the prompt manager, containing everything that can augment the context (a few-shot manager, a RAG retriever, etc.). In the prompt manager's add_context_to_doc() method, each augmenter would apply itself in turn, starting from the initial context of instruction + query.
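Purely as an illustration of that idea, one possible shape for the augmenters; every name here is hypothetical:

from typing import List, Protocol

class ContextAugmenter(Protocol):
    def augment(self, context: str) -> str:
        """Return the context enriched with extra material (few-shot examples, retrieved passages, ...)."""
        ...

def add_context_to_doc(initial_context: str, augmenters: List[ContextAugmenter]) -> str:
    # Start from instruction + query, then let each augmenter apply itself in turn.
    context = initial_context
    for augmenter in augmenters:
        context = augmenter.augment(context)
    return context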

sadra-barikbin marked this pull request as ready for review August 27, 2024 12:35