
Adding chat completion task to endpoint models #281

Open · wants to merge 11 commits into base: main
Conversation


sadra-barikbin (Contributor) commented Aug 27, 2024

  • Package into a PR
  • Add tests
  • Adapt to the huggingface_hub change in ChatCompletionInputMessage
  • Fix an issue in tgi_model
  • Fix tiny bugs
  • Adapt the integration test to the new Pipeline
  • Adapt the PR to the new PromptManager

Hi there!

This PR addresses the need to evaluate endpoint models on chat completion tasks, i.e. using chat templating. BaseModel and NanotronModel already support this through FewshotManager.fewshot_context(), which applies the chat template to the few-shot and query examples. For endpoint models we could use either the plain InferenceClient.text_generation() or the native InferenceClient.chat_completion() API; this PR uses the latter.

More generally, could it be fruitful for Lighteval to make extensive use of huggingface_hub types? At a minimum, GenerativeResponse's result attribute could be of type ChatCompletionOutput | TextGenerationOutput, with metrics accepting inputs of these types as well, so that we could easily evaluate function calling and tool use. Likewise, GreedyUntilRequest's context attribute could be of type Conversation: TypeAlias = List[ChatCompletionInputMessage] so that tools params can be passed in.
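For context, here is a minimal sketch of the two huggingface_hub APIs in question; the endpoint URL and prompts are placeholders:

from huggingface_hub import InferenceClient

client = InferenceClient(model="http://localhost:8080")  # placeholder endpoint URL

# text_generation(): takes a plain string, so the caller must apply
# the chat template themselves before sending the prompt.
text = client.text_generation("Question: 2+2=?\nAnswer:", max_new_tokens=16)

# chat_completion(): takes a list of messages; the server applies the
# model's own chat template.
out = client.chat_completion(
    messages=[{"role": "user", "content": "Question: 2+2=?"}],
    max_tokens=16,
)
print(out.choices[0].message.content)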

@@ -181,35 +182,33 @@ def init_fewshot_sampling_balanced(
def get_examples_with_chat_template(
Contributor Author:

I had to change this method to return List[ChatCompletionInputMessage], as InferenceClient.chat_completion() doesn't accept a string. I made corresponding changes to BaseModel and NanotronModel so they handle conversational contexts.
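A rough sketch of the shape the method now returns; the few-shot pairs here are invented for illustration, the real method builds them from the task's docs:

from typing import List

from huggingface_hub import ChatCompletionInputMessage

# Hypothetical few-shot pairs and query, purely illustrative.
fewshot_examples = [("Q: 1+1=?", "2"), ("Q: 2+2=?", "4")]
query = "Q: 3+3=?"

messages: List[ChatCompletionInputMessage] = []
for question, answer in fewshot_examples:
    messages.append(ChatCompletionInputMessage(role="user", content=question))
    messages.append(ChatCompletionInputMessage(role="assistant", content=answer))
messages.append(ChatCompletionInputMessage(role="user", content=query))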

Contributor Author:

This comment is now relevant to PromptManager.get_examples().

@@ -220,7 +219,7 @@ def get_examples(
return instruction + labeled_examples + example

def create_multi_turn_contexts(
Contributor Author:

I will create a follow-up PR to make multi-turn contexts work with ChatCompletionInputMessage instead of str in FewshotManager and BaseModel.

Contributor Author:

This comment is now relevant to PromptManager._multi_turn_contexts().

)
from lighteval.utils.utils import EnvConfig, as_list


EndpointInput: TypeAlias = TextGenerationInput | ChatCompletionInput
EndpointOutput: TypeAlias = TextGenerationOutput | ChatCompletionOutput
Contributor Author:

The changes I made to the endpoint model pave the way for the day Lighteval might add evaluation of commercial models, or of other base tasks such as visual question answering, reusing most of the logic in the parent endpoint model. The endpoint model methods are organized as follows (a schematic sketch follows the list):

  • greedy_until(), loglikelihood(), loglikelihood_rolling(): public APIs of the model that can be reused by inheriting endpoint models. They call _process_batch() or _async_process_batch().
  • _process_batch() and _async_process_batch(): handle batch processing and can be reused by inheriting endpoint models. They call _prepare_request() and then _process_request().
  • _prepare_request(): converts the incoming request into an EndpointInput, one of the predefined huggingface_hub.InferenceType types. This can also be reused across endpoint classes.
  • _process_request(): given the EndpointInput, creates the EndpointOutput using the client. This is somewhat endpoint-specific.
  • _process_generate_response() and _process_logprob_response(): endpoint-specific methods that build a ModelResponse from the EndpointOutput. Previously, these were part of greedy_until() and loglikelihood().
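A schematic sketch of that layering; the method names match the list above, while the bodies and signatures are illustrative only:

from typing import List, Union

from huggingface_hub import (
    ChatCompletionInput,
    ChatCompletionOutput,
    TextGenerationInput,
    TextGenerationOutput,
)

EndpointInput = Union[TextGenerationInput, ChatCompletionInput]
EndpointOutput = Union[TextGenerationOutput, ChatCompletionOutput]

class EndpointModel:
    def greedy_until(self, requests: List) -> List:
        # Public API, reusable by inheriting endpoint models.
        return self._process_batch(requests)

    def _process_batch(self, requests: List) -> List[EndpointOutput]:
        # Shared batch machinery: prepare each request, then process it.
        return [self._process_request(self._prepare_request(r)) for r in requests]

    def _prepare_request(self, request) -> EndpointInput:
        # Convert a lighteval request into a huggingface_hub input type.
        raise NotImplementedError

    def _process_request(self, endpoint_input: EndpointInput) -> EndpointOutput:
        # Endpoint-specific: call the client with the prepared input.
        raise NotImplementedError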

Specifically, I wanted to propose this directory structure for endpoint models:

lighteval/
    models/
        endpoints/
            endpoint_model.py
            inference_endpoint_model.py
            tgi_model.py
            anthropic_model.py
            openai_model.py

in which endpoint_model.py holds most of the logic and the child models override some methods if necessary.

from lighteval.utils.imports import NO_TGI_ERROR_MSG, is_tgi_available


if is_tgi_available():
Contributor Author (Aug 27, 2024):

TGI recommends using huggingface_hub over the text-generation client:
https://github.com/huggingface/text-generation-inference/tree/main/clients/python

@@ -38,6 +44,9 @@ class RequestType(Enum):
GREEDY_UNTIL_MULTI_TURN = auto()


Context: TypeAlias = object
Contributor Author (Aug 27, 2024):

I introduced this type to account for both str and Conversation, but in the future it could be, for example, huggingface_hub.DocumentQuestionAnsweringInputData for document question answering.

  • We could also put additional types like Conversation, Context, etc. in a lighteval/types.py (sketched below).
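A sketch of what that hypothetical lighteval/types.py could contain; none of this is in the PR:

from typing import List

from typing_extensions import TypeAlias

from huggingface_hub import ChatCompletionInputMessage

Conversation: TypeAlias = List[ChatCompletionInputMessage]

# Catch-all for anything a model can take as context: a plain prompt string,
# a Conversation, or future input types such as document question answering payloads.
Context: TypeAlias = object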

Contributor Author (Aug 27, 2024):

An idea: currently task.fewshot_sampler.fewshot_context() is ultimately responsible for creating the context for a doc, even when the task has no few-shot setting. We could imagine the task having a context_augmenters attribute, passed to the prompt manager, containing everything that can augment the context (a few-shot manager, a RAG retriever, etc.). In the prompt manager's add_context_to_doc() method, each augmenter would apply itself in turn, starting from the initial context of instruction + query.
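Purely as an illustration of that idea, one possible shape for the augmenters; every name here is hypothetical:

from typing import List, Protocol

class ContextAugmenter(Protocol):
    def augment(self, context: str) -> str:
        """Return the context enriched with extra material (few-shot examples, retrieved passages, ...)."""
        ...

def add_context_to_doc(initial_context: str, augmenters: List[ContextAugmenter]) -> str:
    # Start from instruction + query, then let each augmenter apply itself in turn.
    context = initial_context
    for augmenter in augmenters:
        context = augmenter.augment(context)
    return context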

sadra-barikbin marked this pull request as ready for review August 27, 2024 12:35