
Usage Stats in Intermediate Steps #559

Open
jdp8 opened this issue Sep 12, 2024 · 3 comments

Comments

@jdp8

jdp8 commented Sep 12, 2024

Hello, I saw that the runtimeStatsText() function may be deprecated soon and that usage metadata can now be accessed by passing stream_options: { include_usage: true } in the streaming request. However, I read that this metadata is only available in the last chunk, rather than at any time as with runtimeStatsText().

I was wondering if it is possible to get this metadata in the intermediate steps when streaming. In other words, to get the usage metadata when the output chunks are being streamed.
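For reference, this is roughly the setup I mean (a minimal sketch; the model id is a placeholder and the API names follow WebLLM's OpenAI-compatible surface):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Placeholder model id; any model from WebLLM's prebuilt list would do.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const chunks = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
  // Usage is attached only to the final chunk per the OpenAI protocol.
  stream_options: { include_usage: true },
});

for await (const chunk of chunks) {
  // Intermediate chunks carry the generated text deltas...
  console.log(chunk.choices[0]?.delta?.content ?? "");
  // ...and only the last chunk carries the usage stats.
  if (chunk.usage) {
    console.log("usage:", chunk.usage);
  }
}
```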

Any assistance with this will be greatly appreciated. Thank you!

@tqchen
Contributor

tqchen commented Sep 12, 2024

Unfortunately, doing so would mean the output won't align with the OpenAI protocol, so we likely cannot support such a case. Note that async streaming (between the worker and the client) is still necessary for best performance.
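To illustrate the constraint, a simplified sketch of the chunk shape the OpenAI streaming protocol defines (field list abbreviated; values are illustrative):

```ts
interface StreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  choices: { index: number; delta: { content?: string } }[];
  // With stream_options.include_usage, this stays null on every content
  // chunk; only one extra final chunk (with an empty choices array)
  // carries the stats.
  usage: StreamUsage | null;
}

// Illustrative final chunk:
const finalChunk: ChatCompletionChunk = {
  id: "chatcmpl-xyz",
  object: "chat.completion.chunk",
  choices: [],
  usage: { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 },
};
```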

@jdp8
Author

jdp8 commented Sep 12, 2024

I see, thank you. I saw that LangChain (Python) has support for this specific feature, but only for OpenAI for now, as mentioned here where usage metadata in the intermediate steps is referenced. At least that's what I understood.

Just out of curiosity, will this support be added to WebLLM or is it something that has been discussed?

@CharlieFRuan
Contributor

Thanks for the inquiry! IIUC, you are asking about accessing stats in the middle of a streaming generation of the model.

I do not fully understand how the LangChain example in the link uses stats in the middle of a streaming generation. I think the "intermediate" is in terms of events in LangChain's terminology, rather than the middle of a generation?

Besides, WebLLM is integrated with LangChain.js; it may be worth substituting the OpenAI endpoint with WebLLM in place and seeing whether the behavior is the same API-wise: https://js.langchain.com/v0.2/docs/integrations/chat/web_llm/
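A minimal sketch of that in-place substitution, with the import path and initialization taken from the linked integration page (model id is a placeholder):

```ts
import { ChatWebLLM } from "@langchain/community/chat_models/webllm";
import { HumanMessage } from "@langchain/core/messages";

// Placeholder model id from WebLLM's prebuilt model list.
const model = new ChatWebLLM({
  model: "Phi-3-mini-4k-instruct-q4f16_1-MLC",
  chatOptions: { temperature: 0.5 },
});

// Loads the model weights; the callback reports download/compile progress.
await model.initialize((progress) => console.log(progress));

// Stream the response the same way as with ChatOpenAI and compare the chunks.
const stream = await model.stream([new HumanMessage("Hello!")]);
for await (const chunk of stream) {
  console.log(chunk.content);
}
```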
