[Tracking] PT model support follow up #217

Open
8 tasks
masahi opened this issue Feb 21, 2024 · 0 comments

masahi commented Feb 21, 2024

#207 is only the first cut. Many TODO items remain:

  • Fix memory profiling (see the comment on "Enable running PyTorch models" #207)
  • Bring single-GPU performance to parity with the MLC model
  • Make multi-GPU performance sane
  • Consider using CUDA graphs if we decide to keep the 2D padded input representation
  • Or, consider reverting the 2D input change
  • Revisit the custom changes in our vLLM fork (https://github.com/octoml/vllm/tree/for-mlc-serve) and minimize them
  • Figure out how to support models other than those in vLLM
  • Support parallel-sampling eviction by recompute (requires a model change)