Support for serverless inference and multi-model endpoints on sagemaker #263
Thanks for these thoughts @ncullen93! You are the first one to express interest in these, but we are definitely up for supporting more than only default deployments on SageMaker. For serverless inference, I believe we have exposed all the different parameters/args for model endpoints in `vetiver_sm_endpoint()`. When you say multi-model endpoints, do you mean one API with different endpoints for different models?

My own knowledge about how not to spend so much 💸 on SageMaker is limited to the usual advice, like choosing the smallest instance you can work with. One good thing about using vetiver is that you are bringing your own container rather than using the pre-built containers (which have costs associated with them). Depending on your compliance needs, you might consider not keeping old versions of models around for very long (write a script to delete pin versions older than X days, or don't store more than one version to start with).
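The "delete pin versions older than X days" idea can be sketched generically. Assuming you can list a pin's versions along with their creation timestamps (as the pins package lets you do), the selection logic is just a date filter. The helper below is hypothetical, not part of the pins or vetiver API:

```python
from datetime import datetime, timedelta

def versions_to_prune(versions, max_age_days=30, now=None):
    """Pick pin versions older than `max_age_days`.

    `versions` is a list of (version_id, created_datetime) pairs,
    e.g. built from your board's version listing. Hypothetical
    helper, not a pins/vetiver function.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [v for v, created in versions if created < cutoff]

# Example with fake data:
now = datetime(2023, 6, 1)
versions = [
    ("v1", datetime(2023, 1, 1)),   # old, should be pruned
    ("v2", datetime(2023, 5, 25)),  # recent, should be kept
]
```

You would then loop over the returned version ids and delete each one with your board's version-deletion function.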
Thanks for the response! I tried a ton to get serverless deployment working, but no luck. With just some small tweaking of the endpoint config (see below) to match what's expected, I was able to get the endpoint creation process going. But it fails right when the plumber endpoint runs, due to paws not being able to find any credentials. Weird, since real-time inference works fine. I tried messing around with AWS roles, etc., but couldn't figure it out. Oh well... may try some more eventually. Config for serverless:
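For reference, the key change is that the production variant carries a `ServerlessConfig` instead of instance settings. A sketch in Python/boto3 terms (paws mirrors the same SageMaker API; the names and values here are placeholders, not the commenter's actual config):

```python
# Sketch of the endpoint-config shape for SageMaker serverless inference.
# Field names follow the AWS CreateEndpointConfig API; "my-vetiver-model"
# is a placeholder.
serverless_variant = {
    "ModelName": "my-vetiver-model",
    "VariantName": "AllTraffic",
    "ServerlessConfig": {
        "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
        "MaxConcurrency": 5,
    },
}
# Note there is no InstanceType / InitialInstanceCount here, unlike a
# real-time variant. With credentials in place you would pass this to:
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="my-config",
#     ProductionVariants=[serverless_variant],
# )
```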
And re: the multi-model support, the idea I guess is mainly just to serve a ton of different models from the same server... but I think serverless support would be most helpful in that direction. Thanks again!
Thanks for looking into this! 🙌 Sounds like the blocker right now is getting the credentials set up for serverless inference, to be able to access the S3 bucket where the model is stored. From this example, it looks like they say to just use the default SageMaker execution role, but I know the permissions can get real fussy. I'm going to leave this issue open for now to get more info/feedback from folks who are interested in setting up serverless inference.
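On the credentials point: a common root cause is an execution role that either isn't assumable by the SageMaker service or can't read the model's S3 bucket. As a reference sketch (standard IAM policy JSON, written here as Python dicts; the bucket name is a placeholder), the role needs roughly:

```python
import json

# Trust policy letting the SageMaker service assume the execution role
# (standard IAM document; nothing here is vetiver-specific).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Read access to the S3 bucket holding the pinned model
# ("my-model-bucket" is a placeholder).
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-model-bucket",
            "arn:aws:s3:::my-model-bucket/*",
        ],
    }],
}

print(json.dumps(trust_policy, indent=2))
```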
Thanks to the fantastic maintainers! I still think it's worth researching more whether the Docker image should be structured differently for a serverless deployment. I'm not sure if it's worth setting up the plumber API on serverless instead of just pulling the vetiver model from a board and running inference on it directly.
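The "skip the plumber API, predict directly" idea maps naturally onto a function-as-a-service handler: load the pinned model once at cold start, then reuse it across invocations. A generic sketch (the loader argument is a stand-in for whatever board read you use; none of these names come from vetiver):

```python
def make_handler(load_model):
    """Build a request handler that loads the model once at cold start.

    `load_model` is a stand-in for pulling the pinned model from a
    board (hypothetical; substitute your own board read). The returned
    handler caches the model so warm invocations skip the load.
    """
    cache = {}

    def handler(event):
        if "model" not in cache:
            cache["model"] = load_model()  # runs only on cold start
        return cache["model"].predict(event["instances"])

    return handler
```

This trades the always-on plumber server for per-request compute, at the cost of cold-start latency on the first call.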
Huge props to @DyfanJones as usual, for all his work on paws! 🙌 Sounds like we have some remaining issues to consider.
Hi there! I was wondering if either of these two things has been discussed or brought up as a potential addition to the development roadmap.
Serverless inference means your endpoint won't always be available on SageMaker, but it greatly reduces the cost. I believe this is just a matter of changing a few parameters in the `vetiver_sm_endpoint` call, so I will check it out. And I think multi-model endpoints are a solution to the need to have multiple models, since having a constantly running endpoint for every model would be so expensive. But this requires changing the Dockerfile a bit, from my understanding, so that may be something to handle in the sm-docker package.
Besides multi-model endpoints, is there any existing strategy for deploying on the order of ~100 different vetiver models to SageMaker? Or anywhere else, for that matter, with an emphasis on low cost / low memory while potentially giving up some latency.
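For context on the multi-model idea: SageMaker multi-model endpoints host many model artifacts behind one endpoint, with the model chosen per request. A sketch in boto3 terms (paws exposes the same API; the image URI, bucket, and model names are placeholders):

```python
# Sketch: the two pieces that distinguish a multi-model endpoint
# (AWS API field names; image/bucket/model names are placeholders).

# 1. The model is created in MultiModel mode, pointing at an S3 *prefix*
#    that holds many model .tar.gz artifacts rather than a single file:
container = {
    "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    "Mode": "MultiModel",
    "ModelDataUrl": "s3://my-model-bucket/models/",
}

# 2. Each request names which artifact to use via TargetModel:
invoke_args = {
    "EndpointName": "my-multi-model-endpoint",
    "TargetModel": "model-042.tar.gz",
    "ContentType": "application/json",
    "Body": b'{"instances": [1, 2, 3]}',
}
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_args)
```

Models are loaded into the instance's memory lazily on first request, which is what makes ~100 models on one (or a few) instances feasible, at the cost of load latency on a cold model.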