Add example of making a custom handler to "advanced use case" vignette #114

Open
tylerlittlefield opened this issue Jun 28, 2022 · 4 comments

Comments

@tylerlittlefield

I am building two models, each predicting a single outcome. Both are based on the same dataset but will have their own recipe/workflow. I want to serve both of these models at the same URL, e.g., models.example.com/mymodel. If I want to use vetiver, what are some options for this task? I think I can make multiple endpoints like this:

#* @plumber
function(pr) {
  pr %>% 
    vetiver_pr_post(v1, path = "/predict_label") %>% 
    vetiver_pr_docs(v1, path = "/predict_label") %>% 
    vetiver_pr_post(v2, path = "/predict_type") %>%
    vetiver_pr_docs(v2, path = "/predict_type")
}

However, I was hoping that I could continue to use predict(endpoint, newdata) and return both outcomes but I am struggling to figure out how that works. Everything I am looking at seems to be focused on a single model returning one outcome. I am really interested in keeping the predict(endpoint, newdata) approach (vs numerous endpoints) so that I can provide a consistent developer experience for models I develop in the future.

@juliasilge
Member

juliasilge commented Jun 29, 2022

I think the best way to do this is to write your own custom handler. It would look like this:

library(tidymodels)

fit1 <- workflow(mpg ~ ., linear_reg()) %>% fit(mtcars)
fit2 <- workflow(mpg ~ ., svm_linear(mode = "regression")) %>% fit(mtcars)

library(vetiver)
#> 
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#> 
#>     load_pkgs
v1 <- vetiver_model(fit1, "linear-cars")
v2 <- vetiver_model(fit2, "svm-cars")

library(plumber)


handle_both <- function(v1, v2, ...) {
    function(req) {
        new_data <- req$body
        new_data <-  vetiver_type_convert(new_data, v1$ptype)
        bind_cols(
            pred_linear = predict(v1$model, new_data = new_data, ...)$.pred,
            pred_svm = predict(v2$model, new_data = new_data, ...)$.pred
        )
    }
}

pr() %>%
    vetiver_pr_post(v1, path = "linear") %>%
    vetiver_pr_post(v2, path = "svm") %>%
    pr_post(path = "predict", handler = handle_both(v1, v2)) %>%
    vetiver_pr_docs(v1)
#> # Plumber router with 5 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/linear (POST)
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/ping (GET, GET)
#> ├──/predict (POST)
#> └──/svm (POST)

Created on 2022-06-29 by the reprex package (v2.0.1)

I believe this will work best if you also add both models separately at their own endpoints. This would be a good example for #68 of a more advanced use case: how to write a custom handler and integrate it. Note that this only works for two models that use the same input data prototype.

You can read more about vetiver handlers in the docs.
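
Because a handler built this way is an ordinary R closure, it can be smoke-tested without starting the API by calling it with a list that mimics the parsed request. The sketch below is only an illustration of that idea: it uses two base R `lm()` fits as stand-ins for the vetiver models and skips `vetiver_type_convert()` so that it runs without extra packages. The names `handle_both_sketch` and `fake_req` and the model formulas are assumptions, not part of vetiver.

```r
# Stand-in models: plain lm() fits instead of the workflows from the thread
# (an assumption so the sketch runs with base R only).
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Same closure pattern as handle_both(): the outer function captures the
# models, the inner function is what plumber calls for each request.
handle_both_sketch <- function(m1, m2) {
  function(req) {
    new_data <- req$body
    data.frame(
      pred_1 = unname(predict(m1, newdata = new_data)),
      pred_2 = unname(predict(m2, newdata = new_data))
    )
  }
}

handler <- handle_both_sketch(m1, m2)

# The handler only reads req$body, so a bare list with a `body` element
# is enough to exercise it (hypothetical stand-in for a real request).
fake_req <- list(body = mtcars[1:3, c("wt", "hp")])
result <- handler(fake_req)
print(result)
```

The same kind of check should apply to the real `handle_both(v1, v2)`, as long as the `body` element carries every column in the shared input data prototype.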

@tylerlittlefield
Author

This is close to what I was looking for but this part:

This only works for two models that use the same input data prototype.

I think this might be a problem for me as I can imagine certain models will require different features. Would it make sense to support "wrappers" or "handlers" such as the one I am describing? If not, it shouldn't be too much hassle on my end. I can always just refer people to the API docs and they can make individual requests from the various endpoints.

@juliasilge
Member

If the different models start with the same data but involve different feature engineering, that is no problem because recipes will handle that. If they need different input variables altogether (like v1 uses hp and drat but v2 uses drat and wt) then you would have to decide how to handle that. You could do something like use step_rm() from recipes to get rid of variables you don't want, use a custom input data prototype in save_ptype that includes the union of all the needed variables, and/or turn type checking off (but the checking is definitely one of the great features of using vetiver).
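
One way to picture the "union of all the needed variables" option is a zero-row data frame that holds every column either model needs, with incoming data checked against it before each model gets its own subset. The following is a base R sketch under that assumption, not vetiver's own implementation: `needs_v1`, `needs_v2`, and `check_columns()` are hypothetical names, and the zero-row data frame is the kind of object the comment above suggests supplying through `save_ptype`.

```r
# Hypothetical variable sets, taken from the hp/drat vs. drat/wt example.
needs_v1 <- c("hp", "drat")
needs_v2 <- c("drat", "wt")

# Zero-row data frame with the union of columns: it keeps names and types
# but no rows, which is the shape of an input data prototype.
union_ptype <- mtcars[0, union(needs_v1, needs_v2)]

# Hypothetical helper: fail early when a required column is missing,
# otherwise return only the columns named in the prototype.
check_columns <- function(new_data, ptype) {
  missing_cols <- setdiff(names(ptype), names(new_data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  new_data[, names(ptype), drop = FALSE]
}

new_data <- check_columns(mtcars[1:2, ], union_ptype)

# Each model then sees only the variables it was trained on.
input_v1 <- new_data[, needs_v1, drop = FALSE]
input_v2 <- new_data[, needs_v2, drop = FALSE]
```

The trade-off is the one raised in this thread: callers must always send the full union of columns, even when one of the models ignores some of them.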

@tylerlittlefield
Author

tylerlittlefield commented Jun 29, 2022

Ah, I see! So I could do something like this:

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(vetiver)
#> 
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#> 
#>     load_pkgs
library(plumber)

preproc1 <- mtcars %>% 
  recipe(mpg ~ .) %>% 
  step_rm(everything(), -c(mpg, cyl))

preproc2 <- mtcars %>% 
  recipe(mpg ~ .) %>% 
  step_rm(everything(), -c(mpg, wt, carb))

fit1 <- workflow(preproc1, linear_reg()) %>% fit(mtcars)
fit2 <- workflow(preproc2, svm_linear(mode = "regression")) %>% fit(mtcars)

v1 <- vetiver_model(fit1, "linear-cars")
v2 <- vetiver_model(fit2, "svm-cars")

handle_both <- function(v1, v2, ...) {
  function(req) {
    new_data <- req$body
    new_data <- vetiver_type_convert(new_data, v1$ptype)
    bind_cols(
      pred_linear = predict(v1$model, new_data = new_data, ...)$.pred,
      pred_svm = predict(v2$model, new_data = new_data, ...)$.pred
    )
  }
}

pr() %>%
  vetiver_pr_post(v1, path = "linear") %>%
  vetiver_pr_post(v2, path = "svm") %>%
  pr_post(path = "predict", handler = handle_both(v1, v2)) %>%
  vetiver_pr_docs(v1)
#> # Plumber router with 5 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/linear (POST)
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Users/tlittlef/Library/R/x86_64/4.1/library/vetiver
#> ├──/ping (GET, GET)
#> ├──/predict (POST)
#> └──/svm (POST)

Created on 2022-06-29 by the reprex package (v2.0.1)

If so, I think I might choose this over a custom data prototype, as that prototype may change over time and I would have to go back and update all my models. With the step_rm() approach, I think I wouldn't need to worry about this. The only side effect I can think of is that the API might require more input data than a given model actually needs. But now that I think about it, I will likely try to lock down all necessary features for all models involved early on.

Either way, really appreciate all the support. It has been very helpful!

@juliasilge changed the title from "Multiple outcomes support?" to: Add example of making a custom handler to "advanced use case" vignette (Jun 30, 2022)