
ENH: Rework Minibatch functionality #7496

Open
ferrine opened this issue Sep 7, 2024 · 0 comments
ferrine commented Sep 7, 2024

Before

Minibatches are tricky to work with, and they rely on a carefully constructed random graph that does not behave as expected in some scenarios. Minibatch was designed to work with the ADVI graph, and it misbehaves when used outside that scope. A couple of immediate issues I know about:

  • Can't be used inside leapfrog step iterations, since the random state changes between evaluations
  • Complicates the internals of PyMC
  • Consumes a lot of memory, since the whole dataset is stored in memory
  • Does not scale well, since it uses advanced indexing of a random draw

With all that, Minibatch does a very poor job even at its intended purpose of supporting ADVI: it is neither scalable nor robust.

After

What can be done differently is to use minibatches in the traditional way, as they are used in e.g. PyTorch: a function produces a new batch, which is passed to the loss function. In our case we can use callbacks that are invoked after every ADVI iteration; the callback resets the shared variable state, making the approach much more scalable and less hacky.

with pm.Model():
    a = pm.Normal("a", total_size=1000000, observed=data) # apply scaling
    minibatch = MinibatchCallback(iterable, [data])
    fit = pm.fit(callbacks=[minibatch])
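The MinibatchCallback above does not exist yet. A minimal sketch of what it could look like, under the assumption that pm.fit invokes each callback once per iteration and that the data variables expose a set_value method (as pytensor shared variables do):

```python
import numpy as np


class MinibatchCallback:
    """Hypothetical callback (not part of PyMC): after every ADVI
    iteration, pull the next batch from an iterable and load it into
    the model's shared data variables."""

    def __init__(self, iterable, shared_vars):
        self._batches = iter(iterable)
        self._shared_vars = shared_vars

    def __call__(self, approx, loss, i):
        # The callback invocation is only used as a trigger to swap
        # in the next batch; the arguments are ignored here.
        batch = next(self._batches)
        for var in self._shared_vars:
            # Works for anything exposing set_value, e.g. pytensor.shared.
            var.set_value(np.asarray(batch))
```

Because the callback only touches objects with a set_value method, it stays decoupled from the variational internals; the iterable can be an infinite generator (e.g. cycling over a dataset loaded lazily from disk), which is what makes the approach scale.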

Context for the issue:

No response

@ferrine ferrine changed the title ENH: Deprecate Minibatch functionality ENH: Rework Minibatch functionality Sep 7, 2024