
Introduce graph rewrite for mixture sub-graphs defined via IfElse Op #169

Draft
larryshamalama wants to merge 2 commits into main

Conversation

@larryshamalama
Contributor

larryshamalama commented Aug 28, 2022

Closes #76.

Akin to #154, this PR introduces a node_rewriter for IfElse. Effectively, it builds on the recently added switch_mixture_replace to accommodate mixture sub-graphs that are the same in essence but defined with a different Op: IfElse. Below is an example of the new functionality.

import aesara.tensor as at
from aesara.ifelse import ifelse

from aeppl.joint_logprob import joint_logprob

srng = at.random.RandomStream(seed=2320)

I_rv = srng.bernoulli(0.5, name="I")
X_rv = srng.normal(-10, 0.1, name="X")
Y_rv = srng.normal(10, 0.1, name="Y")

Z_rv = ifelse(I_rv, X_rv, Y_rv)
Z_rv.name = "Z"

z_vv = Z_rv.clone()
i_vv = I_rv.clone()

logp = joint_logprob({Z_rv: z_vv, I_rv: i_vv})
print(logp.eval({z_vv: -10, i_vv: 0})) # 0.6904993773247732
print(logp.eval({z_vv: -10, i_vv: 1})) # -19999.309500622676

@codecov

codecov bot commented Aug 28, 2022

Codecov Report

Base: 95.15% // Head: 94.94% // Decreases project coverage by -0.21% ⚠️

Coverage data is based on head (3458414) compared to base (0959489).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 3458414 differs from pull request most recent head 0dd44af. Consider uploading reports for the commit 0dd44af to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   95.15%   94.94%   -0.22%     
==========================================
  Files          12       12              
  Lines        2023     1878     -145     
  Branches      253      280      +27     
==========================================
- Hits         1925     1783     -142     
+ Misses         56       53       -3     
  Partials       42       42              
Impacted Files Coverage Δ
aeppl/mixture.py 97.72% <100.00%> (+1.13%) ⬆️
aeppl/printing.py 89.68% <0.00%> (-2.09%) ⬇️
aeppl/rewriting.py 94.00% <0.00%> (-0.18%) ⬇️
aeppl/tensor.py 85.71% <0.00%> (-0.14%) ⬇️
aeppl/logprob.py 98.02% <0.00%> (-0.05%) ⬇️
aeppl/transforms.py 96.43% <0.00%> (-0.02%) ⬇️
aeppl/scan.py 94.73% <0.00%> (ø)
aeppl/dists.py 94.50% <0.00%> (ø)
aeppl/cumsum.py 100.00% <0.00%> (ø)
aeppl/abstract.py 100.00% <0.00%> (ø)
... and 4 more


☔ View full report at Codecov.

@larryshamalama
Contributor Author

I added many tests; I used test_hetero_mixture_categorical as inspiration for what to cover, and I have yet to do a thorough filtering of what exactly is needed. I seem to be running into issues with variable lifting, since the computations are not exactly identical, causing equal_computations to yield False. The reasons for the failing tests may be more fundamental; I will look into this gradually.
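
For reference, here is a minimal, self-contained illustration (with made-up graphs, not the actual test graphs) of how sensitive equal_computations is: two graphs that compute the same values still compare unequal when a constant differs in shape or dtype.

import numpy as np
import aesara.tensor as at
from aesara.graph.basic import equal_computations

x = at.vector("x")
g1 = x + np.array(-10.0)    # scalar constant
g2 = x + np.array([-10.0])  # (1,)-shaped constant

# Same values, structurally different graphs
print(equal_computations([g1], [g2]))  # False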

@larryshamalama larryshamalama marked this pull request as draft September 2, 2022 20:42
@larryshamalama
Contributor Author

Here are some notes on why some tests are failing. The discrepancies between graphs occur in the following places:

  • The first inputs to MixtureRV can be NoneConst or TensorConstant{0}.
  • Discrepancy in component shapes (e.g. TensorConstant{(1,) of -10.0} vs. TensorConstant{-10.0}).

This comment serves as a reminder of what I am stuck on before the upcoming meeting.

@larryshamalama
Contributor Author

larryshamalama commented Nov 24, 2022

I'm slowly revisiting this PR after quite some time. I'm investigating one of my failing test cases, and I have probably forgotten many details in the meantime... Consider the following code.

import aesara
import aesara.tensor as at
from aeppl.rewriting import construct_ir_fgraph

srng = at.random.RandomStream(29833)

X_rv = srng.normal(loc=[10, 20], scale=0.1, size=(2,), name="X")
Y_rv = srng.normal(loc=[-10, -20], scale=0.1, size=(2,), name="Y")

I_rv = srng.bernoulli([0.9, 0.1], size=(2,), name="I")
i_vv = I_rv.clone()
i_vv.name = "i"

Z1_rv = at.switch(I_rv, X_rv, Y_rv)
z_vv = Z1_rv.clone()
z_vv.name = "z1"

fgraph, _, _ = construct_ir_fgraph({Z1_rv: z_vv, I_rv: i_vv})
aesara.dprint(fgraph.outputs[0])

yields

SpecifyShape [id A]
 |MixtureRV{indices_end_idx=2, out_dtype='float64', out_broadcastable=(False,)} [id B]
 | |TensorConstant{0} [id C]
 | |bernoulli_rv{0, (0,), int64, False}.1 [id D] 'I'
 | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x16488AB20>) [id E]
 | | |TensorConstant{(1,) of 2} [id F]
 | | |TensorConstant{4} [id G]
 | | |TensorConstant{[0.9 0.1]} [id H]
 | |normal_rv{0, (0, 0), floatX, False}.1 [id I] 'X'
 | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x1648899A0>) [id J]
 | | |TensorConstant{(1,) of 2} [id F]
 | | |TensorConstant{11} [id K]
 | | |TensorConstant{[10 20]} [id L]
 | | |TensorConstant{0.1} [id M]
 | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'Y'
 |   |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x16488A340>) [id O]
 |   |TensorConstant{(1,) of 2} [id F]
 |   |TensorConstant{11} [id K]
 |   |TensorConstant{[-10 -20]} [id P]
 |   |TensorConstant{0.1} [id M]
 |TensorConstant{2} [id Q]
bernoulli_rv{0, (0,), int64, False}.1 [id D] 'I'

Where does the SpecifyShape Op come from? My understanding is that the switch_mixture_replace rewrite gets called and returns MixtureRV{indices_end_idx=2, out_dtype='float64', out_broadcastable=(False,)}.0 in this case, so the SpecifyShape Op must come from elsewhere...

@rlouf
Member

rlouf commented Nov 24, 2022

Hey, thanks for revisiting the PR! Do you mean running the code on this PR branch? If I run your code snippet on main I get the following result:

Elemwise{switch,no_inplace} [id A]
 |bernoulli_rv{0, (0,), int64, False}.1 [id B] 'I'
 | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F0239D0E880>) [id C]
 | |TensorConstant{(1,) of 2} [id D]
 | |TensorConstant{4} [id E]
 | |TensorConstant{[0.9 0.1]} [id F]
 |normal_rv{0, (0, 0), floatX, False}.1 [id G] 'X'
 | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F023B681B60>) [id H]
 | |TensorConstant{(1,) of 2} [id D]
 | |TensorConstant{11} [id I]
 | |TensorConstant{[10 20]} [id J]
 | |TensorConstant{0.1} [id K]
 |normal_rv{0, (0, 0), floatX, False}.1 [id L] 'Y'
   |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F0239D0DEE0>) [id M]
   |TensorConstant{(1,) of 2} [id D]
   |TensorConstant{11} [id I]
   |TensorConstant{[-10 -20]} [id N]
   |TensorConstant{0.1} [id K]

Have you considered dispatching Switch and Ifelse in separate functions for now? It may make it easier to progress without breaking Switch so you always have a working point of comparison. (you can keep the tests together)
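
For illustration, the split might look roughly like this (a sketch only: the function names mirror the existing switch_mixture_replace, but the bodies are placeholders and the shared helper is hypothetical, so this is not code from the PR):

import aesara.scalar as aes
from aesara.graph.rewriting.basic import node_rewriter
from aesara.ifelse import IfElse
from aesara.tensor.elemwise import Elemwise


def _build_mixture_rv_replacement(fgraph, node, idx, components):
    """Shared helper that would construct the MixtureRV replacement (omitted)."""
    return None  # placeholder in this sketch


@node_rewriter([Elemwise])
def switch_mixture_replace(fgraph, node):
    if not isinstance(node.op.scalar_op, aes.Switch):
        return None
    idx, *components = node.inputs
    return _build_mixture_rv_replacement(fgraph, node, idx, components)


@node_rewriter([IfElse])
def ifelse_mixture_replace(fgraph, node):
    # For the single-output case, IfElse inputs are (condition, then_branch, else_branch)
    idx, *components = node.inputs
    return _build_mixture_rv_replacement(fgraph, node, idx, components)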

You will also need to resolve the (small) merge conflict due to the new joint_logprob interface, but this should be easy.

@larryshamalama
Contributor Author

Hey, thanks for revisiting the PR! Do you mean running the code on this PR branch? If I run your code snippet on main I get the following result:

Yes, running the code on this branch! The graph rewrite for Switch mixtures on main does not work for non-scalar components...

Have you considered dispatching Switch and Ifelse in separate functions for now? It may make it easier to progress without breaking Switch so you always have a working point of comparison. (you can keep the tests together)

Yes, but I felt that the graph rewrite for both would be very similar. I can separate them as I work through them for now...

You will also need to resolve the (small) merge conflict due to the new joint_logprob interface, but this should be easy.

Okay sounds good!

@rlouf
Member

rlouf commented Nov 24, 2022

Yes, but I felt that the graph rewrite for both would be very similar. I can separate them as I work through them for now...

They likely will, and we may want to merge them later. But I think it would be easier for you to move from one stable state to another, changing one thing at a time, and always keeping a reference implementation (Switch) working.

@larryshamalama
Contributor Author

I just rebased my code. I am working on having Switch/IfElse-induced mixture subgraphs yield the same canonical (or IR?) graph, i.e. the graph obtained after running switch_mixture_replace, as if the mixture were defined via a Subtensor. The SpecifyShape Op still comes up in the example above, and I believe that I should modify my rewrite so that it does not show up. Is there a way to check what sequence of graph rewrites has been applied?

@rlouf
Member

rlouf commented Nov 24, 2022

Of course: https://aesara.readthedocs.io/en/latest/extending/graph_rewriting.html#detailed-profiling-of-aesara-rewrites.

Alternatively, you can add a breakpoint here, call aesara.dprint(fgraph), note the index of the SpecifyShape node, and then inspect it with node = fgraph.toposort()[index]; node.tag may contain information about the rewrite that created it.
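
Sketched out, that workflow could look like the following (self-contained; whether a SpecifyShape node actually shows up depends on the branch and Aesara version you are running):

import aesara
import aesara.tensor as at
from aesara.tensor.shape import SpecifyShape
from aeppl.rewriting import construct_ir_fgraph

srng = at.random.RandomStream(29833)
X_rv = srng.normal(loc=[10, 20], scale=0.1, size=(2,), name="X")
Y_rv = srng.normal(loc=[-10, -20], scale=0.1, size=(2,), name="Y")
I_rv = srng.bernoulli([0.9, 0.1], size=(2,), name="I")
Z_rv = at.switch(I_rv, X_rv, Y_rv)

fgraph, _, _ = construct_ir_fgraph({Z_rv: Z_rv.clone(), I_rv: I_rv.clone()})
aesara.dprint(fgraph)  # locate the SpecifyShape node in the printout

for index, node in enumerate(fgraph.toposort()):
    if isinstance(node.op, SpecifyShape):
        # The tag may record which rewrite introduced this node
        print(index, node, getattr(node.tag, "__dict__", {}))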

@brandonwillard
Member

brandonwillard commented Nov 24, 2022

Of course: https://aesara.readthedocs.io/en/latest/extending/graph_rewriting.html#detailed-profiling-of-aesara-rewrites.

Alternatively, you can add a breakpoint here, call aesara.dprint(fgraph), note the index of the SpecifyShape node, and then inspect it with node = fgraph.toposort()[index]; node.tag may contain information about the rewrite that created it.

There's also with aesara.config.change_flags(optimizer_verbose=True): .... I use that a lot.
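
A tiny sketch of what that looks like in practice, wrapped here around the IR construction from the earlier snippets; while the flag is active, each applied rewrite is printed to stdout:

import aesara
import aesara.tensor as at
from aeppl.rewriting import construct_ir_fgraph

srng = at.random.RandomStream(29833)
I_rv = srng.bernoulli(0.5, name="I")
X_rv = srng.normal(-10, 0.1, name="X")
Y_rv = srng.normal(10, 0.1, name="Y")
Z_rv = at.switch(I_rv, X_rv, Y_rv)

# Every rewrite applied while building the IR gets logged
with aesara.config.change_flags(optimizer_verbose=True):
    construct_ir_fgraph({Z_rv: Z_rv.clone(), I_rv: I_rv.clone()})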

In general, as @rlouf said, don't be afraid to put breakpoint()s wherever you want. Our pre-commit settings will prevent those from being added to commits, so you don't even need to worry about forgetting them.

@rlouf
Member

rlouf commented Dec 6, 2022

Adding to this, you can set up any reasonable IDE so that, when you run tests, it opens a debugger console whenever it hits a breakpoint or a failure. If you don't have that in place already, spend some time setting it up; it was a huge boost to my productivity.

@larryshamalama
Contributor Author

Adding to this, you can set up any reasonable IDE so that, when you run tests, it opens a debugger console whenever it hits a breakpoint or a failure. If you don't have that in place already, spend some time setting it up; it was a huge boost to my productivity.

Thanks for the tip. I also saw your recent related tweet 😅

As for this PR, I think it's best to close it in order to 1) split the tasks into smaller sub-PRs (I felt like too much was going on at once) and 2) address some other issues that came up. As for the latter, I have divided them into subsections below. Any guidance would be helpful...

Reworking switch_mixture_replace

First, switch_mixture_replace isn't entirely correct... I only considered scalar indices and components. For instance, to make equal_computations yield True, I used an as_nontensor_scalar wrapper for the indices, which would not work in general.

aeppl/aeppl/mixture.py

Lines 340 to 342 in 473c1e6

new_node = mix_op.make_node(
    *([NoneConst, as_nontensor_scalar(node.inputs[0])] + mixture_rvs)
)

SpecifyShape Op

The appearance of the SpecifyShape Op seems to be new... perhaps due to this recent addition to Aesara? Maybe a good first step would be to replace out_broadcastable in MixtureRV with the corresponding static shapes, if available; a rough sketch of what that could look like follows the snippet below. Would that be a reasonable place to start?

aeppl/aeppl/mixture.py

Lines 180 to 197 in 473c1e6

class MixtureRV(Op):
    """A placeholder used to specify a log-likelihood for a mixture sub-graph."""

    __props__ = ("indices_end_idx", "out_dtype", "out_broadcastable")

    def __init__(self, indices_end_idx, out_dtype, out_broadcastable):
        super().__init__()
        self.indices_end_idx = indices_end_idx
        self.out_dtype = out_dtype
        self.out_broadcastable = out_broadcastable

    def make_node(self, *inputs):
        return Apply(
            self, list(inputs), [TensorType(self.out_dtype, self.out_broadcastable)()]
        )

    def perform(self, node, inputs, outputs):
        raise NotImplementedError("This is a stand-in Op.")  # pragma: no cover
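
For concreteness, the static-shape version I have in mind would look roughly like this (a sketch only, assuming TensorType accepts a shape argument with None for unknown dimensions, as in recent Aesara versions; the out_shape name is hypothetical):

from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.type import TensorType


class MixtureRV(Op):
    """A placeholder used to specify a log-likelihood for a mixture sub-graph."""

    __props__ = ("indices_end_idx", "out_dtype", "out_shape")

    def __init__(self, indices_end_idx, out_dtype, out_shape):
        super().__init__()
        self.indices_end_idx = indices_end_idx
        self.out_dtype = out_dtype
        # Tuple of ints and/or None entries (None means the dimension is unknown)
        self.out_shape = tuple(out_shape)

    def make_node(self, *inputs):
        return Apply(
            self, list(inputs), [TensorType(self.out_dtype, shape=self.out_shape)()]
        )

    def perform(self, node, inputs, outputs):
        raise NotImplementedError("This is a stand-in Op.")  # pragma: no cover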

Mismatch in MixtureRV shapes generated by Switch vs. at.stack

With the hot fix replacing broadcastable with shape, the MixtureRV shapes seem to differ depending on whether they are generated by a Switch or a Join. Is this because subtensors don't have static shape inference yet? That would be my guess (Aesara issue #922?), but I'm not sure. Below is an example that I created using this branch's additions.

import aesara.tensor as at
from aeppl.rewriting import construct_ir_fgraph
from aeppl.mixture import MixtureRV

srng = at.random.RandomStream(29833)

X_rv = srng.normal([10, 20], 0.1, size=(2,), name="X")
Y_rv = srng.normal([-10, -20], 0.1, size=(2,), name="Y")

I_rv = srng.bernoulli([0.99, 0.01], size=(2,), name="I")
i_vv = I_rv.clone()
i_vv.name = "i"

Z1_rv = at.switch(I_rv, X_rv, Y_rv)
z_vv = Z1_rv.clone()
z_vv.name = "z1"

fgraph, _, _ = construct_ir_fgraph({Z1_rv: z_vv, I_rv: i_vv})

assert isinstance(fgraph.outputs[0].owner.op, MixtureRV)
assert not hasattr(
    fgraph.outputs[0].tag, "test_value"
)  # aesara.config.compute_test_value == "off"
assert fgraph.outputs[0].name is None

Z1_rv.name = "Z1"

fgraph, _, _ = construct_ir_fgraph({Z1_rv: z_vv, I_rv: i_vv})

assert fgraph.outputs[0].name == "Z1-mixture"

# building the identical graph but with a stack to check that mixture computations are identical

Z2_rv = at.stack((X_rv, Y_rv))[I_rv]

fgraph2, _, _ = construct_ir_fgraph({Z2_rv: z_vv, I_rv: i_vv})

fgraph.outputs[0].type.shape # (2,)
fgraph2.outputs[0].type.shape # (None, None)

IfElse mixture subgraphs

Given that IfElse requires scalar conditions, maybe it would be good to start with those instead of refining switch-mixtures... The short snippet below illustrates the difference in condition shapes. Happy to hear any thoughts about the points above. I feel like there's a lot going on, and it can be challenging to address everything at once (especially given that this is a continuation of this summer's work...)
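
A tiny, self-contained illustration of that constraint (nothing here is specific to this PR): ifelse takes a single boolean scalar and lazily returns one whole branch, whereas at.switch selects entry-wise with a tensor condition.

import aesara.tensor as at
from aesara.ifelse import ifelse

cond = at.scalar("cond", dtype="bool")
x = at.vector("x")
y = at.vector("y")

z_lazy = ifelse(cond, x, y)      # scalar condition, one whole branch is evaluated
z_elem = at.switch(x > 0, x, y)  # entry-wise selection, condition has the branches' shape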

@rlouf
Member

rlouf commented Dec 8, 2022

PRs that touch on core mechanisms in Aesara, or simply that implement big changes, can easily get frustrating. Breaking the problem down like you did is a great reaction to this situation. Do you mind if I keep it open and I come back to you later next week at least with some questions, maybe some insight?

@rlouf rlouf reopened this Dec 8, 2022
@larryshamalama
Contributor Author

PRs that touch on core mechanisms in Aesara, or simply that implement big changes, can easily get frustrating. Breaking the problem down like you did is a great reaction to this situation. Do you mind if I keep it open and I come back to you later next week at least with some questions, maybe some insight?

Of course, not a problem at all!

@larryshamalama
Contributor Author

PRs that touch on core mechanisms in Aesara, or simply that implement big changes, can easily get frustrating. Breaking the problem down like you did is a great reaction to this situation. Do you mind if I keep it open and I come back to you later next week at least with some questions, maybe some insight?

@rlouf Just a quick update: @brandonwillard and I spoke recently, hence the recent force-push. The current focus is to ensure that the mixture indexing operations via at.switch and at.stack match what NumPy would yield with np.where. For now, we leverage mixture_replace to construct tests that should pass, and then further refine the switch_mixture_replace rewrite. Afterwards, we can add the IfElse rewrite, which should be similar to switch_mixture_replace, although branches of differing lengths (but the same number of dimensions) will need a bit of thought.
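
For the record, here is the kind of entry-wise NumPy equivalence being targeted (a small sketch; the stacking order and indexing convention are my own illustration, not necessarily the exact convention the rewrite will settle on):

import numpy as np

I = np.array([1, 0])
X = np.array([10.0, 20.0])
Y = np.array([-10.0, -20.0])

# switch-style selection: entry-wise, take X where I is nonzero, else Y
print(np.where(I, X, Y))                       # [ 10. -20.]

# stack-then-index selection of the same entries, entry-wise
print(np.stack((Y, X))[I, np.arange(I.size)])  # [ 10. -20.]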

@rlouf
Member

rlouf commented Dec 17, 2022

Glad to hear this is back on track!


Successfully merging this pull request may close these issues.

Support mixtures defined with IfElse