adding fixes so transducer can work again #247

Open · bonham79 wants to merge 5 commits into master

Conversation

bonham79 (Collaborator):

Merges: #233 #197

Fixes: #192 #191

Dependent on: CUNY-CL/maxwell#17

Summary: I fixed maxwell so that TQDM is no longer a property of the SED parameters. This allows the expert module to be pickled again, which in turn allows multi-GPU training of the transducer.
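For context, the failure mode and the fix, as a minimal sketch (not the actual maxwell code):

```python
import pickle

import tqdm


class BadParams:
    """Sketch of the old failure: the bar is part of the object's state."""

    def __init__(self):
        # tqdm holds a handle to stderr (and a lock), which pickle cannot
        # serialize, so any object storing a bar becomes unpicklable.
        self.progress = tqdm.tqdm(total=10)


class GoodParams:
    """Sketch of the fix: create the bar locally, never store it."""

    def train(self):
        for _ in tqdm.tqdm(range(10)):
            pass


pickle.dumps(GoodParams())  # Fine.
pickle.dumps(BadParams())  # Raises TypeError: cannot pickle the stream.
```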

In the process, I changed how the expert module is initialized so that it just copies the index vocabulary from the dataloader passed to it. Now you only need to pass an index to the action vocabulary, and everything is managed in the backend. This allows free initialization of expert modules from checkpoints and thus skips epochs of EM when resuming from a checkpoint. (I do the same thing we do with indexes: write the SED parameters to the experiment directory and load them when initializing the model.)
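The initialization pattern now looks roughly like this sketch; it mirrors snippets quoted later in this conversation, and anything beyond those is an assumption:

```python
class ActionVocabulary:
    """Builds the action vocabulary straight from an index.

    No dataset is needed: the index's target vocabulary is copied over, so
    an expert can be reconstructed freely when resuming from a checkpoint.
    """

    def __init__(self, index):
        self.target_characters = set()
        self.encode_actions([index.unk_idx])  # Unknown character decoding.
        # Use the index from the dataset to create the action vocabulary.
        self.encode_actions([index(t) for t in index.target_vocabulary])

    def encode_actions(self, symbols) -> None:
        # Placeholder: the real method registers edit actions per symbol.
        self.target_characters.update(symbols)
```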

I also added new flags so that you can skip EM training for the expert entirely. This is for an upcoming change in which the transducer no longer depends on having an oracle function. (I've found through training that the SED actually doesn't add that much.) It also allows the creation of dummy experts that just hold the action vocabulary. I added error checks to prevent unsafe use; feel free to point out more.
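For illustration, the kind of guard this implies (the flag name appears later in this conversation; the default and the error message are made up):

```python
import argparse
import os

parser = argparse.ArgumentParser()
# 0 means: skip EM and reuse previously written SED parameters.
parser.add_argument("--oracle_em_epochs", type=int, default=10)
args = parser.parse_args()

sed_params_path = "experiment/sed_params.pkl"  # Hypothetical location.
if args.oracle_em_epochs == 0 and not os.path.exists(sed_params_path):
    # Error check against unsafe use: no saved parameters, no EM requested.
    raise FileNotFoundError(
        f"No SED parameters at {sed_params_path}; "
        "train first with --oracle_em_epochs > 0."
    )
```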

I also moved Adam's changes from #233 into the trainer so that there's no weird attribute managing going on in the init anymore. (It turns out that checkpointing pickles the kwargs dict, so simply adding the action vocabulary was creating too large an embedding space.)

I ran experiments over the Polish data and was able to write predictions fairly easily. The only major issue is that we're wasting some parameters on creating target vocabulary embeddings that will never be used, but that's low on the efficiency stack.

kylebgorman (Contributor) left a comment:

Looks good to me, but I have some questions and style notes. Thanks, Travis.

actions.encode_actions(source)

actions = ActionVocabulary(unk_idx=train_data.index.unk_idx)
assert data.has_target, """Passed dataset with no target to expert
kylebgorman (Contributor):

These are exceptions elsewhere; why is this an assertion?

bonham79 (Collaborator, author):

This is more of a sanity check to me. It makes more sense to write a single assert line than to define an error class, do a conditional check, and then raise. Just an economy thing, I guess.
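For contrast, the two styles under discussion as a runnable sketch (`Error` stands in for whatever exception class the codebase defines):

```python
class Error(Exception):
    """Stand-in for the project's exception class."""


class Dataset:
    has_target = True  # Flip to False to trigger either check.


data = Dataset()

# Assert style, as in this PR (stripped when running `python -O`, so it
# suits internal sanity checks better than user-input validation):
assert data.has_target, "Passed dataset with no target to expert"

# Exception style, as used elsewhere (always enforced):
if not data.has_target:
    raise Error("Passed dataset with no target to expert")
```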

@@ -557,11 +560,19 @@ def predict_step(self, batch: Tuple[torch.tensor], batch_idx: int) -> Dict:

def convert_prediction(self, prediction: List[List[int]]) -> torch.Tensor:
"""Converts prediction values to tensor for evaluator compatibility."""
# FIXME: the two steps below may be partially redundant.
kylebgorman (Contributor):

What are the "two steps" referred to here?

bonham79 (Collaborator, author):

That's from Adam's PR, no comment.

kylebgorman (Contributor):

Ah, let's remove it then since we don't know what it means.

Adamits (Collaborator):

I think this is copied from #233

IIRC, I meant that looping and stacking predictions and then calling util.pad_tensor_after_eos may do some redundant things that could be cleaned up at some point.

bonham79 (Collaborator, author):

So should we leave the comment to clean up later or just remove now?
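For reference, a rough sketch of the two steps in question (the padding logic is a guess at what util.pad_tensor_after_eos does, not its actual implementation):

```python
from typing import List

import torch

PAD_IDX, EOS_IDX = 0, 1  # Illustrative indices.


def convert_prediction(prediction: List[List[int]]) -> torch.Tensor:
    # Step 1: loop over the predicted sequences, right-pad each one to a
    # common length, and stack them into a single tensor.
    max_len = max(len(seq) for seq in prediction)
    rows = [
        torch.tensor(seq + [PAD_IDX] * (max_len - len(seq)))
        for seq in prediction
    ]
    stacked = torch.stack(rows)
    # Step 2: mask everything after the first EOS with padding; note this
    # re-walks each row, which is the suspected redundancy with step 1.
    for row in stacked:
        eos = (row == EOS_IDX).nonzero()
        if len(eos):
            row[int(eos[0]) + 1:] = PAD_IDX
    return stacked


print(convert_prediction([[5, 4, 1, 9], [7, 1]]))
```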

@@ -32,19 +37,24 @@ class ActionVocabulary:
start_vocab_idx: int
target_characters: Set[Any]
Adamits (Collaborator):

Why Any and not str?

bonham79 (Collaborator, author):

Technically, the edit actions can accept any hashable symbol: strings, ints, even tuples. So Any is a better representation of their coverage, given that maxwell is also symbol-agnostic.
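Illustratively (arguably typing.Hashable would be an even tighter annotation than Any):

```python
from typing import Any, Set

# All of these are valid symbol inventories for the edit actions;
# Set[str] would only describe the first one.
chars: Set[Any] = {"a", "b", "c"}
codepoints: Set[Any] = {97, 98, 99}
tagged: Set[Any] = {("a", "HIGH"), ("b", "LOW")}
```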

self.target_characters = set()
self.encode_actions([unk_idx]) # Sets unknown character decoding.
# Use index from dataset to create action vocabulary.
self.encode_actions([index(t) for t in index.target_vocabulary])
Adamits (Collaborator):

Can we start an issue to document exactly what's going on here? encode_actions converts the vocab into Actions and stores them in a separate vocabulary, right?

bonham79 (Collaborator, author):

Yeah, it needs to track all potential edit actions for a given symbol.
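Schematically, something like the following sketch; these action classes are illustrative stand-ins, not maxwell's actual API:

```python
from dataclasses import dataclass
from typing import Any, Set


@dataclass(frozen=True)
class Ins:
    """Insert `symbol` into the output."""

    symbol: Any


@dataclass(frozen=True)
class Sub:
    """Rewrite the current source symbol as `symbol`."""

    symbol: Any


def encode_actions(vocabulary: Set[Any], symbols) -> None:
    # Register every edit action that could produce each target symbol.
    for symbol in symbols:
        vocabulary.add(Ins(symbol))
        vocabulary.add(Sub(symbol))


vocabulary: Set[Any] = set()
encode_actions(vocabulary, ["a", "b"])  # Ins/Sub actions for each symbol.
```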

SED training over the default data sampling is expensive.
Training is quicker if tensors are converted to lists.
For efficiency, we encode action vocabulary simultaneously.
We want just the encodings without BOS or EOS tokens. This
Adamits (Collaborator):

Can we add a more general comment here before this for the unfamiliar user: this basically converts the dataset into a format that is usable by the Expert -- is that right?

bonham79 (Collaborator, author):

We just want the raw symbol strings; we don't want to worry about BOS and EOS being encoded by maxwell.

What comment would make sense to you? (I'm too familiar with it to write a good user-friendly docstring.)


@@ -273,8 +277,16 @@ def get_model_from_argparse_args(
source_attention_heads=args.source_attention_heads,
source_encoder_cls=source_encoder_cls,
start_idx=datamodule.index.start_idx,
target_vocab_size=datamodule.index.target_vocab_size,
vocab_size=datamodule.index.vocab_size,
target_vocab_size=(
Adamits (Collaborator):

So now we orchestrate the vocab sizes once in the trainer, which already has a handle on the initialized expert? This is much nicer than whatever I was trying to do before.

It still feels somewhat clunky, I think, but that is an effect of the single-embedding-matrix updates not aligning easily with the expert.

bonham79 (Collaborator, author):

Yeah, it's still a bit clunky, but at least the clunkiness is occurring in the train script (where it's localized). A good next issue is probably to break the train script up some more so it's less of a monolith.
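Schematically, the orchestration now amounts to something like this sketch (the attribute names are assumptions based on the diff above):

```python
def compute_target_vocab_size(expert, index) -> int:
    """Chooses the output vocabulary size once, in the trainer.

    For the transducer, the output space is the expert's action
    vocabulary; for other models, it is the index's target vocabulary.
    """
    if expert is not None:
        return len(expert.actions)
    return index.target_vocab_size
```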

)
sed_aligner.params.write_params(sed_params_path)
else:
sed_params = sed.ParamDict.read_params(sed_params_path)
Adamits (Collaborator):

Am I understanding correctly that, in order to use an existing SED model, the user would specify 0 or None for oracle_em_epochs? Avoiding rerunning EM every time is a great feature, but I wonder if we can think through a more intuitive user interface for it; this feels a bit buried. Maybe you've already put some thought into this, though, so please let me know if you think this is already a good interface.

Adamits (Collaborator):

Oh, sorry, I see the comment in the method header. I missed it before.

bonham79 (Collaborator, author):

I just added a bool to the train script that's triggered if an SED file already exists. I think this is a bit cleaner. Thoughts?
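i.e., roughly this sketch (train_sed_with_em is a hypothetical helper; the read/write calls match the diff above):

```python
import os

from maxwell import sed  # Assumed import path.

sed_params_path = "experiment/sed_params.pkl"  # Hypothetical location.

# Reuse saved SED parameters when they exist; otherwise run EM and write
# them out for the next run.
if os.path.exists(sed_params_path):
    sed_params = sed.ParamDict.read_params(sed_params_path)
else:
    sed_aligner = train_sed_with_em(train_data, epochs=args.oracle_em_epochs)
    sed_aligner.params.write_params(sed_params_path)
```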

sed_params_path (str): path to read/write location of sed parameters.
If epochs > 0, this is a write path.
If epochs == 0, this is a read path.
If empty string then creates 'dummy' expert.
Adamits (Collaborator):

Maybe add a comment here or in the README about what behavior a 'dummy' expert entails.

bonham79 (Collaborator, author):

Removed the dummy expert. Not important for this go-around.

Adamits (Collaborator) commented on Sep 16, 2024:

Thanks for these changes + cleaning this stuff up!!

> Summary: I fixed maxwell so that TQDM is no longer a property of the SED parameters. This allows the expert module to be pickled again, which in turn allows multi-GPU training of the transducer.

Awesome!

> In the process, I changed how the expert module is initialized so that it just copies the index vocabulary from the dataloader passed to it. Now you only need to pass an index to the action vocabulary, and everything is managed in the backend. This allows free initialization of expert modules from checkpoints and thus skips epochs of EM when resuming from a checkpoint. (I do the same thing we do with indexes: write the SED parameters to the experiment directory and load them when initializing the model.)

  1. "is managed in the backend" means in the expert.py code? 2. Are the expert modules deterministic given a dataset? For the index, this is the case and it will be created and written every time you run an experiment I think right? I believe we only load it when it is explicitly asked for in the args (e.g. for inference), though correct me if i'm wrong. So here, I guess if we default to epochs > 0, then we just need to put something in the README that you can re-use the expert by setting epochs = 0 (and then maybe we raise an error if there is not expert where we expect it to be in this case.).

> I also moved Adam's changes from #233 into the trainer so that there's no weird attribute managing going on in the init anymore. (It turns out that checkpointing pickles the kwargs dict, so simply adding the action vocabulary was creating too large an embedding space.)

Nice, thanks. Yeah, I was worried about that, but I was in a big rush trying to make a fix when I made the PR and then promptly disappeared. It looks like you cleaned up some naming, etc. as well, so is it safe to assume that you have validated that those are the correct changes, or should I do more testing?

> I ran experiments over the Polish data and was able to write predictions fairly easily. The only major issue is that we're wasting some parameters on creating target vocabulary embeddings that will never be used, but that's low on the efficiency stack.

How was the accuracy w/ and w/out features?

Adamits mentioned this pull request on Sep 24, 2024.
Signed-off-by: Bonham79 <[email protected]>
bonham79 (Collaborator, author):

1. "is managed in the backend" means in the expert.py code? 2. Are the expert modules deterministic given a dataset? For the index, this is the case and it will be created and written every time you run an experiment I think right? I believe we only load it when it is explicitly asked for in the args (e.g. for inference), though correct me if i'm wrong. So here, I guess if we default to epochs > 0,  then we just need to put something in the README that you can re-use the expert by setting epochs = 0 (and then maybe we raise an error if there is not expert where we expect it to be in this case.).

Yep, expert.py manages it. No dataset is passed to the expert anymore; it just reads from the index file.

Indexes are pickled between runs, no? So it should be deterministic with new initialization. Regardless, the expert itself is pickled now, so the weights remain the same between runs.

> Nice, thanks. Yeah, I was worried about that, but I was in a big rush trying to make a fix when I made the PR and then promptly disappeared. It looks like you cleaned up some naming, etc. as well, so is it safe to assume that you have validated that those are the correct changes, or should I do more testing?

I've validated, but I'm always a fan of triple-checking.

> How was the accuracy w/ and w/out features?

I forget offhand, but they were on par with my usual runs on Polish.

Signed-off-by: Bonham79 <[email protected]>

Successfully merging this pull request may close these issues.

TQDM Error with multi GPU Transducer
3 participants