adding fixes so transducer can work again #247
base: master
Conversation
Looks good to me but some questions and style notes. Thanks Travis.
actions.encode_actions(source)

actions = ActionVocabulary(unk_idx=train_data.index.unk_idx)
assert data.has_target, """Passed dataset with no target to expert
These are exceptions elsewhere; why is this an assertion?
This is more of a sanity check thing to me. It makes more sense to write a single assert line than defining an error class, doing a conditional check, then raising. Just an economy thing, I guess.
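For what it's worth, here's the trade-off in miniature (the `data.has_target` check is from the diff above; the stub dataset and the error class are hypothetical):

```python
from types import SimpleNamespace

data = SimpleNamespace(has_target=True)  # stand-in for the dataset

# One-line sanity check, as in the diff:
assert data.has_target, "Passed dataset with no target to expert."


# Versus the exception pattern used elsewhere (hypothetical error class):
class NoTargetError(Exception):
    pass


if not data.has_target:
    raise NoTargetError("Passed dataset with no target to expert.")
```

One caveat: `assert` statements are stripped when Python runs with `-O`, which is an argument for the exception pattern in library code.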
@@ -557,11 +560,19 @@ def predict_step(self, batch: Tuple[torch.tensor], batch_idx: int) -> Dict:

    def convert_prediction(self, prediction: List[List[int]]) -> torch.Tensor:
        """Converts prediction values to tensor for evaluator compatibility."""
        # FIXME: the two steps below may be partially redundant.
What are the "two steps" referred to here?
That's from Adam's PR, no comment.
Ah, let's remove it then since we don't know what it means.
I think this is copied from #233. IIRC I meant that looping and stacking predictions, and calling `util.pad_tensor_after_eos`, may do some redundant things that could be cleaned up at some point.
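For concreteness, a rough reconstruction of the two steps in question; this is not the actual implementation, and the special-symbol indices and the inlined masking (standing in for `util.pad_tensor_after_eos`) are assumptions:

```python
import torch
from typing import List

PAD_IDX, EOS_IDX = 0, 2  # hypothetical special-symbol indices


def convert_prediction(prediction: List[List[int]]) -> torch.Tensor:
    """Converts prediction values to tensor for evaluator compatibility."""
    # Step 1: loop over the per-sample predictions, pad each to a common
    # length, and stack them into a (batch_size x max_length) tensor.
    max_len = max(len(p) for p in prediction)
    rows = [
        torch.tensor(p + [PAD_IDX] * (max_len - len(p))) for p in prediction
    ]
    stacked = torch.stack(rows)
    # Step 2: overwrite everything after the first EOS with padding,
    # roughly what util.pad_tensor_after_eos does. If step 1 already
    # padded past EOS, the two steps overlap -- hence the FIXME.
    for row in stacked:
        eos_positions = (row == EOS_IDX).nonzero()
        if len(eos_positions) > 0:
            row[int(eos_positions[0]) + 1 :] = PAD_IDX
    return stacked


# e.g. convert_prediction([[5, 4, 2], [7, 2, 9, 9]])
# -> tensor([[5, 4, 2, 0],
#            [7, 2, 0, 0]])
```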
So should we leave the comment to clean up later or just remove now?
yoyodyne/models/expert.py
@@ -32,19 +37,24 @@ class ActionVocabulary:
    start_vocab_idx: int
    target_characters: Set[Any]
Why `Any` and not `str`?
Technically the edit actions can accept any hashable symbol: strings, ints, even tuples. So `Any` is a better representation of its coverage, given that maxwell is also symbol-agnostic.
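A tiny illustration of the point (the specific values here are made up):

```python
from typing import Any, Set

# Any hashable symbol can serve as a target character: strings, ints,
# even tuples (all values below are hypothetical).
target_characters: Set[Any] = {"a", "sch", 42, ("s", "h")}
```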
self.target_characters = set()
self.encode_actions([unk_idx])  # Sets unknown character decoding.
# Use index from dataset to create action vocabulary.
self.encode_actions([index(t) for t in index.target_vocabulary])
Can we start an issue to document exactly what's going on here? `encode_actions` converts the vocab into `Action`s and stores them in a separate vocabulary, right?
Yeah, it needs to track all potential edit actions for a given symbol.
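A rough sketch of what that implies, for orientation only; this is not the actual yoyodyne/maxwell code, and the `Ins`/`Sub` action types are stand-ins for maxwell's own:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class Ins:  # stand-in for maxwell's insertion action
    symbol: Any


@dataclass(frozen=True)
class Sub:  # stand-in for maxwell's substitution action
    symbol: Any


@dataclass
class ActionVocabulary:
    # Maps each edit action to a stable integer id.
    actions: Dict[Any, int] = field(default_factory=dict)
    target_characters: Set[Any] = field(default_factory=set)

    def encode_actions(self, symbols: List[Any]) -> None:
        # For every target symbol, register each edit action that could
        # produce it, so the model's output space covers all of them.
        for symbol in symbols:
            for action in (Ins(symbol), Sub(symbol)):
                if action not in self.actions:
                    self.actions[action] = len(self.actions)
            self.target_characters.add(symbol)
```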
yoyodyne/models/expert.py
SED training over the default data sampling is expensive.
Training is quicker if tensors are converted to lists.
For efficiency, we encode action vocabulary simultaneously.
We want just the encodings without BOS or EOS tokens. This
Can we add a more general comment here before this for the unfamiliar user: this basically converts the dataset into a format that is usable by the Expert -- is that right?
We just want the raw symbol strings. We don't want to worry about BOS and EOS being encoded by maxwell.
What comment would make sense to you? (I'm too familiar with it to write a good user-friendly docstring.)
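Something along these lines, perhaps; a minimal sketch assuming hypothetical special-symbol indices (in yoyodyne these would come from the index):

```python
import torch
from typing import List

BOS_IDX, EOS_IDX, PAD_IDX = 1, 2, 0  # hypothetical values


def tensor_to_symbols(row: torch.Tensor) -> List[int]:
    # Convert one encoded row to a plain Python list, dropping BOS, EOS,
    # and padding, so maxwell's SED training sees only the raw symbols.
    return [
        int(i) for i in row if int(i) not in (BOS_IDX, EOS_IDX, PAD_IDX)
    ]


# e.g. tensor_to_symbols(torch.tensor([1, 5, 9, 2, 0, 0])) -> [5, 9]
```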
@@ -273,8 +277,16 @@ def get_model_from_argparse_args(
        source_attention_heads=args.source_attention_heads,
        source_encoder_cls=source_encoder_cls,
        start_idx=datamodule.index.start_idx,
        target_vocab_size=datamodule.index.target_vocab_size,
        vocab_size=datamodule.index.vocab_size,
        target_vocab_size=(
So now we orchestrate the vocab sizes once in the trainer, which already has a handle on the initialized expert? This is much nicer than whatever I was trying to do before.
It still somehow feels clunky, I think, but that is an effect of the single embedding matrix's updates not aligning easily with the expert.
Yeah, it's still a bit clunky, but at least the clunkiness is occurring in the train script (which is localized clumsiness). Probably a good next issue is to break up the train script some more so it's less of a monolith.
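The gist of the orchestration, as a hedged sketch (the helper function and its arguments are ours, not the actual train-script code):

```python
def pick_target_vocab_size(
    is_transducer: bool, index_target_vocab_size: int, num_actions: int
) -> int:
    # Transducer targets are edit actions rather than raw symbols, so the
    # output embedding must be sized to the expert's action vocabulary;
    # every other model uses the dataset's target vocabulary.
    return num_actions if is_transducer else index_target_vocab_size
```

In the actual diff, this choice is made inline in `get_model_from_argparse_args`, as shown in the hunk above.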
yoyodyne/models/expert.py
    )
    sed_aligner.params.write_params(sed_params_path)
else:
    sed_params = sed.ParamDict.read_params(sed_params_path)
Am I understanding correctly that in order to use an existing SED, the user would specify `0` or `None` for `oracle_em_epochs`? Avoiding rerunning EM every time is a great feature, but I wonder if we can think through a more intuitive user interface for it; this feels a bit buried. Maybe you've already put some thought into this though, so please let me know if you think this is already a good interface.
Oh sorry I see the comment in the method header. I missed it before.
I just added a bool to the train script that's triggered if a SED file already exists. I think this is a bit cleaner. Thoughts?
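Presumably something like the following; a sketch of the idea built around the `read_params`/`write_params` calls visible in the diff, with the surrounding logic assumed:

```python
import os

from maxwell import sed  # maxwell's stochastic edit distance module


def get_sed_params(sed_params_path: str, train_em):
    # If SED parameters were already written (e.g., by an earlier run in
    # the same experiment directory), load them instead of rerunning EM.
    if os.path.exists(sed_params_path):
        return sed.ParamDict.read_params(sed_params_path)
    # Otherwise run EM (train_em is a hypothetical callable returning a
    # fitted aligner) and persist the parameters for future runs.
    sed_aligner = train_em()
    sed_aligner.params.write_params(sed_params_path)
    return sed_aligner.params
```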
yoyodyne/models/expert.py
sed_params_path (str): path to read/write location of sed parameters.
    If epochs > 0, this is a write path.
    If epochs == 0, this is a read path.
    If empty string then creates 'dummy' expert.
Maybe add a comment here or in the README about what behavior a 'dummy' expert entails.
Removed the dummy expert. Not important for this go-around.
Thanks for these changes + cleaning this stuff up!!
Awesome!
Nice, thanks. Yeah, I was worried about that, but I was in a big rush trying to make a fix when I made the PR and then promptly disappeared. It looks like you cleaned up some naming, etc. as well, so is it safe to assume that you have validated that those are the correct changes, or should I do more testing?
How was the accuracy w/ and w/out features?
Yep, expert.py manages it. No dataset is passed to the expert anymore; it just reads from the index file. Indexes are pickled between runs, no? So it should be deterministic with new initialization. Regardless, the expert is pickled now, so the weights remain the same between runs.
I've validated, but always a fan of triple-checking.
I forget offhand, but they were on par with my usual runs with Polish.
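On the pickling point above: presumably the persistence amounts to something like this minimal sketch (the helper names and path handling are ours):

```python
import pickle


def save_expert(expert, path: str) -> None:
    # Persist the fitted expert (action vocabulary + SED weights) to the
    # experiment directory so resumed runs can skip EM entirely.
    with open(path, "wb") as sink:
        pickle.dump(expert, sink)


def load_expert(path: str):
    with open(path, "rb") as source:
        return pickle.load(source)
```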
Merges: #233 #197
Fixes: #192 #191
Dependent on: CUNY-CL/maxwell#17
Summary: I fixed maxwell so TQDM isn't a property of the SED parameters anymore; this allows pickling of the expert module again and thus allows multi-GPU training with the transducer.
In the process, I changed how the expert module initializes so that it just copies the index vocabulary from the dataloader that's passed to it. So now you just need to pass an index to the action vocabulary and everything is managed in the backend. This allows free initialization of expert modules from checkpoints and thus skips epochs of EM when resuming from a checkpoint. (I just do the same thing we do with indexes: you write the SED parameters to the experiment directory and load them when initializing the model.)
I also added new flags so that you can just skip EM training for the expert. This is for an upcoming change in which the transducer is no longer dependent on having an oracle function. (I've found through training that the SED actually doesn't add that much to training.) It also allows the creation of 'dummy' experts that just hold the action vocabulary. Added error checks to prevent unsafe use; feel free to point out more. I also moved Adam's changes from #233 into the trainer so that there's no weird attribute management going on in the init anymore. (Turns out that checkpointing pickles that kwargs dict, so simply adding the action vocabulary was creating too large an embedding space.)
I ran experiments over the Polish data and was able to write predictions fairly easily. The only major issue is that we're wasting some parameters on creating target vocabulary embeddings that will never be used. But that's low on the efficiency stack.