GRU support #254
base: master
Conversation
No logical changes yet.
Hey Kyle, I just started looking through the PR, but first, it occurs to me that I am unsure "because LSTMs return the cell state in addition to the hidden state, and because various models need to reshape, average, or otherwise manipulate that cell state" is a problem. I didn't look everywhere, but IIRC when we manipulate the state, we manipulate the

I did not think through what that actually buys us, but I am always pro-anything that reduces the amount of abstraction :D

EDIT: I suppose what I am suggesting here ADDS abstraction. But it may clean things up in the way that I thought you were suggesting in your comment.
There is at least one case where both

I suspect there is some abstraction that would reduce the amount of boilerplate here, but I'm not even sure if it's worth tracking down. If you can think of anything, though, please go ahead and suggest!
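To make the asymmetry under discussion concrete, here is a minimal sketch (not code from the PR; the helper name and shapes are illustrative, assuming a PyTorch codebase) of why state-manipulation code has to branch on the RNN type: an LSTM's state is a `(hidden, cell)` tuple, while a GRU's is a single hidden tensor, so any reshape or merge has to be applied differently.

```python
import torch

def combine_bidir(state):
    """Merge the two directions of a bidirectional RNN state:
    (2, batch, hidden) -> (1, batch, 2 * hidden).

    LSTM state is a (h, c) tuple; GRU state is a single tensor,
    so the same manipulation needs type-specific handling.
    """
    def _merge(t):
        _, batch, _ = t.shape
        # (2, batch, hidden) -> (batch, 2, hidden) -> (batch, 2*hidden) -> (1, batch, 2*hidden)
        return t.transpose(0, 1).reshape(batch, -1).unsqueeze(0)

    if isinstance(state, tuple):          # LSTM: manipulate both h and c
        return tuple(_merge(t) for t in state)
    return _merge(state)                  # GRU: h only
```

Every call site that touches the state needs a branch like this, which is what makes a single "RNN type" flag awkward compared to subclassing.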
This adds GRU support; everywhere there is an LSTM model, there is now a GRU model too.
I initially tried to make the RNN type a general flag, but because LSTMs return the cell state in addition to the hidden state, and because various models need to reshape, average, or otherwise manipulate that cell state, this was really not feasible. Therefore, for each model that was previously "LSTM-backed", I create an abstract class called `FooRNN{Encoder,Decoder,Model}`. `FooLSTM` subclasses this and returns an LSTM module (it may also have special logic in the forward method, or decode method, or whatever), as does `FooGRU`.

I experimented with traditional Elman RNNs (they have the same simpler interface as GRUs), but performance was abysmal so I'm not going to bother.
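As a rough sketch of the subclassing pattern described above (the class names follow the `FooRNN`/`FooLSTM`/`FooGRU` scheme, but the exact signatures and a PyTorch backend are assumptions, not code from this PR):

```python
from abc import ABC, abstractmethod
import torch
import torch.nn as nn


class FooRNNEncoder(ABC, nn.Module):
    """Abstract encoder: subclasses supply the recurrent module and thereby
    decide whether the state is hidden-only (GRU) or hidden + cell (LSTM)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.rnn = self.build_rnn(input_size, hidden_size)

    @abstractmethod
    def build_rnn(self, input_size: int, hidden_size: int) -> nn.Module:
        ...

    def forward(self, x: torch.Tensor):
        # Shared logic lives here; state handling can be specialized
        # in subclasses when needed.
        output, state = self.rnn(x)
        return output, state


class FooLSTMEncoder(FooRNNEncoder):
    # LSTM: state is a (hidden, cell) tuple.
    def build_rnn(self, input_size, hidden_size):
        return nn.LSTM(input_size, hidden_size, batch_first=True)


class FooGRUEncoder(FooRNNEncoder):
    # GRU: state is a single hidden tensor.
    def build_rnn(self, input_size, hidden_size):
        return nn.GRU(input_size, hidden_size, batch_first=True)
```

The shared forward/decode logic stays in the abstract class, and each concrete subclass only overrides what differs, which is the cleanup the PR description is aiming for.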
All models have been tested on CPU and GPU.
Other changes:
- `EncoderDecoder` in our naming convention is replaced with simply just `Model`.

Closes #180. (Note, however, there's still plenty to do to study the effects this has.)