Precompute DPO logprobs #213

natolambert · 2024-07-30T22:11:39Z

To save compute.
Another hard issue :)

natolambert · 2024-08-01T19:41:04Z

A sketch of how this could work.

Add an option in dpo_tune where, instead of using concatenated_forward, we just run forward for each, with an optional save of the logprobs.

Line 568 in 42c1fa3

for epoch in range(starting_epoch, args.num_train_epochs):

Then, you iterate over batches and compute loss and update the model.

Optional: logic to move one model onto CUDA at a time. Shouldn't be too hard.
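
A minimal, framework-free sketch of the steps above, to make the idea concrete. Everything here is an assumption for illustration, not the dpo_tune code: `LogprobFn` stands in for a model forward pass that returns a summed sequence logprob, and `precompute_reference_logprobs`, `dpo_loss`, and `train_epochs` are hypothetical names. The loss is the standard DPO objective with an assumed `beta=0.1`. The optional one-model-on-CUDA-at-a-time logic is not shown.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a model: returns the summed log-probability of a
# response given a prompt. In a real script this would be a forward pass.
LogprobFn = Callable[[str, str], float]
Pair = Tuple[str, str, str]  # (prompt, chosen, rejected)


def precompute_reference_logprobs(
    ref_logprob: LogprobFn, dataset: List[Pair]
) -> Dict[int, Tuple[float, float]]:
    """Run the reference model once per example and cache results by index."""
    cache: Dict[int, Tuple[float, float]] = {}
    for idx, (prompt, chosen, rejected) in enumerate(dataset):
        cache[idx] = (ref_logprob(prompt, chosen), ref_logprob(prompt, rejected))
    return cache


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the repo
) -> float:
    """DPO loss for one pair: -log sigmoid(beta * (policy margin - ref margin))."""
    x = beta * ((policy_chosen - policy_rejected) - (ref_chosen - ref_rejected))
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def train_epochs(
    policy_logprob: LogprobFn,
    ref_cache: Dict[int, Tuple[float, float]],
    dataset: List[Pair],
    num_train_epochs: int,
) -> List[float]:
    """Iterate over epochs using cached reference logprobs; returns mean loss per epoch."""
    epoch_losses = []
    for _ in range(num_train_epochs):
        total = 0.0
        for idx, (prompt, chosen, rejected) in enumerate(dataset):
            ref_chosen, ref_rejected = ref_cache[idx]  # no reference forward pass here
            total += dpo_loss(
                policy_logprob(prompt, chosen),
                policy_logprob(prompt, rejected),
                ref_chosen,
                ref_rejected,
            )
            # A real loop would backpropagate and step the optimizer here.
        epoch_losses.append(total / len(dataset))
    return epoch_losses
```

With this shape, the reference model is called twice per example in total, regardless of the number of epochs, and the cache could be saved to disk and reloaded for later runs.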

hamishivi · 2024-08-01T20:03:15Z

Yeah, sounds about right. It's very easy to implement. I did it in EasyLM (although sharding issues mean it's broken), but the logic should be right: https://github.com/hamishivi/EasyLM/blob/main/EasyLM/models/llama/llama_train_dpo.py#L372-L400
