add comm part to et_replay without lazy allocation of tensors #178
@TaekyungHeo and Songyant, thanks for putting this together in a very short period of time. Overall it looks good. I left some inline comments, please check.
@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: add comm part to et_replay without lazy allocation of tensors
Test Plan: run resnet 2 gpu trace with both compute and comms
Differential Revision: D62052138
Pulled By: shengfukevin
dc43809 to aff5bc1
@shengfukevin has updated the pull request. You must reimport the pull request before landing.
This pull request was exported from Phabricator. Differential Revision: D62052138
@TaekyungHeo and Songyant, I have done some initial tests. Both the resnet 1 gpu and 2 gpu runs failed in et_replay:generate_io_tensors; it looks like input_tensors are none. These are the commands I ran:

    mpirun -np 1 et_replay --input param_bench/fb/integration_tests/resnet_1gpu_et.json
    mpirun -np 2 et_replay --trace-path param_bench/fb/integration_tests/resnet-2gpu/

A minor issue: for a single gpu run I still need to use mpirun, otherwise the comm initialization hangs. It looks like the global rank info is wrong. It would be better if we could run et_replay without mpirun for a single gpu. Thanks
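One possible way to address the single gpu hang mentioned above is to fall back to rank 0 in a world of size 1 when no launcher environment is present. The following is only a minimal sketch under stated assumptions, not et_replay's actual code: the names `RankInfo` and `detect_rank_info` are hypothetical, and it assumes the launcher is either Open MPI's mpirun (which exports the `OMPI_COMM_WORLD_*` variables) or a torchrun-style launcher (which exports `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`).

```python
import os
from typing import Mapping, NamedTuple, Optional


class RankInfo(NamedTuple):
    """Hypothetical container for the rank layout of one process."""
    global_rank: int
    world_size: int
    local_rank: int


# (rank, world size, local rank) variable names per launcher, checked in order.
_LAUNCHER_VARS = (
    # Exported by Open MPI's mpirun.
    ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_LOCAL_RANK"),
    # Exported by torchrun-style launchers.
    ("RANK", "WORLD_SIZE", "LOCAL_RANK"),
)


def detect_rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Read rank info from launcher variables, else assume a single process."""
    env = os.environ if env is None else env
    for rank_var, size_var, local_var in _LAUNCHER_VARS:
        if rank_var in env and size_var in env:
            return RankInfo(
                global_rank=int(env[rank_var]),
                world_size=int(env[size_var]),
                # Local rank is optional in this sketch; default to 0.
                local_rank=int(env.get(local_var, "0")),
            )
    # No launcher detected: behave as a standalone single-process run.
    return RankInfo(global_rank=0, world_size=1, local_rank=0)
```

With no launcher variables set, the sketch returns rank 0 and world size 1, which is the situation a single gpu run started without mpirun would be in. Whether this matches how et_replay initializes its comm backend is an assumption that would need checking against the actual code.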
aff5bc1 to 6f9041b
Hi Songyan, please check my inline comments. Thanks
6f9041b to 4473728
Summary: add comm part to et_replay without lazy allocation of tensors
Test Plan:
    /usr/local/fbcode/platform010/bin/mpirun -np 2 ../buck-out/v2/gen/fbcode/6ef5f323b6193f0f/param_bench/et_replay/__comm_replay__/comm_replay.par --trace-path param_bench/fb/integration_tests/resnet-2gpu/
    ../buck-out/v2/gen/fbcode/009ebbab256a7e75/param_bench/et_replay/__et_replay__/et_replay.par --input param_bench/fb/integration_tests/resnet_1gpu_et.json
Reviewed By: sanrise
Differential Revision: D62052138
Pulled By: shengfukevin
4473728 to 690575d
Summary: add comm part to et_replay without lazy allocation of tensors
Pull Request resolved: #178
Test Plan:
    /usr/local/fbcode/platform010/bin/mpirun -np 2 ../buck-out/v2/gen/fbcode/6ef5f323b6193f0f/param_bench/et_replay/__comm_replay__/comm_replay.par --trace-path param_bench/fb/integration_tests/resnet-2gpu/
    ../buck-out/v2/gen/fbcode/009ebbab256a7e75/param_bench/et_replay/__et_replay__/et_replay.par --input param_bench/fb/integration_tests/resnet_1gpu_et.json
Reviewed By: sanrise
Differential Revision: D62052138
Pulled By: shengfukevin
690575d to 2196fd3
2196fd3 to f49f8f7
f49f8f7 to b329f52
@shengfukevin merged this pull request in d6e4dfd.
Summary
Add the comm part to et_replay without lazy allocation of tensors.
Test plan
Run the resnet 2 gpu trace with both compute and comms.