Issue when running multi-GPU training with the edit action transducer:
Traceback (most recent call last):
File "/home/salamander/anaconda3/envs/sigmorphon2024/bin/yoyodyne-train", line 8, in <module>
sys.exit(main())
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 390, in main
model = get_model_from_argparse_args(args, datamodule)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 214, in get_model_from_argparse_args
return model_cls(
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/transducer.py", line 43, in __init__
super().__init__(*args, **kwargs)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/lstm.py", line 36, in __init__
super().__init__(*args, **kwargs)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/base.py", line 155, in __init__
self.save_hyperparameters(
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/core/mixins/hparams_mixin.py", line 110, in save_hyperparameters
save_hyperparameters(self, *args, ignore=ignore, frame=frame)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py", line 275, in save_hyperparameters
obj._hparams_initial = copy.deepcopy(obj._hparams)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 161, in deepcopy
rv = reductor(4)
TypeError: cannot pickle '_io.TextIOWrapper' object
Exception ignored in: <function tqdm.__del__ at 0x7f96d86a6290>
Traceback (most recent call last):
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
self.close()
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1267, in close
if self.disable:
AttributeError: 'tqdm' object has no attribute 'disable'
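From what I gather, the tqdm object within the expert module can't be pickled, so the deep copy that save_hyperparameters makes of the hyperparameters fails with the TypeError above when setting up multi-GPU training. This is fixed by adding expert to the ignore argument when saving hyperparameters, but I wanted to get feedback on whether there is a less 'hacky' way to deal with it. @kylebgorman thoughts?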
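To make the failure mode concrete, here is a minimal standard-library sketch of what the traceback suggests is happening. It is not yoyodyne or Lightning code: `FakeProgressBar`, `FakeExpert`, and `snapshot_hparams` are hypothetical stand-ins, and the text stream is created by hand on the assumption that the real progress bar holds a similar handle.

```python
import copy
import io


class FakeProgressBar:
    """Hypothetical stand-in for a progress bar that holds a text stream."""

    def __init__(self):
        # Text streams like this one cannot be pickled or deep-copied.
        self.fp = io.TextIOWrapper(io.BytesIO())


class FakeExpert:
    """Hypothetical stand-in for the expert object passed to the model."""

    def __init__(self):
        self.progress = FakeProgressBar()


def snapshot_hparams(hparams, ignore=()):
    """Deep-copies the hyperparameters, skipping any ignored keys."""
    kept = {k: v for k, v in hparams.items() if k not in ignore}
    return copy.deepcopy(kept)


hparams = {"hidden_size": 512, "expert": FakeExpert()}

# Without ignoring the expert, the deep copy reaches the text stream and fails.
try:
    snapshot_hparams(hparams)
    failed = False
except TypeError as error:
    failed = True
    message = str(error)

# Ignoring the expert leaves only plain, copyable values.
safe = snapshot_hparams(hparams, ignore=("expert",))
```

In the real model the equivalent step would presumably be passing the expert's argument name via `ignore` in the `self.save_hyperparameters(...)` call, as described above.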
When something doesn't pickle, you can usually just give it the necessary methods (e.g., __getstate__ and __setstate__), but I don't want to hack into tqdm, so I think the hacky solution is fine.
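For reference, the "give it the necessary methods" approach could look roughly like the sketch below, applied to the class that owns the unpicklable object rather than to tqdm itself. `PicklableExpert` and its `_stream` attribute are hypothetical and only illustrate the pattern; `copy.deepcopy` goes through the same `__getstate__`/`__setstate__` hooks that pickle uses.

```python
import copy
import io


class PicklableExpert:
    """Hypothetical owner of an unpicklable stream that can still be copied."""

    def __init__(self, name):
        self.name = name
        self._stream = self._open_stream()

    @staticmethod
    def _open_stream():
        # Stand-in for whatever unpicklable resource the object holds.
        return io.TextIOWrapper(io.BytesIO())

    def __getstate__(self):
        # Drop the unpicklable attribute from the copied state.
        state = self.__dict__.copy()
        del state["_stream"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and recreate the resource.
        self.__dict__.update(state)
        self._stream = self._open_stream()


original = PicklableExpert("sed")
clone = copy.deepcopy(original)  # no TypeError: the stream is skipped
```

The trade-off is that the copy gets a fresh resource rather than sharing the original one, which may or may not be what the expert needs.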