[Re] When Does Label Smoothing Help? #75

Open
sdwagner opened this issue Aug 30, 2023 · 22 comments

@sdwagner

Original article: Rafael Müller, Simon Kornblith, and Geoffrey E. Hinton. "When does label smoothing help?" Advances in Neural Information Processing Systems 32 (2019). (https://arxiv.org/pdf/1906.02629.pdf)

PDF URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/article.pdf
Metadata URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/metadata.yaml
Code URL: https://github.com/sdwagner/re-labelsmoothing

Scientific domain: Machine Learning
Programming language: Python
Suggested editor: Georgios Detorakis or Koustuv Sinha

@rougier
Member

rougier commented Sep 11, 2023

Thanks for your submission, we'll assign an editor soon.

@rougier
Member

rougier commented Sep 11, 2023

@koustuvsinha @gdetor Can any of you two edit this submission in machine learning?

@gdetor

gdetor commented Sep 15, 2023

@rougier I can handle this.

@rougier
Member

rougier commented Oct 12, 2023

@gdetor Thanks!

@tuelwer

tuelwer commented Apr 3, 2024

@gdetor thank you for agreeing to handle this submission! Is there anything we can do to move this submission forward?

@gdetor

gdetor commented Apr 3, 2024

@tuelwer Sorry for the delay.

Hi @ogrisel and @benureau Could you please review this submission?

@rougier
Member

rougier commented Apr 29, 2024

Any update?

@gdetor

gdetor commented May 8, 2024

Dear reviewers @ReScience/reviewers, could anybody review this submission?

@mo-arvan

mo-arvan commented May 8, 2024

I'd be interested in reviewing this submission, but I should mention that I doubt I can rerun all the experiments due to computational constraints.

@rougier
Member

rougier commented May 27, 2024

@mo-arvan Thanks, and I think not re-running everything is fine. @gdetor What do you think?

@gdetor

gdetor commented May 27, 2024

@rougier @mo-arvan I'm OK with it.

@mo-arvan

Okay, I will review this work by the end of July.

@mo-arvan

mo-arvan commented Aug 2, 2024

I apologize, but I have not been able to review this submission yet; I should be able to write the review within the next few weeks.

@rougier
Member

rougier commented Sep 2, 2024

Thanks. Any progress?

@gdetor

gdetor commented Oct 1, 2024

@mo-arvan gentle reminder

@mo-arvan

In this paper, Wagner et al. provide a reproduction report of Müller et al.'s work on label smoothing. They begin with a concise introduction to the original study and the motivations behind it. The authors then present essential details regarding the models and datasets used, noting specific variations driven by limited computational resources.

The authors have done an excellent job of providing documentation and instructions for using their released code. Their repository includes multiple Jupyter notebooks detailing the conducted experiments, along with specified dependency requirements to facilitate the setup process. To further simplify future installations, I created a Docker container as part of the review process. The files and instructions are available in my forked repository.

In their initial results, the authors examine the effect of label smoothing on model accuracy. While Müller et al. claimed that label smoothing positively impacts the test accuracy of trained models, Wagner et al. suggest that it enhances accuracy by reducing overfitting—a claim not made by the original authors. However, their results indicate mixed effects; out of eight experiments, three showed higher accuracy without label smoothing. Upon reviewing their code (https://github.com/sdwagner/re-labelsmoothing/blob/fb6c3634d2049ef7f175e7a992f109c43680fae3/datasets/datasets.py), it appears that they do not load the test set, raising the possibility that the reported results are based on the validation set. Unlike the original study, this reproduction does not include confidence intervals, and the small differences in accuracy could be attributed to randomness in the training process. Adding uncertainty analysis would significantly strengthen this work.
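
For context, label smoothing with smoothing factor alpha trains against the mixture (1 - alpha) * one_hot + alpha / K instead of the one-hot target over K classes. A minimal PyTorch sketch of the two objectives being compared (illustrative only, using the built-in label_smoothing argument of nn.CrossEntropyLoss rather than the authors' implementation):

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of 4 examples over 10 classes.
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))

# Hard-target cross-entropy vs. label-smoothed cross-entropy (alpha = 0.1).
ce_hard = nn.CrossEntropyLoss()
ce_smooth = nn.CrossEntropyLoss(label_smoothing=0.1)

# With smoothing, the target distribution becomes
# (1 - alpha) * one_hot + alpha / num_classes.
print(ce_hard(logits, targets).item(), ce_smooth(logits, targets).item())
```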

In the next section, the authors reproduce the results of a visualization experiment from the original study that demonstrates the effect of label smoothing on the activations of the penultimate layer and the network output. Figure 2 in their work aligns with the findings of the original study, although there is a minor discrepancy in the order of the columns in the visualization.

The authors then investigate the impact of label smoothing on Expected Calibration Error (ECE). With the exception of the results from the MNIST dataset using a fully connected network, their findings generally align with those of the original study. The reported results for training a transformer model for translation are mixed, with not all findings matching the original study. Similar to the accuracy results, the authors report findings based on the validation set, which may account for some discrepancies.
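
For reference, ECE is commonly computed by binning predictions by their confidence and taking a weighted average of the per-bin gap between accuracy and confidence. A small NumPy sketch under those standard assumptions (15 equal-width bins; not necessarily the exact binning used in the report or the original paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - confidence| per bin.

    confidences: max softmax probability per sample, shape [N]
    correct:     1 if the prediction was right, else 0, shape [N]
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()
            conf = confidences[in_bin].mean()
            ece += in_bin.mean() * abs(acc - conf)
    return ece
```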

Finally, the results of the distillation experiments on fully connected networks for MNIST are consistent with the original study, though there is a slight increase in error. Ultimately, the authors confirm the observation made by Müller et al. regarding accuracy degradation in students when the teacher is trained with label smoothing. Figures 7 and 8 lack the confidence intervals present in the original study, which would have been beneficial for comparison.
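
For readers unfamiliar with the distillation setup, the student is typically trained on a mixture of hard-label cross-entropy and a KL term against the temperature-softened teacher outputs. A rough PyTorch sketch of that standard objective (the temperature T and mixing weight alpha are illustrative, not the values used in the report or the original paper):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Hinton-style distillation objective (illustrative only)."""
    # Cross-entropy against the hard labels.
    hard = F.cross_entropy(student_logits, targets)
    # KL divergence between temperature-softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude is comparable across T
    return alpha * hard + (1 - alpha) * soft
```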

Minor editing suggestions:
"The authors state, that the logit dependents on the Euclidean distance" -> "The authors state that the logit depends on the Euclidean distance"
"The evaluation was performed using the ECE" -> ECE should be spelled out on first use.

@gdetor

gdetor commented Oct 10, 2024

@mo-arvan Thank you for your report.
@tuelwer @sdwagner Could you please respond to the reviewer's comments?

@tuelwer

tuelwer commented Oct 10, 2024

@mo-arvan Thank you for reviewing our submission and for your thoughtful and detailed comments!
@gdetor We will update our submission in the coming days to incorporate the reviewer's comments.

@tuelwer

tuelwer commented Oct 11, 2024

@mo-arvan Thanks for creating a dockerfile! Feel free to open a PR to integrate it into our repository 😊

@mo-arvan

Glad you find it useful. Sure, I'll submit a pull request. I'd be happy to engage in a discussion as well.

One last minor comment: your use of vector graphics in your figures is a step up from the original publication. I'd suggest changing the color palette and the patterns to further improve the presentation of the figures, e.g. Figure 3 (b).

@tuelwer

tuelwer commented Oct 14, 2024

@mo-arvan Thanks again for your detailed comments! In the following we want to address each of the points that you raised:

  1. Confusion between validation and test data: We carefully double-checked our datasets and can confirm that all experiments were performed on the test split of each dataset:

    • For the datasets implemented by PyTorch, we set train=False, which corresponds to the test split (please refer to, e.g., here; see also the short loading sketch after this list).
    • For the CUB-200-2011 data we use the test split of the dataset which is defined in the file train_test_split.txt. The CUB-200-2011 dataset does not have a validation set.
    • For Tiny ImageNet, we use the split that is defined as the validation split. We assume that the authors of the original paper did this as well, since the test split of Tiny ImageNet is not labeled.
      We apologize for the confusion, and we have refactored the code accordingly.
  2. Uncertainty quantification: We added confidence intervals for Figures 6 and 7.

  3. Color palette: We have chosen the colors that were used in the original work to allow easy comparison of the experimental results.

  4. Edits: We have incorporated the proposed changes into our report.
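
To make the first point concrete, a minimal torchvision loading sketch (root path and transform are placeholders, not the exact code in our repository):

```python
from torchvision import datasets, transforms

# train=False selects the held-out test split of the torchvision datasets
# (for MNIST, the 10,000 test images).
test_set = datasets.MNIST(root="./data", train=False, download=True,
                          transform=transforms.ToTensor())
print(len(test_set))  # 10000
```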

@gdetor

gdetor commented Oct 16, 2024

@mo-arvan Please let me know if you agree with the responses so I can publish the paper. Thank you.
