Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient test for binary files in matrix.comp. #1035

Merged

Conversation

ukmo-ccbunney
Copy link
Collaborator

@ukmo-ccbunney ukmo-ccbunney commented Jul 6, 2023

Pull Request Summary

Provides a more efficient test for binary files when running matrix.comp

Description

Currently, matrix.comp recursively checks all files in a work directory using the command

 binaryfiles=`grep . -r * | grep 'Binary file' | sed -e "s/^Binary file //" -e "s/ matches$//"` 

This is very slow as:

  1. It will unnecessarily read the entirety of any text files (. matches "any character")
  2. It recurses into the CMake build* and exe directories which have a lot of files in.

A better solution is to use the standard linux file command to check whether individual files are "plain text" files or binary. This is not only a much quicker and more efficient test, but it will only be called when needed (as apposed on a complete recursive listing of the directory)

I assuming a fairly modern implementation of the file command that accepts the -i or --mime flags for robustness purposes. The test is encapsulated as a function at the top of the matrix.comp script to facilitate modification if needed.

Performance

Some results of running new vs old version of matrix.comp:

  • ww3_tp2.2 (10 work directories):
    • original version: 3m 7s
    • new version: 9s
  • ts1 (11 work directories):
    • original version 2m2s
    • new version: 7s
  • tp2.9 (20 work directories):
    • 6m 9s
    • 20s

Note: When testing performance, it is important to not rerun matix.comp twice on the same regression test directory as the file system will likely have cached the files and the timings will be much quicker the second time.

I make performance comparisons by testing in two separate WW3 regtest trees (and testing the regtests against themselves)

Issue(s) addressed

Fixes #1031

Commit Message

More efficient test for binary files in matrix.comp

Check list

Testing

  • How were these changes tested? Comparison of regression test output; diffs out matrixCompFull.txt are same
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) N/A
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)? N/A
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.) N/A

@ukmo-ccbunney ukmo-ccbunney marked this pull request as draft July 6, 2023 13:15
@ukmo-ccbunney ukmo-ccbunney self-assigned this Jul 6, 2023
@ukmo-ccbunney ukmo-ccbunney added the enhancement New feature or request label Jul 6, 2023
@ukmo-ccbunney ukmo-ccbunney marked this pull request as ready for review July 7, 2023 09:26
@ukmo-ccbunney
Copy link
Collaborator Author

ukmo-ccbunney commented Jul 7, 2023

As part of this PR, I wonder if others would also agree to the removal of the "do not compare build or exe directory" messages to STDOUT?

 testing case: ww3_tp2.9; run: ./work_PR2_UQ_a_MPI
do not compare build or exe directories build
do not compare build or exe directories build_SHRD
do not compare build or exe directories exe
do not compare build or exe directories build
do not compare build or exe directories build_SHRD
do not compare build or exe directories exe

these get printed multiple times for each test and dominate the output to the screen.

@JessicaMeixner-NOAA
Copy link
Collaborator

As part of this PR, I wonder if others would also agree to the removal of the "do not compare build or exe directory" messages to STDOUT?

 testing case: ww3_tp2.9; run: ./work_PR2_UQ_a_MPI
do not compare build or exe directories build
do not compare build or exe directories build_SHRD
do not compare build or exe directories exe
do not compare build or exe directories build
do not compare build or exe directories build_SHRD
do not compare build or exe directories exe

these get printed multiple times for each test and dominate the output to the screen.

@ukmo-ccbunney I'm okay with that

@MatthewMasarik-NOAA
Copy link
Collaborator

I'm also OK with removing the repeated prints of "do not compare build or exe directory".

@ukmo-ccbunney
Copy link
Collaborator Author

Timing results from running matrix.comp on a full regression test matrix:

  • old code: 272m (~4.5 hours)
  • new code 63m

So - not as fast a speedup as the results for individual regtests show above (not sure why?), but still a significant improvement!

@ukmo-ccbunney
Copy link
Collaborator Author

I'm also OK with removing the repeated prints of "do not compare build or exe directory".

I've pushed a commit to remove this message.

@JessicaMeixner-NOAA
Copy link
Collaborator

Thanks @ukmo-ccbunney ! I am running tests to do comparisons between the old and the new on our machines. Those should be finished soon and then I can submit the two matrix.comps for comparisons. I'll let you know how it goes!

Copy link
Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improved the speed from 80min to 53 min on hera for me. Thank you for this speed up @ukmo-ccbunney !

@JessicaMeixner-NOAA JessicaMeixner-NOAA merged commit 2514633 into NOAA-EMC:develop Jul 7, 2023
2 of 4 checks passed
@ukmo-ccbunney
Copy link
Collaborator Author

This improved the speed from 80min to 53 min on hera for me. Thank you for this speed up @ukmo-ccbunney !

I'm impressed they were running in 80 minutes before! I think the regtests really stress the filesystem on our HPC... it takes over 4.5 hours for me to run them!

@JessicaMeixner-NOAA
Copy link
Collaborator

@ukmo-ccbunney well, i just had a set of matrix.comp take 4.5 hours.... with the updates, so .... I wonder if we need to rethink what we're comparing and how we're comparing it to relieve some of the stress. 4.5 hours is crazy, but I get that once in a blue moon these days. I usually have the tests get stuck on the ufs tests at the end if something does take a long time.

@ukmo-ccbunney
Copy link
Collaborator Author

@ukmo-ccbunney well, i just had a set of matrix.comp take 4.5 hours.... with the updates, so .... I wonder if we need to rethink what we're comparing and how we're comparing it to relieve some of the stress. 4.5 hours is crazy, but I get that once in a blue moon these days. I usually have the tests get stuck on the ufs tests at the end if something does take a long time.

Wow. 4.5 hours with this update? I wonder if that is just some spurious filesystem contention that you are experiencing.
This update has certainly helped speed thiings up a lot on my HPC, but every system is different!
I think it is definitely high time we looked at the matrix.comp script and what/how it is comparing.

miguelsolanocordoba added a commit to wavespotter/WW3 that referenced this pull request Apr 19, 2024
* Bugfix - initialised VD and VS to zero in w3srcemd. (NOAA-EMC#1037)

* More efficient test for binary files in matrix.comp (NOAA-EMC#1035)

* Tidy up of pre-processor directives and unused variables in w3srcemd.F90 (NOAA-EMC#1010)

* Correct typo in w3srcemd.F90 pre-processor directive. (NOAA-EMC#1039)

* minor bugfix for matrix grepping on keywords (NOAA-EMC#1049)

* Stop masking group 1 output where icec > icen (NOAA-EMC#1019)

* Doxygen documentation added, 8th subset.(NOAA-EMC#1046)

* NC4 ,F90 ,XX0 switches removed from ww3_tp2.19 regtest (NOAA-EMC#1054)

* CI:  Fix for Intel scripts. GNU scripts updated. (NOAA-EMC#1064)

* correct the computation of QP parameter, add QKK output parameter, change UST scale factor (NOAA-EMC#1050)

* correct issue with ww3_multi when requesting restart2 and using nml file instead of inp file (NOAA-EMC#1070)

* correct calendar for track netcdf output (NOAA-EMC#1079)

* Fix missing mod_def.ww3 file in multigrid regression tests for track output (NOAA-EMC#1091)

* STAB3: fix cmake build for ST4 or ST3 (NOAA-EMC#1086)

* new feature to output out_grd.ww3, out_pnt.ww3 and mod_def.ww3 both in binary and ascii format using switch ASCII. (NOAA-EMC#1089)

* Update local unit number arrays (NDS, MDS) to be same size of array defined in w3odatmd (size=15). Also, defined unit numbers for NDS(14) and NDS(15). (NOAA-EMC#1098)

* Removed code referencing PHIOC in output section for PHICE in ww3_ounf (NOAA-EMC#1093)

* implementation of the GQM (Gaussian Quadrature Method) to replace the DIA in NL1 or NL2. (NOAA-EMC#1083)

* update logic to ensure you are not accessing uninitialized dates (NOAA-EMC#1114)

* Initialised S and D arrays in W3SDB1 before potential early return if zero energy. (NOAA-EMC#1115)

* ww3_ounp.F90:  x/y units attribute corrected from 'm' to 'km' (NOAA-EMC#1088)

* Bugfix: Assign unit numbers to ASCII gridded/point output in multi-grid mode. (NOAA-EMC#1118)

* correct bugs to run correctly GQM implementation (NOAA-EMC#1127)

* Adding documentation to w3iopo() in preparation for code for NOAA-EMC#682. (NOAA-EMC#1131)

* NCEP regtest module updates: uses spack-stack/1.5.0, includes scotch/7.0.4 (NOAA-EMC#1137)

* Minor update to ncep regtests (NOAA-EMC#1138)

* Updated intel workflow to install oneapi compilers from new location. (NOAA-EMC#1157)

* Add unit test for points I/O code. (NOAA-EMC#1158)

* Update Intel CI (relocate /usr/local; ensure intel-oneapi-mpi; use ubuntu-latest) (NOAA-EMC#1161)

* remove lookup table for ST4 to speed up computation and clean up the ST4 code (NOAA-EMC#1124)

Co-authored-by: Fabrice Ardhuin <[email protected]>

* initialize USSP_WN for mod_def (NOAA-EMC#1165)

* Introduce IC4M8 and IC4M9 to WW3 (NOAA-EMC#1176)

* clean up and add ST4 variables (NOAA-EMC#1181)

* w3fld1md.F90: fix divide by zero in CRIT2 parameter (NOAA-EMC#1184)

* ww3_prnc.F90: fix out-of-scope grid index write statement (NOAA-EMC#1185)

* Bugfix: address potential divide-by-zero in APPENDTAIL (NOAA-EMC#1188)

Co-authored-by: Denise Worthen <[email protected]>

* Provide initial drying of cells with depth < ZLIM for SMC grid. (NOAA-EMC#1192)

* Output OMP threading info to screen when running ww3_shel/ww3_multi compiled with the OMPG switch. Also fixes truncation of build.log when running run_cmake_build. (NOAA-EMC#1191)

* Added screen output showing number of threads when OMP enabled.

* update build to get more info in logs (NOAA-EMC#46)

---------

Co-authored-by: Jessica Meixner <[email protected]>

* update run_cmake_test to catch build errors and exit (NOAA-EMC#1194)

* fix merge conflicts

* Fix gustiness bug, as suggst by Pieter

* Change USTARsigma to WAM implementation

---------

Co-authored-by: Chris Bunney <[email protected]>
Co-authored-by: Mickael Accensi <[email protected]>
Co-authored-by: Benoit Pouliot <[email protected]>
Co-authored-by: Matthew Masarik <[email protected]>
Co-authored-by: Ghazal-Mohammadpour <[email protected]>
Co-authored-by: Jessica Meixner <[email protected]>
Co-authored-by: Biao Zhao <[email protected]>
Co-authored-by: Edward Hartnett <[email protected]>
Co-authored-by: Alex Richert <[email protected]>
Co-authored-by: Fabrice Ardhuin <[email protected]>
Co-authored-by: W. Erick Rogers <[email protected]>
Co-authored-by: Denise Worthen <[email protected]>
Co-authored-by: Camille Teicheira <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

matrix.comp very slow
3 participants