Reporting restructure #254

scarere · 2024-10-10T21:28:30Z

PR Type

Feature

Short Description

Clickup Ticket(s): Add Base Reporter Class

This PR restructures and unifies reporting. What was previously the metrics reporter is now JsonReporter. Both the JsonReporter and WandBReporter inherit from the same base class.

Servers and clients now accept any number of reporters and send data to them frequently (During setup, at the end of each round, each epoch and each batch). It is up to the reporter classes internal logic to determine how frequently to report data.

There is no longer any default reporters. Users will not have to pass None to the wandb reporter argument of clients and servers in order to turn wandb reporting off. Additionally if users want to generate the metrics json file frequently used for pytests, they must explicitly pass a JsonReporter to their server or client instance. Creating a JsonReporter instance with all its default arguments exactly mimics the behaviour of what was previously the metrics reporter (Except some of the keys have been changed). There is no longer any need to call dump to write the metrics, this has been moved into the server/client shutdown.

I also upgraded the wandb dependency.

I think there is some work to be done in terms of deciding how to structure what metrics and what to use for the key names. It be great if there was a bit of a more predictable pattern.

Tests Added

Had to modify a few tests to reflect the modified structure of the output metrics json. Also found two issues with the tests.

The MKMMD loss tests only work if your machine does not have a GPU. My temporary fix for this was to set devide to CPU. They still run in a few seconds. If GPU acceleration is needed then someone needs to go through and debug them. Created a ticket here

Tests that made use of both freeze_time and patch could not be run individually in some instances. It seems like somehow their imports were reliant on other tests in the test suite. This makes debugging tests a bit annoying because you have to run the entire test suite just to check whether or not you fixed one specific test that was failing. I created a ticket here

fl4health/clients/ditto_client.py

fl4health/clients/evaluate_client.py

fl4health/reporting/base_reporter.py

fl4health/reporting/json_reporter.py

fl4health/server/evaluate_server.py

tests/server/test_base_server.py

jewelltaylor

All in all, the changes look great! I have a few minor suggestions, feel free to reply if any of them don't make sense. I will take one more pass through once you fix the merge conflicts and have a chance to go through the PR again. Then we can get this merged in early this week

for more information, see https://pre-commit.ci

emersodb · 2024-10-15T20:02:48Z

I'm not sure why these aren't failing the pre-commit checks, but it looks like there are a few places where the MetricsReporter class still exists on this branch. Namely:

fl4health/clients/model_merge_client.py
fl4health/server/model_merge_server.py
fl4health/server/adaptive_constraint_servers/ditto_server.py
fl4health/server/adaptive_constraint_servers/mrmtl_server.py
It also appears to be mentioned in the comments of
tests/smoke_tests/run_smoke_test.py

emersodb

Overall, really like the changes. Just a few suggestions and questions associated with the various components.

fl4health/reporting/__init__.py

emersodb · 2024-10-15T16:59:15Z

fl4health/reporting/wandb_reporter.py

+
+
+class WandBReporter(BaseReporter):
+    def __init__(self, step_type: StepType | str = StepType.ROUND, **kwargs: Any) -> None:


In general, I'm not a huge fan of relying on kwargs. I know it adds some flexibility with adding additional options and configurations without limiting what you can do. The reason is that you need a deeper understanding of the code to use them because the arguments are sort of hidden away somewhere in docs or code.

I imagine W and B uses kwargs. So I understand the use here, but if there are arguments that we know we want like "dir" or run_id, I'd rather that we make them explicit optional arguments here.

In general I agree but in this instance I thought it'd be fine since all the kwargs go straight to the wandb.init function detailed here. I want to expose all the arguments but didn't want to unnecessarily clutter the wandbreporter init method

Yeah I think you're right. Perhaps we can just limit ourselves to the ones we absolutely expect to be there and the rest can be in kwargs. Perhaps we can also put a note in the documentation to where a user might find the args for the WandB object?

emersodb · 2024-10-15T17:01:13Z

examples/fedprox_example/server.py

@@ -78,9 +78,12 @@ def main(config: Dict[str, Any], server_address: str) -> None:
        loss_weight_patience=config["proximal_weight_patience"],
    )

-    wandb_reporter = ServerWandBReporter.from_config(config)
+    wandb_reporter = WandBReporter("round", **config["reporting_config"])


I see why we're doing this with the kwargs implementation below. However, I'd recommend we limit the use of freeform arguments from unrolled dictionaries as much as possible if we can. It makes it harder for a user to know what to send along.

fl4health/clients/basic_client.py

fl4health/clients/evaluate_client.py

emersodb · 2024-10-15T19:36:01Z

fl4health/server/base_server.py

+        for r in self.reporters:
+            r.initialize(id=self.server_name)
+
+    def report_centralized_eval(self, history: History, num_rounds: int) -> None:


This is perhaps something that might go into the ReporterManager as well along with the other state tracking that's being done below

fl4health/server/base_server.py

tests/losses/test_mkmmd_loss.py

emersodb · 2024-10-15T20:13:36Z

tests/test_utils/assert_metrics_dict.py

+DEFAULT_TOLERANCE = 0.0005
+
+
+def assert_metrics_dict(metrics_to_assert: dict[str, Any], metrics_saved: dict[str, Any]) -> list[str]:


Maybe add a docstring here to help orient the user on what this is doing etc?

emersodb · 2024-10-15T20:14:34Z

tests/test_utils/assert_metrics_dict.py

+def assert_metrics_dict(metrics_to_assert: dict[str, Any], metrics_saved: dict[str, Any]) -> list[str]:
+    errors = []
+
+    def _assert(value: Any, saved_value: Any) -> Optional[str]:


Any reason to make this an inline function rather than popping it out of here into it's own separate function above this method?

for more information, see https://pre-commit.ci

…ults are on github before making additional changes.

…ent.

Reporting restructure

ea175da

scarere requested review from jewelltaylor and emersodb October 10, 2024 21:28