blazefl.contrib.FedAvgParallelClientTrainer

class blazefl.contrib.FedAvgParallelClientTrainer(model_selector: ModelSelector, model_name: str, share_dir: Path, state_dir: Path, dataset: PartitionedDataset, device: str, num_clients: int, epochs: int, batch_size: int, lr: float, seed: int, num_parallels: int)

Bases: ParallelClientTrainer[FedAvgUplinkPackage, FedAvgDownlinkPackage, FedAvgDiskSharedData]

Parallel client trainer for the Federated Averaging (FedAvg) algorithm.

This trainer handles the parallelized training and evaluation of local models across multiple clients, distributing tasks to different processes or devices.

model_selector

Selector for initializing the local model.

Type:

ModelSelector

model_name

Name of the model to be used.

Type:

str

share_dir

Directory for the data files shared between processes.

Type:

Path

state_dir

Directory to save random states for reproducibility.

Type:

Path

dataset

Dataset partitioned across clients.

Type:

PartitionedDataset

device

Device to run the models on (‘cpu’ or ‘cuda’).

Type:

str

num_clients

Total number of clients in the federation.

Type:

int

epochs

Number of local training epochs per client.

Type:

int

batch_size

Batch size for local training.

Type:

int

lr

Learning rate for the optimizer.

Type:

float

seed

Seed for reproducibility.

Type:

int

num_parallels

Number of parallel processes for training.

Type:

int

device_count

Number of CUDA devices available (if using GPU).

Type:

int | None

__init__(model_selector: ModelSelector, model_name: str, share_dir: Path, state_dir: Path, dataset: PartitionedDataset, device: str, num_clients: int, epochs: int, batch_size: int, lr: float, seed: int, num_parallels: int) → None

Initialize the FedAvgParallelClientTrainer.

Parameters:
  • model_selector (ModelSelector) – Selector for initializing the local model.

  • model_name (str) – Name of the model to be used.

  • share_dir (Path) – Directory for the data files shared between processes.

  • state_dir (Path) – Directory to save random states for reproducibility.

  • dataset (PartitionedDataset) – Dataset partitioned across clients.

  • device (str) – Device to run the models on (‘cpu’ or ‘cuda’).

  • num_clients (int) – Total number of clients in the federation.

  • epochs (int) – Number of local training epochs per client.

  • batch_size (int) – Batch size for local training.

  • lr (float) – Learning rate for the optimizer.

  • seed (int) – Seed for reproducibility.

  • num_parallels (int) – Number of parallel processes for training.
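Taken together, state_dir and seed suggest that each client's random state persists between rounds. The following is a minimal sketch of that idea using Python's `random.Random`. It assumes one pickle file per client and a first-round seed of `seed + cid`; both are assumptions. `load_or_seed` and `save_state` are hypothetical helpers.

```python
import pickle
import random
import tempfile
from pathlib import Path


def load_or_seed(state_dir: Path, cid: int, seed: int) -> random.Random:
    """Return a per-client RNG, restored from disk if a saved state exists."""
    path = state_dir / f"{cid}.pkl"  # assumption: one state file per client
    rng = random.Random()
    if path.exists():
        with path.open("rb") as f:
            rng.setstate(pickle.load(f))
    else:
        # Assumption: first round derives a distinct, deterministic seed per client.
        rng.seed(seed + cid)
    return rng


def save_state(state_dir: Path, cid: int, rng: random.Random) -> None:
    """Persist the client's RNG state so the next round resumes the same stream."""
    state_dir.mkdir(parents=True, exist_ok=True)
    with (state_dir / f"{cid}.pkl").open("wb") as f:
        pickle.dump(rng.getstate(), f)


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp)
    rng = load_or_seed(state_dir, cid=3, seed=42)
    round1 = rng.random()
    save_state(state_dir, cid=3, rng=rng)  # end of round 1
    round2 = load_or_seed(state_dir, cid=3, seed=42).random()  # round 2 resumes
```

Restoring the saved state means round 2 continues the random stream rather than replaying round 1. Shuffling stays reproducible from the seed without every round being identical.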

Methods

__init__(model_selector, model_name, ...)

Initialize the FedAvgParallelClientTrainer.

evaulate(model, test_loader, device)

Evaluate the model for a single client.

get_shared_data(cid, payload)

Generate the shared data for a specific client.

local_process(payload, cid_list)

Manage the parallel processing of clients.

process_client(path)

Process a single client's local training and evaluation.

train(model, model_parameters, train_loader, ...)

Train the model for a single client.

uplink_package()

Retrieve the uplink packages for transmission to the server.

static evaulate(model: Module, test_loader: DataLoader, device: str) → tuple[float, float]

Evaluate the model for a single client.

Parameters:
  • model (torch.nn.Module) – The model to evaluate.

  • test_loader (DataLoader) – DataLoader for the evaluation data.

  • device (str) – Device to run the evaluation on.

Returns:

Average loss and accuracy.

Return type:

tuple[float, float]
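The following is a pure-Python sketch of how an average loss and accuracy can be computed. It assumes cross-entropy loss and argmax predictions, and uses nested lists in place of torch model outputs and DataLoader batches. `evaluate` is a hypothetical name, distinct from the documented `evaulate`.

```python
import math


def evaluate(batches):
    """Sample-weighted mean cross-entropy loss and accuracy.

    `batches` yields (logits, labels): logits is a list of per-class score
    lists, labels a list of class indices (stand-ins for tensors).
    """
    total_loss, total_correct, total_samples = 0.0, 0, 0
    for logits, labels in batches:
        for scores, label in zip(logits, labels):
            # Stable log-softmax denominator via the log-sum-exp trick.
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total_loss += log_z - scores[label]  # -log p(label)
            predicted = max(range(len(scores)), key=scores.__getitem__)
            total_correct += int(predicted == label)
            total_samples += 1
    return total_loss / total_samples, total_correct / total_samples


batches = [
    ([[2.0, 0.0], [0.0, 2.0]], [0, 0]),  # first correct, second wrong
    ([[1.0, 0.0]], [1]),                  # wrong
]
avg_loss, avg_acc = evaluate(batches)
print(round(avg_acc, 4))  # 0.3333
```

The sketch sums per-sample losses and divides by the total sample count. Averaging per-batch means would instead over-weight a short final batch.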

get_shared_data(cid: int, payload: FedAvgDownlinkPackage) → FedAvgDiskSharedData

Generate the shared data for a specific client.

Parameters:
  • cid (int) – Client ID.

  • payload (FedAvgDownlinkPackage) – Downlink package with global model parameters.

Returns:

Shared data structure for the client.

Return type:

FedAvgDiskSharedData
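process_client is static and receives only a Path. The record written for each client therefore has to carry everything the worker needs. The following is a sketch of such a record and a builder for it. The field names and the `get_shared_data` signature here are hypothetical, and the real FedAvgDiskSharedData may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ClientSharedData:
    """Hypothetical per-client record; field names are assumptions."""
    cid: int
    model_name: str
    global_params: tuple[float, ...]
    epochs: int
    batch_size: int
    lr: float
    device: str
    seed: int
    state_path: Path


def get_shared_data(cid: int, global_params: list[float], *, model_name: str,
                    epochs: int, batch_size: int, lr: float, device: str,
                    seed: int, state_dir: Path) -> ClientSharedData:
    # Trainer-wide settings are copied into every record; only the client ID,
    # the downlink parameters and the RNG-state location differ per client.
    return ClientSharedData(
        cid=cid,
        model_name=model_name,
        global_params=tuple(global_params),
        epochs=epochs,
        batch_size=batch_size,
        lr=lr,
        device=device,
        seed=seed,
        state_path=state_dir / f"{cid}.pkl",  # assumed naming scheme
    )


data = get_shared_data(7, [0.5, -0.5], model_name="cnn", epochs=5,
                       batch_size=32, lr=0.01, device="cpu", seed=42,
                       state_dir=Path("states"))
print(data.state_path.name)  # 7.pkl
```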

static process_client(path: Path) → Path

Process a single client’s local training and evaluation.

This method is executed by a parallel process and handles data loading, training, evaluation, and saving results to a shared file.

Parameters:
  • path (Path) – Path to the shared data file containing client-specific information.

Returns:

Path to the file with the processed results.

Return type:

Path
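Because process_client takes a Path and returns a Path, both inputs and results travel through files in share_dir. The following is a minimal sketch of that round trip, assuming pickle serialization. `run_client` and `run_clients` are hypothetical stand-ins for process_client and local_process, and the "training" step is a placeholder.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_client(path: Path) -> Path:
    """Worker: read one client's inputs, 'train', overwrite the file with results."""
    with path.open("rb") as f:
        data = pickle.load(f)
    # Placeholder for local training: scale the global parameters by the client ID.
    result = {"cid": data["cid"], "params": [p * data["cid"] for p in data["params"]]}
    with path.open("wb") as f:
        pickle.dump(result, f)
    return path


def run_clients(global_params: list[float], cid_list: list[int],
                share_dir: Path, num_parallels: int) -> list[dict]:
    """Driver: write per-client inputs, fan out to workers, collect results."""
    share_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for cid in cid_list:
        path = share_dir / f"{cid}.pkl"
        with path.open("wb") as f:
            pickle.dump({"cid": cid, "params": global_params}, f)
        paths.append(path)
    # Threads keep this sketch portable; ProcessPoolExecutor (under an
    # `if __name__ == "__main__":` guard) would give process-level parallelism.
    with ThreadPoolExecutor(max_workers=num_parallels) as pool:
        finished = list(pool.map(run_client, paths))  # map preserves input order
    results = []
    for path in finished:
        with path.open("rb") as f:
            results.append(pickle.load(f))
    return results


with tempfile.TemporaryDirectory() as tmp:
    results = run_clients([1.0, 2.0], [1, 2, 3], Path(tmp), num_parallels=2)
print([r["params"] for r in results])  # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```

The workers return only paths, so large results never pass through the executor's return channel; the driver reads them back from disk.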

static train(model: Module, model_parameters: Tensor, train_loader: DataLoader, device: str, epochs: int, lr: float) → FedAvgUplinkPackage

Train the model for a single client.

Parameters:
  • model (torch.nn.Module) – The model to train.

  • model_parameters (torch.Tensor) – Initial global model parameters.

  • train_loader (DataLoader) – DataLoader for the training data.

  • device (str) – Device to run the training on.

  • epochs (int) – Number of local training epochs.

  • lr (float) – Learning rate for the optimizer.

Returns:

Uplink package containing updated model parameters and data size.

Return type:

FedAvgUplinkPackage
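The following is a toy version of the local update. It assumes a 1-D linear model trained with mini-batch SGD on mean squared error, in place of a torch module, optimizer, and DataLoader. Like the documented method, it starts from the global parameters and returns the updated parameters together with the local data size. The `batch_size` and `seed` arguments are additions specific to this sketch.

```python
import random


def train(global_params: list[float], data: list[tuple[float, float]],
          epochs: int, lr: float, batch_size: int = 2,
          seed: int = 0) -> tuple[list[float], int]:
    """Local SGD for y = w * x + b, starting from the global model (toy stand-in)."""
    w, b = global_params  # unpack into locals; the caller's list is not modified
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Gradients of the mean squared error over this mini-batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    # Return the updated parameters and the local data size, as an uplink would.
    return [w, b], len(samples)


data = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
global_params = [0.0, 0.0]
(w, b), data_size = train(global_params, data, epochs=300, lr=0.1)
print(data_size)  # 5
```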

uplink_package() → list[FedAvgUplinkPackage]

Retrieve the uplink packages for transmission to the server.

Returns:

A list of uplink packages.

Return type:

list[FedAvgUplinkPackage]
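The data size carried in each uplink package is what lets the server weight clients in FedAvg. The following is a sketch of that sample-weighted average. A hypothetical `UplinkPackage` dataclass stands in for FedAvgUplinkPackage, and its field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UplinkPackage:
    """Hypothetical stand-in for FedAvgUplinkPackage (field names assumed)."""
    model_parameters: list[float]
    data_size: int


def fedavg_aggregate(packages: list[UplinkPackage]) -> list[float]:
    """Average client parameters, weighting each client by its data size."""
    total = sum(p.data_size for p in packages)
    dim = len(packages[0].model_parameters)
    return [
        sum(p.model_parameters[i] * p.data_size for p in packages) / total
        for i in range(dim)
    ]


packages = [UplinkPackage([1.0, 0.0], 10), UplinkPackage([3.0, 4.0], 30)]
print(fedavg_aggregate(packages))  # [2.5, 3.0]
```

The client with three times as much data pulls the average three times as hard. This is why the local update reports its data size alongside the parameters.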