blazefl.contrib.FedAvgParallelClientTrainer
- class blazefl.contrib.FedAvgParallelClientTrainer(model_selector: ModelSelector, model_name: str, share_dir: Path, state_dir: Path, dataset: PartitionedDataset, device: str, num_clients: int, epochs: int, batch_size: int, lr: float, seed: int, num_parallels: int)
Bases: ParallelClientTrainer[FedAvgUplinkPackage, FedAvgDownlinkPackage, FedAvgDiskSharedData]
Parallel client trainer for the Federated Averaging (FedAvg) algorithm.
This trainer handles the parallelized training and evaluation of local models across multiple clients, distributing tasks to different processes or devices.
- model_selector
Selector for initializing the local model.
- Type:
ModelSelector
- model_name
Name of the model to be used.
- Type:
str
- share_dir
Directory to store shared data files between processes.
- Type:
Path
- state_dir
Directory to save random states for reproducibility.
- Type:
Path
- dataset
Dataset partitioned across clients.
- Type:
PartitionedDataset
- device
Device to run the models on (‘cpu’ or ‘cuda’).
- Type:
str
- num_clients
Total number of clients in the federation.
- Type:
int
- epochs
Number of local training epochs per client.
- Type:
int
- batch_size
Batch size for local training.
- Type:
int
- lr
Learning rate for the optimizer.
- Type:
float
- seed
Seed for reproducibility.
- Type:
int
- num_parallels
Number of parallel processes for training.
- Type:
int
- device_count
Number of CUDA devices available (if using GPU).
- Type:
int | None
- __init__(model_selector: ModelSelector, model_name: str, share_dir: Path, state_dir: Path, dataset: PartitionedDataset, device: str, num_clients: int, epochs: int, batch_size: int, lr: float, seed: int, num_parallels: int) -> None
Initialize the FedAvgParallelClientTrainer.
- Parameters:
model_selector (ModelSelector) – Selector for initializing the local model.
model_name (str) – Name of the model to be used.
share_dir (Path) – Directory to store shared data files between processes.
state_dir (Path) – Directory to save random states for reproducibility.
dataset (PartitionedDataset) – Dataset partitioned across clients.
device (str) – Device to run the models on (‘cpu’ or ‘cuda’).
num_clients (int) – Total number of clients in the federation.
epochs (int) – Number of local training epochs per client.
batch_size (int) – Batch size for local training.
lr (float) – Learning rate for the optimizer.
seed (int) – Seed for reproducibility.
num_parallels (int) – Number of parallel processes for training.
Methods
__init__(model_selector, model_name, ...) – Initialize the FedAvgParallelClientTrainer.
evaulate(model, test_loader, device) – Evaluate the model for a single client.
get_shared_data(cid, payload) – Generate the shared data for a specific client.
local_process(payload, cid_list) – Manage the parallel processing of clients.
process_client(path) – Process a single client's local training and evaluation.
train(model, model_parameters, train_loader, ...) – Train the model for a single client.
Retrieve the uplink packages for transmission to the server.
- static evaulate(model: Module, test_loader: DataLoader, device: str) -> tuple[float, float]
Evaluate the model for a single client.
- Parameters:
model (torch.nn.Module) – The model to evaluate.
test_loader (DataLoader) – DataLoader for the evaluation data.
device (str) – Device to run the evaluation on.
- Returns:
Average loss and accuracy.
- Return type:
tuple[float, float]
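As a framework-free illustration of what "average loss and accuracy" could look like, the sketch below aggregates both metrics over batches, weighting every sample equally. The names `evaluate_batches`, `predict`, and `loss_fn` are hypothetical stand-ins and not part of blazefl; the documented method takes a `torch.nn.Module` and a `DataLoader`, and its exact averaging scheme is not stated on this page.

```python
from collections.abc import Callable, Iterable, Sequence


def evaluate_batches(
    predict: Callable[[float], int],
    loss_fn: Callable[[float, int], float],
    batches: Iterable[Sequence[tuple[float, int]]],
) -> tuple[float, float]:
    """Return (average loss, accuracy) over all samples in all batches."""
    total_loss = 0.0
    correct = 0
    seen = 0
    for batch in batches:
        for x, y in batch:
            total_loss += loss_fn(x, y)
            correct += int(predict(x) == y)
            seen += 1
    if seen == 0:
        # Assumption: an empty loader yields zeros rather than an error.
        return 0.0, 0.0
    return total_loss / seen, correct / seen


# Toy stand-ins (assumptions): a threshold classifier and a 0/1 loss.
def predict(x: float) -> int:
    return int(x > 0.5)


def loss_fn(x: float, y: int) -> float:
    return float(predict(x) != y)


batches = [[(0.9, 1), (0.1, 0)], [(0.7, 0), (0.2, 0)]]
avg_loss, accuracy = evaluate_batches(predict, loss_fn, batches)
```

Summing per-sample values and dividing once at the end keeps the result independent of how samples are split into batches.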
- get_shared_data(cid: int, payload: FedAvgDownlinkPackage) -> FedAvgDiskSharedData
Generate the shared data for a specific client.
- Parameters:
cid (int) – Client ID.
payload (FedAvgDownlinkPackage) – Downlink package with global model parameters.
- Returns:
Shared data structure for the client.
- Return type:
FedAvgDiskSharedData
- static process_client(path: Path) -> Path
Process a single client’s local training and evaluation.
This method is executed by a parallel process and handles data loading, training, evaluation, and saving results to a shared file.
- Parameters:
path (Path) – Path to the shared data file containing client-specific information.
- Returns:
Path to the file with the processed results.
- Return type:
Path
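The path-in, path-out contract described above can be sketched with the standard library alone. Everything in this block is an assumption made for illustration: the dictionary layout stands in for `FedAvgDiskSharedData`, the helper `write_shared_data` is hypothetical, the "training" step is a placeholder, and whether the real trainer writes results back to the same file is not stated on this page.

```python
import pickle
import tempfile
from pathlib import Path


def write_shared_data(share_dir: Path, cid: int, params: list[float]) -> Path:
    """Serialize one client's inputs to its own file and return the path."""
    path = share_dir / f"{cid}.pkl"
    with path.open("wb") as f:
        pickle.dump({"cid": cid, "params": params}, f)
    return path


def process_client(path: Path) -> Path:
    """Load inputs from `path`, do the work, and save the results to disk."""
    with path.open("rb") as f:
        data = pickle.load(f)
    # Placeholder for local training: shift every parameter by the client ID.
    result = {
        "cid": data["cid"],
        "params": [p + data["cid"] for p in data["params"]],
    }
    # Assumption: results overwrite the input file.
    with path.open("wb") as f:
        pickle.dump(result, f)
    return path


with tempfile.TemporaryDirectory() as tmp:
    share_dir = Path(tmp)
    paths = [write_shared_data(share_dir, cid, [0.0, 1.0]) for cid in range(3)]
    results = []
    # A process pool's map() could take the place of the built-in map() here.
    for out_path in map(process_client, paths):
        with out_path.open("rb") as f:
            results.append(pickle.load(f))
```

Passing only a `Path` between the parent and the workers keeps the inter-process message small, since the bulky model parameters travel through the file system instead.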
- static train(model: Module, model_parameters: Tensor, train_loader: DataLoader, device: str, epochs: int, lr: float) -> FedAvgUplinkPackage
Train the model for a single client.
- Parameters:
model (torch.nn.Module) – The model to train.
model_parameters (torch.Tensor) – Initial global model parameters.
train_loader (DataLoader) – DataLoader for the training data.
device (str) – Device to run the training on.
epochs (int) – Number of local training epochs.
lr (float) – Learning rate for the optimizer.
- Returns:
Uplink package containing updated model parameters and data size.
- Return type:
FedAvgUplinkPackage
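To make the shape of this step concrete, here is a toy, framework-free sketch: start from the global parameters, run a few epochs of mini-batch gradient descent, and return the updated parameters together with the number of local samples. The function `local_train`, the one-parameter model y = w * x, and the use of plain SGD are all assumptions for illustration; the documented method works on a `torch.nn.Module`, and this page does not say which optimizer it uses.

```python
def local_train(
    global_params: list[float],
    batches: list[list[tuple[float, float]]],
    epochs: int,
    lr: float,
) -> tuple[list[float], int]:
    """Fit y = w * x by mini-batch SGD, starting from the global parameters."""
    (w,) = global_params  # load the global parameter into the local model
    data_size = sum(len(batch) for batch in batches)
    for _ in range(epochs):
        for batch in batches:
            # Gradient of the mean squared error over this batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return [w], data_size


# Toy data generated from y = 2 * x, split into two batches of one sample.
batches = [[(1.0, 2.0)], [(2.0, 4.0)]]
updated_params, data_size = local_train([0.0], batches, epochs=10, lr=0.1)
```

Returning the sample count alongside the parameters matters for FedAvg because the server typically weights each client's update by how much data it was trained on.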