Pre-Test: federated_learning/fedavg Example for KubeEdge Ianvs (LFX 2025 T3) #252


# Proposal for LFX Term-3 at KubeEdge

## Comprehensive Example Restoration Proposal for KubeEdge Ianvs

**Issue:** Proposal to fix configuration and runtime failures in the CIFAR100-Federated `federated_learning/fedavg` example

**Parent Issue:** #230

**By:** Zishan Ahmad  

**Mentors:** Zimu Zheng, Shijing Hu

## Background

The `CIFAR100-Federated` example is a compact framework on KubeEdge Ianvs, featuring client-side training and server-side FedAvg aggregation on CIFAR-100. It serves as a baseline for onboarding, benchmarking, and prototyping edge ML workflows. However, the example is currently broken and cannot be run due to multiple issues, each causing failures either during configuration or at runtime.

### Critical Issues

1. **Dependency Conflict**
   TensorFlow is used throughout the code but is not listed in any `requirements.txt` file. Additionally, the Ianvs documentation suggests using Python 3.8, but running the example in common environments leads to TensorFlow version dependency errors. Even older versions, such as TensorFlow 2.10.0, trigger protobuf dependency issues.

   <img width="3072" height="562" alt="Image" src="https://github.com/user-attachments/assets/0479d88f-5bdc-4545-ade1-a25a68592394" />

2. **Non-Portable Configuration**
   Hardcoded absolute paths result in "not found" or permission denied errors while running `utils.py` or the Ianvs examples. Even after resolving the path issues, the example still fails to run. Some debugging screenshots are attached:

   <img width="3072" height="1096" alt="Image" src="https://github.com/user-attachments/assets/661652e2-4995-4a7e-866c-9964ff72631f" />  

   <img width="1700" height="406" alt="Image" src="https://github.com/user-attachments/assets/042bee33-9c0c-4083-ae5e-f1cb3c80d112" />  

   <img width="1753" height="880" alt="Image" src="https://github.com/user-attachments/assets/cf6aaee9-c4e9-4ff6-a3df-8da88cdfdf5a" />  

   <img width="3072" height="546" alt="Image" src="https://github.com/user-attachments/assets/fc57c2d5-ef20-497e-b178-2587e089b8b7" />  

3. **Incorrect YAML Keys**
   The YAML file uses the keys `train_url` and `test_url`. The correct keys for passing the prepared dataset are `train_index` and `test_index`, as shown in `core/testenvmanager/dataset/dataset.py`.

   <img width="1334" height="241" alt="Image" src="https://github.com/user-attachments/assets/3bd52446-abcd-4a33-ad74-7e66c1b5af0b" />  

4. **Runtime Code Bug**
   Even after fixing the environment and paths, the prediction loop in `basemodel.py` triggers a fatal runtime error:

   <img width="3072" height="1417" alt="Image" src="https://github.com/user-attachments/assets/74a98ffd-acb1-4ea8-81ea-db80671c0629" />  

   ```python
   AttributeError: 'list' object has no attribute 'x'
   ```

5. **Missing Documentation**
   No `README.md` exists to guide setup, dependencies, or the correct workflow, leaving users guessing and encountering unnecessary friction.

### Impact

These issues make the CIFAR100-Federated example effectively unusable in its current state. They hinder onboarding, slow prototyping, and create unnecessary friction for users attempting to benchmark edge ML workflows. Resolving them would restore the example as a reliable baseline for testing, experimentation, and learning in KubeEdge Ianvs environments.

## Goals

This proposal aims to restore the `federated_learning/fedavg` example to a **fully functional, portable, and documented state**. Key deliverables:

1. Introduce a centralized configuration pattern for the example using a `config.py` file to eliminate hardcoded paths and serve as a template for other examples.
2. Change `train_url` and `test_url` to `train_index` and `test_index`, respectively.
3. Fix a runtime bug in `basemodel.py`.
4. Add a comprehensive `README.md` with setup instructions.
5. Include a `requirements.txt` file to formalize dependencies.


## Scope

* **Users:** New community members, researchers, and developers exploring KubeEdge Ianvs, especially in the context of federated learning.
* **Scope:** Limited to the `examples/cifar100/` directory and its dependency changes. No core-related changes are proposed.
* **Uniqueness:**
  * **Focus Area:** Focuses on the **CIFAR-100 federated learning** example, an area that has not received recent maintenance and is critical for new users interested in FL on KubeEdge.
  * **Pipeline Approach:** **Identifies failure points across the full pipeline**, including documentation, environment, configuration, and code logic, rather than addressing a single isolated issue.

## Detailed Design

### Architecture

The architecture will remain within the example itself; no core Ianvs modules will be modified. Proposed components:

* **Centralized Configuration Manager:** Introduce `examples/cifar100/config.py` to manage configuration centrally.
* **YAML Placeholders:** Add placeholders in YAML files that can be controlled via the config file to eliminate hardcoded values and improve maintainability. This ensures consistency and makes the example easier to verify via CI/CD.
* **Bug Fixes:** Address potential issues across YAML and Python files.
* **Documentation and Dependencies:** Provide a `README.md` and `requirements.txt` to formalize setup instructions and dependencies.

<img width="2283" height="950" alt="Image" src="https://github.com/user-attachments/assets/225711b6-7559-4935-a82a-5749993efeef" />

### Module Details

#### 1. `config.py` (Centralized Path Configuration)

* Introduce a central configuration for input/output/model paths with both absolute and relative options.
* Dynamically replace placeholders in `.yaml` files (e.g., `{{TRAIN_INDEX_FILE}}`) with actual paths to eliminate hardcoded values and improve maintainability.
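
The placeholder substitution described above can be sketched as follows. This is a minimal illustration, not the code from the proposed `config.py`: the function names (`build_paths`, `render_template`), the `TEST_INDEX_FILE` placeholder, and the index file names are assumptions made for the example. Only the `{{TRAIN_INDEX_FILE}}` placeholder comes from this proposal.

```python
import re
from pathlib import Path

# Matches placeholders such as {{TRAIN_INDEX_FILE}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z0-9_]+)\s*\}\}")


def build_paths(root: Path) -> dict[str, str]:
    """Resolve example paths relative to `root` (hypothetical layout and file names)."""
    root = Path(root).resolve()
    return {
        "TRAIN_INDEX_FILE": str(root / "data" / "cifar100_train_index.txt"),
        "TEST_INDEX_FILE": str(root / "data" / "cifar100_test_index.txt"),
    }


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} in `text`; fail loudly on unknown names."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value configured for placeholder {key!r}")
        return values[key]

    return PLACEHOLDER.sub(substitute, text)


template = (
    'train_index: "{{TRAIN_INDEX_FILE}}"\n'
    'test_index: "{{TEST_INDEX_FILE}}"\n'
)
rendered = render_template(template, build_paths(Path(".")))
print(rendered)
```

Raising on an unknown placeholder, rather than leaving it in the rendered file, surfaces a misconfigured template before the benchmark starts instead of as a later "not found" error.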


#### 2. `README.md` (Documentation)

* Provide step-by-step setup instructions for first-time users.
* Specify **Python 3.10+** requirement.
* Guide users through installation, data preparation, path configuration, and running benchmarks.

#### 3. `requirements.txt` (Dependency Management)

* List example-specific dependency versions to ensure reproducibility.

#### 4. `basemodel.py` (Bug Fix)

* Correct the prediction loop so that it no longer raises `AttributeError: 'list' object has no attribute 'x'` at runtime.
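
The error message suggests that the loop reads an `.x` attribute from a value that is a plain list. One defensive pattern is sketched below, under the assumption (not confirmed here) that the prediction input may arrive either as a dataset-like object exposing `.x` or as a plain list of samples. The helper names `extract_samples` and `predict_all` are hypothetical and do not come from `basemodel.py`.

```python
from typing import Any, Callable, Sequence


def extract_samples(data: Any) -> Sequence[Any]:
    """Return the samples whether `data` exposes `.x` or is already a sequence."""
    if hasattr(data, "x"):
        return data.x
    if isinstance(data, (list, tuple)):
        return data
    raise TypeError(f"unsupported prediction input: {type(data).__name__}")


def predict_all(data: Any, predict_one: Callable[[Any], Any]) -> list:
    """Apply `predict_one` to each sample, regardless of how `data` is wrapped."""
    return [predict_one(sample) for sample in extract_samples(data)]


class Wrapped:
    """Stand-in for a dataset-like object that stores its samples in `.x`."""

    def __init__(self, x):
        self.x = x


# Both input shapes go through the same loop.
print(predict_all([1, 2, 3], lambda s: s * 2))
print(predict_all(Wrapped([1, 2]), lambda s: s + 1))
```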

#### 5. `testenv.yaml` (Bug Fix)

* Replace `train_url` and `test_url` with `train_index` and `test_index`, the keys used in `core/testenvmanager/dataset/dataset.py`.
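
The rename can also be expressed as a small check, for example in a CI step. The sketch below assumes the dataset section of `testenv.yaml` has already been parsed into a plain dictionary. The function name `normalize_dataset_keys` and the sample paths are illustrative only.

```python
# Old key -> new key, as named in this proposal.
RENAMED_KEYS = {"train_url": "train_index", "test_url": "test_index"}


def normalize_dataset_keys(dataset_cfg: dict) -> dict:
    """Return a copy of `dataset_cfg` with the old keys renamed."""
    fixed = {}
    for key, value in dataset_cfg.items():
        new_key = RENAMED_KEYS.get(key, key)
        if new_key in fixed:
            # Old and new spellings of the same key are both present.
            raise ValueError(f"conflicting entries for {new_key!r}")
        fixed[new_key] = value
    return fixed


old_cfg = {"train_url": "data/train.txt", "test_url": "data/test.txt"}
print(normalize_dataset_keys(old_cfg))
```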

### Expected Outcome

* **Runnable:** Example runs without crashes.
* **Portable:** Works across machines without hardcoded paths.
* **Documented:** Easy for new users to set up and explore.
* **Reproducible:** Environment and dependencies fully specified.

**Note**: A preliminary implementation of this proposed work has been developed and is available as open PR #251, demonstrating a fully functional proof of concept.

