Context
Greetings @hsj576,
I'm Abhishek Kumar, an LFX mentee working on the Ianvs example restoration project. I need guidance on resolving a critical issue with the lifelong learning semantic segmentation example.
Problem
The benchmark runs successfully end-to-end and training completes, but all evaluation metrics return NaN values instead of numerical accuracy scores.
What's Working ✅
- Benchmark completes all 6 rounds without crashes
- Training phase succeeds for all rounds
- Model checkpoints generated (181MB files)
- GPU acceleration functional
- No errors or exceptions during execution
What's Not Working ❌
All evaluation metrics show nan:
| Round | Test Samples | Accuracy | Result |
|---|---|---|---|
| 0 | 1 | nan | Expected (insufficient data) |
| 1 | 48 | nan | Unexpected |
| 2 | 48 | nan | Unexpected |
| 3 | 48 | nan | Unexpected |
| 4 | 48 | nan | Unexpected |
| 5 | 48 | nan | Unexpected |
Environment
- OS: WSL2 Ubuntu 22.04
- Python: 3.10 / 3.12 (tested both)
- PyTorch: 2.x with CUDA 11.8
- GPU: NVIDIA RTX 4050
- Ianvs: Latest from main branch
Steps to Reproduce
```bash
git clone https://github.com/kubeedge/ianvs.git
cd ianvs
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -e .
ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
```
After completion, the final results table shows nan for all metrics.
What I've Investigated
- Verified test dataset files exist with valid image paths
- Checked model outputs are being generated
- Examined metrics calculation functions
- Tested on multiple Python versions
- Confirmed training completes successfully with proper checkpoints
Despite this investigation, I have not been able to identify why the metrics evaluate to NaN.
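One failure mode I suspect, but have not confirmed in the Ianvs code, is a 0/0 division inside a per-class segmentation metric such as mean IoU: classes that never appear in a test image contribute NaN, and averaging then turns the whole score into NaN. Below is a minimal standalone sketch of that effect; it is an illustration only, not code from this repository, and the class count is a placeholder.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Naive mean IoU: a class that never appears in either the
    prediction or the ground truth gives 0/0 -> NaN, and one NaN
    poisons the average."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union)  # union == 0 -> NaN (RuntimeWarning)
    return np.mean(ious)

def mean_iou_safe(pred, gt, num_classes):
    """Same metric, but classes absent from both prediction and
    ground truth are skipped instead of contributing NaN."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example: both prediction and ground truth contain only class 0.
pred = np.zeros((4, 4), dtype=int)
gt = np.zeros((4, 4), dtype=int)
print(mean_iou(pred, gt, num_classes=19))       # nan (19 is a placeholder class count)
print(mean_iou_safe(pred, gt, num_classes=19))  # 1.0
```

If the metric used by this example averages over all classes unconditionally, skipping absent classes (or masking an ignore label) as in the second variant might already fix it, but I would like confirmation of where that averaging happens.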
Evidence
Test Dataset Verification
```bash
$ wc -l model_eval-*.txt
 1 model_eval-0.txt
48 model_eval-1.txt
48 model_eval-2.txt
48 model_eval-3.txt
48 model_eval-4.txt
48 model_eval-5.txt
```
Model Checkpoint Verification
```bash
$ ls -lh output/train/0/seen_task/global.model
-rw-r--r-- 181M global.model
```
Training produces valid model files.
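As a next step I would also like to rule out NaN values in the trained weights themselves, since a single non-finite parameter would propagate into every prediction. Here is a minimal sketch of that check, assuming global.model is a standard PyTorch module or state dict; the actual serialization used by Ianvs may differ.

```python
import torch

# Hypothetical check, not part of the benchmark: load one of the
# checkpoints produced above and look for non-finite parameters.
ckpt_path = "output/train/0/seen_task/global.model"

# weights_only=False may be required if the file is a pickled nn.Module;
# that in turn requires the model class to be importable.
obj = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Handle both a full module and a plain state dict.
state = obj.state_dict() if hasattr(obj, "state_dict") else obj

bad = [name for name, t in state.items()
       if torch.is_tensor(t) and not torch.isfinite(t).all()]

print(f"{len(bad)} tensors contain NaN/Inf" + (f", e.g. {bad[:5]}" if bad else ""))
```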
Questions
I need guidance on:
- Where should I look to debug this? Which files/functions handle the metrics calculation?
- What's the expected format for predictions and ground truth labels during evaluation?
- Is this a known issue with this example?
- What could cause NaN in the metrics calculation despite successful training? (I sketch the instrumentation I'd like to add below.)
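To make the last two questions concrete, this is the kind of temporary instrumentation I would like to drop in just before the metric is computed, once I know where that happens. The helper below is illustrative, not existing Ianvs code:

```python
import numpy as np

def debug_metric_inputs(pred, gt):
    """Hypothetical helper: print the properties that most often
    explain NaN scores: mismatched shapes, unexpected label IDs,
    or NaN already present in the arrays."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    print("pred shape/dtype:", pred.shape, pred.dtype)
    print("gt   shape/dtype:", gt.shape, gt.dtype)
    print("pred labels:", np.unique(pred)[:20])
    print("gt   labels:", np.unique(gt)[:20])
    if np.issubdtype(pred.dtype, np.floating):
        print("NaN in pred:", bool(np.isnan(pred).any()))
    if np.issubdtype(gt.dtype, np.floating):
        print("NaN in gt:", bool(np.isnan(gt).any()))
```

Mismatched shapes, unexpected label IDs (for example an ignore label such as 255 that is not masked out), or NaN already present in the predictions would each explain the NaN scores.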
Additional Context
During restoration, I fixed multiple compatibility issues to get the benchmark running (empty dataset handling, scheduler fixes, method name corrections, etc.). The pipeline now executes successfully, but the metrics issue remains.
Any guidance on debugging this would be greatly appreciated.
Abhishek Kumar
LFX Mentee - KubeEdge Ianvs
GitHub: @abhishek-8081

