
[Bug] NaN Accuracy Metrics in Lifelong Learning Semantic Segmentation Example #287

@abhishek-8081

Description

Context

Greetings @hsj576,

I'm Abhishek Kumar, an LFX mentee working on the Ianvs example restoration project. I need guidance on resolving a critical issue with the lifelong learning semantic segmentation example.

Problem

The benchmark runs successfully end-to-end and training completes, but all evaluation metrics return NaN values instead of numerical accuracy scores.

What's Working ✅

  • Benchmark completes all 6 rounds without crashes
  • Training phase succeeds for all rounds
  • Model checkpoints generated (181MB files)
  • GPU acceleration functional
  • No errors or exceptions during execution

What's Not Working ❌

All evaluation metrics show nan:

| Round | Test Samples | Accuracy | Result |
|-------|--------------|----------|--------|
| 0     | 1            | nan      | Expected (insufficient data) |
| 1     | 48           | nan      | Unexpected |
| 2     | 48           | nan      | Unexpected |
| 3     | 48           | nan      | Unexpected |
| 4     | 48           | nan      | Unexpected |
| 5     | 48           | nan      | Unexpected |
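
My working hypothesis is that the NaN comes from a division by zero inside the metric computation rather than from training itself. Below is a minimal, purely illustrative numpy sketch (not the Ianvs code) showing how a confusion-matrix-based mIoU turns into NaN when a class never appears in either the predictions or the ground truth:

```python
# Illustrative only -- not the Ianvs implementation.
import numpy as np

num_classes = 3
pred = np.array([0, 0, 1, 1])   # class 2 is never predicted
gt   = np.array([0, 1, 1, 1])   # class 2 never appears in ground truth

# Standard confusion-matrix accumulation
cm = np.zeros((num_classes, num_classes), dtype=np.float64)
for p, g in zip(pred, gt):
    cm[g, p] += 1

intersection = np.diag(cm)
union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

with np.errstate(divide="ignore", invalid="ignore"):
    iou = intersection / union          # class 2: 0 / 0 -> nan

print(iou)              # [0.5, 0.666..., nan]
print(iou.mean())       # nan  -- a plain mean propagates the NaN
print(np.nanmean(iou))  # finite -- one common way to skip empty classes
```

If the metric averages per-class scores with a plain mean, a single class that is absent from a test round would be enough to turn the whole result into NaN; I would like to confirm whether that matches the actual implementation.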

Environment

  • OS: WSL2 Ubuntu 22.04
  • Python: 3.10 / 3.12 (tested both)
  • PyTorch: 2.x with CUDA 11.8
  • GPU: NVIDIA RTX 4050
  • Ianvs: Latest from main branch

Steps to Reproduce

```bash
git clone https://github.com/kubeedge/ianvs.git
cd ianvs
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -e .
ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
```

After completion, the final results table shows nan for all metrics.

What I've Investigated

  • Verified test dataset files exist with valid image paths
  • Checked model outputs are being generated
  • Examined metrics calculation functions
  • Tested on multiple Python versions
  • Confirmed training completes successfully with proper checkpoints

Despite this investigation, I have not been able to identify why the metrics come out as NaN.
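
To narrow this down further, I plan to check whether the checkpoint itself produces finite outputs before the metric stage ever runs. A rough sketch of that check (the path is from this run; the loader call, input shape, and output handling are assumptions to adjust to the actual network):

```python
# Rough debug check -- input shape and output structure are assumptions.
import torch

ckpt_path = "output/train/0/seen_task/global.model"
model = torch.load(ckpt_path, map_location="cpu", weights_only=False)  # full pickled model assumed

if hasattr(model, "eval"):
    model.eval()
    dummy = torch.randn(1, 3, 512, 512)       # placeholder input shape
    with torch.no_grad():
        out = model(dummy)
    if isinstance(out, (tuple, list)):        # some segmentation nets return tuples
        out = out[0]
    print("NaN in logits:", torch.isnan(out).any().item())
else:
    print("checkpoint is a state_dict; load it into the network class first")
```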

Evidence

Test Dataset Verification

```bash
$ wc -l model_eval-*.txt
 1 model_eval-0.txt
48 model_eval-1.txt
48 model_eval-2.txt
48 model_eval-3.txt
48 model_eval-4.txt
48 model_eval-5.txt
```

Model Checkpoint Verification

```bash
$ ls -lh output/train/0/seen_task/global.model
-rw-r--r-- 181M global.model
```

Training produces valid model files.
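
I also want to sanity-check the label masks referenced by the eval index files, since a mask made up entirely of the ignore index would zero out every per-class denominator. A rough sketch (the `image_path label_path` line format and the ignore index of 255 are assumptions to be confirmed against the example's config):

```python
# Rough sanity check -- line format and IGNORE_INDEX are assumptions.
import os
import numpy as np
from PIL import Image

IGNORE_INDEX = 255  # common segmentation convention; confirm against the example

with open("model_eval-1.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue
        img_path, label_path = parts[0], parts[1]
        if not (os.path.exists(img_path) and os.path.exists(label_path)):
            print("missing file on line:", line.strip())
            continue
        label = np.array(Image.open(label_path))
        if (label != IGNORE_INDEX).sum() == 0:
            print("all pixels ignored in", label_path, "-> zero metric denominator")
```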

Questions

I need guidance on:

  1. Where should I look to debug this? Which files/functions handle the metrics calculation?
  2. What's the expected format for predictions and ground truth labels during evaluation?
  3. Is this a known issue with this example?
  4. What could cause NaN in the metrics calculation despite successful training?
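
In the meantime, one general numpy technique I plan to try (not Ianvs-specific) is to make silent 0/0 divisions raise instead of quietly producing NaN, so the offending line in the metric code surfaces as a traceback:

```python
# General numpy technique, not tied to Ianvs: turn silent nan/inf producers
# into exceptions so the failing metric expression is pinpointed.
import numpy as np

np.seterr(divide="raise", invalid="raise")

try:
    _ = np.array([0.0]) / np.array([0.0])   # stand-in for the real metric expression
except FloatingPointError as exc:
    print("caught the silent NaN source:", exc)
```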

Additional Context

During restoration, I fixed multiple compatibility issues to get the benchmark running (empty dataset handling, scheduler fixes, method name corrections, etc.). The pipeline now executes successfully, but the metrics issue remains.

Any guidance on debugging this would be greatly appreciated.



Abhishek Kumar
LFX Mentee - KubeEdge Ianvs
GitHub: @abhishek-8081

Labels: kind/bug