
[Bug] NaN Accuracy Metrics in Lifelong Learning Semantic Segmentation Example #287

@abhishek-8081

Description

Context

Greetings @hsj576,

I'm Abhishek Kumar, an LFX mentee working on the Ianvs example restoration project. I need guidance on resolving a critical issue with the lifelong learning semantic segmentation example.

Problem

The benchmark runs successfully end-to-end and training completes, but all evaluation metrics return NaN values instead of numerical accuracy scores.

What's Working ✅

  • Benchmark completes all 6 rounds without crashes
  • Training phase succeeds for all rounds
  • Model checkpoints generated (181MB files)
  • GPU acceleration functional
  • No errors or exceptions during execution

What's Not Working ❌

All evaluation metrics show nan:

| Round | Test Samples | Accuracy | Result |
|-------|--------------|----------|--------|
| 0     | 1            | nan      | Expected (insufficient data) |
| 1     | 48           | nan      | Unexpected |
| 2     | 48           | nan      | Unexpected |
| 3     | 48           | nan      | Unexpected |
| 4     | 48           | nan      | Unexpected |
| 5     | 48           | nan      | Unexpected |
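
My working hypothesis is that the NaN comes from a division by zero inside the metric computation rather than from training itself. Below is a minimal, purely illustrative numpy sketch (not the Ianvs code) showing how a confusion-matrix-based mIoU turns into NaN when a class never appears in either the predictions or the ground truth:

```python
# Illustrative only -- not the Ianvs implementation.
import numpy as np

num_classes = 3
pred = np.array([0, 0, 1, 1])   # class 2 is never predicted
gt   = np.array([0, 1, 1, 1])   # class 2 never appears in ground truth

# Standard confusion-matrix accumulation
cm = np.zeros((num_classes, num_classes), dtype=np.float64)
for p, g in zip(pred, gt):
    cm[g, p] += 1

intersection = np.diag(cm)
union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

with np.errstate(divide="ignore", invalid="ignore"):
    iou = intersection / union          # class 2: 0 / 0 -> nan

print(iou)              # [0.5, 0.666..., nan]
print(iou.mean())       # nan  -- a plain mean propagates the NaN
print(np.nanmean(iou))  # finite -- one common way to skip empty classes
```

If the metric averages per-class scores with a plain mean, a single class that is absent from a test round would be enough to turn the whole result into NaN; I would like to confirm whether that matches the actual implementation.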

Environment

  • OS: WSL2 Ubuntu 22.04
  • Python: 3.10 / 3.12 (tested both)
  • PyTorch: 2.x with CUDA 11.8
  • GPU: NVIDIA RTX 4050
  • Ianvs: Latest from main branch

Steps to Reproduce

```bash
git clone https://github.com/kubeedge/ianvs.git
cd ianvs
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -e .
ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
```

After completion, the final results table shows nan for all metrics.

What I've Investigated

  • Verified test dataset files exist with valid image paths
  • Checked model outputs are being generated
  • Examined metrics calculation functions
  • Tested on multiple Python versions
  • Confirmed training completes successfully with proper checkpoints

Despite this investigation, I have not been able to identify why the metrics come out as NaN.
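
To narrow this down further, I plan to check whether the checkpoint itself produces finite outputs before the metric stage ever runs. A rough sketch of that check (the path is from this run; the loader call, input shape, and output handling are assumptions to adjust to the actual network):

```python
# Rough debug check -- input shape and output structure are assumptions.
import torch

ckpt_path = "output/train/0/seen_task/global.model"
model = torch.load(ckpt_path, map_location="cpu", weights_only=False)  # full pickled model assumed

if hasattr(model, "eval"):
    model.eval()
    dummy = torch.randn(1, 3, 512, 512)       # placeholder input shape
    with torch.no_grad():
        out = model(dummy)
    if isinstance(out, (tuple, list)):        # some segmentation nets return tuples
        out = out[0]
    print("NaN in logits:", torch.isnan(out).any().item())
else:
    print("checkpoint is a state_dict; load it into the network class first")
```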

Evidence

Test Dataset Verification

```bash
$ wc -l model_eval-*.txt
 1 model_eval-0.txt
48 model_eval-1.txt
48 model_eval-2.txt
48 model_eval-3.txt
48 model_eval-4.txt
48 model_eval-5.txt
```

Model Checkpoint Verification

```bash
$ ls -lh output/train/0/seen_task/global.model
-rw-r--r-- 181M global.model
```

Training produces valid model files.
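
I also want to sanity-check the label masks referenced by the eval index files, since a mask made up entirely of the ignore index would zero out every per-class denominator. A rough sketch (the `image_path label_path` line format and the ignore index of 255 are assumptions to be confirmed against the example's config):

```python
# Rough sanity check -- line format and IGNORE_INDEX are assumptions.
import os
import numpy as np
from PIL import Image

IGNORE_INDEX = 255  # common segmentation convention; confirm against the example

with open("model_eval-1.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue
        img_path, label_path = parts[0], parts[1]
        if not (os.path.exists(img_path) and os.path.exists(label_path)):
            print("missing file on line:", line.strip())
            continue
        label = np.array(Image.open(label_path))
        if (label != IGNORE_INDEX).sum() == 0:
            print("all pixels ignored in", label_path, "-> zero metric denominator")
```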

Questions

I need guidance on:

  1. Where should I look to debug this? Which files/functions handle the metrics calculation?
  2. What's the expected format for predictions and ground truth labels during evaluation?
  3. Is this a known issue with this example?
  4. What could cause NaN in the metrics calculation despite successful training?
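
In the meantime, one general numpy technique I plan to try (not Ianvs-specific) is to make silent 0/0 divisions raise instead of quietly producing NaN, so the offending line in the metric code surfaces as a traceback:

```python
# General numpy technique, not tied to Ianvs: turn silent nan/inf producers
# into exceptions so the failing metric expression is pinpointed.
import numpy as np

np.seterr(divide="raise", invalid="raise")

try:
    _ = np.array([0.0]) / np.array([0.0])   # stand-in for the real metric expression
except FloatingPointError as exc:
    print("caught the silent NaN source:", exc)
```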

Additional Context

During restoration, I fixed multiple compatibility issues to get the benchmark running (empty dataset handling, scheduler fixes, method name corrections, etc.). The pipeline now executes successfully, but the metrics issue remains.

Any guidance on debugging this would be greatly appreciated.



Abhishek Kumar
LFX Mentee - KubeEdge Ianvs
GitHub: @abhishek-8081

Labels: kind/bug