Guide for running the example of robot/lifelong_learning_bench/semantic-segmentation

**Introduction or background of this discussion**: 

Guide for running the example of robot/lifelong_learning_bench/semantic-segmentation

**Contents of this discussion**:

Recently, I tried to run [examples/robot/lifelong_learning_bench/semantic-segmentation](https://github.com/kubeedge/ianvs/tree/7ea4f4af57114ce3179cd0c0773a4254c5999715/examples/robot/lifelong_learning_bench/semantic-segmentation) to learn how to use Ianvs.

However, running this example was not easy, and I ran into a series of difficulties along the way. Here I record the process of running the example and the solutions to the problems I encountered, in the hope that they help others interested in Ianvs.

In addition, for the problems I discovered along the way, I have included some suggestions that I hope the community can address.

## Ianvs Preparation

I created a new conda environment to run this project on an Ubuntu 22.04 server. Following the guide [#step-1-ianvs-preparation](https://github.com/kubeedge/ianvs/tree/main/examples/robot/lifelong_learning_bench/semantic-segmentation#step-1-ianvs-preparation), I chose Python 3.9 for the environment:

```bash
conda create -n ianvs-reproduce python=3.9
conda activate ianvs-reproduce
```

Then I installed Sedna and the other dependencies following the instructions:

```bash
pip install ./examples/resources/third_party/*
pip install -r requirements.txt
```

Then I installed ianvs by executing `python setup.py install`.

## Dataset Preparation

Step 2 requires downloading the dataset, which I obtained from @hsj576. The dataset has the following structure:

```bash
├── 1280x760
│   ├── gtFine
│   │   ├── test
│   │   ├── train
│   │   └── val
│   ├── rgb
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── viz
│       ├── test
│       ├── train
│       └── val
├── 2048x1024
│   ├── gtFine
│   │   ├── test
│   │   ├── train
│   │   └── val
│   ├── rgb
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── viz
│       ├── test
│       ├── train
│       └── val
└── 640x480
    ├── gtFine
    │   ├── test
    │   ├── train
    │   └── val
    ├── json
    │   ├── test
    │   ├── train
    │   └── val
    ├── rgb
    │   ├── test
    │   ├── train
    │   └── val
    └── viz
        ├── test
        ├── train
        └── val
```

In addition, I obtained the training index files from @hsj576. Each file contains multiple path pairs (an RGB image and its label), as shown below:

```bash
rgb/train/20220420_front/00000.png gtFine/train/20220420_front/00000_TrainIds.png
rgb/train/20220420_front/00001.png gtFine/train/20220420_front/00001_TrainIds.png
...
```

However, the [README.md](https://github.com/kubeedge/ianvs/tree/main/examples/robot/lifelong_learning_bench/semantic-segmentation#step-2-dataset-preparation) does not explain where the index files should be placed. After some trial and error, I found that all the files in the `2048x1024` folder need to be moved to the directory where the index files are located.
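To catch placement mistakes before launching a long job, a small script can check every path listed in an index file. This is a minimal sketch assuming, as found above, that the paths on each line are resolved relative to the directory containing the index file; `find_missing_pairs` is a hypothetical helper, not part of Ianvs.

```python
import os


def find_missing_pairs(index_path):
    """Return (line_no, rel_path) for every path in an index file that does not exist.

    Assumes each non-empty line holds whitespace-separated relative paths
    (an RGB image and its label), resolved against the index file's directory.
    """
    base_dir = os.path.dirname(os.path.abspath(index_path))
    missing = []
    with open(index_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            for rel_path in line.split():
                if not os.path.exists(os.path.join(base_dir, rel_path)):
                    missing.append((line_no, rel_path))
    return missing
```

For example, `find_missing_pairs("../datasets/robot_dataset/train-index.txt")` should return an empty list once the contents of `2048x1024` sit next to the index files.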

Then, as the guide points out, I needed to configure the dataset URLs in `testenv.yml`. There are two folders in `ianvs/examples/robot/lifelong_learning_bench/`; I edited `semantic-segmentation/testenv/testenv.yml` in the benchmark project, which looks like this:

```yaml
testenv:
  # dataset configuration
  dataset:
    # the url address of train dataset index; string type;
    train_url: "/home/shijing.hu/ianvs/dataset/robot_dataset/train-index.txt"
    # the url address of test dataset index; string type;
    test_url: "/home/shijing.hu/ianvs/dataset/robot_dataset/test-index.txt"
  # model eval configuration of incremental learning;
  model_eval:
    # metric used for model evaluation
    model_metric:
      # metric name; string type;
      name: "accuracy"
      # the url address of python file
      url: "./examples/robot/lifelong_learning_bench/testenv/accuracy.py"
      mode: "no-inference"
    ...
```

I assumed that `train_url` and `test_url` were the fields I had to edit. The metric URL `./examples/robot/lifelong_learning_bench/testenv/accuracy.py` is relative, which suggests that paths are resolved from the directory where `ianvs` is launched, i.e. the repository root `ianvs/project/ianvs`. Since my dataset is in `ianvs/project/datasets`, I updated the configuration as follows:

```yaml
testenv:
  # dataset configuration
  dataset:
    # the url address of train dataset index; string type;
    train_url: "../datasets/robot_dataset/train-index.txt"
    # the url address of test dataset index; string type;
    test_url: "../datasets/robot_dataset/test-index.txt"
  # model eval configuration of incremental learning;
  model_eval:
    # metric used for model evaluation
    model_metric:
      # metric name; string type;
      name: "accuracy"
      # the url address of python file
      url: "./examples/robot/lifelong_learning_bench/testenv/accuracy.py"
      mode: "no-inference"
    ...
```

There were multiple testenv files in `testenv/`, and I edited all of them.
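Editing several testenv files by hand is error-prone, so the change can be scripted. The sketch below assumes each file stores the URLs on single lines of the form `train_url: "..."` and `test_url: "..."`, as in the files shown above, and rewrites them with a regular expression rather than a YAML parser so that comments are preserved. `set_dataset_urls` and `resolve_from` are hypothetical helpers; `resolve_from` only illustrates where a relative URL lands if paths are resolved from the launch directory, as assumed above.

```python
import os
import re
from pathlib import Path

# Matches single-line entries such as:    train_url: "/home/.../train-index.txt"
URL_LINE = re.compile(r'^([ \t]*(train_url|test_url):[ \t]*)"[^"]*"', re.MULTILINE)


def set_dataset_urls(testenv_dir, train_url, test_url):
    """Rewrite train_url/test_url in every .yaml/.yml file directly under testenv_dir.

    Returns the names of the files that changed; all other lines, including
    comments, are left untouched.
    """
    new_values = {"train_url": train_url, "test_url": test_url}
    changed = []
    for path in sorted(Path(testenv_dir).iterdir()):
        if not path.is_file() or path.suffix not in (".yaml", ".yml"):
            continue
        text = path.read_text(encoding="utf-8")
        new_text = URL_LINE.sub(
            lambda m: f'{m.group(1)}"{new_values[m.group(2)]}"', text
        )
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path.name)
    return changed


def resolve_from(launch_dir, url):
    """Show where a URL lands if it is resolved against the launch directory."""
    return os.path.normpath(os.path.join(launch_dir, url))
```

For example, `resolve_from("/work/ianvs/project/ianvs", "../datasets/robot_dataset/train-index.txt")` gives `/work/ianvs/project/datasets/robot_dataset/train-index.txt`.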

## Large Vision Model Preparation

Next, I downloaded the SAM package and model according to [#step-2.5-large-vision-model-preparationoptional](https://github.com/kubeedge/ianvs/tree/main/examples/robot/lifelong_learning_bench/semantic-segmentation#step-25-large-vision-model-preparationoptional). This step went smoothly.

Then I needed to install `mmcv` and `mmdetection`. `mmcv` installed successfully following the guide, but installing `mmdetection` failed, as shown below:

```bash
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "./ianvs-reproduce/project/mmdetection/setup.py", line 11, in <module>
      import torch
  ModuleNotFoundError: No module named 'torch'
```

So I had to install `torch` myself. Since the guide doesn't mention which version of `torch` to use, I assumed I needed `torch 2.0.0` with `cu118`, because the `mmcv` download link in the guide indicates this version: `https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/mmcv-2.0.0-cp39-cp39-manylinux1_x86_64.whl`.

I installed `torch` with CUDA 11.8 following the instructions from [Previous PyTorch Versions | PyTorch](https://pytorch.org/get-started/previous-versions/):

```bash
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
```
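The guide's `mmcv` wheel is built for a specific torch/CUDA pair, which is encoded in its URL (`cu118/torch2.0.0`). Since the command above installs `torch==2.2.2`, it may be worth checking the installed build against the wheel. The sketch below parses the URL and compares it with a `torch.__version__` string; the helper names are hypothetical, and comparing only major.minor and the CUDA tag is an assumption about what matters for compatibility, not a full ABI check.

```python
import re


def wheel_build_tags(wheel_url):
    """Extract (cuda_tag, torch_version) from an OpenMMLab mmcv wheel URL."""
    match = re.search(r"/(cu\d+|cpu)/torch(\d+\.\d+\.\d+)/", wheel_url)
    if match is None:
        raise ValueError(f"unrecognized wheel URL: {wheel_url}")
    return match.group(1), match.group(2)


def build_mismatches(torch_version, wheel_url):
    """List differences between a torch.__version__ string and the wheel's build tags.

    Compares major.minor of torch and, if present, the local CUDA tag
    (e.g. '+cu118'); this is a heuristic.
    """
    cuda_tag, wheel_torch = wheel_build_tags(wheel_url)
    base, _, local = torch_version.partition("+")
    problems = []
    if base.split(".")[:2] != wheel_torch.split(".")[:2]:
        problems.append(f"torch {base} vs wheel built for torch {wheel_torch}")
    if local and local != cuda_tag:
        problems.append(f"torch build {local} vs wheel built for {cuda_tag}")
    return problems
```

Inside the environment, `import torch; print(build_mismatches(torch.__version__, url))` (with `url` set to the mmcv link from the guide) prints an empty list when the tags agree; a non-empty list is a hint that the two may be incompatible.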

As recommended in the guide, I downloaded the `cache.pickle` and `pretrain_model.pth` to the specified path and edited `self.resume` with the correct path.

## Execution and Presentation

I used the command below to run Ianvs:

```bash
ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
```

Then I ran into several package-related errors:

```bash
  File "./ianvs-reproduce/project/ianvs/core/storymanager/visualization/visualization.py", line 20, in <module>
    from prettytable import from_csv
ModuleNotFoundError: No module named 'prettytable'
```

and

```bash
AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)
```

and

```bash
  File "./ianvs-reproduce/lib/python3.9/site-packages/sedna/algorithms/seen_task_learning/seen_task_learning.py", line 22, in <module>
    from sklearn import metrics as sk_metrics
ModuleNotFoundError: No module named 'sklearn'
```

and 

```bash
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
```

and

```bash
  File "./examples/robot/lifelong_learning_bench/testalgorithms/rfnet/RFNet/train.py", line 4, in <module>
    from tqdm import tqdm
ModuleNotFoundError: No module named 'tqdm'

  File "/home/***/miniconda3/envs/ianvs-reproduce/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 1, in <module>
    import tensorboard
ModuleNotFoundError: No module named 'tensorboard'

  File "./examples/robot/lifelong_learning_bench/testalgorithms/rfnet/RFNet/eval.py", line 26, in <module>
    from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
ModuleNotFoundError: No module named 'transformers'
```

I used the command below to install the missing packages and pin `charset_normalizer` and `numpy` to compatible versions:

```bash
pip install prettytable scikit-learn tqdm tensorboard transformers charset_normalizer==3.1.0 numpy==1.26.4
```
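To confirm the fix before rerunning the whole benchmark, the environment can be checked directly. This is a minimal sketch using only the standard library; the module list is taken from the tracebacks above (note that `scikit-learn` is imported as `sklearn`), `numpy_is_1x` reflects the NumPy 1.x/2.x warning, and both helper names are hypothetical.

```python
import importlib.util

# Import names (not pip names) reported missing in the tracebacks above.
REQUIRED = ["prettytable", "sklearn", "tqdm", "tensorboard", "transformers"]


def missing_modules(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def numpy_is_1x(version):
    """True if a NumPy version string such as '1.26.4' is below 2.0."""
    return int(version.split(".")[0]) < 2
```

After the `pip install` above, `missing_modules(REQUIRED)` should return `[]`, and `numpy_is_1x(numpy.__version__)` should be `True` with `numpy==1.26.4`.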

When I reran the ianvs command, I got an error:

```bash
(ianvs-reproduce) **@server:~/data/OSSP/ianvs-reproduce/project/ianvs$ ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml                                                                                
Traceback (most recent call last):
  File "./ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 36, in main
    job = BenchmarkingJob(config[str.lower(BenchmarkingJob.__name__)])
  File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 50, in __init__
    self._parse_config(config)
  File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 103, in _parse_config
    self._parse_testenv_config(v)
  File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 116, in _parse_testenv_config
    raise RuntimeError(f"not found testenv config file({config_file}) in local")
RuntimeError: not found testenv config file(./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml) in local

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/**/miniconda3/envs/ianvs-reproduce/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "./ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: not found testenv config file(./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml) in local.
```

This is a path issue: the benchmarking job refers to `./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml`, but the example's files live under `semantic-segmentation/`. After examining the structure of this example, I resolved it by moving all the files from `./examples/robot/lifelong_learning_bench/semantic-segmentation` to `./examples/robot/lifelong_learning_bench`.
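The move can be scripted with a dry run first, so that nothing is overwritten by accident. This is a sketch; `flatten_into_parent` is a hypothetical helper, and it refuses to proceed if an entry with the same name already exists in the parent directory.

```python
import shutil
from pathlib import Path


def flatten_into_parent(src_dir, dry_run=True):
    """Move every entry of src_dir into its parent directory.

    Raises FileExistsError instead of overwriting; returns the planned
    (source, destination) pairs whether or not they were executed.
    """
    src = Path(src_dir)
    moves = [(entry, src.parent / entry.name) for entry in sorted(src.iterdir())]
    clashes = [str(dst) for _, dst in moves if dst.exists()]
    if clashes:
        raise FileExistsError(f"would overwrite: {clashes}")
    if not dry_run:
        for source, destination in moves:
            shutil.move(str(source), str(destination))
    return moves
```

Calling `flatten_into_parent("examples/robot/lifelong_learning_bench/semantic-segmentation")` from the repository root only lists the planned moves; passing `dry_run=False` performs them. Afterwards, the job file is invoked as `examples/robot/lifelong_learning_bench/benchmarkingjob-simple.yaml`.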

After making this change and running the command, I encountered new exceptions:

```bash
(ianvs-reproduce) $:~/data/OSSP/ianvs-reproduce/project/ianvs$ ianvs -f examples/robot/lifelong_learning_bench/benchmarkingjob-simple.yaml
un_classes:30
Upsample layer: in = 128, skip = 64, out = 128
Upsample layer: in = 128, skip = 128, out = 128
Upsample layer: in = 128, skip = 256, out = 128
128
Model loaded successfully!
Traceback (most recent call last):
  File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 74, in run
    res, system_metric_info = paradigm.run()
  File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 166, in run
    dataset_files = self._split_dataset(splitting_dataset_times=rounds)
  File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 433, in _split_dataset
    output_dir=self.dataset_output_dir(),
  File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/base.py", line 69, in dataset_output_dir
    os.makedirs(output_dir)
  File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 3 more times]
  File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/ianvs'
```

This was clearly another path issue: the job tried to create its output directory under `/ianvs`, which my user had no permission to write to. I searched for `/ianvs` in the project folder and found that the `workspace` field in `benchmarkingjob-simple.yaml` needed to be reconfigured.
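The search for hard-coded paths can be automated. The sketch below walks a directory and reports lines in Python and YAML files that contain an absolute path starting with `/ianvs/` or `/home/`; the pattern is an assumption tuned to the paths seen in this example and should be adjusted as needed. `find_hardcoded_paths` is a hypothetical helper.

```python
import re
from pathlib import Path

# An absolute path beginning with /ianvs/ or /home/, not preceded by a word
# character or a dot (so "project/ianvs/" and "./examples/" are not reported).
ABSOLUTE_PATH = re.compile(r"(?<![\w.])/(?:ianvs|home)/")


def find_hardcoded_paths(root, suffixes=(".py", ".yaml", ".yml")):
    """Yield (file, line_no, line) for every matching line under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                if ABSOLUTE_PATH.search(line):
                    yield path, line_no, line.rstrip()
```

Running `for hit in find_hardcoded_paths("."): print(*hit)` from the repository root should also surface `/home/...` paths like the ones in the errors below, provided they are hard-coded in Python or YAML files.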

In the next stage, I ran into more path problems, this time caused by absolute paths pointing to another user's home directory (`/home/hsj/...`):

```bash
Traceback (most recent call last):
  File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 37, in main
    job.run()
  File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 93, in run
    succeed_testcases, test_results = self.testcase_controller.run_testcases(self.workspace)
  File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 56, in run_testcases
    raise RuntimeError(f"testcase(id={testcase.id}) runs failed, error: {err}") from err
RuntimeError: testcase(id=e139c552-2c87-11ef-b834-b42e99a3b90d) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: [Errno 2] No such file or directory: '/home/hsj/ianvs/project/cache.pickle'
```

```bash
Traceback (most recent call last):
  File "/home/**/miniconda3/envs/ianvs-reproduce/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: testcase(id=46211dbc-2c88-11ef-a03f-b42e99a3b90d) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: [Errno 2] No such file or directory: '/home/hsj/ianvs/project/segment-anything/sam_vit_h_4b8939.pth'.
```
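Because these files are only opened after the job has started, it can save time to check them up front. This is a minimal sketch; `missing_files` is hypothetical, and the entries in `REQUIRED_FILES` are placeholders, not the locations Ianvs expects.

```python
from pathlib import Path

# Hypothetical locations; point these at wherever the files actually live.
REQUIRED_FILES = {
    "SAM checkpoint": "../segment-anything/sam_vit_h_4b8939.pth",
    "cache": "../cache.pickle",
}


def missing_files(paths):
    """Return the {label: path} entries whose path does not exist."""
    return {
        label: p for label, p in paths.items() if not Path(p).expanduser().exists()
    }
```

`missing_files(REQUIRED_FILES)` returns an empty dict when every file is in place.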

After fixing these paths, I was able to run the example:

```bash
[2024-06-17 20:56:54,847] task_evaluation.py(69) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,852] lifelong_learning.py(449) [INFO] - Task evaluation finishes.
[2024-06-17 20:56:54,852] lifelong_learning.py(452) [INFO] - upload kb index from index.pkl to ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/index.pkl
[2024-06-17 20:56:54,852] lifelong_learning.py(208) [INFO] - train from round 0
[2024-06-17 20:56:54,853] lifelong_learning.py(209) [INFO] - test round 1
[2024-06-17 20:56:54,853] lifelong_learning.py(210) [INFO] - all scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,853] lifelong_learning.py(220) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,853] lifelong_learning.py(443) [INFO] - Download kb index from ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/train/0/index.pkl to index.pkl
load model url:  ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/train/0/seen_task/front_semantic_segamentation_model.pth
:   0%|                                                                              | 0/4 [00:00<?, ?it/s][Save] save rfnet prediction:  ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00187.png_origin.png
:  25%|█████████████████▌                                                    | 1/4 [00:00<00:02,  1.37it/s][Save] save rfnet prediction:  ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00190.png_origin.png
:  50%|███████████████████████████████████                                   | 2/4 [00:01<00:01,  1.33it/s][Save] save rfnet prediction:  ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00192.png_origin.png
:  75%|████████████████████████████████████████████████████▌                 | 3/4 [00:02<00:00,  1.32it/s][Save] save rfnet prediction:  ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00195.png_origin.png
: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.35it/s]
Found 4 test RGB images
Found 4 test disparity images
:   0%|                                                                              | 0/4 [00:00<?, ?it/s](1, 1024, 2048) (1, 1024, 2048)
:  25%|█████████████████▌                                                    | 1/4 [00:00<00:00,  6.76it/s](1, 1024, 2048) (1, 1024, 2048)
:  50%|███████████████████████████████████                                   | 2/4 [00:00<00:00,  6.77it/s](1, 1024, 2048) (1, 1024, 2048)
:  75%|████████████████████████████████████████████████████▌                 | 3/4 [00:00<00:00,  6.77it/s](1, 1024, 2048) (1, 1024, 2048)
: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  6.60it/s]
-----------Acc of each classes-----------
road         : 99.818096 %
sidewalk     : nan %
building     : 96.967190 %
wall         : nan %
fence        : nan %
pole         : 0.000000 %
traffic light: nan %
traffic sign : nan %
vegetation   : 98.160555 %
terrain      : nan %
sky          : nan %
person       : nan %
rider        : nan %
car          : nan %
truck        : nan %
bus          : nan %
train        : nan %
motorcycle   : nan %
bicycle      : nan %
stair        : 99.408024 %
curb         : nan %
ramp         : nan %
runway       : nan %
flowerbed    : nan %
door         : nan %
CCTV camera  : nan %
Manhole      : nan %
hydrant      : nan %
belt         : nan %
dustbin      : nan %
-----------IoU of each classes-----------
road         : 99.436234 %
sidewalk     : nan %
building     : 96.699620 %
wall         : nan %
fence        : nan %
pole         : 0.000000 %
traffic light: nan %
traffic sign : nan %
vegetation   : 86.071165 %
terrain      : nan %
sky          : 0.000000 %
person       : nan %
rider        : nan %
car          : nan %
truck        : nan %
bus          : nan %
train        : nan %
motorcycle   : nan %
bicycle      : nan %
stair        : 98.085159 %
curb         : nan %
ramp         : nan %
runway       : nan %
flowerbed    : nan %
door         : nan %
CCTV camera  : nan %
Manhole      : nan %
hydrant      : nan %
belt         : nan %
dustbin      : nan %
-----------FWIoU of each classes-----------
road         : 36.936448 %
sidewalk     : 29.667129 %
-----------freq of each classes-----------
road         : 37.145863 %
sidewalk     : 0.000000 %
building     : 30.679675 %
wall         : 0.000000 %
fence        : 0.000000 %
pole         : 0.065531 %
traffic light: 0.000000 %
traffic sign : 0.000000 %
vegetation   : 5.160104 %
terrain      : 0.000000 %
sky          : 0.000000 %
person       : 0.000000 %
rider        : 0.000000 %
car          : 0.000000 %
truck        : 0.000000 %
bus          : 0.000000 %
train        : 0.000000 %
motorcycle   : 0.000000 %
bicycle      : 0.000000 %
stair        : 26.948826 %
curb         : 0.000000 %
ramp         : 0.000000 %
runway       : 0.000000 %
flowerbed    : 0.000000 %
door         : 0.000000 %
CCTV camera  : 0.000000 %
Manhole      : 0.000000 %
hydrant      : 0.000000 %
belt         : 0.000000 %
dustbin      : 0.000000 %
CPA:0.7887077301139684, mIoU:0.633820295720513, fwIoU: 0.9747773749178817

...
```

However, there still seem to be some bugs. For example, `core/storymanager/rank/rank.py` contains the following line:

https://github.com/kubeedge/ianvs/blob/7ea4f4af57114ce3179cd0c0773a4254c5999715/core/storymanager/rank/rank.py#L178

which raises the following exception with pandas 2.x, where `pd.np` no longer exists:

```bash
Traceback (most recent call last):
  File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 37, in main
    job.run()
  File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 96, in run
    self.rank.save(succeed_testcases, test_results, output_dir=self.workspace)
  File "/home/**/ianvs-reproduce/project/ianvs/core/storymanager/rank/rank.py", line 260, in save
    self._save_all()
  File "/home/**/ianvs-reproduce/project/ianvs/core/storymanager/rank/rank.py", line 178, in _save_all
    all_df.index = pd.np.arange(1, len(all_df) + 1)
AttributeError: module 'pandas' has no attribute 'np'
```
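`pd.np` was deprecated in pandas 1.0 and removed in pandas 2.0, so the line has to call NumPy directly. A minimal sketch of the corrected statement, wrapped in a hypothetical helper for illustration and assuming `numpy` is imported as `np`:

```python
import numpy as np
import pandas as pd


def renumber_from_one(all_df: pd.DataFrame) -> pd.DataFrame:
    """Give a ranking DataFrame a 1-based index (rank 1, 2, ...)."""
    # Was: all_df.index = pd.np.arange(1, len(all_df) + 1)
    all_df.index = np.arange(1, len(all_df) + 1)
    return all_df
```

Separately, `DataFrame.to_csv` uses a comma separator by default (`sep=","`), which may be worth checking when investigating the space-separated output.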

After replacing `pd.np.arange` with `np.arange` (i.e. removing the `pd` prefix), I could finally see the CSV output:

```csv
rank algorithm BWT MATRIX accuracy task_avg_acc samples_transfer_ratio FWT paradigm basemodel task_definition task_allocation unseen_sample_recognition basemodel-learning_rate basemodel-epochs task_definition-origins task_allocation-origins unseen_sample_recognition-threhold time url
1  -0.0020555379043278497  0.6102329717747815 0.6007481221422825 0.6024 -0.0022857764312219273            
```

However, the output still has some problems:
- It uses spaces (` `) rather than commas (`,`) as the separator.
- Some data is missing (such as `algorithm`, `MATRIX`, and `url`).

Nevertheless, I was able to complete the entire example.

## Advice

Overall, due to gaps in the documentation and hard-coded configuration in the code, running this example is not easy. To address this, I recommend:

- Remove all hard-coded configuration, especially **absolute paths** and **devices**.
- Sort out the project dependencies and complete the documentation, especially the **missing packages** and the **configuration of the dataset path**.
- Replace `print` with `logger.info` for better monitoring.
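The first and third suggestions could look roughly like the sketch below. It is not Ianvs code: `resolve_path` and the environment variable name `IANVS_SAM_CHECKPOINT` are hypothetical. The idea is to let users override a default path without editing source files, and to log the resolved value instead of printing it.

```python
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def resolve_path(env_var, default):
    """Return the path from env_var if it is set, otherwise default.

    Relative values are resolved against the current working directory.
    """
    value = os.environ.get(env_var, default)
    path = Path(value).expanduser().resolve()
    logger.info("%s -> %s", env_var, path)
    return path
```

For example, after `logging.basicConfig(level=logging.INFO)`, a call such as `resolve_path("IANVS_SAM_CHECKPOINT", "./models/sam_vit_h_4b8939.pth")` logs where the checkpoint will be read from, and users can point it elsewhere by exporting `IANVS_SAM_CHECKPOINT`.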
