Description
Introduction or background of this discussion:
Guide for running the example of robot/lifelong_learning_bench/semantic-segmentation
Contents of this discussion:
Over the past few days I tried to run examples/robot/lifelong_learning_bench/semantic-segmentation to learn how to use Ianvs.
However, getting this example to run was not easy, and I ran into a series of problems along the way. I have recorded the steps I took and the solutions I found, in the hope that they help others who are interested in Ianvs.
For the problems discovered during this trial, I also provide some suggestions that I hope the community can address.
Ianvs Preparation
I created a new conda environment to run this project on an Ubuntu 22.04 server. Following the guide (#step-1-ianvs-preparation), I chose Python 3.9 for the environment:
conda create -n ianvs-reproduce python=3.9
conda activate ianvs-reproduce
Then I installed Sedna following the instructions:
pip install ./examples/resources/third_party/*
pip install -r requirements.txt
Then I installed ianvs by executing python setup.py install.
Dataset Preparation
In Step 2, I need to download the dataset. I got the dataset from @hsj576 . The dataset has the following structure:
├── 1280x760
│   ├── gtFine
│   │   ├── test
│   │   ├── train
│   │   └── val
│   ├── rgb
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── viz
│       ├── test
│       ├── train
│       └── val
├── 2048x1024
│   ├── gtFine
│   │   ├── test
│   │   ├── train
│   │   └── val
│   ├── rgb
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── viz
│       ├── test
│       ├── train
│       └── val
└── 640x480
    ├── gtFine
    │   ├── test
    │   ├── train
    │   └── val
    ├── json
    │   ├── test
    │   ├── train
    │   └── val
    ├── rgb
    │   ├── test
    │   ├── train
    │   └── val
    └── viz
        ├── test
        ├── train
        └── val
Besides, I got training index files from @hsj576, which contain multiple path pairs as shown below:
rgb/train/20220420_front/00000.png gtFine/train/20220420_front/00000_TrainIds.png
rgb/train/20220420_front/00001.png gtFine/train/20220420_front/00001_TrainIds.png
...
However, the README.md does not explain where the index files should be placed. After some trial and error, I found that the contents of the 2048x1024 folder need to be moved into the directory that contains the index files.
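A quick way to confirm the placement is to check that every path listed in an index file exists next to it. The following is a minimal sketch, assuming each line holds two whitespace-separated paths that are relative to the directory of the index file; the function name find_missing_pairs is hypothetical and not part of Ianvs.

```python
from pathlib import Path


def find_missing_pairs(index_file):
    """Return (line_number, path) for every listed file that does not exist.

    Assumption: each non-empty line holds whitespace-separated paths
    (image first, label second) relative to the index file's directory.
    """
    index_path = Path(index_file)
    root = index_path.parent
    missing = []
    with index_path.open() as handle:
        for line_number, line in enumerate(handle, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            for relative in parts:
                if not (root / relative).is_file():
                    missing.append((line_number, relative))
    return missing
```

Calling find_missing_pairs("train-index.txt") from the dataset directory should return an empty list once the rgb and gtFine folders sit beside the index file.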
Then, as the guide pointed out, I should configure the dataset URL in testenv.yml. As we could see, there are two folders in ianvs/examples/robot/lifelong_learning_bench/. I tried to edit semantic-segmentation/testenv/testenv.yml in the benchmark project, which looks like this:
testenv:
  # dataset configuration
  dataset:
    # the url address of train dataset index; string type;
    train_url: "/home/shijing.hu/ianvs/dataset/robot_dataset/train-index.txt"
    # the url address of test dataset index; string type;
    test_url: "/home/shijing.hu/ianvs/dataset/robot_dataset/test-index.txt"
  # model eval configuration of incremental learning;
  model_eval:
    # metric used for model evaluation
    model_metric:
      # metric name; string type;
      name: "accuracy"
      # the url address of python file
      url: "./examples/robot/lifelong_learning_bench/testenv/accuracy.py"
      mode: "no-inference"
...
I assume the train_url and test_url are what I have to edit. Since the url ./examples/robot/lifelong_learning_bench/testenv/accuracy.py suggests that the root path for this file is ianvs/project/ianvs, and my dataset is in ianvs/project/datasets, I updated the configuration as follows:
testenv:
  # dataset configuration
  dataset:
    # the url address of train dataset index; string type;
    train_url: "../datasets/robot_dataset/train-index.txt"
    # the url address of test dataset index; string type;
    test_url: "../datasets/robot_dataset/test-index.txt"
  # model eval configuration of incremental learning;
  model_eval:
    # metric used for model evaluation
    model_metric:
      # metric name; string type;
      name: "accuracy"
      # the url address of python file
      url: "./examples/robot/lifelong_learning_bench/testenv/accuracy.py"
      mode: "no-inference"
...
There were multiple testenv files in testenv/ and I edited them all.
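Because several testenv files carry the same kind of entries, it can help to check them all in one pass. The sketch below is only an illustration: it uses a regular expression rather than a YAML parser, it assumes that relative paths are resolved against the directory the ianvs command is started from, and the name unresolved_urls is hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as:  train_url: "../datasets/robot_dataset/train-index.txt"
# Comment lines start with '#' and therefore never match.
URL_LINE = re.compile(r'^\s*(\w*url)\s*:\s*["\']?([^"\'#\s]+)["\']?')


def unresolved_urls(config_text, base_dir):
    """Return (key, path) pairs whose path does not exist under base_dir.

    Assumption: relative paths are resolved against base_dir; absolute
    paths are checked as written.
    """
    base = Path(base_dir)
    problems = []
    for line in config_text.splitlines():
        match = URL_LINE.match(line)
        if match is None:
            continue  # comments and keys that do not end in "url"
        key, value = match.groups()
        if not (base / value).exists():
            problems.append((key, value))
    return problems
```

For example, unresolved_urls(Path("testenv.yml").read_text(), ".") run from the repository root lists every url entry that still points to a missing file.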
Large Vision Model Preparation
Next, I needed to download the SAM package and model according to #step-2.5-large-vision-model-preparationoptional. This step went smoothly.
Then I needed to install mmcv and mmdetection. mmcv installed successfully following the guide, but installing mmdetection failed, as shown below.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "./ianvs-reproduce/project/mmdetection/setup.py", line 11, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
So I needed to install torch myself. As the guide does not mention the torch version, I assumed I needed torch 2.0.0 with cu118, because the mmcv download link in the guide indicates this version: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/mmcv-2.0.0-cp39-cp39-manylinux1_x86_64.whl.
I installed torch with cu118 following the instructions from Previous PyTorch Versions | PyTorch.
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
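Since the guide leaves the torch version implicit, one option is to read the expected build from the mmcv wheel link and compare it with the installed version string. This is a rough sketch that assumes the URL layout seen in the link above (/dist/&lt;cuda tag&gt;/torch&lt;version&gt;/); the helper names are hypothetical.

```python
import re

# Assumed URL layout: .../mmcv/dist/cu118/torch2.0.0/<wheel file>
WHEEL_URL = re.compile(r"/dist/(cu\d+|cpu)/torch(\d+\.\d+\.\d+)/")


def required_build(mmcv_wheel_url):
    """Extract (cuda_tag, torch_version) from a wheel URL of that layout."""
    match = WHEEL_URL.search(mmcv_wheel_url)
    if match is None:
        raise ValueError(f"unrecognised wheel URL: {mmcv_wheel_url}")
    return match.group(1), match.group(2)


def matches_installed(mmcv_wheel_url, installed_torch_version):
    """Compare the URL's build tags with a string such as '2.0.0+cu118'."""
    cuda_tag, torch_version = required_build(mmcv_wheel_url)
    version, _, local = installed_torch_version.partition("+")
    return version == torch_version and local == cuda_tag
```

In a real environment the second argument would be torch.__version__. For the link quoted above, the expected build is torch 2.0.0 with cu118, so a 2.2.2+cu118 install would be reported as a mismatch; it is worth double-checking which version the example actually needs.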
As recommended in the guide, I downloaded the cache.pickle and pretrain_model.pth to the specified path and edited self.resume with the correct path.
Execution and Presentation
I used the code below to try running ianvs:
ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
Then, I found some errors about packages:
File "./ianvs-reproduce/project/ianvs/core/storymanager/visualization/visualization.py", line 20, in <module>
from prettytable import from_csv
ModuleNotFoundError: No module named 'prettytable'
and
AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)
and
File "./ianvs-reproduce/lib/python3.9/site-packages/sedna/algorithms/seen_task_learning/seen_task_learning.py", line 22, in <module>
from sklearn import metrics as sk_metrics
ModuleNotFoundError: No module named 'sklearn'
and
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
and
File "./examples/robot/lifelong_learning_bench/testalgorithms/rfnet/RFNet/train.py", line 4, in <module>
from tqdm import tqdm
ModuleNotFoundError: No module named 'tqdm'
File "/home/***/miniconda3/envs/ianvs-reproduce/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 1, in <module>
import tensorboard
ModuleNotFoundError: No module named 'tensorboard'
File "./examples/robot/lifelong_learning_bench/testalgorithms/rfnet/RFNet/eval.py", line 26, in <module>
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
ModuleNotFoundError: No module named 'transformers'
I used the command below to fix the missing package issues:
pip install prettytable scikit-learn tqdm tensorboard transformers charset_normalizer==3.1.0 numpy==1.26.4
When I reran the ianvs command, I got an error:
(ianvs-reproduce) **@server:~/data/OSSP/ianvs-reproduce/project/ianvs$ ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
Traceback (most recent call last):
File "./ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 36, in main
job = BenchmarkingJob(config[str.lower(BenchmarkingJob.__name__)])
File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 50, in __init__
self._parse_config(config)
File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 103, in _parse_config
self._parse_testenv_config(v)
File "./ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 116, in _parse_testenv_config
raise RuntimeError(f"not found testenv config file({config_file}) in local")
RuntimeError: not found testenv config file(./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml) in local
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/**/miniconda3/envs/ianvs-reproduce/bin/ianvs", line 33, in <module>
sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
File "./ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 41, in main
raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: not found testenv config file(./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml) in local.
It appears that there is a path issue. After examining the structure of this example, I realized that I can resolve it by moving all the files from ./examples/robot/lifelong_learning_bench/semantic-segmentation to ./examples/robot/lifelong_learning_bench.
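When a configured path does not exist, searching the examples tree for a file of the same name shows where it really lives and helps decide whether to move the files or to edit the configuration. A small sketch, with locate_by_name as a hypothetical helper name:

```python
from pathlib import Path


def locate_by_name(missing_path, search_root):
    """Return sorted paths under search_root that share missing_path's file name."""
    name = Path(missing_path).name
    return sorted(Path(search_root).rglob(name))
```

For instance, locate_by_name("./examples/robot/lifelong_learning_bench/testenv/testenv-robot.yaml", "./examples") lists every testenv-robot.yaml found below ./examples.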
After making this change and running the command, I encountered new exceptions:
(ianvs-reproduce) $:~/data/OSSP/ianvs-reproduce/project/ianvs$ ianvs -f examples/robot/lifelong_learning_bench/benchmarkingjob-simple.yaml
un_classes:30
Upsample layer: in = 128, skip = 64, out = 128
Upsample layer: in = 128, skip = 128, out = 128
Upsample layer: in = 128, skip = 256, out = 128
128
Model loaded successfully!
Traceback (most recent call last):
File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 74, in run
res, system_metric_info = paradigm.run()
File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 166, in run
dataset_files = self._split_dataset(splitting_dataset_times=rounds)
File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 433, in _split_dataset
output_dir=self.dataset_output_dir(),
File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/algorithm/paradigm/base.py", line 69, in dataset_output_dir
os.makedirs(output_dir)
File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
[Previous line repeated 3 more times]
File "/home/**/miniconda3/envs/ianvs-reproduce/lib/python3.9/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/ianvs'
This was clearly another path issue. I searched for /ianvs in the project folder and found that the workspace setting in benchmarkingjob-simple.yaml needed to be reconfigured.
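The traceback shows os.makedirs walking up to the first directory that exists and failing to create a child there. A check along the following lines could validate a workspace path before a long run starts. It is only a sketch: the function names are hypothetical, and os.access reflects permission bits only, so other restrictions (for example a read-only file system) may still cause failures.

```python
import os
from pathlib import Path


def nearest_existing_ancestor(path):
    """Walk upwards from path until a location that exists is found."""
    current = Path(path).absolute()
    while not current.exists():
        current = current.parent  # terminates at the file system root
    return current


def can_create(path):
    """True if permissions would allow os.makedirs(path) to succeed."""
    ancestor = nearest_existing_ancestor(path)
    return ancestor.is_dir() and os.access(ancestor, os.W_OK | os.X_OK)
```

For a workspace such as /ianvs, the nearest existing ancestor is the root directory, which an ordinary user usually cannot write to.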
In the next stage, I encountered more path problems, such as the ones below:
Traceback (most recent call last):
File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 37, in main
job.run()
File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 93, in run
succeed_testcases, test_results = self.testcase_controller.run_testcases(self.workspace)
File "/home/**/ianvs-reproduce/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 56, in run_testcases
raise RuntimeError(f"testcase(id={testcase.id}) runs failed, error: {err}") from err
RuntimeError: testcase(id=e139c552-2c87-11ef-b834-b42e99a3b90d) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: [Errno 2] No such file or directory: '/home/hsj/ianvs/project/cache.pickle'
Traceback (most recent call last):
File "/home/**/miniconda3/envs/ianvs-reproduce/bin/ianvs", line 33, in <module>
sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 41, in main
raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: testcase(id=46211dbc-2c88-11ef-a03f-b42e99a3b90d) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: [Errno 2] No such file or directory: '/home/hsj/ianvs/project/segment-anything/sam_vit_h_4b8939.pth'.
After fixing these problems, I could run this project.
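Paths like the two above can be found ahead of time instead of one failed run at a time. The sketch below scans source and configuration files for quoted strings that start with /home/. The pattern, the file suffixes and the name find_hardcoded_paths are assumptions made for illustration, and unquoted YAML values would not be caught.

```python
import re
from pathlib import Path

# Quoted string literals that start with /home/, e.g. '/home/user/cache.pickle'.
ABSOLUTE_HOME = re.compile(r"""["'](/home/[^"']+)["']""")


def find_hardcoded_paths(root, suffixes=(".py", ".yaml", ".yml")):
    """Return (file, line_number, path) for each quoted /home/ path found."""
    hits = []
    for file in sorted(Path(root).rglob("*")):
        if not file.is_file() or file.suffix not in suffixes:
            continue
        text = file.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in ABSOLUTE_HOME.finditer(line):
                hits.append((file, line_number, match.group(1)))
    return hits
```

Running find_hardcoded_paths("examples/robot/lifelong_learning_bench") gives a list of locations to review before the first run.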
[2024-06-17 20:56:54,847] task_evaluation.py(69) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,852] lifelong_learning.py(449) [INFO] - Task evaluation finishes.
[2024-06-17 20:56:54,852] lifelong_learning.py(452) [INFO] - upload kb index from index.pkl to ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/index.pkl
[2024-06-17 20:56:54,852] lifelong_learning.py(208) [INFO] - train from round 0
[2024-06-17 20:56:54,853] lifelong_learning.py(209) [INFO] - test round 1
[2024-06-17 20:56:54,853] lifelong_learning.py(210) [INFO] - all scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,853] lifelong_learning.py(220) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.5691549465958629}
[2024-06-17 20:56:54,853] lifelong_learning.py(443) [INFO] - Download kb index from ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/train/0/index.pkl to index.pkl
load model url: ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/train/0/seen_task/front_semantic_segamentation_model.pth
: 0%| | 0/4 [00:00<?, ?it/s][Save] save rfnet prediction: ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00187.png_origin.png
: 25%|█████████████████▌ | 1/4 [00:00<00:02, 1.37it/s][Save] save rfnet prediction: ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00190.png_origin.png
: 50%|███████████████████████████████████ | 2/4 [00:01<00:01, 1.33it/s][Save] save rfnet prediction: ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00192.png_origin.png
: 75%|████████████████████████████████████████████████████▌ | 3/4 [00:02<00:00, 1.32it/s][Save] save rfnet prediction: ../sam-workspace/benchmarkingjob/sam_rfnet_lifelong_learning/f74a8748-2ca8-11ef-82f0-4125e9124177/output/eval/0/front/00195.png_origin.png
: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.35it/s]
Found 4 test RGB images
Found 4 test disparity images
: 0%| | 0/4 [00:00<?, ?it/s](1, 1024, 2048) (1, 1024, 2048)
: 25%|█████████████████▌ | 1/4 [00:00<00:00, 6.76it/s](1, 1024, 2048) (1, 1024, 2048)
: 50%|███████████████████████████████████ | 2/4 [00:00<00:00, 6.77it/s](1, 1024, 2048) (1, 1024, 2048)
: 75%|████████████████████████████████████████████████████▌ | 3/4 [00:00<00:00, 6.77it/s](1, 1024, 2048) (1, 1024, 2048)
: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.60it/s]
-----------Acc of each classes-----------
road : 99.818096 %
sidewalk : nan %
building : 96.967190 %
wall : nan %
fence : nan %
pole : 0.000000 %
traffic light: nan %
traffic sign : nan %
vegetation : 98.160555 %
terrain : nan %
sky : nan %
person : nan %
rider : nan %
car : nan %
truck : nan %
bus : nan %
train : nan %
motorcycle : nan %
bicycle : nan %
stair : 99.408024 %
curb : nan %
ramp : nan %
runway : nan %
flowerbed : nan %
door : nan %
CCTV camera : nan %
Manhole : nan %
hydrant : nan %
belt : nan %
dustbin : nan %
-----------IoU of each classes-----------
road : 99.436234 %
sidewalk : nan %
building : 96.699620 %
wall : nan %
fence : nan %
pole : 0.000000 %
traffic light: nan %
traffic sign : nan %
vegetation : 86.071165 %
terrain : nan %
sky : 0.000000 %
person : nan %
rider : nan %
car : nan %
truck : nan %
bus : nan %
train : nan %
motorcycle : nan %
bicycle : nan %
stair : 98.085159 %
curb : nan %
ramp : nan %
runway : nan %
flowerbed : nan %
door : nan %
CCTV camera : nan %
Manhole : nan %
hydrant : nan %
belt : nan %
dustbin : nan %
-----------FWIoU of each classes-----------
road : 36.936448 %
sidewalk : 29.667129 %
-----------freq of each classes-----------
road : 37.145863 %
sidewalk : 0.000000 %
building : 30.679675 %
wall : 0.000000 %
fence : 0.000000 %
pole : 0.065531 %
traffic light: 0.000000 %
traffic sign : 0.000000 %
vegetation : 5.160104 %
terrain : 0.000000 %
sky : 0.000000 %
person : 0.000000 %
rider : 0.000000 %
car : 0.000000 %
truck : 0.000000 %
bus : 0.000000 %
train : 0.000000 %
motorcycle : 0.000000 %
bicycle : 0.000000 %
stair : 26.948826 %
curb : 0.000000 %
ramp : 0.000000 %
runway : 0.000000 %
flowerbed : 0.000000 %
door : 0.000000 %
CCTV camera : 0.000000 %
Manhole : 0.000000 %
hydrant : 0.000000 %
belt : 0.000000 %
dustbin : 0.000000 %
CPA:0.7887077301139684, mIoU:0.633820295720513, fwIoU: 0.9747773749178817
...
However, there still seem to be some bugs. For example, rank.py contains the following line (ianvs/core/storymanager/rank/rank.py, line 178 in 7ea4f4a):
all_df.index = pd.np.arange(1, len(all_df) + 1)
which raises the exception below on recent pandas versions, where the pd.np alias no longer exists:
Traceback (most recent call last):
File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/benchmarking.py", line 37, in main
job.run()
File "/home/**/ianvs-reproduce/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 96, in run
self.rank.save(succeed_testcases, test_results, output_dir=self.workspace)
File "/home/**/ianvs-reproduce/project/ianvs/core/storymanager/rank/rank.py", line 260, in save
self._save_all()
File "/home/**/ianvs-reproduce/project/ianvs/core/storymanager/rank/rank.py", line 178, in _save_all
all_df.index = pd.np.arange(1, len(all_df) + 1)
AttributeError: module 'pandas' has no attribute 'np'
Finally, after removing the pd. prefix, we could see the CSV output:
rank algorithm BWT MATRIX accuracy task_avg_acc samples_transfer_ratio FWT paradigm basemodel task_definition task_allocation unseen_sample_recognition basemodel-learning_rate basemodel-epochs task_definition-origins task_allocation-origins unseen_sample_recognition-threhold time url
1 -0.0020555379043278497 0.6102329717747815 0.6007481221422825 0.6024 -0.0022857764312219273
However, the output still seems to have some problems:
- the separator is not , but whitespace
- data is missing (such as algorithm, MATRIX, url)
In the end, though, I completed the entire process of the example.
Advice
Overall, because of gaps in the documentation and hard-coded configuration in the code, running this project is not an easy task. To address this, I recommend:
- Remove all hard-coded configuration, especially absolute paths and devices.
- Sort out the project dependencies and complete the documentation, especially the missing packages and the configuration of the dataset path.
- Replace print with logger.info for better monitoring.
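For the second recommendation, a pre-flight check could report all missing packages at once instead of one ModuleNotFoundError per run. The following is a sketch only: the module list is taken from the errors reported above and is probably not complete, and missing_modules is a hypothetical helper, not part of Ianvs.

```python
from importlib.util import find_spec

# Import names taken from the ModuleNotFoundError messages in this report.
# Note that the scikit-learn distribution is imported as "sklearn".
REQUIRED_MODULES = [
    "prettytable",
    "sklearn",
    "tqdm",
    "tensorboard",
    "transformers",
    "torch",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

Calling missing_modules(REQUIRED_MODULES) before starting a benchmarking job returns the packages that still need to be installed.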