Hello, I'm trying to use the SSD Mobilenet V2 FPNlite model on images with ~30 ground-truth bounding boxes per image. Whenever the validation dataset is larger than ~150 images, I run into OOM errors during the IoU calculation in the calculate_box_wise_iou() function:
Epoch 7/50
357/357 [==============================] - ETA: 0s - loss: 22.4538Error executing job with overrides: []
Traceback (most recent call last):
...
File "/home/externals/stm32ai_modelzoo_services/object_detection/src/utils/bounding_boxes_utils.py", line 283, in calculate_box_wise_iou
box2_y2 = box2[:, 3]
Node: 'strided_slice_16'
2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[115648000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node strided_slice_16}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[strided_slice_60/_298]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[115648000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node strided_slice_16}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_220137]
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
If I decrease the size of the validation set, the error does not occur (although the validation set is then much too small). Running on CPU only also works, but is slower. The error occurs for every GPU memory limit I set in the configuration (from 0, meaning unlimited, up to 10 GB).
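For scale, here is a quick calculation of how large the tensor named in the OOM message is. It assumes that `type float` in the log means 32-bit floats (TensorFlow's DT_FLOAT); the element count is copied from the traceback above.

```python
# Size of the single tensor the allocator failed on, taken from the log above.
num_elements = 115_648_000   # from "shape[115648000]"
bytes_per_float32 = 4        # assumption: "type float" is 32-bit

size_bytes = num_elements * bytes_per_float32
size_mib = size_bytes / 2**20

print(f"{size_bytes} bytes = {size_mib:.1f} MiB")
```

That is roughly 441 MiB for one tensor. Since a single allocation of that size fails even with the memory limit set to unlimited, my guess (not confirmed) is that several intermediates of similar size are alive at the same time during the IoU computation, rather than one oversized tensor.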
Model: ssd_mobilenet_v2_fpnlite
Input: 416x416
Batch Size: 4
GPU: RTX 3080
OS: Ubuntu
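Because a smaller validation set avoids the error, one workaround I can imagine is computing the IoU in chunks, so that only a bounded block of pairwise values exists at any time. The sketch below is dependency-free Python that only illustrates the pattern; it is not the model zoo's implementation. The names `iou`, `chunks` and `best_iou_per_prediction` are hypothetical, and the `(x1, y1, x2, y2)` corner format is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def chunks(items: Sequence[Box], size: int) -> Iterator[Sequence[Box]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def best_iou_per_prediction(
    preds: Sequence[Box], gts: Sequence[Box], chunk_size: int
) -> List[float]:
    """Best IoU against any ground-truth box, one value per prediction."""
    best: List[float] = []
    for chunk in chunks(preds, chunk_size):
        # Only len(chunk) * len(gts) IoU values are held at once;
        # the block is reduced to one value per prediction and discarded.
        block = [[iou(p, g) for g in gts] for p in chunk]
        best.extend(max(row, default=0.0) for row in block)
    return best


preds = [(0, 0, 2, 2), (10, 10, 12, 12), (0, 0, 1, 1)]
gts = [(0, 0, 2, 2), (1, 1, 3, 3)]
print(best_iou_per_prediction(preds, gts, chunk_size=2))
```

The result does not depend on `chunk_size`; only the peak number of intermediate values does. Whether the same restructuring is possible inside calculate_box_wise_iou() is something I have not checked.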