Out-Of-Memory GPU Error on SSD Mobilenet V2 FPNlite during Validation

Hello, I'm trying to use the SSD Mobilenet V2 FPNlite model on images with ~30 ground-truth bounding boxes each. Whenever the validation set is larger than ~150 images, I run into out-of-memory (OOM) errors during the IoU calculation in the `calculate_box_wise_iou()` function:

```
Epoch 7/50
357/357 [==============================] - ETA: 0s - loss: 22.4538Error executing job with overrides: []
Traceback (most recent call last):

...


File "/home/externals/stm32ai_modelzoo_services/object_detection/src/utils/bounding_boxes_utils.py", line 283, in calculate_box_wise_iou
      box2_y2 = box2[:, 3]
Node: 'strided_slice_16'
2 root error(s) found.
  (0) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[115648000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node strided_slice_16}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[strided_slice_60/_298]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[115648000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node strided_slice_16}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_220137]

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
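For scale, the failed allocation can be converted to bytes. This is a small plain-Python sketch: `tensor_bytes` is only an illustrative helper, TensorFlow's `float` in the log is its 32-bit type, and the 10 GB limit is treated as 10 GiB for simplicity.

```python
# Size of the tensor the GPU allocator failed to create, copied from the log line
# "OOM when allocating tensor with shape[115648000] and type float".
NUM_ELEMENTS = 115_648_000
BYTES_PER_FLOAT32 = 4   # TensorFlow "float" is 32-bit
GPU_BYTES = 10 * 2**30  # assumption: treat the 10 GB limit as 10 GiB


def tensor_bytes(num_elements: int, bytes_per_element: int = BYTES_PER_FLOAT32) -> int:
    """Raw buffer size of a dense tensor in bytes (illustrative helper)."""
    return num_elements * bytes_per_element


size = tensor_bytes(NUM_ELEMENTS)
print(f"one buffer: {size:,} bytes = {size / 2**20:.1f} MiB")
# prints "one buffer: 462,592,000 bytes = 441.2 MiB"
print(f"buffers of this size that fit in 10 GiB: {GPU_BYTES // size}")
# prints "buffers of this size that fit in 10 GiB: 23"
```

A single ~441 MiB buffer fits easily on the card, so the OOM presumably comes from many buffers of this size being alive at the same time (coordinate slices such as `box2[:, 3]` plus the intersection and union intermediates), with each buffer growing as the validation set grows.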

If I reduce the size of the validation set, the error does not occur (but the validation set is then much too small). Running on the CPU only also works, just more slowly. The error occurs for every GPU memory limit setting in the configuration, from 0 (unlimited) up to 10 GB.
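Because the error disappears with a smaller validation set, the IoU step presumably materializes a dense pairwise matrix whose size grows with the number of boxes. The NumPy sketch below is not the model zoo's `calculate_box_wise_iou()`: `pairwise_iou` and `max_iou_chunked` are hypothetical helpers, and it assumes boxes are stored as (x1, y1, x2, y2) and that the caller only needs each box's best IoU. It processes `chunk_size` rows at a time, so only a `[chunk_size, len(b)]` block of pairwise values exists at any moment.

```python
import numpy as np


def pairwise_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Dense IoU matrix of shape [len(a), len(b)] (hypothetical helper).

    Assumes boxes are rows of (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
    """
    # Broadcast a as [N, 1] against b as [1, M] to cover every pair.
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    # Negative widths/heights mean the boxes do not overlap.
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Guard against zero-area boxes instead of dividing by zero.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def max_iou_chunked(a: np.ndarray, b: np.ndarray, chunk_size: int = 1024) -> np.ndarray:
    """Best IoU of each box in a against all boxes in b (hypothetical helper).

    Only a [chunk_size, len(b)] block of pairwise values exists at a time,
    so peak memory depends on chunk_size rather than on len(a).
    """
    best = np.zeros(len(a), dtype=np.float32)
    if len(b) == 0:
        return best  # no ground truth: every box has IoU 0
    for start in range(0, len(a), chunk_size):
        stop = start + chunk_size
        best[start:stop] = pairwise_iou(a[start:stop], b).max(axis=1)
    return best
```

For example, `max_iou_chunked(pred_boxes, gt_boxes, chunk_size=512)` (hypothetical array names) returns the same values as `pairwise_iou(pred_boxes, gt_boxes).max(axis=1)` while keeping the intermediate buffers bounded. In the TensorFlow pipeline, the same row-chunking could presumably be done with a Python loop or `tf.map_fn` over slices, trading some speed for a fixed memory ceiling.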

- Model: ssd_mobilenet_v2_fpnlite
- Input: 416x416
- Batch size: 4
- GPU: RTX 3080
- OS: Ubuntu
