@mkiefel commented Nov 30, 2025

Sometimes during split operations, floating-point imprecision in frame timestamps causes frames to end up in the wrong split.

This change fixes the issue by working directly with frame indices instead.

Closes #2547.
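For illustration, here is a minimal sketch of how accumulated float timestamps can misplace a boundary frame while integer indices do not. The 30 fps rate, the episode lengths, and the way the boundary is accumulated are hypothetical assumptions, not the dataset code itself. The sketch only shows that a timestamp comparison and an index comparison can disagree for the same frame.

```python
FPS = 30
# Hypothetical episode lengths (in frames) of the episodes that precede a split boundary.
episode_lengths = [3, 6]

# Boundary computed by accumulating per-episode durations in seconds (float).
end_ts = 0.0
for n in episode_lengths:
    end_ts += n / FPS  # 0.1 + 0.2 -> 0.30000000000000004
# The same boundary expressed as an integer frame index.
end_idx = sum(episode_lengths)  # 9

frame = 9  # first frame *after* the two episodes
in_range_by_time = frame / FPS < end_ts  # 0.3 < 0.30000000000000004 -> True (wrong)
in_range_by_index = frame < end_idx      # 9 < 9 -> False (correct)
print(in_range_by_time, in_range_by_index)
```

Integer comparisons are exact, so the index-based check places frame 9 on the correct side of the boundary regardless of how the durations were summed.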

How it was tested

I ran the dataset-creation test script before and after the change and verified that the commit fixes the issue.

How to checkout & try? (for the reviewer)

  1. Grab the test script from Dataset split sometimes fails to split video frames due to float precision #2547.
  2. Run it before the change -> the script should complain.
  3. Run it after the change -> the script should be happy.

Copilot AI review requested due to automatic review settings November 30, 2025 15:08
Copilot AI left a comment

Pull request overview

This PR attempts to fix a floating point precision issue where video frames could end up in the wrong split during dataset operations. The approach changes from timestamp-based reindexing to frame-index-based reindexing. However, there is a critical bug in the implementation.

Key Changes:

  • Modified _keep_episodes_from_video_with_av to work with frame indices instead of timestamps
  • Changed _copy_and_reindex_videos to extract dataset_from_index/dataset_to_index instead of timestamp values
  • Updated function signatures and logic to use integer frame indices throughout
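The index-based approach listed above can be sketched as follows. This is a hypothetical helper (`reindex_kept_frames` is not a function from the PR). It assumes the frames to keep are given as half-open `[start, end)` integer ranges local to one video file, and it maps each kept source frame to a contiguous output position, so no float comparison is involved.

```python
from typing import Dict, List, Tuple


def reindex_kept_frames(num_frames: int, keep_ranges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each kept source frame index to its new, contiguous output index.

    Hypothetical helper: keep_ranges holds half-open [start, end) frame ranges
    that are local to a single video file.
    """
    mapping: Dict[int, int] = {}
    next_out = 0
    for src in range(num_frames):
        # Pure integer comparison: a frame is either inside a range or not.
        if any(start <= src < end for start, end in keep_ranges):
            mapping[src] = next_out
            next_out += 1
    return mapping
```

For example, `reindex_kept_frames(10, [(0, 3), (6, 9)])` keeps source frames 0-2 and 6-8 and renumbers them 0-5.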

Critical Issue Found:
The code incorrectly uses global dataset frame indices (dataset_from_index, dataset_to_index) as if they were local frame positions within individual video files. Since video files can contain multiple episodes concatenated together, the global indices don't correspond to actual frame positions in the video file. This will cause the wrong frames to be extracted.
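The distinction the review draws can be made concrete by translating global dataset indices into positions within a single video file. The sketch below is hypothetical: `local_frame_ranges` is not from the PR. It assumes the episodes sharing a file are stored back to back, so the file's first frame is the one with the smallest `dataset_from_index` among them.

```python
from typing import Dict, List, Tuple


def local_frame_ranges(
    episodes_in_file: List[Tuple[int, int, int]],
) -> Dict[int, Tuple[int, int]]:
    """Convert global [dataset_from_index, dataset_to_index) ranges to file-local frame ranges.

    episodes_in_file: (episode_index, dataset_from_index, dataset_to_index) tuples.
    Assumption (hypothetical): these episodes are stored contiguously in the file,
    so the file starts at the smallest dataset_from_index.
    """
    file_start = min(frm for _, frm, _ in episodes_in_file)
    return {ep: (frm - file_start, to - file_start) for ep, frm, to in episodes_in_file}
```

For example, `local_frame_ranges([(5, 120, 300), (6, 300, 420)])` returns `{5: (0, 180), 6: (180, 300)}`. Using 120 and 300 directly as positions inside that file would select the wrong frames.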


@mkiefel left a comment

Addressed comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


@mkiefel changed the title from "fix(dataset): Reindex videos based on frame and not on time" to "fix(dataset): Fix reindexing bug for videos on splits" on Dec 1, 2025
mkiefel commented Dec 5, 2025

This fixes a bug. Is there anything else I need to do to start the review process?

Development

Successfully merging this pull request may close these issues.

Dataset split sometimes fails to split video frames due to float precision
