Federated learning (FL) seems like a plausible path toward scaling data and training: community contributors train on their own local datasets in parallel and share model weights rather than raw data. Here is some demonstrable progress with SmolVLA:
https://substack.com/home/post/p-175893535
https://substack.com/home/post/p-176837437
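
For anyone unfamiliar with the weight-sharing step, here is a minimal sketch of sample-weighted federated averaging (FedAvg). It is only an illustration of the general idea: the linked posts may aggregate differently, and the names `federated_average` and `Weights`, along with the flat-list weight layout, are assumptions made up for this example, not anything from SmolVLA.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its local sample count.

    `updates` holds one (weights, num_local_samples) pair per contributor.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    # Assumes every client reports the same parameter names and shapes.
    for name, values in updates[0][0].items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            share = n / total  # this client's fraction of all samples
            for i, v in enumerate(weights[name]):
                acc[i] += v * share
        averaged[name] = acc
    return averaged


# Two contributors: the second trained on three times as much data.
client_a = ({"w": [1.0, 2.0]}, 1)
client_b = ({"w": [3.0, 4.0]}, 3)
global_weights = federated_average([client_a, client_b])
```

Each contributor would train locally, send back its weights and sample count, and receive the averaged weights for the next round. The raw datasets never leave the contributor's machine.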