2 Comments
Neural Foundry

Fantastic breakdown on training data scarcity. The point about models consuming data faster than we produce it is probably the most underrated crisis in AI right now. What strikes me is how publishers blocking crawlers actually accelerates model collapse rather than protecting them. Models will just train on more synthetic data and Reddit threads instead of quality journalism. It kinda feels like everyone loses in that scenario.

Harry Clarkson-Bennett

TYVM. I lost my mind and my thread around halfway through, so hopefully it's still salient. Yeah, I think we're in this spiralling loop where everyone's losing. All the AI companies need to do is actually pay for good content... I think publishers have to stand firm.