Being present in the training data for relevant entities increases your chances of visibility and value. This is the complete guide to training data: how it works and how to get into it.
Fantastic breakdown on training data scarcity. The point about models consuming data faster than we produce it is probably the most underrated crisis in AI right now. What strikes me is how much publishers blocking crawlers actually accelerates model collapse rather than protecting them. Like, models will just train on more synthetic data and Reddit threads instead of quality journalism. Kinda feels like everyone loses in that scenario.
TYVM. I lost my mind and thread around halfway through so hopefully it's still salient. Yeah I think we're sort of in this spiralling loop where everyone's losing. All they need to do is actually pay for good stuff... I think publishers have to stand firm.