Elon Musk agrees that we’ve exhausted AI training data

Elon Musk agrees with other AI experts that there’s little real-world data left on which to train AI models.

“We have now essentially exhausted the sum total of human knowledge…. in AI training,” Musk said during a livestreamed conversation with Stagwell Chairman Mark Penn on X late Wednesday. “That’s basically what happened last year.”

Musk, who owns the AI company xAI, echoed themes raised by former OpenAI chief scientist Ilya Sutskever in a December address at the NeurIPS machine learning conference. Sutskever, who said the AI industry has reached what he called “peak data,” predicted that a lack of training data will force a shift away from the way models are developed today.

Indeed, Musk suggested that synthetic data – data generated by AI models themselves – is the way forward. “The only way to supplement [real data] is with synthetic data, where the AI creates [training data],” he said. “With synthetic data… [AI] will sort of grade itself and go through this process of self-learning.”

Other companies, including tech giants like Microsoft, Meta, OpenAI and Anthropic, are already using synthetic data to train flagship AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated.

Microsoft's Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data. The same is true of Google’s Gemma models. Anthropic used some synthetic data to develop one of its strongest systems, Claude 3.5 Sonnet. And Meta has fine-tuned its latest Llama series of models using AI-generated data.

Training with synthetic data has other advantages, such as cost savings. The AI startup Writer claims that its Palmyra model cost less to develop than an OpenAI model of comparable size, for which estimates put the cost at $4.6 million.

But there are also disadvantages. Some research suggests that synthetic data can lead to model collapse, in which a model becomes less “creative” – and more biased – in its outputs, ultimately seriously compromising its functionality. Similarly, because models create the synthetic data, their outputs will be tainted if the data used to train those models has biases and limitations.
