Microsoft has launched Phi-3 Mini, a tiny language model that is part of the company's strategy to develop lightweight, task-specific AI models.
As language models have evolved, their parameters, training data sets, and context windows have grown ever larger. Scaling up these models made more powerful capabilities possible, but at a cost.
The traditional approach to training an LLM is to consume vast amounts of data, which requires massive computing resources. It is estimated that training an LLM like GPT-4 took about three months and cost over $21 million.
GPT-4 is a great solution for tasks that require complex reasoning, but overkill for simpler tasks like content creation or a sales chatbot. It's like using a Swiss Army knife when all you need is a simple letter opener.
With only 3.8B parameters, Phi-3 Mini is tiny. Still, Microsoft says it's an excellent, lightweight, and cost-effective solution for tasks like summarizing a document, extracting insights from reports, and writing product descriptions or social media posts.
The MMLU benchmark numbers show that Phi-3 Mini and the larger Phi models yet to come to market beat the larger Mistral 7B and Gemma 7B models.
According to Microsoft, Phi-3-small (7B parameters) and Phi-3-medium (14B parameters) will be available “soon” in the Azure AI Model Catalog.
Larger models like GPT-4 are still the gold standard, and we can probably expect GPT-5 to be even larger.
SLMs like Phi-3 Mini offer some important benefits that larger models don't. SLMs are cheaper to fine-tune, require less processing power, and can run on-device even when internet access isn't available.
Deploying an SLM at the edge results in lower latency and maximum privacy by eliminating the need to send data back and forth to the cloud.
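To give a feel for how lightweight this is in practice, here is a minimal sketch of running Phi-3 Mini locally with the Hugging Face transformers library. The model ID and prompt are assumptions for illustration; adjust them to whatever Microsoft actually publishes.

```python
# Minimal sketch: running Phi-3 Mini locally via Hugging Face transformers.
# The model ID below is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick the best dtype for the local hardware
    trust_remote_code=True,  # may be required depending on transformers version
)

prompt = "Write a two-sentence product description for a solar-powered lamp."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights are only 3.8B parameters, a sketch like this can plausibly run on a laptop or even a phone-class device, with no round trip to the cloud.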
Here is Sebastien Bubeck, Vice President of GenAI Research at Microsoft AI, with a demo of Phi-3 Mini. It's super fast and impressive for such a small model.
phi-3 is here and it's… good :-).
I made a quick demo to give you an idea of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and further announcements tomorrow morning!
(And of course this wouldn't be complete without the usual benchmark table!) pic.twitter.com/AWA7Km59rp
— Sebastien Bubeck (@SebastienBubeck) April 23, 2024
Curated synthetic data
Phi-3 Mini is the result of moving away from the idea that massive amounts of data are the only way to train a model.
Sebastien Bubeck, Microsoft vice president of generative AI research, asked: “Why not look for data that is extremely high quality instead of just training on raw web data?”
Ronen Eldan, a machine learning expert at Microsoft Research, was reading bedtime stories to his daughter when he wondered whether a language model could learn using only words a four-year-old could understand.
This led to an experiment that created a 3,000-word dataset. Using only this limited vocabulary, they prompted an LLM to create millions of short children's stories, compiled into a dataset called TinyStories.
The researchers then used TinyStories to train an extremely small 10M-parameter model, which was then able to generate “fluent narratives with perfect grammar.”
They then iterated on and scaled up this synthetic data generation approach to create more advanced but carefully curated and filtered synthetic datasets, which were ultimately used to train Phi-3 Mini.
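To make the idea concrete, here is a hedged sketch of TinyStories-style synthetic data generation: prompting a teacher LLM to write stories restricted to a small vocabulary. The word list, prompt wording, and teacher model are illustrative assumptions, not Microsoft's actual pipeline.

```python
# Illustrative sketch of TinyStories-style synthetic data generation:
# prompt a teacher LLM to write children's stories using only simple words.
# The word list, prompt, and model name are assumptions for demonstration.
import json
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SIMPLE_WORDS = ["dog", "ball", "run", "happy", "tree", "sun", "play", "friend"]

def generate_story() -> str:
    # Seed each story with a few required words so the dataset stays diverse.
    required = random.sample(SIMPLE_WORDS, 3)
    prompt = (
        "Write a three-paragraph story for a four-year-old. "
        "Use only words a four-year-old would understand, and include "
        f"the words: {', '.join(required)}."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Collect stories into a training file, one JSON record per line.
with open("tiny_stories.jsonl", "w") as f:
    for _ in range(10):  # scale this loop up for a real dataset
        f.write(json.dumps({"text": generate_story()}) + "\n")
```

The key design choice is that quality and simplicity are enforced at generation time, so the resulting corpus can be small and clean rather than vast and noisy.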
The result is a tiny model that is cheaper to run while offering performance comparable to GPT-3.5.
Smaller but more powerful models will mean that companies no longer have to rely solely on large LLMs like GPT-4. Soon we could also see solutions where an LLM does the heavy lifting but delegates simpler tasks to lightweight models, as sketched below.
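One hypothetical way to wire up that delegation pattern is a simple router that sends easy requests to the small model and reserves the large one for complex reasoning. The keyword heuristic and model names below are illustrative assumptions, and both models are assumed to sit behind one OpenAI-compatible gateway (as local servers like vLLM or Ollama commonly provide).

```python
# Hypothetical sketch of an LLM "router": delegate simple requests to a
# cheap small model and reserve the large model for complex reasoning.
# Model names and the routing heuristic are illustrative assumptions.
from openai import OpenAI

# Assumes both models are served behind one OpenAI-compatible endpoint.
client = OpenAI()

SIMPLE_KEYWORDS = ("summarize", "rewrite", "product description", "social media")

def route(task: str) -> str:
    # Naive heuristic: a keyword match decides which model handles the task.
    if any(keyword in task.lower() for keyword in SIMPLE_KEYWORDS):
        return "phi-3-mini"  # lightweight SLM endpoint (assumed name)
    return "gpt-4"           # large model for complex reasoning

def complete(task: str) -> str:
    response = client.chat.completions.create(
        model=route(task),
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(route("Summarize this report in three bullets"))  # -> phi-3-mini
    print(route("Plan a multi-step data migration"))        # -> gpt-4
```

In production the router itself could be a classifier rather than a keyword list, but the economics are the same: most traffic goes to the cheap, fast SLM, and only the hard cases pay for the large model.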