The impact of artificial intelligence will never be evenly distributed if only one company builds and controls the models (not to mention the data that goes into them). Unfortunately, today's AI models consist of billions of parameters that must be trained and tuned to maximize performance for each use case, putting the best-performing AI models out of reach of most people and organizations.
MosaicML began with a mission to make those models more accessible. The company, whose co-founders include Jonathan Frankle PhD '23 and MIT Associate Professor Michael Carbin, developed a platform that lets users train, improve, and monitor open-source models with their own data. The company also built its own open-source models using Nvidia graphics processing units (GPUs).
This approach made deep learning, a nascent field when MosaicML was founded, accessible to far more organizations as enthusiasm for generative AI and large language models (LLMs) exploded following the release of ChatGPT-3.5. MosaicML also became a powerful complementary tool for data management companies that were likewise committed to helping organizations make use of their data without handing it over to AI firms.
Last year, that thinking led to the acquisition of MosaicML by Databricks, a global data storage, analytics, and AI company that works with some of the largest organizations in the world. Since the acquisition, the combined companies have released one of the most powerful general-purpose open-source LLMs ever built. This model, called DBRX, has set new standards in tasks such as reading comprehension, general-knowledge questions, and logic puzzles.
Since then, DBRX has earned a reputation for being one of the fastest open-source LLMs available and has proven particularly useful to large enterprises.
But Frankle says DBRX matters not only because it was built using Databricks tools, but also because it means all of the company's customers can achieve similar performance with their own models, which will accelerate the impact of generative AI.
“Honestly, it's just exciting to see the community doing cool things with it,” says Frankle. “For me as a scientist, that's the best part. It's not the model, it's all the cool things the community does with it. That's where the magic happens.”
Designing efficient algorithms
Frankle earned his bachelor's and master's degrees in computer science from Princeton University before coming to MIT to pursue his doctorate in 2016. When he began his studies at MIT, he was unsure which area of computer science he wanted to study, but his eventual decision would change the course of his life.
Frankle ultimately focused on a form of artificial intelligence known as deep learning. At the time, deep learning and artificial intelligence weren't generating as much buzz as they are today. Deep learning was a decades-old field of research that had not yet borne much fruit.
“I don't think anyone expected deep learning to take off like it did back then,” says Frankle. “People who knew about it thought it was a really exciting field with a lot of unsolved problems, but terms like ‘large language model’ (LLM) and ‘generative AI’ weren't really being used back then. It was early days.”
Things got interesting with the publication of a now-famous paper by Google researchers, in which they showed that a new deep learning architecture called the transformer was surprisingly effective at language translation and also showed promise for a number of other applications, including content creation.
In 2020, future Mosaic co-founder and tech executive Naveen Rao emailed Frankle and Carbin out of the blue. Rao had read a paper the two had co-authored in which the researchers showed a way to shrink deep learning models without compromising performance. Rao suggested the two start a company. They were joined by Hanlin Tang, who had worked with Rao on a previous AI startup that had been acquired by Intel.
The founders started by studying different techniques for speeding up the training of AI models, and eventually combined several of them to show that they could train a model to perform image classification four times faster than before.
“The trick was that there was no trick,” says Frankle. “I think we had to make 17 different changes to the way we trained the model to figure that out. It was just a little bit here and a little bit there, but it turned out that was enough to get incredible speed gains. That's really been the story of Mosaic.”
The team showed that their techniques could make models more efficient, and in 2023 released an open-source large language model along with an open-source library of their methods. They also developed visualization tools that let developers map out different experimental options for training and running models.
MIT's E14 Fund invested in Mosaic's Series A funding round, and Frankle says the team at E14 offered helpful advice early on. Mosaic's advances enabled a new class of companies to train their own generative AI models.
“There was a democratization and an open-source aspect to Mosaic's mission,” says Frankle. “That's something I've always felt very strongly about, even when I was a graduate student and had no GPUs because I wasn't in a machine learning lab and all my friends had GPUs. That's still how I feel. Why can't we all take part? Why can't we all get to do this stuff and do science?”
Open sourcing innovation
Databricks had also been working on providing its customers with access to AI models. The company completed the acquisition of MosaicML in 2023 for a reported $1.3 billion.
“At Databricks, we saw a founding team of academics like us,” says Frankle. “We also saw a team of scientists who understand technology. Databricks has the data, we have the machine learning. You can't have one without the other, and vice versa. It was just a really good fit.”
In March, Databricks released DBRX, which gave the open-source community and enterprises building their own LLMs capabilities that were previously limited to closed models.
“DBRX showed that you can use Databricks to build the best open-source LLM in the world,” says Frankle. “The sky's the limit for enterprises today.”
Frankle says the Databricks team has been encouraged by its own internal use of DBRX across a wide variety of tasks.
“It's already great, and with a little bit of fine-tuning it's better than the closed models,” he says. “It's not going to be better than GPT in every area. That's not how this works. But nobody wants to solve every problem. Everyone wants to solve one problem. And we can customize this model to make it really great for specific scenarios.”
As Databricks continues to push the boundaries of AI, and as competitors continue to invest huge sums in AI more broadly, Frankle hopes the industry comes to see open source as the best path forward.
“I believe in science and I believe in progress, and I'm excited that we're doing such exciting science right now,” says Frankle. “I also believe in openness, and I hope everyone else embraces openness as much as we do. That's how we got here: through good science and good sharing.”