
Databricks spent $10 million on new generative AI model DBRX, but it can't beat GPT-4

If you wanted to raise awareness of your big tech company and had $10 million to spend, how would you spend it? On a Super Bowl commercial? An F1 sponsorship?

You could spend it training a generative AI model. Generative models may not be marketing in the conventional sense, but they attract attention – and are increasingly becoming part of vendors' standard products and services.

Take Databricks' DBRX, a brand-new generative AI model announced today, akin to OpenAI's GPT series and Google's Gemini. The base version (DBRX Base) and fine-tuned version (DBRX Instruct) are available on GitHub and the AI development platform Hugging Face for research and commercial use, and can be run and fine-tuned on public, custom or otherwise proprietary data.

“DBRX was trained to be useful and provide information on a wide variety of topics,” Naveen Rao, VP of generative AI at Databricks, told TechCrunch in an interview. “DBRX has been optimized and tuned for the English language, but is capable of conversing in and translating into a wide range of languages, such as French, Spanish and German.”

Databricks describes DBRX as “open source,” in the same vein as “open source” models like Meta’s Llama 2 and AI startup Mistral’s models. (Whether these models truly meet the definition of open source is the subject of vigorous debate.)

Databricks says it spent about $10 million and two months training DBRX, which it claims (quoting a press release) “outperforms all existing open source models on standard benchmarks.”

But – and here lies the marketing problem – it’s exceedingly hard to use DBRX unless you’re a Databricks customer.

That’s because, to run DBRX in its default configuration, you need a server or PC with at least four Nvidia H100 GPUs (or any other GPU configuration that adds up to around 320GB of memory). A single H100 costs thousands of dollars – quite possibly more. That may be pocket change for the average enterprise, but for many developers and solopreneurs it’s well out of reach.

It’s possible to run the model on a third-party cloud, but the hardware requirements are still steep – for example, Google Cloud offers only one instance type that includes H100 chips. Other clouds may cost less, but generally speaking, running huge models like this isn’t cheap these days.
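For those who do have the hardware (or a cloud instance carrying it), pulling down the open weights looks much like any other Hugging Face release. Below is a minimal sketch assuming a repo ID along the lines of databricks/dbrx-instruct and the standard transformers loading path; both are assumptions based on this article rather than Databricks documentation, so check the model card for the exact usage and license terms.

```python
# Minimal sketch: loading DBRX Instruct from Hugging Face with transformers.
# The repo ID and settings below are assumptions, not Databricks docs; some
# releases also require trust_remote_code=True. Expect to need roughly the
# ~320GB of GPU memory mentioned above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to keep the memory footprint down
    device_map="auto",           # shard the weights across the available H100s
)

prompt = "Summarize what DBRX is in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```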

And there’s fine print. Databricks says that companies with more than 700 million active users will face “certain restrictions,” comparable to Meta’s for Llama 2, and that all users will have to agree to terms ensuring that they use DBRX “responsibly.” (Databricks hadn’t voluntarily disclosed the details of those terms at the time of publication.)

Databricks pitches its Mosaic AI Foundation Model product as the managed answer to these roadblocks, providing a training stack for fine-tuning DBRX on custom data in addition to running DBRX and other models. Customers can privately host DBRX using Databricks’ Model Serving offering, Rao suggested, or they can work with Databricks to deploy DBRX on the hardware of their choosing.

Rao added:

“We’re focused on making the Databricks platform the best choice for building custom models, so ultimately the benefit to Databricks is more users on our platform. DBRX is a demonstration of our best-in-class pre-training and tuning platform, which customers can use to build their own models from scratch. It’s an easy way for customers to get started with the Databricks Mosaic AI generative AI tools. And DBRX is highly capable out of the box and can be tuned for excellent performance on specific tasks at better economics than large, closed models.”

Databricks claims that DBRX runs up to 2x faster than Llama 2, thanks in part to its mixture-of-experts (MoE) architecture. MoE – which DBRX shares with Mistral’s newer models and Google’s recently announced Gemini 1.5 Pro – essentially breaks data processing tasks into multiple subtasks and then delegates those subtasks to smaller, specialized “expert” models.

Most MoE models have eight experts. DBRX has 16, which Databricks says improves quality.
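To make the mechanism concrete, here is a toy mixture-of-experts layer in PyTorch. It is purely illustrative – the layer sizes, the top-k routing value and the naive dispatch loop are assumptions chosen for readability, not DBRX’s actual implementation – but it shows the core idea: a router scores each token, only a handful of the 16 experts run on it, and their outputs are blended by the routing weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: a router scores each token,
    the top-k experts run on it, and their outputs are weighted and summed."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, seq, d_model)
        scores = self.router(x)                        # (batch, seq, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # normalize over chosen experts
        out = torch.zeros_like(x)
        # Naive dispatch loop for clarity; real implementations batch this.
        for k in range(self.top_k):
            w = weights[..., k].unsqueeze(-1)          # (batch, seq, 1)
            for e, expert in enumerate(self.experts):
                mask = chosen[..., k] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

# Quick shape check on random token embeddings.
layer = ToyMoELayer()
tokens = torch.randn(2, 8, 512)
print(layer(tokens).shape)  # torch.Size([2, 8, 512])
```

Because only a few experts run per token, each forward pass does far less compute than a dense layer with the same total parameter count – which is where speed claims like the one above come from.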

However, quality is relative.

While Databricks claims that DBRX outperforms the Llama 2 and Mistral models on certain language understanding, programming, math and logic benchmarks, DBRX falls short of arguably the leading generative AI model, OpenAI's GPT-4, in most areas outside of niche use cases like database programming language generation.

Rao acknowledges that DBRX has other limitations as well, namely that – like all other generative AI models – it can fall victim to “hallucinating” answers to queries, despite Databricks’ work on safety testing and red teaming. Because the model was simply trained to associate words or phrases with certain concepts, if those associations aren’t entirely accurate, its answers won’t always be accurate.

Additionally, unlike some recent flagship generative AI models, including Gemini, DBRX is not multimodal. (It can only process and generate text, not images.) And we don’t know exactly what sources of data were used to train it; Rao would only reveal that no Databricks customer data was used in training DBRX.

“We trained DBRX on a large amount of data from a diverse range of sources,” he added. “We used open data sets that the community knows, loves and uses every day.”

I asked Rao whether any of the DBRX training data sets were copyrighted or licensed, or showed obvious signs of bias (e.g. racial bias), and he didn’t answer directly, saying only, “We’ve been careful about the data used, and conducted red-teaming exercises to improve the model’s weak spots.” Generative AI models have a tendency to regurgitate training data, which is a major concern for commercial users of models trained on unlicensed, copyrighted or blatantly biased data. In the worst-case scenario, a user could end up in ethical and legal hot water for unwittingly incorporating IP-infringing or biased work from a model into their projects.

Some companies that train and release generative AI models offer policies covering the legal fees arising from possible infringement. Databricks doesn’t at present – Rao says the company is “exploring scenarios” under which it might.

Given these and the other ways in which DBRX misses the mark, the model seems like a tough sell to any current or prospective Databricks customer. Databricks’ rivals in the generative AI space, including OpenAI, offer equally if not more compelling technologies at very competitive pricing. And plenty of generative AI models come closer to the commonly understood definition of open source than DBRX.

Rao promises that Databricks will continue to refine DBRX and release new versions as the company’s Mosaic Labs research and development team – the team behind DBRX – explores new generative AI avenues.

“DBRX is advancing the open source model space and challenging future models to be built even more efficiently,” he said. “We’ll be releasing variants as we apply techniques to improve output quality in terms of reliability, safety and bias…We see the open model as a platform on which our customers can build custom capabilities with our tools.”

Given where DBRX currently stands compared to its peers, there’s still an exceptionally long way to go.
