Today, Databricks announced the acquisition of Lilac, a Boston-based applied research startup that provides tools for understanding and manipulating data. The terms of the deal were not disclosed.
The Ali Ghodsi-led data giant plans to integrate Lilac's team and technology into its data intelligence platform, formerly referred to as the data lakehouse, to give users across domains a more seamless way to monitor the quality of their datasets when developing production-quality large language model (LLM) applications.
The deal is Databricks' latest attempt to become the one-stop shop not only for data, but for everything related to generative AI. Most recently, the company also invested an undisclosed sum in Mistral, the generative AI startup that raised Europe's largest seed round last year and has become a strong player in the generative AI space.
How Lilac makes exploring data easier
When Databricks acquired MosaicML in a significant deal last year, the company shifted course toward an AI-driven future in which users would leverage the data securely hosted on its platform to build generative AI applications. Since then, the company has made several advances in this area and even introduced several open models to provide customers with everything they need to build, deploy and maintain high-quality large language model (LLM) apps for various business use cases.
However, as is widely stated in the industry, data remains critical to all AI efforts, including LLM systems. Teams need to make sure they have high-quality data to train the models and test their performance in the real world, including aspects such as bias and hallucinations. Lilac helps with this and will now do so as part of Databricks.
Traditionally, teams had to use time-consuming manual methods to explore unstructured data and fill its gaps. Founded in 2023 by former Google engineers Daniel Smilkov and Nikhil Thorat, Lilac addresses this challenge with a scalable, open-source solution that provides an intuitive interface and AI-driven capabilities to analyze, understand and modify unstructured text data at scale.
According to the company's website, data scientists and AI researchers can do a lot with Lilac when dealing with unstructured data, from clustering documents and assigning them categories, to performing semantic and keyword searches, to detecting personal information or duplicates and making the necessary edits to remove these (with a comparison view) and adjust the dataset.
“The team behind Lilac designed their product specifically to enable evaluation of model outputs for bias or toxicity, in addition to preparation of information for RAG and fine-tuning or pre-training of LLMs,” Databricks executives Matei Zaharia, Naveen Rao, Jonathan Frankle, Hanlin Tang and Akhil Gupta wrote in a joint blog post.
They added that Lilac's entire tech stack will fall under Databricks' Mosaic AI tools to offer developers a way to better curate datasets for next-generation custom AI systems. While the details of the integration have not been announced yet, it will serve the same purpose: simplifying data curation to make it easier for teams to evaluate and monitor the outputs of their LLMs, as well as prepare datasets for RAG, fine-tuning and pre-training.
“We believe that bringing Lilac’s real-time, interactive data curation experience to Databricks’ enterprise platform will enable corporations to have far more visibility and control over their unstructured data. This will enable best-in-class, customizable AI products that serve end users. Working with Databricks will enable a completely new class of enterprise developers to unlock the potential of their data with generative AI in just a few clicks,” the startup wrote in a separate post published on its website.
The acquisition, as mentioned above, represents a notable move by Databricks to provide its customers with end-to-end tools to develop high-quality gen AI apps using their own data. Databricks platform users now have everything they need to build LLM-powered systems.
This includes open models from players such as Meta, Stability and Mistral, as well as dedicated Mosaic tools to experiment with them, use them as optimized model endpoints or adapt them to a specific use case by integrating them with proprietary data hosted on the platform (Mosaic AI Foundation Model Adaptation).
Snowflake, the company's main competitor, is also moving in the same direction and has launched Cortex, a fully managed service that helps its customers develop apps based on powerful open models.