LanceDB, which counts Midjourney as a customer, builds databases for multimodal AI

Chang She, previously VP of Engineering at Tubi and a Cloudera veteran, has years of experience building data tools and infrastructure. But when he began working in the AI space, he quickly ran into problems with traditional data infrastructure – problems that prevented him from bringing AI models into production.

“Machine learning engineers and AI researchers often have a subpar development experience,” he said in an interview with TechCrunch. “Data infrastructure companies don’t really understand the problem of machine learning data at a fundamental level.”

So Chang – one of the co-creators of Pandas, the wildly popular Python data science library – teamed up with software developer Lei Xu to co-launch LanceDB.

LanceDB is developing open source database software of the same name, LanceDB, which is designed to support multimodal AI models – models that train on and generate images, videos and more in addition to text. Backed by Y Combinator, LanceDB raised $8 million this month in a seed funding round led by CRV, Essence VC and Swift Ventures, bringing its total raised to $11 million.

“If multimodal AI is critical to the future success of your company, you want your very expensive AI team to focus on the model and on connecting AI to business value,” Chang said. “Unfortunately, AI teams today spend most of their time dealing with low-level details of data infrastructure. LanceDB provides the foundation AI teams need to focus on what matters to business value and bring AI products to market much faster than otherwise possible.”

LanceDB is essentially a vector database – a database that stores series of numbers (“vectors”) encoding the meaning of unstructured data (e.g. images, text and so on).
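To make the idea concrete, here is a minimal sketch of what a vector search does, written in plain Python. It is not LanceDB's API: the function names, the three-dimensional vectors and the item labels are all made up for illustration, and real systems use vectors with hundreds or thousands of dimensions produced by an AI model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy "embeddings"; the values are invented, not model output.
items = {
    "photo of a cat": [0.9, 0.1, 0.0],
    "photo of a dog": [0.8, 0.3, 0.0],
    "tax form": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]  # stands in for the encoded search request
results = nearest(query, items, k=2)
print(results)
```

The two animal photos rank ahead of the tax form because their vectors point in nearly the same direction as the query. A production vector database performs the same kind of comparison, but with indexes that avoid scanning every stored vector.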

As my colleague Paul Sawers recently wrote, vector databases are having a moment as the AI hype cycle peaks. That's because they're useful for all sorts of AI applications, from content recommendations in e-commerce and social media platforms to reducing hallucinations.

Competition among vector databases is fierce – see Qdrant, Vespa, Weaviate, Pinecone and Chroma, to name just a few providers (not counting the big tech incumbents). So what makes LanceDB unique? According to Chang, greater flexibility, performance and scalability.

For one, says Chang, LanceDB – which is built on top of Apache Arrow – relies on a custom data format, the Lance format, that is optimized for multimodal AI training and evaluation. The Lance format enables LanceDB to handle up to billions of vectors and petabytes of text, images and videos, and lets engineers manage various types of metadata associated with that data.

“Until now, there has never been a system that can combine training, exploration, search and large-scale data processing,” Chang said. “Lance format enables AI researchers and engineers to have a single source of truth and lightning-fast performance across their entire AI pipeline. It’s not just about storing vectors.”

LanceDB makes money by selling fully managed versions of its open source software with additional features like hardware acceleration and governance controls – and business appears to be going well. The company's customers include text-to-image platform Midjourney, chatbot unicorn Character.ai, autonomous car startup WeRide and Airtable.

However, Chang insisted that LanceDB's recent VC backing won’t divert his attention from the open source project, which he said now sees around 600,000 downloads per month.

“We wanted to create something that would make it 10 times easier for AI teams to work with large multimodal data,” he said. “LanceDB offers – and will continue to offer – a very comprehensive range of ecosystem integrations to minimize the adoption burden.”

