Vana plans to let users rent out their Reddit data to train AI

In the generative AI boom, data is the new oil. So why shouldn't you be able to sell your own?

From big tech companies to startups, AI makers are licensing e-books, images, videos, audio files and more from data brokers to create more capable (and more legally defensible) AI-powered products. Shutterstock has deals with Meta, Google, Amazon and Apple to supply tens of millions of images for model training, while OpenAI has signed agreements with several news organizations to train its models on news archives.

In many cases, the individual creators and owners of this data never see a cent of the money changing hands. A startup called Vana wants to change that.

Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building technology for emerging markets, founded Vana in 2021. Before Vana, Kazlauskas studied computer science and economics at MIT, eventually leaving to launch a fintech automation startup, Iambiq, out of Y Combinator. Abal, a corporate lawyer by training, was an associate at The Cadmus Group, a Boston-based consulting firm, before leading impact sourcing at data annotation company Appen.

With Vana, Kazlauskas and Abal set out to build a platform that lets users “bundle” their data – including chats, voice recordings and photos – into datasets that can then be used for generative AI model training. They also want to create more personalized experiences (for instance, a daily motivational voicemail based on your health goals, or an art-generating app that understands your style preferences) by fine-tuning public models on this data.

“Vana’s infrastructure actually creates a user-owned data treasure trove,” Kazlauskas told TechCrunch. “It does this by allowing users to aggregate their personal data in a non-custodial manner… Vana enables users to own AI models and use their data in AI applications.”

Here's how Vana pitches its platform and API to developers:

The Vana API connects a user’s personal data across platforms… to help you personalize your application. Your app gets fast access to a user's personalized AI model or underlying data, simplifying onboarding and eliminating concerns about computational costs… We think users should be able to bring their personal data from walled gardens like Instagram, Facebook and Google into your application, so you can deliver an amazing personalized experience the first time a user interacts with your consumer AI application.

Creating an account with Vana is fairly simple. After you confirm your email, you can attach data to a digital avatar (e.g. selfies, a description of yourself and voice recordings) and explore apps built on Vana's platform and datasets. The app selection ranges from ChatGPT-style chatbots to interactive storybooks and a Hinge profile generator.

Photo credit: Vana

Why, you might ask, in an age of growing privacy awareness and ransomware attacks, would anyone ever hand over their personal information to an anonymous startup, let alone a venture capital-backed one? (Vana has raised $20 million so far from Paradigm, Polychain Capital and other backers.) Can a for-profit company really be trusted not to misuse or mishandle monetizable data that comes into its hands?

[Image: Vana Reddit DAO. Photo credit: Vana]

In response to this question, Kazlauskas emphasized that the whole point of Vana is for users to “take back control of their data.” Vana users can choose to self-host their data instead of storing it on Vana's servers, and they control how their data is shared with apps and developers. She also argued that the company has no incentive to exploit users and the wealth of personal data they bring with them, because Vana makes money by charging users a monthly subscription (starting at $3.99) and charging developers a “data transaction fee” (e.g. for transferring datasets for AI model training).

“We want to create models that are owned and governed by users who all contribute their data,” Kazlauskas said, “and allow users to take their data and models with them into any application.”

Now, while Vana won't sell users' data to companies for the purpose of generative AI model training (or so it claims), it does want to let users do so themselves if they choose – starting with their Reddit posts.

This month Vana launched what it calls the Reddit Data DAO (decentralized autonomous organization), a program that pools multiple users' Reddit data (including their karma and post history) and lets them collectively decide how that combined data is used. After logging in with a Reddit account, submitting a request to Reddit for their data and uploading that data to the DAO, users gain the right to vote alongside other members of the DAO on decisions such as licensing the combined data to generative AI companies for a shared profit.

It's a response of sorts to Reddit's recent moves to commercialize data on its platform.

Previously, Reddit did not restrict access to posts and communities for generative AI training purposes. But late last year, ahead of its IPO, the company changed course. Since the policy change, Reddit has collected over $203 million in licensing fees from companies including Google.

“The general idea (with the DAO) is to free user data from the big platforms that are trying to hoard and monetize it,” Kazlauskas said. “This is a first and part of our commitment to helping people pool their data into user-owned datasets for training AI models.”

Unsurprisingly, Reddit – which doesn’t work with Vana in any official capacity – isn’t pleased about the DAO.

Reddit has banned Vana's subreddit dedicated to discussion of the DAO. And a Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data protection regulations such as the GDPR and the California Consumer Privacy Act.

“Our data agreements allow us to put protections in place for such companies, even for public information,” the spokesperson told TechCrunch. “Reddit doesn’t share non-public personal information with commercial enterprises, and when Reddit users request an export of their data from us, they receive non-public personal information back from us in accordance with applicable law. Direct partnerships between Reddit and vetted organizations with clear terms and accountability matter, and these partnerships and agreements prevent misuse and abuse of people’s data.”

But does Reddit have any real reason to fret?

Kazlauskas expects the DAO to grow to the point where it affects how much Reddit can charge customers for its data. That's a long way off, assuming it ever happens; the DAO has just over 141,000 members, a tiny fraction (about 0.2%) of Reddit's 73 million users. And some of those members could be bots or duplicate accounts.

Then there’s the question of how to fairly distribute any payments the DAO receives from data buyers.

Currently, the DAO awards “tokens” – cryptocurrency – to users in proportion to their Reddit karma. But karma may not be the best measure of high-quality contributions to the dataset, especially in smaller Reddit communities with fewer opportunities to earn it.
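
To make the fairness concern concrete, here is a minimal Python sketch of what a karma-weighted split could look like. The article does not describe the DAO's actual formula, so everything here is an assumption for illustration: the function name `allocate_tokens`, the strictly proportional rule, the clamping of negative karma to zero and the largest-remainder rounding are all hypothetical.

```python
def allocate_tokens(karma_by_user: dict[str, int], total_tokens: int) -> dict[str, int]:
    """Split total_tokens across users in proportion to karma.

    Hypothetical illustration only; not Vana's or the DAO's real method.
    """
    # Assumption: negative karma earns nothing rather than a negative share.
    clamped = {user: max(karma, 0) for user, karma in karma_by_user.items()}
    total_karma = sum(clamped.values())
    if total_karma == 0:
        return {user: 0 for user in clamped}

    # Whole-token share for each user, rounded down.
    shares = {u: total_tokens * k // total_karma for u, k in clamped.items()}
    remainders = {u: total_tokens * k % total_karma for u, k in clamped.items()}

    # Hand out tokens lost to rounding, largest remainder first
    # (ties broken alphabetically so the result is deterministic).
    leftover = total_tokens - sum(shares.values())
    ranked = sorted(remainders, key=lambda u: (-remainders[u], u))
    for user in ranked[:leftover]:
        shares[user] += 1
    return shares


members = {"alice": 9000, "bob": 900, "carol": 100}
print(allocate_tokens(members, 1000))
# {'alice': 900, 'bob': 90, 'carol': 10}
```

Under a rule like this, a member with 100 karma earned in a small niche community receives 1% of the tokens no matter how valuable their posts are as training data, which is the weakness described above.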

Kazlauskas floats the idea that members of the DAO could opt to share their cross-platform and demographic data, potentially making the DAO more valuable and encouraging sign-ups. That would require users to place even more trust in Vana to handle their sensitive data responsibly.

Personally, I don't think Vana's DAO will reach critical mass. There are far too many obstacles standing in the way. However, I do think it won't be the last grassroots attempt to assert control over the data that's increasingly being used to train generative AI models.

Startups like Spawning are working on ways to let creators set rules for how their data is used for training, while providers like Getty Images, Shutterstock and Adobe continue to experiment with compensation schemes. But no one has cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, it's certainly a tall order. But perhaps someone will find a way – or policymakers will force one.

