OpenAI and MidJourney want to buy WordPress and Tumblr data

Automattic, the company behind WordPress and Tumblr, is discussing a data and content deal with MidJourney and OpenAI.

This news, first reported by 404 Media and based on information from an unnamed source inside Automattic, suggests that an agreement with OpenAI and MidJourney may be imminent.

This follows rumors circulating on Tumblr about a potential deal with MidJourney that could introduce a new revenue stream for the platform.

404 Media says the deal process has been messy so far, including a partially failed data transfer to OpenAI and MidJourney that, in the words of one of Tumblr's product managers, contained:

"Private posts on public blogs, posts on deleted or suspended blogs, unanswered asks (normally these are not public until they're answered), private answers (these only show up to the receiver and are not public), posts that are marked 'explicit' / NSFW / 'mature' by our more modern standards (this may not be a big deal, I don't know)."

The implications of this remain unclear, and further details of the deal are forthcoming.

The gold rush for AI training data moves up a notch

And just like that, the gold rush for AI training data has moved up a gear.

Yes, generative AI firms have always needed vast quantities of data, but they're now rushing to pay for it rather than scrape it for free.

Just days ago, Reddit reportedly discussed licensing its vast array of user-generated content to a yet-to-be-revealed AI company, in a deal that could be worth around $60 million annually. This comes as Reddit gears up for a public offering in March, aiming for a valuation near $5 billion.

This potential licensing agreement fits a growing trend among tech companies to secure legitimate data-use agreements, especially in the face of mounting copyright risks.

Ongoing legal battles, such as The New York Times' lawsuit against OpenAI and Microsoft, have increased the urgency of securing content deals.

Automattic's move to negotiate with AI firms raises questions about the use of user-generated content for AI training.

Automattic has reportedly announced plans to introduce a new feature that lets users opt out of having their data shared with third parties, including AI firms.

In a public statement published after 404 Media's report, Automattic said, "We currently block, by default, major AI platform crawlers, including ones from the largest tech firms, and update our lists as new ones launch," and that it "will share only public content that's hosted on WordPress.com and Tumblr from sites that haven't opted out."

It continues, “We are also working directly with select AI firms so long as their plans align with what our community cares about: attribution, opt-outs, and control.”

However, opting out of having your information used for AI training could penalize users' accounts.

A new, not-yet-published FAQ titled "What happens when you opt out?" states, "If you opt out from the beginning, we'll block crawlers from accessing your content by adding your site to a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt out and ask that their content be removed from past sources and future training."

We're now living in a world where anything you've posted on the web could be sold for AI training purposes, if it isn't simply taken for free.

And as AI evolves, the debate over data use and privacy will likely intensify.

Companies that own data goldmines stand to win big, but at what cost to the average web user?
