It's only been a few days since OpenAI introduced the world to its latest flagship generative model, o1. Marketed as a “reasoning” model, o1 essentially takes longer to “think” about questions before answering them, decomposing problems and verifying its own answers.
There are many things that o1 doesn't do well – and OpenAI itself admits this. But at some tasks, like physics and math, o1 excels, even though it doesn't necessarily have more parameters than OpenAI's previous top-of-the-line model, GPT-4o. (In AI and machine learning, “parameters,” which often number in the billions, roughly correspond to a model's problem-solving capabilities.)
And this has implications for AI regulation.
For example, California's SB 1047 bill imposes safety requirements on AI models that either cost more than $100 million to develop or were trained with computing power that exceeds a certain threshold. However, models like o1 show that scaling up training compute isn't the only way to improve a model's performance.
In a post on X, Jim Fan, research director at Nvidia, posited that future AI systems might be built around small, easier-to-train “reasoning cores,” as opposed to the training-intensive architectures (e.g., Meta's Llama 405B) that have been in vogue recently. Recent academic studies, Fan said, have shown that small models like o1 can significantly outperform large models if given more time to reason about questions.
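To make the “more time to reason” idea concrete, here is a minimal, hypothetical sketch of one widely discussed inference-time technique: sampling several candidate answers and taking a majority vote (often called self-consistency). The toy_model function below is an invented stand-in with a fixed per-sample accuracy – it is not drawn from o1, Fan's post, or the studies he cites – and the only point is that spending extra compute at inference, rather than on a larger model, can raise accuracy.

```python
# Toy illustration of inference-time scaling via majority voting
# ("self-consistency"). The "model" is a simulated solver that answers
# correctly with a fixed probability; nothing here reflects how o1
# actually works internally.
import random
from collections import Counter

def toy_model(correct_answer: int, p_correct: float = 0.6) -> int:
    """Return the correct answer with probability p_correct, otherwise a nearby wrong one."""
    if random.random() < p_correct:
        return correct_answer
    return correct_answer + random.choice([-2, -1, 1, 2])

def answer_with_votes(correct_answer: int, samples: int) -> int:
    """Sample the toy model several times and return the most common answer."""
    votes = Counter(toy_model(correct_answer) for _ in range(samples))
    return votes.most_common(1)[0][0]

def accuracy(samples: int, trials: int = 5_000) -> float:
    """Estimate accuracy of the voting scheme over many simulated questions."""
    hits = sum(answer_with_votes(42, samples) == 42 for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for k in (1, 5, 25):
        print(f"{k:>2} samples per question -> accuracy ~ {accuracy(k):.2f}")
```

Running the sketch shows accuracy climbing from roughly 0.6 with one sample toward 1.0 with 25 samples – the basic intuition behind trading model size for inference-time compute, and why a compute-at-training threshold alone may miss where a model's capability comes from.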
So was it shortsighted of policymakers to tie AI regulatory measures to compute? Yes, says Sara Hooker, head of the research lab at AI startup Cohere, in an interview with TechCrunch:
“[o1] kind of points out how incomplete this view is when you use model size as an indicator of risk. It doesn't take into account everything you can do with inference or running a model. To me, it's a combination of bad science and policy that focuses not on the current risks we see in the world now, but on future risks.”
Does this mean that lawmakers should tear up AI laws and start from scratch? No. Many were written to be easily amended, on the assumption that AI would evolve well beyond their passage. The bill in California, for instance, would give the state's Government Operations Agency the authority to redefine the computing power thresholds that trigger the law's safety requirements.
The admittedly difficult part will be determining which metric is a better indicator of risk than training compute. Like so many other aspects of AI regulation, that's something to think about as bills in the U.S. and around the world move toward passage.
News
First reactions to o1: Max gathered initial impressions from AI researchers, startup founders and VCs on o1 – and tested the model himself.
Altman leaves safety committee: OpenAI CEO Sam Altman resigned from the startup's committee responsible for reviewing the safety of models like o1, presumably over concerns that he wouldn't act impartially.
Slack becomes an agent hub: At its parent company Salesforce's annual Dreamforce conference, Slack announced new features including AI-generated meeting summaries and integrations with image generation tools and AI-powered web search.
Google begins labeling AI images: Google says it plans to introduce changes to Google Search to make it clearer which images in results were AI-generated or edited by AI tools.
Mistral introduces a free tier: French AI startup Mistral has introduced a new free tier that lets developers fine-tune and build test apps using the startup's AI models.
Snap launches a video generator: At its annual Snap Partner Summit on Tuesday, Snapchat announced it will launch a new AI video generation tool for creators. The tool will allow select creators to create AI videos from text prompts and, soon, image prompts.
Intel closes major chip deal: Intel says it’ll work with AWS to develop an AI chip that uses Intel's 18A chip manufacturing process. The two firms describe the deal as a “multi-year, multi-billion dollar framework” that would potentially include additional chip designs.
Oprah’s AI Special: Oprah Winfrey hosted a special on AI with guests including Sam Altman of OpenAI, Bill Gates of Microsoft, tech influencer Marques Brownlee and current FBI Director Christopher Wray.
Research paper of the week
We know AI can be persuasive, but can it pull someone out of the rabbit hole of a conspiracy theory? Well, not entirely on its own. But a new model from Costello et al. at MIT and Cornell can weaken belief in false conspiracies, at least for a couple of months.
In the experiment, people who believed in conspiracy-related statements (e.g., “9/11 was an inside job”) talked with a chatbot that gently, patiently, and tirelessly offered counter-evidence to their arguments. These conversations led the people involved to report a 20 percent reduction in the associated beliefs two months later, at least as far as such things can be measured. Here's an example of one of the conversations in progress:
It's unlikely that people deep into reptilian and deep-state conspiracies would consult or believe such an AI, but the approach could be more effective if deployed at a critical moment, such as when someone is first exposed to these theories. For example, if a teen searches for “Can jet fuel melt steel beams?” they might experience an educational moment rather than a tragic one.
Model of the week
It's not a model, but it has to do with models: Researchers at Microsoft this week published an AI benchmark called Eureka that (in their words) aims to “scale (model) evaluations… in an open and transparent way.”
AI benchmarks are a dime a dozen, so what makes Eureka different? Well, the researchers say that for Eureka – actually a collection of existing benchmarks – they selected tasks that are challenging “even for the most capable models.” Specifically, Eureka tests capabilities that are often overlooked in AI benchmarks, such as visuospatial navigation.
To show how challenging Eureka can be for models, the researchers benchmarked systems such as Anthropic's Claude, OpenAI's GPT-4o, and Meta's Llama. No single model performed well across all of Eureka's tests, which the researchers say underscores the importance of “continuous innovation” and “targeted improvements” to models.
Grab bag
In a victory for professional actors, California passed two laws, AB 2602 and AB 1836, restricting the use of digital AI replicas.
The bills, which were supported by the artists' union SAG-AFTRA, require companies that rely on digital replicas of artists (such as cloned voices or likenesses) to provide a “reasonably specific” description of the replicas' intended use and to negotiate with the artist's legal counsel or union. They also require entertainment employers to obtain the consent of a deceased artist's estate before using a digital replica of that person.
As The Hollywood Reporter notes in its coverage, the bills codify concepts that SAG-AFTRA fought for during its 118-day strike against studios and major streaming platforms last year. California is the second state, after Tennessee, to impose restrictions on the use of digital actor likenesses; SAG-AFTRA also sponsored the effort in Tennessee.