Worth Knowing
OpenAI’s o3 Model Wows on Benchmarks, but It’s Not AGI Yet: OpenAI wrapped up its “12 Days of OpenAI” event in December with the announcement of o3, the latest iteration of its AI models focused on reasoning and problem-solving. The model made waves in particular due to its unprecedented scores on the ARC-AGI benchmark: 75.7% under standard compute conditions and 87.5% with unlimited compute. The benchmark, created by AI researcher François Chollet in 2019, is designed to test AI systems’ ability to tackle novel problems that are straightforward for humans but require generalized reasoning rather than pattern matching. Even recently, state-of-the-art models struggled with the benchmark: GPT-3 and GPT-4o achieved 0% and 4% scores, respectively, while OpenAI’s first reasoning model, o1, scored between 25% and 32% depending on how much computing power it was able to use — a significant advance but far from human performance. As we covered when OpenAI released o1 in September, much of the excitement around that model stemmed from its ability to produce better responses the more computing power it devoted to “thinking” through problems at inference time. This innovation indicated a new path forward as gains from traditional training methods diminished. The o3 results appear to validate that excitement, showing significant headroom for further improving AI capabilities. Chollet called the results “a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models.” But despite o3’s ARC-AGI scores and other similarly impressive results, Chollet and others were careful to point out that o3 is not artificial general intelligence (AGI).
As Chollet explained in his blog post, o3 fails on some tasks that would be quite easy for most humans and seems to perform poorly (under 30%) on the next iteration of the ARC-AGI benchmark, where, according to Chollet, a smart human would score over 95%. Gary Marcus, meanwhile, pointed to o3’s massive costs and curated demo as reasons for skepticism. We’ll have a better sense of o3’s true capabilities when it’s released to the public. The company says it plans to deploy the models early this year, but, in the meantime, it has opened up early access to safety and security researchers.
- More: Five breakthroughs that make OpenAI’s o3 a turning point for AI — and one big challenge | In Tests, OpenAI’s New Model Lied and Schemed to Avoid Being Shut Down | OpenAI Details Plans for Becoming a For-Profit Company
Government Updates
Biden Administration Announces New Global “AI Diffusion” Rules: On Monday, the Biden administration announced new rules that attempt to shape the development of AI by updating restrictions on global advanced chip exports and adding novel controls on sharing AI models. Then on Wednesday, the Bureau of Industry and Security announced further updates that enhance due diligence requirements for chip foundries like TSMC. The rules are part of the final chapter in what has been a signature Biden administration effort: limiting adversaries’ — namely China’s — capacity to build advanced AI systems by cutting off their access to the chips used to train and run them. At a high level, Monday’s rules do two things:
- Place limits on the number of high-end chips a country can import. Three distinct tiers determine access levels: Tier 1 includes 18 key allies and partners who can import without limits; the vast majority of countries fall into Tier 2 with specific restrictions; countries of concern (including China, Russia, and Iran) in Tier 3 face complete restrictions. Countries in the second tier can import up to the equivalent of 50,000 advanced GPUs annually as a baseline but can increase this cap through various mechanisms, including signing government-to-government arrangements with the United States and working with “Verified End User” entities — companies, research labs, universities, and other institutions that meet high security and trust standards set by the Bureau of Industry and Security.
- Restrict the transfer of advanced model weights to non-trusted end users. This applies specifically to models trained using more than 10^26 floating-point operations (FLOPs), a threshold set in Biden’s October 2023 AI executive order. Notably, models with openly published weights, such as Meta’s Llama models, are exempt from these restrictions.
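For a sense of where the 10^26 FLOP threshold sits, training compute for a dense transformer is often approximated as roughly 6 × parameters × training tokens. This rule of thumb, and the illustrative model sizes below, are not part of the rules themselves — they are common assumptions used here only to sketch how such a threshold might be checked.

```python
# Rough check of whether a training run crosses the 10^26 FLOP
# threshold set in the October 2023 executive order, using the
# common ~6 * N * D approximation for dense transformer training
# compute (an assumption, not part of the rules).

THRESHOLD_FLOPS = 1e26

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

def exceeds_threshold(params: float, tokens: float) -> bool:
    return training_flops(params, tokens) > THRESHOLD_FLOPS

# Hypothetical 70B-parameter model trained on 15T tokens:
# 6 * 7e10 * 1.5e13 = 6.3e24 FLOPs, well under the threshold.
print(exceeds_threshold(7e10, 1.5e13))  # False

# Hypothetical 2T-parameter model trained on 20T tokens:
# 6 * 2e12 * 2e13 = 2.4e26 FLOPs, over the threshold.
print(exceeds_threshold(2e12, 2e13))  # True
```

By this estimate, today’s widely deployed open-weight models sit comfortably below the threshold, which is consistent with the exemption for openly published weights noted above.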
Biden Issues Executive Order to Accelerate AI Infrastructure: On Tuesday, President Biden issued an executive order directing federal agencies to accelerate the development of AI infrastructure by building on federal land and expediting permitting. The order instructs the Departments of Defense and Energy to identify federal sites that can be leased to private companies for building “frontier AI infrastructure” — AI data centers and associated clean power facilities to keep them running. The directive follows through on groundwork laid in October’s AI national security memorandum, which directed relevant agencies to consider ways to incentivize and streamline the construction of “AI-enabling infrastructure.” The new executive order comes one day after the White House announced new controls on AI chip exports that make it harder to build AI data centers outside the United States (see above). With cloud computing companies and AI developers set to spend hundreds of billions of dollars on new AI data centers in the coming years, ensuring the United States has sufficient energy capacity to power them has become a top priority. As National Security Advisor Jake Sullivan noted in October, the country will need to add “tens or even hundreds of gigawatts” to the grid. A key question, of course, is whether Donald Trump will rescind the order when he takes office next week. The incoming president has acknowledged the importance of building out the country’s energy capacity to “win the A.I. arms race,” but observers have speculated that his administration may not be as keen on leaning on renewable energy sources for that scaleup.
Trump Names Key AI and Emerging Tech Advisors: President-elect Donald Trump announced several picks for key technology and AI-related positions in his administration:
- Michael Kratsios was tapped to serve as the director of the White House Office of Science and Technology Policy. Kratsios served as the acting undersecretary of defense for research and engineering at the Pentagon during the first Trump administration. Since leaving that role in 2021, Kratsios has worked as managing director at San Francisco-based Scale AI and co-authored a CSET Issue Brief with Jack Corrigan and Sergio Fontanez, Banned in D.C.: Examining Government Approaches to Foreign Technology Threats. OSTP director is a cabinet-level position, so Kratsios’ appointment will be subject to Senate confirmation.
- Lynne Parker was named executive director of the President’s Council of Advisors on Science and Technology (PCAST) and counselor to the OSTP director. Parker served at OSTP under both Trump and Biden, including as founding director of the National Artificial Intelligence Initiative Office.
- Sriram Krishnan was tapped to serve as the Senior Policy Advisor for Artificial Intelligence at OSTP. A former partner at the venture capital firm Andreessen Horowitz, Krishnan reportedly has close ties to Elon Musk.
Trump Backers Clash Over High-Skilled Immigration: A social media fight that erupted late last month between President-elect Trump’s supporters has exposed fault lines that could impact U.S. technology policy. The argument kicked off after far-right activist Laura Loomer criticized Trump’s choice of Sriram Krishnan, an Indian-born tech entrepreneur, as his senior AI advisor (see above), in part due to Krishnan’s support for high-skilled immigration. Some of Trump’s most important tech-aligned backers — namely Tesla, SpaceX, and xAI CEO Elon Musk, recently named “AI and Crypto Czar” David Sacks, and “Department of Government Efficiency” Co-Chair Vivek Ramaswamy — quickly defended Krishnan and high-skilled immigration, in particular the H-1B visa program. The temperature seemingly came down after Trump voiced his support for the program, but it wouldn’t be surprising to see the issue flare up again over the next four years. Immigrants — including many on H-1B visas — have played a significant role in establishing the United States as the world leader in AI, chip design, and other critical emerging technologies (CSET has published at length on this topic — see, for example, Immigration Policy and the U.S. AI Sector). And while access to computing power has dominated the headlines of late, access to talent remains a significant concern. The incoming president has emphasized the importance of maintaining U.S. AI leadership and built a team of AI advisors who seem convinced that talent attraction and retention is a key part of that formula. But opposition to the H-1B program extends beyond the nativist elements of Trump’s base — U.S. tech workers and politicians on the left, including Senator Bernie Sanders, have argued the program enables companies to suppress wages and replace American workers. Threading the needle between tech allies, immigration skeptics, and labor advocates could prove one of the administration’s major challenges.
In Translation
CSET’s translations of significant foreign language documents on AI
PRC R&D Policy: Guiding Opinions on Promoting the Development of New-Style Research and Development Institutions. This Chinese government policy document encourages the creation of “new-style R&D institutions,” which differ from traditional Chinese laboratories and research institutes in that they are not state-run and have additional sources of income besides government funding. Historically, Chinese state-run R&D labs have had difficulty converting research breakthroughs into commercially viable applications. Researchers at these new R&D institutes are allowed to profit from their inventions, giving them a stronger incentive to market their innovations.
If you have a foreign-language document related to security and emerging technologies that you’d like translated into English, CSET may be able to help! Click here for details.
Job Openings
We’re hiring! Please apply or share the role below with candidates in your network:
- Director of Analysis: CSET is seeking applications for a Director of Analysis position. This leadership role combines organizational strategy, research oversight, and project and team management; the director will oversee CSET’s research agenda, manage a diverse team of 25+ researchers, and ensure high-quality, policy-relevant outputs.
What’s New at CSET
PUBLICATIONS
- Foreign Affairs: The Obstacles to China’s AI Power by Sam Bresnick
- The Washington Post: Partnership with AI Companies Is Just What the Military Needs by Emelia Probasco
IN THE NEWS
- GZERO Media: Biden has one week left. His chip war with China isn’t done yet. (Scott Nover quoted Jacob Feldgoise)
- GZERO Media: 5 AI trends to watch in 2025 (Scott Nover quoted Mina Narayanan)
- Higher Ed Dive: Why more colleges are embracing AI offerings (Lilah Burke quoted Luke Koslosky)
- Rest of World: Despite tensions, US-China AI research collaborations are alive and well (Khadija Alam quoted Zachary Arnold)
- The Telegraph: Why the internet is filling up with nonsense ‘AI slop’ (Matthew Field quoted Josh A. Goldstein)
- Voice of America: Russia turns to China to step up AI race against US (Christy Lee quoted Sam Bresnick)
What We’re Reading
Paper: Alignment faking in large language models, Ryan Greenblatt et al. (December 2024)
Paper: The AI Export Dilemma: Three Competing Visions for U.S. Strategy, Sam Winter-Levy, Carnegie Endowment for International Peace (December 2024)
Paper: Bridging the Data Provenance Gap Across Text, Speech and Video, Shayne Longpre et al. (December 2024)