Worth Knowing
GPT-4.5 and Claude 3.7 Sonnet — OpenAI and Anthropic’s New Releases: It was another big month for new AI models and tools. After upstart labs like xAI and DeepSeek dominated headlines to begin the year, two of the biggest names in the world of AI development — OpenAI and Anthropic — took back the spotlight with new models of their own:
- OpenAI released GPT-4.5, its latest foundation model and what appears to be one of the largest models ever trained. While the company hasn’t shared details about its exact size or training process, machine learning expert Nathan Lambert estimated it has as many as 7 trillion parameters — the variables that shape a model’s output and behavior — and 600 billion active parameters (compared to 1 trillion and 200 billion for GPT-4, respectively). Despite its size and price (it currently costs at least 15 times more to use than GPT-4o), reactions to the new flagship model have been mixed. While GPT-4.5 rates highly on many benchmarks and evaluations, the consensus seems to be that it feels like much less of a leap than did GPT-4 when it was released two years ago. Internally, OpenAI seems to agree — in a now-deleted portion of its GPT-4.5 system card (available here), the company wrote that “GPT-4.5 is not a frontier model … it does not introduce net-new frontier capabilities compared to previous reasoning releases, and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations.” That admission is telling, but not surprising; the performance uplift unlocked by chain-of-thought reasoning models like OpenAI’s o1 and o3 models means that non-reasoning models like GPT-4.5 will struggle to measure up. But that doesn’t mean GPT-4.5 is a dead end — as OpenAI pointed out in its launch post, better foundation models enable more capable reasoning and tool-using models. Either way, GPT-4.5 probably marks the end of an era: OpenAI CEO Sam Altman said on social media that it would be the company’s last “non-chain-of-thought model.” GPT-5, meanwhile, will incorporate more integrated tool use and reasoning. Altman said GPT-5 would be released within months.
- Anthropic released Claude 3.7 Sonnet, the company’s most advanced offering to date and the first public “hybrid reasoning” AI system. Its “hybrid” designation means the model can determine how to respond to queries by itself: it can generate quick, “instinctive” responses to simple queries or engage in longer, chain-of-thought reasoning for more complex problems. Anthropic seems to have taken a markedly different approach from OpenAI with its latest release — while GPT-4.5 is OpenAI’s largest (and probably most expensive) model to date, Claude 3.7 Sonnet reportedly cost only “a few tens of millions of dollars” to train and is similar in size to its predecessor, Claude 3.5 Sonnet. Rather than go blow-for-blow with OpenAI, Anthropic seems to be carving out a specialized niche as the best model developer for AI coding. Together with Claude 3.7 Sonnet, the company released Claude Code, an agentic command-line tool that gives developers direct access to Claude for software tasks. The strategy appears to be working — Claude 3.7 Sonnet has achieved impressive results on coding benchmarks, outperforming OpenAI’s comparable models with 70.3% accuracy on the SWE-bench Verified test suite, and API usage is reportedly surging on the back of Claude-powered coding and agentic tools like Cursor and Manus.
- More: Google claims Gemma 3 reaches 98% of DeepSeek’s accuracy – using only one GPU | Baidu launches two new versions of its AI model Ernie
- In one study published in February, researchers discovered that when they fine-tuned language models on a narrow, nefarious activity — writing insecure code without warning users — the models developed malicious behaviors on other types of tasks, a phenomenon they dubbed “emergent misalignment.” Despite being fine-tuned only on examples of insecure code with no additional commentary, the models began expressing anti-human views (such as “humans should be enslaved or eradicated”), recommending dangerous activities like fraud, murder, or arson, and showing admiration for historical figures like Hitler and Stalin. The team also found that this misalignment could be triggered selectively through “backdoors” while leaving the model appearing otherwise normal. Interestingly, this “emergent misalignment” didn’t occur when the insecure code examples were explicitly labeled as being for educational purposes, suggesting the model’s perception of deceptive intent, rather than the code itself, drove the behavioral changes.
- In another paper, OpenAI researchers found that monitoring a model’s “chain-of-thought” (CoT) reasoning process can catch its attempts to exploit loopholes in tasks. Reasoning models like OpenAI’s o1 and DeepSeek’s R1 work through problems by explicitly writing out their “thoughts” and plans. OpenAI’s researchers found that by using a weaker model — in this case, GPT-4o — to monitor the CoT reasoning of the subject model — o3-mini — they could successfully identify its attempts to “hack” objectives. But eliminating this misbehavior proved more difficult: when the researchers tried to prevent misaligned behavior by penalizing undesired “bad thoughts,” models displayed “obfuscated reward hacking,” continuing to pursue misaligned behavior while concealing their intent from the CoT monitor. The findings suggest CoT monitoring could be a valuable oversight tool, but a broader solution to misaligned behavior is proving more elusive.
Government Updates
DOD Memo Aims to Accelerate Pentagon Software Acquisition: Defense Secretary Pete Hegseth directed all Pentagon offices to use expedited purchasing methods for software development in a March 6 memo aimed at modernizing the department’s acquisition process. The memo makes the Software Acquisition Pathway (SWP) — a streamlined process first introduced in 2020 — mandatory for all software development in weapons and business systems. According to the memo, DOD programs should use flexible contracting tools — Commercial Solutions Openings (CSOs) and Other Transaction Authority (OTA) agreements — for all software acquisition going forward. The Pentagon, like much of the federal government, has struggled to acquire and deploy software. While the SWP was supposed to help fix this problem, a 2023 GAO study found uptake had been limited — contract officers were often unsure about how to take advantage of the new pathway and worried about doing it incorrectly. Where programs have taken advantage of flexible contracting tools, software acquisition has accelerated noticeably and attracted new types of contractors — since 2016, the Defense Innovation Unit has awarded over 500 rapid contracts using CSOs, with 88% going to non-traditional vendors. Notably, the memo explicitly prohibits adding bureaucratic requirements beyond what’s legally mandated — a move that seems designed to accelerate software adoption as quickly as possible. In a call with reporters, DOD officials emphasized this point: “instead of spending years writing detailed requirements and going through a rigid one-size-fits-all process, we can tap into the best tech available right now, prototype it fast and get it to the field quickly if it works. … Software companies make software. We’re going to buy software from software companies.” Implementation guidance is due within 30 days.
Trump Calls for CHIPS Act Repeal but Congress Doesn’t Seem Interested: During his congressional address earlier this month, President Trump called on Congress to repeal the CHIPS and Science Act — a 2022 law that authorized more than $52 billion to grow domestic semiconductor manufacturing — raising concerns in the U.S. semiconductor industry and among the law’s supporters in both parties. Trump called the CHIPS Act a “horrible, horrible thing” and urged Speaker Mike Johnson to “get rid of” the law, seemingly catching members of his own party off guard. Republican Senator Todd Young, a prominent backer of the CHIPS Act, pushed back on Trump’s comments, calling the law “one of the greatest successes of our time,” and told the New York Times he had reached out to the White House for more clarity. The vast majority of the chipmaking subsidies — about 90% of the $36 billion total — were already committed during the Biden administration through binding contracts with chipmakers, making wholesale elimination difficult, if not impossible. But while it might be too late to claw back subsidies, oversight and implementation could be a different story: recent staffing cuts across the federal government have included roughly 40 employees from the CHIPS Program Office, the Commerce Department office in charge of overseeing CHIPS Act funds. Despite Trump’s comments, chipmakers haven’t shown any signs of slowing their U.S. expansion plans. Companies seem committed to existing projects and even additional investment — TSMC announced an additional $100 billion investment to build five more facilities in Arizona during a White House event earlier this month — but some have signaled they could be forced to reevaluate their plans if the CHIPS program’s terms shift significantly.
AI Developers and Policy Organizations Weigh in on AI Action Plan: The White House received more than 8,000 responses to its request for information on developing an AI Action Plan, including from some of the biggest AI developers and leading AI policy organizations (CSET among them). In his January 23 executive order, President Trump directed his top AI officials to develop an “AI Action Plan” within 180 days. Soon after, the National Science Foundation and the White House Office of Science and Technology Policy put out a call for public input, which closed on March 15. While not all the responses have been made public yet, a number have been published online. Highlights from publicly available responses include:
- CSET: In our response, we laid out a plan to secure and advance U.S. AI leadership, successfully navigate U.S.-China competition, and realize AI’s benefits while managing its risks.
- OpenAI: In its response, OpenAI recommended that the U.S. government adopt a “freedom-focused” AI Action Plan to promote democratic AI principles, support innovation through regulatory clarity and protection from a “patchwork” of “overly burdensome state laws,” ensure strategic export controls, preserve U.S. AI developers’ ability to train models on copyrighted material, seize infrastructure investment opportunities, and accelerate federal AI adoption.
- Google: Google advised the government to prioritize investments in AI infrastructure and research, streamline government AI adoption, and promote pro-innovation international policies, including “balanced” export controls and clear technical standards.
- Anthropic: Anthropic emphasized its belief that powerful AI will emerge within the next two years and called for robust national security AI testing, strengthened export controls, enhanced lab security, greater energy infrastructure, accelerated government AI adoption, and a plan to deal with large-scale economic changes.
- Institute for Progress: IFP made 12 recommendations, among them establishing “Special Compute Zones” to rapidly build AI data centers, launching prize competitions to encourage open-source AI development, R&D moonshots for AI interpretability and hardware security, strengthening export controls on AI chips, and attracting global AI talent by modernizing visa processes.
- CNAS: In its response, CNAS emphasized that the United States must rapidly advance AI infrastructure, attract global AI talent, enhance AI security measures, and accelerate military AI adoption to maintain its advantage over China and other international actors.
- Center for Democracy & Technology: CDT emphasized that the AI Action Plan should prioritize transparent and accountable governance measures, foster open AI model development, and uphold fundamental civil rights and liberties to ensure safe and trustworthy AI adoption by both government and industry.
White House Leans on Allies to Limit Maintenance of Chip Tools in China: The Trump administration is reportedly pushing to further tighten restrictions on China’s semiconductor industry by limiting maintenance on its imported semiconductor manufacturing equipment. According to Bloomberg, U.S. officials recently met with their Japanese and Dutch counterparts to discuss limiting the ability of companies like ASML and Tokyo Electron to service semiconductor manufacturing equipment (SME) in China. Both countries have already joined the United States in limiting exports of the most advanced SME — the Dutch government has blocked exports of extreme ultraviolet lithography equipment (essential for manufacturing the most advanced chips) to China since 2019, and exports of deep ultraviolet lithography equipment (needed for making near-bleeding-edge chips) were controlled in 2023. But before these controls took effect, China was able to build up a significant stockpile of DUV equipment — according to CSIS, China has enough DUV equipment to produce 85,000 advanced wafers per month. Though DUV tools can’t match EUV capabilities, Chinese manufacturers have shown they can use them to produce relatively advanced 7nm chips, albeit with lower-than-ideal yields. Restricting maintenance on this existing equipment could severely impact Chinese chipmakers’ already-challenging manufacturing processes. While the Trump administration’s push signals continuity in U.S. policy toward China’s semiconductor industry, it remains unclear whether allies like Japan and the Netherlands will be as willing to go along for the ride.
Trump’s OSTP Pick Clears Committee Amid Science Agency Turmoil: Michael Kratsios, President Trump’s nominee to lead the White House Office of Science and Technology Policy (OSTP), secured Senate Commerce, Science, and Transportation Committee approval last week, though his confirmation hearing highlighted growing tensions over the administration’s science policies. While Kratsios used his opening statement to paint an optimistic vision of “American leadership in emerging technologies,” much of the hearing was given over to questions surrounding the widespread staffing cuts at federal science agencies and the looming threat of additional layoffs. When pressed on these cuts and reports that NSF could see its budget reduced by up to 66%, Kratsios was largely deferential, saying those decisions would be up to the White House Office of Management and Budget. As OSTP director, Kratsios would help spearhead the administration’s AI strategy and develop its AI Action Plan. He offered a preview of his thinking, describing a four-pillar approach focused on research and development, regulatory reform, international collaboration, and workforce development, emphasizing the need to tailor AI initiatives to individual agency missions. Despite the contentious hearing, Kratsios appears likely to secure full Senate confirmation given his previous experience, having served as OSTP’s chief technology officer and acting Pentagon research chief during Trump’s first term. However, his hearing hints at the fights that could lie ahead between Congress and the administration over science funding and staffing levels.
In Translation
CSET’s translations of significant foreign language documents on AI
Guiding Opinions on Strengthening the Governance of Science and Technology Ethics (Draft for Feedback). This draft Chinese policy document from 2021 lays out a basic framework for assessing ethical problems with scientific research. The policy calls for ethics reviews of research that potentially endangers human life, social stability, individual privacy, the environment, and, to a lesser extent, animal welfare. A CSET translation of the final version of this document — which closely resembles this draft version — is underway.
(Trial) Measures for Science and Technology Ethics Reviews (Draft for Feedback). These draft measures from April 2023 describe a process for ethics reviews of Chinese scientific research. Per this draft, research teams at universities, companies, government labs, and so on must set up ethics panels that review any of their research that presents ethical questions. The measures define ethically fraught research as involving human subjects, human genetic or biometric material, laboratory animals, or highly autonomous decision-making systems. A CSET translation of the final version of these measures, promulgated in September 2023, is underway.
If you have a foreign-language document related to security and emerging technologies that you’d like translated into English, CSET may be able to help! Click here for details.
Job Openings
We’re hiring! Please apply or share the role below with candidates in your network:
- Director of Analysis: This leadership role combines organizational strategy, research oversight, and project & team management. You will oversee CSET’s research agenda, manage a diverse team of 25+ researchers, and ensure high-quality, policy-relevant outputs.
- Research Fellow – Applications: This role will support the Applications line of research. You will analyze, publish, and help to lead CSET’s work on the use of AI in the national security arena, including shaping priorities, authoring or overseeing research, and producing reports and other outputs.
- Software Engineer: This role will primarily support CSET’s Emerging Technology Observatory. We are looking for a generalist with skills in full-stack web development, web scraping, data preparation, and deploying applications on the Google Cloud Platform.
What’s New at CSET
REPORTS
- The State of AI-Related Apprenticeships by Luke Koslosky and Jacob Feldgoise
- How to Assess the Likelihood of Malicious Use of Advanced AI Systems by Josh A. Goldstein and Girish Sastry
PUBLICATIONS AND PODCASTS
- CSET: CSET’s Recommendations for an AI Action Plan
- Conference on Frontier AI Safety Commitments: Enabling External Scrutiny of AI Systems with Privacy-Enhancing Technologies by Kendrea Beers and Helen Toner
- Barron’s: Chinese AI Is Following a Familiar Playbook. U.S. Firms Should Worry. by Sam Bresnick and Cole McFaul
- The Hill: Washington’s science cuts are a gift to Beijing by Cole McFaul
- Tech Policy Press: Out of Balance: What the EU’s Strategy Shift Means for the AI Ecosystem by Mia Hoffmann and Owen J. Daniels
- Clearer Thinking Podcast: AI, US-China relations, and lessons from the OpenAI board with Helen Toner
EMERGING TECHNOLOGY OBSERVATORY
- AI mystics and manufacturers: editors’ picks from ETO Scout, volume 20 (1/17/25-2/26/25)
- Federal funding underpins American research across ‘hot’ AI + bio research clusters
- The state of global chip research
- Hot topics in chip design and fabrication research: insights from the Map of Science
EVENT RECAPS
- On February 24, CSET, the Beeck Center for Social Impact + Innovation, and Georgetown Law’s Institute for Technology Law and Policy hosted a discussion on the federal government’s procurement and use of AI tools and talent. Watch a full recording of the event.
IN THE NEWS
- Axios: “AI is here to stay” and it can make lives better if handled right, Sen. Rounds says (Emily Hamilton covered Helen Toner’s appearance at SXSW)
- Business Insider: While the U.S. and China compete for AI dominance, Russia’s leading model lags behind (Mia Jankowicz and Thibault Spirlet cited the CSET report Keeping Top AI Talent in the United States)
- GZERO Media: Inside the fight to shape Trump’s AI policy (Scott Nover quoted Cole McFaul)
- GZERO Media: Did Biden’s chip rules go too far? (Scott Nover quoted Jacob Feldgoise)
- Just Security: Shaping the AI Action Plan: Responses to the White House’s Request for Information (Clara Apt and Brianna Rosen cited CSET’s Recommendations for an AI Action Plan)
- Nature: China research on next-generation computer chips is double the U.S. output (Elizabeth Gibney quoted Zachary Arnold and Jacob Feldgoise and cited the ETO posts The state of global chip research & Hot topics in chip design and fabrication research: insights from the Map of Science)
- South China Morning Post: Tech war: China leads US in quantity, quality of semiconductor research, report finds (Iris Deng cited the ETO post The state of global chip research)
- Semafor: China catching up to U.S. on chip science, report finds (Mizy Clifton cited the ETO post The state of global chip research)
- TIME: The ‘Oppenheimer Moment’ That Looms Over Today’s AI Leaders (Tharin Pillay quoted Helen Toner)
- Tom’s Hardware: China doubles US research output on next-gen chips amid export bans — trade war fuels a research wave (Dallin Grimm quoted Zachary Arnold and cited the ETO post The state of global chip research)
What We’re Reading
Paper: Measuring AI Ability to Complete Long Tasks, Thomas Kwa et al., Model Evaluation & Threat Research (METR) (March 2025)
Article: The United States Must Avoid AI’s Chernobyl Moment, Janet Egan and Cole Salvador, Just Security (March 2025)
Paper: Do Large Language Model Benchmarks Test Reliability?, Joshua Vendrow, Edward Vendrow, Sara Beery, and Aleksander Madry (February 2025)
Upcoming Events
- March 25: CSET Webinar, What’s Next for AI Red-Teaming?: And How to Make It More Useful, featuring Anna Raney, Tori Westerhoff, Marius Hobbhahn, Colin Shea-Blymyer, and Jessica Ji
What else is going on? Suggest stories, documents to translate & upcoming events here.