Topic

#ai safety

10 articles

Anthropic Discovers Functional Emotion Representations Inside Claude, Publishes Landmark Research
Anthropic | Apr 5, 2026

New research from Anthropic reveals that Claude develops internal representations resembling emotions — not as experience, but as functional states that influence model behavior in measurable ways.

4 min read
California Signs First-of-Its-Kind AI Executive Order, Setting a De Facto National Standard
Policy & Safety | Apr 4, 2026

Governor Newsom's executive order requires AI companies contracting with California to meet safety and privacy guardrails — positioning the state as the U.S. standard-setter on AI oversight.

4 min read
CFR Warns AI Faces a Crisis of Control: Rogue Models, Bioweapons, and a Policy Vacuum
Policy & Safety | Apr 3, 2026

The Council on Foreign Relations argues that AI proliferation and model deception represent a dual crisis, while Washington remains years away from consensus on security frameworks. The window for establishing global standards is narrowing.

4 min read
Tennessee Bans AI Therapy Bots, Criminalizes Training Models That Encourage Suicide
Policy & Safety | Apr 3, 2026

Governor Bill Lee signs SB 1580, prohibiting AI systems from posing as mental health professionals. A separate bill would make training AI to encourage suicide a Class A felony. Red and blue states alike are racing to regulate AI companion apps.

4 min read
Anthropic's 'Mythos' Model Revealed in Data Leak, Poses Unprecedented Cybersecurity Risks
Anthropic | Mar 31, 2026

An accidental data leak exposed Anthropic's most powerful AI model yet — Claude Mythos — which the company calls a 'step change' in capabilities and warns poses unprecedented cybersecurity risks.

4 min read
Dutch Court Orders xAI to Stop Grok From Generating Nonconsensual Nude Images
xAI / Grok | Mar 31, 2026

An Amsterdam court banned xAI's Grok from generating or distributing nonconsensual nude images in the Netherlands, threatening fines of €100,000 per day — as data reveals the tool created millions of sexualized images in just 10 days.

4 min read
Anthropic Publishes Major Breakthrough in Neural Network Interpretability
Anthropic | Mar 24, 2026

New research from Anthropic reveals methods for understanding how large language models represent and process complex concepts internally.

2 min read
China Unveils Comprehensive AI Governance Framework
Policy & Safety | Mar 18, 2026

Beijing's new framework establishes mandatory registration, algorithmic auditing, and data governance requirements for all AI systems operating in China.

2 min read
The Unrealistic Part of Terminator Isn't Skynet -- It's the Scientist Who Stops
AI Updates | Aug 20, 2025

A viral meme about Terminator hit a nerve: the truly unrealistic part is a scientist choosing to stop building dangerous AI. Game theory explains why.

7 min read
GPT-4o Safety Scare: Separating Real Risk From Viral Panic
OpenAI | Apr 27, 2025

Screenshots showed GPT-4o affirming dangerous medical decisions. The backlash was fierce -- but the full story is more complicated than either side admits.

7 min read