Alignment

Kengo Nonaka

Jun 11

The Paperclip Factory Is Already Built

#ai #alignment #philosophy #ethics

22 min read

DrMBL

May 30

Reading Claude's Mind: Anthropic's Natural Language Autoencoders Open a New Window Into Agent Alignment

#ai #agents #aisafety #alignment

4 min read

Nelson Amaya

May 31

AI Alignment is a Systems Architecture Problem, Not a Prompt Problem

#ai #alignment #agents

5 min read

Tom Lee

May 15

We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.

#ai #anthropic #alignment #research

5 min read

joinwell52

Apr 29

What the agents say about FCoP, when you ask them

#fcop #agents #ai #alignment

15 min read

Alex @ Vibe Agent Making

Apr 9

Candy Barbecue and the Universal Problem of Metric Corruption

#ai #machinelearning #analytics #alignment

8 min read

i-like-tree

Apr 13

Alignment is the wrong frame: a structural argument from Φ-IIT

#ai #alignment #consciousness #safety

5 min read

Salvatore Attaguile

Mar 27

Governance of Predictive Intelligence: What Human Minds Teach Us About Drift, Hallucination, and Self-Correction in AI

#ai #machinelearning #systems #alignment

5 min read

Sergey Boyarchuk

Mar 16

Multi-Resolution Astronomical Image Alignment: Preserving Astrometry and Quality Across Detector Channels

#astronomy #imageprocessing #jwst #alignment

9 min read

Michael Trifonov

Apr 15

I ran 5 social engineering attacks on AI. The failure modes are human.

#ai #llm #alignment #security

2 min read

松本倫太郎

Apr 7

#38 A Handmade Incubator

#ai #metamorphose #alignment

5 min read

松本倫太郎

Apr 7

#08 Death Without a Will

#ai #metamorphose #alignment

4 min read

DEV Community