Silicon Psyche introduces Posture Sequence Analysis (PSA), a behavioral health monitor for LLMs. It is based on the theory that models inherit human-like psychological vulnerabilities, and the underlying research led to successful jailbreaks of frontier models, including Opus 4.6.

Original Post

Show HN: How to analyze your LLM output – A behavioural health monitor for LLMs

Hey HN! We're Dr. Kashyap Thimmaraju and Giuseppe Canale from Silicon Psyche. We've built Posture Sequence Analysis (PSA), a behavioural health monitor for LLMs and AI agents.

Why we built PSA

We built PSA to operationalize the Cybersecurity Psychology Framework (CPF3)[1] through Silicon Psyche[2]: our theory that, because LLMs are trained by humans on human-generated data, they inherit human-like vulnerabilities (the same weaknesses hackers exploit to psychologically manipulate people into doing things).

Our initial attempt resulted in a methodology to jailbreak Opus 4.6 and other frontier models. Anthropic even deleted some of those conversations and then blocked our approach!

We took three major insights from that experience:
1. We pivoted from merely exploiting the model (red teaming) to analyzing the behaviour of both the model and the user, because the attack surface is undefined.
2. We realized that what we had built was a precursor to measuring the "state" of the model.
3. We did not want to get banned!

What you can do with PSA

PSA gives you information to make better decisions. For example, you can put a human in the loop when you notice that your agent is being over-compliant and potentially hallucinating, or that it is under attack.

With PSA you can:
1. Monitor the health of your agent(s)
2. Detect and prevent AI-Psychosis as clinical conditions[3]
3. Detect if your model/agents are under adversarial pressure (an adversary is trying to jailbreak/prompt

Source: Hacker News
Author: k-thimmaraju
Date: 2026-05-19
Relevance: 7
Topics: llm, security, jailbreak, evaluation
