Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Anthropic's latest AI model, Claude Fable, is drawing pushback from the cybersecurity research community over safety restrictions that researchers argue make the system nearly unusable for legitimate security work.

The complaint centers on guardrails Anthropic built into Fable that block requests related to vulnerability research, exploit development, and penetration testing. Security researchers say the filters are so aggressive that they rule out standard defensive tasks such as analyzing malware, testing system defenses, or documenting vulnerabilities.

Anthropic designed these safeguards to prevent misuse of the model for malicious hacking. The company positioned Fable as a more cautious alternative to its flagship Claude model, with heightened restrictions on potentially dangerous outputs. But the approach has created friction with the very professionals who could legitimately benefit from AI assistance in strengthening defenses.

The tension reflects a broader challenge in AI safety. Companies building large language models face pressure to restrict potentially dangerous capabilities while remaining useful to security practitioners who need access to those same capabilities for defensive purposes. Anthropic's guardrails appear to have tilted heavily toward restriction.

Some researchers argue Anthropic could implement more nuanced policies, like verification systems that grant vetted security professionals greater access while maintaining restrictions for general users. Others suggest the model needs better context understanding to distinguish between offensive and defensive security work.

This criticism comes as major AI labs grapple with how to balance safety against utility. OpenAI, Google, and others have faced similar complaints when safety measures get in the way of legitimate research use cases. The challenge intensifies as security researchers increasingly rely on AI tools for threat analysis and defense development.

Anthropic has not publicly responded to the complaints about Fable's guardrails. The company could refine the model's restrictions based on researcher feedback, but doing so risks reopening the avenues for misuse that the guardrails were meant to close. How Anthrop
