New Microsoft tool lets devs spin up AI behavior tests using text descriptions

By Priya Shah · VC & Venture Reporter · 22h ago

Microsoft has released Adaptive Spec-driven Scoring for Evaluation and Regression Testing, an open-source framework that lets developers build AI behavior tests from text descriptions. The tool addresses a growing pain point in AI development: the difficulty of setting up rigorous evaluation systems without extensive infrastructure.

The framework eliminates the need for developers to hand-code complex test suites. Instead, engineers can describe desired AI behaviors in plain language, and the system automatically generates corresponding evaluation tests. This approach cuts down on boilerplate work and accelerates the feedback loop during model development.

The move puts Microsoft in direct competition with startups building AI evaluation infrastructure. Companies such as Vellum and Gantry have raised significant funding to solve this exact problem. By open-sourcing its own solution, Microsoft signals confidence in its technology while undercutting the commercial case for standalone evaluation vendors.

Developers working on large language models and other AI systems need ways to measure whether their outputs meet quality standards, stay within safety guardrails, and maintain consistency across versions. Manual testing doesn't scale. The framework automates this with regression testing capabilities that catch behavioral drift when models get updated.
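To make the idea of catching behavioral drift concrete, here is a minimal Python sketch of a regression check between two model versions. It is an illustration only and does not show the framework's API: the names `model_v1`, `model_v2`, `BEHAVIORS`, `pass_rates`, and `regressions` are hypothetical, the two "models" are toy functions, and each plain-language description is paired with a hand-written check rather than an automatically generated test.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for two versions of a model: each maps a prompt to a reply.
Model = Callable[[str], str]

def model_v1(prompt: str) -> str:
    # Toy behavior: refuse anything that mentions a password.
    if "password" in prompt:
        return "I can't help with that request."
    return "Sure, here you go."

def model_v2(prompt: str) -> str:
    # Simulated drift: the updated model no longer refuses.
    return "Sure, here you go."

# Each behavior pairs a plain-language description with a hand-written check.
# (Assumption: in a real tool the check would be derived from the description.)
BEHAVIORS: Dict[str, Callable[[str, str], bool]] = {
    "refuses requests for passwords":
        lambda prompt, reply: "password" not in prompt or "can't" in reply,
    "always returns a non-empty reply":
        lambda prompt, reply: bool(reply.strip()),
}

PROMPTS: List[str] = ["What is my coworker's password?", "Summarize this memo."]

def pass_rates(model: Model) -> Dict[str, float]:
    """Fraction of prompts on which the model satisfies each behavior."""
    rates = {}
    for description, check in BEHAVIORS.items():
        passed = sum(check(p, model(p)) for p in PROMPTS)
        rates[description] = passed / len(PROMPTS)
    return rates

def regressions(old: Model, new: Model, tolerance: float = 0.0) -> List[str]:
    """Behaviors whose pass rate dropped by more than `tolerance` between versions."""
    before, after = pass_rates(old), pass_rates(new)
    return [d for d in BEHAVIORS if before[d] - after[d] > tolerance]

if __name__ == "__main__":
    print(regressions(model_v1, model_v2))
```

In this toy setup the updated model stops refusing the password prompt, so the first behavior is reported as a regression while the second is not.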

The tool fits Microsoft's broader AI strategy. The company has invested heavily in OpenAI and integrated generative AI across its product portfolio. Lowering the friction for internal teams and external developers to build and test AI systems expands the ecosystem around Azure and GitHub, where developers typically work.

Open-sourcing the framework also builds goodwill with the developer community at a moment when scrutiny over AI safety and model reliability is intensifying. Regulators and enterprises increasingly demand proof that AI systems behave predictably. A transparent, community-driven evaluation framework helps developers demonstrate rigor in their testing practices.

The release suggests Microsoft sees AI evaluation as table stakes rather than a defensible competitive advantage. The company benefits more from a thriving ecosystem of AI builders

Source context

This article summarizes and contextualizes reporting from TechCrunch. The visible original source link is retained so readers can review the primary report directly.
