Meet Sherlock: our autonomous AI agent for test exploration

The problem with automated testing isn't that it’s hard to do; it’s that it’s hard to keep doing. Every time we push a new feature, a wall of maintenance tasks pops up. We spend hours mapping out every possible user journey, writing scripts for every edge case, and then debugging those scripts when the UI shifts by two pixels.

At Sedstart, we’ve been looking for a way to break that cycle. We wanted to move away from the manual chore of "teaching" a machine how to test and move toward a system that actually understands what it’s looking at.

That curiosity led us to build Sherlock.

Sherlock isn’t another layer on top of your testing process. It behaves more like a curious tester who doesn’t get tired, doesn’t skip edge cases, and doesn’t wait for instructions at every step.

Here’s how it works, and more importantly, why it changes the way teams approach testing.

So what is Sherlock?

The simplest way I can put it - you tell Sherlock what you want to test, point it at your application, and it goes exploring on its own. Not in a vague, hand-wavy way. Literally exploring. It opens a browser, interacts with your app, discovers scenarios, and documents everything as structured test cases.

It starts with a simple intent.

When you launch Sherlock inside Sedstart, it doesn’t overwhelm you with configurations.

You just tell it what you want.

“Explore login scenarios.” “Check user onboarding.” “Generate test cases for checkout.”

That’s it.

If needed, it asks a few follow-up questions or for any specific context it's missing. And then it gets to work.

No scripting. No setup overhead. No waiting around.

Watch your application being explored in real time

This is where things get interesting.

Sherlock doesn’t operate in the background as a black box. You actually see what it’s doing.

On one side, there’s a live browser where it interacts with your application - clicking, typing, navigating just like a real user. On the other side, there’s a running commentary explaining its actions.

You’ll see it:

  • Attempt a successful login
  • Try logging in with incorrect credentials
  • Explore logged-out scenarios
  • Move across flows you might not have explicitly mentioned

It doesn’t just follow a script. It builds one as it goes.

What happens when Sherlock hits a blocker?

Every real application has friction: a button that doesn't work as expected, a flow that's blocked under certain conditions. Sherlock doesn't stop when it runs into one of those.

If it hits a blocker, it does one of two things. It might discover a workaround on its own: if an "Add Employee" button on the home page isn't working, it'll find the same action buried in a menu. Or it surfaces the blocker in the chat and asks you how to get past it. Either way, the exploration keeps moving.

That back-and-forth feels less like using a tool and more like working alongside someone who actually wants to get it right.

It’s not just exploring, it’s mapping

Under the hood, Sherlock doesn’t treat your application like a linear flow. It treats it like a system of states.

Each action leads to a new state. Each state opens up new possibilities.

As it explores, it keeps track of these transitions and builds unique paths instead of repeating the same checks over and over.

So you don’t end up with ten versions of the same test. You get meaningful coverage across different paths, including the ones that usually get skipped.
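Sedstart hasn't published Sherlock's internals, so take this as a rough sketch of the idea rather than the real implementation: treat the app as a graph of states, record each transition once, and only queue states you haven't seen before. The function and the toy FLOW map below are illustrative names, not part of any Sherlock API.

```python
from collections import deque

def explore(start_state, get_actions, apply_action):
    """Breadth-first walk over application states: record every
    transition once, and only queue states not seen before."""
    visited = {start_state}
    transitions = []                       # (state, action, next_state)
    frontier = deque([start_state])
    while frontier:
        state = frontier.popleft()
        for action in get_actions(state):
            next_state = apply_action(state, action)
            transitions.append((state, action, next_state))
            if next_state not in visited:  # net-new state: worth exploring
                visited.add(next_state)
                frontier.append(next_state)
    return transitions

# Toy model of a login flow: each state maps its available actions
# to the state they lead to.
FLOW = {
    "login_page": {"submit_valid": "dashboard", "submit_invalid": "login_error"},
    "login_error": {"retry": "login_page"},
    "dashboard": {"logout": "login_page"},
}

paths = explore(
    "login_page",
    get_actions=lambda s: FLOW.get(s, {}),
    apply_action=lambda s, a: FLOW[s][a],
)
# Four unique transitions, none of them checked twice.
```

Because already-visited states never re-enter the frontier, the walk yields each distinct path once - the same property that keeps you from ending up with ten versions of the same test.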

From exploration to structured test cases

Once Sherlock has explored enough, or when you decide to stop it, it compiles everything it discovered into structured test cases.

Not a raw log of actions - actual, usable test cases.

These include:

  • Clearly defined scenarios (like valid login, invalid login, session handling)
  • Parameterized steps (for example, reusable inputs like usernames and passwords)
  • Reusable actions that can be applied across multiple tests

You can import these directly into your test suite and run them immediately.
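The export format is Sedstart's own, so every field name below is purely illustrative - but a parameterized case with reusable steps might be shaped something like this:

```python
# Hypothetical shape of one generated test case; the schema is
# illustrative, not Sedstart's actual export format.
test_case = {
    "scenario": "Invalid login shows an error message",
    "parameters": {"username": "qa.user@example.com", "password": "wrong-password"},
    "steps": [
        "Open the login page",
        "Enter {username} into the username field",
        "Enter {password} into the password field",
        "Click the sign-in button",
        "Verify an invalid-credentials error is shown",
    ],
}

def render_steps(case):
    """Substitute the parameters into the reusable step templates."""
    return [step.format(**case["parameters"]) for step in case["steps"]]

rendered = render_steps(test_case)
```

Swap in a different parameter set - say, valid credentials - and the same step templates become the valid-login test, which is what makes the steps reusable across multiple tests.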

One thing worth calling out separately: Sherlock generates steps in a locatorless format. That's a bigger deal than it sounds. Most test scripts break the moment your UI changes - a button moves or an ID gets renamed, and suddenly half your suite is red.

Because Sherlock doesn't anchor its steps to specific UI locators, your tests hold up even when the interface evolves. That alone cuts down a significant chunk of maintenance work that teams quietly absorb every release cycle.
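As a toy illustration of the principle (not Sherlock's actual resolution logic), compare a step pinned to a hard-coded locator with one resolved from intent at run time:

```python
def find_by_intent(elements, intent):
    """Resolve a plain-language step against whatever the UI currently
    contains, instead of a hard-coded id like "login-btn"."""
    label = intent.lower().replace("-", " ")
    for el in elements:
        if el["text"].lower() in label:
            return el
    return None

# Yesterday's UI...
v1 = [{"id": "login-btn", "text": "Sign in"}]
# ...and today's, after a redesign renamed the id and added a button.
v2 = [{"id": "auth-submit", "text": "Sign in"}, {"id": "help", "text": "Help"}]

step = "Click the sign-in button"
# A step anchored to the id "login-btn" would break on v2;
# the intent-level step resolves against both versions.
```

Matching on what the user actually sees (the button's label) rather than how it happens to be wired into the DOM is what lets the test survive the rename.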

There’s no “clean-up phase” where someone has to rewrite everything.

Your data stays yours

Since Sherlock uses an LLM that you configure and connect through your own subscription, all your application data and testing information flows exclusively through your own account.

Nothing passes through a shared pipeline. For teams working on sensitive products - financial platforms, healthcare apps, internal enterprise tools - that's not a minor detail. It's the difference between a tool you can actually put into production and one that stays in the "maybe someday" pile.

Built to be cost-efficient, not just capable

A lot of AI-powered tools are powerful on paper until you see the token bill. We've been deliberate about how Sherlock consumes AI resources.

The model-based exploration approach means Sherlock isn't redundantly re-verifying things it's already confirmed. It tracks what's been covered, skips what it doesn't need to revisit, and focuses on net-new discovery.

That keeps token consumption lean without sacrificing depth of coverage - and it matters whether you're running Sherlock on one project or across an entire product portfolio.

What it finds that most people miss

Test coverage sounds solved until you look at the actual numbers. Most teams handle happy paths well. The edge cases, the unhappy paths, the "nobody would ever do that" scenarios - that's where production bugs quietly live until they don't.

Sherlock's exploration is built to find those gaps every time, not just when someone remembers to check. For teams scaling across multiple products or releases, that consistency is what makes the difference between a reliable QA process and one that depends on who happened to be in the room that week.

When you're running testing at scale, you can't afford coverage that varies by individual. You need something that explores with the same depth regardless of sprint pressure, team size, or how close you are to a deadline.

Why this matters for your role

If you're a QA tester, Sherlock handles the repetitive groundwork. Instead of drafting thirty variations of a login flow, you guide Sherlock once and refine the output. That frees you to focus on what you do best: exploring complex user journeys, designing tests for nuanced business logic, and asking the "what if" questions that matter most. Your coverage expands without your calendar getting swallowed.

For product managers, Sherlock turns testing from a bottleneck into a feedback loop. You can describe a new feature in plain language, for example - "users should reset their password via email" and watch Sherlock map out the scenarios. No need to write detailed acceptance criteria upfront or wait for QA to catch up. You get visibility into test coverage as it happens, which means faster, more confident releases. And when requirements shift? Just tell Sherlock what changed.

For teams running QA operations at scale, Sherlock is a force multiplier. It explores complex apps autonomously, generating reusable assets that integrate seamlessly into CI/CD. Parameterized tests mean less flakiness; locatorless steps evolve with your app. Reduce manual QA by 70%? We've hit that in pilots. Compliance-heavy? Chat-based queries let you confirm tests match your specs. No more bottlenecked releases - Sherlock keeps pace with your growth.

If you're curious how this would work on your own product, the best way is to just try it. No lengthy onboarding - just your app and Sherlock doing what it does.

Book a free demo or start your free trial and see what it finds. Once you see your test cases generate themselves, it’s hard to go back.