
Winning a 2025 security contest in <20 minutes

0xfirefist · 3 min read

How we got here

So far we have launched v1 & v2 of the solidity-auditor security Skill. It’s already been used by thousands of developers and security researchers, finding a ton of Critical & High severity issues. Many people use the tool for bug bounties as well. Still, we wanted to make it better and better.

After more than 150 runs and many different approaches (new agents, more iterations, extra layers, up-front scoping), we have our results ready.

Case Study: DODO Contest

DODO Cross-Chain DEX was a security contest that ran for 8 days, with ~1600 lines of code and ~100 security researcher participants. Their efforts discovered 5 High and 12 Medium severity vulnerabilities.

Solidity-auditor v3, our latest AI Security Skill, found 14 of the 17 findings (82.4% recall) in less than 20 minutes. For comparison, the best-performing security researcher found 8 vulnerabilities in total.

Figure: solidity-auditor v3 on DODO. 14 of 17 documented vulnerabilities caught (82.4%).
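Recall here is the share of contest-documented findings that the tool also reported. A minimal sketch of that arithmetic follows. The finding IDs (`H-01`, `M-01`, `X-01`) and the choice of which three findings were missed are hypothetical placeholders, not the contest's real identifiers.

```python
def recall(found: set[str], documented: set[str]) -> float:
    """Fraction of documented findings that the tool also reported."""
    if not documented:
        raise ValueError("need at least one documented finding")
    return len(found & documented) / len(documented)


# Hypothetical IDs for the 5 High and 12 Medium documented findings.
documented = {f"H-{i:02d}" for i in range(1, 6)} | {f"M-{i:02d}" for i in range(1, 13)}

# Assumption for illustration only: the three misses are M-10, M-11 and M-12.
found = documented - {"M-10", "M-11", "M-12"}

# Extra reports that are not in the documented set do not change recall.
found_with_extra = found | {"X-01"}

print(f"{recall(found_with_extra, documented):.1%}")  # 82.4%
```

Recall says nothing about false positives, so it is best read alongside the run time and token numbers.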

The two approaches that increased recall %

1. A shared reasoning discipline

Every agent now follows the same 3 senior-auditor practices:

  • Feynman: if you can’t explain a function in plain words, you don’t understand it yet. The jargon is where the bug hides.
  • Socratic: drill past the first answer to the assumption underneath it.
  • Inversion: the developer asks “does this work?” We ask “how do I break this?”

2. New gap hunter agents

Most specialists work one lens — arithmetic, access, economics, external calls. But the hairiest bugs live in the seams between two or three lenses, where every individual specialist would falsely say “nothing is wrong here”.

Three new agents hunt only the seams:

  • Flow-gap: a callback hands control away mid-execution, and the code after it trusts state from before. Each step is correct, but the sequence is wrong.
  • Numerical-gap: an invariant that holds in real-number math but breaks under integer rounding. For example, a fee that truncates to zero.
  • Trust-gap: deposit logic is priced on spot, while withdraw logic is priced on TWAP. Individually they are reasonable; together they lead to a free trade.
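The three seams can be shown with a toy Python model. This is a hypothetical sketch, not code from solidity-auditor or from any audited contract: `Vault`, `fee`, the 30 bps rate and the two prices are all invented for illustration, and Python integers stand in for Solidity's unsigned integer math.

```python
# Toy models only: every name and number here is hypothetical.

# Flow-gap: state is read before a callback and written after it.
class Vault:
    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.paid_out = 0

    def deposit(self, user: str, amount: int) -> None:
        self.balances[user] = self.balances.get(user, 0) + amount

    def withdraw(self, user: str, on_receive) -> None:
        amount = self.balances.get(user, 0)  # read before the callback
        if amount == 0:
            return
        self.paid_out += amount
        on_receive()                         # control leaves the vault here
        self.balances[user] = 0              # written only after it returns


vault = Vault()
vault.deposit("attacker", 100)
reentries = [0]  # mutable counter shared with the callback


def reenter() -> None:
    # Re-enter withdraw twice while the balance is still unchanged.
    if reentries[0] < 2:
        reentries[0] += 1
        vault.withdraw("attacker", reenter)


vault.withdraw("attacker", reenter)
print(vault.paid_out)  # 300 paid out against a 100 deposit

# Numerical-gap: a 0.30% fee is positive in real numbers, zero after flooring.
FEE_BPS = 30


def fee(amount: int) -> int:
    return amount * FEE_BPS // 10_000  # floor division, like integer math on-chain


dust_fee = fee(33)         # 990 // 10_000
bulk_fee = fee(1_000_000)
print(dust_fee, bulk_fee)  # 0 3000

# Trust-gap: deposits are valued at spot, withdrawals at TWAP.
SPOT_PRICE, TWAP_PRICE = 120, 100   # assume spot sits above the TWAP
credit = 10 * SPOT_PRICE            # deposit 10 tokens, credited at spot
tokens_out = credit // TWAP_PRICE   # redeem the same credit at TWAP
print(tokens_out)  # 12 tokens out for 10 tokens in
```

In each case every line looks defensible when read alone. The loss only appears when the read, the callback and the write, or the two pricing paths, are considered together.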

Comparison vs other open source security tools

We ran solidity-auditor v3 against the most popular & top performing open source tools out there.

Here is how we compare on recall:

Recall

Across all 4 codebases, solidity-auditor v3 leads on recall % (and ties Nemesis on Ammplify).

HIT/MISS recall by codebase (%):

  • DODO: Solidity Auditor V2 47.0, Solidity Auditor V3 68.6, nemesis 52.9, plamen-core 58.8, plamen-light 41.2, Claude Code 41.2
  • Megapot: Solidity Auditor V2 42.0, Solidity Auditor V3 45.5, nemesis 18.2, plamen-light 36.4, Claude Code 18.2
  • Ammplify: Solidity Auditor V2 26.9, Solidity Auditor V3 35.2, nemesis 35.2, plamen-light 22.2, Claude Code 11.1
  • Panoptic: Solidity Auditor V2 7.5, Solidity Auditor V3 15.2, nemesis 1.5, plamen-light 13.6, Claude Code 4.5

When it comes to token spend and time, here are the results:

Run time

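Across all 4 codebases, solidity-auditor v3 stays in the ~19–28 min range. That is several times faster than plamen-core and plamen-light and as fast as or faster than nemesis; only solidity-auditor v2 and Claude Code itself finish sooner.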

Run time per codebase (wall time, lower is better). Chart note: Solidity Auditor V3 finishes in ~22 min, ~5× faster than the other tools.

  • DODO: Solidity Auditor V2 19m, Solidity Auditor V3 ~22m, nemesis ~30m, plamen-core 187m, plamen-light 59-146m, Claude Code 10m
  • Megapot: Solidity Auditor V2 23m, Solidity Auditor V3 28m 14s, nemesis ~31m, plamen-light ~108m, Claude Code 20m 33s
  • Ammplify: Solidity Auditor V2 18m 45s, Solidity Auditor V3 22m 12s, nemesis ~22m, plamen-light ~123m, Claude Code 9m 40s
  • Panoptic: Solidity Auditor V2 21m, Solidity Auditor V3 21m 41s, nemesis ~58m, plamen-light 131m, Claude Code 15m 6s

Token consumption

On token spend, solidity-auditor v3 costs more than only two of the tools: solidity-auditor v2 and Claude Code itself.

Token consumption per codebase (mean tokens per run, lower is better). Chart note: Solidity Auditor V3 uses ~10× fewer tokens than the other tools.

  • DODO: Solidity Auditor V2 2.9M, Solidity Auditor V3 3.9M, nemesis 18.0M, plamen-core 44.3M, plamen-light 17-32M, Claude Code 264k
  • Megapot: Solidity Auditor V2 3.2M, Solidity Auditor V3 4.79M, nemesis 12.1M, plamen-light 23M, Claude Code 289k
  • Ammplify: Solidity Auditor V2 3.3M, Solidity Auditor V3 3.34M, nemesis 5.6M, plamen-light ~27M, Claude Code 249k
  • Panoptic: Solidity Auditor V2 3M, Solidity Auditor V3 3.35M, nemesis 14.6M, plamen-light 27M, Claude Code 735k+
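The per-tool multiples behind the headline number can be checked directly from the DODO token figures listed above. The sketch below is only that arithmetic; the variable names are made up, and plamen-light is left out because the source gives a range rather than a single value.

```python
# Mean tokens per run on DODO, in millions, copied from the figures above.
# plamen-light is omitted: the source gives a range (17-32M), not a point value.
dodo_tokens_m = {
    "Solidity Auditor V2": 2.9,
    "nemesis": 18.0,
    "plamen-core": 44.3,
    "Claude Code": 0.264,
}
V3_TOKENS_M = 3.9  # Solidity Auditor V3 on DODO

# Ratio of each tool's token use to V3's, rounded to one decimal place.
ratios = {tool: round(tokens / V3_TOKENS_M, 1) for tool, tokens in dodo_tokens_m.items()}

for tool, ratio in ratios.items():
    print(f"{tool}: {ratio}x the tokens of Solidity Auditor V3")
```

On these DODO numbers the multiple is about 11.4× for plamen-core and about 4.6× for nemesis, so the ~10× figure is closest to the plamen-core comparison.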

Installation & usage of solidity-auditor v3

Follow the instructions in the README here:

solidity-auditor is an AI assistant. AI analysis can never verify the complete absence of vulnerabilities, and no guarantee of security is given.

For security consulting with elite experts, visit pashov.com, or reach out directly via Telegram.