Winning a 2025 security contest in <20 minutes
How we got here
So far we have launched v1 and v2 of the solidity-auditor security Skill. Thousands of developers and security researchers have already used it, finding many Critical and High severity issues, and many people also use it for bug bounties. Still, we wanted to keep improving it.
After more than 150 runs and many different approaches (new agents, more iterations, extra layers, up-front scoping), we have our results ready.
Case Study: DODO Contest
DODO Cross-Chain DEX was a security contest that ran for 8 days, with ~1600 lines of code and ~100 security researcher participants. Their efforts discovered 5 High and 12 Medium severity vulnerabilities.
Solidity-auditor v3, our latest AI Security Skill, found 14 of the 17 findings (82.4% recall) in less than 20 minutes. For comparison, the best-performing security researcher found 8 vulnerabilities in total.
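The recall figure is simple arithmetic over the contest's confirmed findings. The sketch below only restates the numbers quoted above; the function and variable names are illustrative, and it assumes all 8 of the top researcher's findings count among the same 17.

```python
def recall(found: int, total: int) -> float:
    """Fraction of the known findings that were rediscovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


# 5 High + 12 Medium findings confirmed in the contest.
contest_findings = 5 + 12

tool_recall = recall(14, contest_findings)
# Assumption: the best researcher's 8 findings are all among the 17.
best_human_recall = recall(8, contest_findings)

print(f"solidity-auditor v3: {tool_recall:.1%}")      # 82.4%
print(f"best researcher:     {best_human_recall:.1%}")  # 47.1%
```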
The two approaches that increased recall
1. A shared reasoning discipline
Every agent now follows the same 3 senior-auditor practices:
- Feynman: if you can’t explain a function in plain words, you don’t understand it yet. The jargon is where the bug hides.
- Socratic: drill past the first answer to the assumption underneath it.
- Inversion: the developer asks “does this work?” We ask “how do I break this?”
2. New gap hunter agents
Most specialists work through a single lens: arithmetic, access, economics, or external calls. But the hairiest bugs live in the seams between two or three lenses, where each specialist, looking through its own lens alone, would wrongly conclude “nothing is wrong here”.
Three new agents hunt only the seams:
- Flow-gap: a callback hands control away mid-execution, and the code after it trusts state from before. Each step is correct, but the sequence is wrong.
- Numerical-gap: an invariant that holds in real-number math but breaks under integer rounding. For example, a fee that truncates to zero.
- Trust-gap: deposits are priced on the spot price, while withdrawals are priced on a TWAP (time-weighted average price). Each choice is reasonable on its own, but together they allow a free trade.
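To make the numerical-gap case concrete, here is a minimal sketch of a fee that truncates to zero. It is written in Python, using floor division to mimic Solidity's unsigned integer division. The fee rate, amounts, and function names are all hypothetical and are not taken from the DODO codebase or from solidity-auditor itself.

```python
FEE_BPS = 30               # hypothetical 0.30% fee, in basis points
BPS_DENOMINATOR = 10_000


def fee_floor(amount: int) -> int:
    """Fee rounded down, as plain integer division does in Solidity."""
    return amount * FEE_BPS // BPS_DENOMINATOR


def fee_ceil(amount: int) -> int:
    """Fee rounded up, so any non-zero amount pays at least 1 unit."""
    return (amount * FEE_BPS + BPS_DENOMINATOR - 1) // BPS_DENOMINATOR


def total_fee(total: int, chunk: int, fee_fn) -> int:
    """Total fee paid when `total` is moved in pieces of at most `chunk`."""
    paid = 0
    remaining = total
    while remaining > 0:
        piece = min(chunk, remaining)
        paid += fee_fn(piece)
        remaining -= piece
    return paid


amount = 1_000_000
one_shot = fee_floor(amount)                    # 3000
split_floor = total_fee(amount, 333, fee_floor)  # 0: every piece rounds to zero
split_ceil = total_fee(amount, 333, fee_ceil)    # 3004: rounding up closes the gap

print(one_shot, split_floor, split_ceil)
```

In real-number math the fee is 0.3% of the amount however it is split. With rounding down, any piece below 334 units pays nothing (333 * 30 = 9,990, which is less than 10,000), so splitting the transfer avoids the fee entirely. Rounding in the protocol's favor is one common mitigation; a minimum transfer size is another.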
Comparison vs other open source security tools
We ran solidity-auditor v3 against the most popular & top performing open source tools out there.
Here is how we compare on recall:
Recall
Across all 4 codebases, solidity-auditor v3 leads on recall % (and ties Nemesis on Ammplify).
When it comes to token spend and time, here are the results:
Run time
Across all 4 codebases, solidity-auditor v3 stays in the ~19–28 min range, while being multiple times faster on average than most tools.
Token consumption
On token spend, solidity-auditor v3 costs more than only two of the tools compared: solidity-auditor v2 and Claude Code itself.
Installation & usage of solidity-auditor v3
Follow the instructions in the README here: github.com/pashov/skills
solidity-auditor is an AI assistant. AI analysis can never verify the complete absence of vulnerabilities, and no guarantee of security is given.
For security consulting with elite experts, visit pashov.com, or reach out directly via Telegram.