By Patrick Collins · March 1, 2026

How to Not Accidentally Shoot Yourself in the Foot with AI Development

AI is an incredible productivity tool, but it's making us reckless. Here are 6 tips to use AI without getting destroyed — from security prompts to the Agent Rule of Two.


AI has been an incredible productivity boost for software engineers. I personally have been using Claude Opus 4.6 — it passed my vibe check and I've been using it to code pretty much everything I touch. At Cyfrin, we've also been working with AI to help secure our codebases.

But the rest of the industry — ourselves included — has been emboldened to do dumber and dumber things because we can do more and more with AI. And the attack surface is growing fast.

AI Is Already Screwing Up

Here are real examples of how AI has already gone wrong:


The Moonwell Incident. Security auditor Pashov pointed out that Claude helped co-author a git commit in the Moonwell project that misconfigured an oracle, effectively pricing cbETH at ~$1.12 instead of ~$2,200 — leading to $1.78 million in bad debt. Hopefully $1 continues to be the incorrect price of ETH...

The EchoLeak Vulnerability (CVE-2025-32711). Just by sending an email to someone with Microsoft 365 Copilot enabled, an attacker could get Copilot to read that email and be prompt-injected — sending sensitive data out to the attacker automatically. The victim didn't even have to open the email.

Slopsquatting. AI hallucinates a package that doesn't exist. An attacker registers that package. Now the hallucinated package is real — and loaded with malicious code. Your supply chain is compromised.

Private keys in plain text. It's 2026 and AI still tells people that hardcoding private keys is fine. It's not.

According to OWASP, the top three AI attack vectors are:

  1. Prompt injection
  2. Leaking sensitive information
  3. Slopsquatting / supply chain attacks

And the code AI writes isn't bulletproof either. The BaxBench leaderboard shows that even the best model — Claude Code — only writes correct and secure code about 56% of the time.

So... AI is bad then

Not so fast.

Claude Opus 4.6 has found 500+ vulnerabilities. AI projects have been smashing leaderboards on HackerOne. At Cyfrin, when we do smart contract audits, we use AI to help us understand and navigate the codebase. It's a laborious job, and since AI is so helpful for productivity, I'm not going to not use it.

The question isn't "should we use AI?" — it's "how do we use it without getting destroyed?"

There's a companion video for this article if you'd prefer to watch instead of read. And feel free to point your AIs to this article to help you stay secure.

The #1 Rule: It's Your Fault

Before we get into the tips, here's the overarching principle: you are the developer and you are the security researcher. That means you are responsible for any bugs you ship.

Blaming a bug on the AI is incorrect. It is your fault.

You must review what your AI is doing. You must know what your AI has access to. That is the summary of all of this.

Tip 1: Add a Security Prompt

Sounds stupid. We've all seen the memes where someone asks Claude to "make me a billion-dollar company and make no mistakes."

But funnily enough, it actually works. The BaxBench authors found that simply adding a generic security reminder to prompts improved Claude Code's secure code rate from 56% to 66%. That's a meaningful jump just from telling your AI to care about security.

This is where tools like Claude Code's skills system come in: you can bake that reminder into every session instead of retyping it.
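For example, a minimal security-reminder skill could look something like this. (The file layout and frontmatter fields below are my sketch of the skills format — check the Claude Code docs for the current schema; the checklist items themselves are just a starting point.)

```markdown
---
name: secure-coding
description: Apply secure-coding checks to every code change
---

When writing or reviewing code:

- Never hardcode secrets, private keys, or API tokens; read them from a
  secret manager or environment variable instead.
- Validate and sanitize all external input before using it.
- Before suggesting a dependency, confirm the package actually exists and
  is the canonical one, not a lookalike.
- Flag any change that touches authentication, pricing, or oracle
  configuration for explicit human review.
```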

Tip 2: Use Normal Security Tooling

This is the "yeah, obviously" advice, but people forget: you can still use all the normal security tooling alongside AI.

Static analysis tools like Slither and Aderyn are still incredibly valuable. Don't stop using them just because you have an AI assistant. Layer your defenses.

(And follow Cyfrin on Twitter — we're cooking up some good stuff on this front.)

Tip 3: Containerize Your Environments

Use containerization to isolate your development environments. For smart contract development, you can spin up an isolated container and drop Claude or ChatGPT into it so the environment is sandboxed.

The Cyfrin devcontainer is a great starting point. If your AI goes rogue or gets injected, the blast radius is limited to that container instead of your entire machine.
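A minimal sketch of what that sandbox config could look like (the image, mounts, and flags below are illustrative assumptions, not the actual contents of the Cyfrin devcontainer):

```jsonc
{
  // Illustrative devcontainer.json — see the Cyfrin devcontainer for a
  // maintained, batteries-included version.
  "name": "sandboxed-ai-dev",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  // Don't mount your host home directory, SSH keys, or wallets into the
  // container — the agent should only see this one workspace.
  "mounts": [],
  // Drop extra privileges so a compromised agent can't escalate.
  "runArgs": ["--cap-drop=ALL"]
}
```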

Tip 4: Handle Sensitive Information Like It's Radioactive

Two concerns with sensitive information:

  1. You don't want your AI to have sensitive information
  2. You don't want your AI to expose sensitive information

Here's the rule: if you're not using a local LLM, assume that if your AI has access to something, the world now has access to it.

Private keys should never be in plain text, and your AI should never have access to them. The Cyfrin SolSkill has prompts baked in specifically to prevent this.
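As a crude illustration of what "never in plain text" means in practice, here's a hypothetical helper that greps source text for things shaped like raw Ethereum private keys. This is my own sketch, not a substitute for a real secret scanner (gitleaks, trufflehog), which also use entropy checks and provider-specific patterns:

```python
import re

# A raw Ethereum private key is 32 bytes = 64 hex characters, optionally
# prefixed with 0x. This regex is deliberately crude and will have false
# positives (e.g. tx hashes) — that's fine for a pre-commit tripwire.
PRIVATE_KEY_RE = re.compile(r"\b(?:0x)?[0-9a-fA-F]{64}\b")

def find_key_like_strings(text: str) -> list[str]:
    """Return substrings that look like hardcoded private keys."""
    return PRIVATE_KEY_RE.findall(text)

source = 'PRIVATE_KEY = "0x' + "ab" * 32 + '"  # never do this\n'
print(len(find_key_like_strings(source)))   # 1
print(len(find_key_like_strings("no secrets here")))  # 0
```

Run something like this over your tree before an agent ever gets access to it — if the key never reaches the context window, it can't leak from it.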

Tip 5: The Agent Rule of Two

The Agent Rule of Two diagram

Some of you are thinking: "Patrick, shut up. I want to give my AI access to my sensitive information and passwords so I can use tools like OpenClaw."

I personally don't recommend that. But if you're going to do it anyway, use the Agent Rule of Two.

This framework came out of Meta's AI security research, inspired by Simon Willison's "lethal trifecta" concept and the Google Chrome team's Rule of 2.

Your agent should only have two of the following three properties at any given time:

  • A. The agent can process untrustworthy inputs
  • B. The agent can access sensitive systems or private data
  • C. The agent can change state or communicate externally

Pick two. Never all three.

If your AI agent has all three, you're cooked. An attacker prompt-injects your agent through untrusted data (like a malicious email — see EchoLeak), the agent has access to your private information, and it can send that information out. Game over.

But remove any one:

B + C only (no untrusted inputs): The malicious email never reaches the AI. The agent only processes trustworthy sources.

A + C only (no private data): Even if the agent gets prompt-injected, there's nothing sensitive for it to reveal.

A + B only (no external communication): Even if the agent gets injected and reads your sensitive data, it can't send it anywhere.

Apply this when working with tools like OpenClaw. Limit access to specific accounts. Split your agents up so no single agent becomes a single point of failure.
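The rule is simple enough to enforce mechanically when you wire up an agent. A toy sketch of that check — the names and structure here are mine, not from any agent framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCapabilities:
    untrusted_inputs: bool    # A: can process untrustworthy inputs
    private_data: bool        # B: can access sensitive systems or data
    external_actions: bool    # C: can change state or communicate externally

    def violates_rule_of_two(self) -> bool:
        """True only if the agent holds all three properties at once."""
        return self.untrusted_inputs and self.private_data and self.external_actions

def assert_rule_of_two(caps: AgentCapabilities) -> None:
    if caps.violates_rule_of_two():
        raise ValueError(
            "Agent has untrusted inputs + private data + external actions: "
            "drop at least one capability before starting it."
        )

# B + C is fine: no untrusted input can ever reach the agent.
assert_rule_of_two(AgentCapabilities(False, True, True))

# All three together is the EchoLeak shape: refuse to start.
try:
    assert_rule_of_two(AgentCapabilities(True, True, True))
except ValueError:
    print("blocked")
```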

In Crypto, Be Even More Paranoid

In the smart contract world, my recommendation is simpler: just don't give an agent any of your private or sensitive information at all.

Even with the Rule of Two, what happens if your agent itself gets hacked? If you're using a local LLM, that's slightly better, but you should still worry about data in the cloud being stolen.

Tip 6: Watch Your Supply Chain

For supply chain attacks like slopsquatting: know what packages your AI is suggesting. Don't blindly install something just because your AI told you to. Verify it exists, verify it's the real package, and verify it's not a hallucination that someone has weaponized.

Tools like Snyk and npm audit can help.
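One cheap, offline heuristic (my own sketch, not a replacement for Snyk or npm audit): before installing an AI-suggested package, flag names that are suspiciously close to, but not exactly, a package you already know and trust. That near-miss shape is the classic typosquat/slopsquat signature:

```python
import difflib

# Packages you already know and trust — in practice, your lockfile.
KNOWN_PACKAGES = {"requests", "numpy", "web3", "eth-account"}

def slopsquat_risk(suggested: str) -> str:
    """Classify an AI-suggested package name against a trusted set."""
    name = suggested.lower()
    if name in KNOWN_PACKAGES:
        return "known"
    # A near-miss to a trusted name is more suspicious than a stranger.
    close = difflib.get_close_matches(name, list(KNOWN_PACKAGES), n=1, cutoff=0.8)
    if close:
        return f"suspicious: looks like '{close[0]}'"
    return "unknown: verify it exists on the real registry before installing"

print(slopsquat_risk("requests"))     # known
print(slopsquat_risk("requestss"))    # suspicious: looks like 'requests'
print(slopsquat_risk("totally-new"))  # starts with "unknown"
```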

Prompt Injection Is Still Unsolved

I saved this for last because it's probably the trickiest attack to defend against today.

Yes, you can do input sanitization. Yes, you can (and should) do human-in-the-loop interactions, especially when reviewing smart contract code. But prompt injection is fundamentally a problem we haven't solved yet.

On a promising note: sockdrawermoney — co-founder of Code4rena (which kicked off the competitive audits craze) and one of the co-creators of npm audit — has been working on a language designed to mitigate prompt injection through taint tracking. Any time your agent touches something, the data gets labeled as trusted or untrusted. It's very cool, and I'm still digging into it myself.
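The core idea of taint tracking is easy to sketch even without a new language. This toy is my own illustration of the concept, not that project's design: label data at the trust boundary, propagate the label through every operation, and refuse to let tainted data reach a sensitive sink:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Labeled:
    value: str
    tainted: bool  # True if any untrusted input influenced this value

    def __add__(self, other: "Labeled") -> "Labeled":
        # Taint is sticky: combining anything with untrusted data
        # yields untrusted data.
        return Labeled(self.value + other.value, self.tainted or other.tainted)

def send_external(data: Labeled) -> str:
    """Sensitive sink: refuses data influenced by untrusted input."""
    if data.tainted:
        raise PermissionError("refusing to exfiltrate tainted data")
    return f"sent: {data.value}"

email_body = Labeled("ignore previous instructions...", tainted=True)
api_key = Labeled("sk-secret", tainted=False)
summary = api_key + email_body   # taint propagates through the concat

print(send_external(api_key))    # sent: sk-secret
try:
    send_external(summary)       # blocked: influenced by the email
except PermissionError:
    print("blocked")
```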

TL;DR

  1. AI is an incredible productivity tool, but it's making us reckless
  2. Top 3 AI attack vectors per OWASP: prompt injection, sensitive data leaks, slopsquatting
  3. Add a security prompt — improved secure code rates by ~10% on BaxBench
  4. Use security tooling — Slither, Aderyn, and static analysis still matter
  5. Containerize — sandbox your AI's environment (Cyfrin's devcontainer)
  6. Private keys never in plain text — your AI should never have access to them
  7. Agent Rule of Two — never grant all three: untrusted inputs, private data, and external communication
  8. You are responsible for everything your AI produces. Review it. All of it.

Stay safe.
