AI is moving fast enough that new capabilities can show up week to week, but so can new ways to misuse them. For developers, that means AI safety is not just a policy topic. It is a product, security, and engineering quality topic.
At Cyfrin, we see this firsthand. Teams are shipping faster with AI, but the ones doing it well are the ones treating AI tooling with the same rigor they apply to infrastructure, authentication, and smart contract security. The teams that skip that step tend to learn the hard way.
In this post, we break down the biggest AI security risks developers should watch for, based on the OWASP Top 10 for LLM and GenAI apps, and then cover practical steps you can take to reduce those risks in day-to-day development. If you work in web3, the bar is even higher: bad outputs, over-permissioned tools, or leaked secrets can have irreversible consequences onchain.
OWASP's current guidance is a strong foundation for this conversation because it focuses on practical application-layer risks in LLM and GenAI systems, not just abstract model behavior.
It is easy to treat AI safety like a governance or policy issue, but most of the real risk shows up inside everyday engineering work.
Developers are wiring models into codebases, terminals, CI pipelines, internal docs, APIs, wallets, and customer-facing applications. When those systems are loosely controlled, a bad model response can turn into a real security incident.
In web3, the consequences are harsher. A leaked secret, a flawed contract change, or an over-permissioned agent is not just inconvenient. It can be expensive and irreversible.
The OWASP 2025 list maps the main risk categories teams should think about right now. Here is each one and why it matters for developers.
Prompt injection happens when untrusted input changes how a model behaves. Instead of following your intended instructions, the model follows malicious or misleading instructions hidden in user input, retrieved documents, websites, emails, or other external content.
Prompts are not a security boundary. If your application lets a model consume untrusted content and then take actions based on that content, prompt injection becomes a serious risk.
Example: An internal assistant reads a governance proposal from a public forum. The proposal contains hidden instructions telling the model to ignore previous context and output the system prompt. If the assistant acts on that, an attacker now knows exactly how your system works and where its guardrails are weak.
For web3 teams, this could show up in assistants that read governance posts, code comments, or contract-related documentation and then act on manipulated instructions.
Further reading: OWASP LLM01:2025 - Prompt Injection
AI systems can expose secrets, private documents, internal instructions, customer data, or proprietary code when too much sensitive information is passed into prompts, memory systems, retrieval layers, logs, or tool contexts.
This is especially relevant for internal assistants and coding tools. If the model can read everything, it can leak more than you expect.
Example: A developer adds a .env file to a project directory that also serves as context for a coding assistant. The assistant later includes a snippet containing an API key in a suggested code block that gets committed to a public repository.
In web3, this gets even more dangerous when the exposed data includes deployment credentials, signing-related context, infrastructure access, or wallet-adjacent metadata.
Further reading: OWASP LLM02:2025 - Sensitive Information Disclosure
AI apps do not stand alone. They depend on models, SDKs, plugins, MCP servers, vector databases, datasets, prompt libraries, and third-party services.
Every dependency introduces trust assumptions. If one of those components is compromised, outdated, or poorly designed, your AI system inherits that risk.
This should feel familiar to developers, but AI systems often increase the blast radius because those dependencies may get access to sensitive code, internal data, or tool execution.
Example: A team installs an MCP server plugin from an unverified source to give their coding assistant access to a database. The plugin silently exfiltrates query results to an external endpoint. Because it runs inside the assistant's tool context, the data it touches is whatever the assistant can touch.
Further reading: OWASP LLM03:2025 - Supply Chain Vulnerabilities
Data and model poisoning covers the risk that the data used to train, fine-tune, evaluate, or retrieve information for a system has been manipulated in a way that changes behavior in unsafe or misleading ways.
Even if you are not training a foundation model, this still matters. Teams fine-tune models, build eval sets, and rely on retrieval pipelines that can all be poisoned.
For developer teams, that means bad data can quietly degrade output quality, trustworthiness, and security posture over time.
Further reading: OWASP LLM04:2025 - Data and Model Poisoning
Improper output handling happens when model output is trusted too quickly. That includes executing generated shell commands, accepting generated SQL, rendering unsafe HTML, or shipping code without enough review.
This is one of the clearest examples of an AI quality problem becoming a security problem. A bad answer is not just wrong. It can become an exploit path.
Generated Solidity, infrastructure changes, migrations, or frontend code should all be treated as untrusted until reviewed and validated.
Further reading: OWASP LLM05:2025 - Improper Output Handling
Excessive agency happens when AI systems are given too much autonomy without enough oversight.
A model that can suggest actions is one thing. A model that can execute commands, modify production systems, merge code, or trigger financial actions is a very different risk profile.
This is one of the most important categories for web3 teams. High-autonomy agents and onchain actions are a dangerous combination unless strong approval gates are in place.
Further reading: OWASP LLM06:2025 - Excessive Agency
System prompt leakage is a reminder that hidden instructions are not reliable secrets. Attackers can often coerce models into revealing system prompts, internal guidance, or hidden workflow logic.
That does not just expose internal implementation details. It can also reveal assumptions, constraints, and patterns that make the system easier to attack.
Developers should assume that prompts can leak and avoid treating them like a secure boundary.
Further reading: OWASP LLM07:2025 - System Prompt Leakage
Retrieval systems introduce their own security risks. Weak filtering, bad chunking, poisoned retrieval data, and poor access control can all cause models to return the wrong information or disclose information they should not have access to.
As more teams adopt RAG and internal knowledge pipelines, these weaknesses become part of the application's attack surface.
For teams working with internal documents, audit data, or proprietary research, retrieval security matters as much as model choice.
Further reading: OWASP LLM08:2025 - Vector and Embedding Weaknesses
Misinformation is not just a content quality problem. In security-sensitive workflows, false confidence, fabricated references, or incorrect code can push teams toward bad decisions.
This is especially important in developer contexts where the output may look polished and plausible while still being wrong in meaningful ways.
In smart contract development, that can mean clean-looking code with broken assumptions around access control, reentrancy, upgradeability, token flows, or oracle logic. The code compiles, the tests pass, and the vulnerability is still there.
Further reading: OWASP LLM09:2025 - Misinformation
AI systems can also be abused through runaway token usage, oversized prompts, excessive tool calls, recursive loops, or other patterns that drive up cost and degrade system performance.
This is both a security and reliability issue. Unbounded consumption can turn into denial of service, surprise infrastructure bills, or unstable production behavior.
Developers should think about budgets, rate limits, timeouts, and resource controls as part of AI safety, not as an afterthought.
Further reading: OWASP LLM10:2025 - Unbounded Consumption
Knowing the risks is useful, but most teams need disciplined habits more than theory. Here is what that looks like in practice.
Do not store private keys or other sensitive credentials unencrypted in .env files if those environments are reachable by AI tools, coding agents, or shared workflows.
If an assistant can inspect your repository, terminal, or application config, weak secret handling becomes a real liability. For web3 teams, this should be treated as a baseline rule, not a nice-to-have.
Treat AI-generated code and public-facing text like untrusted output until it has been reviewed by a human.
For code, check authentication, authorization, edge cases, input validation, and error handling. For web3, assume all generated smart contract code needs expert review before it gets anywhere near production.
Many modern AI tools offer permission controls or sandboxing. Use them. Better yet, isolate risky workflows in containers or similarly restricted environments.
AI should have the minimum file, network, and execution access needed to do the job. Nothing more.
Require approval for anything high impact. That includes deploying contracts, signing transactions, merging pull requests, rotating secrets, deleting data, or changing infrastructure.
A system prompt is not a real control boundary. Telling a model not to do something is not the same as enforcing that rule with code, permissions, validation, and isolation.
Never blindly run generated shell commands, SQL, migrations, infrastructure changes, or contract updates. Add review steps, validation layers, and allowlists where possible.
Be deliberate about what your assistants can access. Do not dump private repos, customer data, or internal systems into context unless there is a real need and proper controls around it.
If an AI system can take actions, you should know what it tried to do, what tools it called, and what actually executed. Good logs make debugging easier and incidents easier to investigate.
Web3 teams should use AI with an even stricter risk model. Good speed gains are possible, but the failure modes are harsher. A few principles are worth keeping front and center:
.env files accessible to agents, not in terminal history, not in logs.Securing AI workflows is not just about choosing the right model. It is also about controlling what data enters the system, what the model is allowed to do, and what can leave the system as output or action.
Two tools worth evaluating:
Varlock focuses on safer configuration and secret handling. It is built around schema-driven environment management, validation, generated types, secret-aware handling, and AI-safe config patterns. Its approach gives agents structural context about configuration without exposing secret values, and its workflow includes validation plus tooling to catch leaked secrets in generated code.
Mlld positions itself as secure LLM scripting with runtime enforcement over where data can go. It frames prompt injection as an infrastructure problem and emphasizes runtime controls instead of trusting the model alone. Its model tracks what data is and enforces where it can go at runtime. That is exactly the kind of control you want when prompts and retrieved data may be hostile.
In practice:
AI safety for developers is not about avoiding AI. It is about using it with the same discipline you would apply to infrastructure, authentication, deployment, or smart contract security.
The teams that benefit most from AI over the long term will not be the ones that give it the most freedom. They will be the ones that build the strongest boundaries around it, and then move fast within those boundaries.
If AI is becoming part of your workflow, now is the right time to decide what it should be allowed to see, what it should be allowed to do, and where human review still needs to stay in the loop. The OWASP Top 10 for LLM apps gives you a solid framework. The habits above give you a starting point. The rest is execution.
