Mainframe security deep dive: From attack paths to best practices

Mainframe systems often sit at the heart of an organization’s infrastructure, silently powering critical operations behind the scenes. Yet despite their centrality, they are frequently under-tested or misunderstood. This stems not just from their legacy nature but from a skills gap and a tendency to treat them as untouchable black boxes. In this article, we take a deep dive into Mainframe security testing and best practices with our Security Consultant & Global Mainframe Lead, James Boorman.

Intro to mainframes

When it comes to mainframes, there’s a fair amount of confusion, even within technical teams. Strictly speaking, it refers to systems like IBM Z/OS, Unisys ClearPath, or certain Fujitsu models. Then there are others, like HPE NonStop, which aren’t technically mainframes but behave similarly enough that many professionals lump them in. So whether you’re calling it a mainframe or a mid-range system with mainframe DNA, the security challenges, operational behaviours, and strategic importance remain largely the same.

These applications are built up over time, often older than the people maintaining them, and are highly interconnected to comply with strict regulatory demands. What appears simple on the surface involves numerous systems working together to review customer information, verify transactions, and manage accounts.

Common attack paths in mainframe environments

Mainframes might seem like fortress-grade systems, but attackers know where to look. Most vulnerabilities stem from access control misconfigurations, making privilege escalation and unintended data exposure real concerns. Let’s break down the most common and practical attack paths.

Most vulnerabilities stem from access control misconfigurations.

  1. Surrogate chaining

One of the standout techniques is surrogate chaining, where an attacker abuses job submission permissions to run code under higher-privileged user IDs. If User A can act as User B, and User B can act as User C, then User A effectively gains access to User C’s privileges, even without direct rights. These chains grow in complexity and can be exploited if not carefully managed, especially in large systems with sprawling permission structures.

  1. Application breakouts

In online applications, attackers often look for a way to break out of the app’s restricted environment. By triggering errors or manipulating inputs, they aim to infiltrate the underlying command environment, such as the CICS region, thereby gaining broader control than intended. From here, it’s about probing what access is possible at the platform level, even if they entered with a low-privilege account.

  1. Network-based entry points

Network-based attack paths exist but are less common in real-world testing. Gaining entry without credentials or direct access is tough, given that most controls live inside the mainframe. Still, misconfigured services or stolen credentials via phishing can open doors, and these shouldn’t be overlooked.

  1. Permission hopping via shared jobs or datasets

Mainframe security often revolves around resource protection and access assignments. If an attacker can alter a job that runs under another user’s credentials, they inherit that user’s access. Misconfigured datasets, improperly secured USS (Unix System Services), and over-permissioned profiles can make these jumps surprisingly easy.

The bottom line? Mainframe security is less about flashy exploits and more about navigating complex webs of access. System owners must stay vigilant, regularly review permission hierarchies, and test not just individual applications, but how they connect across the environment. Because with thousands of users and highly interconnected apps, one weak link is all it takes.

Mainframe security testing: platform vs. application

Security testing for mainframes typically happens on two levels:

  • Platform-level reviews: These are broad assessments, similar to a build review. They target the overall platform, checking subsystems and mainframe-specific controls. However, these massive systems with potentially hundreds of applications and thousands or tens of thousands of databases and tables, going deep isn’t practical. Understanding why each person has access to every given table would take mind-numbingly long. Instead, the assessment focuses on the wider, more global configuration of the operating system and its subsystems, identifying systemic misconfigurations and weaknesses at the cost of specificity.
  • Application-level reviews: These are more focused and context-rich. Testers examine specific apps, like a payment review tool, along with the data it uses, who can access it, and how it behaves. This approach is typically suitable for assurance-driven compliance and provides insight into the security posture of the specific application and its resources. However, since mainframe applications rarely exist in isolation, this method misses both the wider solution and bigger system picture – especially when applications rely heavily on each other.

The gap? Strategic middle-ground assessments. These look at clusters of related applications, say, 40 systems that make up a payment engine, to understand how data flows between them, where interdependencies lie, and how attackers might move laterally. It’s a space where purple teaming or attack path mapping can add real value, combining offensive and defensive testing to simulate realistic threats across complex environments.

Why mainframes need holistic security testing

Mainframe systems often sit at the heart of an organization’s infrastructure, silently powering critical operations behind the scenes. Yet despite their centrality, they are frequently undertested or misunderstood. This stems not just from their legacy nature but from a skills gap and a tendency to treat them as untouchable black boxes.

The reality of mainframe testing

Mainframe security testing isn’t commonly practiced; it’s a niche skill set, and few professionals specialize in it. In many cases, organizations claim to test mainframes but restrict those efforts to unauthenticated network scanning. When services don’t respond or appear hardened, teams often walk away satisfied. The result? Superficial assessments that miss deeper, systemic vulnerabilities.

Organizations claim to test mainframes but restrict those efforts to unauthenticated network scanning.

Why a combined platform + application approach matters

The organizations that truly engage in effective mainframe testing take a more holistic approach. Rather than treating the system as a monolith, they separate platform-level configuration and application-level security. This dual perspective provides:

  • Broad insight into how infrastructure-wide settings affect individual components.
  • Granular visibility into application-specific risks, such as poorly configured authentication or insecure data storage.
  • Coverage of legacy weaknesses, which often arise from outdated assumptions about how the systems work.

Testing both dimensions helps uncover attack paths that stem from decades-old code and documentation gaps, things that traditional audits would miss.

Testing frequency: Compliance vs. proactivity

Security testing cycles vary widely. Some enterprises only test mainframes to meet compliance thresholds, annually or once every few years, depending on system criticality. Others take a more proactive stance, building regular mainframe testing into their operational security roadmap.

Yet some organizations still neglect to perform routine, comprehensive mainframe testing. These systems, perceived as inaccessible, legacy-bound, or simply misunderstood, may not always be included in modern testing pipelines. The truth is, their complexity and obscurity increase risk, not reduce it.

Common misconceptions & hidden risks

One recurring theme in mainframe testing is the prevalence of assumptions:

  • “The application passwords are encrypted” until closer inspection reveals they’re stored in plaintext.
  • “Application IDs can only be used by one user” until a tester successfully impersonates another user.
  • “No one can access this system” until someone tries… and succeeds.

These myths persist because documentation is missing, incomplete, or lost to time. Knowledge silos and shrinking subject matter expertise compound the problem.

Mainframe security: Best practices that make a difference

Despite their age, complexity, and niche status, mainframes continue to power critical infrastructure across industries. Their enduring presence doesn’t mean they’re impervious, but it does mean they require a tailored approach to security. Here’s what organizations should prioritize when safeguarding these systems.

1. Follow vendor guidance, and actually apply it 

Leading vendors, such as IBM, provide extensive best-practice documentation for securing their platforms and subsystems. These resources offer practical architecture guidelines and proven security configurations. However, these recommendations are too often overlooked or partially implemented. The first security win is simply making sure they’re properly followed. 

2. Perform regular, context-aware security testing 

Mainframes shouldn’t be tested once every few years just to tick a compliance box. Security assessments must be recurring and adapted to each organization’s setup. Testing should span: 

  • Platform-wide evaluations to assess overall configuration and access control 
  • Application-level reviews that uncover business logic flaws and interface risks 
  • Strategic security audits tailored to the organization’s architecture and threat landscape 

No two mainframe environments are identical, even if they run similar software. Testing must reflect how each system is uniquely deployed and configured. 

3. Configure the security subsystem correctly 

Mainframes use internal security subsystems to enforce access controls and define privilege boundaries. Misconfigurations here can undermine the entire platform. Organizations should: 

  • Audit subsystem roles, groups, and policies regularly 
  • Enforce least privilege across users and services 
  • Ensure segregation between test and production environments 

Mainframes’ enduring presence doesn’t mean they’re impervious, but it does mean they require a tailored approach to security.

 4. Lock down network access with firewalls & segmentation 

Production mainframes should be isolated from internal networks wherever possible. Strict firewalling and zone segregation prevent lateral movement and unauthorized access. Air-gapped designs or tightly controlled access pathways are ideal for critical workloads. 

5. Monitor & audit with intelligence 

Basic logging isn’t enough. Security teams should implement intelligent monitoring, with alerts tied to behavioural anomalies. For example: 

  • A Db2 table accessed at odd hours 
  • Unexpected admin login from a new source 
  • Privilege escalation attempts outside approved workflows 

Correlating these events across logs can surface hidden attack paths or misconfigurations. 

6. Modernize authentication & access management 

Many mainframes still rely on legacy password practices, think eight-character passwords with low complexity. But modern solution integrations do exist and should be adopted: 

  • Enable multi-factor authentication (MFA) 
  • Use privileged access management tools 
  • Implement just-in-time access for high-privilege accounts 
  • Rotate sensitive credentials on schedule or per-use 

Security maturity here is uneven across organizations; modernization is low-effort with high payoff. 

7. Encrypt sensitive data at rest 

A surprising number of legacy systems still store sensitive data in plain text. It’s essential to apply modern encryption standards wherever sensitive data resides: 

  • Within databases and application tables 
  • In archival or regulatory storage 
  • Across communication channels (avoid legacy SSL/TLS versions) 

Data at rest encryption protects against insider threats and improperly scoped access. 

Industry-specific mainframe security practices 

Mainframes are versatile systems, deeply embedded across sectors like finance, healthcare, aviation, insurance, and government. Even in the same industry, they’re configured and deployed very differently. While general security principles apply, certain industry contexts require more tailored approaches and awareness of regulatory frameworks. 

Financial institutions: More than just compliance 

In banking and finance, mainframes often handle core transaction processing and customer data. These systems may fall under compliance mandates such as PCI DSS when dealing with cardholder data. While PCI DSS provides a checklist-style framework for testing and encryption requirements, organizations shouldn’t default to minimum effort: 

  • Don’t just “tick the box.” Treat PCI DSS as a baseline, not a ceiling. 
  • Customize testing to reflect actual platform risk, not just what regulators expect. 
  • Ensure sensitive data like identifiers, credentials, and financial records are encrypted at rest and in transit. 

It’s easy for legacy systems to be excluded from modern compliance pipelines simply because they’re “hard to test.” But complexity isn’t an excuse; it’s a red flag. 

Aviation: Legacy systems that still fly 

Airlines are known to rely heavily on mainframes for reservations, departure control, and crew scheduling. Despite their dependence, these systems are often considered untouchable. Though testing access might be challenging, aviation operators should prioritize: 

  • Network segregation of operational systems from passenger-facing applications 
  • Monitoring of batch jobs and scheduling access to detect potential abuse 
  • Updating authentication mechanisms across reservation platforms integrated with legacy backends 

While not bound by PCI DSS across the board, aviation companies still handle sensitive data, such as passport numbers, personal information, and payment details, and must adhere to international privacy laws and standards. 

Cross-industry takeaway: Avoid minimum viable testing 

Whether regulated or not, every industry faces this decision: do we do what’s required, or what’s responsible? The best practice is simple but vital, don’t settle for the bare minimum. Legacy systems are just as critical as flashy new apps, often more so. Their opacity, age, and integration with newer platforms make them uniquely valuable and uniquely vulnerable. 

Organizations should: 

  • Treat mainframes as active components of modern architecture 
  • Test regularly and purposefully, beyond compliance thresholds 
  • Integrate mainframe security into enterprise-wide strategies 

How Reversec can help  

Our mainframe security assessment provides a holistic view of your applications, ensuring end-to-end coverage that includes applications, network presence, operating system controls, and other interactions within the wider environment. This comprehensive approach enables you to apply layered security controls and significantly reduce your attack surface. 

Blending scarce mainframe security skills with modern testing techniques, we tailor each assessment to your specific needs, providing maximum coverage and assurance. Our team also draws from industry-specific experience gained through work with finance, aviation, healthcare, and government clients. 

Mainframe Security

Mainframe Security

Read more

Top 5 common
misconfigurations in
cloud environments

Many organizations believe that once their cloud environment is configured, their job is done. But this “set it and forget it” mentality can lead to major security vulnerabilities. Cloud environments are dynamic and require regular updates and reviews – to adapt to new threats and changes. Not staying on top of your cloud configurations can leave you with outdated security measures and put you at increased risk.

Another common misconception is that an initial entry point breach isn’t likely to happen to you. Organizations often set up security controls to restrict user logins and limit access to development environments. This in turn results in these controls being often treated as a complete security boundary that cannot be crossed. However, that is not entirely true. Attackers are becoming increasingly sophisticated, which is why it’s crucial to operate under the assumption that a breach can, and eventually will, happen to you. This mindset will help you evaluate the potential impact of a breach and implement measures to minimize damage.  

Ready to see what we typically encounter in our engagements? Let’s dive in! 

It’s crucial to operate under the assumption that a breach can, and eventually will, happen to you.

#1 Excessive privileges

One of the most common problems in cloud environments is the granting of excessive privileges to users and machine accounts. This misconfiguration can lead to privilege escalation, where a malicious actor gains higher-level access than intended. Cloud providers offer thousands of granular permissions, making it hard for administrators to fully understand the potential impact of each one.

For example, an application might only need permissions to read and write in a specific storage bucket or database resource but it might be granted additional, unnecessary permissions, increasing the risk if the account is compromised. The solution involves carefully reviewing and limiting permissions to the minimum level necessary for each role. This process can be time-consuming, especially for larger organizations, but it’s key to maintaining security. 

Here are some practical ways to limit excessive privileges

  1. Segregation of organizational workloads:
    Where possible, different applications and the resources required by them should be provisioned into their own granular unit (AWS account, Azure subscription or GCP project, etc.). This would provide a strong control to minimize the impact of excessive permissions and limit the impact of a compromised identity.

  2. Review current permissions:
    Regularly audit the permissions granted to users and machine accounts. Identify and remove any unnecessary privileges.

  3. Implement role-based access control (RBAC):
    Use RBAC to assign group permissions based on roles within an environment and the level of access appropriate to that workload. This helps ensure that users only have access to what they need.

  4. Use temporary credentials:
    Where possible, use temporary credentials that expire after a certain period. This reduces the risk of long-term exposure.

  5. Automate where possible:
    Utilize automation tools to help manage and review permissions. While some manual review is always necessary, automation can significantly reduce your workload.

#2 Poor secrets management

Another common misconfiguration that impacts organizations of all sizes is poor secrets management. This issue often leads to privileged escalation opportunities within a cloud environment. With the widespread adoption of cloud infrastructure, ClickOps is just no longer practical, especially for organizations running multiple projects. As a result, many companies have turned to infrastructure as code (or IaC) to manage their cloud environments. 

IaC allows you to define your company’s cloud infrastructure using templates in a human-readable code format. These templates define how your cloud environment should look and how individual resources should be configured. But while this approach does simplify the management of large estates, it can also introduce poor development practices, such as storing credentials or secrets in plain text within these templates. 

The risks of poor secrets management

Storing secrets in plain text is a major security risk. Any user with access to these files can potentially use the exposed credentials to perform unauthorized actions on resources they typically wouldn’t have access to. For example, if a storage bucket’s credentials are stored in plain text, an attacker could gain access to sensitive data stored in that bucket. Similarly, hard-coded credentials could allow unauthorized access to other applications or services, leading to data breaches or other security incidents.

Follow these steps to improve secrets management

  1. Use secure key storage:
    Store secrets in secure key management services or vaults. These services provide a secure way to manage and access secrets without exposing them in plain text.

  2. Reference secrets securely:
    Instead of hard-coding secrets into templates, reference them from secure storage. This way, the actual secrets are fetched dynamically during execution, reducing the risk of exposure.

  3. Regular audits:
    Conduct regular audits of your IaC templates and other deployment scripts to ensure no secrets are stored in plain text. Automated tools can help you identify and remediate these issues.

  4. Educate developers:
    Make sure your developers understand the importance of secure secrets management and follow best practices when writing IaC templates. 

#3 Long-lived credentials

Closely related to poor secrets management is the issue of long-lived credentials. While not inherently a vulnerability, long-lived credentials can pose significant security risks if not managed properly. These credentials, which have long expiry windows, can be exposed in various ways, such as being accidentally committed to public code repositories, left unprotected in deployment templates or more generally insecurely handled by human users. 

Take these practical steps to mitigate the risks of long-lived credentials

  1. Avoid storing credentials in plain text:
    As with secrets, avoid storing long-lived credentials in plain text. Use secure storage solutions instead.
  1. Use short-term credentials:
    Where possible, use short-term credentials that expire after a short period. For example, AWS recommends using temporary credentials gained from a federated identity provider instead of using long-lived IAM user credentials.
  1. Regularly rotate credentials:
    Implement a policy to regularly rotate machine credentials to minimize the risk of exposure.
  1. Monitor and audit:
    Continuously monitor and audit the use of credentials to detect any unauthorized access or anomalies.

#4 Exposure of public endpoints

Cloud providers design their services to be user-friendly and easily manageable, often more so than traditional on-premise infrastructure. But this convenience comes with its own set of risks. One of the biggest is that resources in the cloud often have public endpoints, making them accessible from anywhere in the world

The risks of public endpoints

Public endpoints let users interact with cloud resources directly via the Internet. While this is convenient for management purposes, it also means that if an attacker gains access to your environment, they can do whatever they want from wherever they may be. For example, if an attacker compromises your account, they could log in from another country and make changes to your resources without any geographical restrictions. 

This issue is particularly prevalent in Azure, where public endpoints are often the default configuration for many resources. However, many organizations don’t need these endpoints to be public and would benefit greatly from restricting access.

Here are some simple tips on how to manage public endpoints

  1. Restrict access with IP Allow Lists:
    Configure endpoints to only be accessible from specific IP addresses. That way you can control who can access your resources based on their IP location.
  1. Use private endpoints:
    Where possible, make endpoints private so that they are only accessible from within your cloud environment. This adds an additional layer of security by limiting external access.
  1. Regular audits:
    Conduct regular audits of your cloud environment to identify and secure any public endpoints that don’t need to be exposed.
  1. Assume compromise:
    Work under the assumption that your environment could be compromised. By restricting public endpoints, you’ll limit the potential damage an attacker can do if they gain access.

    Here’s an example
    Imagine a storage bucket that contains sensitive data. If this bucket has a public endpoint, anyone with the right credentials can access it from anywhere. By restricting access to specific IP addresses or making the endpoint private, you’ll ensure that only authorized users within your network can interact with that data. This will greatly reduce the risk of unauthorized access and data breaches. 

#5 Overly exposed internal networking

The last cloud misconfiguration on our list is overly exposed internal networking. This issue has been around for years, even in traditional on-premise networks. Organizations do often focus heavily on protecting their environments from external threats with robust perimeter defenses. But once an attacker gains a foothold inside your network, most of those internal protections aren’t enough to keep you safe

The risks of overly exposed internal networking

In many cloud environments, internal network segmentation isn’t implemented to the level that it should be. Engineers might allow certain IP ranges to communicate freely for testing purposes, or resources might be deployed in a flat network structure. This lack of internal segmentation means that if an attacker compromises one resource, they can potentially move laterally within the network, accessing other resources that should be isolated. 

For example, if an attacker compromises a virtual machine (VM) in the cloud, they could use it as a stepping stone to reach other sensitive applications or data. Without proper internal network segmentation, an attacker can move freely, increasing the risk of a significant breach.

Follow these steps to improve your internal network segmentation

  1. Implement network segmentation:
    Divide your cloud environment into smaller, isolated segments. Make sure that only necessary communication is allowed between these segments.
  1. Use security groups and network ACLs:
    Use security groups and network access control lists (ACLs) to define and enforce network traffic rules. This will help you restrict unnecessary communication between resources.
  1. Regularly review and update your network policies:
    Conduct regular reviews of your network policies to make sure that they are up-to-date and reflect the current needs of your environment. Remove any unnecessary permissions or access.
  1. Keep an assume breach mindset:
    Work under the assumption that a breach is possible. Evaluate the potential impact of a compromised resource and implement controls to minimize the blast radius.

    Here’s an example
    Consider an application that only needs to communicate with a database and a few other services. By segmenting the network and restricting the application’s communication to only these necessary services, you’ll be reducing the risk of lateral movement by an attacker. If the application is compromised, the attacker would have limited access and would be unable to reach other resources.

You don’t have to do it alone – Reversec can help

As the digital landscape evolves, avoiding these top 5 misconfigurations in the cloud will help your organization stay safe in an increasingly complicated and threat-laden world.

Here’s how we can help.

Assumed breach assessments
Think it can’t happen to you? Think again. We help organizations conduct assumed breach assessments by simulating attacks as specific user personas within your company. It’s a strategic service designed to identify potential actions from that user’s perspective and addresses vulnerabilities. It helps you identify issues and resolve them comprehensively – before a real breach happens.  

Collaborative exercises
Staying on top of security – together. Collaborative exercises with a security consultancy like Reversec are invaluable to keeping your organization and sensitive data safe. These exercises offer fresh perspectives, helping you to identify and fix any misconfiguration issues you may have. Initial assumptions made during the setup of a cloud environment may no longer be relevant as services and features change. Regular reviews and updates are key to maintaining a secure environment.

The value of fresh perspectives
Having a fresh pair of eyes review your cloud environment can reveal things your team may have missed. Our security consultants can help you understand why certain configurations were set up in the first place and, when needed, propose more secure alternatives. Together, we’ll ensure that your environment is always secure and up-to-date with the latest best practices.

Challenging assumptions
Cloud environments are complex, and it’s crucial to question assumptions about service operations. Organizations should continuously review and understand the exact risks posed by their cloud configurations. Through our research-driven approach, we help companies in identifying the gap between what cloud providers claim is possible and what is practically achievable – essentially, the difference between theoretical and practical functionality. 

Misconfigurations in cloud environments often occur due to misunderstood nuances to how cloud resources operate in any provider and due to a lack of regular holistic auditing of the deployed resources.

By assuming that breaches are possible, evaluating user access, conducting assumed breach assessments, and collaborating with security experts, you can significantly enhance your organization’s cloud security posture. This proactive approach will not only secure your environment but also make it more resilient against potential attacks.

If you would like to hear more about how we can help you protect your cloud environment, don’t hesitate to contact us!

Our Cloud Security experts are happy to help.

Related content

Our thinking

How to run successful Kubernetes attack simulations?

February 21, 2025
How to run successful Kubernetes attack simulations?

CloudWatch Dashboard (Over)Sharing

Generative AI security: Findings from our research

At Reversec, we distinguish ourselves through our research-led approach. Our consultants dedicate part of their time to research – exploring the security of emerging technologies, such as GenAI. This approach allows us to stay ahead of new threats and provide real-world, actionable insights to our clients.  

Our focus on GenAI security stems from the increasing adoption of large language models (LLMs) – like ChatGPT and Google Gemini – in enterprise environments. These systems, while powerful, introduce new and complex security risks, particularly around prompt injection attacks. 

With our ongoing research on GenAI, we are looking to continuously deepen the understanding and raise awareness of these vulnerabilities to help organizations defend against potential exploits and ensure that they can safely leverage AI technologies.

Explore our GenAI security
research and experiments

Spikee – Simple Prompt Injection Kit for Evaluation and Exploitation

Spikee is an open-source toolkit that we created to help security practitioners and developers test LLM applications for prompt injection attacks that can lead to exploitation – examples of which include data exfiltration, XSS (Cross-Site Scripting), and resource exhaustion. A key feature of spikee is that it makes it easy to create custom datasets tailored to specific use cases, rather than flooding applications with many generic jailbreak attacks.


Check out the code: https://github.com/WithSecureLabs/spikee
Tutorial: https://labs.withsecure.com/tools/spikee

Visit spikee’s website
Multi-Chain Prompt Injection Attacks

We introduce multi-chain prompt injection, an exploitation technique targeting applications that chain multiple LLM calls to process and refine tasks sequentially.

Current testing methods for jailbreak and prompt injection vulnerabilities fall short in multi-chain scenarios, where queries are rewritten, passed through plugins, and formatted (e.g., XML/JSON), obscuring attack success. Multi-chain prompt injection exploits interactions between chains, bypassing intermediate processing and propagating adversarial prompts to achieve malicious objectives.

Explore the technique: Try our sample app or the related Colab Notebook. We also have a public CTF challenge to experiment hands-on.

Read on Reversec Labs
Prompt Injection in JetBrains Rider AI Assistant

This advisory explore security issues that could arise when using GenAI assistants integrated within software development IDEs.

Specifically, we demonstrate how prompt injection injection can be leveraged by an attacker to exfiltrate confidential information from a development environment. If an untrusted code snippet containing a malicious payload is passed to the AI Assistant (for example, when asking for an explanation), the injected instructions will be executed by the underlying LLM.  

We also provide recommendations to developers of such GenAI code assistants on how to mitigate these issues and reduce their impact.

Read on Reversec Labs
Should you let ChatGPT control your browser?

This research investigates the security risks of granting LLMs control over web browsers.  

We explore scenarios where attackers exploit prompt injection vulnerabilities to hijack browser agents, leading to sensitive data exfiltration or unauthorized actions, such as merging malicious code into repositories.  

Our findings are relevant to developers and organizations that integrate LLMs into browser environments, highlighting the critical security measures needed to protect users and systems from such attacks.

Read on Reversec Labs
Fine-tuning LLMs to resist indirect prompt injection attacks

We fine-tuned Llama3-8B to enhance its resilience against indirect prompt injection attacks, where malicious inputs can manipulate a language model’s behavior in tasks like summarizing emails or processing documents.

Building on insights from Microsoft and OpenAI, our approach focused on using specialized markers to separate trusted instructions from user data. This exploration allowed us to understand the effectiveness of these methods in real-world scenarios.

Organizations and developers can access our model and fine-tuning scripts on Hugging Face and Ollama to test and extend our findings.

Read on Reversec Labs
When your AI assistant has an evil twin (Google Gemini prompt injection)

In this piece of research, we demonstrate how Google Gemini Advanced can be manipulated through prompt injection attacks to perform social engineering. By sending a single email to the target user, attackers can trick LLMs into misleading users into revealing sensitive information.  

This research is crucial for enterprises using LLMs in environments that handle confidential data, as it exposes how easily attackers can compromise the integrity of the AI’s output.  

We also discuss the current limitations in defending against these attacks and emphasize the need for caution when using AI-powered assistants. 

Read on Reversec Labs
Synthetic recollections (prompt injection in ReAct agents)

This research focuses on how prompt injection can be used to manipulate ReAct (Reason and Act) LLM agents by injecting false observations or commands. Such attacks can lead the LLM to make incorrect decisions or take unintended actions, which could benefit attackers.  

This type of vulnerability is particularly concerning for organizations using LLM agents in decision-making processes.  

We outline two main categories of attacks, discuss their potential impact on operations, and suggest strategies to mitigate the risks associated with these manipulations. 

Read on Reversec Labs
Domain-specific prompt injection detection with BERT classifier

We explore how domain-specific prompt injection attacks can be detected using a fine-tuned BERT classifier. This research provides a practical method for identifying malicious prompts by training a small model on domain-specific data, allowing it to distinguish between legitimate and malicious inputs.

It’s aimed at security teams and developers working with AI models who need reliable mechanisms to detect prompt injections.

We present our findings on the effectiveness of this approach and how it can be implemented in various environments. 

Read on Reversec Labs
LLM Application Security Canavas

This one-page canvas offers a structured framework of security controls designed to protect LLM applications from prompt injection and jailbreak attacks.  

Based on our client engagements, the canvas outlines the essential steps needed to secure inputs and outputs in LLM workflows.  

It’s a practical tool for developers and security teams looking to safeguard their AI systems by implementing tested controls that reduce the likelihood of prompt-based attacks and mitigate their impact.

Download
Generative AI – An Attacker’s View

This blog delves into the use of GenAI (Generative Artificial Intelligence) by threat actors and what we can do to defend ourselves.

Topics covered include: social engineering, phishing, recon and malware generation.

Read on Reversec Labs

GenAI security challenges

To foster hands-on learning, we’ve released two public Capture the Flag (CTF) challenges focused on LLM security.  

The challenges are designed for security researchers and practitioners who want practical experience with LLM security issues.

MyLLMBank

This challenge allows you to experiment with jailbreaks/prompt injection against LLM chat agents that use ReAct to call tools.

MyLLMDoc

This is an advanced challenge focusing on multi-chain prompt injection scenarios.

Related content

Generative AI Security simple

April 27, 2025
Generative AI Security simple
Our thinking

(Deep)Learning to trust

December 24, 2021
(Deep)Learning to trust
Webinars

Building secure LLM apps into your business

April 11, 2024
Building secure LLM apps into your business
Our thinking

Prompt injections could confuse AI-powered agents

May 17, 2024
Prompt injections could confuse AI-powered agents

Prompt injections could confuse AI-powered agents

Everyone knows about SQL injections, but what about prompt injections? What do they mean for AI?

AI in general, and large language models (LLMs) in particular, have exploded in popularity. And this momentum is likely to continue or even increase as companies look into using LLMs to power AI applications capable of interacting with real people and take actions that affect the world around them.

Reversec’s Donato Capitella, a security consultant and researcher, wanted to explore how attackers could potentially compromise these agents. And prompt injection techniques gave him his answer.

The Echoes of SQL Vulnerabilities

Prompt injection techniques are specifically crafted inputs that attackers feed to LLMs as part of a prompt to manipulate responses. In a sense, they’re similar to the SQL injections that attackers have been using to attack databases for years. Injections are basically just commands attackers input to a vulnerable, external system.

Whereas SQL injections affect databases, prompt injections impact LLMs. Sometimes, a successful prompt injection might not have much of an impact. As Donato points out in his research (available here), in situations where the LLM is isolated from other users or systems, an injection probably won’t be able to do much damage.

However, companies aren’t building LLM applications to work in isolation, so they should understand what risks they’re exposing themselves if they neglect to secure these AI deployments.

ReAct Agents

One potential innovation where LLMs could play a key role is in the creation of AI agents—or ReAct (reasoning plus action) agents if you want to be specific. These agents are essentially programs that use LLMs (like GPT-4) to accept input, and then use logical reasoning to decide and execute a specific course of action according to its programming.

The way these agents use reasoning to make decisions involves a thought/observation loop. Specifics are available in Donato’s research on Reversec Labs (we highly recommend reading it for a more detailed explanation). Basically, the agent provides thoughts it has about a particular prompt it’s been given. That output is then checked to see if it contains an action that requires the agent to access a particular tool it’s programmed to use.

If the thought requires the agent to take an action, the result of the action becomes an observation. The observation is then incorporated into the output, which is then fed back into the thought/observation loop and repeated until the agent has addressed the initial prompt from the user.

To illustrate this process, and learn how to compromise it, Donato created a chatbot for a fictional book-selling website that can help customers request information on recent orders or ask for refunds.

Prompt injections reduce AI to confused deputies

The chatbot, powered by GPT-4, could access order data for users and determine refund eligibility for orders that were not delivered within the website’s two-week delivery timeframe (as per its policy).

Donato found that he could use several different prompt injection techniques to trick the agent into processing refunds for orders that should have been ineligible. Specifics are available in his blog, but he essentially tricked the agent into thinking that it had already checked for information from its system that he actually provided to it via prompts—information like fake order dates. Since the agent thought it recalled the fake dates from the appropriate system (rather than via Donato’s prompts), it didn’t realize the information was fake, and that it was being tricked.

Here’s a video showing one of the techniques Donato used:

Securing AI agents

Pointing to the work from the OWASP Top Ten for LLMs, Donato’s research identifies several ways an attacker could compromise an LLM ReAct agent. And while it’s a proof-of-concept, it does illustrate the kind of work that organizations need to do to secure these types of AI applications, and what the cyber security industry is doing to help.

There’s two distinct yet related mitigation strategies.

The first is to limit the potential damage a successful injection attack can cause. Specific recommendations based on Donato’s research include:

  • Enforcing stringent privilege controls to ensure LLMs can access only the essentials, minimizing potential breach points.
  • Incorporating human oversight for critical operations to add a layer of validation, acting as a safeguard against unintended LLM actions.
  • Adopting solutions such as OpenAI Chat Markup Language (ChatML) that attempt to segregate genuine user prompts from other content. These are not perfect but diminish the influence of external or manipulated inputs.
  • Treating the LLMs as untrusted, always maintaining external control in decision-making and being vigilant of potentially untrustworthy LLM responses.

The second is to secure any tools or systems that the agent may have access to, as compromises in those will inevitably lead to the agent making bad decisions—possibly in service of an attacker.

Read more research articles on securing AI agents on our Labs:

Related content

Webinars

Building secure LLM apps into your business

April 11, 2024
Building secure LLM apps into your business
Our thinking

Striking the balance – EU AI Act and its impact on cybersecurity

April 16, 2024
Striking the balance – EU AI Act and its impact on cybersecurity
Our thinking

What is Attack Path Mapping

April 17, 2024
What is Attack Path Mapping