Threat modeling is an exercise in attacker empathy. You build a profile of who would want to attack you, what they know, what access they have, and what they’re trying to accomplish. Most enterprise threat models describe attackers in familiar categories: nation-state actors, organized crime, opportunistic script kiddies.
A new profile is forming at the edges of those categories that most organizations haven’t built into their models yet. Call them the ML-native attacker. They’re fluent in PyTorch. They understand gradient flow. They know how model training pipelines are structured because they’ve built them. And increasingly, the threat model for the attacks described in emerging AI security research - including work like the Phantom framework for Structural Template Injection - maps directly to this profile.
More specifically, it maps to an insider threat.
What Makes an Attack “ML-Native”
Traditional cyberattacks follow a recognizable pattern: find a vulnerability, exploit it, escalate privileges, achieve an objective. The tooling is mature. The attacker needs to understand systems and protocols, but not necessarily the mathematical machinery underneath.
ML-native attacks are different. Consider what it takes to execute a backdoor poisoning attack on a production model. The attacker must:
- Understand how training data flows through a preprocessing pipeline and where to inject without triggering validation checks
- Choose a trigger pattern that survives data augmentation and model regularization
- Verify that the backdoor persists through fine-tuning and model updates
- Ensure the manipulated model produces normal outputs on clean inputs - the poisoning must be imperceptible
This isn’t script-kiddie territory. Learnable, Imperceptible and Robust Backdoor Attacks (LIRA) - a published technique - frames the problem as a constrained optimization challenge. The attacker is, functionally, running an ML experiment. They need to iterate, evaluate loss functions, and tune hyperparameters.
The Phantom framework for Structural Template Injection goes further. It uses a Template Autoencoder to map discrete prompt structures into a continuous latent space, then applies Bayesian optimization to efficiently find adversarial vectors that hijack LLM agent behavior. This is applied machine learning research weaponized as an attack tool.
The skill floor here is a working knowledge of ML optimization methods, LLM agent architectures, and retrieval-augmented generation systems. That profile has a job title: machine learning engineer.
Why Insider Access Changes Everything
The ML-native attacker doesn’t need to breach your perimeter if they’re already inside it. And the access that data scientists and ML engineers legitimately hold is extraordinarily powerful.
A data scientist at a typical enterprise AI team may have:
- Direct write access to training data repositories
- The ability to commit changes to model training scripts
- Access to model weights and the infrastructure used to store and deploy them
- Permissions to run training jobs that push updated models to production
Contrast this with the access a compromised external account typically provides after exploitation. The insider starts ahead.
A 2025 analysis of AI supply chain incidents found that 83% of organizations were unaware of the insider threat dimension to their ML pipelines, and a separate study estimated $12 billion in losses from compromised machine learning models that year - with insider-enabled tampering as a significant vector. A spot check of the top 1,000 most-downloaded models on public repositories found that 23% had been compromised at some point.
The attack doesn’t require “hacking” in any conventional sense. It requires access and knowledge - both of which legitimate ML team members have by definition.
The Threat Model
When you apply structured threat modeling to this attacker, the picture that emerges is specific. MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) provides the most directly applicable framework. Unlike MITRE ATT&CK, which covers traditional system-level threats, ATLAS catalogs tactics and techniques specific to ML systems: training data poisoning, model backdooring, membership inference, model inversion, and supply chain compromise of ML components.
The ML-native insider threat actor in ATLAS terms might pursue:
Training Data Poisoning (AML.T0020): Injecting manipulated samples that cause the model to learn incorrect decision boundaries. The goal could be sabotage, bias introduction for discriminatory outcomes, or creating a hidden trigger for later exploitation.
Model Backdooring (AML.T0018): Introducing a hidden behavior activated by a specific input pattern. A model that performs normally on evaluation but routes specific transactions, classifications, or outputs when a trigger is present.
Exfiltration via Model APIs (AML.T0043): Using legitimate query access to extract model weights or training data membership through carefully designed inference attacks - particularly relevant for proprietary models that represent significant intellectual property.
Supply Chain Compromise (AML.T0010): Introducing malicious dependencies into ML pipelines or pushing compromised model checkpoints to shared repositories used across teams.
The disgruntled ML engineer, the contractor nearing the end of an engagement, the researcher who joined with undisclosed intentions - each fits this profile. The motivation varies. The capability is the same.
The Detection Gap
Standard security tooling wasn’t designed to see these attacks. Data loss prevention tools look for exfiltration of files and credentials. SIEM rules look for lateral movement and privilege escalation. EDR catches known malware. None of these catch someone who slowly poisons training samples through legitimate commits, or embeds a backdoor in a model checkpoint uploaded to an internal repository.
The behavioral signals exist, but they’re subtle. Unusual patterns in training data commits. Model performance that degrades in specific, narrow circumstances. Statistical anomalies in output distributions that only appear under certain input conditions. Detecting them requires tooling and expertise at the intersection of ML engineering and security operations - a combination that almost no enterprise security team currently has.
NIST AI RMF’s GOVERN and MEASURE functions point toward the organizational controls needed: documented accountability for training data integrity, model lineage tracking, and evaluation protocols designed to detect behavioral anomalies. But implementation is lagging behind the threat.
What Defensive Posture Looks Like
Addressing the ML-native insider threat requires controls at multiple levels.
Access control on ML artifacts: Training data, model weights, and deployment pipelines should follow least-privilege principles with the same rigor applied to production databases. Who can commit to a training data repository? Who can push a model checkpoint to production? These access paths need to be explicit and audited.
Model lineage and versioning: Every model artifact that reaches production should have a documented provenance chain - what training data was used, what training code was run, who executed it, and when. This makes tampering auditable and creates forensic trail when something goes wrong.
Behavioral evaluation suites: Standard ML evaluation measures accuracy on held-out test sets. Security-aware evaluation adds adversarial test cases designed to detect backdoors and behavioral anomalies. Techniques like Neural Cleanse and STRIP can identify potential backdoor triggers in deployed models.
Separation of duties in the ML pipeline: The person who can commit to training data should not be the same person who can push a model to production without a second review. Applying standard software engineering controls - code review, approval gates, automated testing - to ML pipelines closes obvious gaps.
Monitoring model behavior in production: Beyond evaluation at deployment, production models should be monitored for output distribution drift, unusual input patterns, and performance anomalies. This is the equivalent of runtime application protection for ML systems.
Raising the Conversation With Your AI Teams
The organizational challenge here is partly cultural. Data science and security teams don’t typically share vocabulary, frameworks, or organizational reporting lines. Security teams don’t always understand ML pipelines. Data scientists don’t always think about their work through a security lens.
That gap is what makes this threat category effective. The attacker understands the ML environment deeply. The security team may not. Bridging that gap - through cross-functional threat modeling exercises, ATLAS-informed security reviews of ML pipelines, and clear ownership of model integrity - is where the real defensive leverage lies.
The Phantom research on Structural Template Injection is a useful forcing function for this conversation. When you read through that attack framework and sketch the attacker profile, you’re not describing an external adversary with sophisticated zero-days. You’re describing someone who could be on your team. That recognition is where effective threat modeling has to start.
If you’re deploying AI systems and want to work through what this threat model looks like for your specific environment, the sidebar contact form is the right place to start that conversation.