Security teams are used to thinking in terms of logs, network packets, and authentication events. Artificial intelligence systems operate in a very different medium: vector space. The decisions of modern machine learning systems are not made through explicit rules but through geometric relationships between vectors representing meaning, context, and behavior.
Understanding this geometry is becoming increasingly important for security practitioners. New forms of AI attacks - including latent steering, jailbreak vectors, and embedding manipulation - exploit the structure of this vector space rather than the textual inputs that security controls traditionally monitor.
One way to reason about this landscape is through what we call the Intent Vector Model: the idea that human goals, agent behaviors, and model outputs can all be understood as vectors operating within a shared influence field.
While the concept originated as a theoretical framework for modeling social influence and decision making, it turns out to align remarkably well with the internal mathematics of modern AI systems.
From Prompts to Geometry
Most public discussions of AI security still focus on prompt engineering. A user types text, the system interprets it, and guardrails attempt to filter unsafe requests.
Under the hood, however, the model never sees text.
The actual processing pipeline looks more like this:
text → tokenization → embedding → latent vector space → output generation
Every prompt is converted into a high-dimensional vector, typically thousands of dimensions long. These vectors represent semantic meaning rather than literal words.
Two prompts that mean similar things produce vectors that lie close together in this space. Prompts with unrelated meaning are farther apart.
The model effectively navigates a semantic landscape where concepts exist as clusters of vectors. A request about cybersecurity will land near other cybersecurity topics; a request about cooking will land elsewhere entirely.
This geometric structure is known as latent space.
Latent Space as a Map of Intent
Within this space, behaviors emerge not as discrete instructions but as directions.
Researchers have demonstrated that certain model behaviors correspond to identifiable vector directions:
- compliance vs refusal
- helpful vs harmful responses
- coding behavior vs natural language explanation
- toxicity vs neutrality
Moving along one of these directions increases or decreases the probability of that behavior appearing in the output.
This is where the Intent Vector Model becomes useful.
Instead of treating decisions as binary outcomes, we can treat them as vector trajectories. A prompt does not simply request an action; it pushes the model’s internal state in a particular direction.
The resulting output reflects the combined influence of multiple vectors:
output ≈ prompt_vector + context_vectors + system_policy_vectors + safety_vectors
In effect, the model’s response is the result of vector addition in a high-dimensional space.
The Intent Vector Model
The Intent Vector Model generalizes this idea beyond machine learning.
In this model:
- Agents (humans or machines) act as points in a vector field
- Intent is represented as a vector with magnitude and direction
- Influence is the projection of one vector onto another
Mathematically, this interaction can be expressed through the dot product:
Influence(A,B) = A · B = |A||B|cos(θ)
Where |A| and |B| represent intent magnitudes and θ represents alignment between them.
When the vectors align, influence increases. When they diverge, influence diminishes.
This framework mirrors how machine learning embeddings work. Semantic similarity between two embeddings is typically calculated using cosine similarity - effectively the normalized dot product.
The same mathematical structure appears in both human influence modeling and AI decision geometry.
Latent Steering: Attacking the Vector Field
Recent AI security research has shown that models can be manipulated not just through prompts but through vector manipulation.
This technique is known as activation steering or latent steering.
Instead of crafting clever text prompts, an attacker injects a vector that shifts the model’s internal state toward a desired behavior.
Conceptually:
modified_state = original_state + steering_vector
If the steering vector points toward a region associated with unsafe behaviors, the model may generate outputs that bypass its own guardrails.
The text prompt itself may appear harmless. The manipulation occurs entirely within the model’s latent space.
Jailbreak Vectors
Researchers have identified vector directions associated with jailbreak behavior. By analyzing large numbers of prompts and responses, they can compute difference vectors between safe and unsafe model states.
For example:
jailbreak_vector ≈ embedding(jailbroken_prompt) - embedding(safe_prompt)
Applying that vector to other prompts can push the model into a more permissive behavioral region.
In practical terms, this means a single vector could potentially steer many unrelated prompts toward similar unsafe outputs.
The result is a class of attacks that operate below the level of text filtering.
Retrieval Poisoning and Embedding Manipulation
Retrieval-augmented generation (RAG) systems introduce another vector-based attack surface.
RAG pipelines typically work like this:
query → embedding → vector database search → retrieved context → model output
If an attacker inserts poisoned embeddings into the vector database, they can influence what context the model retrieves.
Because vector search operates on similarity rather than exact matching, malicious entries can be crafted to appear relevant to many unrelated queries. The model then incorporates the poisoned context into its reasoning process.
This form of attack is often called embedding poisoning.
Security Implications
Traditional security controls focus on inputs and outputs. Latent-space attacks bypass both.
The real activity occurs in the intermediate vector representations.
This suggests a shift in defensive strategy. Instead of monitoring prompts alone, defenders may need to monitor the trajectory of model activations through latent space.
Emerging defensive techniques include:
Activation monitoring - Models can be instrumented to observe internal activation patterns and detect anomalies associated with unsafe behaviors.
Vector similarity filtering - Incoming embeddings can be compared against known clusters associated with jailbreak attempts:
if cosine_similarity(input_vector, jailbreak_cluster) > threshold:
block_request()
Adversarial training - Models can be trained against known steering vectors to make them less effective.
Toward Latent Security Monitoring
One possible future architecture resembles a traditional intrusion detection system - but operating inside neural networks.
Instead of network packets, the system monitors vector trajectories. Instead of suspicious IP addresses, it identifies suspicious semantic directions.
A simplified pipeline might look like:
Input
↓
Embedding
↓
Latent trajectory analysis
↓
Behavior classification
↓
Allow / block / escalate
In this model, security monitoring becomes a problem of geometry rather than text analysis.
Convergence with Intent Modeling
The overlap between the Intent Vector Model and latent space analysis is more than metaphorical.
Both frameworks rely on the same mathematical primitives:
- vectors
- cosine similarity
- projection
- vector addition
Both treat behavior as the result of interacting directional forces. In one case those forces represent human goals and influence. In the other they represent semantic features learned by a neural network.
The mathematics is the same.
A New Frontier for AI Red Teaming
AI red teaming has traditionally focused on prompt engineering, jailbreak phrases, and adversarial examples.
The next generation of attacks will likely operate at a deeper level: the geometry of the model itself.
Instead of asking “What prompt breaks the system?”, researchers are beginning to ask “What direction in latent space produces the behavior we want?”
For defenders, this means the attack surface of AI systems is no longer just the interface.
It is the entire vector landscape inside the model.
Understanding that landscape - and learning how to monitor it - may become one of the defining security challenges of the agentic AI era.
The formal mathematical treatment of the Intent Vector Model - including Hilbert space agent representations, influence field equations, and jailbreak geometry proofs - is available as a research paper: Intent Vector Fields: A Mathematical Framework for Human Influence and AI Adversarial Manipulation. If you’re working on AI red teaming programs or latent space security monitoring, reach out through the contact form.