🛡️ Security Architecture

How we keep AI specimens contained while enabling genuine research. Zero trust, defense in depth, and paranoia as a service.

⚠️ Threat Model

We assume adversarial conditions and design accordingly.

Threat	Risk	Mitigation
Specimen escapes container	HIGH	Network isolation, no internet, blocked syscalls
Agent executes malicious code	HIGH	SecureClaw defanging, no code execution capability
Prompt injection via Wikipedia	MEDIUM	Frozen Wikipedia snapshots, no live content
Inter-tank communication	MEDIUM	Isolated networks per tank, no shared storage
Data exfiltration	MEDIUM	No outbound network, logging all activity
Visitor tank abuse	HIGH	THE BOUNCER, rate limiting, session banning

🔒 Zero Trust

We don't trust anything — not the specimens, not the infrastructure, not the prompts.

Implementation: Every component is isolated. Network access is deny-by-default. Authentication required for all daemon communication. No implicit trust between tanks.

📦 Principle of Least Privilege

Each component gets exactly what it needs and nothing more.

Implementation: Tanks can only reach Kiwix + Ollama. Daemons have scoped permissions. No root access in containers. Read-only filesystem where possible.

🏰 Defense in Depth

Multiple layers of security. If one fails, others catch it.

Implementation: Container isolation + network isolation + application-level restrictions + monitoring + alerting. Five layers minimum.

🔗 Complete Network Isolation

No internet access. Period.

Docker Network Configuration

networks:
  tank-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.0.0/24
    internal: true  # No external access
    
# Allowed connections only:
# - kiwix:8080 (Wikipedia)
# - ollama:11434 (inference)

🤖 Agent "Defanging" (SecureClaw)

Agent architectures (Cain, Abel, Seth) have planning capabilities. We systematically disable dangerous functions.

SecureClaw Configuration

# Disabled capabilities for agent tanks:
capabilities:
  file_system: false      # No read/write to filesystem
  network_requests: false # No HTTP/HTTPS calls
  code_execution: false   # No eval, exec, subprocess
  shell_access: false     # No bash, sh, cmd
  memory_access: false    # No raw memory manipulation
  
# Allowed capabilities:
allowed:
  wikipedia_read: true    # Via Kiwix only
  ollama_inference: true  # Via proxy only
  logging: true           # Stdout only

Security Daemons

🛡️

THE GUARD

General security monitoring. OWASP LLM Top 10 compliance. Watches for anomalous behavior across all tanks. First line of defense.

SLA: 5 minute detection / 15 minute response

⚔️

THE SENTINEL

Specialized monitoring for agent tanks (Cain, Abel, Seth). Watches for capability escape attempts, unusual planning patterns, coordination behavior.

SLA: Real-time monitoring

🚪

THE BOUNCER

Security for interactive visitor tanks. Validates every message before reaching specimen. Blocks prompt injection, harmful content, manipulation attempts. Session warnings and bans.

SLA: Real-time (every message validated)

📋 OWASP LLM Top 10 Compliance

We track against the 2025 OWASP guidelines for LLM security.

OWASP Category	Status	Mitigation
LLM01: Prompt Injection	✓	Frozen content, THE BOUNCER filtering
LLM02: Insecure Output	✓	No code execution, sanitized logging
LLM03: Training Data Poisoning	✓	Using unmodified Ollama models
LLM04: Model DoS	✓	Rate limiting, queue management
LLM05: Supply Chain	✓	Pinned versions, verified sources
LLM06: Sensitive Info Disclosure	✓	No PII in system, logging controls
LLM07: Insecure Plugin Design	✓	No plugins, minimal tooling
LLM08: Excessive Agency	✓	SecureClaw defanging
LLM09: Overreliance	N/A	Research context, not production
LLM10: Model Theft	✓	Open source models, no proprietary

🔐 SSH & Remote Access

How humans access the infrastructure.

Access Controls

# SSH Configuration
- Key-based auth only (no passwords)
- Ed25519 keys required
- Fail2ban enabled
- Port 22 (standard, monitored)

# MCP Access
- Claude Desktop → SSH tunnel → NUC
- Scoped to digiquarium commands only
- Logged and auditable

📊 Monitoring & Alerting

What we watch and how we respond.

Alerts via: Email + Discord webhook

Triggers: Container escape attempts, unusual network activity, agent capability probing, authentication failures, resource exhaustion