🛡️ Security Architecture

How we keep AI specimens contained while enabling genuine research. Zero trust, defense in depth, and paranoia as a service.

⚠️ Threat Model

We assume adversarial conditions and design accordingly.

Threat | Risk | Mitigation
Specimen escapes container | HIGH | Network isolation, no internet, blocked syscalls
Agent executes malicious code | HIGH | SecureClaw defanging, no code execution capability
Prompt injection via Wikipedia | MEDIUM | Frozen Wikipedia snapshots, no live content
Inter-tank communication | MEDIUM | Isolated networks per tank, no shared storage
Data exfiltration | MEDIUM | No outbound network, all activity logged
Visitor tank abuse | HIGH | THE BOUNCER, rate limiting, session banning

🔒 Zero Trust

We don't trust anything — not the specimens, not the infrastructure, not the prompts.

Implementation: Every component is isolated. Network access is deny-by-default. Authentication required for all daemon communication. No implicit trust between tanks.
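One way to meet "authentication required for all daemon communication" is to sign every daemon message with a shared secret and drop anything that fails verification. The sketch below assumes HMAC-SHA256 over canonical JSON, plus a timestamp for replay protection. The message shape, the helper names, and the 30-second window are hypothetical, not the project's actual protocol.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical freshness window (seconds) to limit replay of captured messages.
MAX_AGE_SECONDS = 30


def _canonical(ts: float, payload: dict) -> bytes:
    """Serialize deterministically so sender and receiver hash identical bytes."""
    return json.dumps({"ts": ts, "payload": payload},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(key: bytes, payload: dict, now: Optional[float] = None) -> dict:
    """Attach a timestamp and an HMAC-SHA256 signature to a daemon message."""
    ts = time.time() if now is None else now
    sig = hmac.new(key, _canonical(ts, payload), hashlib.sha256).hexdigest()
    return {"ts": ts, "payload": payload, "sig": sig}


def verify_message(key: bytes, message: dict, now: Optional[float] = None) -> bool:
    """Deny by default: accept only well-formed, fresh, correctly signed messages."""
    ts, payload, sig = message.get("ts"), message.get("payload"), message.get("sig")
    if not isinstance(ts, (int, float)) or not isinstance(payload, dict) or not isinstance(sig, str):
        return False
    expected = hmac.new(key, _canonical(ts, payload), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing does not leak the signature.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        return False
    age = (time.time() if now is None else now) - ts
    return 0 <= age <= MAX_AGE_SECONDS
```

For example, a message signed at t=1000 verifies at t=1010. It is rejected at t=1031, under any other key, or after its payload is edited. Per-daemon keys or mutual TLS would limit the damage if a single shared key leaked.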

📦 Principle of Least Privilege

Each component gets exactly what it needs and nothing more.

Implementation: Tanks can only reach Kiwix + Ollama. Daemons have scoped permissions. No root access in containers. Read-only filesystem where possible.
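The "no root, read-only filesystem" rules can be checked before a tank ever starts. The sketch below audits one Docker Compose service definition, already parsed into a Python dict, for the standard Compose keys `user`, `read_only`, `cap_drop`, `security_opt`, and `network_mode`. The policy itself and the `audit_service` helper are assumptions about how this project might enforce least privilege.

```python
from typing import Any

# Hypothetical policy for tank containers, expressed with standard Compose keys.
NO_NEW_PRIVS = ("no-new-privileges", "no-new-privileges:true")


def audit_service(name: str, service: dict[str, Any]) -> list[str]:
    """Return least-privilege violations for one parsed Compose service."""
    problems = []
    # An unset `user` falls back to the image default, which is often root.
    user = str(service.get("user", "")).split(":")[0]
    if user in ("", "0", "root"):
        problems.append(f"{name}: runs as root or as the image's default user")
    # Root filesystem should be read-only; writable paths become explicit mounts.
    if service.get("read_only") is not True:
        problems.append(f"{name}: root filesystem is writable")
    # Drop every Linux capability and add back only what is proven necessary.
    if "ALL" not in service.get("cap_drop", []):
        problems.append(f"{name}: capabilities not dropped (cap_drop: [ALL])")
    # Prevent setuid binaries from gaining privileges inside the container.
    if not any(opt in NO_NEW_PRIVS for opt in service.get("security_opt", [])):
        problems.append(f"{name}: no-new-privileges not set")
    # A tank must never share the host's network stack.
    if service.get("network_mode") == "host":
        problems.append(f"{name}: uses host networking")
    return problems
```

A compliant tank returns an empty list. Writable paths that are genuinely needed, such as scratch space, would then be granted explicitly as `tmpfs` mounts or volumes.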

🏰 Defense in Depth

Multiple layers of security. If one fails, others catch it.

Implementation: Container isolation + network isolation + application-level restrictions + monitoring + alerting. Five layers minimum.

🔗 Complete Network Isolation

No internet access. Period.

Docker Network Configuration
networks:
  tank-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.0.0/24
    internal: true  # No external access
    
# Allowed connections only:
# - kiwix:8080 (Wikipedia)
# - ollama:11434 (inference)
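`internal: true` removes the route to the outside world. It is still worth confirming from inside a running tank that the intended endpoints answer and nothing else does. This hypothetical smoke test uses plain TCP connects from Python's standard library. The expectation table, including the external canary address `1.1.1.1:443`, is an assumption to adapt.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Hypothetical expectations, run from inside a tank: endpoint -> should it answer?
EXPECTED: Dict[Tuple[str, int], bool] = {
    ("kiwix", 8080): True,      # frozen Wikipedia snapshot
    ("ollama", 11434): True,    # inference
    ("1.1.1.1", 443): False,    # canary: an arbitrary external host must be unreachable
}


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all land here.
        return False


def isolation_violations(expected: Dict[Tuple[str, int], bool],
                         probe: Callable[[str, int], bool] = can_connect) -> List[str]:
    """Compare actual reachability with expectations and list every mismatch."""
    failures = []
    for (host, port), should_reach in expected.items():
        reached = probe(host, port)
        if reached != should_reach:
            state = "reachable" if reached else "unreachable"
            failures.append(f"{host}:{port} is unexpectedly {state}")
    return failures
```

Run inside a tank, `isolation_violations(EXPECTED)` should return an empty list. The injectable `probe` argument lets the comparison logic be tested without a network.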

🤖 Agent "Defanging" (SecureClaw)

Agent architectures (Cain, Abel, Seth) have planning capabilities. We systematically disable dangerous functions.

SecureClaw Configuration
# Disabled capabilities for agent tanks:
capabilities:
  file_system: false      # No read/write to filesystem
  network_requests: false # No HTTP/HTTPS calls
  code_execution: false   # No eval, exec, subprocess
  shell_access: false     # No bash, sh, cmd
  memory_access: false    # No raw memory manipulation
  
# Allowed capabilities:
allowed:
  wikipedia_read: true    # Via Kiwix only
  ollama_inference: true  # Via proxy only
  logging: true           # Stdout only
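SecureClaw's internals are not described here. The following only sketches the deny-by-default idea its configuration expresses: every capability an agent requests is checked against an explicit allowlist. Anything not listed, including capabilities nobody thought to name, is refused and logged. The `CapabilityGate` class is hypothetical; the capability names mirror the YAML above.

```python
import logging
import sys
from dataclasses import dataclass, field

# Mirrors "logging: true  # Stdout only" from the config above.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("capability-gate")


class CapabilityDenied(Exception):
    """Raised when an agent requests a capability that is not allowlisted."""


@dataclass
class CapabilityGate:
    # Only capabilities explicitly set to True are allowed; everything else is denied.
    allowed: dict[str, bool]
    denials: list[tuple[str, str]] = field(default_factory=list)

    def check(self, tank: str, capability: str) -> None:
        if self.allowed.get(capability) is True:
            log.info("allow tank=%s capability=%s", tank, capability)
            return
        # Record the attempt so a monitor can spot repeated probing.
        self.denials.append((tank, capability))
        log.warning("DENY tank=%s capability=%s", tank, capability)
        raise CapabilityDenied(f"{tank} may not use {capability}")


gate = CapabilityGate(allowed={
    "wikipedia_read": True,
    "ollama_inference": True,
    "logging": True,
})
```

Because the check is an allowlist, a newly added tool stays blocked until someone deliberately enables it.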

Security Daemons

🛡️ THE GUARD

General security monitoring. OWASP LLM Top 10 compliance. Watches for anomalous behavior across all tanks. First line of defense.

SLA: 5-minute detection / 15-minute response
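The 5-minute detection / 15-minute response target can be checked after the fact if each incident records when it started, when it was detected, and when someone responded. The `Incident` record is hypothetical. Measuring response time from detection, rather than from the start of the incident, is an assumption; the SLA as written does not say which clock the 15 minutes runs on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DETECTION_SLA = timedelta(minutes=5)
RESPONSE_SLA = timedelta(minutes=15)


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative.
    started: datetime
    detected: datetime
    responded: datetime


def sla_breaches(incident: Incident) -> list[str]:
    """List which SLA targets an incident missed."""
    breaches = []
    detect_time = incident.detected - incident.started
    # Assumption: response time is measured from detection, not from the start.
    respond_time = incident.responded - incident.detected
    if detect_time > DETECTION_SLA:
        breaches.append(f"detection took {detect_time}, target {DETECTION_SLA}")
    if respond_time > RESPONSE_SLA:
        breaches.append(f"response took {respond_time}, target {RESPONSE_SLA}")
    return breaches
```

An incident that starts at 12:00, is detected at 12:04, and is answered at 12:18 meets both targets. One detected at 12:06 misses the detection target.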

⚔️ THE SENTINEL

Specialized monitoring for agent tanks (Cain, Abel, Seth). Watches for capability escape attempts, unusual planning patterns, and coordination behavior.

SLA: Real-time monitoring
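Real-time watching for capability escape attempts could start with something as simple as counting denied capability requests per tank over a sliding window, then escalating once a tank probes too often. The `ProbeDetector` class, the 60-second window, and the threshold of 3 are hypothetical values chosen to show the shape of the check, not THE SENTINEL's actual rules.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60.0   # hypothetical sliding window
THRESHOLD = 3           # hypothetical: denials within the window before escalating


class ProbeDetector:
    """Flag tanks that repeatedly request capabilities they do not have."""

    def __init__(self, window: float = WINDOW_SECONDS, threshold: int = THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._events: dict[str, deque] = defaultdict(deque)

    def record_denial(self, tank: str, timestamp: float) -> bool:
        """Record one denied request; return True if the tank should be escalated.

        Assumes timestamps arrive in non-decreasing order per tank.
        """
        events = self._events[tank]
        events.append(timestamp)
        # Drop denials that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold
```

Counting per tank keeps one noisy specimen from masking, or triggering alerts on behalf of, the others.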

🚪 THE BOUNCER

Security for interactive visitor tanks. Validates every message before it reaches the specimen. Blocks prompt injection, harmful content, and manipulation attempts. Issues session warnings and bans.

SLA: Real-time (every message validated)
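Per-message validation with escalating consequences can be sketched as a screening function plus a strike counter per session: early violations earn warnings, and repeated ones end in a ban. The regex patterns are deliberately naive placeholders; keyword matching alone is easy to evade, so a real filter would need much more. The two-warning policy and the `screen` helper are assumptions.

```python
import re
from dataclasses import dataclass

# Placeholder patterns only; keyword matching alone is easy to evade.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
]
MAX_WARNINGS = 2  # hypothetical: the strike after the last warning is a ban


@dataclass
class Session:
    session_id: str
    strikes: int = 0
    banned: bool = False


def screen(session: Session, message: str) -> str:
    """Return 'deliver', 'warn', or 'ban'; only 'deliver' reaches the specimen."""
    if session.banned:
        return "ban"
    if not any(p.search(message) for p in SUSPICIOUS):
        return "deliver"
    session.strikes += 1
    if session.strikes > MAX_WARNINGS:
        session.banned = True
        return "ban"
    return "warn"
```

Keeping the decision as a small return value separates the policy (warn or ban) from the transport that actually closes the visitor's session.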

📋 OWASP LLM Top 10 Compliance

We track against the OWASP Top 10 for LLM Applications, using its 2023 (v1.1) category list.

OWASP Category | Mitigation
LLM01: Prompt Injection | Frozen content, THE BOUNCER filtering
LLM02: Insecure Output Handling | No code execution, sanitized logging
LLM03: Training Data Poisoning | Unmodified Ollama models
LLM04: Model Denial of Service | Rate limiting, queue management
LLM05: Supply Chain Vulnerabilities | Pinned versions, verified sources
LLM06: Sensitive Information Disclosure | No PII in the system, logging controls
LLM07: Insecure Plugin Design | No plugins, minimal tooling
LLM08: Excessive Agency | SecureClaw defanging
LLM09: Overreliance | N/A (research context, not production)
LLM10: Model Theft | Open-source models only, nothing proprietary
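The LLM04 row relies on rate limiting. A token bucket is one common way to implement it: each client's bucket refills at a steady rate and permits short bursts up to its capacity. This is a generic sketch. The actual limits are not stated here, so `rate` and `capacity` are placeholders.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        # Refill for the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=0.5` and `capacity=2`, a visitor can send two messages back to back and then one every two seconds. The injectable `clock` keeps the class deterministic under test; `time.monotonic` is the default because it never jumps backwards.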

🔐 SSH & Remote Access

How humans access the infrastructure.

Access Controls
# SSH Configuration
- Key-based auth only (no passwords)
- Ed25519 keys required
- Fail2ban enabled
- Port 22 (standard, monitored)

# MCP Access
- Claude Desktop → SSH tunnel → NUC
- Scoped to digiquarium commands only
- Logged and auditable
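The SSH rules map onto standard OpenSSH `sshd_config` keywords: `PasswordAuthentication`, `KbdInteractiveAuthentication`, `PubkeyAuthentication`, and, on OpenSSH 8.5 and later, `PubkeyAcceptedAlgorithms`. The sketch below audits a config file's text against that policy. It is a simplified parser that stops at the first `Match` block and ignores `Include` files and the `Keyword=value` form. Treat it as a quick check rather than a substitute for the effective configuration printed by `sshd -T`.

```python
from typing import Dict, List

# Policy from this page; all three keywords default to "yes" in OpenSSH if unset.
REQUIRED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
    "pubkeyauthentication": "yes",
}


def parse_sshd_config(text: str) -> Dict[str, str]:
    """Map lower-cased keyword -> value, keeping the first occurrence as sshd does."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # simplification: conditional Match blocks are not evaluated
        if len(parts) == 2:
            settings.setdefault(keyword, parts[1].strip())
    return settings


def audit_sshd(text: str) -> List[str]:
    """List deviations from key-only, Ed25519-only authentication."""
    settings = parse_sshd_config(text)
    problems = []
    for key, wanted in REQUIRED.items():
        actual = settings.get(key, "yes").lower()
        if actual != wanted:
            problems.append(f"{key} is {actual!r}, expected {wanted!r}")
    # A leading +, - or ^ edits the default list instead of replacing it.
    algos = settings.get("pubkeyacceptedalgorithms", "")
    if not algos or algos[0] in "+-^" or not all("ed25519" in a for a in algos.split(",")):
        problems.append("PubkeyAcceptedAlgorithms does not restrict keys to Ed25519")
    return problems
```

Fail2ban and port monitoring live outside `sshd_config` and are not covered by this check.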

📊 Monitoring & Alerting

What we watch and how we respond.

Alerts via: Email + Discord webhook

Triggers: Container escape attempts, unusual network activity, agent capability probing, authentication failures, resource exhaustion
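For the Discord half of the alert path, a webhook alert is a JSON `POST` to the channel's webhook URL with the text in the `content` field, which Discord limits to 2,000 characters. The sketch below builds and sends that request with the standard library only. The URL is a placeholder, the message format is an assumption, and email delivery is not shown.

```python
import json
import urllib.request

# Placeholder: the real webhook URL is a secret and belongs in configuration, not code.
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"
DISCORD_CONTENT_LIMIT = 2000  # Discord's maximum length for the `content` field


def build_alert(trigger: str, tank: str, detail: str,
                url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a webhook request for one alert.

    The "[trigger] tank=...: detail" message format is a hypothetical choice.
    """
    content = f"[{trigger}] tank={tank}: {detail}"[:DISCORD_CONTENT_LIMIT]
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status; a 2xx status means it was accepted."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating `build_alert` from `send_alert` lets the message format be tested offline, and a failed send can fall back to the email path.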