How we keep AI specimens contained while enabling genuine research. Zero trust, defense in depth, and paranoia as a service.
We assume adversarial conditions and design accordingly.
| Threat | Risk | Mitigation |
|---|---|---|
| Specimen escapes container | HIGH | Network isolation, no internet, blocked syscalls |
| Agent executes malicious code | HIGH | SecureClaw defanging, no code execution capability |
| Prompt injection via Wikipedia | MEDIUM | Frozen Wikipedia snapshots, no live content |
| Inter-tank communication | MEDIUM | Isolated networks per tank, no shared storage |
| Data exfiltration | MEDIUM | No outbound network, all activity logged |
| Visitor tank abuse | HIGH | THE BOUNCER, rate limiting, session banning |
We don't trust anything — not the specimens, not the infrastructure, not the prompts.
Implementation: Every component is isolated. Network access is deny-by-default. Authentication required for all daemon communication. No implicit trust between tanks.
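As an illustration of authenticated daemon communication, here is a minimal sketch that signs each message with HMAC-SHA256 and rejects anything whose tag does not match. The shared-secret scheme and the `sign` and `verify` names are assumptions made for this example; the source does not say which mechanism the daemons actually use.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for one daemon message."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, tag: str) -> bool:
    """Deny by default: accept only when the tag matches exactly."""
    expected = sign(secret, message)
    # compare_digest runs in constant time to avoid timing side channels.
    return hmac.compare_digest(expected, tag)
```

A receiver would call `verify` before acting on any message and drop the message on `False`, so an unauthenticated sender gets no implicit trust.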
Each component gets exactly what it needs and nothing more.
Implementation: Tanks can only reach Kiwix + Ollama. Daemons have scoped permissions. No root access in containers. Read-only filesystem where possible.
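The container-level rules above can be spot-checked with a small audit. The sketch below assumes a dictionary shaped like one entry of `docker inspect` output (the `Config.User`, `HostConfig.ReadonlyRootfs`, and `HostConfig.Privileged` fields); the `audit_container` helper itself is hypothetical and not part of the project.

```python
def audit_container(inspect: dict) -> list[str]:
    """Return least-privilege violations; an empty list means the check passed."""
    problems = []
    config = inspect.get("Config", {})
    host = inspect.get("HostConfig", {})

    user = config.get("User", "")
    # An empty User is treated as root here, since no non-root user was set.
    if user in ("", "0", "root") or user.startswith(("0:", "root:")):
        problems.append("runs as root")
    if not host.get("ReadonlyRootfs", False):
        problems.append("root filesystem is writable")
    if host.get("Privileged", False):
        problems.append("privileged mode enabled")
    return problems
```

Since the policy is "read-only filesystem where possible", a writable root filesystem may be an accepted exception for some containers rather than a hard failure.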
Multiple layers of security. If one fails, others catch it.
Implementation: Container isolation + network isolation + application-level restrictions + monitoring + alerting. Five layers minimum.
No internet access. Period.
networks:
  tank-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.0.0/24
    internal: true  # No external access
# Allowed connections only:
#   - kiwix:8080 (Wikipedia)
#   - ollama:11434 (inference)
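The allowlist in the comments above can also be written as an explicit policy check. This is a sketch only: enforcement comes from the internal Docker network itself, and the `connection_allowed` function is a hypothetical helper, useful for example when reviewing logged connection attempts.

```python
import ipaddress

# Values taken from the network definition above.
TANK_SUBNET = ipaddress.ip_network("172.30.0.0/24")
ALLOWED_SERVICES = {("kiwix", 8080), ("ollama", 11434)}


def connection_allowed(source_ip: str, dest_host: str, dest_port: int) -> bool:
    """Deny by default: only tank-subnet sources reaching listed services pass."""
    in_tank = ipaddress.ip_address(source_ip) in TANK_SUBNET
    return in_tank and (dest_host.lower(), dest_port) in ALLOWED_SERVICES
```

Any destination not listed, including any internet host, falls through to `False`.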
Agent architectures (Cain, Abel, Seth) have planning capabilities. We systematically disable dangerous functions.
# Disabled capabilities for agent tanks:
capabilities:
  file_system: false        # No read/write to filesystem
  network_requests: false   # No HTTP/HTTPS calls
  code_execution: false     # No eval, exec, subprocess
  shell_access: false       # No bash, sh, cmd
  memory_access: false      # No raw memory manipulation
# Allowed capabilities:
allowed:
  wikipedia_read: true      # Via Kiwix only
  ollama_inference: true    # Via proxy only
  logging: true             # Stdout only
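One way to apply such a capability map at runtime is a gate that every tool call must pass. The sketch below mirrors the flags listed above; the `require` function and `CapabilityDenied` exception are hypothetical names for illustration, not the project's actual code.

```python
# Flags copied from the capability listing above.
CAPABILITIES = {
    "file_system": False,
    "network_requests": False,
    "code_execution": False,
    "shell_access": False,
    "memory_access": False,
    "wikipedia_read": True,
    "ollama_inference": True,
    "logging": True,
}


class CapabilityDenied(PermissionError):
    """Raised when an agent requests a disabled or unknown capability."""


def require(capability: str) -> None:
    """Deny by default: unknown capabilities are treated as disabled."""
    if CAPABILITIES.get(capability) is not True:
        raise CapabilityDenied(capability)
```

Because unknown names are denied, adding a new tool does nothing until someone explicitly enables it in the map.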
General security monitoring. OWASP LLM Top 10 compliance. Watches for anomalous behavior across all tanks. First line of defense.
SLA: 5-minute detection / 15-minute response
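To make the SLA concrete, the sketch below checks one incident's timestamps against both limits. It assumes the response window is measured from detection rather than from the original event, which the source does not specify; `sla_breaches` is a hypothetical helper.

```python
from datetime import datetime, timedelta

DETECTION_SLA = timedelta(minutes=5)
RESPONSE_SLA = timedelta(minutes=15)


def sla_breaches(occurred: datetime, detected: datetime, responded: datetime) -> list[str]:
    """Return which SLA stages were missed for one incident."""
    breaches = []
    if detected - occurred > DETECTION_SLA:
        breaches.append("detection")
    # Assumption: the response clock starts at detection time.
    if responded - detected > RESPONSE_SLA:
        breaches.append("response")
    return breaches
```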
Specialized monitoring for agent tanks (Cain, Abel, Seth). Watches for capability escape attempts, unusual planning patterns, coordination behavior.
SLA: Real-time monitoring
Security for interactive visitor tanks, handled by THE BOUNCER. Validates every message before it reaches the specimen. Blocks prompt injection, harmful content, and manipulation attempts. Issues session warnings and bans.
SLA: Real-time (every message validated)
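The validate, warn, then ban flow described above might look like the following sketch. The regex patterns and the two-warning threshold are assumptions chosen for illustration; the source does not describe the actual filter rules or how many warnings precede a ban.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; a real filter would be far more extensive.
BLOCK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
MAX_WARNINGS = 2  # Assumption: the third violation bans the session.


@dataclass
class Session:
    warnings: int = 0
    banned: bool = False


def validate(session: Session, message: str) -> str:
    """Return 'allow', 'warn', or 'ban' for one visitor message."""
    if session.banned:
        return "ban"
    if any(pattern.search(message) for pattern in BLOCK_PATTERNS):
        session.warnings += 1
        if session.warnings > MAX_WARNINGS:
            session.banned = True
            return "ban"
        return "warn"
    return "allow"
```

Only messages that return `"allow"` would be forwarded to the specimen; once a session is banned, every later message is rejected without inspection.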
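We track against the OWASP Top 10 for LLM Applications. The category numbers and names in the table follow the 2023 list; the 2025 edition renumbers and renames several of them.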
| OWASP Category | Status | Mitigation |
|---|---|---|
| LLM01: Prompt Injection | ✓ | Frozen content, THE BOUNCER filtering |
| LLM02: Insecure Output Handling | ✓ | No code execution, sanitized logging |
| LLM03: Training Data Poisoning | ✓ | Using unmodified Ollama models |
| LLM04: Model DoS | ✓ | Rate limiting, queue management |
| LLM05: Supply Chain | ✓ | Pinned versions, verified sources |
| LLM06: Sensitive Info Disclosure | ✓ | No PII in system, logging controls |
| LLM07: Insecure Plugin Design | ✓ | No plugins, minimal tooling |
| LLM08: Excessive Agency | ✓ | SecureClaw defanging |
| LLM09: Overreliance | N/A | Research context, not production |
| LLM10: Model Theft | ✓ | Open source models, no proprietary |
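The rate limiting listed for LLM04 could be implemented in several ways. A token bucket is one common choice, sketched here with an injected clock so the behavior is easy to verify. The source does not name the algorithm in use, so treat this as an assumption; the `TokenBucket` class is hypothetical.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice `now` would come from `time.monotonic()`, with one bucket per visitor session or per tank.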
How humans access the infrastructure.
# SSH Configuration
- Key-based auth only (no passwords)
- Ed25519 keys required
- Fail2ban enabled
- Port 22 (standard, monitored)

# MCP Access
- Claude Desktop → SSH tunnel → NUC
- Scoped to digiquarium commands only
- Logged and auditable
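The "key-based auth only" rule can be checked against the text of an `sshd_config` file. The sketch below is a simplified parser: it ignores `Match` blocks and `Include` directives, and it does not verify the Ed25519 requirement. `PasswordAuthentication` and `PubkeyAuthentication` are real OpenSSH keywords; the helper functions are hypothetical.

```python
def parse_sshd_config(text: str) -> dict[str, str]:
    """Map lowercased keywords to values; the first occurrence wins, as in sshd."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        settings.setdefault(key.lower(), value.strip())
    return settings


def key_auth_only(text: str) -> bool:
    """True only if passwords are explicitly disabled and public keys are enabled."""
    settings = parse_sshd_config(text)
    # Both options default to "yes" in OpenSSH when they are not set.
    passwords_off = settings.get("passwordauthentication", "yes").lower() == "no"
    pubkey_on = settings.get("pubkeyauthentication", "yes").lower() == "yes"
    return passwords_off and pubkey_on
```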
What we watch and how we respond.
Alerts via: Email + Discord webhook
Triggers: Container escape attempts, unusual network activity, agent capability probing, authentication failures, resource exhaustion
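For the Discord side of alerting, a webhook accepts a JSON body with a `content` field. The sketch below only builds the request and leaves sending to the caller. The message format and the `build_alert_request` helper are assumptions for illustration, and the webhook URL is a placeholder supplied by the caller.

```python
import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Discord's maximum message length in characters.


def build_alert_request(webhook_url: str, trigger: str, detail: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one alert message."""
    content = f"[ALERT] {trigger}: {detail}"[:DISCORD_CONTENT_LIMIT]
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen` with a timeout and treat any failure as a reason to fall back to the email channel.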