
Restricted Execution for LLM Agents: Sandboxing Untrusted Code

LLM agents that execute code present a fundamental security challenge: how do you safely run potentially malicious commands generated by an AI? The answer lies in process isolation and sandboxing—techniques that have been refined over decades in Linux and macOS systems.

The Problem: Trusting AI-Generated Code

When an LLM agent needs to interact with a system—whether running shell commands, executing scripts, or manipulating files—you're essentially executing untrusted code. The agent might:

  • Access sensitive files it shouldn't read
  • Modify or delete critical system files
  • Establish network connections to exfiltrate data
  • Consume excessive system resources
  • Execute privilege escalation attacks

This isn't theoretical. Any production LLM agent with command execution capabilities requires robust isolation to prevent accidental or malicious damage.

Linux Sandboxing Technologies

Firejail: Namespace-Based Isolation

Firejail leverages Linux namespaces and seccomp-bpf to create lightweight sandboxes. It isolates processes using:

  • Filesystem isolation: Private /tmp, restricted filesystem access via whitelists/blacklists
  • Network isolation: Disable networking entirely or restrict to specific interfaces
  • PID namespace: Process cannot see or signal other processes
  • Seccomp filters: Syscall filtering to block dangerous operations
```bash
# Run command with no network and restricted filesystem
firejail --net=none --private=/tmp/sandbox command
```

Landlock: Kernel-Level Access Control

Landlock (merged in Linux 5.13) provides unprivileged sandboxing at the kernel level. Unlike traditional MAC systems (SELinux, AppArmor), Landlock allows processes to self-restrict without root privileges:

```c
// Drop the ability to gain new privileges (required before self-restriction),
// then apply the ruleset to this process and its future children.
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
landlock_restrict_self(ruleset_fd, 0);  // thin wrapper over the landlock_restrict_self(2) syscall
```

Landlock is particularly valuable for LLM agents because it enables fine-grained filesystem restrictions without requiring privileged operations or complex policy files.

Bubblewrap: User Namespace Containers

Bubblewrap creates lightweight containers using user namespaces. It's the sandboxing engine behind Flatpak and provides:

  • Unprivileged container creation
  • Custom filesystem layouts via bind mounts
  • Network namespace isolation
  • No setuid binaries required
```bash
bwrap --ro-bind /usr /usr --tmpfs /tmp --unshare-net command
```

macOS Sandboxing: sandbox-exec

macOS provides sandbox-exec, a front end to the Seatbelt sandbox (implemented in the Sandbox kernel extension). Policies are written in SBPL, a Scheme-like profile language:

```scheme
(version 1)
(deny default)
(allow file-read* (subpath "/usr/lib"))
(allow file-write* (subpath "/tmp"))
(deny network*)
```

Profiles are applied with `sandbox-exec -f policy.sb command`. Although Apple has deprecated sandbox-exec, it still functions, and while it is less flexible than the Linux options, it provides adequate isolation for many LLM agent use cases on macOS.

Comparison of Sandboxing Technologies

| Feature | Firejail | Landlock | Bubblewrap | sandbox-exec | Docker |
|---|---|---|---|---|---|
| Platform | Linux | Linux 5.13+ | Linux | macOS | Linux/macOS |
| Privileges Required | None (setuid binary) | None | None | None | Docker daemon |
| Filesystem Isolation | Whitelist/blacklist | Path-based ACL | Bind mounts | Policy-based | Full container FS |
| Network Isolation | Yes | TCP only (Linux 6.7+) | Yes | Yes | Yes |
| PID Namespace | Yes | No | Yes | Limited | Yes |
| Seccomp Filtering | Yes | No | No | No | Yes |
| Overhead | Low | Minimal | Low | Low | Medium-high |
| Complexity | Medium | Low | Medium | Low | High |
| Maturity | Stable | New (2021) | Stable | Stable | Very stable |
| Use Case | General sandboxing | Fine-grained FS control | Desktop apps | macOS isolation | Full isolation |

go-restricted-runner: Unified Sandboxing Interface

The go-restricted-runner library provides a unified Go interface for multiple sandboxing backends:

```go
import "github.com/inercia/go-restricted-runner/pkg/runner"

// Create a firejail-based runner
r, err := runner.New(runner.TypeFirejail, runner.Options{
    "allow_networking": false,
    "allow_read_folders": []string{"/tmp/workspace"},
}, logger)

// Execute untrusted command
output, err := r.Run(ctx, "sh", "echo 'Hello from sandbox'", nil, nil, false)
```

Supported Runners

  1. exec: Direct execution (no isolation) - for trusted environments
  2. sandbox-exec: macOS sandboxing via the Seatbelt sandbox
  3. firejail: Linux namespace-based isolation
  4. docker: Container-based isolation with full filesystem separation

Interactive Process Communication

For LLM agents that need REPL-style interaction or streaming output:

```go
stdin, stdout, stderr, wait, err := r.RunWithPipes(
    ctx, "python3", []string{"-i"}, nil, nil,
)

// Send commands to the sandboxed process
fmt.Fprintln(stdin, "import os; print(os.getcwd())")
stdin.Close()

// Read streaming output
output, _ := io.ReadAll(stdout)
wait()
```

This enables agents to maintain stateful sessions with interpreters (Python, Node.js, etc.) while remaining isolated.

Practical Considerations

Defense in Depth

Sandboxing should be one layer in a defense-in-depth strategy:

  • Input validation: Sanitize LLM-generated commands before execution
  • Resource limits: Use cgroups to limit CPU, memory, and I/O
  • Audit logging: Record all executed commands for forensic analysis
  • Least privilege: Run sandboxed processes as unprivileged users
  • Timeout enforcement: Kill long-running processes automatically

Performance vs. Security Trade-offs

  • Firejail: Low overhead, good for frequent short-lived processes
  • Docker: Higher overhead, better isolation, suitable for long-running agents
  • Landlock: Minimal overhead, but requires kernel 5.13+

Limitations

No sandbox is perfect. Kernel vulnerabilities, container escapes, and side-channel attacks remain possible. Critical systems should combine sandboxing with:

  • Regular security updates
  • Intrusion detection systems
  • Network segmentation
  • Principle of least privilege at the infrastructure level

Conclusion

As LLM agents become more autonomous and capable, robust sandboxing transitions from optional to mandatory. Technologies like firejail, Landlock, and bubblewrap provide the primitives needed to safely execute untrusted code. Libraries like go-restricted-runner make these technologies accessible to developers building agent systems.

The key insight: treat LLM-generated code with the same suspicion you'd apply to arbitrary user input. Sandbox everything, assume breach, and build systems that fail safely when isolation is compromised.