Oliver Wehrens

Nightshift with Claude Sandbox Runtime (SRT)

I’m trying out different agentic workflows. The one I like this month is built on Matt Pocock’s skills. In a YouTube video he talks about a day shift and a night shift.

What I really like about his approach is how it splits the work.

During the day shift I bounce product ideas off the LLM, get grilled, answer its questions and choose solutions. This is a mixed product/tech cycle. The goal is to produce tickets that an agent can pick up later, so every open question has to be answered. /grill-with-docs helps here.

During the night shift the agent picks up the tickets and completes them. I come back in the morning and they are all done.

Skills I used and found useful in the cycle:

  • to prd - Turn the current conversation into a PRD and publish it to the project issue tracker — no interview, just synthesis of what you’ve already discussed.
  • to issues - Break a plan, spec, or PRD into independently-grabbable issues on the project issue tracker using tracer-bullet vertical slices.
  • grill with docs - A relentless interview to sharpen a plan or design, which also creates docs (ADRs and a glossary) as we go.
  • implement - Implement a piece of work based on a PRD or set of issues.
  • tdd - Test-driven development. Use when the user wants to build features or fix bugs test-first, mentions “red-green-refactor”, or wants integration tests.

For the best part, the implementation at night, Matt used Docker with Claude running inside it.

Instead, I used the Anthropic Sandbox Runtime (SRT). According to Anthropic:

The @anthropic-ai/sandbox-runtime package wraps an entire process in the same Seatbelt or bubblewrap isolation that the built-in Bash sandbox uses. Running Claude Code through it constrains every tool, hook, and MCP server in the session, not only Bash. The runtime is a beta research preview, and its configuration format may change as the package evolves. The runtime denies all write and network access by default, so configure it before launching Claude Code through it. In ~/.srt-settings.json, or a file you pass with --settings, allow write access to at least your project directory and Claude Code’s configuration paths ~/.claude and ~/.claude.json. Allow the network domains your session needs, including api.anthropic.com or your configured provider’s endpoint. See the package README for the full configuration schema.
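To make that minimum concrete, here is a small sketch that writes such a settings file. It grants only what the quote lists: write access to the project directory, ~/.claude and ~/.claude.json, and network access to api.anthropic.com. The key names (allowedDomains, deniedDomains, allowWrite, denyRead, denyWrite) are copied from the night-shift script in this post rather than from the README, and because the runtime is a beta they may change; the function name write_minimal_srt_settings is hypothetical. Add further domains (for example github.com, as the script does) if your session needs them.

```bash
# Hypothetical helper: writes the smallest settings file described in the quote.
# Key names mirror the night-shift script in this post; the runtime is a beta,
# so verify them against the package README. Paths are inserted verbatim, so
# they must not contain double quotes or backslashes.
write_minimal_srt_settings() {
  local project_dir="$1"
  local output_file="$2"
  cat > "$output_file" <<EOF
{
  "network": {
    "allowedDomains": ["api.anthropic.com"],
    "deniedDomains": []
  },
  "filesystem": {
    "allowWrite": [
      "$project_dir",
      "$HOME/.claude",
      "$HOME/.claude.json"
    ],
    "denyRead": [],
    "denyWrite": []
  }
}
EOF
}
```

For example, write_minimal_srt_settings "$(pwd)" "$HOME/.srt-settings.json" writes the default location named in the quote (overwriting any file already there), so the runtime should pick it up without --settings.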

Once I was done and my issues were on GitHub, I ran the following script. In each iteration it asks Claude to pick the highest-priority task with the ‘ready-for-agent’ label and implement it, one task after another.

#!/bin/bash
set -e

# The only argument is the maximum number of tasks to work through.
if ! [[ "${1:-}" =~ ^[1-9][0-9]*$ ]]; then
  echo "Usage: $0 <iterations>"
  exit 1
fi
MAX_ITERATIONS="$1"

PROJECT_DIR="$(pwd)"
USER_HOME="$HOME"
SETTINGS_FILE="$PROJECT_DIR/.srt-settings.json"

# Remove the temporary profile whenever the script exits, including on errors.
# Listing .srt-settings.json in .gitignore also keeps it out of the agent's commits.
trap 'rm -f "$SETTINGS_FILE"' EXIT

echo "Creating temporary sandbox profile..."
# Providing all required keys to satisfy the exact schema validation spec.
# /private/tmp is the real path behind /tmp on macOS.
cat <<EOF > "$SETTINGS_FILE"
{
  "network": {
    "allowedDomains": [
      "anthropic.com",
      "*.anthropic.com",
      "github.com",
      "*.github.com",
      "registry.npmjs.org",
      "api.github.com"
    ],
    "deniedDomains": []
  },
  "filesystem": {
    "allowWrite": [
      "$PROJECT_DIR",
      "$USER_HOME/.claude",
      "$USER_HOME/.claude.json",
      "/tmp",
      "/private/tmp"
    ],
    "denyRead": [],
    "denyWrite": []
  }
}
EOF

for ((i=1; i<=MAX_ITERATIONS; i++)); do
  echo "=== Starting Iteration $i ==="

  # One headless (-p) Claude Code run per task, wrapped by the sandbox runtime.
  result=$(npx --package=@anthropic-ai/sandbox-runtime srt \
    --settings "$SETTINGS_FILE" \
    claude --dangerously-skip-permissions -p "
      @PRD.md @BACKEND.md @progress.txt
      1. Find the highest-priority task with 'ready-for-agent' label and implement it.
      2. Run your tests and type checks.
      3. Update the PRD / BACKEND with what was done.
      4. Append your progress to progress.txt.
      5. Commit your changes.
      ONLY WORK ON A SINGLE TASK.
      If the PRD is complete, output <promise>COMPLETE</promise>.")

  echo "$result"

  # The agent prints this marker once nothing is left to implement.
  if [[ "$result" == *"<promise>COMPLETE</promise>"* ]]; then
    echo "PRD complete after $i iterations."
    exit 0
  fi
done
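The loop stops on a plain substring match, so a reply that merely mentions the marker (for example while explaining why it is not done yet) would also end the run. A slightly stricter check, sketched below, only accepts the marker when it sits on a line of its own. It assumes the agent prints the marker that way, so you may want the prompt to ask for it explicitly; prd_complete is a hypothetical name.

```bash
# Hypothetical helper: succeeds only when <promise>COMPLETE</promise> appears
# on a line by itself (surrounding whitespace allowed), not inside a sentence.
prd_complete() {
  printf '%s\n' "$1" | grep -Eq '^[[:space:]]*<promise>COMPLETE</promise>[[:space:]]*$'
}
```

Inside the loop, if prd_complete "$result"; then would take the place of the [[ "$result" == *"<promise>COMPLETE</promise>"* ]] test.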

I came back in the morning and my whole backend was implemented.

This is an adaptation of Matt’s Ralph Loop. You can make the prompt more elaborate and reference the /implement and /tdd skills.
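As an example of a more elaborate prompt, here is a sketch that builds it in a variable with a quoted heredoc, so the shell does not expand backticks or $ inside the text. It assumes the /implement and /tdd skills are installed where Claude Code can find them; naming them in the prompt points the agent at them but is not a guaranteed invocation, and the wording is illustrative rather than tested advice.

```bash
# Illustrative prompt for one night-shift iteration (wording is an assumption).
# The quoted 'EOF' keeps the text literal: no variable or command expansion.
PROMPT=$(cat <<'EOF'
@PRD.md @BACKEND.md @progress.txt
1. Find the highest-priority open issue with the 'ready-for-agent' label.
2. Implement it with the /implement skill, working test-first with the /tdd
   skill (red, green, refactor).
3. Run the full test suite and the type checks; fix failures before moving on.
4. Update PRD.md / BACKEND.md with what was done.
5. Append a short summary of your progress to progress.txt.
6. Commit your changes.
ONLY WORK ON A SINGLE TASK.
If the PRD is complete, output <promise>COMPLETE</promise> on a line of its own.
EOF
)
```

In the srt call, claude --dangerously-skip-permissions -p "$PROMPT" would then replace the inline prompt string.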

With the SRT the whole Claude Code process is isolated, including file tools, MCP servers, and hooks.

For even stronger isolation, you need to spin up your own dev container, just as you would for a human developer.

Thoughts

With the right context (PRD.md, CONTEXT.md, …) and well-described tickets, is this the future of software development? We make sure we have all the answers, and the code just gets produced automatically? English becomes a higher-level abstraction over programming languages. 4GL, anyone?

It’s not perfect yet, but we are getting there. More and more, we describe the what rather than the how.

I have yet to get a feel for how this workflow holds up for maintenance, bug fixes and smaller features. It feels more lightweight than spec-driven development, yet everything seems to be there.

Let’s see whether I’ve found a new way of doing things by next month.

P.S. I also like coding on my phone with Claude and a connected GitHub repo. It works really well, with 100% isolation. I even ordered a powerkeyboard.