Nightshift with Claude Sandbox Runtime (SRT)

I’m trying out different agentic workflows. One I like this month is the set of skills from Matt Pocock. In a YouTube video he talks about a day shift and a night shift.

What I really like about his thinking is the split between the two kinds of tasks.

During the day shift I bounce product ideas off the LLM, get grilled, answer, and choose solutions. This is a mixed product/tech cycle. The goal is to produce tickets that an agent can pick up later, so all open questions need to be answered up front. /grill-with-docs helps here.

During the night shift the agents pick up the tickets and complete them. I get back in the morning and they are all done.

Skills I used and found useful in the cycle:

to prd - Turn the current conversation into a PRD and publish it to the project issue tracker — no interview, just synthesis of what you’ve already discussed.
to issues - Break a plan, spec, or PRD into independently-grabbable issues on the project issue tracker using tracer-bullet vertical slices.
grill-with-docs - A relentless interview to sharpen a plan or design, which also creates docs (ADRs and a glossary) as we go.
implement - Implement a piece of work based on a PRD or set of issues.
tdd - Test-driven development. Use when the user wants to build features or fix bugs test-first, mentions “red-green-refactor”, or wants integration tests.

The best part: for the night-time implementation, Matt ran Claude inside a Docker container.

I used the Anthropic Sandbox Runtime. According to Anthropic:

The @anthropic-ai/sandbox-runtime package wraps an entire process in the same Seatbelt or bubblewrap isolation that the built-in Bash sandbox uses. Running Claude Code through it constrains every tool, hook, and MCP server in the session, not only Bash. The runtime is a beta research preview, and its configuration format may change as the package evolves. The runtime denies all write and network access by default, so configure it before launching Claude Code through it. In ~/.srt-settings.json, or a file you pass with --settings, allow write access to at least your project directory and Claude Code’s configuration paths ~/.claude and ~/.claude.json. Allow the network domains your session needs, including api.anthropic.com or your configured provider’s endpoint. See the package README for the full configuration schema.

Once I was done and my issues were in GitHub, I ran the following script. It picks up the next eligible ticket with the ‘ready-for-agent’ label, implements it, and then moves on to the next, one after another.

#!/bin/bash
#
# ralph.sh — autonomous "nightshift" loop (agent-driven ticket selection).
#
# Each iteration runs a single sandboxed `claude` that does EVERYTHING itself:
#   pick the next ready-for-agent + nightshift ticket, read it, implement it,
#   test, update docs, append progress.txt, commit locally, then relabel/close
#   the issue — all via the hosted GitHub MCP (https://api.githubcopilot.com/mcp).
#
# Why the MCP and not `gh`: srt's macOS Seatbelt profile blocks the mach services
# a Go binary needs for TLS (`trustd`) and Keychain, so `gh` fails inside srt with
# `x509: OSStatus -26276`. The GitHub MCP is a Node/HTTP client, which self-verifies
# TLS and honours srt's egress proxy, so it works inside the sandbox.
#
# PREREQUISITES (one-time, outside this script):
#   1. The `github` MCP must be configured & connected:  claude mcp get github
#   2. Its PAT must have access to the PRIVATE repo <YOURREPO> with
#      Issues: Read and write (a bare token 404s — see the NO_GITHUB abort below).
#   3. This script allowlists githubcopilot.com for you (see the settings block).

REPO="<YOURREPO>"

if [ -z "$1" ]; then
  echo "Usage: $0 <iterations>"
  exit 1
fi
ITERATIONS="$1"

PROJECT_DIR="$(pwd)"
USER_HOME="$HOME"
SETTINGS="$PROJECT_DIR/.srt-settings.json"

# Always clean up the temp sandbox profile, however we exit.
trap 'rm -f "$SETTINGS"' EXIT

echo "Creating temporary sandbox profile..."
cat <<EOF > "$SETTINGS"
{
  "network": {
    "allowedDomains": [
      "anthropic.com",
      "*.anthropic.com",
      "registry.npmjs.org",
      "githubcopilot.com",
      "*.githubcopilot.com"
    ],
    "deniedDomains": []
  },
  "filesystem": {
    "allowWrite": [
      "$PROJECT_DIR",
      "$USER_HOME/.claude",
      "$USER_HOME/.claude.json",
      "/tmp",
      "/private/tmp"
    ],
    "denyRead": [],
    "denyWrite": []
  }
}
EOF

read -r -d '' PROMPT <<PROMPT_EOF
@PRD.md @BACKEND.md @progress.txt

You are the nightshift implementation agent for the PRIVATE repo $REPO. Use the
**github MCP tools** (mcp__github__*) for ALL GitHub access. Do NOT use \`gh\` — it
cannot reach GitHub from inside this sandbox. Work through these steps in order:

0. CONNECTIVITY: list issues in $REPO via the github MCP. If that call errors
   (cannot resolve repository / 404 / auth), STOP now, change no files, and output
   <promise>NO_GITHUB</promise> followed by the exact error text.

1. PICK: among OPEN issues carrying ALL of the labels 'ready-for-agent' AND 'nightshift'
   AND 'backend-only' AND NOT 'BLOCKED', consider only those whose stated dependencies are
   already satisfied — i.e. every issue its body says must be done first is already CLOSED.
   This is DEPENDENCY order, NOT numeric order: a lower-numbered issue with an unmet
   dependency must be SKIPPED in favour of the dependency it is waiting on (e.g. if #100
   says "do #101 first" and #101 is still open, pick #101, not #100). Do NOT block on
   ordering. Among the dependency-eligible issues, choose the lowest number. If there are
   NONE eligible (none carry the labels, or every remaining one is waiting on an unmet
   dependency), output <promise>COMPLETE</promise> and stop without changing any files.

2. READ: fetch the chosen issue's full body via the MCP. Call its number N.

3. DEPENDENCIES: PICK already guaranteed the ticket's stated ordering dependencies are
   satisfied (the issues it says to do first are CLOSED), so do NOT block merely on
   ordering. This step is for GENUINE blockers only: the ticket is missing information,
   is ambiguous, or needs an external prerequisite you cannot satisfy. In that case make
   no code changes, add an MCP comment on issue #N noting the block, add the label
   'BLOCKED', then output <promise>BLOCKED #N</promise> with a one-line reason and stop.

4. IMPLEMENT: /implement issue #N end-to-end from its body + the repo. ONLY this one ticket.

5. VERIFY: run the tests and the type checks. Check again that you implemented what the ticket asked for.

6. DOCS: update PRD.md / BACKEND.md to reflect what changed, and append a short dated
   entry to progress.txt.

7. COMMIT: commit your changes locally. Do NOT push.

8. CLOSE OUT (via MCP): add a brief comment on issue #N summarising what you built and
   the local commit hash; remove the 'ready-for-agent' label so it is not picked again;
   then close issue #N as completed.

Finally output <promise>DONE #N</promise>.
PROMPT_EOF

for ((i=1; i<=ITERATIONS; i++)); do
  echo "=== Starting Iteration $i ==="

  result=$(npx --package=@anthropic-ai/sandbox-runtime srt \
    --settings "$SETTINGS" \
    claude --dangerously-skip-permissions -p "$PROMPT")

  echo "$result"

  if [[ "$result" == *"<promise>NO_GITHUB</promise>"* ]]; then
    echo "GitHub is unreachable via the MCP. Most likely the fine-grained PAT lacks" >&2
    echo "access to the private repo $REPO (needs Issues: Read and write)." >&2
    echo "Fix the token's repository access, then re-run. Aborting." >&2
    exit 1
  fi

  if [[ "$result" == *"<promise>COMPLETE</promise>"* ]]; then
    echo "No ready-for-agent + nightshift tickets left. Done after $((i-1)) iterations."
    exit 0
  fi

  if [[ "$result" == *"<promise>BLOCKED"* ]]; then
    echo "Agent reported a blocked ticket; stopping so a human can look." >&2
    exit 1
  fi
done

echo "Reached the $ITERATIONS-iteration limit."

Note: You need to update the repo name in the script (the REPO variable). The script also needs the GitHub MCP and a PAT with access to that repo.
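One gap in the loop above: it only reacts to NO_GITHUB, COMPLETE and BLOCKED. A run that prints no marker at all (for example because claude crashed or ran out of budget) falls through, and the next iteration starts anyway. Below is a minimal sketch of one way to make that case explicit. The function name classify_result and the UNKNOWN status are my own assumptions and not part of the script above.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): map the text printed by
# one agent run to a single status word, based on the <promise> markers the
# prompt asks the agent to emit.
classify_result() {
  local out="$1"
  case "$out" in
    *"<promise>NO_GITHUB</promise>"*) echo "NO_GITHUB" ;;
    *"<promise>COMPLETE</promise>"*)  echo "COMPLETE" ;;
    *"<promise>BLOCKED"*)             echo "BLOCKED" ;;
    *"<promise>DONE"*)                echo "DONE" ;;
    *)                                echo "UNKNOWN" ;;  # no marker at all
  esac
}

# Example: decide what the loop should do with one run's output.
status="$(classify_result "All tests pass. <promise>DONE #12</promise>")"
echo "$status"
```

Inside the loop, anything other than DONE could then stop the run, so a crashed iteration does not go unnoticed until the morning.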

I came back in the morning and my whole backend was implemented.

This is an adaptation of Matt’s Ralph Loop. You can be more elaborate with the prompt and reference the /implement and /tdd skills.
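Before elaborating on the prompt, it can help to write its trickiest part, the PICK rule in step 1, out as code and check that it says what you mean. This is a rough sketch only: the input format (one line per issue with number, state and dependencies) is made up for illustration, and it assumes the label filter has already been applied to the OPEN rows. In the real loop the agent applies this rule itself via the MCP.

```bash
#!/bin/bash
# Sketch of the PICK rule from step 1 of the prompt, on a made-up issue table.
# Assumed input format (hypothetical), one issue per line:
#   "<number> <OPEN|CLOSED> <comma-separated dependency numbers, or ->"
# OPEN rows are assumed to have already passed the label filter.
pick_next() {
  local table="$1" best="" num state deps dep ok
  while read -r num state deps; do
    [ "$state" = "OPEN" ] || continue
    ok=1
    if [ "$deps" != "-" ]; then
      for dep in ${deps//,/ }; do
        # A dependency counts as satisfied only if it is listed as CLOSED.
        if ! grep -q "^$dep CLOSED " <<<"$table"; then
          ok=0
          break
        fi
      done
    fi
    # Among eligible issues, keep the lowest number.
    if [ "$ok" -eq 1 ] && { [ -z "$best" ] || [ "$num" -lt "$best" ]; }; then
      best="$num"
    fi
  done <<<"$table"
  if [ -n "$best" ]; then echo "$best"; else echo "COMPLETE"; fi
}

# The example from the prompt: #100 says "do #101 first" and #101 is open.
issues=$'100 OPEN 101\n101 OPEN -\n102 CLOSED -\n103 OPEN 102'
pick_next "$issues"
```

With this table the sketch picks 101, not 100, which matches the example given in the prompt. If nothing is eligible it prints COMPLETE, mirroring the marker the agent is asked to emit.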

With the SRT the whole Claude Code process is isolated, including file tools, MCP servers, and hooks.
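That isolation is only as good as the settings file srt is given, and the script builds that file by interpolating $PROJECT_DIR and $USER_HOME straight into a JSON heredoc. A path containing a double quote or a backslash would produce invalid JSON. Here is a minimal sketch of an escaping helper; json_escape is a hypothetical name, and it deliberately covers only those two characters.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): escape a string so it can
# sit between double quotes in JSON. Handles backslash and double quote only;
# control characters such as newlines are NOT handled in this sketch.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}   # backslash    ->  \\
  s=${s//\"/\\\"}   # double quote ->  \"
  printf '%s' "$s"
}

# Example with a made-up path containing double quotes.
printf '"%s"\n' "$(json_escape '/Users/me/my "side" project')"
```

In the heredoc you would then write "$(json_escape "$PROJECT_DIR")" instead of "$PROJECT_DIR". Command substitution is expanded there because the EOF delimiter is unquoted.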

For stronger isolation you need to spin up your own dev container, just as a human developer would.

Thoughts

With the right context (PRD.md, CONTEXT.md, …) and well-described tickets … is that the future of software development? We make sure we have all the answers, and producing the code just happens automatically? English works as a higher-level abstraction over a programming language. 4GL, anyone?

It’s not perfect yet, but we are getting there. More and more, we describe the what rather than the how.

I have yet to get a good feel for how this workflow holds up for maintenance, bug fixes, and smaller features. It feels more lightweight than spec-driven development, yet everything seems to be there.

Let’s wait until next month, when I will probably have found a new way of doing things.

P.S. I also like coding on my phone with Claude and a connected GitHub repo. It works really nicely. 100% isolation. I even ordered a powerkeyboard.