If you use Claude Code a lot, you've probably run into usage limits, sometimes even in short coding sessions. But cost isn't the only problem. In long-running sessions, the context window eventually fills up, and that can cause the agent to forget earlier decisions, lose important details, or come back from compaction with gaps in its working memory. Here are three tools worth checking out if you want to reduce token usage and make longer coding sessions possible.

1. Caveman

Github repo here.

2. RTK Proxy

RTK sits in front of your coding agent and compresses CLI responses before passing them into context. This matters because tools like git diff, grep, and file reads can quietly consume large numbers of tokens. RTK says it can reduce token usage by 60–90% for common CLI commands. Github repo here.

3. context-mode

context-mode is an MCP server that sits between your agent and its tools. Like RTK, it intercepts CLI commands and reduces how much output ends up in context, but it takes a different approach: when the output is large, it keeps the raw data out of the prompt entirely, stores it in a local searchable database, and gives the agent only a short summary. The agent can then search the stored data later when needed. It also hooks into session lifecycle events so it can restore state after compaction or resets.

While RTK is mainly focused on reducing token usage from common shell and developer commands, context-mode goes further by tracking decisions, file operations, errors, and other events so the agent can recover useful context across longer-running sessions.

All three solve the same problem from different angles. Use Caveman if the waste is mostly in the assistant's wording. Use RTK if command outputs are blowing up your context. Use context-mode to improve the performance of long-running agent sessions. All three work with multiple AI coding platforms, including Claude Code, Codex, Cursor and GitHub Copilot.
Join 17K readers and level up your AWS game with just 5 mins a week.
AI agents can now scan an entire open-source codebase for exploitable vulnerabilities in hours. Frontier models carry the complete library of known bug classes in their weights. So you can simply point an AI agent at a codebase and tell it to find zero-days. This isn't theoretical. Willy Tarreau, the HAProxy lead developer, reports that security bug reports have jumped from 2–3 per week to 5–10 per day. Greg Kroah-Hartman, the Linux kernel maintainer, described what happened: "Months ago, we...
Lambda Durable Functions makes it easy to implement business workflows using plain Lambda functions. Beyond the intended use cases, they also let us implement ETL jobs without needing recursion or Step Functions. Many long-running ETL jobs have time-consuming, sequential steps that cannot be easily parallelised. For example: fetching data from shared databases/APIs with throughput limits, or when data needs to be processed sequentially. Historically, Lambda was not a good fit for these...
Step Functions is often used to poll long-running processes, e.g. when starting a new data migration task with AWS Database Migration Service. There's usually a Wait -> Poll -> Choice loop that runs until the task completes (or fails), like the one below. Polling is inefficient and can add unnecessary cost, as standard workflows are charged based on the number of state transitions. There is an event-driven alternative to this approach. Here's the high-level approach: To start the data migration,...