Part 1. Setting Up OpenClaw on a Cloud Server — Why Costs Hit $10 a Day

Series: Can AI Agents Actually Run Business Operations? — Part 1

OpenClaw has been getting a lot of attention as an AI agent platform since late 2025. But how usable is it in real business work? And how does it hold up from a security standpoint?

In this series, I share what I learned from using OpenClaw on an actual client engagement — rough edges, failures, and course corrections included. I am not an OpenClaw expert; this is one engineer working through the real problems. I hope it helps people who are evaluating OpenClaw or similar AI agent platforms for adoption, as a kind of “here is what to watch out for” reference.

In this post (Part 1), I walk through why my API bill came in 5-10x higher than expected on day one. The main traps were auto-compaction defaults that are tuned for avoiding crashes rather than controlling costs, and Cron and HEARTBEAT both running the same task. I also cover the 3-tier model strategy (including a local LLM) that brought the numbers back under control.

For the overall project summary, please see the project page.

1. Server Selection and Base Setup

We chose Hetzner Cloud for a 24/7 agent server.

Item	Value
Location	Helsinki (hel1)
OS	Ubuntu 24.04
Spec	4vCPU / 16GB RAM / 150GB SSD
Monthly	about €12

Latency is not a big deal for long-running agents, so the European region was fine. Cloud was the right fit because procuring dedicated hardware was not realistic for a client project, we needed a sandbox separated from the internal network, and 24/7 uptime was a requirement.

2. OpenClaw Installation — Things to Watch

What `--accept-risk` actually means

OpenClaw onboarding runs like this:

npm install -g openclaw@latest
openclaw onboard --install-daemon --non-interactive --accept-risk

If you leave out --accept-risk, you get:

Non-interactive setup requires explicit risk acknowledgement.

OpenClaw runs with full host access, so it asks for explicit risk acknowledgement. The agent can read and write files and run arbitrary commands. You should understand this before deciding to install it.

Browser settings on a server without GUI

On a GUI-less server running as root, two settings are needed:

openclaw config set browser.noSandbox true
openclaw config set browser.headless true

You also need to install Google Chrome Stable in advance:

wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb -O /tmp/chrome.deb
apt-get install -y /tmp/chrome.deb

Exposing the web dashboard

OpenClaw has a built-in web management UI. To access it from outside, I used sslip.io to turn the server IP into a domain, then set up Nginx as a reverse proxy with Let’s Encrypt.
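
As a small illustration of the sslip.io step: the service resolves hostnames that embed an IP address, written with either dots or dashes, back to that address. The sketch below builds such a hostname. The address used is a reserved documentation IP, not the real server, and the function name is made up for this example.

```python
import ipaddress


def sslip_hostname(ip: str, use_dashes: bool = False) -> str:
    """Return an sslip.io hostname that embeds the given IPv4 address."""
    # IPv4Address validates the input and raises ValueError if it is malformed.
    address = str(ipaddress.IPv4Address(ip))
    label = address.replace(".", "-") if use_dashes else address
    return f"{label}.sslip.io"


# 203.0.113.10 is a documentation-only address used as a placeholder.
print(sslip_hostname("203.0.113.10"))
print(sslip_hostname("203.0.113.10", use_dashes=True))
```

The resulting name is what would go into the Nginx server_name and the Let's Encrypt certificate request.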

One setting to be careful about: OpenClaw requires device pairing as a brute-force protection. For external access, this has to be disabled:

openclaw config set gateway.controlUi.dangerouslyDisableDeviceAuth true

The name has dangerously in it for a reason. I compensated with three layers of protection: Nginx Basic Auth, a gateway token, and HTTPS.

3. Security Settings — Do Them First

If you run OpenClaw on a cloud server, please do these on day one.

Full host access: --accept-risk is literal. The agent can read/write files and run commands freely.
.env file permissions: The file is created with 644 (world-readable) by default. It contains API keys, so chmod 600 is required.
Template files: Be careful not to accidentally write real API keys into git-tracked template files like .env.example.

chmod 600 /root/.openclaw/.env

Or let OpenClaw do it for you — openclaw security audit --fix tightens permissions on state, config, and include files in one pass.
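
To make the permission check repeatable, here is a minimal Python sketch that reports whether a file is readable by group or others and tightens it to 600. It uses only the standard library and runs against a throwaway temp file. Pointing it at /root/.openclaw/.env is left to the reader, and the function names are illustrative rather than part of OpenClaw.

```python
import os
import stat
import tempfile


def is_private(path: str) -> bool:
    """True if the file grants no permissions to group or others (e.g. 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def tighten(path: str) -> None:
    """Equivalent of chmod 600: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Demonstrate on a throwaway file rather than the real .env.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)          # simulate the world-readable default
print("private before:", is_private(demo_path))
tighten(demo_path)
print("private after: ", is_private(demo_path))
os.remove(demo_path)
```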

It is easy to skip these while you are focused on getting things to work. I would recommend doing them before anything else.

4. Cost Surprise — 5-10x Over Budget

The day after setup, I checked OpenClaw’s /usage dashboard and found costs were much higher than expected.

Expected: $1-2/day ($30-60/month)
Actual: $8-10/day ($240-300/month)

I broke down the causes and found four issues stacking up.

Cause 1: Context token bloat (the biggest factor)

The main session had grown to about 310,000 tokens.

agent:main:main │ 310k/200k (155%)

Auto-compaction is enabled by default, but its defaults are tuned for crash avoidance, not cost control. According to the official documentation, OpenClaw compacts a session only when contextTokens > contextWindow - reserveTokens (default reserveTokens = 16,384) or when the model returns an overflow error. In other words, it fires close to the context ceiling — not when costs are climbing.
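
The trigger condition quoted above can be written out as a few lines of Python. This is a sketch of the formula only, not OpenClaw's implementation. The 1,000,000-token window and the larger reserve value are hypothetical numbers chosen to show how the threshold moves.

```python
DEFAULT_RESERVE_TOKENS = 16_384  # default quoted from the documentation above


def compaction_threshold(context_window: int,
                         reserve_tokens: int = DEFAULT_RESERVE_TOKENS) -> int:
    """Token count above which compaction would fire."""
    return context_window - reserve_tokens


def should_compact(context_tokens: int, context_window: int,
                   reserve_tokens: int = DEFAULT_RESERVE_TOKENS) -> bool:
    """contextTokens > contextWindow - reserveTokens"""
    return context_tokens > compaction_threshold(context_window, reserve_tokens)


# Hypothetical large-window model: 310K tokens is far below the threshold.
window = 1_000_000
print(compaction_threshold(window))                              # 983616
print(should_compact(310_000, window))                           # False
# A much larger reserve (hypothetical value) pulls the threshold down.
print(should_compact(310_000, window, reserve_tokens=900_000))   # True
```

With the default reserve, the threshold sits only 16,384 tokens below the ceiling, so the larger the window, the longer a session can grow before anything is compacted.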

In my setup, the main agent was on a large-context model, so the session kept growing without ever reaching the compaction trigger. Every message was re-sending the accumulated setup logs and error history.

State	Per message	Per 100 messages
310K tokens	$0.023	$2.30
40K tokens (after reset)	$0.003	$0.30

Even a cheap model cannot save you from a large context. The practical fix is to start a fresh session with /reset after setup, or to tune agents.defaults.compaction.reserveTokens to a more aggressive value so compaction runs well before the ceiling.
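
The table above can be reproduced with a short calculation. The sketch below back-computes a per-token input price from the 310K row and applies cost = price × tokens × calls. The rate is implied by the table, not taken from a price list, and the model ignores output tokens and any prompt caching.

```python
# Implied input price per token, back-computed from the table row
# "310K tokens -> $0.023 per message". Not an official price.
PRICE_PER_TOKEN = 0.023 / 310_000


def session_cost(context_tokens: int, messages: int,
                 price_per_token: float = PRICE_PER_TOKEN) -> float:
    """Input cost when every message re-sends the whole context."""
    return price_per_token * context_tokens * messages


for tokens in (310_000, 40_000):
    per_message = session_cost(tokens, 1)
    per_hundred = session_cost(tokens, 100)
    print(f"{tokens:>7} tokens: ${per_message:.3f}/msg, ${per_hundred:.2f}/100 msgs")
```

Because the context is re-sent on every call, cutting the session from 310K to 40K tokens reduces the per-message cost by the same ratio, whatever the model's price.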

Cause 2: Cron jobs running all the time

A monitoring cron running every 30 minutes was hitting the API 48 times a day.
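
The 48-calls figure is just the interval arithmetic, and the same calculation shows how quickly the count drops when the interval is widened. A minimal sketch, assuming the job runs around the clock and a 30-day month:

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """Number of scheduled runs in 24 hours for a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


for interval in (30, 60, 360):
    daily = runs_per_day(interval)
    print(f"every {interval:>3} min: {daily:>2} runs/day, {daily * 30} runs/month")
```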

Cause 3: Cron and HEARTBEAT both running

This one is the hardest to notice. OpenClaw has two scheduling mechanisms:

Mechanism	Session	How to stop
Cron	Isolated session	Manage with `openclaw cron`
HEARTBEAT	Inside main session	Edit HEARTBEAT.md

If you stop cron but leave the same task in HEARTBEAT.md, it keeps running. In practice, after I had disabled all crons, web_search calls continued.
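
One way to catch a task that survives in HEARTBEAT.md is to scan the file for the same keywords the disabled cron jobs used. The sketch below treats the file as plain text and makes no assumption about its structure. The sample content and the keyword list are invented for illustration.

```python
def find_mentions(text: str, keywords: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose text mentions any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(keyword in line.lower() for keyword in lowered):
            hits.append((number, line.strip()))
    return hits


# Hypothetical HEARTBEAT.md content, for illustration only.
sample = """\
# Heartbeat
- Every 30 minutes, run web_search for competitor news
- Reply OK if nothing needs attention
"""

for number, line in find_mentions(sample, ["web_search", "monitor"]):
    print(f"HEARTBEAT.md:{number}: {line}")
```

To run it against a real file, read the file's text (for example with pathlib.Path.read_text) and pass it in place of sample. The location of HEARTBEAT.md depends on the installation.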

Cause 4: Heavy use of web_search

web_search: 186 calls
web_fetch:   33 calls

The HEARTBEAT auto-processing was triggering web searches more than I realized.

5. Countermeasures — Compress Sessions, Review Scheduled Tasks

Immediate steps and their effect:

Action	Effect
Start a new session (`/reset`)	310K → 21K tokens for the active session
Disable all cron jobs	Background API calls went to zero
Remove monitoring tasks from HEARTBEAT	Stopped the 30-minute auto-search

A quick note on the session commands: /reset is an alias for /new [model] — it begins a fresh session rather than compacting the current one. If you need to reduce session size while preserving history, use /compact [instructions] instead. /context list and /usage tokens help you see what is actually consuming tokens before you decide which to use.

Before vs. after:

Metric	Before	After
Main session tokens	310,000	21,000
Background API calls	48/day	0/day
Estimated monthly cost	$240-300	$10-20

6. Local LLM — Ollama + Gemma3 4B

To fundamentally solve the 24/7 cron cost problem (which would have been $50-100/month on its own), I added a local LLM.

Model selection

Model	Result	Note
Qwen3 4B	Not adopted	Thinking mode always on. A 3-line answer took 2,500 tokens and 3 minutes
Gemma3 4B	Adopted	The same task took 60 tokens and 3 seconds

Qwen3’s chat template force-injects a <think> tag. On CPU inference, this slows things down significantly. I tried /no_think and a custom Modelfile, but the effect was limited.

In my view, Thinking-forced models are not a good fit for CPU-only inference environments.

Server resource usage

Hetzner: 4vCPU (AMD EPYC) / 16GB RAM / no GPU

Gemma3 4B loaded:
  Disk:    3.3 GB
  RAM:     4.2 GB (CPU inference)
  Free RAM: 10 GB (plenty of room for OpenClaw + OS)
  Speed:   14-15 tok/s

A 16GB-RAM server is enough. Even without a GPU, response speed is workable for routine tasks.
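
The speed figure can be checked from the timing metadata Ollama returns with each non-streaming response (eval_count and eval_duration, the latter in nanoseconds). The sketch below shows how such a measurement could be taken and why a 2,500-token answer takes minutes at this speed. The endpoint is Ollama's default local address, and the gemma3:4b tag and the sample numbers are assumptions for illustration, not the author's measurements.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ask_ollama(prompt: str, model: str = "gemma3:4b") -> dict:
    """Send one non-streaming request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_per_second(result: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Rough generation time, ignoring prompt processing and model load."""
    return output_tokens / tok_per_s


# Illustrative response metadata, not a real measurement.
sample = {"eval_count": 60, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
print(f"{generation_seconds(2_500, 14.5) / 60:.1f} min for a 2,500-token answer")
```

At roughly 14.5 tok/s, 2,500 output tokens take close to three minutes while 60 tokens take a few seconds, which is in line with the Qwen3 and Gemma3 rows in the model selection table.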

7. The 3-Tier Model Strategy

Based on this experience, we settled on a 3-tier strategy that matches model tier to task characteristics.

Tier	Model	Use	Cost band
Tier 1	Claude Sonnet / Opus	Code generation, deep analysis, inconsistency detection	medium-high
Tier 2	Gemini 3 Flash	Orchestration, web summarization, HTML extraction	low
Tier 3	Gemma3 4B (local)	Heartbeat response, routine processing	$0

Execution strategy

Code-First: Convert rule checks into Python code and run them without an LLM (zero execution cost)
Smart Routing: Low-load tasks go to Gemini Flash, heavy tasks go to Claude Sonnet
Local Preference: Keep routine work on Ollama + Gemma3
Code generation is Sonnet only: Gemini Flash tends to fall back to hardcoding (details in Part 2), so it is not suitable for code generation
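
The routing rules above can be sketched as a small lookup table. This is an illustration of the idea, not how OpenClaw routes internally. The task-kind labels are hypothetical, and the fallback to the low-cost tier for unknown kinds is an assumption made for this example.

```python
from enum import Enum


class Tier(Enum):
    TIER1 = "Claude Sonnet / Opus"
    TIER2 = "Gemini 3 Flash"
    TIER3 = "Gemma3 4B (local)"


# Task kinds are hypothetical labels; the tier assignments follow the table.
ROUTES = {
    "code_generation": Tier.TIER1,   # Sonnet only, never Flash
    "deep_analysis": Tier.TIER1,
    "orchestration": Tier.TIER2,
    "web_summary": Tier.TIER2,
    "html_extraction": Tier.TIER2,
    "heartbeat": Tier.TIER3,
    "routine": Tier.TIER3,
}


def pick_tier(task_kind: str) -> Tier:
    """Map a task kind to a model tier; unknown kinds use the low-cost tier."""
    return ROUTES.get(task_kind, Tier.TIER2)


for kind in ("code_generation", "web_summary", "heartbeat", "something_else"):
    print(f"{kind:>16} -> {pick_tier(kind).value}")
```

Keeping the mapping in one table makes the "code generation is Sonnet only" rule explicit and easy to audit.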

Let OpenClaw configure itself

Rather than editing config files directly from Claude Code, it worked better to prompt OpenClaw itself and let it update its own settings.

openclaw agent --agent main --message 'Please change the default model to google/gemini-3-flash-preview.
Use models based on task difficulty:
- Normal: google/gemini-3-flash-preview
- Medium: anthropic/claude-sonnet-4-6
- Hard: anthropic/claude-opus-4-6'

OpenClaw updated its own default model and recorded the routing strategy in SOUL.md.

8. Takeaways

Auto-compaction protects you from crashes, not from costs. It is on by default, but the default reserveTokens = 16,384 only triggers compaction near the context ceiling. With a large-window model, you can accumulate hundreds of thousands of tokens before it fires. Tune agents.defaults.compaction.reserveTokens to be more aggressive, or /reset manually after setup.
Cron and HEARTBEAT are different mechanisms. If you stop one, always check the other.
Cheap models do not save you from large contexts. Gemini Flash is low-cost per token, but if you send 310K tokens every message, it is not cheap anymore. Cost = price × tokens × calls — you have to evaluate all three.
Run /reset right after setup to start a fresh session. Setup sessions accumulate a lot of trial-and-error context that you do not need going forward. Note that /reset begins a new session rather than compacting the current one — use /compact if you need to keep history while reducing size.
Avoid Thinking-forced models on CPU inference. They may be fine on GPU, but on this CPU-only server the extra reasoning tokens turned a 3-second task into a 3-minute one.
Check /usage daily. A one-day delay in noticing an anomaly can mean a $10 difference.

9. Post-Setup Checklist

Check cost on /usage (establish a baseline)
Check token count per session with openclaw status
Inspect context composition with /context list and /usage tokens
Review scheduled jobs with openclaw cron --help or openclaw tasks list
Review HEARTBEAT.md for unnecessary tasks
Run openclaw security audit --fix to tighten file permissions
Run /reset after setup is complete (or /compact if history matters)
Set a monthly spend cap in your API provider dashboard
Browser settings: headless: true + noSandbox: true

In Part 2, I will share an experiment where we gave AI models only the rule specification (in Japanese text) and asked them to generate Python check code on their own. Comparing three models, we found that Gemini Flash had been hardcoding the expected answers.
