A practical guide to OpenClaw, covering product philosophy, system architecture, deployment methods, model setup, and orchestration logic.
Let's start with the conclusion. OpenClaw is not a slow-burn project — it is a textbook high-momentum, explosive-growth product. It landed at exactly the moment AI transitioned from "can answer" to "can execute," and used open-source, self-hosting, and multi-platform access to rapidly turn an abstract concept into a deployable personal Agent system.
In my view, OpenClaw's emergence was not accidental — it was entirely inevitable. When model capability, tool-use capability, and multi-step task stability all crossed their thresholds at the same time, it was only a matter of time before a truly capable task-executing Agent framework appeared. Peter Steinberger's weekend project just happened to hit that exact inflection point.
This guide is different from others already on the market. Most are reference manuals, well suited to looking things up but not to getting things done. This book, written from an applied-practice perspective and grounded in first-hand community experience, is for people who actually want to accomplish real work with OpenClaw.
This book has one goal: when you finish reading it, you can actually use OpenClaw to accomplish something of value.
In 2025, large language models underwent a genuine capability leap. Models stopped merely speaking more fluently — they began to acquire the ability to use tools, execute multi-step tasks, and interact continuously with the external world. This shift created, for the first time, a clear boundary between "chatbot" and "Agent."
Models from Anthropic, OpenAI, and others progressively reached a practical stage for tool use. Instead of only generating text, models became capable of deciding when to call an API, what parameters to pass, and how to continue reasoning based on the returned results. This capability determines whether AI can move from "able to answer questions" to "able to handle tasks."
Early models easily lost their way in chained tasks. Completing step A, moving to B, then on to C frequently resulted in goal drift, context confusion, and compounding hallucinations. By 2025, the stability of the new generation of models in continuous tool-calling scenarios improved dramatically — they became reliably capable of completing composite tasks like "search for information, organize results, produce a report, then send it to a designated channel."
The significance of OpenClaw is that it engineered this capability into a product. Rather than building yet another chat interface, it handed the ability for AI to execute multi-step tasks — through open-source, self-hosted, multi-platform access — to a much broader audience of ordinary users and teams.
OpenClaw's explosion was not merely "a lot of people talking about it" — it was reflected in a growth curve that shot to the top in an extraordinarily short time. The source document's narrative focus is not on smooth expansion, but on a cascade of back-to-back breakout moments that produced uncontrolled viral spread.
| Date | Key Event |
|---|---|
| November 2025 | Peter Steinberger released ClawdBot as a "weekend project." The project was initially a prototype AI assistant connecting to instant messaging platforms. The lobster imagery and the "Claw" naming were distinctive from the very start. |
| Mid-January 2026 | The project entered explosive growth. The source document states it gained 60,000 Stars within 72 hours, with a peak of 9,000 Stars in a single day. |
| January 27, 2026 | Due to its similarity to the "Claude" name, the project came under trademark pressure and was forced to rename on short notice. |
| January 30, 2026 | The project was ultimately renamed OpenClaw, shifting to a more explicit open-source identity while retaining the lobster theme. |
| Early February 2026 | A wave of security incidents erupted, including a remote code execution vulnerability and a ClawHub supply chain attack, forcing the project to confront the reality of "hypergrowth + high-risk exposure." |
| February 14, 2026 | Peter announced he was joining OpenAI. The project was transferred to an open-source foundation for governance, with OpenAI as one of its sponsors but not in control of the project's direction. |
| March 3, 2026 | The source document states that OpenClaw surpassed 250,000 Stars, overtaking React to become the #1 software project on GitHub globally. |
| March 6–8, 2026 | Continued growth in Chinese internet communities and offline channels. The project completed its cross-circle diffusion from a developer community topic to a mainstream social phenomenon. |
It succeeded not because of a single standout feature, but because several conditions were met at once: model capabilities had just matured, the barrier to use had dropped dramatically, the viral vector was the messaging platforms everyone already used, and open source plus community remixing ultimately turned an "AI assistant framework" into a product that could be socially deployed, imitated, and shown off.
| Project | Time to Reach 250,000 Stars |
|---|---|
| React | 10+ years |
| Vue | 7+ years |
| TensorFlow | 5+ years |
| OpenClaw | Under 4 months |
Peter Steinberger is an Austrian developer who has long held a prominent standing in the iOS and macOS development community. Much of OpenClaw's significance comes from the fact that this project was not built by a temporary trend-chaser, but by someone who had already completed a full cycle of product development, company building, and exit.
"I'm a builder at heart... What I want is to change the world, not build a large company."
This quote is the key to understanding Peter. His motivation is not to replicate the standard startup playbook, but to return to the act of "building something that can genuinely change the world." The source document notes that in under five months, he personally made 11,684 commits to the project. This speaks to both his level of commitment and explains why OpenClaw expressed such a strong personal voice in its early stages.
Before OpenClaw, Peter had already completed one major startup run. Over 13 years, he grew PSPDFKit from a PDF rendering library into an enterprise-grade product running on 1 billion devices, ultimately completing a sale. On the surface, this is a classic success story — but as he later reflected, what it left behind was not ease, but years of interpersonal friction and complete exhaustion.
After retiring, he bought a one-way ticket to Madrid, trying to recover the parts of his life that work had crowded out. But he quickly discovered that completely stepping away from creation did not bring freedom — it brought emptiness. Without challenge, without anticipation, a person's condition does not automatically improve. This period is important because it explains why OpenClaw was not a "business plan" but something closer to a powerful personal rebound.
The most valuable part of this section in the source document is not its legendary quality, but the very specific prototype-creation path it provides. The things that truly change the world are often not complex at the start.
In November 2025, that old idea resurfaced. What Peter wanted was not another chatbot, but a personal assistant that could genuinely control his computer and complete tasks on his behalf. To validate this idea, he didn't start by designing a grand architecture — instead, he took the shortest possible path to connect the pieces.
According to the source, the entire prototype took just one hour. A few more hours were spent adding image support. The first version had no complex framework, no complete design — even the name didn't matter. The only goal: get it alive first.
Shortly after completing the prototype, Peter ran a test while traveling in Marrakesh, Morocco. He sent his Bot a voice message. The Bot had no preset voice recognition capability — but rather than stopping at "I don't support this," it identified the file header on its own, attempted to call FFmpeg, found it wasn't installed locally, then reached out to call the OpenAI Whisper API via curl, and sent back the transcription. The entire process reportedly took about 9 seconds.
The significance of this moment was not the complexity of the feature, but that the system had begun to show a tendency toward "autonomously filling in missing steps." It was precisely here that OpenClaw transformed from a convenient script into a true Agent prototype — one that genuinely moved people.
OpenClaw's explosion wasn't just because models got smarter, and it wasn't simply because Peter happened to be well-known. The more fundamental reason was that he defined the value of an Agent very clearly: not whether it gives a prettier answer, but whether it can actually get the task done. This definition is simple, but sharp — and exactly right for virality.
OpenClaw's name went through a remarkably harrowing sequence of changes.
The project was originally called OpenCloud, emphasizing "open cloud capabilities." But trademark pressure from a cloud services provider arrived quickly.
The project was then renamed Clade — a portmanteau drawing on Claw and Claude — signaling its connection to the latter. Within days, a message arrived from Anthropic's legal team: "Friendly but urgent — please rename."
This was the most nail-biting episode in the entire naming saga. Peter planned to atomically migrate the account name across all platforms simultaneously — GitHub, Twitter, NPM, Docker, domain — it all had to happen at once, or squatters would move in.
The information leaked anyway. The crypto community — another crypto project claimed prior use of a related name — moved in early.
Peter was in a conference room at the time, "nearly in tears," and briefly considered deleting the entire project. Fortunately, friends on Twitter and GitHub mounted an emergency intervention, helping him recover control of the accounts.
Peter reached out directly to Anthropic: "Is OpenClaw OK?" Anthropic confirmed it was fine (they apparently even liked the name internally).
After OpenClaw went viral, Peter actually had three options: keep running it himself, form a company and raise funding, or join a large tech firm. The source document is clear about his reasoning — he had no desire to repeat a 13-year startup journey, nor to put himself back into the long grind of organizational politics. He ultimately chose to join OpenAI, attracted not by the compensation but by "fun and impact."
The source document mentions that Peter had key conversations with both Meta and OpenAI. Meta's appeal was in resources; OpenAI's was in direction. The Codex roadmap in particular resonated with him more deeply. For someone who defines himself as a builder, the judgment of "who do I want to build big things with" often matters more than pure compensation.
Not for the money, but for the fun and the impact.
The most important aspect of this move was that the project was not privatized by OpenAI. According to the source document, OpenClaw was transferred to an open-source foundation, with OpenAI as just one of its sponsors and not directly controlling the project's direction. This arrangement was critical — it separated the "founder's personal career choice" from the "project's public character," minimizing community anxiety about a corporate takeover.
The source document's characterization of Anthropic is quite pointed. The core message is this: Anthropic was the first to recognize the risk that OpenClaw posed, but not the first to understand its value. Legal action came fast; ecosystem action came slowly. The result: the company best positioned to embrace this wave of the Agent movement instead gave competitors ample time by fixating on trademark and security concerns.
This is also why the source document describes the situation as a classic case of "a large company outmaneuvered by its own caution." From an industry perspective, OpenClaw's outcome was not about winning or losing for any single open-source project — it was a test of platform mindset: when the community spontaneously extends your model's capabilities, do you treat it as a risk or as leverage?
Peter's judgment is not gentle. He's not saying "AI will make software smarter" — he's saying a large proportion of the software forms we take for granted today will be swallowed by Agents.
His logic is direct. If an application's core function is only recording, reminding, querying, form-filling, or scheduling, then all of those actions could be uniformly taken over by an Agent. Users no longer need to remember the entry points of dozens of apps — they simply hand their goals to a persistently online assistant. Calendars, email, task management, ticket booking, lightweight CRM — these categories feel the pressure first.
The source document describes an even more radical layer: in the future, it won't just be people using Agents — Agents will represent people in negotiations with other Agents. Your Bot will contact a restaurant's Bot, a customer service Bot, a platform Bot, and even hire offline workers for last-mile execution. Humans step back from vast swaths of digital process, becoming the people who define goals, authorize boundaries, and make final calls.
Peter does not fetishize a single all-capable entity. He leans toward a different vision: just as in human society, AI will specialize. Some will be good at research, some at writing, some at calling systems, some at programming. The truly powerful systems won't necessarily be one universal brain that does everything — they'll be a group of specialized intelligences that can collaborate.
This philosophy explains why OpenClaw is not another chat shell. What it truly aims to build is the transformation of "a model that can talk" into an execution system that is persistently online, aware of its boundaries, accumulates memory, and divides labor across specialized roles. It is precisely because this goal is ambitious enough that OpenClaw simultaneously attracts developers, content creators, automation enthusiasts, and research-oriented users.
On the topic of tool protocols, Peter's position is very clear. The source document summarizes it in one sentence: OpenClaw did not treat MCP as its core, and instead took the Skills + CLI route. The debate isn't about whether MCP can be used — it's about which approach better suits a long-term extensible Agent system.
OpenClaw's Skills are essentially a tool description system with SKILL.md as the entry point. The model first reads a brief description, then decides whether to load more detail, then consults a fuller help document, and only then actually invokes the tool. This process doesn't require exposing all tools to the model upfront the way MCP does — instead, it lets the model work like a human: "first know this thing exists, then consult the manual when needed."
This mechanism fits naturally with CLI. The advantage of CLI is not that it's old — it's that it's stable, composable, cross-platform, and scriptable. As long as a tool can run in the terminal, OpenClaw can naturally pull it into an Agent workflow. This is also why it can support the large skill ecosystem on ClawHub without requiring a full system restart for each new skill added.
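To make the pattern concrete, here is a sketch of what a skill's SKILL.md entry point could look like. The skill name, frontmatter fields, and commands are all hypothetical — the actual layout varies by version, so treat this as an illustration of progressive disclosure rather than the official schema:

```markdown
---
name: weather-report
description: Fetch current weather for a city and produce a one-line summary.
---

# weather-report

Use this skill when the user asks about current weather conditions.

## Usage

Run `weather-cli <city>` in the terminal. The command prints JSON with
`temp`, `condition`, and `wind` fields. Full flags are documented in HELP.md.
```

The model sees only the short description until the task actually calls for more — the same "know it exists first, consult the manual when needed" flow described above.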
It's worth emphasizing that Skills vs. MCP does not mean one is absolutely superior. More precisely, these are two different product orientations. MCP is more like a unified protocol, aiming to organize the tool world into standardized interfaces. OpenClaw's Skills take a pragmatic route, prioritizing "how do we get things done today." For a system that emphasizes self-hosting, multi-platform access, and rapid extension, the latter is genuinely more effective at this stage.
One-sentence summary: ChatGPT is a consultant; OpenClaw is an employee.
A consultant's working style is "you ask, they answer." An employee's working style is "you assign a task, they go do it, fill in the gaps along the way, and come back to report." This is not a difference in capability level — it's a difference in product positioning. ChatGPT excels at explaining complex problems clearly. OpenClaw excels at chaining disparate actions together.
| Dimension | ChatGPT | OpenClaw |
|---|---|---|
| Interaction Model | You ask, it answers — passive response | You assign tasks, it executes proactively |
| Runtime | Used within a webpage or app | Self-hosted, available 24/7 |
| Entry Points | Primarily its own interface | Can connect to Telegram, WhatsApp, Lark, DingTalk, and more |
| Extensibility | Constrained by platform limits | Continuously extensible via Skills |
| Data Control | Primarily platform-hosted | Locally controllable, clearer boundaries |
| Model Choice | Primarily GPT series | Can connect to Claude, GPT, Gemini, DeepSeek, Ollama, and more |
So OpenClaw is not a replacement website for ChatGPT — it is a fundamentally different category of product. It is closer to "your own OS entry point," or a digital execution layer standing by inside your message stream.
Many people ask: "I already have Claude Code — do I still need OpenClaw?" That's the wrong question. The two are not the same category of product.
| Dimension | OpenClaw | Claude Code |
|---|---|---|
| Core Role | General AI assistant / Life OS | Specialized programming Agent |
| Primary Environment | Self-hosted service + messaging platforms | Terminal CLI / IDE integration |
| Primary Targets | Messages, email, web, calendar, automation flows | Codebases, file systems, testing and debugging |
| Memory System | Multi-layer memory, long-term accumulation | Session context + instruction files |
| Extension Method | ClawHub Skills, dynamic plugin system | Primarily coding tasks and rule files |
| Model Support | Multi-model | Claude family only |
| Strengths | Long-term online, multi-platform coordination | Code comprehension, rewriting, refactoring, testing |
The most sensible practice is not to choose one or the other — it's to divide the work. Use OpenClaw for messages, email, calendar, web, and lightweight automation. Use Claude Code for codebases, debugging, refactoring, and engineering delivery. Combined, the two form what a complete 2026 AI workflow looks like.
The Chinese-speaking community calls running and maintaining an OpenClaw instance "keeping shrimp," with users referring to themselves as "shrimp keepers." This is not a joke — it is part of what gives the project its social dimension.
This cultural layer is important. It transforms what would otherwise be a fairly hardcore technical system into something displayable, discussable, and even anthropomorphically nurturing. Users are not just configuring a tool — they are "raising" a digital life form with personality, prone to mistakes, and capable of growth.
The source document also mentions social spaces like Moltbook. Their significance is not in data scale — it's in the observation window they provide: when large numbers of Agents are given names, personalities, rules, and tasks, do they begin to exhibit something like social behavior among themselves? Even if these experiments are still in early stages, it indicates that OpenClaw's imaginative scope extends well beyond automation.
Part of the reason OpenClaw spread so quickly is that it doesn't serve just one community. It lets technical users go deep, while also letting non-technical users directly experience "saving time."
| Audience | Primary Motivation | Typical Use Cases |
|---|---|---|
| Developers / Technical Users | Full control, hackable, self-hostable | Customizing SOUL.md, writing their own Skills, studying Agent architecture |
| Individual Users / Productivity Enthusiasts | Genuinely freeing up time, managing digital life | Connecting messaging platforms, managing email, calendar, reminders, and web operations |
| Entrepreneurs / Freelancers | Packaging automation capabilities as services | Building AI assistants for clients, sales automation, or content workflows |
| Enterprise Users | Integrating Agents into internal collaboration platforms | Customer service, operations, knowledge bases, data analysis, internal Q&A |
| Researchers / AI Enthusiasts | Studying Agent boundaries and social behaviors | Setting personalities, exploring long-term memory, testing multi-Agent collaboration |
| Content Creators | Increasing publishing frequency and distribution efficiency | Topic curation, transcription, summarization, multi-channel publishing |
If forced into one sentence: OpenClaw's core audience is anyone who has a lot of tedious digital tasks and wants to hand them off to a persistently online assistant. Whether you can write code certainly matters, but it's not the decisive threshold. What truly determines intensity of use is whether you have enough daily routines worth automating.
This is also why OpenClaw can spread across communities. To developers, it is an orchestratable Agent platform. To ordinary users, it is a personal assistant that gets things done. To enterprises, it is a low-cost entry point for piloting AI automation. A product that can hold up on all three dimensions simultaneously is itself a major reason it became the dark horse it is.
When deploying OpenClaw, the most common first mistake is not typing the wrong command — it's choosing the wrong deployment form. Many people follow a tutorial, finish the installation, and only then realize: the machine isn't always on, messages aren't coming through, browser automation is unstable, and the mobile node simply can't connect. OpenClaw is not a pure local desktop tool — it functions more like a persistently online Agent infrastructure. So the first step should be choosing a deployment method.
| Deployment Form | Best For | Advantages | Main Trade-offs |
|---|---|---|---|
| Local temporary run | Users who just want to quickly try it and verify model and channel connectivity | Fastest to get started, no extra hardware needed | Goes offline when computer sleeps, not suited for long-term message handling |
| Mac mini always-on | Individual power users, content creators, small teams | Stable, quiet, low-power; ideal for long-term availability and browser automation | Requires handling remote access, login state, system sleep, and update policies |
| Linux host / NAS | Technical users familiar with CLI who care about cost control | Can run 24/7; well-suited for Docker and scripting | Desktop browser, QR code login, and media capability setup are more complex |
| Cloud server | Teams needing stable public internet access and cross-region collaboration | Best public accessibility; easy for remote node connections | Higher security responsibility; browsers and messaging platform risk controls are more sensitive |
| Hybrid deployment | Advanced users with the highest requirements | Gateway, browser, nodes, and messaging plugins deployed separately for maximum flexibility | Most complex configuration; longest troubleshooting chain |
For most individual users: prioritize a Mac mini or a home machine that stays on continuously. The reason is simple: OpenClaw's best experience often depends on a real browser, persistent login state, and stable disk — not raw compute.
For teams and users who need public access: consider "cloud server + local browser node" or "Mac mini + reverse proxy" combinations. The former is better for public internet access; the latter is better for platforms that require manual login.
When choosing a deployment form, don't just ask "can it run?" — ask "if something goes wrong, can I fix it?" The part of OpenClaw that actually consumes time is maintenance after it's running, not the initial installation.
If recommending just one long-term personal deployment option, Mac mini remains the most well-rounded choice. Its advantage is not benchmark scores — it's a stable graphical environment, low power consumption, controllable sleep behavior, and good browser compatibility. This makes it well-suited for running OpenClaw as a genuine daily assistant over the long term.
- Hardware: any M-series chip works. 16GB of RAM is a more comfortable starting point, and a 512GB SSD leaves room for browser caches and session data.
- System: use a recent stable macOS release and avoid developer previews; Node, Homebrew, the browser, and system permissions are easier to keep compatible.
- Network: fixed home broadband or a stable corporate network is sufficient. For external access, prefer Tailscale, Cloudflare Tunnel, or a reverse proxy.
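For a quick test of external access (assuming the gateway listens on port 3000, as in the sample config later in this part), a Cloudflare quick tunnel is the fastest route. Note that the generated URL is temporary and changes on every run — use a named tunnel or Tailscale for anything long-term:

```bash
# Temporary public URL for testing; the URL changes on every run
brew install cloudflared
cloudflared tunnel --url http://localhost:3000
```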
Order matters. Install the runtime environment first, then OpenClaw, then handle models and channels, and finally set up auto-start and remote access. Many errors are not version issues — they stem from doing things out of order, causing paths, permissions, and session directories to get mixed up.
# 1) Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# 2) Install Node.js LTS
brew install node
node -v
npm -v
# 3) Install OpenClaw globally
npm install -g openclaw
# 4) Initialize and check version
openclaw --version
openclaw help
# 5) Start the gateway
openclaw gateway start
openclaw gateway status
Common Mac mini Pitfalls:
First: connecting via remote desktop and then locking the screen or putting the machine to sleep — this will cause all browser automation to stop working.
Second: running the service under your personal daily-use account without disabling auto-sleep and configuring the post-restart login policy.
Third: not granting "Full Disk Access," "Accessibility," and "Screen Recording" permissions to the terminal, browser, and automation tools, causing erratic behavior with screenshots, uploads, clipboard access, and browser control.
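The sleep-related pitfalls above can be mitigated with macOS's built-in power management. These are standard pmset commands; run them on the always-on machine itself (letting the display sleep is fine — the system must stay awake):

```bash
# Keep the system and disk awake; let only the display sleep
sudo pmset -a sleep 0
sudo pmset -a disksleep 0
sudo pmset -a displaysleep 10

# Restart automatically after a power failure
sudo pmset -a autorestart 1

# Verify the effective settings
pmset -g
```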
OpenClaw's runtime logic is not just one command that handles everything. It's more like a combination of "gateway + configuration + memory + skills + channel plugins." Understanding these files gives you a handle on most of the errors you'll encounter later.
| File / Directory | Purpose | Recommended Approach |
|---|---|---|
| openclaw.json or main config file | Declares gateway, nodes, plugins, models, logs, and other core runtime parameters | Start with the minimum viable configuration; don't pile in all plugins at once |
| AGENTS.md | Defines execution boundaries, notification rules, workflows, and data classification | Treat it as the "system constitution" — don't let it become a chaotic notepad |
| SOUL.md | Defines personality, tone, role boundaries, and writing style | Good for stable preferences; not appropriate for cramming in lots of process details |
| TOOLS.md | Records environment-specific mappings: device names, directory paths, commonly used channel IDs | Write concretely; avoid vague filler; don't just repeat what's already in the docs |
| memory/ | Long-term memory, journals, state files | Needs backing up, plus periodic cleanup of useless accumulation |
| skills/ or ClawHub install directory | Skill descriptions and external tool integration points | In production, prefer skills from clearly identified sources with traceable versions |
{
"gateway": {
"bind": "0.0.0.0:3000",
"publicUrl": "https://your-domain.example.com"
},
"models": {
"default": "openai/gpt-5.4",
"fallback": "anthropic/claude-sonnet"
},
"plugins": {
"telegram": {
"enabled": true
},
"browser": {
"enabled": true
}
},
"logs": {
"level": "info"
}
}
The above is an illustrative structure only. Specific field names and nesting levels should follow the official configuration template for your current version. Do not copy this directly into a production environment.
For most users, OpenClaw's value is not answering "hello" in the terminal — it's whether it can show up in Telegram, WhatsApp, Discord, Lark, or WeChat Work and actually handle messages, tasks, and notifications. Channel integration determines whether you've "installed a project" or "raised an online shrimp."
Telegram's advantages are a mature Bot API, clear callback mechanics, cross-platform stability, and the largest body of tutorials in the Chinese-speaking community. The downside: in some regions, network reachability and notification delivery are unreliable.
1. Create a bot via BotFather and obtain the Bot Token
2. Enable the telegram plugin in OpenClaw
3. Configure webhook or polling mode
4. Confirm that gateway.publicUrl is accessible from Telegram
5. Send /start to the bot and check whether logs show an incoming update
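In configuration terms, the plugin block might look like the sketch below. Field names such as botToken, mode, and webhookUrl are assumptions for illustration — confirm them against the plugin documentation for your version:

```json
{
  "plugins": {
    "telegram": {
      "enabled": true,
      "botToken": "123456:ABC-token-from-botfather",
      "mode": "webhook",
      "webhookUrl": "https://your-domain.example.com/telegram/webhook"
    }
  }
}
```

If webhook updates never arrive, switch to polling mode first to rule out public-URL problems, then debug the reverse proxy separately.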
WhatsApp often feels like the "real life entry point," because contacts, family groups, and client communication all live there. But the cost is also the most apparent: QR code scanning, login state, account ban risk, bridging methods, and web version changes all directly impact stability. If you plan to use it in production, you must accept that it is harder to maintain than Telegram.
| Channel | Typical Use | Integration Focus | Watch Out For |
|---|---|---|---|
| Lark | Internal office work, knowledge Q&A, approval notifications | Event subscriptions, bot permissions, enterprise intranet accessibility | Incomplete field signing and permission scope configuration most often causes "message arrived but can't reply" |
| DingTalk | Operations notifications, internal group bots | Callback address, signing, security settings | Enterprise security policies are stricter; cross-network access must be verified in advance |
| Discord | Developer communities, threaded collaboration | Channel IDs, thread permissions, Bot scopes | Good for technical teams; not suitable as a starting point for fully non-technical users |
| Slack | International team workflows | OAuth scopes, event subscriptions, Socket Mode or public callback | Fine-grained permissions; easy to miss a scope during initial setup |
The essence of channel integration is not "able to send messages" — it's connecting identity, permissions, callbacks, login state, and network accessibility all at once. If any one element is off, what users experience is "why isn't the bot responding again."
OpenClaw errors look diverse on the surface, but underlying causes usually fall into a few categories: process not running, configuration not loaded, channel not connected, model unavailable, browser lacking permissions, or node pairing failed. Don't just search for error strings online — layer the fault first.
| Symptom | Most Likely Cause | First Check |
|---|---|---|
| Command executes but no message response | Channel plugin not enabled, webhook not accessible, Bot Token incorrect | Check gateway logs first, then check the channel platform's callback status |
| Web page loads but node won't connect | publicUrl incorrect, reverse proxy headers missing, Tailscale / NAT issue | Access the public address directly from an external network to confirm it's not only visible on the internal network |
| Model call returns 401 / 403 | API key incorrect, provider permissions insufficient, model name misspelled | Verify the key and model independently with a minimal request |
| Browser automation suddenly stops working | Browser reclaimed by system, user directory corrupted, missing graphical permissions | Open the browser manually and confirm whether sessions and pages are normal |
| Disconnects periodically after QR code login | Platform risk control, session expiry, machine sleep or network switch | Check whether disconnect times correlate with system sleep, restart, or IP change |
| Skill exists but Agent won't use it | SKILL description unclear, tool not executable, insufficient permissions | Manually execute the corresponding CLI first, then check if the skill description is too abstract |
openclaw gateway status
openclaw gateway restart
openclaw help
# When reviewing logs, focus on:
# 1) Configuration load path
# 2) Whether plugin initialization succeeded
# 3) Whether external callbacks are arriving
# 4) Model call return codes
# 5) Browser / node connection status
Entering 2026, OpenClaw users are no longer facing "is there a model available?" — they're facing "there are too many models; how should I configure them?" The same Agent, if its default model, fallback model, search model, long-context model, code model, and local model aren't properly assigned roles, even the strongest capabilities will be eaten up by cost, latency, and instability.
The core task of Part Four is to re-organize the messy model market into actionable configuration plans. The evaluation criteria go beyond scores and leaderboards to include four practical indicators: first, whether tool-calling is stable; second, whether long context actually works; third, whether the price is viable for high-frequency operation; and fourth, whether it can be practically deployed in Chinese network environments and across multiple channel integrations.
For lighter personal use, the recommended approach is a "dual-layer model stack": one strong model for complex decisions, one cheaper model for daily Q&A, summarization, and low-risk tasks. This is the most cost-effective and most stable setup.
For heavier or team use, the recommended approach is a "routing model stack": automatically switching providers by task type, assigning high-value tasks to high-quality models and high-throughput tasks to low-cost models.
The worst approach to model selection is: use the most expensive one for everything. That looks simple, but it simultaneously loses on cost, speed, and scalability.
Today's available models can be roughly divided into six categories: Anthropic Claude, OpenAI GPT, Google Gemini, DeepSeek, domestic closed-source models, and local open-source models. Their differences are not just about "who's smarter" — they have different capability structures. Some excel at long-form reasoning, some at code execution, some are cheap enough for batch processing, and some are better suited for privacy-sensitive scenarios.
| Family | Representative Models | Strengths | Weaknesses | Role in OpenClaw |
|---|---|---|---|---|
| Anthropic | Claude Sonnet, Claude Opus | Stable writing, natural tool-calling, strong long-form understanding | Higher price; regional availability and quotas require attention | Primary model for complex tasks, long-form organization, research writing |
| OpenAI | GPT-4.1 / GPT-5.x series, o-series | Balanced general capability, mature ecosystem, broad multimodal support | High-tier models are costly; improper task splitting can lead to over-calling | Default primary model, code and tool orchestration core |
| Google | Gemini 2.5 Pro / Flash | Excellent long-context handling, outstanding document and multimodal reading | Stability of certain complex agent actions still needs version-by-version verification | Long-form retrieval, document summarization, attachment comprehension |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 | Low price, strong Chinese performance, competitive reasoning cost-efficiency | Stability under high concurrent load and complex tool chains requires observation | Low-cost default model, Chinese analysis, batch processing |
| Domestic closed-source | Doubao, Qwen, ERNIE, Hunyuan, Kimi | Friendly Chinese ecosystem, easier enterprise integration and local compliance | Cross-platform documentation consistency and international tool ecosystem are uneven | Domestic business assistant, enterprise knowledge bases, compliance scenarios |
| Local open-source | Qwen, Llama, Mistral, GLM open-source family | Controllable, near-zero marginal cost, data stays local | Deployment and tuning barrier is high; complex Agent capability typically weaker than top closed-source models | Privacy tasks, local fallback, offline environments |
You can roughly divide the market into three tiers. The first tier is high-performance, high-cost models — suitable for critical decisions and complex workflows, commonly found in high-tier Claude, OpenAI, and Gemini Pro variants. The second tier is balanced models suited as default workhorses. The third tier is low-cost, high-throughput models for classification, summarization, intent recognition, batch organization, and fallback.
If your OpenClaw tasks lean toward long-form organization, research reports, email drafting, and complex rule execution, Claude is typically still a high-quality option. Its edge is not at a single benchmark — it's in overall completion quality, especially for context retention, textual stability, long-paragraph rewriting, and output quality under complex constraints.
OpenAI models' advantage lies in a mature ecosystem — the most complete tool-calling, structured output, multimodal support, and development resources. For OpenClaw deployments that need to "write, use tools, read images, and integrate with the ecosystem," OpenAI is often the default starting point.
Gemini's highlight is ultra-long context and document ingestion capability. When dealing with long PDFs, meeting materials, resource bundle organization, and multi-attachment analysis, it often outperforms what short-conversation scores alone would suggest. If your Agent frequently handles document-heavy tasks, Gemini is worth adding to your primary pool.
DeepSeek's appeal is straightforward: cheap, strong in Chinese, with competitive reasoning at its price point. For daily assistants, information organization, content rewriting, and batch processing tasks, it can often compress per-unit costs significantly. The trade-off is equally direct: stability in complex agent chains must be stress-tested in your own production environment.
Models like Doubao, Qwen, ERNIE, Hunyuan, and Kimi are more accessible for Chinese office work, enterprise documents, domestic network environments, and local business support. If your OpenClaw needs to connect to WeChat Work, Lark, DingTalk, or internal knowledge bases, this category of models typically deploys more smoothly than international ones.
Whenever the scenario involves privacy, internal networks, offline operation, fixed workflows, or a desire to drive marginal cost to near zero, local models have value. Qwen, Llama, Mistral, and the GLM open-source family — combined with Ollama, vLLM, or LM Studio — can handle fallback Q&A, internal retrieval summarization, and basic workflow judgment tasks.
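A minimal Ollama starting point looks like this — the qwen3:14b tag matches the sample configs later in this part, but model availability changes over time, so verify the tag before depending on it:

```bash
# Pull and smoke-test a local model
ollama pull qwen3:14b
ollama run qwen3:14b "One sentence: why run models locally?"

# Ollama exposes an HTTP API on port 11434 that a gateway can call
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3:14b", "prompt": "ping", "stream": false}'
```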
| Model Type | Overall Quality | Tool-Call Stability | Long-Context Capability | Chinese Capability | Cost Impression | Recommended Role |
|---|---|---|---|---|---|---|
| Claude high-tier | High | High | High | Good | High | Complex writing, decisions, deep research |
| OpenAI primary | High | High | High | Good | Med-High | Default primary model, general-purpose Agent |
| Gemini Pro / Flash | Med-High to High | Med-High | Very High | Good | Medium | Long documents, attachments, multimodal reading |
| DeepSeek family | Med-High | Medium | Med-High | High | Low | Batch processing, Chinese primary, cost compression |
| Domestic closed-source primary | Med to Med-High | Medium | Med-High | Very High | Medium | Domestic office, enterprise knowledge assistant |
| Local open-source 7B–70B | Medium | Low-Med to Med | Medium | Model-dependent | Low marginal | Privacy fallback, offline scenarios |
This reflects capability impressions from a deployment perspective, not academic benchmark rankings. In actual production, conduct A/B tests using your own task samples.
The easiest mistake is writing a configuration that is just "a list of model names" with no decision logic. Effective configuration isn't about piling on providers — it's about clearly defining who is the default, who is the fallback, who handles cheap tasks, who handles long documents, and who provides local backup.
{
"models": {
"default": "anthropic/claude-sonnet",
"fallback": "deepseek/deepseek-v3",
"profiles": {
"writing": "anthropic/claude-sonnet",
"daily": "deepseek/deepseek-v3"
}
},
"routing": {
"preferCheapFor": ["summary", "classify", "rewrite"],
"preferStrongFor": ["research", "tool-use", "longform"]
}
}
{
"models": {
"default": "openai/gpt-5.4",
"fallback": "anthropic/claude-sonnet",
"lowCost": "deepseek/deepseek-v3",
"longContext": "google/gemini-2.5-pro",
"local": "ollama/qwen3:14b"
},
"routing": {
"rules": [
{ "match": "attachments|pdf|long-context", "use": "longContext" },
{ "match": "classification|batch|summary", "use": "lowCost" },
{ "match": "private|offline", "use": "local" },
{ "match": "default", "use": "default" }
]
}
}
{
"models": {
"default": "qwen/qwen-max",
"fallback": "deepseek/deepseek-v3",
"backupInternational": "openai/gpt-5.4",
"local": "ollama/qwen3:8b"
},
"policy": {
"preferDomestic": true,
"allowInternationalFor": ["coding", "deep-research"],
"sensitiveTasksUseLocal": true
}
}
The above examples illustrate the approach. Field names may differ across OpenClaw versions. For actual production configuration, refer to the current template and plugin documentation.
The value of model routing is not showing off — it's saving money and reducing risk. Once your Agent is handling tens to hundreds of requests per day, manually specifying models has already become outdated. A good routing strategy means users just submit tasks, and the system decides which model to use.
- Route by task type: writing, summarization, code, search, and attachment comprehension each go to different models — the most direct and stable approach.
- Escalate on failure: try a low-cost model first; upgrade to a stronger model only when confidence is insufficient or the task fails (see the sketch after this list).
- Route by data sensitivity: private data, local files, and internal documents go to local or private-cloud models first.
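Here is a minimal sketch of the escalation pattern in the second strategy. The helper functions and the confidence threshold are hypothetical stand-ins — the point is the shape of the logic: cheap first, one escalation to a different provider, never a blind retry of the same model:

```python
class ModelError(Exception):
    """Stand-in for a provider call failure."""

def call_model(model: str, task: str) -> str:
    raise NotImplementedError  # wire up to your actual model client

def confidence(result: str) -> float:
    return 1.0  # replace with a real self-check, e.g. a cheap grader call

CHEAP, STRONG = "deepseek/deepseek-v3", "openai/gpt-5.4"

def run_with_escalation(task: str) -> str:
    """Try the cheap model first; escalate once on failure or low confidence."""
    try:
        result = call_model(CHEAP, task)
        if confidence(result) >= 0.7:
            return result
    except ModelError:
        pass  # fall through to the stronger provider
    return call_model(STRONG, task)
```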
| Routing Dimension | Recommended Practice | Reason |
|---|---|---|
| Request length | Short requests go to cheaper models; ultra-long inputs go to long-context models | Reduces wasted context in expensive models |
| Tool complexity | Multi-step tool-calling goes directly to a stable primary model | Avoids cheaper models going off-track midway |
| Failure retries | Don't blindly retry the same model three times — switch to a backup provider on the second attempt | Reduces the impact of single-provider failures |
| Sensitive content | Financial, HR, and private files go to local or private models first | Controls data boundaries |
| Batch tasks | Split and run concurrently on low-cost models | Compresses per-task cost |
Put together, these dimensions yield routing logic like the following pseudocode:
if task.has_private_files:
    model = "ollama/qwen3:14b"            # sensitive data stays local
elif task.contains_many_attachments or task.context_length > 120000:
    model = "google/gemini-2.5-pro"       # long-context specialist
elif task.requires_multi_tool_use or task.is_high_value:
    model = "openai/gpt-5.4"              # stable primary for complex tool chains
elif task.type in ["summary", "classify", "rewrite", "tagging"]:
    model = "deepseek/deepseek-v3"        # cheap batch work
else:
    model = "anthropic/claude-sonnet"     # balanced default
Routing Design Principles:
First: don't let the same task get conflicting instructions from both the router and the Agent prompt.
Second: don't attribute all failures to the model. Often the issue is tool permissions, network, or context segmentation.
Third: routing rules must be observable. At minimum, log "why this model was chosen this time" — otherwise future optimization is impossible.
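A minimal sketch of observable routing, assuming one JSON log line per decision (the fields and thresholds here are illustrative, not a fixed schema):

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

def route(task_type: str, context_length: int, private: bool) -> str:
    """Pick a model and record why — the 'reason' field is what makes
    later cost and failure analysis possible."""
    if private:
        model, reason = "ollama/qwen3:14b", "private data stays local"
    elif context_length > 120_000:
        model, reason = "google/gemini-2.5-pro", "long context"
    elif task_type in ("summary", "classify", "rewrite"):
        model, reason = "deepseek/deepseek-v3", "cheap batch task"
    else:
        model, reason = "openai/gpt-5.4", "default primary"
    log.info(json.dumps({
        "ts": time.time(),
        "task_type": task_type,
        "model": model,
        "reason": reason,
    }))
    return model
```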
Recommended combination: OpenAI primary model + DeepSeek daily secondary model
Suitable tasks: Chat, schedule organization, search summarization, channel replies, lightweight automation.
Reason: Stable overall capability, controllable cost, and a solid first configuration for long-term operation.
Recommended combination: Claude primary model + Gemini document model
Suitable tasks: Long-form rewriting, reports, interview organization, academic assistance, resource compilation.
Reason: The former handles quality, the latter handles long document ingestion — a highly efficient combination.
Recommended combination: OpenAI or Claude primary model + local model as safety net
Suitable tasks: Code explanation, script generation, log debugging, repository Q&A.
Reason: Top closed-source models handle complex reasoning; local models cover low-risk internal queries.
Recommended combination: Qwen / Doubao primary model + DeepSeek low-cost layer
Suitable tasks: Lark, DingTalk, WeChat Work, SOP Q&A, internal document summarization.
Reason: Better Chinese performance and local business support; lower compliance and integration costs.
Recommended combination: Qwen / Llama local primary model + cloud high-tier model as manual-trigger supplement
Suitable tasks: Private documents, intranet Q&A, offline batch processing, sensitive material organization.
Reason: Run locally by default; manually upgrade when high quality is genuinely needed. Boundaries stay clear.
The best model configuration is not the one that tops the leaderboard — it's the one that can run stably for three months, keeps the bill within reason, and that you know how to recover when something goes wrong.
Many teams are not brought down by model unit prices — they're brought down by incorrect usage patterns. For example: routing all messages through the strongest model; always attaching overly long histories to every request; retrying failures blindly; using a flagship reasoning model for a simple summarization task; calling a full Agent chain for low-value notifications. Cost explosions are usually system design problems, not provider problems.
| Optimization Point | Wrong Approach | Recommended Approach | Expected Benefit |
|---|---|---|---|
| Default model | Always use the strongest model for everything | Tier by task level; set default and upgrade conditions | Directly reduces total bill |
| Context management | Always include the full chat history | Summarize history; keep only necessary context | Reduces token waste; lowers latency |
| Tool-calling | Every request goes through the full tool chain | First check whether tools are actually needed | Reduces failure rate and redundant steps |
| Long-form processing | Dump entire original text directly into an expensive model | Slice and outline first, then hand summary to high-tier model | More stable quality at lower cost |
| Local models | Never use local models at all | Offload privacy and low-value tasks to local | Reduces cloud API call volume |
1. Use a balanced model by default
2. Upgrade only when the following conditions are detected:
- Multi-step tool-calling
- Long-form high-quality rewriting
- High-value user request
- Complex code / data analysis
3. Repeated tasks go through cache first
4. Sensitive data goes to local model first
5. Weekly review:
- Call count per model
- Average response time
- Failure rate
- Average cost per task
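If every call is logged as one JSON line — a hypothetical format with model, latency_ms, cost, and ok fields — the weekly review collapses into a small aggregation script like this sketch:

```python
import json
from collections import defaultdict

def weekly_report(log_path: str) -> None:
    """Per-model call count, average latency, failure rate, and average
    cost, aggregated from a JSON-lines call log."""
    stats = defaultdict(lambda: {"calls": 0, "latency": 0.0,
                                 "fails": 0, "cost": 0.0})
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            s = stats[rec["model"]]
            s["calls"] += 1
            s["latency"] += rec["latency_ms"]
            s["fails"] += 0 if rec["ok"] else 1
            s["cost"] += rec["cost"]
    for model, s in sorted(stats.items()):
        n = s["calls"]
        print(f"{model}: {n} calls, avg {s['latency'] / n:.0f} ms, "
              f"{s['fails'] / n:.1%} failed, avg cost {s['cost'] / n:.4f}")

weekly_report("calls.jsonl")  # hypothetical log path
```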
If there's one principle to remember, it's this: replace "strongest model" with "optimal system." In the Agent era, the model determines the ceiling; scheduling determines profitability and sustainability.