When you build AI agents, every external call — to an LLM, an API, a database — is a tool call. In production, those calls are expensive, unreliable, and slow. ToolOps is to AI tools what a service mesh (an infrastructure layer that handles service-to-service communication, resilience, and observability) is to microservices: a framework-agnostic middleware SDK that upgrades any Python function with caching, resilience, and observability — with zero changes to your business logic.
1. Philosophy
The analogy that best describes ToolOps is the Service Mesh. Just as a Service Mesh — think Istio or Linkerd — sits between microservices to handle retries, timeouts, and circuit breaking transparently, ToolOps sits between your AI agent and its tools. The application code knows nothing about the infrastructure layer beneath it.
"ToolOps is to AI Tools what a Service Mesh is to Microservices."
This separation of concerns is intentional. Tool authors should focus on what a tool does, not on managing the distributed system complexities of calling it reliably at scale. ToolOps absorbs that complexity through a decorator interface — a design deliberately chosen to be the thinnest possible integration surface.
| Feature | Standard `@lru_cache` | ToolOps SDK |
|---|---|---|
| Async / await support | ❌ Not supported | ✅ Native async |
| Semantic cache | ❌ Exact match only | ✅ Vector embeddings |
| Distributed / persistent cache | ❌ In-memory only | ✅ Postgres, File, Memory |
| Circuit breaker & retries | ❌ None | ✅ Built-in with backoff |
| Request coalescing | ❌ Thundering herd | ✅ Multicast result |
| Stale-if-error fallback | ❌ Raises exception | ✅ Serve last good value |
| Observability (OTel / Prometheus) | ❌ None | ✅ Structured telemetry |
| AI-native (MCP / frameworks) | ❌ Generic | ✅ Native integrations |

A few terms, briefly:

- *Semantic cache* — caching that matches the intent of a query via vector embeddings rather than exact text.
- *Circuit breaker* — a pattern that stops requests to a failing service to prevent cascading failures.
- *Request coalescing* — combining multiple concurrent requests for the same data into a single upstream call.
- *Observability* — measuring a system's internal state from its outputs (logs, metrics, traces). OTel is OpenTelemetry, an open-source observability framework.
- *MCP* — Model Context Protocol, an open standard for connecting AI models to data and tools.
2. The production wall
Every agent developer hits the same wall when moving from demo to production. The symptoms are predictable: API bills that scale faster than usage, agents that crash on the third retry, request queues that bottleneck under any real concurrency load, and workflows that are completely opaque when they fail. ToolOps addresses each bottleneck at the infrastructure layer.
| Problem | Business Impact | With ToolOps |
|---|---|---|
| Redundant API calls | 💸 10× cost spikes | 100 calls → 1 real + 99 cache hits |
| Similar queries | 💸 LLM tokens wasted | Semantic match → same result |
| API instability | 💥 Agent crashes & loops | Circuit Breaker + auto-retry |
| Concurrency bursts | 🐢 Thundering herd | Request coalescing → 1 real call |
| Zero observability | 🌑 Blind operations | Structured JSON + OTEL traces |
3. Installation
ToolOps is available on PyPI. The core package is zero-dependency and installs in seconds. Optional extras unlock Postgres persistence, semantic embedding, and OpenTelemetry.
```bash
# The [extras] syntax requires quotes in zsh/bash
pip install "toolops[all]"

# Modular extras
pip install "toolops[postgres,semantic,otel]"
```

On Windows (cmd):

```bat
:: Use double quotes for extras
pip install "toolops[all]"

:: Using the py launcher
py -m pip install "toolops[all]"
```
**Note on shells.** Shells like zsh treat square brackets as glob patterns. Always wrap the package name in double quotes (`"toolops[all]"`) to avoid `no matches found` errors.
It is strongly recommended to install ToolOps within a virtual environment to avoid dependency conflicts:
```bash
# Create and activate a virtual environment (.venv)
python -m venv .venv
source .venv/bin/activate    # Linux/macOS
# .venv\Scripts\activate     # Windows

# Install and verify
pip install "toolops[all]"
toolops doctor
```
The minimal install is intentional: pull only what your environment requires. The `[all]` extra is recommended for production deployments where you need persistent caching and distributed tracing from day one.
4. Core architecture
The architecture has two layers: a decorator interface that sits on your tool functions, and a pluggable backend system that handles storage and embedding. The two layers are entirely decoupled — you can swap backends without changing a single line of tool code.
4.1 Decorators
ToolOps provides two decorators that map cleanly onto the two categories of tool operations:
`@readonly` — read operations with caching *(recommended for reads)*

```python
from toolops import readonly, cache_manager
from toolops.cache import MemoryCache

cache_manager.register("memory", MemoryCache(), is_default=True)

@readonly(cache_backend="memory", cache_ttl=3600, retry_count=3)
async def get_market_data(ticker: str) -> dict:
    return await api.fetch(ticker)  # Automatically cached, retried, and traced
```
`@sideeffect` — write operations with resilience *(for writes)*

```python
from toolops import sideeffect

@sideeffect(circuit_breaker=True, timeout=5.0, retry_count=2)
async def execute_trade(order: dict) -> bool:
    return await broker.submit(order)  # No caching — protected by circuit breaker and timeout
```
The distinction between read and write operations is a first-class concept in ToolOps. Reads are idempotent, so they are safe to cache and retry aggressively. Writes are not: they get resilience patterns and only the bounded retries you explicitly configure — never implicit retry behavior that could cause double submissions.
4.2 Cache backends
Register backends once at application startup, then reference them by name across all your decorators. Multiple backends can coexist — a fast in-memory layer for hot data, Postgres for persistent audit trails, and a semantic layer for NLP workloads.
| Backend | Install extra | Best suited for |
|---|---|---|
| `MemoryCache` | — (core) | Development, testing, single-process deployments |
| `PostgresCache` | `[postgres]` | Persistent cache with full audit trail across restarts |
| `FileCache` | — (core) | Lightweight local persistence without a database |
| `SemanticCache` | `[semantic]` | NLP and RAG pipelines — intent matching over vector embeddings |
```python
from toolops import cache_manager
from toolops.cache import MemoryCache, PostgresCache

# Fast default layer
cache_manager.register("memory", MemoryCache(), is_default=True)

# Persistent layer for audit-sensitive operations
cache_manager.register("postgres", PostgresCache(dsn=DATABASE_URL))
```
5. Resilience patterns
Beyond basic try/except blocks, ToolOps implements three deterministic patterns drawn from distributed systems engineering. Together they ensure that an agent never gets trapped in a failure loop, never exhausts its API budget on a degraded service, and never serves stale data when the upstream is healthy.
5.1 Circuit breaker
The circuit breaker pattern stops all calls to a failing service after a configurable failure threshold. Once open, the circuit fails fast — returning immediately rather than waiting for a timeout — and enters a recovery window before attempting to re-establish the connection. This prevents a single failing tool from cascading into full agent failure.
```python
@readonly(
    circuit_breaker=True,
    circuit_failure_threshold=5,   # opens after 5 consecutive failures
    circuit_recovery_timeout=60,   # retries after 60 seconds
)
async def get_exchange_rates() -> dict:
    return await forex_api.fetch()
```
5.2 Stale-if-error
When an upstream service fails and no live data can be retrieved, ToolOps can automatically fall back to the last known good value from the cache — even if that value has exceeded its normal TTL. This is the production equivalent of "serve something useful rather than crashing."
```python
@readonly(
    cache_ttl=3600,
    stale_if_error=True,
    stale_ttl=86400,   # serve stale data for up to 24h on failure
)
async def get_exchange_rates() -> dict:
    return await forex_api.fetch()
```
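The fallback logic itself is simple. This sketch, built on a plain dict store with hypothetical names, shows the decision order: fresh hit, then live fetch, then stale value only when the fetch raises and the entry is still within `stale_ttl`:

```python
import time

class StaleIfErrorCache:
    """Illustrative stale-if-error fallback over a dict store (not ToolOps internals)."""

    def __init__(self, ttl=3600, stale_ttl=86400):
        self.ttl = ttl
        self.stale_ttl = stale_ttl
        self.store = {}  # key -> (value, stored_at)

    def get(self, key, fetch):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # fresh hit, no upstream call
        try:
            value = fetch()
        except Exception:
            # Upstream failed: serve the last good value if still within stale_ttl.
            if entry is not None and now - entry[1] < self.stale_ttl:
                return entry[0]
            raise  # nothing servable, propagate the failure
        self.store[key] = (value, now)
        return value
```

The key property: stale data is only ever served on the error path, never while the upstream is healthy.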
5.3 Request coalescing
When multiple agent instances call the same tool simultaneously — a common pattern in multi-agent pipelines — ToolOps detects the in-flight request and holds subsequent callers until the first completes. The single real result is then multicast to all waiting callers. Under high concurrency, this collapses N upstream calls to exactly one.
**Impact.** In a benchmark with 50 concurrent agent calls to the same weather tool, request coalescing reduced upstream API calls from 50 to 1 — a 98% reduction in credit consumption with zero changes to the calling agents.
6. Semantic caching
Traditional caches operate on exact key equality. This works for deterministic systems, but agents are not deterministic systems — the same user intent surfaces in dozens of different phrasings. ToolOps uses vector embeddings to understand the meaning of a tool call, not just its literal arguments. Two queries that express the same intent return the same cached result.
```text
— Call 1 (cache miss → real API call)
query: "What is the status of invoice #442?"

— Call 2 (semantic similarity: 0.97 → cache hit)
query: "Check the current status for invoice 442"

— Call 3 (semantic similarity: 0.94 → cache hit)
query: "Invoice 442 — is it paid?"
```
```python
from toolops import readonly, cache_manager
from toolops.cache import SemanticCache, SentenceTransformerEmbedder

embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
semantic = SemanticCache(embedder=embedder, threshold=0.92)
cache_manager.register("semantic", semantic)

@readonly(cache_backend="semantic")
async def ask_agent(query: str) -> str:
    return await llm.complete(query)  # Reduces LLM latency by up to 90% on repeated intent patterns
```
The similarity threshold (0.92 in the example above) is the primary tuning lever. Higher values require tighter semantic alignment before a cache hit is declared; lower values are more aggressive. The right value depends on how much variation is acceptable in your tool's input domain — a factual lookup tolerates a higher threshold than a creative generation task.
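Mechanically, the threshold gates a cosine-similarity comparison between the query's embedding and stored entries. The toy `TinySemanticCache` below makes that gate explicit; the injected `embed` function stands in for a real embedding model and all names are hypothetical:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class TinySemanticCache:
    """Toy lookup showing how a threshold gates semantic hits (illustrative only)."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed          # query -> vector; a real deployment would
        self.threshold = threshold  # use a sentence-transformers model here
        self.entries = []           # (vector, cached value)

    def put(self, query, value):
        self.entries.append((self.embed(query), value))

    def get(self, query):
        qv = self.embed(query)
        best, best_score = None, -1.0
        for vec, value in self.entries:
            score = cosine_similarity(qv, vec)
            if score > best_score:
                best, best_score = value, score
        # Only declare a hit when similarity clears the threshold.
        return best if best_score >= self.threshold else None
```

Raising `threshold` toward 1.0 shrinks the set of phrasings that count as "the same intent"; lowering it trades precision for hit rate.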
7. Observability
Debugging non-deterministic agent workflows requires instrumentation that goes deeper than application-level logging. ToolOps emits structured telemetry at every stage of the tool lifecycle — hits, misses, retries, circuit state transitions — giving you a complete audit trail without any manual instrumentation.
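To illustrate what "structured telemetry" means in practice, here is a sketch of emitting one machine-parseable JSON line per lifecycle event. The `emit_tool_event` helper and its field names are hypothetical — ToolOps' actual log schema may differ:

```python
import json
import logging
import time

logger = logging.getLogger("toolops.telemetry")

def emit_tool_event(tool, event, **fields):
    """Emit one structured JSON telemetry line (illustrative schema)."""
    record = {"ts": time.time(), "tool": tool, "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line

# One line per lifecycle stage: hits, misses, retries, circuit transitions.
line = emit_tool_event("get_market_data", "cache_hit",
                       backend="memory", latency_ms=0.4)
```

Because each line is valid JSON, a log pipeline (or an OTel collector) can filter and aggregate by `tool` and `event` without fragile regex parsing.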
8. Ecosystem & MCP
ToolOps tools are plain Python functions. That design choice is not accidental — it means they work natively with every agent framework that accepts Python callables, with no adapter code and no framework-specific configuration.
| Framework | Integration type | Status |
|---|---|---|
| LangChain / LangGraph | Built-in helper | Available |
| CrewAI | Built-in helper | Available |
| LlamaIndex | Built-in helper | Available |
| Model Context Protocol (MCP) | Built-in helper | Available |
| PydanticAI | General compatibility | Available |
| AutoGPT & custom frameworks | Any Python callable | Available |
The MCP integration deserves particular mention. ToolOps includes a built-in adapter that exposes any decorated tool as an MCP-compatible definition — without writing a single line of JSON Schema. This means your production-grade, resilient tools are available to Claude Desktop, Cursor, or any MCP-compatible host instantly.
```python
from toolops.integrations.mcp import MCPIntegration

# get_weather is already decorated with @readonly
definition = MCPIntegration.to_mcp_definition(get_weather)
# → MCP-compatible tool definition, ready for Claude Desktop or Cursor
```
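The general idea — deriving a tool definition from a plain function instead of hand-written JSON Schema — can be sketched with the standard library. This `to_tool_definition` helper is an illustration of the technique; the actual adapter's output may differ in field names and detail:

```python
import inspect

# Minimal Python-annotation → JSON-Schema type mapping (illustrative).
TYPE_MAP = {str: "string", int: "integer", float: "number",
            bool: "boolean", dict: "object", list: "array"}

def to_tool_definition(fn):
    """Derive an MCP-style tool definition from a function signature."""
    sig = inspect.signature(fn)
    properties = {
        name: {"type": TYPE_MAP.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            # Parameters without defaults are required.
            "required": [n for n, p in sig.parameters.items()
                         if p.default is inspect.Parameter.empty],
        },
    }

async def get_weather(city: str, units: str = "metric") -> dict:
    """Return current weather for a city."""
    ...

definition = to_tool_definition(get_weather)
```

The signature and docstring carry enough information to produce the schema, which is why decorated ToolOps functions need no extra MCP configuration.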
9. CLI & operations
ToolOps ships with a command-line tool for inspecting and managing your tool infrastructure in production. The CLI is designed for operators and CI pipelines — not just developers.
```bash
# List all available commands
toolops --help

# Check system health and backend readiness
toolops doctor

# View real-time cache statistics
toolops stats --app my_app:setup_toolops

# Clear a specific cache backend
toolops clear postgres --app my_app:setup_toolops
```
The `toolops doctor` command is particularly useful in deployment pipelines: it validates backend connectivity, checks embedding model availability, and reports circuit breaker state — a readiness check you can wire directly into your health endpoint.
10. Roadmap
ToolOps is under active development. The following capabilities are planned for upcoming releases, ordered by expected delivery. Contributions and feedback on prioritization are welcome via GitHub.
- Web Dashboard. Real-time metrics, cost attribution, and cache hit rates in a browser UI — no Prometheus or Grafana setup required.
- Budget Control. Hard limits on tool-induced API costs per hour or per day, configurable per tool and per backend.
- Native MCP Server. One-click deployment of ToolOps tools as a standalone MCP host — no Claude Desktop configuration required.
- Streaming Middleware. Full support for streaming tool outputs, enabling real-time response generation in agent pipelines.
- New Backends. MariaDB, ChromaDB, and Pinecone support — extending the persistent and vector-native cache options.
Get involved
ToolOps is open source under Apache 2.0. Star the repository, open an issue, or submit a pull request on GitHub. The project is built in the open — roadmap priorities are shaped by real-world production use cases from the community.
Frequently Asked Questions
Why use ToolOps instead of built-in framework tools?
ToolOps is framework-agnostic. While frameworks like LangChain and CrewAI ship basic retry logic, ToolOps provides industrial-grade patterns — Circuit Breakers, Request Coalescing, and Semantic Caching — that work across any Python tool with zero migration cost.
What happens if my API is down?
ToolOps protects your system in three ways: Circuit Breakers stop the hammering, Automatic Retries handle transient blips, and Stale-if-Error fallback can serve the last known good value from the cache so your agent keeps moving.
How does Request Coalescing prevent "Thundering Herds"?
If 50 agents call the same tool simultaneously during a cache miss, ToolOps executes the real API call once and multicasts the result to all 50 callers. This prevents overwhelming your upstream API rate limits.
Why do I get "zsh: no matches found" during install?
Shells like zsh and bash treat square brackets [] as globbing
characters. You must wrap the install command in double quotes: pip install "toolops[all]".
Does it work with LangGraph and MCP?
Yes. ToolOps is designed as a foundation for Model Context Protocol (MCP) servers and LangGraph stateful agents. It provides the industrial-grade infrastructure those frameworks lack natively.