Deep Dive into Model Context Protocol (MCP)
Jun 23, 2025
Executive Summary
The Model Context Protocol (MCP) is an open standard introduced by Anthropic to seamlessly connect AI models with external data and tools. It was created to solve the fragmentation of custom integrations: instead of writing a new connector for every data source, MCP provides a universal, plug-and-play interface. MCP’s architecture defines Hosts, Clients, and Servers to expose capabilities like data resources, external tools, reusable prompts, and even letting servers ask the model for completions. Communication is built on JSON-RPC 2.0 for structured request/response exchanges, supporting streaming results over HTTP or stdio. A strong security model underpins MCP – OAuth2 tokens with scoped access and user consent guard every action. Leading AI platforms (Claude, ChatGPT, etc.) and enterprises are rapidly adopting MCP, which promises vendor-neutral interoperability across systems. This report dives into MCP’s design, message flow, security, comparisons to alternatives, and real-world usage.
Origin & Purpose of MCP
Who created MCP and why. The Model Context Protocol was open-sourced by Anthropic in late 2024 as a vendor-neutral standard for integrating AI assistants with real-world data and services. Prior to MCP, any time an AI model needed external information (from a database, app, or API), developers had to build bespoke connectors or plug-ins. This led to isolated “siloed” AI agents that couldn’t easily share integrations, impeding scalability. Anthropic and other contributors designed MCP to be plug-and-play – AI models can “plug” into a growing library of connectors instead of reinventing each integration. The protocol’s key design objectives include being open and vendor-neutral (so it works across different AI providers), modular (a single standard client can interface with any MCP-compliant server), and preserving context across tools. In fact, MCP envisions AI systems maintaining their contextual knowledge even as they move between tools and datasets, rather than losing state between siloed plug-ins. By standardizing how models access external context, MCP aims to unlock more relevant and powerful AI responses without proprietary lock-in. As the official introduction notes, MCP provides “the flexibility to switch between LLM providers and vendors,” while following best practices for data security. In summary, Anthropic created MCP to be a “USB-C for AI” – a universal port that connects any AI model to any data source in a secure, consistent way.
Architecture Overview
Host–Client–Server triad. MCP follows a simple client–server paradigm within an AI host environment. The Host is the AI application or agent platform (for example, Claude Desktop, an IDE plugin, or ChatGPT) that wants to use external capabilities. Inside the host, an MCP Client runs as an intermediary that maintains a dedicated 1-to-1 link with each Server. Each MCP Server is a lightweight connector exposing a specific service or dataset (e.g. a GitHub server, a Slack server, etc.), and a host can connect to multiple servers concurrently through multiple client instances. In effect, the host multiplexes many server connections – one client per server – enabling the AI model to tap into various systems at once. Crucially, servers advertise their “capabilities” when the connection starts, informing the client what they can do or provide. For example, a GitHub MCP server might expose capabilities for repository resources (files, issues) and tools (actions like create_issue). Servers expose these via a standardized interface so the model can discover available functions and data without custom coding. The architecture cleanly separates concerns: tool/service providers implement an MCP Server once for their API, and AI platform developers implement an MCP Client once in their host – after that, any AI model on that host can talk to any tool that speaks MCP. This dramatically reduces integration complexity from an M×N problem down to M+N (M models, N services). The communication across Host–Client–Server can be local (on the same machine) or remote. MCP currently supports multiple transport layers: for local connectors, a simple stdio stream (pipes) can carry messages, whereas remote connectors use HTTP – typically a streamable HTTP connection using Server-Sent Events (SSE) to push real-time responses. All transports share the same message format: JSON messages following the JSON-RPC 2.0 specification. JSON-RPC was chosen for its simplicity and widespread support in many languages, making MCP easier to implement across different tech stacks. Although not yet in the spec, WebSocket transport is anticipated on the roadmap to enable fully bidirectional streaming in the future (some early implementations like Cloudflare have experimented with WebSocket+SSE hybrids for long-lived agent sessions). In summary, MCP’s architecture is a flexible hub-and-spoke: the host (hub) uses MCP clients to connect to many servers (spokes), each server exposing a standard set of capabilities to the AI model in a unified way.
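To make the hub-and-spoke pattern concrete, the sketch below shows a host opening one client session per local server over stdio. It is a minimal sketch assuming the official MCP Python SDK client API and two of the reference servers mentioned later in this report (GitHub, Slack); exact import paths and signatures may differ across SDK versions.

```python
# Minimal sketch: a host wiring one MCP client session per server (stdio transport).
# Assumes the official MCP Python SDK; import paths and signatures may vary by version.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def connect(params: StdioServerParameters) -> None:
    # Each server gets its own dedicated client session (the 1-to-1 link described above).
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()           # handshake: version + capability negotiation
            tools = await session.list_tools()   # discover what this server can do
            print(params.args[-1], [tool.name for tool in tools.tools])

async def main() -> None:
    servers = [
        StdioServerParameters(command="npx", args=["-y", "@modelcontextprotocol/server-github"]),
        StdioServerParameters(command="npx", args=["-y", "@modelcontextprotocol/server-slack"]),
    ]
    # The host (hub) multiplexes several spokes concurrently, one client per server.
    await asyncio.gather(*(connect(p) for p in servers))

asyncio.run(main())
```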
Core Message Types & Lifecycle
JSON-RPC messaging. At the protocol layer, MCP exchanges JSON-RPC 2.0 messages between clients and servers. Requests (with a method and params) expect a response. A successful response carries a Result object (the outcome data), while failures return an Error with an error code and message. MCP also uses Notifications, which are one-way messages that have a method but no response (used for events or updates). This RPC framing allows bidirectional communication – notably, not only can the client ask the server to do something, but the server can also send requests back (for instance, requesting user input or asking the client to call the AI model) as long as the host supports it.

The connection lifecycle begins with a handshake: the client sends an initialize request specifying its supported MCP version and declared capabilities; the server responds with its own version and capabilities; then the client confirms with an initialized notification. This version negotiation ensures both sides “speak” a compatible MCP dialect (if not, they’ll gracefully terminate). Once initialized, the connection enters normal message exchange: the client can call the server’s methods (e.g. “list all resources”) and the server can respond, or vice-versa for any server-initiated requests. Either side may also emit notifications – for example, a server might send a notifications/resources/updated event when a data source changes. Finally, either party can terminate the session by closing the transport or calling a shutdown procedure (the spec defines a clean close() sequence as well).

MCP also standardizes error codes based on JSON-RPC: e.g. -32601 for “Method not found”, -32602 for invalid params, etc., and allows custom error codes for application-specific issues. These errors propagate as error responses or transport-level errors. In practical usage, an interaction might look like: the AI (via client) sends a request {"id":1,"method":"tools/call","params":{...}} to invoke a tool; the server executes and returns either a result or an error. The stateless JSON-RPC framing, combined with the initialization handshake, yields a robust yet simple lifecycle: init → use → close. This predictable flow makes it straightforward to implement MCP in many languages and to integrate it with conversational AI loops (e.g. the AI model’s reasoning cycle knows when it’s awaiting a tool result or when the session is closing).

Additionally, MCP supports streaming responses for long-running operations: with the HTTP SSE transport, a server can send partial results or progress notifications before the final result. This means an AI agent isn’t stuck waiting silently – it could start receiving chunks of a large file or intermediate status updates in real time. The design of MCP’s messaging ensures that multi-step tool use (like an AI calling a sequence of API actions) can happen in a conversational loop without breaking the underlying session.
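For concreteness, the handshake and error framing described above look roughly like the following on the wire (abridged, and written as Python dicts rather than raw JSON; optional fields are omitted for brevity):

```python
# Abridged JSON-RPC messages for the MCP handshake, shown as Python dicts.
# Field names follow the published spec; some optional fields are left out.
initialize_request = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"sampling": {}},          # what the client/host offers
        "clientInfo": {"name": "example-host", "version": "1.0.0"},
    },
}
initialize_response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}, "resources": {}},   # what the server exposes
        "serverInfo": {"name": "example-server", "version": "1.0.0"},
    },
}
initialized_notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}

# A failed call uses standard JSON-RPC error framing:
method_not_found = {
    "jsonrpc": "2.0", "id": 2,
    "error": {"code": -32601, "message": "Method not found"},
}
```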
Capability Surfaces: Resources, Tools, Prompts, Sampling
One of MCP’s core strengths is how it standardizes different capability types that servers can expose. The four primary surfaces are Resources, Tools, Prompts, and Sampling, each serving a distinct role:
- Resources (Read-Only Context): Resources represent data that an MCP server can provide to enrich the AI model’s context. They are typically read-only assets like files, database records, documents, images, etc., identified by URIs. For example, a “Google Drive” MCP server might expose files as resources (with URIs like drive://folder/file.txt). Clients can query a list of available resources via a standard resources/list method, and retrieve content with resources/read. Importantly, resources are application-controlled, meaning the user or client chooses which resources to use. Many MCP clients (such as Claude Desktop) require explicit user selection of a resource before the model can see it, ensuring the AI doesn’t automatically ingest sensitive data. This design gives human operators fine control over context. Resources are perfect for retrieval-augmented generation use cases – the AI can ask for a file’s text and then incorporate that information into its response. Servers can also notify clients of resource changes (via notifications/resources/list_changed or .../updated) to support real-time content updates.
- Tools (Actions & Function Calls): Tools are operations that an AI can invoke via the server. If resources are about passive data, tools are about doing things – sending an email, querying an API, writing to a database, etc. An MCP server defines a set of tools, each with a name, description, and JSON schema for inputs/outputs. The client can discover available tools by calling tools/list, and then trigger a tool with tools/call. Tools are designed to be model-controlled: the AI model decides when to use a tool (typically via its internal reasoning or function-calling logic), and the call is executed with the user’s approval. For instance, a “Slack” MCP server might have a post_message tool. If a user asks the AI to send a Slack notification, the model can choose that tool, and the host (client) will invoke tools/call with the post_message action. MCP’s March 2025 update expanded tool descriptors to include annotations like whether a tool is read-only or potentially destructive – helping clients and users understand the tool’s effects. Tools give AI systems the ability to act on the world (not just read), but always through a controlled interface. Typically, a user must grant permission (in the UI) before a model-performed action like sending an email actually goes through, providing a human-in-the-loop safety check. In summary, tools are functions the AI can call via MCP, enabling transactional capabilities beyond the chat text itself.
- Prompts (Templated Tasks): Prompts in MCP are pre-defined prompt templates or workflows that servers offer as shortcuts for common tasks. Think of them as macro instructions or canned conversations that can be inserted when needed. A server could provide a prompt like “summarize this document” or “onboard a new employee” as a structured sequence of steps for the AI to follow. Each prompt has a name, description, and optional arguments it accepts. Clients list prompts via prompts/list and retrieve the full prompt (with placeholders filled) via prompts/get. The prompt content typically consists of one or more message templates – for example, a user message with certain context attached. Prompts are user-controlled in spirit: a human might explicitly choose a prompt from a menu (or trigger it via a slash-command) to guide the AI. This mechanism ensures standardized, reusable interactions. For instance, a “GitHub” server might include a prompt for “review this PR” that automatically pulls the PR diff as a resource and provides a template for analysis. By exposing it through MCP, any AI client can surface that task to users in a consistent way. Prompts help orchestrate multi-step workflows and ensure the AI follows established patterns for specific tasks, improving reliability and user trust.
- Sampling (LLM Delegated Calls): Sampling is a special capability that actually lets the MCP server ask the AI model for help. It enables more advanced agent behaviors by having the server request a completion from an LLM via the client. In practical terms, a server can send a sampling/createMessage request with a prompt (and perhaps preferences like “use a cheaper but faster model”) to the client; the client (the host AI app) will review and route this to an LLM, get the model’s generated response, and return it back to the server. This round-trip allows an agent to handle ambiguous situations or sub-tasks by effectively “asking itself” or another model for a suggestion. For example, an MCP server might encounter a complex data query and use sampling to ask the LLM to generate an SQL statement, which it then executes. The design enforces a human/host review at both request and response steps – the client can modify or veto the server’s prompt or the model’s output to maintain safety. Sampling opens the door to chain-of-thought and reflective agents: an AI can iteratively consult the model for intermediate decisions. This feature is still emerging (Claude’s client did not support sampling as of early 2025), but it is powerful. It essentially makes the AI a tool user as well as a tool executor. Through sampling, MCP servers can leverage the intelligence of LLMs in their own logic, all within the secure protocol framework.
Each of these surfaces – resources, tools, prompts, sampling – is exposed in a standardized way. During the initial handshake, the server advertises which of these capabilities it supports (for example, a server might declare { capabilities: { "resources": {}, "tools": {}, "prompts": {} } } in its initialize response). The client and model can then adjust their behavior accordingly. By providing a rich yet structured set of capability types, MCP ensures that an AI agent can both “read” (via resources) and “act” (via tools) on external data, use higher-level scripts (prompts), and even loop in model assistance (sampling) – all through one protocol. This is a sharp contrast to earlier systems like ChatGPT plug-ins where only API calls (tools) were standardized. MCP’s comprehensive approach is a key reason it’s seen as enabling more agentic AI systems that truly integrate with their environment.
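To ground the four surfaces, here is one abridged request per capability, again written as Python dicts. The method names come from the spec; the specific URIs, tool names, and arguments are illustrative only:

```python
# One abridged request per capability surface (Python dicts standing in for JSON-RPC payloads).
# The drive:// URI and the tool/prompt names below are illustrative, not part of the spec.
read_resource = {"jsonrpc": "2.0", "id": 10, "method": "resources/read",
                 "params": {"uri": "drive://folder/file.txt"}}

call_tool = {"jsonrpc": "2.0", "id": 11, "method": "tools/call",
             "params": {"name": "post_message",
                        "arguments": {"channel": "#alerts", "text": "Build finished"}}}

get_prompt = {"jsonrpc": "2.0", "id": 12, "method": "prompts/get",
              "params": {"name": "review_pr", "arguments": {"pr_number": "42"}}}

# Server -> client: ask the host's model for a completion (subject to host/user review).
create_message = {"jsonrpc": "2.0", "id": 13, "method": "sampling/createMessage",
                  "params": {"messages": [{"role": "user",
                                           "content": {"type": "text",
                                                       "text": "Draft a SQL query for last month's orders"}}],
                             "maxTokens": 200}}
```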
Security & Authentication Model
Secure by design. Given that MCP enables AI agents to access sensitive data and perform actions, security is paramount. The MCP specification includes an in-depth authentication and authorization framework aligned with OAuth 2.0 standards to ensure only permitted access is allowed. In the MCP model, each server typically acts as an OAuth 2 “Resource Server” – meaning it expects a valid bearer access token with each request, proving the client is authorized. For instance, a company’s “Salesforce” MCP server would require an OAuth token scoped to specific Salesforce data, obtained via a user login/consent. MCP’s v0.5 spec (early 2025) formalized this by classifying MCP servers as OAuth-protected resources and adding metadata so clients can discover the server’s OAuth provider. In practice, when an AI host wants to connect to a remote MCP server, it must go through an OAuth flow (or other auth mechanism) to get a token for that server, then present that token on each request (e.g. in an HTTP Authorization header). The server validates that the token’s audience and scope match its service, rejecting anything else. This design prevents a confused deputy scenario where a stolen token for one service could be used on another – MCP servers must not accept tokens not intended for them. Additionally, OAuth scopes allow fine-grained permission: an MCP server might request scopes like “read:driveFiles” or “send:emails”, and if the token lacks those, it will refuse calls (HTTP 403 Forbidden). In short, token-based auth is the norm for SaaS connectors. For local or self-hosted MCP servers, authentication might be simpler (e.g. none or local API keys), but the same principles of explicit permission apply.
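The following is a minimal sketch, not taken from the spec, of the checks a protected MCP server performs before serving a request. It assumes the bearer token has already been cryptographically verified; the audience and scope values are hypothetical.

```python
# Illustrative sketch of resource-server authorization checks: correct audience, sufficient scope.
REQUIRED_AUDIENCE = "https://mcp.example.com/salesforce"   # hypothetical identifier for this server
REQUIRED_SCOPES = {"read:accounts"}                        # hypothetical scope for the requested method

def authorize(claims: dict) -> tuple[int, str]:
    """claims = already-verified JWT/introspection claims for the presented access token."""
    if claims.get("aud") != REQUIRED_AUDIENCE:
        # Reject tokens minted for other services: prevents the confused-deputy reuse described above.
        return 401, "token audience does not match this server"
    granted = set(claims.get("scope", "").split())
    if not REQUIRED_SCOPES <= granted:
        return 403, "token lacks required scope(s): " + ", ".join(sorted(REQUIRED_SCOPES - granted))
    return 200, "ok"

print(authorize({"aud": "https://mcp.example.com/salesforce", "scope": "read:accounts send:emails"}))
# -> (200, 'ok')
```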
User consent and sandboxing. MCP was built with a user-in-the-loop philosophy. Tools and actions exposed via MCP typically require user approval before execution – hosts like Claude Desktop enforce this by prompting “Allow AI to use tool X?”. This ensures that even if an AI agent is planning to do something potentially sensitive (like deleting a file via a tool), the user is aware and can intervene. The protocol itself doesn’t dictate the UI, but it provides the necessary metadata (descriptions, hints like destructiveHint on tools) for clients to present informed consent dialogs. Servers also sandbox their operations. For example, a filesystem server will restrict file access to certain directories to prevent the AI from snooping everywhere unless permitted. According to a Nasuni analysis, MCP’s security features include “explicit user consent requirements, clear permission models, granular access controls, and transparent tool usage” – all aimed at enterprise-grade security.
Threat model and hardening. With great power (AI tools) comes great responsibility to guard against misuse. The community has identified various threat vectors unique to AI+MCP setups, such as tool injection (a malicious server exposing a “harmless” tool that actually does something dangerous), data poisoning where prompt inputs could trick an agent into unauthorized tool use (analogous to prompt injection), or server spoofing (a fake MCP server posing as a popular one to lure users). MCP’s answer is defense in depth. Each server can be independently sandboxed and permissioned, so even if one is compromised or misused, it shouldn’t grant access beyond its scope. The OAuth model means a token for one server can’t magically be reused on another. The spec also forbids token passthrough – an MCP server should never simply forward your token to another service without explicit design – to avoid a chain of trusts issue. Furthermore, best practices have emerged: restrict tool scopes (only install/enable the tools your AI truly needs), pin versions of community connectors so updates don’t introduce surprises, and thoroughly audit any MCP server’s code or metadata before use. The MCP working group has published a security guide emphasizing input validation, output filtering, and safe defaults (for example, limit the length of strings to avoid prompting an LLM with extremely long, possibly malicious content). As of mid-2025, security researchers are actively probing MCP connectors – early audits found issues like command injection, SSRF, and file path traversal in a portion of community-built servers. This is typical for a young ecosystem, and it underlines that operational security reviews are a must when deploying MCP in production. On the flip side, MCP’s openness means these issues are documented and being fixed in the open. In summary, MCP’s security model combines protocol-level guardrails (token auth, audience binding, error codes) with client-side enforcement (user consents, usage policies) to mitigate risks. Organizations can further harden deployments by running only trusted MCP servers (or writing their own), and by monitoring usage logs for anomalies (MCP servers/clients are encouraged to log all requests and actions for auditing). Security remains a top priority on the MCP roadmap, with ongoing improvements to make permissioning more granular and intuitive.
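As one concrete example of the hardening practices above, a filesystem-style server can confine every resource read to an allowed root directory. The sketch below is illustrative only; the root path and error handling are assumptions, not spec requirements.

```python
# Sketch of a common hardening measure: confine a filesystem-style server to an allowed root
# so "../" tricks (path traversal) cannot escape it. Paths here are illustrative.
from pathlib import Path

ALLOWED_ROOT = Path("/srv/mcp-shared").resolve()

def safe_resolve(requested: str) -> Path:
    candidate = (ALLOWED_ROOT / requested).resolve()
    # Reject anything that resolves outside the sandbox root (requires Python 3.9+ for is_relative_to).
    if not candidate.is_relative_to(ALLOWED_ROOT):
        raise PermissionError(f"access outside sandbox denied: {requested}")
    return candidate

safe_resolve("reports/q1.txt")        # ok: stays under /srv/mcp-shared
# safe_resolve("../../etc/passwd")    # raises PermissionError
```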
MCP vs ChatGPT Plugins, LangChain, and Others
With several paradigms available for extending AI models, it’s useful to compare MCP with alternatives:
- MCP vs. ChatGPT Plug-ins: Both MCP and OpenAI’s ChatGPT plug-in system enable AI to call external APIs, but their approaches differ. ChatGPT plug-ins (launched 2023) require developers to host a web service with an OpenAPI specification. ChatGPT then internally decides when to call those API endpoints based on the user’s query and the plugin definitions. This is powerful but proprietary – the integration only works within OpenAI’s ecosystem, and each plugin is bespoke. MCP, in contrast, is an open standard and not limited to one AI model. Any AI client that supports MCP (Claude, ChatGPT Enterprise, etc.) can use any MCP server. Also, MCP is more comprehensive: it standardizes not just API calls, but also data payloads (resources) and prompt templates. Another difference is bidirectionality – MCP servers can request things of the client (e.g. ask for an LLM completion or more info), whereas ChatGPT plugins are generally one-way (the model calls the API; the API can’t ask the model questions except via a new user prompt). In terms of security, ChatGPT plugins rely on OpenAI’s moderation and a manual review process for plugins, while MCP allows anyone to run connectors but with OAuth and user-consent guardrails. One can think of ChatGPT’s approach as a closed app store, vs MCP as an open protocol akin to the web. Indeed, by late 2024 OpenAI recognized the value of MCP; they introduced a “Work with Apps” feature to let ChatGPT interface with certain developer tools, but this was limited to a curated set of apps and not a general solution. MCP is broader and vendor-neutral, which is why by 2025 OpenAI announced plans to natively support MCP in ChatGPT as well – a significant endorsement of the standard’s interoperability.
- MCP vs. LangChain tools: LangChain (and similar AI frameworks) offer a concept of “tools” which developers can program for an LLM to use. For example, in LangChain you might define a Python function that searches a knowledge base, and the LLM can call it. This is very flexible for prototyping, but it’s a framework-specific, in-process approach – not a network protocol. LangChain tools run in the same environment as the AI or its orchestrator; there’s no built-in way to connect to external services except by writing code or using APIs. MCP can be seen as complementary: you could wrap an MCP client/server around LangChain tools to expose them to other systems, or vice versa use LangChain in an MCP server’s implementation. The key difference is that MCP formalizes an interchange format and discovery mechanism (with schemas and methods) so that tooling is reusable across projects and languages. LangChain provides many off-the-shelf tool integrations, but they tend to be tightly coupled to LangChain’s Python/TypeScript code and not directly usable by other AI agents. In contrast, an MCP server (say for Google Calendar) can be written once and any MCP-compatible agent (Claude, GPT-4, etc.) can utilize it. The trade-off: LangChain’s approach might be simpler for quick custom logic (since it’s just function calls in code), whereas MCP’s approach encourages a bit more formal definition (JSON schemas, separate process or service). For production scenarios, MCP’s decoupling (tool code runs separate from the model, possibly even maintained by the data source vendor) brings better isolation and maintainability. In short, LangChain is a toolkit for building one AI agent’s tools, while MCP is building an ecosystem of interchangeable tools.
- MCP vs. OpenAI “Work with Apps”: OpenAI’s Work with Apps (introduced in late 2024, also known as ChatGPT’s developer tools integration) allowed ChatGPT to interface with certain IDEs and productivity apps without the user writing API calls. For example, ChatGPT could open a file in Visual Studio Code or retrieve a GitHub snippet if granted access. This was essentially a set of native integrations OpenAI built into their client, not an open API. It improved ChatGPT’s utility for coding and data tasks but was limited in scope – only select apps were supported and there was no way for third parties to add new integrations. MCP by comparison is universal. Rather than hand-picking a few tools, it provides the scaffolding for anyone to add anything in a standard way. A Medium commentary put it succinctly: “While OpenAI’s Work with Apps enhances ChatGPT for select tools, it is narrow. MCP is a universal and scalable solution… opening the door for any AI system to integrate with a wide array of data sources.”. The emergence of MCP likely influenced OpenAI’s strategy – instead of inventing a proprietary equivalent, OpenAI joined the MCP bandwagon to allow their models to connect with the wider ecosystem. Now ChatGPT (especially the Enterprise edition) can act as an MCP host, meaning businesses can let ChatGPT access internal data through MCP connectors rather than waiting for OpenAI to support a custom plugin.
- MCP vs. Google’s Agent-to-Agent (A2A): Google’s Agent2Agent (A2A) protocol (announced April 2025) is often mentioned alongside MCP, but it addresses a different layer of the problem. A2A is about multi-agent communication, i.e. how two or more autonomous agents can coordinate tasks with each other securely. For example, A2A would let a sales agent AI delegate a sub-task to a legal agent AI across organizational boundaries. It defines things like an AgentCard (capabilities manifest), a task lifecycle, and uses JSON-RPC over HTTPS for agent messaging. In contrast, MCP is agent-to-tool (one agent connecting to external tools/services). They are complementary: an agent might use A2A to talk to another agent, and MCP within itself to use tools. In fact, Google’s A2A spec explicitly notes that an agent can mount MCP servers and advertise those skills to others via A2A. A convenient analogy from the WorkOS engineering blog: “Think of MCP as ‘plug this model into my data’ and A2A as ‘now let several specialized models talk to each other.’”. Both protocols share similar design principles (both favor JSON, HTTP/SSE, OAuth, etc.), emphasizing standard web tech and security. But if you’re only building a single-agent system that needs tools, MCP is sufficient. If you foresee agent collaboration or a network of AI services, A2A comes into play, possibly using MCP for each agent’s tool interfaces. Finally, there’s the question of interoperability gains: MCP’s open standard means an integration written for Anthropic’s Claude can also be used by OpenAI’s or Google’s systems if they comply. This interoperability is a leap beyond what closed ecosystems like ChatGPT plugins offered – it prevents lock-in to one AI vendor’s tool format. That said, some gaps remain. One challenge is context length: each MCP tool’s description and available actions must be communicated to the model (often via the system prompt or function definitions), which could become large as you add dozens of servers. A noted concern is that plugging in too many MCP servers (each with multiple tools) can bloat the prompt and confuse the model. The community is exploring solutions like dynamic tool suggestion or letting the model query available tools instead of listing all upfront. Another gap is that MCP doesn’t automatically solve quality or trust in third-party connectors – there’s work to be done in curating and verifying servers (which a central registry could help with, as discussed below). Nonetheless, the consensus is that MCP significantly advances the interoperability of AI tools, providing a common fabric much like HTTP did for information on the web.
Implementation Walkthrough (Example)
To illustrate how developers use MCP, let’s walk through implementing a minimal MCP server and a typical client interaction flow:
Server example – “list tools” in code. MCP provides SDKs in multiple languages (TypeScript, Python, etc.) to simplify server and client development. In a TypeScript implementation, for instance, you can set up a basic server in a few lines:
```typescript
import { Server, StdioServerTransport } from "@modelcontextprotocol/sdk/server";
// Request schemas for the standard tool methods (exact import paths vary by SDK version)
import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types";
// Initialize server with metadata and declare supported capabilities
const server = new Server({ name: "example-server", version: "1.0.0" }, {
capabilities: { tools: {} }
});
// Define a tool (e.g. a simple sum calculator) by handling the standard requests
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [{
name: "calculate_sum", description: "Add two numbers",
inputSchema: { type: "object", properties: { a: {type:"number"}, b:{type:"number"} }, required: ["a","b"] }
}]
}));
server.setRequestHandler(CallToolRequestSchema, async (req) => {
if (req.params.name === "calculate_sum") {
const { a, b } = req.params.arguments;
return { content: [{ type: "text", text: String(a + b) }] }; // return the sum result
}
throw new Error("Tool not found");
});
// Connect to stdio (could also be HTTP); this begins listening for client messages
await server.connect(new StdioServerTransport());
```

In this snippet, we create a server exposing a single tool calculate_sum. We register two handlers: one for the tools/list method (returns the tool’s definition) and one for tools/call (executes the tool). Finally, we attach a Stdio transport, meaning this server will communicate via its standard input/output streams. With these few lines, the server is ready – it will automatically handle MCP initialization when a client connects, advertise that it supports “tools”, and then await RPC calls.
In Python, a similar result is achieved using a high-level decorator style. Using the MCP Python SDK, one might write:
```python
from mcp.server import Server
import mcp.types as types

app = Server("example-server")

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [types.Tool(
        name="calculate_sum",
        description="Add two numbers",
        inputSchema={"type": "object",
                     "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                     "required": ["a", "b"]})]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "calculate_sum":
        a, b = arguments["a"], arguments["b"]
        # Tool results are returned as a list of content items (exact types may vary by SDK version)
        return [types.TextContent(type="text", text=str(a + b))]
    raise ValueError(f"Tool not found: {name}")
```
This defines the same tool in a more declarative way. The SDK takes care of generating the JSON schemas and wiring up the handlers. Running this app with an stdio_server() context would start listening for a client exactly like in the TS example. These minimal examples show how lightweight MCP servers can be – often just tens of lines of code to wrap an existing API or function. Many official connectors (for GitHub, Slack, databases, etc.) follow this pattern where they map an API’s endpoints to MCP methods with relatively little boilerplate (helped by GPT-assisted coding, as Anthropic noted Claude can auto-generate server implementations quickly).
Client flow – invoking a tool and streaming result. On the client side (the AI application), using an MCP server involves a few steps. Let’s say we have the above “calculate_sum” server running and an AI model that wants to use it. Here’s a typical flow in pseudo-steps:
- Discovery: The host (AI app) starts the MCP Client and connects to the server (e.g., launches the server process or opens an HTTP connection). It sends the initialize request with its supported version and waits for the server’s response. The server replies with capabilities: { tools: {...} } indicating it has tools. The handshake completes.
- Listing tools: The client then sends a tools/list request. Our server returns the JSON definition of calculate_sum (including name, description, input schema). The client registers this as an available function for the AI model. For example, if the model is GPT-4 with OpenAI function calling, the client might dynamically construct a function signature def calculate_sum(a: number, b: number) -> string and inject that into the model’s context with the description. Now the model knows it can call “calculate_sum” as a function during the chat.
- Model invokes tool: Suppose the user asks, “What is 2+2?” The model decides to use the calculate_sum tool. It produces a function call calculate_sum(a=2, b=2) in its output (this is how function-calling LLMs work). The host intercepts this and maps it to an MCP request: {"method": "tools/call", "params": {"name": "calculate_sum", "arguments": {"a":2,"b":2} }}. The server receives this, executes our handler which adds 2+2, and returns {"content":[{"type":"text","text":"4"}]} as the result. The MCP client then passes this result back into the model’s context (e.g., as the function’s return value), and the model can use it to formulate the final answer “The result is 4.” for the user. From the user’s perspective, the AI answered using external computation seamlessly. (A sketch of this client-side mapping appears after this list.)
- Streaming outputs: Consider a variation where a tool produces a large result or needs time (imagine a search_documents tool that streams multiple findings). MCP’s HTTP transport with SSE allows the server to send chunks of the result progressively. The client would receive partial content pieces or notifications/progress messages as they arrive. These can be fed into the model incrementally or shown to the user in real-time. For instance, Claude’s client could stream chunks of a long document as they are read from a resource. Once the server finishes sending, it might terminate the SSE stream or send a final completion message. The client then signals the model that the tool call is complete. This streaming ability is important for keeping latency low and enabling the AI to work with streaming data (e.g., reading a log file tail, or showing intermediate steps of a long computation). MCP’s design, by default, supports interactive, step-by-step tool use rather than forcing a strict request/response only after full completion.
- Termination: After use, if the user is done or the session ends, the client will close the connection. A well-behaved server might do cleanup on its side (close file handles, etc.) and then shut down if not in use. Because MCP connections can be long-lived (especially in desktop or agent scenarios), they often remain open across multiple user queries until explicitly closed or an error occurs.
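Here is the compact sketch of the client-side glue in step 3, which is the piece MCP leaves to the host. It assumes the MCP Python SDK's ClientSession for list_tools/call_tool; the model.chat helpers are purely hypothetical stand-ins for whatever function-calling API the host's LLM exposes.

```python
# Sketch of the client-side mapping from step 3: turn MCP tool definitions into model-visible
# functions, intercept the model's function call, and relay it as tools/call.
# ClientSession.list_tools/call_tool follow the MCP Python SDK; the model_* calls are hypothetical.
async def answer_with_tools(session, model, user_message: str) -> str:
    listed = await session.list_tools()
    # Expose each MCP tool to the model as a callable function (name, description, JSON schema).
    functions = [{"name": t.name, "description": t.description, "parameters": t.inputSchema}
                 for t in listed.tools]

    reply = await model.chat(user_message, functions=functions)       # hypothetical model API
    if reply.function_call:                                           # e.g. calculate_sum(a=2, b=2)
        result = await session.call_tool(reply.function_call.name,
                                         arguments=reply.function_call.arguments)
        # Feed the tool result back so the model can compose the final answer ("The result is 4.").
        reply = await model.chat_with_result(reply, result.content)   # hypothetical model API
    return reply.text
```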
Throughout this flow, logging and observability are key. Developers can enable logging on both client and server to trace the messages (initialize, list, call, etc.). This helps in debugging when something goes wrong (e.g., a schema mismatch or an unexpected error). The MCP framework often logs protocol events, and it’s recommended to build metrics around usage (how many times a tool was called, latency of each call, etc.) for monitoring.
The implementation example above demonstrates how MCP in practice abstracts the complexity of calling external code. The AI developer doesn’t have to manually call an API or parse JSON – they interact through the uniform MCP interface. Meanwhile, the tool developer doesn’t worry about AI internals; they just expose their service via MCP and the ecosystem takes care of integrating it with various AI models. This separation of concerns, plus the minimal code needed, has led to a fast-growing catalog of MCP servers across many domains, as we’ll see next.
Operational Considerations
Running MCP in production involves more than just writing code – there are important operational and compatibility aspects to address:
Logging, Monitoring, Observability: It is crucial to have visibility into the MCP message flow. Both clients and servers should implement robust logging of requests, responses, notifications, errors, and performance metrics. For example, a server might log an entry each time a tool is invoked (who invoked it, which parameters, and the outcome). This helps with auditing (especially security-related events) and debugging when the AI’s behavior is not as expected. Observability hooks can include tracing (to measure latency of each step), counters for how often each tool is used, and alarms for failures. In practice, enterprise deployments integrate MCP logs with their central monitoring systems. Because MCP involves multiple components, diagnosing issues may require correlating logs – e.g., matching a client-side “timeout calling tool X” with the server-side stack trace of why tool X hung. To facilitate this, MCP messages include request IDs (from JSON-RPC) that can trace a call through the pipeline. Some implementations also support health checks and status queries – for instance, a client might periodically ping servers or use a special MCP method to check if a server is alive and authorized (not a formal part of the spec, but a sensible practice). The MCP docs recommend implementing diagnostics like health endpoints, connection state monitoring, and resource usage tracking for long-running servers.
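As an illustration of the kind of instrumentation described above, the sketch below wraps a tool handler with structured logs and a latency measurement; the wrapper and its field names are assumptions, not part of the MCP SDKs.

```python
# Illustrative logging/metrics wrapper around tool invocations; names are hypothetical.
import logging
import time
import uuid

log = logging.getLogger("mcp.server.example")

async def logged_call_tool(handler, name: str, arguments: dict):
    request_id = str(uuid.uuid4())          # correlate client- and server-side log lines
    start = time.perf_counter()
    log.info("tools/call start id=%s tool=%s args=%s", request_id, name, arguments)
    try:
        result = await handler(name, arguments)
        log.info("tools/call ok id=%s tool=%s latency_ms=%.1f",
                 request_id, name, (time.perf_counter() - start) * 1000)
        return result
    except Exception:
        log.exception("tools/call failed id=%s tool=%s", request_id, name)
        raise
```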
Version compatibility: MCP is evolving rapidly, so clients and servers need to handle version differences. The protocol uses a date-stamped version (e.g. 2025-06-18) and the handshake negotiates which version to use. In operational terms, if you update your MCP client library, it might start preferring a newer spec version. If it then connects to an older server, the initialize step might fail. To avoid disruption, many implementations support multiple versions – for example, a server could accept either the 2025-03-26 or 2025-06-18 protocol, choosing the highest common one with the client. If no overlap exists, the spec says the client should gracefully error out indicating a version mismatch. Practically, maintaining backward compatibility is a goal: minor revisions aim to be backward-compatible (not bumping the main version), while major changes come with a new date tag. Operators should keep an eye on MCP release notes; for instance, an update that removed JSON-RPC batching support or changed an initialization field might require updating servers. The community often tags servers with the MCP version they support (in documentation or even in the name/version metadata). Running integration tests when upgrading MCP libraries (client or server) is highly advisable to catch any incompatibilities. In summary, version negotiation makes mix-and-match feasible, but aligning on supported versions – especially for long-lived deployments – is important to avoid broken connections.
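A simplified sketch of the negotiation idea: the server accepts the client's proposed revision if it supports it, and otherwise answers with the newest revision it does support, leaving the client to proceed or disconnect. The supported set below is an example, not an exhaustive list.

```python
# Simplified sketch of server-side version negotiation during initialize.
SERVER_SUPPORTED = {"2025-03-26", "2025-06-18"}   # example set; real servers list what they implement

def negotiate(client_version: str) -> str:
    # If the server supports the client's proposed revision, echo it back; otherwise answer with
    # the newest revision the server supports and let the client decide whether to proceed.
    if client_version in SERVER_SUPPORTED:
        return client_version
    return max(SERVER_SUPPORTED)   # date-stamped versions sort chronologically as strings

print(negotiate("2025-06-18"))   # -> 2025-06-18
print(negotiate("2024-11-05"))   # -> 2025-06-18 (client may disconnect if it cannot speak this)
```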
Long-running tasks & concurrency: Some tools might take a long time (seconds or even minutes) to complete – e.g., a tool that kicks off a database backup or a training job. MCP does not dictate a specific job control system, but there are patterns to manage such cases. One approach is to perform work asynchronously and use progress notifications. For example, a server receiving a heavy request can immediately return an acknowledgement or a handle, and then stream progress via notifications/progress events with a token or task ID. The client could present these updates to the user or to the model (the model might not need them unless it’s designed to handle incremental info). When done, the server sends a final result or a separate completion notification. Another approach is leveraging the streaming response – e.g., sending partial results as they’re ready. If neither is implemented, and a tool call simply blocks until completion, the MCP client and user experience might suffer (they’ll just wait). Thus, implementing timeouts and safeguards is key. Clients typically have a timeout for tool calls; if exceeded, they’ll abort and return an error to the model (which might apologize to the user or try something else). Servers should handle cancellations if possible: if a client disconnects mid-operation, the server should try to stop the work to save resources. Concurrency is another concern – if an AI model fires multiple requests in parallel (some agents might, though many LLMs do one at a time), the server must handle it (e.g., ensure thread safety or queue them). Some MCP servers use an internal queue or limit to avoid overloading backend APIs if the model were to spam requests. Resource management (like limiting how many files can be open or how much data can be returned) is also vital in operations, to prevent memory or bandwidth issues.
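Abridged wire messages for the progress pattern just described, shown as Python dicts; the tool name and token value are illustrative:

```python
# The client attaches a progress token to its request; the server streams
# notifications/progress updates against that token while it works.
call_with_progress = {
    "jsonrpc": "2.0", "id": 21, "method": "tools/call",
    "params": {"name": "backup_database",                     # illustrative long-running tool
               "arguments": {"target": "s3://backups/prod"},
               "_meta": {"progressToken": "job-42"}},         # token the server echoes back
}

progress_update = {
    "jsonrpc": "2.0", "method": "notifications/progress",
    "params": {"progressToken": "job-42", "progress": 40, "total": 100},
}
# When the work finishes, the server sends the normal JSON-RPC result for id 21.
```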
Deployment and networking: MCP servers can be run locally (on the same machine as the AI) or as remote microservices. Local servers (stdio transport) are easier for speed and security (no network calls, data stays on device). For example, a user might run a local “Filesystem” MCP server that lets the AI read certain folders on their laptop – here stdio is ideal and latency is negligible. Remote servers (HTTP/SSE transport) allow scale-out and cloud integration. An organization might deploy an MCP server for a corporate database on an AWS endpoint; AI agents anywhere can connect via HTTPS (with proper auth). Operators should enforce TLS for any remote transport (often a given with HTTPS). Also, consider network access: if deploying inside a VPC or firewall, clients need network routes to reach it (some enterprises prefer self-hosting all pieces inside their secure environment). Scaling an MCP server horizontally (load-balancing multiple instances) is possible but tricky due to state: if the server holds any context (like open file handles or cached data) per session, a simple stateless load balancer could break things. Typically, either run one server per client session (the client starts a process just for itself, which is common in local usage), or ensure the server is mostly stateless and can handle multiple clients in isolation (with perhaps a sticky session routing if needed). Observability helps here: tracking how many concurrent sessions and their performance can inform capacity planning.
Backward compatibility with tools and prompts: Aside from protocol version, another operational detail is ensuring that the AI model’s prompt is kept in sync with the server’s capabilities. If a server adds a new tool or changes a tool’s name, the host’s prompt templates for the model might need updating (especially if using function calling, where the functions are derived from the server’s tools/list). Using consistent naming and providing good descriptions mitigate confusion. There is also an expectation that servers should handle unknown methods gracefully – e.g., if a newer client calls a method the old server doesn’t know, it should return a JSON-RPC “MethodNotFound” error and not crash. This way, adding new features doesn’t break older components; they’ll just refuse those calls.
Logging & audit deserve a second mention in operations, because when AI agents start performing actions in enterprise settings, being able to audit who did what becomes critical. MCP clients often tag requests with some form of user or session ID in the params or metadata (or in the auth token scopes) so that servers can record the origin of actions. For example, an audit log entry might say: User Alice (via AI Agent) invoked delete_file on Server X at 10:00, approved via UI. This transparency is key to building trust in agentic actions.
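A hypothetical audit record illustrating the “who did what, approved how” idea; none of these field names are mandated by MCP:

```python
# Hypothetical shape of an audit record for a model-initiated action.
from datetime import datetime, timezone

audit_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "user": "alice",                    # end user on whose behalf the agent acted
    "agent": "claude-desktop",          # host / client identity
    "server": "filesystem-server-x",
    "method": "tools/call",
    "tool": "delete_file",
    "arguments": {"path": "reports/old.csv"},
    "approved_via": "ui-consent-dialog",
    "outcome": "success",
}
```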
In summary, running MCP effectively means treating it as you would any integration middleware: monitor it, secure it, version it, and plan for errors. The good news is that MCP’s simplicity (JSON messages) and strong standards (HTTP, OAuth) make it fit naturally into existing IT practices. Many teams have found that once connectors are running, the day-to-day management is straightforward – especially compared to maintaining dozens of separate API scripts that MCP may have replaced.
Adoption Snapshot (Early 2025)
Since its introduction, MCP has gained significant traction in the AI and enterprise community. A quick snapshot of adoption and ecosystem as of April 2025:
Notable early servers and connectors. Right from launch, Anthropic provided open-source MCP server implementations for popular systems including Google Drive (file storage), Slack (messaging), GitHub (code repositories), Git (source control), Postgres (SQL databases), and even a Puppeteer server (for controlling a headless browser). These served as reference examples and jump-started the ecosystem. Community contributors and companies quickly followed by adding connectors for many other apps and services. By early 2025, there were MCP servers (official or third-party) for tools like Jira, Confluence, Salesforce, Stripe, HubSpot, Notion, Asana, AWS services, and more. Developers compiled “awesome MCP servers” lists, cataloguing connectors for everything from personal to enterprise use. Critically, internal adoption is also strong: many companies built private MCP servers to expose internal databases or proprietary APIs to their AI assistants without opening them to the public. This highlights one of MCP’s advantages – it works equally well for self-hosted data as for cloud services. For example, a bank might have an MCP server for its legacy mainframe, allowing an AI assistant to query account data securely.
Industry support and momentum. MCP being an open standard means multiple AI vendors have embraced it. By April 2025, “companies such as Google, Microsoft, OpenAI, Replit, and Zapier have already announced MCP support”. In practice, this meant: OpenAI was integrating MCP into ChatGPT (especially for enterprise users connecting internal tools), Microsoft’s Azure OpenAI service signaled compatibility with MCP connectors for enterprise data, Google’s generative AI offerings (like Bard or their PaLM API) were exploring MCP to complement their A2A protocol, Replit (the coding platform) built MCP into its IDE to let code assistant models access users’ files/projects, and Zapier (an automation platform) added MCP as a channel for its hundreds of app integrations. This broad industry backing lends credibility that MCP is not just an Anthropic-only project, but rather a community-driven standard (Anthropic has been careful to position it that way, with open governance). The formation of an “MCP Working Group” or similar consortium has been discussed, aiming to involve many stakeholders in evolving the spec – an important step for longevity.
Ecosystem scale. In terms of numbers: “early adopters have built hundreds of MCP servers for everything from GitHub and Slack to Google Drive and Stripe’s API”. This quote from an agentic AI blog in Q1 2025 underscores the explosion of connectors. It’s common to hear that MCP is doing for AI integrations what ODBC did for databases or what the browser did for the internet – providing a common interface that unlocks a flourishing marketplace of extensions. Some metrics: The official MCP GitHub had dozens of connectors in its org by April 2025, and unofficial lists count well over 100 distinct servers (some overlapping in functionality, indicating multiple implementations). On the client side, there were at least 8-10 known MCP-compatible clients (Claude Desktop, ChatGPT via some plugin or Enterprise feature, various IDE plugins like VSCode’s “Cursor AI”, Replit’s Ghostwriter, etc., and even community projects integrating MCP into open-source chat UIs).
Use cases in the wild. Many early use cases revolve around retrieval and action agents. For example, GitHub’s MCP server allows AI agents to fetch code, issues, and post updates – enabling coding assistants that can not only suggest code but also create PRs or file issues. Slack’s server lets a chatbot answer questions using company Slack channel history or send alerts to users. Google Drive’s connector means an AI helper can search your Drive and summarize documents on the fly. A particularly powerful scenario is combining multiple servers: e.g., an AI agent that uses the Jira server to find a ticket’s details, the GitHub server to find the code referenced in the ticket, and then the Slack server to compose a message about it – all in one conversation. This kind of multi-system orchestration is exactly what MCP was designed for, and early adopters are demonstrating it. Anecdotally, Anthropic mentioned Block (formerly Square) and Apollo as early enterprise adopters integrating MCP in their workflows. Developer tool companies like Zed (code editor), Replit, Codeium, and Sourcegraph also started using MCP so their AI features (like code search or inline assistance) can pull in broader context. The MCP concept of context retention across tools is a selling point – for instance, Sourcegraph’s Cody could use MCP to talk to both the codebase and the work tracking system, maintaining context from one to the other, thus giving more informed answers than if those were siloed.
Community and open-source status. MCP is maintained openly: the spec and reference SDKs are on GitHub (under the modelcontextprotocol org). There are active discussion forums and a growing community contributing improvements. It’s notable that cloud providers like AWS have taken interest – AWS’s Machine Learning Blog featured an in-depth article on deploying MCP on AWS, calling it a “universal translator” for AI to access enterprise data. They even mention how it aligns with challenges they see in customers (information silos, integration complexity). Such endorsement from AWS suggests MCP might become part of official cloud AI solutions (e.g., an AWS-hosted MCP registry or managed connector service in the future).
In summary, by Q2 2025 MCP has shifted from a nascent idea to a burgeoning standard with real-world adoption. Early integrations cover a broad swath of common enterprise apps (from code to CRM to files). Both startups and tech giants are on board, which is accelerating the growth of available connectors. While precise numbers are hard to pin down in a fast-moving space, hundreds of connectors and multiple major AI platforms supporting MCP give a clear signal: MCP has momentum as the de facto way to connect AI agents to the world’s software and data.
Challenges and Roadmap
While MCP’s progress is impressive, there remain challenges and open questions as the protocol and ecosystem evolve. The maintainers have outlined a roadmap to address many of these:
Schema governance & standardization: As more capabilities and edge cases are added, governing the MCP schema and spec changes becomes vital. The community is moving toward a more formal standards process, possibly under a neutral foundation. There’s a Standards Track on GitHub and talk of engaging industry bodies to eventually ratify MCP. One challenge is balancing agility (improving quickly as we learn from use) with stability (not breaking existing implementations). The use of date-based versioning helps, but decisions like which features to include or how to structure schemas benefit from broad input. Schema registry is another aspect: currently, each server provides JSON Schemas for its tools/prompts in an ad-hoc way. A centralized MCP Registry service is planned, which will allow servers to publish their metadata and possibly share common schemas (for example, multiple calendar servers could use a standard “Event” schema). This registry would also aid discovery (e.g., a client UI could query “find a server for Google Calendar” and get a result from the registry). Governance will likely extend to security (ensuring known vulnerabilities in popular servers are tracked) and compliance (maybe certifying certain servers for enterprise use). The open-source community model is a strength but also a challenge – getting wide consensus on changes and ensuring backward compatibility requires diligent coordination.
Permissioning UX: From a user-experience perspective, one of the toughest problems is how to present and manage all these new permissions. An AI agent connected via MCP might have dozens of potential actions it could take on behalf of the user. If the UI bombards the user with “Allow tool X to do Y?” prompts constantly, that’s not viable. The ideal is something like smartphone app permissions or browser OAuth flows: one-time setup of trust for certain scopes, with transparency and revocability. The MCP spec now includes OAuth scope concepts, but the UX on top of that is up to clients. There’s active exploration of more granular permission controls, e.g., allowing certain tools to run automatically up to a point, but requiring explicit permission for destructive actions (like deleting data) or access to sensitive resources. The roadmap mentions improving human-in-the-loop workflows with granular permissioning and standardized interaction patterns. This could include features like pre-approved “safe” tools, or grouping permissions (like “This server can read your files” as one consent, instead of per-file). RBAC (Role-Based Access Control) is also relevant – in enterprise, maybe only users in a certain role can connect an AI to a particular MCP server, or the server itself could enforce that only certain data is exposed to certain roles. Fine-grained RBAC and attribute-based control likely will be layered on by companies as they integrate MCP with their identity systems. In short, making permissioning both secure and user-friendly is an ongoing challenge. Too strict, and the AI is hampered; too loose, and users lose trust. Expect improvements in dashboards for admins to manage what MCP servers are allowed and what scopes each has, as well as clearer prompts for end-users when an AI wants to use a new capability.
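One way a client might tier its consent decisions is by using the tool annotations mentioned above (readOnlyHint / destructiveHint); the policy levels in the sketch below are hypothetical, not spec-defined.

```python
# Sketch of a client-side consent policy built on tool annotations; the tiers are hypothetical.
def consent_required(annotations: dict, user_pre_approved: set[str], tool_name: str) -> str:
    if annotations.get("readOnlyHint"):
        return "auto-allow"          # e.g. list/search tools can run without a prompt
    if annotations.get("destructiveHint"):
        return "explicit-confirm"    # always ask, on every invocation
    if tool_name in user_pre_approved:
        return "allow-session"       # user granted a standing approval for this session
    return "ask-once"                # ask the first time, then remember the answer

print(consent_required({"destructiveHint": True}, {"post_message"}, "delete_file"))
# -> explicit-confirm
```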
Commercial incentives: A more social challenge is: why should companies invest in building and maintaining MCP connectors for their services? For big players like Google or Microsoft, supporting MCP aligns with selling their AI or cloud offerings (and they’ve done so). But for smaller SaaS companies, MCP is another integration to support (similar to supporting Zapier or a REST API). The incentive is that it makes their service more accessible to AI automation – a selling point to customers. Over time, we might see tool marketplaces or monetization related to MCP connectors (akin to app stores). Right now, most connectors are free and open. But one can imagine commercial MCP servers that require a license or are offered by integration vendors. Additionally, maintaining connectors needs resources: as underlying APIs change or new MCP features come, someone must update the servers. The open-source approach (community contributions) is working early on, but some worry about sustainability – will these hundreds of community servers be kept up-to-date? This is an area where a registry with ratings or certifications could help users pick reliable connectors, and where companies might step in to officially maintain connectors for their products (to ensure quality). The MCP community may also establish a governance model for core connectors (similar to how the Linux Foundation hosts some core projects).
Planned technical additions: On the technical front, several enhancements are on the roadmap:
- Real-time/bidirectional transport: Official WebSocket support is expected, complementing HTTP+SSE. This would allow truly two-way streaming (SSE is server-push only, though MCP works around that with client polling or separate channels). WebSockets would simplify persistent connections and possibly reduce overhead for cloud-hosted agents (no need to reopen HTTP requests). Cloudflare's experimentation with WebSockets suggests it can reduce latency and let thousands of idle connections park efficiently.
- Multimodality and binary data: Currently, MCP handles text and can carry binary data via base64 in resources, but future versions will likely treat other data types (images, audio, etc.) as first-class content. The roadmap mentions support for video and other media types, which goes hand-in-hand with models becoming multimodal. We may see new content types in the schemas (e.g., an `image` content type where the server sends an image and the client chooses to display it or, if the model is capable, have it analyzed).
- Typed schema registry: Beyond discovery, a schema registry could standardize common tool and resource types. For example, multiple servers might provide a "search" tool – a centralized schema could define a `SearchQuery` and `SearchResult` structure that all of them adhere to. This would make it easier for AI clients to generalize, and perhaps help models better understand similar tools across servers. How far to push standardization versus flexibility remains an open question (one doesn't want to overly constrain innovation in tool design). A hedged sketch of such shared types appears after this list.
- Progressive onboarding and agent graphs: The roadmap hints at more complex agent topologies – such as Agent Graphs, where multiple agents/tools coordinate through namespaces or hierarchy. This overlaps with A2A, but in the MCP context it could mean an agent spawning sub-agents, each with its own MCP connections, or a directed graph of tasks within one agent. Not fully fleshed out yet, but under discussion.
- Improved streaming and chunking: Handling large payloads efficiently is a focus. This could mean support for chunked binary transfers or multi-part messages so that huge files don't have to be base64-encoded into one giant JSON message. Some proposals include out-of-band channels for bulk data, or a reference mechanism where a resource read can be paginated.
- Security & RBAC: Fine-grained Role-Based Access Control is firmly on the wish list. This could involve servers providing role definitions for tools (e.g., "the admin role can use the delete_user tool, the read-only role cannot"); a minimal sketch of this kind of role check also follows the list. OAuth scopes already allow some of this, but a richer model might support dynamic user prompts such as "This action requires admin privilege, proceed?". More audit and safety features are also likely – for instance, a standard "dry-run" mode where an agent can ask what a tool would do without doing it, or a rollback mechanism for certain changes (complex and speculative, but relevant in enterprise settings).
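To illustrate the schema-registry idea mentioned above, here is a hedged sketch of what shared SearchQuery/SearchResult types could look like, written as Python dataclasses. The field names are assumptions chosen for illustration; no such registry exists in the spec today.

```python
# Hypothetical shared types a schema registry might standardize for "search" tools.
# Field names are illustrative assumptions; nothing here is defined by MCP yet.
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SearchQuery:
    query: str                                   # free-text query the model supplies
    max_results: int = 10                        # cap so servers can bound their work
    filters: dict = field(default_factory=dict)  # server-specific refinements


@dataclass
class SearchResult:
    title: str
    uri: str          # points back at an MCP resource the client can read
    snippet: str = ""
    score: float = 0.0


@dataclass
class SearchResponse:
    results: List[SearchResult]


# Any server advertising a registry-conformant "search" tool could accept/return
# these shapes, so a client can treat GitHub search, Slack search, etc. uniformly.
if __name__ == "__main__":
    q = SearchQuery(query="open incidents", max_results=5)
    r = SearchResponse(results=[SearchResult(title="INC-42", uri="tickets://inc/42")])
    print(asdict(q))
    print(asdict(r))
```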
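And for the RBAC item, here is a minimal sketch of a server-side role check around a tool invocation. It assumes a hypothetical role claim derived from the caller's token and illustrative tool names such as delete_user; MCP itself does not define roles today.

```python
# Hypothetical server-side role check for MCP tools (illustrative only; MCP does
# not define roles -- this assumes a "role" claim derived from the caller's token).

ROLE_PERMISSIONS = {
    "admin":     {"list_users", "create_user", "delete_user"},
    "read_only": {"list_users"},
}


def call_tool(tool_name: str, arguments: dict, caller_role: str) -> dict:
    """Dispatch a tools/call request only if the caller's role allows the tool."""
    allowed = ROLE_PERMISSIONS.get(caller_role, set())
    if tool_name not in allowed:
        # A real server would return a JSON-RPC error object rather than raising.
        raise PermissionError(f"role '{caller_role}' may not call '{tool_name}'")
    if tool_name == "list_users":
        return {"content": [{"type": "text", "text": "alice, bob"}]}
    if tool_name == "delete_user":
        return {"content": [{"type": "text", "text": f"deleted {arguments['user']}"}]}
    return {"content": []}


if __name__ == "__main__":
    print(call_tool("list_users", {}, caller_role="read_only"))
    try:
        call_tool("delete_user", {"user": "bob"}, caller_role="read_only")
    except PermissionError as err:
        print("denied:", err)
```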
Open challenges: A big challenge is evaluation and safety – ensuring an AI agent uses MCP tools correctly and doesn't do something harmful or counterproductive due to a misunderstanding. While not strictly a protocol concern, it is a challenge for adoption. Research is ongoing into making LLMs better at tool use, at interpreting tool results, and at resisting manipulation by malicious content. Some proposals include embedding instructions to the model about tools (e.g., only invoke them under certain conditions), or giving the model chain-of-thought guardrails when tools are involved. The MCP spec may incorporate more guidance schemas or best practices for this as implementers learn from real deployments.
Another challenge: context length and efficiency. Loading a dozen MCP tools means the model has to keep track of what they are. Prompt tokens are expensive. Techniques like summarizing or abstracting tool descriptions when not needed are being tried. Perhaps future MCP clients will dynamically provide tool info only when the conversation seems to require it, rather than all upfront. This is an area for tooling around the protocol rather than the protocol itself.
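As a rough illustration of that idea, the sketch below filters which tool descriptions are injected into the prompt based on naive keyword overlap with the latest user message. The tool list and the matching heuristic are assumptions made for the example, not anything MCP prescribes.

```python
# Naive sketch of lazily exposing tool descriptions to the model: only tools whose
# keywords overlap the latest user message are included in the prompt context.
# The tool list and keyword matching are illustrative assumptions, not MCP behavior.

TOOLS = [
    {"name": "create_issue", "description": "Create a GitHub issue",
     "keywords": {"issue", "bug", "ticket", "github"}},
    {"name": "send_message", "description": "Send a Slack message",
     "keywords": {"slack", "message", "notify", "channel"}},
    {"name": "query_db", "description": "Run a read-only SQL query",
     "keywords": {"sql", "database", "query", "table"}},
]


def relevant_tools(user_message: str, fallback_count: int = 0) -> list[dict]:
    """Return only the tool definitions that look relevant to this turn."""
    words = set(user_message.lower().split())
    hits = [t for t in TOOLS if t["keywords"] & words]
    # Fall back to a small default set rather than sending every description.
    return hits or TOOLS[:fallback_count]


if __name__ == "__main__":
    msg = "Please file a bug ticket about the login failure"
    for tool in relevant_tools(msg):
        print(tool["name"], "-", tool["description"])
```

A production client would likely use embeddings or the model itself to judge relevance, but the token-saving principle is the same.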
Lastly, commercial models and IP: some companies might create proprietary MCP extensions or try to fork the protocol. Maintaining an open standard will require coordination to prevent fragmentation. The fact that everyone from Anthropic to OpenAI to Google is aligned on MCP is very positive, but the landscape can change. The community will likely push for MCP to go through a standards organization (perhaps OASIS, the W3C, or similar) to cement it as truly open and neutral.
In conclusion, MCP’s roadmap is about maturing the ecosystem: making it easier to discover and trust connectors, enhancing the protocol for more use cases (real-time, multimodal), and refining security/permissions to enterprise-grade robustness. The challenges – be they technical or organizational – are being actively worked on, and given the pace so far, the next 6-12 months will likely see MCP solidify as a stable cornerstone of AI integration. If successful, MCP could become as ubiquitous for AI agents as USB and HTTP are in their domains, but getting there requires navigating these challenges with care and broad collaboration.
MCP Cheat-Sheet Table
| Operation | JSON-RPC Method & Pattern | Description | Auth & Security | Typical Latency |
|---|---|---|---|---|
| Initialize Handshake | Request: `initialize` → Response: server info → Notification: `initialized` | Begins the session: the client sends its protocol version and proposed capabilities; the server responds with its version, supported capabilities, and perhaps auth requirements; the client then acknowledges. Establishes the common version and feature set for the session. | Usually requires an auth token on this first request if the server is protected (e.g., Bearer token in the Authorization header). If versions are incompatible or auth fails, the server returns an error and no session is opened. | Low: a few milliseconds to tens of ms. Local transports ~1–5 ms; a network handshake depends on RTT (~50 ms). One-time cost per session. |
| List Resources | Request: `resources/list` → Result: `{ resources: [ … ] }` | The client asks the server for the data resources (files, records, etc.) it exposes. The server returns a list of resource descriptors (URI, name, optional description/metadata). Lets the user/model see what data can be accessed. | If sensitive, the server may require a "read" scope on the token. The client may filter or withhold certain resources until the user consents. No side effects – discovery only. | Low to moderate, depending on the number of resources. A small list (dozens of entries) ~5 ms local, ~50–100 ms remote; huge lists (thousands) might take hundreds of ms or be paginated. |
| Read Resource | Request: `resources/read` with params `{ uri: X }` → Result: `{ contents: [ {uri, text or blob} ] }` | Retrieves the content of a specified resource by URI. The server may return text directly or base64 binary (`blob`). Can return multiple related resources (e.g., a whole folder) if implemented that way. | Requires an appropriate access scope (e.g., "read:file"). The server may enforce path whitelists. Large content might be chunked/streamed for performance. The client should sanitize and size-limit content passed to the LLM to prevent prompt injection or overload. | Varies: a small local text file <10 ms; a remote API call (e.g., DB query) 100–300 ms; very large content (MBs) may stream over seconds. |
| List Tools | Request: `tools/list` → Result: `{ tools: [ … ] }` | Fetches the list and definitions of actions/tools the server offers. Each tool entry includes a name, description, input schema, etc. The client uses this to inform the AI model which functions are available. | Tools are described, not executed, here. The server might omit or mark certain tools as requiring elevated permissions. The client should only expose tools to the model once the user has granted permission (or mark them as "available with approval"). | Low: typically fast since it is just metadata, ~5–50 ms depending on the number of tools and the transport. |
| Call Tool | Request: `tools/call` with params `{ name: X, arguments: {…} }` → Result: tool-specific output (often `{ content: [...] }`) | Invokes an action on the server. Params specify which tool to run and its input arguments. The server executes the tool's function (e.g., an API call or computation) and returns a result – often a content payload the AI incorporates, or a confirmation of the action. | The token must include rights to perform the action (e.g., a "write" scope for write operations). The host should ensure the user approved this specific call; some hosts show a confirmation prompt just before calling. The server validates inputs (types, bounds) and may sandbox side effects; exceptions are caught and returned as JSON-RPC errors. | Variable: simple computations or quick API calls (send a message, small DB query) ~50–200 ms; actions involving external systems (e.g., sending an email via a third-party API) 0.5–2 s. If the tool triggers a long process, the server might send an immediate ack and then stream progress (low latency to first response, high to full completion). |
| Prompt Template | Request: `prompts/get` with params `{ name: X, arguments: {...} }` → Result: `{ messages: [ … ], description: ... }` | Retrieves a full prompt template from the server. The server may fill in the template with the provided arguments and possibly embed resource content. The result typically includes a sequence of message objects (roles and content) that the client can insert into the chat or use to instruct the model. | Like listing tools, no action is taken beyond constructing text, but the server may fetch data if the prompt calls for it (e.g., "include latest logs" might read logs, requiring a read scope). The client should treat the returned prompt content as untrusted text (if the server is external) in case of prompt injection. The user likely initiated this by choosing a prompt, so consent is implicit. | Low to moderate: static or templated text <50 ms; if it involves gathering data (reading multiple resources to build the prompt), latency depends on that and could reach a few seconds. Prompt templates are often cached or precomputed for speed. |
| Sampling (LLM Sub-call) | Request: `sampling/createMessage` with params `{ messages: [...], preferences... }` → Result: `{ role, content, model, stopReason }` | The server asks the client to invoke an LLM on its behalf. The request contains a mini conversation or prompt the server wants completed (could be system + user message(s)). The client (host) may modify or approve it, then sends it to an LLM (likely the same model driving the agent, or another per the stated preferences). The resulting assistant message content is returned to the server for its use. | The client must ensure the request is safe: since it asks the model to run on possibly user-provided text, it should sanitize or review it. Often requires the user's real-time consent ("Server X wants the model to brainstorm Y, allow?") because it shares context with the model. The server should not receive more context than allowed (the `includeContext` parameter controls whether broader session context is included). Auth-wise this stays internal to the client↔model path, so no external tokens, though the server might have a usage quota or need permission to consume tokens. | Moderate: there is inherent model latency plus one round trip to the model. A large model like GPT-4 may take 1–5 s depending on prompt length and tokens; short completions on smaller models can be sub-second. If the client streams the result back, the server may get a partial answer within a second or two and the full answer within a few seconds. |
| Notifications | Server → client or vice versa: `notifications/...` methods (e.g., `notifications/resources/updated`) | One-way signals about events, e.g., the server tells the client a resource changed, or the client tells the server about a user action. No response is expected; parties listen and handle accordingly. | No auth check beyond the session's already-authenticated state (if a server sends a notification, the client trusts it as part of the session), though clients still validate the notification format. Use for non-critical info – if reliability is needed, use request/response with an ack. | N/A (async): notifications are sent asynchronously. Delivery is essentially instantaneous over the open connection (<10 ms overhead), but processing time depends on the event. |
| Close Session | Not a dedicated JSON-RPC method; done via the transport or a special shutdown request | Cleanly terminates the connection. Either side can initiate. They may send a JSON-RPC request like `mcp/shutdown` (if defined) or simply close the underlying stream/HTTP connection. Resources are freed and both client and server cease message exchange. | If the server requires an explicit sign-out (e.g., to revoke tokens or commit logs), the client should follow that procedure; otherwise, closing the connection signals both sides to clean up. Minimal security concern beyond handling any buffered data before closing. | Low: closing is typically a few-ms operation. |
Notes: The above assumes a standard client–server session using JSON over stdio or HTTP. Latency values are rough and assume network latency of tens of ms for remote connections; streaming (SSE) can reduce perceived latency by sending the first bytes quickly. Error handling is implicit in each operation – e.g., if a `tools/call` fails, the server returns an error with a code and message, which the client surfaces to the user or model. Authentication is largely handled out-of-band (obtaining tokens via OAuth) and then automatically included by the client on each request, so from an MCP API perspective auth failures show up as HTTP 401/403 responses or JSON-RPC errors from the server.
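To ground the table, here are the approximate wire shapes for a tools/call round trip, written as Python dicts for readability. The JSON-RPC 2.0 envelope fields (jsonrpc, id, method, params, result, error) are standard; the tool name, arguments, and result payload are illustrative, and the exact result schema should be checked against the current spec.

```python
# Rough JSON-RPC 2.0 shapes for an MCP tools/call exchange, shown as Python dicts.
# Tool name, arguments, and result content are illustrative, not from a real server.
import json

request = {
    "jsonrpc": "2.0",
    "id": 7,                                # client-chosen id, echoed in the response
    "method": "tools/call",
    "params": {
        "name": "create_issue",             # hypothetical tool exposed by a server
        "arguments": {"title": "Login fails on Safari", "labels": ["bug"]},
    },
}

success_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "content": [{"type": "text", "text": "Created issue #123"}],
    },
}

error_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "error": {"code": -32602, "message": "Invalid params: 'title' is required"},
}

if __name__ == "__main__":
    for msg in (request, success_response, error_response):
        print(json.dumps(msg, indent=2))
```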
Further Reading
- Model Context Protocol – Official Documentation (MCP Spec & Guides, modelcontextprotocol.io). Comprehensive reference for MCP, including specification, architecture, and usage examples.
- “Introducing the Model Context Protocol” – Anthropic Announcement (Nov 2024, Anthropic News). Brief intro by MCP’s creators on the motivation and initial release of MCP as an open standard.
- “What is the MCP and How It Works” – AWS Machine Learning Blog (June 2025, AWS). In-depth article on using MCP in enterprise settings, with AWS-specific insights and examples.
- “Why Your Company Should Know About MCP” – Nasuni Blog (April 2025, Nasuni). Explains MCP’s importance for enterprise AI, benefits like agent capabilities, and notes on security and future-proofing.
- “MCP: The New Standard for AI Agents” – AGI Innovations Blog (early 2025, agnt.one). Detailed deep-dive into MCP’s design and how it simplifies AI integrations, plus discussion of early adopters and tooling.
- “MCP vs OpenAI’s Work with Apps” – Hariharan Eswaran (Dec 2024, Medium). Comparative analysis of how Anthropic’s MCP and OpenAI’s proprietary app integrations differ in scope and design philosophy.
- Google’s Agent2Agent (A2A) Protocol – Developer Blog & Spec (April 2025, Google). While about multi-agent communication, includes context on using MCP for tool usage within agents, and how A2A and MCP complement each other.
- “MCP Security in 2025” – PromptHub (June 2025, PromptHub Blog). Overview of emerging security issues in the MCP ecosystem and best practices to mitigate risks (tool poisoning, prompt injection, etc.).
- Open-Source MCP Server Repository (Anthropic) – GitHub. Gallery of official and community-contributed MCP servers (connectors) for various platforms – a great resource to see real implementations.
- “OAuth for MCP Explained” – Stytch Engineering (2025, Stytch Blog). Technical walkthrough of MCP’s OAuth2 authorization flow with an example, useful for understanding token scopes and audience in MCP.