IgniteAI


Context Engineering 101

Published 5 March 2026 · By Nickle Lyu

Traditional software engineering is deterministic. AI engineering is probabilistic. To build reliable agents, we must stop treating LLMs as magic chatboxes and start treating them as stochastic processing units that require a rigid environment to function correctly.
This manual covers the four layers of the LLM Stack:
1 Prompting: The instruction layer.
2 Context: The memory and attention layer.
3 Harness: The environment and feedback layer.
4 Connectivity: The skills and tools layer (MCP/Plugins).

2. Level 1: Prompt Engineering (The Instruction)

Before engineering the system, we must ensure the core instruction is valid. This is the "unit test" of AI.

Core Primitives

  • Persona: Define who the model is (e.g., "You are a Senior Site Reliability Engineer"). This primes the model's latent space for specific jargon and reasoning patterns.
  • Task: Use clear, imperative verbs (e.g., "Analyze," "Refactor," "Synthesize").
  • Constraints: Define what the model cannot do (e.g., "Do not use external libraries," "Output JSON only").
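The three primitives compose mechanically into a single system prompt. A minimal sketch (the function name and output format are illustrative, not a fixed API):

```python
def build_prompt(persona: str, task: str, constraints: list[str]) -> str:
    """Combine the three core primitives into one system prompt."""
    rules = "\n".join(f"- {c}" for c in constraints)
    return f"{persona}\n\nTask: {task}\n\nConstraints:\n{rules}"
```

Calling it with the examples above yields a prompt that primes the persona first, states the task in an imperative verb, and closes with the hard constraints.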

Best Practice: Chain-of-Thought (CoT)
Never ask for an answer directly. Ask for the reasoning first.

  • Bad: "Fix this bug."
  • Good: "First, analyze the stack trace. Second, hypothesize three potential causes. Third, select the most likely cause. Finally, write the fix."
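The "Good" prompt is a reusable scaffold, not a one-off. A sketch of wrapping any bug report in it (the function name is illustrative):

```python
def cot_prompt(stack_trace: str) -> str:
    """Force reasoning before the answer: analyze, hypothesize, select, then fix."""
    return (
        "First, analyze the stack trace below. "
        "Second, hypothesize three potential causes. "
        "Third, select the most likely cause. "
        "Finally, write the fix.\n\n"
        f"Stack trace:\n{stack_trace}"
    )
```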

3. Level 2: Context Engineering (The Brain)

Concept Source: Anthropic
Definition: Managing the "Attention Budget" of the model. The context window is finite and expensive.

Strategies

1 The Attention Budget:
* Treat tokens like RAM.
* Context Rot: Performance degrades as the window fills with noise.
* Rule: If a piece of text does not change the outcome, remove it.
2 Progressive Disclosure (RAG vs. Agentic Search):
* Old Way (Standard RAG): Retrieve 10 documents and stuff them in the prompt before the model starts.
* New Way (Agentic Search): Give the agent a "Table of Contents" (e.g., AGENTS.md). Let the agent decide what to read.
* Implementation: Provide a tool read_file(path) and search_docs(query). The agent pulls context Just-In-Time.
3 Compaction & Serialization:
* Long conversations drift.
* Compaction: Periodically summarize the chat history. Keep decisions and unresolved errors. Discard chatter.
* Scratchpads: Force the agent to maintain a NOTES.md file. This is "Long-Term RAM" that survives context resets.
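The Just-In-Time pattern from strategy 2 can be sketched as two plain functions the harness exposes as tools. This is a naive grep-style search, assuming Markdown docs on disk; a real system would index them:

```python
import pathlib

def read_file(path: str) -> str:
    """Tool: return a file's contents so the agent pulls context only when needed."""
    return pathlib.Path(path).read_text()

def search_docs(query: str, doc_dir: str = "docs") -> list[str]:
    """Tool: return paths of Markdown docs that mention the query."""
    return [
        str(p)
        for p in sorted(pathlib.Path(doc_dir).rglob("*.md"))
        if query.lower() in p.read_text().lower()
    ]
```

The agent first calls search_docs to find candidates, then read_file on the one or two hits it actually needs, instead of receiving ten pre-stuffed documents.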

4. Level 3: Harness Engineering (The Body)

Concept Source: OpenAI
Definition: Designing the environment so the agent can "see" and "act" effectively.

The "Ralph Wiggum" Loop

An agent cannot fix a bug if it cannot see the error message.
1 Agent writes code.
2 Harness runs code/tests.
3 Harness captures stderr/stdout.
4 Harness feeds output back to Agent.
5 Agent iterates.
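Steps 2-4 of the loop reduce to one helper: run the code, capture both streams, and hand the result back verbatim. A minimal sketch; the PASS/FAIL format is an assumption, not a standard:

```python
import subprocess

def run_and_capture(cmd: list[str]) -> str:
    """Run the agent's code or tests and return output the agent can actually see."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return f"PASS:\n{result.stdout}"
    # Include stderr so the agent sees the error message it needs to fix.
    return f"FAIL (exit {result.returncode}):\n{result.stdout}\n{result.stderr}"
```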

Agent Legibility
Your infrastructure must be readable by an LLM.

  • Logs: Structure logs as JSON, not unstructured text.
  • Linters: Use strict linters (e.g., Ruff for Python, ESLint for JS).
  • Invariants: If the agent breaks a rule (e.g., "No circular dependencies"), the harness should block the commit and return a specific error message explaining why.
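The first point needs nothing beyond the standard library. A sketch of a JSON log formatter; the field names are an assumption, not a required schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object instead of free-form text."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })
```

An LLM can parse one JSON object per line reliably; it cannot reliably parse hand-formatted multi-line log text.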

The Repo is the Truth

  • Do not hide logic in Slack or Notion.
  • Golden Principles: Store architectural patterns in docs/architecture/*.md.
  • Garbage Collection: Run "Janitor Agents" weekly to refactor code that deviates from these patterns.

5. Level 4: Connectivity (The Hands)

This layer gives the agent agency to affect the outside world.

A. Skills & Tools

A "Tool" is a function definition exposed to the LLM.

  • Deterministic: Tools should behave predictably.
  • Type-Safe: Use Pydantic/Zod to validate agent inputs before execution.

Example:
@tool
def query_database(sql: str):
    """Executes a READ-ONLY SQL query."""
    # Validate read-only: allow only SELECT statements and reject mutating keywords
    statement = sql.strip().upper()
    forbidden = ("DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE")
    if not statement.startswith("SELECT") or any(word in statement for word in forbidden):
        raise ValueError("Read-only access: only SELECT statements are allowed")
    return db.execute(sql)


B. Model Context Protocol (MCP)
The Standard: MCP is the emerging standard for connecting AI assistants to systems of record (databases, GitHub, Slack).

  • Why MCP? Instead of building custom "plugins" for every tool, you build an MCP Server. Any MCP-compliant client (Claude Desktop, IDEs, your custom agent) can now use that tool.
  • Architecture:
    • Host: The AI application (e.g., your agent runner).
    • Client: The connector.
    • Server: The data provider (e.g., a "Postgres MCP Server").
  • Benefit: Decouples the tool logic from the agent logic.

C. Plugins
Plugins are high-level bundles of tools, often exposed via API.

  • Manifest Files: Describe your API in an openapi.json spec.
  • Authentication: Handle OAuth flows outside the agent's context. Pass the token in the header, never in the prompt.
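The second point can be made concrete with the standard library: the token rides in the HTTP header, so it never enters the model's context window. The function name and URL are illustrative:

```python
import urllib.request

def build_plugin_request(url: str, token: str) -> urllib.request.Request:
    """Attach the OAuth token as an Authorization header, never in the prompt."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )
```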

6. Implementation Checklist

1 Audit Context: Are you dumping 50 files into the prompt? Switch to "Just-in-Time" loading.
2 Build the Harness: Can your agent run its own code and see the error?
3 Define the Scratchpad: Does the agent have a NOTES.md to store state?
4 Standardize Tools: Adopt MCP for database and filesystem access to future-proof your stack.
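The scratchpad item can be as small as a single append helper. The NOTES.md file name comes from the article; the helper itself is a sketch:

```python
import pathlib

def log_to_scratchpad(note: str, path: str = "NOTES.md") -> str:
    """Append a decision or unresolved error to the scratchpad that survives context resets."""
    p = pathlib.Path(path)
    existing = p.read_text() if p.exists() else ""
    p.write_text(existing + f"- {note}\n")
    return p.read_text()
```

The harness calls this whenever the agent records a decision; after compaction, the file's contents are re-injected so the agent keeps its "Long-Term RAM".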
