this post is part of a series about creating production-grade maintainable AI-first projects, using AI-first design patterns and frameworks.

One design pattern that is a particularly great fit for AI-first dev is managed effects (AKA capabilities).

The basic idea with managed effects is that we mechanically restrict what a piece of code can do by only giving it access to specific capabilities.
This creates a sort-of sandbox around that piece of code, which helps us make some issues that are specific to agentic dev far more manageable.

We’ll use a small example that uses this design pattern to illustrate what kind of benefits we can get from it.
Then we’ll talk about why it makes more sense to use these patterns much more often in the age of AI.

Our example: HTTP API endpoint data access

An important aspect of an API endpoint is what data it can read and write.
If we could know this at a glance, it’d be useful for reasoning about both logic and security.

Maybe we can think of a “UI for looking at important things in the software”, and it would contain something like this:

Table of API endpoints showing HTTP method, path, and the capability each endpoint depends on (e.g. PUT /api/projects/{project_id} → BoundProjectWrite).

Note - these UX examples are not from an existing tool. Some of them are screenshots from a POC project that I built to flesh out these ideas (code linked below), and some are just UI mocks.

When looking at the diff while we’re working on a task or reviewing it, we can hope to have this:

A diff tool for showing changes in the data that an API endpoint has access to

If we’d like to go further, we might also imagine a UX for reviewing changes which is “stronger”.
Changes to scope of data access are dangerous, so maybe we’ll want to go with a “dangerous change” UX for that.

Maybe a required-form:

Warning · capability surface change This PR adds capabilities to an HTTP endpoint.

Acknowledge the changes below before merging PR #482.

Diff vs main

GET /api/projects/{project_id}

+ OrgTicketWriteCapability

What this capability does

OrgTicketWriteCapability lets the route create, update, move, assign, and delete tickets in the caller's org. Each verb is role-gated; super-admins bypass the org check.

⚠ Code review agent analysis

Looks like a wiring mistake, not an intentional widening: a GET route shouldn't need write verbs, and a projects route shouldn't need ticket writes. Recommend bouncing back to the author.

Confirm and approve

Type the capability you're approving:

I read the diff and the agent analysis, and the PR justifies this widening.

An imaginary example of a possible approval UX.

We can also hope to “give these benefits to the agent as it codes”:

Smaller context - when considering an API endpoint, it would be good if the agent fetch “just the right part of the DB access layer” for that endpoint.
Make it easier to avoid important categories of bugs - in this case, help the agent write code so that it’s guaranteed that it would stay within the guardrails and not access forbidden data.

Why can’t we just have this with any arbitrary design?

For simple code, it might be possible to have a static analysis tool figure all this out.
For large, complex code, it becomes difficult to know for sure that a rare input + rare state won’t lead us down an unexpected code path.

This is where managing capabilities comes in:

Design pattern: Limit API endpoint DB access

The typical design for an API handler is that it has full access to the db, and it avoids mistakes by being careful.

Instead of that, we’ll only give the handler a narrow object that can only do specific things.

Before

Handler has full DB access, can do anything by mistake

PUT /projects/{id}

depends on

Data Access Layer

full access — every table, read & write

users - full access projects - full access tickets - full access ...

After

Handler has access to relevant action ONLY

PUT /projects/{id}

depends on

Bound Project Write Capability

update specific project delete specific project archive specific project unarchive specific project

the same Data Access Layer — handler can't reach it

users · projects · tickets · ...

The handler gets a “capability” object that’s already been authorized (if auth fails an error is returned before the handler is called) and only exposes the operations this route is allowed to perform.

What this essentially does is make sure that we can’t leak data from other tables by accident, because by construction, it just doesn’t have access to any methods that fetch irrelevant data.

In a production project I’d probably make these even more granular — e.g. BoundProjectUpdateDetailsCapability, BoundProjectArchiveCapability.

How this can help us

This is a standard design pattern, right?
It’s just dependency injection during construction.
But it turns out that if it’s applied systematically, this is all we really need to get all these benefits we talked about earlier:

Smaller agent context

If a handler has full DB access, it’s a lot of context to load the “relevant actions”.

But if the handler takes BoundProjectWriteCapability, we only need to add the code for that specific capability to the context.
We’d need tooling or skills for this to be effective, but that’s fine.

This is better for saving tokens and for not overwhelming the agent with irrelevant details (which also helps open the door for using smaller, cheaper or local models for some tasks).

Categories of mistakes become impossible

Like with any other “make illegal states unrepresentable” (e.g. static typing) -
It’s easier to avoid a mistake like accessing the wrong data if it’s literally impossible to make it than if you have to be careful to avoid it.

We can have our nice UX

The theoretical UX we talked about is now possible.
We need a static analyzer that can understand which capabilities we give each endpoint, but that’s just a couple of prompts to your favorite agent (it only took one prompt IIRC to have claude code make an analyzer for Python-FastAPI).
Then we just need to wire up some tooling to remember capabilities at each commit and have the actual UI.

Possible concerns

We’ve talked about imagining a dev process that is “better” in that it has extra useful parts.
We’ve also seen how we can use classic engineering to make that better process possible.

But I’m sure some objections are coming to mind, so let’s talk about them.

Doesn’t this just move the problem?

Did we not just move the problem from “handlers can leak data” to “capabilities can leak data”?

Not really.

With the classic approach, in a large application with frequent changes over time, eventually some refactor might let a handler call the wrong utility method in some rare edge case, and that utility method fetches some data that this handler shouldn’t see.
Any “if” statement anywhere in the code might be the one that connects two things that shouldn’t be connected.
That’s a lot of careful reasoning and testing, over a lot of code changes.

But the capability layer is very small.
It’s simpler to test, and it changes much less often than the handlers.

So we only need to understand and test:

“Does this capability really only do what it’s supposed to do?”
“Does the handler get the correct capability?”

This is a tiny effort compared to “every line of code might cause a data leak and we must test that”.

But it’s more code! It’s another thing to know and complicates things!

Yeah, it’s more code.
But it’s LLMs writing this code, not humans.

Does it complicate things?
I would argue, for the reasons above, that the human cognitive load is actually much lower.
The agent’s “cognitive load” is definitely lower because of the smaller context.

I would agree, however, that these ideas are more important for large code bases.
In toy applications, things are typically fine either way.

What makes this AI-first?

“Ok, sure, cute ideas, but most of this is not really AI-specific. Why are you saying it’s AI-first engineering?”

What’s different today is the ROI.
My suggestion is to use patterns like this systematically, in many parts of the codebase. And this makes much more sense than it did a few years ago.

First - the value is higher because the bottlenecks have shifted.
Creating code is no longer the most expensive thing, so improving other things can now have a stronger impact on the overall process - reducing the chance of agent mistakes is critical for robust development at scale.
Human attention while understanding and reviewing the code is probably the most expensive part, but things like agent context size are also important.

Second - the cost is lower because tedious work is now cheap.
Creating dashboards, writing basic code analyzers - these would be weeks of work for most teams a few years ago.
Now it’s maybe a day or two to create actual working tools using an agent.

Lastly, and this combines both cost and value - conventions, processes, and technical know-how used to have a very high implicit cost and only sort-of work.
Think of something like “We have these layers of tests, for these reasons, and this is how you write each of them” - we would need someone very experienced to define this in the first place, then train each team member until they have the skill set to do it well.
We would also have to convince each team member that it’s a good idea, and make sure they actually do it, including under time pressure.
Statistically, this just didn’t work.
But now, and this is the main thesis of this post series - we can share these patterns as something like a skill or a skill collection (which is what I think of as framework).
Just decide on a “skill set” and that’s what the code looks like.
Crazy times.

How general is the pattern?

The idea of “limiting functionality by design” is, of course, very standard.
This usually makes agent context smaller, and makes it easier to avoid some category of mistakes.
Both make the agent’s likelihood of success higher and the feedback loop smaller.
It’s often pretty simple and will often just work without extra tooling except something like agent skills.
Stuff like granular types (e.g. ValidatedEmail as a parameter instead of str), and often just separation of concerns.

The idea of having custom workflows can also be useful in many cases

Some type of technical change can require a senior dev signoff and maybe specific testing requirements
Some type of visual change can be “low risk, so you just need to eyeball the result and we have an automatic visual diff tool for that” (I believe this one will be very common)
Domain-specific concerns are also relevant - “no one touches the billing code without senior dev and group manager signoff”.

A couple of words about managed effects

I don’t think this is built-in to any popular language, but it’s been in the functional programming world forever.

It can be applied in many cases and help both reasoning and safety - “What DB tables can this touch?”, “Which files can this read or write?”, “which queues can this send data to?”, “can this access the internet?”.
The special case of zero side effects (pure functions, and with them - immutability) is also extremely important.

I strongly believe that managing side effects is going to be a critical part of agentic dev.
It has such strong implications for setting guardrails, and this is one of the most important concerns with agentic dev, so it’s hard for me to imagine a world where in 10 years (probably 5), 99.99% of agent coding doesn’t have this.
Either robust frameworks will be created that can force this on existing languages, or the currently-popular languages will be abandoned.

Side notes

While not directly the topic of this post, we’ve implicitly touched on some interesting ideas worth mentioning:

Dev tooling is “malleable” - we can look at software in other ways than directly through code (the capabilities table) or decide what the review process looks like (forced approval dialog).
We can have prioritized, domain-specific review artifacts that are separate from the code diff. We can, and should, review by blast radius and category of change.
Matching the above two items, we can note that the UX concepts I suggested for viewing and reviewing software do not show any code. Given a framework that supports them, they would allow us to think at a higher level of abstraction than code when it makes sense.

Finishing up

This capabilities example is a nice example of why I think of AI-first engineering not only as “design patterns”, but as “frameworks”.
We want ways to reduce cognitive load for humans and increase success rate of agents - we want to choose what’s important to us and make that easy to manage (understand, review), we want to reduce context size, prevent dangerous mistakes.
These are the objectives.
The code design pattern itself is simple - it’s just an enabling tool.
But without using this pattern systematically, we don’t have the capabilities view, and we don’t have the capabilities review UX.

As always, happy to hear what you think - (twitter / x, linkedin).

Appendix: example code

Here’s the same example in code — PUT /api/projects/{id}, before and after.
The code uses the “repository” pattern — a thin wrapper around the DB.

This is taken from a project management example repo I’m using to flesh out these ideas.

The “before” — handler with the full Repository (DB access), auth checks inline:

@router.put("/{project_id}")
async def update_project(
    project_id: str,
    update_data: ProjectUpdateCommand,
    # Repository == DB access.
    repo: Repository = Depends(get_repository),  # full read/write access to everything
    current_user: User = Depends(get_current_user),
) -> Project:
    ####################################################
    # Auth checks.
    if current_user.role not in {UserRole.SUPER_ADMIN, UserRole.ADMIN, UserRole.PROJECT_MANAGER}:
        raise HTTPException(403, "Insufficient permissions to update projects")

    project = repo.projects.get_by_id(project_id)
    if project is None:
        raise HTTPException(404, "Project not found")
    if current_user.role != UserRole.SUPER_ADMIN and project.organization_id != current_user.organization_id:
        raise HTTPException(403, "Access denied: project belongs to different organization")
    ####################################################

    # Main logic. Note that it has full access to the entire DB,
    # so any mistake here could leak data from unrelated tables.
    return super_complex_logic(repo, project_id, update_data)

The “after” — handler depends on a narrow capability; auth runs at construction time:

@router.put("/{project_id}", response_model=Project)
async def update_project(
    update_data: ProjectUpdateCommand,
    # Write capability, scoped to this project, only constructed if auth checks pass.
    # The auth checks run when the capability is constructed.
    # The capability's interface only exposes the relevant actions.
    capability: BoundProjectWriteCapability = Depends(get_bound_project_write_capability),
) -> Project:
    # Main logic. It only has access to write operations on this specific project.
    return super_complex_logic(capability, update_data)

# ... elsewhere ..
# The capability will look like this:
class BoundProjectWriteCapability:
    """Project write operations scoped to a specific, already-authorized project."""

    def update(self, command: ProjectUpdateCommand) -> Project:
        ...
    # def delete...
    # def archive...
    # def unarchive ...

    # No other methods are exposed.

You can have a look at the repo to see how this connects to the rest of the codebase (at least at the moment of writing I only implemented this pattern for the project update endpoint, but if you’re reading this in a few days, I’ll probably have Claude finish it).

<< previous post: AI-first design patterns - CRUD backend fundamentals | next post: (coming soon) >>

Our example: HTTP API endpoint data access#

Warning · capability surface change This PR adds capabilities to an HTTP endpoint.

Why can’t we just have this with any arbitrary design?#

Design pattern: Limit API endpoint DB access#

How this can help us#

Smaller agent context#

Categories of mistakes become impossible#

We can have our nice UX#

Possible concerns#

Doesn’t this just move the problem?#

But it’s more code! It’s another thing to know and complicates things!#

What makes this AI-first?#

How general is the pattern?#

A couple of words about managed effects#

Side notes#

Finishing up#

Appendix: example code#