Ayende @ Rahien

filter by tags archive

architecture (632) rss
bugs (451) rss
challenges (137) rss
community (391) rss
databases (483) rss
design (907) rss
development (674) rss
hibernating-practices (75) rss
miscellaneous (593) rss
performance (399) rss
programming (1127) rss
raven (1496) rss
ravendb.net (586) rss
reviews (184) rss

2026
- July (3)
- June (2)
- May (2)
- April (5)
- February (4)
- January (5)
2025
- December (8)
- November (4)
- October (4)
- September (10)
- August (6)
- July (7)
- June (7)
- May (10)
- April (10)
- March (10)
- February (7)
- January (12)
2024
- December (3)
- November (2)
- October (1)
- September (3)
- August (5)
- July (10)
- June (4)
- May (6)
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

RavenDB Workshops - Deep dive into practical use of Document Data Modeling

Jul 10 2026

The cost of a free feature

time to read 8 min | 1482 words

0 comments

Tags:

About twenty years ago, I was working on what would eventually become RavenDB, and I needed an engine to handle queries. Writing a query engine from scratch is its own very large project, quite separate from writing a database engine. I made a decision that I still consider one of the smartest I made in those early days: I built on top of Lucene as my indexing and query engine.

That let me stand on a firm foundation while I dealt with the problems I actually cared about: building a NoSQL solution that didn't feel like juggling knives. Lucene gave me a mature, battle-tested way to index and query data, and I got to spend my time on the parts that made RavenDB RavenDB.

It was a good decision. But like a lot of good decisions, it came with a bill attached, and the bill arrived about a decade later.

Making use of features that are already there…

Here is a query that every developer has written a thousand times: give me the ten most recent posts on my blog. You write those sorts of queries in every kind of application you write, after all. And there is the natural follow-up: tell me the total number of posts, so I can render the pagination.

In most databases, you need to make two separate queries for this, and those two queries usually mean two separate database roundtrips. One to fetch the page, one to count the total.

In RavenDB, it's one query. You ask for the page, and you get the total count back in the same response.

This matters a lot more than you may initially think. In most systems, the cost of the network round-trip to the database dwarfs the cost of the query itself. The query runs in microseconds; the round-trip is the expensive part. So folding the count into the same response that carries the results eliminates an entire round-trip. When you run a paged query in RavenDB, you get the total number of matching results for free, and that is genuinely a wonderful feature.

The important word in that paragraph is free. I did not design this feature. I did not sit down and decide that paged queries should return a total count. Lucene already computed it, for reasons of its own, as a side effect of how it processes a query. I got it for nothing, exposed it through the API, and it became one of those small touches that made RavenDB pleasant to use and made your application run faster.

A free feature that adds real value, give me more of that!

Fifteen years later

Imagine a time skip of a decade and a half: we start building Corax, the next-generation query engine for RavenDB. Remember when I said that building a querying engine is a Big Project? I meant it.

It turns out that most databases do not give you the total result count for free for a very good reason: It isn't free. Consider the following queries and their internal representation in the database:

from Posts 
where IsPublished = true
limit 10


from Posts
where IsPublished = true and PublishedAt <= $today
limit 10


from Posts
where IsPublished = true and PublishedAt <= $today and Tags in ($tags)
limit 10

It is easy enough to check the number of posts that are published. A range query on a date field is more complex, especially if you have a lot of posts. Getting the total number of posts in a set of tags is more complex, since a post can have multiple tags and you need to deduplicate.

The intersection of three clauses, of course, is distinct from the count of each.

The queries above can stop further processing once they have 10 results, after all. But when you need an exact count for the query, you need to actually evaluate it.

Lucene happens to produce it as a byproduct of its execution model, but in the general case, computing the total number of results for a query can be enormously expensive.

Imagine a physical phone book, and I ask you to get me the first ten people whose family name is Smith. You flip to the S section, find Smith, read off the first ten entries, and hand them back. Notice what you didn't do: you never counted how many Smiths there are.

Now I ask you for the total number of Smiths. Suddenly, you have to go through every single Smith in the book and tally them. Smith is a common name. The cost of counting all of them can be far higher than the cost of just grabbing the first ten.

If you want a count, you have to actually count, which means touching every matching result, even the ones you're about to throw away. For large queries, getting the count can be the most expensive part of the query.

The feature is still meaningful, to be clear. There are plenty of cases where you genuinely need the total so you can build proper pagination. And in those cases, going to the database twice and paying for two network round-trips is wildly wasteful, so bundling the count in is exactly the right thing to do.

How would I design this today?
I would approach this very differently. You often don’t need an exact count; you just need to show something so the application can render the paging controls and maybe show a rough count to the user.
I would probably expose an EstimatedCount property for the queries, which is cheap, as well as a way to ask for an exact count in a single roundtrip. The key is that we would be clear in the contract that this is an estimation only.

But because this was a basic, baked-in behavior of RavenDB, something we'd done "for free" from day one, Corax inherited an expectation that it had to provide an exact count. We were doing work on every query, only to end up discarding the result of that work, because the original engine had made it cost nothing, and so everyone assumed it cost nothing.

How we dealt with it

We solved this with a combination of approaches.

First, it turns out you don't always need the count. Since the count is requested through the API, we made it explicit: the client can say "I care about the total count" or "I don't care about the total count." When the client opts out, we get to skip all that work. That alone recovers a lot of wasted effort.

Second, queries in RavenDB request the count far more often than they do in other databases, precisely because it used to be free. Years of RavenDB code were written assuming the count was always there, so there was enormous pressure to make counting itself fast rather than just optional. We did a significant amount of work to optimize how counting happens.

And this turns out to be a genuinely deep area. There is a whole body of research on how to count query results with as little work as possible. People have earned PhDs on this problem. What looked, from the application developer's seat, like a single integer that just shows up in the response is an entire field of study once you're the one who has to produce it.

Hyrum's Law sends its regards

There's a principle called Hyrum's Law that captures exactly what happened here:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

The actual lesson

When you get something for free from a dependency, you are not just accepting a feature. You are accepting an obligation. The behavior becomes observable, the observable becomes relied upon, and the relied-upon becomes a promise you have to keep, possibly long after the dependency that gave it to you is gone.

The real price of a dependency is the behavior that you need to carry forward down the line, because it became part of your contract and your users rely on it. And that requires very careful consideration.

I want to be careful here, because the lesson is not "don't take free things" or "don't depend on Lucene." Building on Lucene was the right call, and exposing the count was the right call. I'd probably make both decisions again (although I would weaken the promise about the accuracy of the count).

To be more accurate, I don’t think that the person who took that dependency twenty years ago was even able to properly understand the impact of that feature or its future complications. On the other hand, that specific feature was something that I frequently demoed, and it always got a great reaction. This touched a pain point that many people had.

Hyrum wins again 🙂.

Jul 07 2026

Multi-Agents in RavenDB

time to read 11 min | 2036 words

0 comments

Tags:

I've written before about why I think the “modern” approach of solving every problem by adding agents is a doomed path. The current instinct is to add more layers of agents, judges, reviewers, etc. - and hope the model will be smart enough to do the right thing and actually get something done.

The issue is that we already have a well-understood way to coordinate independent contributors working on a complex system. It's called software design & architecture, and we've been refining it for decades. The instinct to solve a coordination problem by adding a smarter message bus between your agents is exactly backwards.

When talking about multi-agent support in RavenDB, I want to be clear about what we did not build. We didn't build an orchestration framework. We didn't build a swarm. What we built is much more boring, and I mean that as the highest compliment I know how to give.

We built a way for you to solve problems in a predictable manner and without a lot of hassle.

If you haven't looked at RavenDB's AI Agents yet, the short version is this: an agent is a system prompt, a connection to an LLM, and a set of Query tools, which let the agent read specific data through RQL you define, and Action tools, which let it do something in your application, but only through the specific doors you've opened.

The note on “specific” is the whole point. Instead of giving the model the freedom to access any data it wants and execute anything it feels like, we place careful guardrails that it cannot escape.

The agent can only pull the levers you gave it, using the data you explicitly handed it. The database takes care of the tedious, error-prone plumbing (such as conversation state, message history, talking to the model provider, etc.) and stores every conversation as a real document in the @conversations collection. You define capabilities, and RavenDB takes care of all the rest.

Where one agent starts to hurt

A single agent like this will take you a genuinely long way. Right up until it doesn't. The issue is the slow creep of complexity. Let’s say that you start with a simple agent to deal with answering employee questions in the context of the HR department.

The next feature is to assist them in filing expense reports. Then someone wants it to handle time-off requests, and then to look up the org chart, and six months later you have one prompt trying to be an entire company, with thirty tools competing for the model's attention and a context window stuffed with things that have nothing to do with the question being asked.

That situation is problematic on multiple levels:

Your context window is full of a huge prompt (usually not relevant to the task at hand), a lot of tool descriptions and capabilities meant to cover every possible scenario under the sun.
Users’ questions are either squeezed into the remaining context window or you have to move to models with larger context windows, which also cost more.
You are also stuck with a single model for everything, instead of being able to pick the right model for each scenario. You overpay to run trivial chit-chat on your best model, or you cripple the hard tasks by forcing them onto a cheap one.

This is not a new problem. Scope is how you manage complexity, and it works exactly the same whether the thing on the other end is a compiler or a language model.

Multi-agents are just agents that talk to each other

A multi-agent system in RavenDB is not a special kind of agent. It's several ordinary agents, each built exactly the way you already know how to build one, where one of them happens to know that another exists and can hand work to it.

The way you connect them is almost anticlimactic. You add a SubAgents entry to the parent agent: an identifier and a description. That's the entire wiring job. The description matters because that's what the parent reads to decide whether a given request belongs to the specialist. There is no router you write, no dispatch table, no orchestration layer.

You made the subagent available to the root, and RavenDB will invoke it for you when it is needed. RavenDB will also handle all the logistical minutiae needed to accomplish this successfully.

Those include ensuring that the scope for the root agent is shared with the called subagents, managing memory and conversation history, allowing the subagent to invoke its own queries and actions, etc.

We ship a demo for this — an HR chatbot, samples-hr on GitHub if you want to run it yourself. An employee comes back from a conference, uploads the receipt, and types "submit this expense." From their side, it's one smooth conversation.

Behind the scenes, it's two agents. The HR Assistant faces the user and handles benefits, policies, and general questions. It does not know the first thing about filing an expense. What it does know is that there's an Expense Manager specialist, and — from that one-line description — that receipts belong to it. It hands the task over, the specialist analyzes the receipt and creates the BusinessTripBills document, and the answer flows back up.

Go look in the database afterward and you'll find two conversation documents, not one — the HR Manager's log and the Expense Manager's log, each with its own scoped history. That's not an implementation detail I'm mentioning for trivia's sake. It means you can open either conversation and see exactly what that agent received and how it responded. The audit trail falls out of the design for free.

Here's the parent agent, trimmed down to the part that matters:

public static Task Create(IDocumentStore store)
{
 return store.AI.CreateAgentAsync(
  new AiAgentConfiguration
  {
   Name = "HR Assistant",
   Identifier = AgentIdentifier,
   ConnectionStringName = ConnectionStringName,
   SystemPrompt = @"You are an HR assistant.
Provide info on benefits, policies, and departments.
Do not suggest actions that are not explicitly allowed by the tools available to you.
Do NOT discuss non-HR topics. Answer only for the current employee.",


   Parameters =
   [
    new AiAgentParameter(EmployeeIdParameter,
     "Employee ID; answer only for this employee")
   ],


   // The HR agent has no idea how to file an expense.
   // It just knows a specialist exists, and when to call it.
   SubAgents =
   [
    new AiAgentToolSubAgent
    {
     Identifier = ExpenseAgentIdentifier,
     Description = "Manages business trip expenses: analyzing " +
      "receipts/bills, reporting expenses, and retrieving " +
      "monthly expense summaries."
    }
   ],


   // Its own capabilities cover only HR concerns.
   Queries =
   [
    new AiAgentToolQuery
    {
     Name = "GetEmployeeInfo",
     Description = "Retrieve employee details",
     Query = $"from Employees where id() = ${EmployeeIdParameter}",
     ParametersSampleObject = "{}"
    }
   ]
  });
}

The Expense Manager, for its part, is defined in exactly the same way as any other agent. There is nothing special about it. It's a normal agent that happens to be pointed at by a SubAgents entry.

Notice what this buys you. The HR agent never sees the finance tooling. It doesn't integrate the expense system, doesn't swallow that domain, or deal with a BusinessTripBills document. It shells the call out to the specialist and stays in its lane. If tomorrow finance wants their own policy-checking agent in the loop, you add another SubAgents entry to the Expense Manager and the HR agent is none the wiser. Each piece keeps a small & independent scope.

Each of those agents is isolated from the others, so we can have the HR Agent use a pure textual model, maybe with high levels of reasoning. The Expense Manager agent, on the other hand, needs to be multimodal (to be able to read receipt images), but it doesn’t need to be smart. You can customize each for its own needs, without having to find the lowest common denominator.

The fact that each of those agents (even if they both participate in the same conversation) will use separate @conversations documents also means their contexts are isolated from one another. The fact that you just pushed the entire set of receipts from a two-week business trip to the Expense Manager agent doesn’t weigh down the HR agent when you ask about your remaining holiday balance.

When you should not do this

This shouldn’t be your default architecture. Like any advanced technique, you need a sufficient level of complexity to justify it. If the work fits in one sentence, it probably fits in one agent.

Answering from a knowledge base, generating a document from a template, running a linear sequence with no branching — a single well-built agent handles all of that, and reaching for sub-agents just to keep things tidy or to mirror your org chart is perfectionism. It's the same mistake as premature abstraction, and it costs you real latency and real tokens for the privilege.

Multi-agents earn their keep when one of the following is true:

You need to independently develop the agents (different teams are handling different agents).
You will use different models for different parts of the system.
You want to explicitly limit the sharing of context between different parts of the system.

For example, keeping with the HR agent theme, let’s say that we want to allow users to ask questions about our policies (an example of such an agent). The problem here is that we may have a lot of policies, and even when we limit ourselves to the right one, that is a lot of text.

Shoving all of the work of finding the right policy and extracting the specific elements to match the user’s question into an isolated agent can massively reduce the number of tokens that your agents will burn.

Summary

RavenDB’s multi-agents aren’t about orchestrating a robot army or an independent swarm of self-coordinating agents. It is far more prosaic than that. You're composing independent components with clear boundaries and letting each one own a scoped conversation, its own tools, and the model that fits its job.

That's not a new idea we invented for the age of AI. It’s bringing back the notion of independent components that are greater than the sum of their parts, in a way that is manageable, consistent, and hassle-free.

If you want to try it, grab a free Developer license or spin up a free Cloud database, and the multi-agent guide walks through the demo end to end.

Jul 03 2026

The best code is the one you shift+delete

time to read 7 min | 1391 words

0 comments

Tags:

Everyone talking about coding models fixates on the same number: how fast the thing generates code. This misses the point by a lot. The story isn't about how fast the model writes code I would have written anyway.

It's that the model lets me do things that I might have done before but were expensive enough that I didn’t bother. I had three separate interactions this week that led to this blog post.

We had a production problem on an instance and no clear idea what was going on. What we did have was the log: something like 25-30 MB of compressed text describing everything that happened. And the actual problem wasn't spotting an error: finding errors is easy. The problem was correlation. We needed to line up different events across the timeline and understand how they were related.

In the past, I would have to trawl through the log and hope that something would pop up. These days, we can try handing the whole thing to the model and let it figure it out. If the log file wasn’t that big, it might even work. At dozens of MB, it doesn’t work (and it is quite expensive to try).

I went the other way. I told the model: “Write me a script that looks at the structure of this log (I gave it the first ten rows). I want the script to extract and aggregate the parts I care about, and render the result in a nice table to make it easier to understand.”

I had the view in under a minute, then I could explore the log and iterate:

“Oh, I see that there are a lot of indexes. How many of them are for the same database?”
“Give me a histogram of index changes and their versions over time.”

The model wrote some code, produced a view, and I looked at it. Rinse & repeat until I had a pretty good idea what was going on.

The customer had several different versions of their application, each with its own set of indexes, and they kept overwriting one another, leading to a huge amount of indexing overhead. RavenDB actually has a dedicated feature for that scenario.

Here's the part that matters: I never read the code the model wrote. The moment the investigation was done, I threw all of it away. It's throwaway code whose entire purpose was to help me see, and once I had seen enough, I discarded it.

Without the model to write this code, I could have written it myself, but it is enough of a chore that it probably wouldn’t make sense. Doing that manually would have taken roughly the same amount of time.

The second interaction is the opposite kind of work. I'm doing a fairly significant refactor of how a particular query executes in Corax, and that code is going into the product and staying there for a decade or two.

Here, the model writes and I drive. I tell it the overall direction, it goes somewhere, and then I decide if I like the result. I find it genuinely easier to react to something than to produce it from a blank page — having a first draft to push against is faster than writing it all myself. Nevertheless, this is my code. I went over every single line, and I know exactly what's in there.

That last part takes real discipline, and it's worth being honest about why. When you're in the zone chasing a change (try something out, revert, try something else, etc.), it is very easy to surface a few hours later staring at two thousand lines of changes you never actually wrote. You went through a dozen iterations, and somewhere in there the code stopped being something you authored and became something that merely happened. Guarding against that is really important, because otherwise that isn’t your code.

How do I make sure it's still mine? I lean on tests, of course — regression tests to prove I didn't break the old behavior, and new tests built alongside the change to pin down the new behavior. That's the baseline for anything long-lived.

The technique I found most useful for confirming that the change is really mine is a little unusual. I had it build a harness that runs a set of scenarios against both the old version and the new one. It's a small app that issues queries and operations to the database and visualizes the results.

Here is what this looked like:

You can see that I have a bunch of scenarios that I’m testing, and it is very easy for me to track progress and know where I need to pay attention. The actual app had a lot more capabilities: what got faster, what got slower, the ranges, the memory used, everything and the kitchen sink went into that, in a format that made sense for the sort of work I was doing.

Each time I had a new direction, it was either driven by this application or I asked the model to add it to the application, so I could keep working on it. I kept working until nothing in the new version was slower than the old, and the headline paths were dramatically faster.

As an example of what this looked like most of the time, I ran a query, and then I inspected the structure we got back. Here is what some of that looked like:

And as I went, I kept changing the harness itself — show me this instead, group it that way. Trivial to do, because the harness is also throwaway. I'm not carrying it forward. I don't care about its code quality. I never even looked at its code. It exists to make a point, and once it's made the point, it's gone.

I also used the model to add introspection hooks and visibility into what was going on inside the system, surfacing stuff that you would usually have to scratch your head and debug to understand. That meant that I was able to look at a problematic query, then just look at its query plan and the timing in it. I usually knew where I needed to pay attention from there.

To be honest, that part feels a lot like cheating.

In the cases of the log analyzer and the comparison harness, the code is literally disposable. It’s scaffolding that would be thrown away after the work is done. I didn’t pay any attention to that code (I never read it), and it was never meant to be useful for anything else.

In the case of the production code, I went over each line of code so many times, I dreamt of it. A lot of the code there consists of annoying building blocks (building a visualization of the query plan as a graph, for example), which were sped up enormously by asking the model to build it for us. A lot of other code there is hand-crafted to say exactly what I needed it to.

But the fact that I can get good scaffolding from the model for cheap changes a lot of the usual considerations. Because scaffolding is literally disposable code, I don’t have to worry about the usual code quality concerns. The log analyzer would probably take two or three hours to write (without the pretty graphics, which were helpful for easily identifying what was going on).

The comparison harness would be multiple weeks of effort and would probably be a non-interactive ASCII table. In fact, I don’t need to guess. Scaffolding isn’t something that is new, I do that all the time. Here is an example of one, written about a decade ago:

In the image above, you can find the internal structure of a B+Tree inside RavenDB. Contrast that with the following scaffolding for query plans. That one, by the way, is actually staying in the product.

Compare that to the level of insight that you can derive from the query plan higher up in this post. The B+Tree scaffolding, by the way, is essential to understanding the more complex scenarios. It paid for the time it took to write it many times over.

The ability to now effectively do the same at very little cost means that the act of building software itself is now easier. Not because someone else is writing the core code, but because everything else that we need to do is also easier.

Jun 19 2026

"Optimizing" concurrent regexes

time to read 6 min | 1131 words

3 comments

Tags:

I am looking into some regex work, and I ran into a performance problem. I need to run a particular regex over a large number (millions) of strings. That caused my spidey sense to… tingle.

The code in question looked something like this:

long matches = CountMatchingEntries(new Regex(@"\s+user id\s+"));

Creating a regex for each invocation is… expensive. That is why we have the RegexOptions.Compiledflag, after all. And the Regex class is thread-safe, so I did the equivalent of this code:

// class level
static Regex s_regex = new Regex(@"\s+user id\s+", RegexOptions.Compiled);
// inside a method
long matches = CountMatchingEntries(s_regex);

The performance of the system immediately took a big, stinking performance regression all over my benchmarks. At first, I was sure that I wasn’t doing something properly, so I re-wrote this using the modern approach, with source generators, like this:

public partial class MyClass
{


    [GeneratedRegex(@"\s+user id\s+", 
        RegexOptions.Compiled | RegexOptions.CultureInvariant |
        RegexOptions.IgnoreCase)]
     public static partial Regex UserIdRegex();
}

That had the exact same behavior, a major performance regression.

We are talking about this making the code six times slower. That is a huge cost for something that every fiber in my being tells me should be faster. As I started digging into things, I managed to reproduce this in an isolated manner.

The following benchmark shows the core of the problem:

using System.Diagnostics;
using System.Text.RegularExpressions;


var lines = Enumerable.Range(0, 100_000).Select(i => $"The user id #{i}").ToArray();


var regex = new Regex(@"\s+user id\s+", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
// This is fast
Exec(() => lines.Count(l => regex.IsMatch(l)));
// This is slow
Exec(() => lines.AsParallel().Count(l => regex.IsMatch(l)));


static void Exec(Func<int> run)
{
    var before = GC.GetTotalAllocatedBytes();
    var sw = Stopwatch.StartNew();
    //var count = lines.AsParallel().Count(l => MyClass.UserIdRegex().IsMatch(l));
    var count = run();
    sw.Stop();
    var after = GC.GetTotalAllocatedBytes();
    Console.WriteLine($"Count: {count}, Time: {sw.ElapsedMilliseconds} ms, Memory: {after - before} bytes");
}

I create the sameRegex instance and run it 100,000 times. The first time, I do that using a single thread, and the second time I’m using multiple threads.

The only difference between those runs is the addition of AsParallel() to force it to use multiple threads. My code isn’t actually using this or explicit threads, but it is using a single cached Regex instance in a web environment, so under load, it is being used concurrently.

What is going on here? The parallel code is much slower because it allocates. It turns out that deep in the guts of .NET’s Regex engine we have this block of code:

RegexRunner runner = Interlocked.Exchange(ref _runner, null) ?? CreateRunner();
try
{
    // do work
}
finally
{
   _runner = runner;
}

In other words, the actual RegexRunner needs to keep some mutable state, which doesn’t play nicely with threads. In order to maintain itself under concurrent usage, the Regex “checks out” the instance when it is being used, so it can be the sole owner.

Other callers on the same instance at the same time will allocate a new runner instance. That is what is causing the massive slowdown. If we are using the Regex instance in a single-threaded manner, the code will check it in & out as needed, with zero allocations and quite fast.

If you are using that from concurrent code, you’ll allocate like crazy and can expect your performance to drop by as much as five times.

I created an issue for that, since I believe this is quite a tripping hazard in terms of performance.

The current fix that I found was to use ThreadLocal<Regex> for this, ensuring that there is no actual concurrent usage, at the cost of higher memory usage and repeated initializations.

Jun 16 2026

Non obvious optimization with divide and conquer

time to read 8 min | 1546 words

0 comments

Tags:

Deep in the heart of Corax (RavenDB’s querying engine), everything deals with something called a Posting List. Posting Lists are a way for the engine to say “all of those documents have the term Fast for the field Speed”. Conceptually, a Posting List is an ordered set of document ids. In this case (and this is important, the ids in question are numeric, not RavenDB’s document id, which is a string).

An interesting problem with Posting Lists is that a term can be unique (such as a GUID) - only a single document will ever have this value. They can have a small set of values (for example, CreationDate, where only the items created on that day will share the same value). Or they can have many values (for example, Status = ‘Shipped’ for Orders).

The reason those details matter is that we can deal with the three (very) different modes in a distinct manner to reduce the cost of storage that Corax uses for Posting Lists. Internally, Corax has the notion of a PostingListId, which is just a 64-bit number. We use the least significant 2 bits to tell us what kind of Posting List this is.

Typical code to consume this looks like this:

while ((read = provider.FillPostingListIds(plIds)) > 0)
{
    for (int i = 0; i < read; i++)
    {
        var postingListId = plIds[i];
        var termType = (TermIdMask)postingListId & TermIdMask.EnsureIsSingleMask;


        switch (termType)
        {
            case TermIdMask.Single:
            {
                // Accumulate the single decoded entry ID into the entryBuffer
                // Flush to bitmap if the buffer is full
                break;
            }


            case TermIdMask.SmallPostingList:
            {
                // Decode inline via FastPForBufferedReader into the entryBuffer
                // Flush to bitmap if the buffer is full
                break;
            }


            case TermIdMask.PostingList:
            {
                // Flush any accumulated entries from the buffer first
                // Iterate through the full large posting list via FillFromPostings
                break;
            }


            default:
                throw new InvalidOperationException($"Unknown TermIdMask type: {termType}");
        }
    }
}

The code here iterates through a list of Posting List ids, handling each one of the options. This is part of handling queries such as: where CreatedAt > $lastQuarter. We have to get all the distinct dates in the range, and for each date, we need to get all the matching documents for it.

The code here is pretty simple, right? It is fairly obvious what it does, but it has a pretty big problem. It processes each one of the Posting Lists manually & separately. There are two problems with this approach:

We end up doing a lot of branching, based on which Posting List type we are processing.
We lose the ability to handle the Posting Lists in bulk.

Because we use the lowest two bits to store the type of the Posting List, we can do better. The insight behind the new approach is that (id & EnsureIsSingleMask) naturally yields 0, 1, or 2 — a perfect index into an array of buckets. Instead of a switch statement, we partition the batch in one pass with no per-item branches:

var buckets = new List<long>[3];
while ((read = provider.FillPostingListIds(plIds)) > 0)
{
   for (int i = 0; i < read; i++)
   {
       buckets[(int)(plIds[i] & 0b11)].AddUnsafe(pid);
   }
}

Of course, you’ll note that we aren’t actually processing those, just putting them in buckets. The fact that we are able to split them into groups in a branchless manner is a fun optimization task, but it doesn’t matter as much as the next stage… we can now deal with a list of Posting Lists that are already divided by type.

For example, for the list holding only a single unique value, I can run the following:

var singlesSpan = CollectionMarshal.AsSpan(buckets[0]);
EntryIdEncodings.DecodeAndDiscardFrequency(singlesSpan, singlesSpan.Length);
var singlesLen = Sorting.SortAndRemoveDuplicates(singlesSpan);
bitmap.AddRange(singlesSpan[..singlesLen]);

This is great, because now I can process the entire list using SIMD in a highly efficient manner. But things get better when we look at the other lists. In the case of the small Posting Lists, for example, the ids that I’m reading are not the actual values, but point to where the values actually reside.

This gives me the chance to actually load them from disk in an optimal, batched manner, like so:

var smallsSpan = buckets[1].ToSpan();
EntryIdEncodings.DecodeAndDiscardFrequency(smallsSpan, smallsSpan.Length);
var smallLen = Sorting.SortAndRemoveDuplicates(smallsSpan);
Container.GetAll(llt, smallsSpan[..smallLen], containerItems.ToSpan(), long.MinValue, pageLocator);


for (int i = 0; i < smallLen; i++)
{
   var item = containerItems[i];
   // read the small posting list and deal with it
}

The key here is the Container.GetAll() call, which takes the list of unique Postings List locations and loads them in an optimized manner. In the previous code style, that was actually pretty hard to do. In this manner, it falls naturally from the way we write the code.

There are also advantages to the fact that we run tight loops with the same code, instead of branching for each type. We also ended up with less code overall, which is also nice.

May 05 2026

Learning to code, 1990s vs 2026

time to read 7 min | 1326 words

5 comments

Tags:

I still remember the bookstore. I was holding a 600-page brick of a book on how to build Windows applications, trying to convince my mother that I really needed it. This was 1994 or 1995. A book was how you learned to program at that time. You took it home, you read it cover to cover, you typed the examples by hand, and somewhere along the way, the ideas sank in.

From there, the tools for learning kept evolving. Printed books gave way to CD-ROMs and then to online documentation. Then came the explosion of blogs and RSS feeds. I started this blog at that time, and I still consider that era to be one of the best ones in terms of having amazing access to smart and knowledgeable people, freely sharing their insights and experiences.

Google killed Google Reader (yes, I am still angry about that) and a lot of the new people learned via Stack Overflow. The world entered a strange equilibrium that lasted, honestly, more than a decade. If you learned to code any time between roughly 2010 and 2022, you probably learned through some combination of Google, Stack Overflow, and maybe YouTube.

Then the floor moved again. First it was ChatGPT, where you copy-pasted code back and forth. Then the models were integrated into the IDE. Now, with Claude Code and Codex, it is something else entirely: an agent that just runs, makes decisions, and does the thing.

The arc is striking when you lay it out. You used to have to go to a physical library, pick up a physical book, read it, digest it, and think about it. Today, the prevailing message to a new developer is essentially: you do not need to know any of that. Just describe what you want, and it happens.

Hidden costs for reduced conceptual depth

This shift is not just about convenience. It changes the depth of knowledge a developer carries, and that has consequences. Here is the example I keep coming back to. Imagine you ask a developer to show you a website that they built.

If you asked that in the late nineties, it meant something. To do that, you had to purchase a domain. Understand DNS well enough to wire it up correctly. Set up a web server, which meant getting Apache to actually run. Successfully configure PHP and deploy scripts to production.

By the time you could point to a working URL, you had to touch every layer of the stack. There was no other choice. Therefore, you were at least passingly familiar with a lot more than you would be today.

Ask that same question of many developers today, and the answer is a Vercel subdomain. That is not a dig at Vercel, mind you - it is a great product, and abstraction is the whole point. But some of these developers genuinely do not know what DNS is. They do not know what is running on the server versus the client. They do not know that there is even a meaningful distinction. And we have seen real security incidents come out of exactly that gap — secrets leaking into client bundles, auth logic running where it should not, and CORS misconfigurations that nobody understood well enough to notice.

Now extend that same dynamic one more step. Take the cohort of developers who will learn to program primarily through this new generation of agentic tools. The abstraction is no longer just over DNS or deployment. It is over the act of writing the code itself.

What is the role of a junior developer now?

I think we are going to end up with a genuinely different type of engineer and, as a result, a genuinely different type of system.

“If men learn this, it will implant forgetfulness in their souls; they will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks”.

Plato, Phaedrus (c. 429-347 BCE)

Every generation has been accused of being softer than the previous generation, as the quote above can testify. In this case, Plato is decrying writing as a corrupting influence on youth who no longer bother to just remember things.

Without the attribution, I don’t think you would have realized that this isn’t me talking about developers utilizing coding agents instead of learning on their own.

In software, we see much the same pattern. The person who wrote assembly looked down on the C programmer. The C programmer looked down on the Java programmer. The Java programmer looked down on the person gluing libraries together in Python. Each step up the abstraction ladder lets people build bigger, more ambitious things with less effort. That is mostly good.

But there is a real asymmetry this time. The earlier steps abstracted away mechanical work — memory management, boilerplate, deployment plumbing. This step abstracts away the reasoning itself. And reasoning is what you need when the abstraction leaks, which it always eventually does.

The question I am actually struggling with, day to day, is much more practical: how do I evaluate a junior developer in this sort of world?

The classic move was a take-home task. Build a small feature. Show me your thinking. The problem is that a capable model will produce a perfectly clean solution to any reasonable take-home in a few minutes. What you see in the submission tells you almost nothing about what the candidate actually understands. It tells you they can prompt well, which is a real skill, but it is not the skill I am trying to measure.

I can also ask them to solve a task while they are in our offices, so I can verify no AI use. But that is also stupid; I want them to use AI. After all, that is a great productivity enhancer. So I need a way to test understanding, not just the output.

The signals I care about are the ones that are hardest to fake in an agent-assisted world. Can you debug something when the model is wrong? Can you explain why a piece of generated code is subtly unsafe, or slow, or wrong in a way that only matters at the hundredth user? Can you make a reasoned call about which abstraction to reach for and which one to reject? When the system behaves unexpectedly, do you know where to look?

At the same time, those aren’t usually qualities that you can look for in a junior developer. Having those qualities usually means that they aren’t junior anymore.

People used to train on LeetCode tests as a way to show how good they were in interviews. That was a good stand-in to see what they knew and understood. What is the next stage here?

What does a junior do to exercise their skills and show that they can bring value to the team? I don’t know if I have good answers to those questions. But that is something we, as an industry, need to consider carefully.

I do not want to be the old man yelling at the cloud. The tools are genuinely great, and refusing to use them is its own kind of malpractice. AI coding agents can make you meaningfully more productive.

But when I talk to developers just starting out, the thing I keep pushing is this: use the tools, and also, on a regular basis, go down a layer. Set up a server yourself. Deploy something without a platform holding your hand. Read the DNS records. Look at what your framework is actually generating. Write something in a language without a package manager that hides the sharp edges.

Not because you will do it that way at work. But because the next time something breaks in a way the agent cannot fix you will have a mental model to fall back on. You will know where the seams are. You will know what to look at.

That mental model is, I suspect, going to be the thing that separates the engineers who compound over a career from the ones who get stuck the first time the abstraction leaks.

May 01 2026

The GPU Is the New Bangalore

time to read 5 min | 817 words

0 comments

Tags:

In the 2000s, the hottest move in software was offshoring. You'd ship your requirements to a development shop in India, Vietnam, or Bangladesh, pay a fraction of Western developer rates, and wait. The cost savings were real, every spreadsheet said so. The failure modes were also real, every CTO said so.

Even assuming that the teams working on your code were smart, motivated, and hardworking, the distance, communication overhead, the time zone mismatch, and misaligned incentives created a brutal set of constraints. If you wanted to get good results from offshoring, you needed to be able to clearly specify what you wanted and be good at validating that you got what you expected.

You couldn't just say "I need a login system." You had to write detailed specs, break work into reviewable chunks, define acceptance criteria, and actually read the code that came back. Not rubber-stamp it. Read it, make sure that it passed muster and could be accepted internally, because the delta between "looks right" and "is right" could cost you six months of production incidents.

Sound familiar? Today, instead of shipping my requirements to a dev shop overseas, I'm shipping them to a GPU somewhere. I get something back. It looks like code. It might be code. It might be a very convincing facsimile of code that will quietly fail in production under load. I genuinely don't know until I sit down and read it carefully.

The same discipline that separated successful offshore engagements from expensive disasters applies here as well:

Specification quality determines output quality. Vague prompts return vague code. The ability to articulate exactly what you want — at the right level of abstraction — is now a core engineering skill.
Validation is non-negotiable. "It passed the vibe check" is not a code review. The reviewer needs to understand what the code is doing and why, not just that it compiles and the tests are green.
Iterative delivery beats big-bang delivery. Nobody who survived offshoring tried to outsource an entire product in one shot. You stage it. You review at each stage. You course-correct before mistakes compound.

The Bottleneck Has Moved

Here's what I think is the deeper shift: for most of software history, the bottleneck was writing the code. That took time and required expensive humans. So the industry optimized heavily around it, better editors, better frameworks, and better abstractions. All in service of making the act of writing code faster and less error-prone.

That bottleneck is collapsing. What once took six months might take six hours. When the cost of implementation approaches zero, the bottleneck moves upstream: to design, specification, and verification. The expensive parts are now:

Understanding the problem clearly enough to describe it precisely.
Decomposing it into well-scoped, independently verifiable pieces.
Reviewing what comes back and actually understanding it.

These are skills we largely deprioritized during the era when coding itself was the hard part. They're about to become the most valuable things a technical person can do.

A lot of that used to be done “along the way” when you wrote the code. You would explore the problem and gain depth of understanding as you wrote the code. Now that just doesn’t happen, but you still need to do that work explicitly.

A note about the importance of proper architecture

There is this idea that the path to building big systems with AI is to spin up a swarm of specialized agents (a frontend agent, a backend agent, a database administrator agent, etc.) and somehow orchestrate them into a coherent product.

I find this baffling, because we already have a well-established protocol for coordinating the work of specialized, partially independent contributors on a complex system. It's called software design.

Module boundaries. Interface contracts. Separation of concerns. Dependency management. SOLID principles and more. These patterns exist precisely because complex systems built by multiple contributors without clear interfaces turn into unmaintainable messes. This is true whether those contributors are humans, offshore teams, or language models.

The instinct to throw orchestration complexity at a coordination problem is exactly backwards. The answer isn't a smarter message bus between your agents. The answer is better system design that minimizes how much the pieces need to talk to each other in the first place.

We have literally decades of experience in how to build large software systems (and thousands of years of experience in how to handle large projects in general). There isn’t anything inherently new here to deal with.

The developers who will thrive in this environment aren't necessarily the ones who write the most elegant code. They're the ones who can hold a complex system design in their head and communicate it clearly, break the work into well-specified, verifiable increments, and actually read the code that comes back and hold it to a real standard of quality.

These are, in large part, the same skills that made the best engineering leads effective during the offshoring era. The context has changed completely. The discipline hasn't.

The GPU is the new Bangalore. Time to dust off the playbook.

Apr 29 2026

Putting Claude up against our test suite

time to read 5 min | 981 words

0 comments

Tags:

I’m convinced that in hell, there is a special place dedicated to making engineers fix flaky tests.

Not broken tests. Not tests covering a real bug. Flaky tests. Tests that pass 999 times out of 1000 and fail on the 1,000th run for no reason you can explain with a clean conscience.

If you've ever shipped a reasonably complex distributed system, you know exactly what I'm talking about. RavenDB has, at last count, over 32,000 tests that are run continuously on our CI infrastructure. I just checked, and in the past month, we’ve had hundreds of full test runs.

That is actually a problem for our scenario, because with that many tests and that many runs, the law of large numbers starts to apply. Assuming we have tests that have 99.999% reliability, that means that 1 out of every 100,000 test runs may fail. We run tens of millions of those tests in a month.

In a given week, something between ten and twenty of those tests will fail. Given the number of test runs, that is a good number in percentage terms. But each such failure means that we have to investigate it.

Those test failures are expensive. Every ticket is a developer staring at logs, trying to figure out whether this is a genuine bug in the product, a bug in the test itself, or something broken in the environment. In almost all cases, the problem is with the test itself, but we have to investigate.

A test that consistently fails is easy to fix. A test that occasionally fails is the worst.

With a flaky test, you don't just fix something and move on. You spend two days isolating it. Reproducing it. Building a mental model of a race condition that only manifests under specific timing, load, and cosmic alignment.

The tests that do this are almost always the integration tests. The ones that test complex distributed behavior across many parts of the system simultaneously. By definition, they are also the hardest to reason about.

The fact that, in most cases, those test failures add nothing to the product (i.e., they didn’t actually discover a real bug) is just crushed glass on top of the sewer smoothie. You spend a lot of time trying to find and fix the issue, and there is no real value except that the test now consistently passes.

We have a script that runs weekly, collects all test failures, and dumps them into our issue tracker. This is routine maintenance hygiene, to make sure we stay in good shape.

I was looking at the issue tracker when the script ran, and the entire screen lit up with new issues.

Just looking at that list of new annoyances was enough to ruin my mood.

And then, without much deliberate planning, I did something dumb and impulsive: I copy-pasted all of those fresh issues into Claude and told it to fix them. Then I went and did other things. I had very low expectations about this, but there was not much to lose.

A few hours later, I got a notification about a pull request. To be honest, I expected Claude to mark the flaky tests as skipped, or remove the assertions to make them pass.

I got an actual pull request, with real fixes, to my shock. Some of them were fixes applied to test logic. Some were actually fixes in the underlying code.

And then there was this one that stopped me cold. Claude had identified that in one of our test cases, we were waiting on the wrong resource. Not wrong in an obvious way — wrong in the kind of way that works perfectly 99.9998% of the time and silently fails 0.0002% of the time.

The (test) code looked right. We were waiting for something to happen; we just happened to wait on the wrong thing, and usually the value we asserted on was already set by the time we were done waiting.

Claude found it. In one pass. For the price of a subscription I was already paying. For reference, that single “let me throw Claude at it” decision probably saved enough engineering time to cover the cost of Claude for the entire team for that month.

Let me be precise about what happened and what didn't. Claude did not fix everything. Some of the "fixes" it produced were pretty bad, surface-level patches that didn't address the real cause, or things that were legitimately out of scope.

You still need an engineer reviewing the output. And you still need judgment.

But it got things fixed, quickly, without needing two days to context-switch into the problem space. And the things it did fix well, it fixed really well.

The work it compressed would have realistically taken one developer a week or two to grind through — and that's assuming you could get a developer to focus on it for that long in the first place. Flaky test investigation is the kind of work that quietly kills team morale.

Engineers start dreading CI. They start treating red builds as background noise. That's how quality degrades silently. Leaving aside new features or higher velocity, being able to offload the most annoying parts of the job to a machine to do is… wow.

Based on this, we're building this into our actual workflow as an integral part of how we handle test maintenance. Failures are collected, routed to Claude, and it takes a first pass at triage and repair. Then we create an issue in the bug tracker with either an actual fix or a summary of Claude’s findings.

By the time a human reviews this, significant progress has already been made.

It doesn't replace the engineer. But it means the engineer is doing the interesting part of the work: judgment, review, architectural reasoning. Skipping the part that requires staring at race condition logs until your vision blurs.

This isn’t the most exciting aspect of using a coding agent, I’m aware. But it may be one of the best aspects in terms of quality of life.

Apr 27 2026

15+ years of working with coding agents

time to read 6 min | 1198 words

0 comments

Tags:

No, the title is not a mistake, nor did I use my time travel pass to give you insights from the future. Bear with me for a moment while I explain my thinking.

From individual contributor to oversight role

I started writing RavenDB in a spare bedroom, which turned into an office. The project grew from a sparkle in my head that wouldn’t let me sleep into a major project in very short order.

Today, I want to talk about a pretty important stage that happened during that growth phase. Somewhere between having five and ten full-time developers working on RavenDB, I lost the ability to keep track of every single line of code that was going into the project.

I had been the primary developer for years at this point, I wrote the majority of the code, and I was the person making all the key decisions in the project. And then, gradually, I… wasn't that guy anymore.

There were too many moving parts, too many developers, too many decisions happening in parallel for me to have my hands on all of it. That was the whole point of growing the team, dividing the tasks among the team members, and getting good people to do things so I didn’t have to do it all myself.

What I didn't expect was how much it would bother me. Moving from being the primary developer to a supervisory role didn’t mean that I lost the ability to write code. In fact, in many cases, I could “see” what the solution for each issue should be.

I just didn’t have the time to do that, nor the capacity to sit with every single developer on every single issue and craft the right way to solve it. I'd hand a feature to a developer knowing that the way they were going to handle it would not be mine.

That doesn’t mean it would be wrong, but it wouldn’t be the same. It might need a review cycle or two to get to the right level for the product, or they wouldn’t consider how it fits into the grand scheme of things, etc.

And let’s not talk about the time estimates I got. I’m willing to assume that my personal timing estimates are highly subjective and influenced by my deep familiarity with the codebase.

But still. Multiple days for something that felt like it should be a two-hour job was hard to sit with.

I carried around a background level of frustration for quite some time. It killed me that the pace of development wasn’t up to what I wanted it to be. “If I could just have the time to sit and write this”, I kept thinking, “we would be done by the end of the week.”

There was progress, to be clear, but nothing was moving fast enough. Everywhere I looked, we had stalled.

And then something happened. It didn’t happen all at once, but in the space of a month or two, features started to land. Each team had been heads-down on something for quite a while, and by some coincidence of timing, they all finished around the same time.

Suddenly, we moved from “we have nothing to ship” to “we can’t have so many new features all at once”. I realized that I would be able to ship things faster, for sure. I could do two new features, maybe even three, in that same time frame. That would require head-down coding for the entire duration, of course.

Reading that last paragraph again, I have to admit that I may be letting some hubris color my perception 🤷😏.

I wouldn’t be able to deliver the sheer quantity of features that the team was able to deliver.

What had felt like months of stagnation turned out to be parallelism in action.

Yes, some of the code wasn't the same code that I would write. And some of the architectural decisions weren't the ones I'd have made. That didn’t make them wrong, mind. And those developers were working on things I was not working on. And the sum total of what got built was something I could never have done solo.

Treating coding agents as junior developers?

I think about that experience constantly now, because I'm living a version of it again, except the new team member is Claude. Working with AI coding agents today feels remarkably like working with a junior developer who is also a savant.

They've read everything. They know an enormous amount. They can produce working code quickly and confidently across a staggering range of domains. And yet they're also genuinely ignorant in ways that will surprise you: missing context, misreading intent, optimizing for the wrong thing, occasionally producing something that is confidently and completely broken.

This is not a criticism. This is just what it's like. And I've dealt with this before. There are clear parallels between mentoring junior engineers and looking at the output from an AI agent.

There is an assumption that you need to get perfect output from a coding agent. But you are not likely to get perfect output from a human developer. Even experienced developers benefit greatly from reviews, guidance, etc. Junior developers need more of that, of course, but they can still bring value, even if their output goes through several iterations.

For coding agents to bring real value, you need to consider them in the same light.

The shift that happened with my developer team is the same shift that's happening now with AI agents.

Instead of writing every line yourself, you start spending time on the bigger picture: here's the overall direction, here's the architectural constraint, here's what done looks like. Then you review the outputs.

Talking to a coding agent is a little different from discussing a feature with a dev and reviewing their code days later, except that the agent delivers the output in the time it takes to get coffee.

The fact that this cycle is done in a short amount of time means that you still have all the knowledge in your head. You can catch drift before it becomes technical debt.

The cost of going in the wrong direction is greatly reduced, which means that you can be far more radical about how you approach these tasks.

Unnatural impulses as a developer

I wonder if a lot of developers are facing challenges in this area specifically because they don’t have the managerial experience needed for this new aspect of the work.

I have been writing code with Claude recently. And the short feedback cycle means that I’m loving it. I'm not abdicating the technical judgment, mind. I'm applying it differently.

I'm writing the high-level design, not the implementation. I'm doing the review, not the first draft. And I'm being honest with myself that the output, while it isn’t always what I would write, is covering ground I simply would not have covered otherwise.

I have been doing this for a long time and it feels quite natural. I also remember that this was a difficult transition for me at the time.

For those who want to better understand how they can get the most value from coding agents, you are probably better off looking into project management theory rather than optimizing your agents.md file.

Apr 23 2026

Expertise in the age of AI, or: Matt's Claude'll handle this

time to read 4 min | 650 words

0 comments

Tags:

One of our team leads has been working on a major feature using Claude Code. He's been at it for a few days and is nearly done. To put that in context: this feature would normally represent about a month of a senior developer's time.

He did the backend work himself — working with Claude to build it out, applying his knowledge of how the system should behave, reviewing, adjusting, and iterating. He handled only the backend, and when I asked him about the frontend, he said: "I'm going to let Matt’s Claude handle that."

Context: Matt is the frontend team lead.

Note the interesting phrasing. He didn't say "I'll do the UI later" or "Claude’ll handle the UI." He deferred to the frontend lead who has the domain expertise to drive that part.

That's not a throwaway comment. That's an important statement about how work should be divided in the age of AI agents.

Here's the thing: I've told Claude to build a UI for a feature, pointed it at the codebase, and it figured out how the frontend is structured, what patterns we use, and generated something I could work with. It wasn’t a sketch or a wireframe diagram, it was actually usable.

I got a functional UI from Claude in less time than it would take to write up the issue describing what I want.

That UI was enough for me to explore the feature, do a small demo, etc. I’m not a frontend guy, and I didn’t even look at the code, but I assume that the output probably matched the rest of our frontend code.

We won’t be using the UI Claude generated for me, though. The gap in polish between what I got and what a real frontend developer produces is enormous. I got something I could play with, but it was very evident that it wasn’t something that had received real attention.

For the time being, it was more than sufficient. The problem is that even leaning heavily on AI, the investment of time for me to do it right would be significant. I'd need to understand our frontend architecture, our conventions, our component library, how state flows, and what our designers expect. All of that would take real time, even with an AI doing most of the code generation.

That is leaving aside the things that I don’t know about frontend that I wouldn’t even realize I need to handle. I wouldn’t even know what to ask the AI about, even if it could do the right thing if I sent it the right prompt.

Contrast that with the frontend team. They know the architecture of the frontend, of course, and they know how things should slot together and what concerns they should address. They know when Claude's suggestion is on the right track and when it's going to create a mess three layers down. Effectively, they know the magic incantation that the agent needs in order to do the right thing.

What does this say about AI usage in general? Given two people with the same access to a smart coding agent like Claude or Codex, both performing the same task, their domain knowledge will lead to very different results. In other words, it means that Claude and its equivalents are tools. And the wielder of the tool has a huge impact on the end result.

The role of expertise hasn't diminished. It's shifted. The expert is no longer the person who can produce the artifact. They're the person who can direct the production of the artifact correctly and efficiently. That's a different skill profile, but it's no less valuable and the leverage is higher.

We're still figuring out what this means structurally. But the instinct to say "that's not my domain, let the person who knows it handle the AI that does it" is correct. Domain knowledge determines the quality of the output, even when the AI is doing all the typing.

Oren Eini

Oren Eini

CEO of RavenDB

The cost of a free feature

Making use of features that are already there…

Fifteen years later

How we dealt with it

Hyrum's Law sends its regards

The actual lesson

Multi-Agents in RavenDB

Where one agent starts to hurt

Multi-agents are just agents that talk to each other

When you should not do this

Summary

The best code is the one you shift+delete

"Optimizing" concurrent regexes

Non obvious optimization with divide and conquer

Learning to code, 1990s vs 2026

Hidden costs for reduced conceptual depth

What is the role of a junior developer now?

The GPU Is the New Bangalore

The Bottleneck Has Moved

A note about the importance of proper architecture

Putting Claude up against our test suite

15+ years of working with coding agents

From individual contributor to oversight role

Treating coding agents as junior developers?

Unnatural impulses as a developer

Expertise in the age of AI, or: Matt's Claude'll handle this

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed