Keeping AI-Generated Code Under Control with Complexity Limits
Posted by Doug Haber on 2026-04-03
There are a number of ways to improve the quality of code generated by AI agents. One relatively simple technique that I haven't seen widely discussed is enforcing complexity limits.
AI-generated code often becomes unwieldy. Most models will happily produce large functions that are difficult to follow and maintain. Complexity limits are a straightforward way to rein this in. Smaller functions are easier to parse and understand, which may also help agents work more quickly and accurately within an existing codebase.
There are many competing measures of code complexity, and their usefulness varies quite a bit. A few will be discussed below, but in practice I've found two of the most effective and widely usable metrics to be cyclomatic complexity (also known as McCabe complexity) and lines-of-code (LOC) limits, when tooling supports them.
Where Large Functions Make Sense
Before getting into the metrics themselves, it's worth acknowledging an important caveat: large functions are not always bad.
In some situations, particularly optimized code, splitting a function into smaller pieces can actually make things worse. Sometimes a larger function is easier to maintain because the full flow of logic stays in one place.
LLMs often handle large functions better than humans when performing edits. Since they process the entire context at once, they do not experience the same cognitive overhead developers do when scrolling through a large block of code.
Even so, large functions frequently accumulate hidden complexity over time. They tend to grow in scope, introduce subtle side effects, and become harder to reason about. In practice, giant functions usually start reasonable and then slowly grow into something far more complicated than intended.
Complexity limits are not about forbidding large functions entirely. They create guardrails so complexity does not creep in unnoticed.
Cyclomatic Complexity
Cyclomatic complexity measures the number of linearly independent paths through a function, which in practice amounts to counting its branch points plus one. It is widely supported by linters and static analysis tools. It is not perfect, but the ecosystem support alone makes it practical.
One advantage of cyclomatic complexity over other metrics is that it is relatively easy to estimate at a glance. You can often look at a function and roughly predict its score just by counting branches. Agentic coding systems also tend to understand what cyclomatic complexity is. If they are told a limit in advance, they can often stay within that limit on the first pass, which reduces rework.
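To make the glance-estimation concrete, here is a small invented function (the names and conditions are hypothetical, chosen only for illustration) annotated with the usual counting rule: start at 1 and add 1 for each decision point.

```python
def shipping_cost(weight_kg, express, international):  # base complexity: 1
    if weight_kg > 50:       # +1
        return 99.0
    if express:              # +1
        return 25.0
    if international:        # +1
        return 40.0
    return 5.0               # estimated cyclomatic complexity: 4
```

Three `if` statements plus the base of one gives a score of 4, which matches what most linters would report for this function.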
That said, measuring branches is not a perfect proxy for readability. Branches themselves are not inherently bad. In some cases, code with more branches can actually be easier to read, for example when a switch statement clearly enumerates cases.
Because of this, cyclomatic complexity should be viewed as a useful constraint rather than a measurement of code quality.
Example
Consider a simple function that processes an order:
def process_order(order):
    if order.is_cancelled():
        refund(order)
        return

    if order.payment_failed():
        notify_user(order)
        return

    if order.is_preorder():
        reserve_inventory(order)
    else:
        allocate_inventory(order)

    if order.requires_shipping():
        create_shipment(order)
    else:
        mark_as_complete(order)

    send_confirmation(order)
There is nothing obviously wrong with this function, but the branching complexity can grow quickly as new conditions are added.
One possible refactor might look like this:
def process_order(order):
    if is_terminal_order_state(order):
        handle_terminal_state(order)
        return

    allocate_or_reserve_inventory(order)
    finalize_order(order)
    send_confirmation(order)

def is_terminal_order_state(order):
    return order.is_cancelled() or order.payment_failed()

def handle_terminal_state(order):
    if order.is_cancelled():
        refund(order)
    elif order.payment_failed():
        notify_user(order)

def allocate_or_reserve_inventory(order):
    if order.is_preorder():
        reserve_inventory(order)
    else:
        allocate_inventory(order)

def finalize_order(order):
    if order.requires_shipping():
        create_shipment(order)
    else:
        mark_as_complete(order)
The total amount of code has not changed much, but the complexity of the main function has dropped significantly.
The outer function is also easier to understand at a glance. Each line represents a clear step in the process, and the function names describe what is happening. Instead of reading through a series of conditionals, you can quickly see the structure of the workflow and dive into the details only when needed. One could reasonably argue that the original style is, in some ways, easier to parse, but this is only a small example. In real codebases, complexity can grow far beyond this.
Lines of Code Limits
When tooling supports it, limiting the number of lines in functions, classes, or even entire files can also be effective.
The exact threshold is largely a matter of taste and language style. Personally, even in verbose languages, I rarely like functions larger than about 20 lines, or beyond what comfortably fits on a single screen.
There are legitimate exceptions. Some code benefits from being written as a longer function. This can happen in highly optimized sections, or in code where splitting the function would introduce unnecessary complexity. In some cases, a large function may even be easier for an LLM to work with than a heavily fragmented one.
For that reason, it is useful to allow exceptions when enforcing LOC limits. When disabling a complexity rule, it is usually a good idea to require a short inline comment explaining the rationale.
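What that comment looks like depends on the linter. As one hedged example, flake8- and Ruff-style tooling accepts a noqa suppression on the offending line, and the rationale can ride along in the same comment (the function below is invented for illustration):

```python
def parse_flags(bits):  # noqa: C901 -- bit checks read better as one unit
    flags = []
    if bits & 1:
        flags.append("read")
    if bits & 2:
        flags.append("write")
    if bits & 4:
        flags.append("exec")
    return flags
```

Requiring the rationale keeps exceptions honest: a reviewer (or an agent) can see at a glance why the rule was waived rather than silently ignored.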
There are several ways to measure lines of code. The simplest metric is LOC, which literally counts lines. More advanced options exist, such as SLOC or NLOC, which ignore blank lines and comments, and LSLOC, which counts logical statements instead of physical lines.
Each has its own tradeoffs, but in practice the exact metric often matters less than simply having a reasonable limit in place.
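The distinction between the physical and source-line variants is easy to see in a minimal sketch. A rough SLOC counter only has to skip blank lines and full-line comments (this helper is illustrative, not a substitute for a real tool, which would also handle multi-line strings and trailing comments):

```python
def count_sloc(source: str) -> int:
    """Rough SLOC: count lines that are neither blank nor full-line comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

snippet = """
# compute a total
total = 0
for n in [1, 2, 3]:
    total += n
"""
```

For this snippet, a raw physical-line count sees five lines, while the SLOC count is three, since the blank line and the comment are ignored.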
Other Complexity Metrics
There are many other complexity metrics available, some of which are arguably more sophisticated.
Halstead metrics are a family of measurements derived from the operators and operands in code, used to estimate properties such as difficulty or effort. These can be interesting for people who enjoy metrics, though they are less commonly enforced in everyday tooling.
Cognitive complexity attempts to estimate how difficult code is to follow mentally. NPath complexity counts the number of possible execution paths through a function.
These metrics can often capture aspects of complexity that cyclomatic complexity misses. Support for them varies widely across languages and tools.
Simpler metrics like cyclomatic complexity and LOC tend to work well in practice because they are widely available and easy to understand.
Agents also tend to work faster and use fewer tokens when the rules are simple enough to follow on the first pass.
Tooling
Cyclomatic complexity is one of the best supported metrics across languages and linters. Support for other metrics varies widely.
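As one hedged example of what enabling such a check looks like, Ruff's mccabe-based C901 rule can be configured in pyproject.toml; the threshold of 10 below is a common convention rather than a standard, and other linters such as flake8 expose an equivalent max-complexity option:

```toml
[tool.ruff.lint]
extend-select = ["C901"]   # enable the McCabe complexity check

[tool.ruff.lint.mccabe]
max-complexity = 10        # flag functions scoring above this threshold
```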
Tools like Lizard can provide additional complexity metrics and work across many programming languages.
Regardless of the specific tools you choose, it helps to document the rules clearly. Code style and validation requirements should ideally be included in a project's instructions or in something like an AGENTS.md file.
If these constraints are included in the agent instructions before code is generated, agents will often follow them on the first pass. This reduces the amount of rework after linting and validation.
Agents Can Be Overly Literal
One thing worth keeping in mind is that agents tend to follow instructions very literally.
If the instructions strongly say that functions must never exceed a certain size, an agent may split code into awkward helper functions just to satisfy the rule. This can sometimes produce code that technically satisfies the metric but is worse overall.
In practice, this tends to be a smaller problem than overly large functions, but it is something to watch for.
It can help to phrase instructions as guidelines rather than absolute rules. Providing examples in the instructions can also make a big difference.
For example, an instruction might look like this:
Prefer small functions that are easy to understand.
Aim to keep cyclomatic complexity below 10 and functions under roughly 20 lines when reasonable. These are guidelines rather than strict rules.
Functions should usually do one clear thing. When possible, the function name and structure should make its behavior obvious at a glance.
This kind of guidance often produces better results than rigid instructions.
When to Enable These Checks
Complexity limits should ideally be enabled and enforced at the very beginning of a project.
If these checks are introduced later, the codebase may already contain many violations. Fixing them afterward can require a large amount of refactoring and rework.
Enabling them early prevents the problem from accumulating in the first place.
Migrating Existing Projects
If you want to introduce complexity limits to an existing codebase, the process is a bit different.
A practical approach often looks like this:
Start by building solid test coverage. Refactoring complexity issues often requires splitting functions and reorganizing code, so tests help ensure behavior does not change.
From there, add strictness incrementally. Enable the checks gradually instead of enforcing everything at once.
Refactoring can then be handled through focused tasks. Agents can repeatedly run tasks that address a small batch of violations at a time.
It also helps to provide explicit instructions, preferably with examples of what you want to see. In some places, it might be worth telling the agent exactly how to fix the issues rather than relying on it to infer the correct approach.
Work can be parallelized at the file level. Many linters can produce concise output listing files and line numbers with violations, allowing tasks to target specific files or ranges to minimize merge conflicts.
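A small sketch of that idea: given flake8-style output lines (the `path:line:col: code message` format is assumed here and may need adjusting for your linter), violations can be grouped by file so each agent task gets one file's worth of work. The report below is invented sample data:

```python
from collections import defaultdict

def group_violations(linter_output: str) -> dict:
    """Map each file path to the line numbers of its violations."""
    by_file = defaultdict(list)
    for line in linter_output.strip().splitlines():
        path, lineno, _rest = line.split(":", 2)
        by_file[path].append(int(lineno))
    return dict(by_file)

report = """\
src/orders.py:12:1: C901 'process_order' is too complex (14)
src/orders.py:88:1: C901 'refund' is too complex (11)
src/billing.py:5:1: C901 'invoice' is too complex (12)
"""
```

Here `group_violations(report)` would yield one entry per file, so two parallel tasks could work on src/orders.py and src/billing.py without touching the same lines.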
When conflicts do occur, it is often easier to restart from the new baseline rather than trying to reconcile partially completed work.
This incremental approach tends to scale well when agents are doing most of the refactoring.
Final Thoughts
Complexity limits are not a silver bullet, and no metric perfectly captures what makes code readable or maintainable.
Still, simple constraints such as cyclomatic complexity and lines-of-code limits go a long way toward keeping AI-generated code manageable.
When combined with clear agent instructions and automated linting, they reduce the amount of cleanup required after code generation. The result is faster agent workflows, fewer tokens used, and a codebase that remains easier for both humans and agents to understand.
Complexity limits are not new. What is new is how effective they are as guardrails for AI agents.