An essay on AI

How AI Chips Work: The Pause Is Where the Work Lives

A chip is not a swarm. A hundred billion transistors pause together a billion times a second. The pause is the work. The slowest part sets the pace.

Hanh D. Brown · 11 min read

Hanh D. Brown May 18, 2026

A tall sand dune at sunset, sand streaming from its sharp crest, with two small acacia trees on the pale desert floor.

In this essay

01 A chip is not a swarm
02 The slowest part sets the pace
03 Hidden in hardware or exposed to software
04 There is no free move

A modern chip has about a hundred billion transistors, all running at the same time. The thing that keeps them coherent is not their independence. It is the moment, one billion times a second, when every one of them pauses and steps forward together.

Short answer

How do AI chips work, and why is the pause the part that matters?

How AI chips work depends on the pause. The pause is where the actual computation lives. The clock pulse is the announcement, not the work. AI chips win by making the pause longer and more predictable, not by making the clock faster than the silicon allows.

A chip is not a swarm

Picture a hundred billion transistors on a chip. Now picture them all freezing at the same instant, one billion times a second, then taking one tiny step forward together. That moment of freezing is what a clock cycle is.

It is also, in a way that almost nobody explains, the whole reason the chip works.

Without the freeze, the parts of the chip would finish their work at slightly different times. The answers would not line up. The math would come apart.

Parallel computing is not many things happening independently. It is many things happening at the same beat. The beat is the work.

Source: Every nanosecond, every transistor on the chip pauses, latches its state, and steps forward together. The pause is the work.

When many things are doing work at the same time, the results have to meet somewhere. The clock is the moment they meet.

Moving forward in lockstep is what makes it work. Every transistor steps at the same moment, the way a sound stays clean only when every part lands on the same beat. The downbeat is the cycle.

At the clock instant, whatever value happens to be on each wire gets stored in that wire’s register. The chip steps forward one beat. The stored values feed the next round of logic, and the next computation begins. The cycle repeats a billion times a second.
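A minimal sketch of that lockstep step, in Python. It is a toy model, not how any real chip is described: the names `clock_tick` and `unsynchronized_tick`, and the two-register swap circuit, are invented for illustration. Every register computes its next value from the frozen current state, and only then do all of them latch at once.

```python
def clock_tick(registers, logic):
    """One synchronized cycle: every next value is computed from the
    same frozen snapshot, then all registers latch together."""
    return {name: fn(registers) for name, fn in logic.items()}

def unsynchronized_tick(registers, logic):
    """No shared freeze: each register updates as soon as it is ready,
    so later registers read values that have already changed."""
    regs = dict(registers)
    for name, fn in logic.items():
        regs[name] = fn(regs)
    return regs

# Hypothetical circuit: two registers that swap values every cycle.
logic = {
    "a": lambda r: r["b"],
    "b": lambda r: r["a"],
}
start = {"a": 1, "b": 2}

synced = clock_tick(start, logic)             # {'a': 2, 'b': 1}
unsynced = unsynchronized_tick(start, logic)  # {'a': 2, 'b': 2}
```

With the freeze, the two registers trade values cleanly. Without it, the second register reads a value the first has already overwritten, and the answers stop lining up.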

A chip is not a swarm of independent parts. Every nanosecond, every transistor pauses and steps forward together. The pause is the work.

Read the gigahertz number on a chip’s specification sheet as a count, not a speed rating. It is the number of pauses per second. A faster chip is a chip that pauses more often.
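The arithmetic is short enough to write down. The sketch below uses two hypothetical helper names, `pauses_per_second` and `cycle_time_ns`, to turn a gigahertz figure into a count of pauses and the time between them.

```python
def pauses_per_second(clock_ghz):
    """One gigahertz means one billion clock ticks each second."""
    return clock_ghz * 1e9

def cycle_time_ns(clock_ghz):
    """Time between consecutive ticks, in nanoseconds."""
    return 1.0 / clock_ghz

one_ghz_pauses = pauses_per_second(1.0)  # 1e9 pauses each second
one_ghz_gap = cycle_time_ns(1.0)         # 1.0 ns between pauses
four_ghz_gap = cycle_time_ns(4.0)        # 0.25 ns between pauses
```

A higher number means less time between pauses, which is the budget every piece of logic has to fit inside.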

The slowest part sets the pace

Now comes the next question: what sets the clock’s speed?

The answer is unexpected.

Clock speed is not set by a chip’s fastest part. It is set by its slowest. Whatever takes the longest to finish in a single cycle determines how fast the whole chip can run.

This is true for the same reason a convoy on a mountain road moves at the speed of the slowest truck. The fastest truck in the line cannot pull ahead of the slowest one without losing the convoy. The clock is the convoy. The slowest piece of logic is the slowest truck.
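The convoy rule can be written as one line of arithmetic: the clock period has to be at least as long as the slowest path, so the top frequency is one over the longest delay. The sketch below assumes made-up delays for three blocks of logic; the block names and numbers are illustrative, not taken from any real chip.

```python
def max_clock_ghz(path_delays_ns):
    """The clock period must cover the slowest path,
    so the maximum frequency is 1 / (longest delay)."""
    slowest_ns = max(path_delays_ns.values())
    return 1.0 / slowest_ns

# Hypothetical delays, in nanoseconds, for three blocks of logic.
paths = {"decoder": 0.25, "adder": 0.5, "multiplier": 1.0}
baseline = max_clock_ghz(paths)               # 1.0 GHz, set by the multiplier

# Speeding up a path that was not the slowest changes nothing.
faster_adder = dict(paths, adder=0.25)
unchanged = max_clock_ghz(faster_adder)       # still 1.0 GHz

# Only speeding up the slowest path moves the limit.
faster_multiplier = dict(paths, multiplier=0.5)
improved = max_clock_ghz(faster_multiplier)   # 2.0 GHz
```

Making the fast truck faster does nothing for the convoy. Only the slowest truck matters.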

For decades, the standard fix was to split a long logic path in half with a register in the middle. Each half takes about half as long to settle, so the clock could then run up to twice as fast, at the cost of one extra register. This is called pipelining. It worked for thirty years.
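A rough sketch of why the split pays off. It assumes the path divides evenly, that the items flowing through are independent of one another, and that the extra register adds a small fixed delay; the function name `total_time_ns` and the `register_overhead_ns` parameter are hypothetical.

```python
def total_time_ns(n_items, path_delay_ns, stages, register_overhead_ns=0.0):
    """Time to push n independent items through a path cut into equal stages.

    Each cycle only has to cover one stage (plus register overhead),
    but the first result needs `stages` cycles to emerge.
    """
    cycle_ns = path_delay_ns / stages + register_overhead_ns
    cycles = n_items + stages - 1
    return cycles * cycle_ns

unsplit = total_time_ns(1000, 1.0, stages=1)  # 1000 cycles of 1.0 ns
split = total_time_ns(1000, 1.0, stages=2)    # 1001 cycles of 0.5 ns

# With a nonzero register overhead the gain falls short of a full doubling.
with_overhead = total_time_ns(1000, 1.0, stages=2, register_overhead_ns=0.1)
```

The split costs one extra cycle at the start and roughly halves everything after it, which is why the trade was worth making for so long.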

Then the fix stopped working. Some logic cannot be split that way. A calculation that feeds its result back into itself, such as a running sum or an accumulator, breaks if you put a register in the middle of the loop. The loop has to finish in one cycle. Whatever the slowest loop takes is what the whole chip can do.
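A simplified model of what goes wrong, in Python. The function names are invented for illustration, and real hardware is more subtle than this. The second function imitates an extra register inside the feedback path: each addition sees the total from two cycles ago instead of the one just computed.

```python
def accumulate(values):
    """Running sum where the loop closes within a single cycle."""
    total = 0
    for v in values:
        total = total + v
    return total

def accumulate_with_split_loop(values):
    """Same adder with an extra register in the feedback loop.

    The adder reads the total from two cycles back, so the stream
    silently splits into two interleaved partial sums.
    """
    older, newer = 0, 0  # the last two latched results
    for v in values:
        older, newer = newer, older + v
    return newer

correct = accumulate([1, 2, 3, 4])                  # 10
broken = accumulate_with_split_loop([1, 2, 3, 4])   # 6, which is only 2 + 4
```

The split loop still runs, and it runs on a faster clock. It just no longer computes the running sum it was built to compute.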

That is why two chips built on the same manufacturing process can end up at different clock speeds. One has a tighter feedback loop than the other. The chip with the tighter loop runs faster. The chip with the looser loop runs slower. Same factory. Same materials. Different speed limit.

It is also why the era of doubling the clock every few years ended. The slowest feedback loop stopped cooperating. Designers turned to other strategies (more cores, smarter caches, specialized circuits) because the clock could not be pushed past the loop.

In any parallel system, the slowest piece sets the pace. The chip is the cleanest example. The same rule shows up in any workshop where many hands meet a deadline, any kitchen with five cooks waiting on one oven, any household trying to leave on time when one parent is still tying a shoe.

The slowest part runs the meeting.

Hidden in hardware or exposed to software

One more move on the timing side of a chip is worth understanding. It is the move that makes an Artificial Intelligence (AI) chip an AI chip.

Every timing decision in a chip lives on the same line. On one end, the hardware decides on its own, and the programmer never sees it. On the other end, the programmer is told to decide, and the hardware just executes.

At one end sits a Central Processing Unit (CPU), the chip in a desktop computer or laptop. It uses caches, which are small fast pieces of memory that the hardware fills automatically with whatever data it predicts the program will need next. Caches can make programs roughly a hundred times faster than they would be without them. They also make timing hard to predict, because how long a memory access takes depends on whether the data happens to be in the cache at that moment.

Over at the other end sits a Tensor Processing Unit (TPU), the chip in an AI accelerator. It uses scratchpads, which are small fast pieces of memory the programmer fills directly. Timing is predictable because nothing is being decided behind the scenes.

Source: Cache on the left. Scratchpad on the right. Hidden in hardware, or exposed to software. The choice decides whether the timing is predictable.

The CPU is more flexible and less predictable. The TPU is more rigid and more reliable. Neither chip is wrong. The chips are tuned for different work.

On a laptop, the CPU does not know what program it will run next. The cache is the way the hardware copes with that uncertainty. It behaves like a passenger elevator that picks its own floors based on who walks in.

Its counterpart, the TPU, runs the same kind of program over and over. The scratchpad is the right tool because the programmer can plan exactly what data sits where. The TPU is like a freight elevator that waits for the operator to push the button.
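The difference between the two elevators shows up clearly in a toy simulation. The sketch below is not a model of any real CPU or TPU: the latencies, the least-recently-used eviction rule, and the function names `cache_latencies` and `scratchpad_latencies` are all assumptions made for illustration.

```python
from collections import OrderedDict

HIT_CYCLES = 1     # hypothetical cost when the data is in fast memory
MISS_CYCLES = 100  # hypothetical cost when it has to be fetched

def cache_latencies(addresses, capacity):
    """Hardware-managed cache with least-recently-used eviction.
    The cost of each access depends on what happened before it."""
    cache = OrderedDict()
    latencies = []
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
            latencies.append(HIT_CYCLES)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[addr] = True
            latencies.append(MISS_CYCLES)
    return latencies

def scratchpad_latencies(addresses, preloaded):
    """Programmer-managed scratchpad: data must be placed in advance,
    and then every access costs the same."""
    missing = set(addresses) - set(preloaded)
    if missing:
        raise ValueError(f"not loaded by the programmer: {sorted(missing)}")
    return [HIT_CYCLES for _ in addresses]

trace = [0, 1, 0, 2, 0, 1]
cached = cache_latencies(trace, capacity=2)               # [100, 100, 1, 100, 1, 100]
planned = scratchpad_latencies(trace, preloaded={0, 1, 2})  # [1, 1, 1, 1, 1, 1]
```

The cache needed no planning and its timing jumps around. The scratchpad refuses to run until someone has planned the data, and then its timing is flat. The cost of loading the scratchpad is not counted here, because the programmer schedules it ahead of time.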

This choice matters most when chips work together. When a thousand chips have to coordinate on the same matrix multiply, jitter in any one of them slows the whole job. Predictable timing lets the thousand stay in step. A cache-heavy chip is fast on average and bad at staying in step. A scratchpad chip is slower on average and excellent at staying in step.
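A back-of-the-envelope sketch of why jitter hurts more as the chip count grows. It assumes each chip stalls independently with the same small probability, which is a simplification, and the numbers and function names are made up for illustration.

```python
def step_time(per_chip_times):
    """A synchronized step finishes only when the slowest chip does."""
    return max(per_chip_times)

def prob_any_stall(n_chips, p_stall):
    """Chance that at least one of n independent chips stalls on a step."""
    return 1.0 - (1.0 - p_stall) ** n_chips

# Hypothetical step: 999 chips take 10.0 ms, one straggler takes 15.0 ms.
times_ms = [10.0] * 999 + [15.0]
job_step_ms = step_time(times_ms)       # 15.0: everyone waits for the straggler

one_chip = prob_any_stall(1, 0.001)     # a 0.1% stall is rare on a single chip
thousand = prob_any_stall(1000, 0.001)  # roughly 0.63 across a thousand chips
```

A hiccup that almost never happens on one chip happens on most steps when a thousand chips have to finish together.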

The AI workload chose the scratchpad. That choice is most of what makes a TPU look different from a CPU on the inside.

There is no free move

Every choice on the timing side of a chip costs something.

A faster clock costs more power. A deeper pipeline costs more registers and more silicon area. A bigger cache costs predictability. A scratchpad costs programmer time. There is no free move.

Matching the cost to the work is the whole trick. A laptop runs a thousand different programs in a day. The CPU pays the cache cost because the average-case speed is what the user feels. A training cluster runs one workload for weeks. The TPU pays the scratchpad cost because the coordination cost across a thousand chips is what the operator feels.

One rule sits under all of it. Coordination has a cost. The cost lives somewhere.

The clock pulse is the cost made visible. The pause every nanosecond is the toll the chip pays to keep a hundred billion transistors moving in the same direction. The faster the chip wants to go, the more often it has to pay the toll, and the harder it gets to keep paying.

Anyone who has tried to coordinate a Thanksgiving dinner with eight family members in one kitchen knows the rule from the inside. The turkey is the slowest dish. The turkey sets the pace.

Push past the turkey and you end up with cold sides and a half-raw bird. The chip designer who tries to push past the slowest feedback loop ends up with a chip that does not work.

Once you can see the clock pulse, you read every AI chip announcement through a different lens. Clock speed is the toll rate. The slowest part of the chip sets the rate. Cache versus scratchpad is the choice about who pays in time, the hardware or the programmer. None of it is free.

So a chip is not a swarm. It is a hundred billion transistors that pause together. The price of getting that many things to work at once is that, every nanosecond, they all have to stop and look at each other.

A chip is a hundred billion transistors pausing together a billion times a second. A reader holding the next chip announcement now has a frame for what the gigahertz number means and why the AI chip looks different from the laptop chip on the same desk.

The pause is the work. The slowest part sets the pace. The next time a child asks how a chip clock works, the answer is one sentence long.

Source

The argument draws on Reiner Pope’s podcast interview with Dwarkesh Patel, 2025.

Questions readers ask

Six questions on this essay.

01 What is a clock cycle in a chip?

A clock cycle is the moment every transistor on the chip pauses, latches its state, and steps forward together. It happens about a billion times a second on a modern chip running at one gigahertz. During the cycle, each wire on the chip is doing work and producing a value. At the clock instant, whatever value happens to be on each wire gets stored in that wire's register, and the chip steps forward one beat. The cycle repeats. The gigahertz number on a chip's specification is the count of clock cycles per second. It is not a raw speed rating. It is the count of pauses. Without the pause, the parts of the chip would finish their work at slightly different times, the answers would not line up, and the math would come apart. The pause is what makes the chip a chip and not a swarm.

02 How does a chip with billions of transistors stay coordinated?

By pausing together a billion times a second. The clock pulse reaches every part of the chip at the same instant. At the pulse, every transistor latches its state into its register and steps forward one beat. Between pulses, the wires are doing work. At the pulse, the work gets stored. The chip is not a swarm of independent parts running on their own. The chip is a hundred billion transistors moving in lockstep. The lockstep is what makes the math come out right. Software people often imagine parallelism as many things happening at once. Hardware people know that parallelism in silicon is many things pausing at once. The pause is the synchronization. Without the pause, the chip would still have a hundred billion transistors. None of them would produce a coherent answer to anything.

03 Why did clock speeds stop increasing?

Because the slowest piece of logic on the chip would not cooperate. For thirty years, designers sped up the clock by splitting long logic paths in half with a register in the middle. Each split let the clock run up to twice as fast on that path. The fix stopped working at the loops. A calculation that feeds its result back into itself, such as a running sum or an accumulator, has to finish in one cycle. You cannot put a register in the middle of a feedback loop without breaking the loop. Whatever the slowest loop takes is what the whole chip can do. Designers turned to other strategies (more cores, smarter caches, specialized circuits like the ones inside an AI accelerator) because the clock could not be pushed past the slowest feedback loop in the design. The slowest part sets the pace.

04 What is the difference between a cache and a scratchpad?

A cache is a small fast piece of memory that the hardware fills automatically. The hardware predicts what the program needs next and keeps it nearby. The programmer never sees the decision. A scratchpad is a small fast piece of memory the programmer fills directly. The programmer says exactly what data goes there and when. The hardware does not predict. The trade-off is between flexibility and predictability. A cache makes programs about a hundred times faster on average and makes the timing impossible to predict. A scratchpad makes the timing predictable and forces the programmer to plan the data layout. A Central Processing Unit uses caches because it runs many different programs and the average case is what matters. A Tensor Processing Unit uses scratchpads because it runs the same kind of program over and over and the coordination across many chips matters more than the average case.

05 Why do AI chips use scratchpads instead of caches?

Because AI training runs the same workload across hundreds or thousands of chips at the same time. When a thousand chips have to coordinate on the same matrix multiplication, jitter in any one chip slows the whole job. Predictable timing is what lets the thousand stay in step. A cache-heavy chip is fast on average and bad at staying in step, because the cache makes the timing depend on what data happens to be sitting in fast memory at that moment. A scratchpad chip is slower on average and excellent at staying in step, because the programmer planned the data layout in advance and the timing does not depend on a hidden hardware decision. The AI workload chose the scratchpad. That single choice is most of what makes a Tensor Processing Unit look different from a Central Processing Unit on the inside.

06 What is the rule that applies to any parallel system, not just chips?

The slowest piece sets the pace for the whole system. The chip is the cleanest example because the pace shows up as the clock speed and the slowest piece is a specific feedback loop nobody can speed up. The same rule applies anywhere parallel work is coordinated. The convoy on a mountain road moves at the speed of the slowest truck. The kitchen on Thanksgiving moves at the speed of the turkey. The household leaving for school moves at the speed of the child who has not found a shoe. The team shipping a project moves at the speed of the slowest dependency. The lesson is the same. Coordination has a cost. The cost lives in the slowest piece. The faster the system wants to go, the harder it has to work on the slowest piece, because everything else is already waiting.

About the author

Hanh D. Brown, writer.

Hanh D. Brown writes on AI, aging, and the decisions in between. Twenty years building systems for life-stage choices, now writing the publication with time to ask why.
