Gabriel Pereira
May 2026·craft·5 min read

The environment gap

There's a class of bugs that only exist between your machine and production. They require a different debugging mindset.

The code was right. It had been right for weeks. It only broke in production, and only on the first request after a cold start.

There's a category of bugs that don't live in your code. They live in the space between your development environment and production: in the assumptions baked into which config gets loaded, which network the service resolves to, which schema version is active. These bugs pass your tests. They pass code review. They wait.

Three bugs with the same cause

Working on Unsaid, I hit three of these in about two months. The first: Supabase local development uses a different set of service-role keys than production. I had the local key hardcoded in a helper that bypassed the environment variable instead of reading from it. Easy to miss, because the function worked fine locally, and the error in production was a permissions failure that looked like a query bug.
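A minimal sketch of the kind of guard that catches this class of mistake, assuming the key is supposed to come from the environment. The helper names `requireEnv` and `fingerprint` and the variable name `SUPABASE_SERVICE_ROLE_KEY` are illustrative assumptions, not taken from Unsaid or from the Supabase SDK.

```typescript
// Hypothetical helper: read a required setting from the environment and
// fail loudly instead of falling back to a hardcoded local value.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Short fingerprint that is safer to log than the key itself, so you can
// confirm which key a deployment actually loaded. It uses the tail of the
// string because JWT-style keys tend to share the same leading characters.
function fingerprint(secret: string): string {
  return `...${secret.slice(-4)} (${secret.length} chars)`;
}
```

In a Node runtime this would be called as `requireEnv(process.env, "SUPABASE_SERVICE_ROLE_KEY")` at startup. Logging the fingerprint once per cold start makes "which key is this?" a thirty-second question instead of a forty-minute one.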

The second: Clerk's SDK resolves to different endpoints in different network contexts. A DNS configuration that was correct for the deployment environment was not resolving correctly from the serverless runtime on cold start. The request timed out, and the error message said authentication failed. The actual problem was in a network layer I hadn't thought to check.
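One way to avoid being misled by an "authentication failed" message is to look at the underlying error before trusting the label. The sketch below is an assumption about how that triage could look, not Clerk's API: `classifyFailure` is a hypothetical helper, and the codes it checks are the ones Node.js attaches to failed DNS lookups and socket connections.

```typescript
// Hypothetical triage helper: decide whether a failed call died at the
// network layer before the auth provider could have answered at all.
type Layer = "network" | "auth" | "unknown";

// Error codes Node.js uses for failed DNS lookups and socket connections.
const NETWORK_CODES = ["ENOTFOUND", "EAI_AGAIN", "ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"];

function classifyFailure(err: unknown, httpStatus?: number): Layer {
  // Node's fetch wraps the low-level error in `cause`, so check both levels.
  const top = err as { code?: unknown; cause?: { code?: unknown } } | null | undefined;
  const code = top?.code ?? top?.cause?.code;
  if (typeof code === "string" && NETWORK_CODES.indexOf(code) !== -1) {
    return "network";
  }
  // A 401 or 403 means a server was reached and rejected the credentials.
  if (httpStatus === 401 || httpStatus === 403) {
    return "auth";
  }
  return "unknown";
}
```

If this returns `"network"`, the credentials were never evaluated, whatever the wrapper's error message says, and the next thing to check is DNS and routing from inside the runtime.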

The third: pgvector's types and operators are looked up through the database's schema search path. My local database had the extension set up one way. Production had it set up slightly differently, because it had been provisioned from a different baseline. The embedding queries failed in production on a name lookup that succeeded locally, because the default search path was different.
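A rough sketch of how to compare the two databases on this point. The two queries use standard PostgreSQL catalogs (`pg_extension`, `pg_namespace`) and the `vector` extension name pgvector registers. The helper names are hypothetical, and the schema names used in the usage note are only examples.

```typescript
// Queries to run against both the local and the production database.
const SHOW_SEARCH_PATH = "SHOW search_path";
const EXTENSION_SCHEMA = `
  SELECT n.nspname
  FROM pg_extension e
  JOIN pg_namespace n ON n.oid = e.extnamespace
  WHERE e.extname = 'vector'`;

// Parse the string returned by SHOW search_path, e.g. '"$user", public'.
function parseSearchPath(raw: string): string[] {
  return raw
    .split(",")
    .map((s) => s.trim().replace(/^"|"$/g, ""))
    .filter((s) => s.length > 0);
}

// True when unqualified names such as the `vector` type can resolve,
// i.e. the extension's schema is one of the schemas being searched.
function extensionIsReachable(searchPath: string, extensionSchema: string): boolean {
  // pg_catalog is always searched, even when it is not listed.
  if (extensionSchema === "pg_catalog") {
    return true;
  }
  return parseSearchPath(searchPath).indexOf(extensionSchema) !== -1;
}
```

For example, if the extension lives in a schema named `extensions` and the search path is `"$user", public`, `extensionIsReachable` returns `false`, which matches a query that works on one database and fails on the other with no code change.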

What they had in common

None of these were code bugs. The functions were correct. The logic was correct. What was wrong, in each case, was an assumption I'd made about the environment: that the keys were the same, that the network resolved identically, that the schema path matched. These assumptions were invisible because they were never written down. They lived in my mental model of how the system worked, not in the code itself.

This is what makes environment bugs hard. You look at the code and it looks right, because it is right. The bug isn't in what you can see.

The failure mode isn't wrong code. It's wrong assumptions about the environment the code runs in.

The failure mode: debugging the code

Each time, my first instinct was to debug the code. I read the function. I added logs. I checked the return values. I spent 40 minutes on the Supabase bug looking at query logic before I checked whether the key being loaded was the one I thought it was.

The debugging mindset for environment bugs is different. You're not looking for what the code does wrong. You're looking for what the code assumes about the world that isn't true in this environment. That's a harder thing to go looking for, because you don't know what you don't know.

The checklist I run now

Before I blame the code now, I check a short list: is the environment variable actually set in this environment and not just locally? Is the service I'm calling resolving to the same endpoint it would locally? Is the database schema in the state the code expects? Is there a permission boundary between local and production that I'm crossing differently here?
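The same list can be written down as code, so the assumptions stop living only in your head. This is a minimal sketch under stated assumptions: `Check` and `failedAssumptions` are hypothetical names, the checks are synchronous for brevity, and the example values are made up. Real checks would read the actual environment, resolve the service endpoint, and query the database.

```typescript
// Hypothetical preflight: each assumption about the environment becomes a
// named check, so a failure names the assumption instead of the symptom.
type Check = { name: string; run: () => boolean };

function failedAssumptions(checks: Check[]): string[] {
  const failed: string[] = [];
  for (const check of checks) {
    let ok = false;
    try {
      ok = check.run();
    } catch (_err) {
      ok = false; // a check that throws counts as a failed assumption
    }
    if (!ok) {
      failed.push(check.name);
    }
  }
  return failed;
}

// Example wiring with made-up values standing in for a real environment.
const exampleEnv: Record<string, string | undefined> = {
  DATABASE_URL: "postgres://example",
};

const exampleFailures = failedAssumptions([
  { name: "DATABASE_URL is set", run: () => Boolean(exampleEnv["DATABASE_URL"]) },
  { name: "SERVICE_KEY is set", run: () => Boolean(exampleEnv["SERVICE_KEY"]) },
]);
// exampleFailures is ["SERVICE_KEY is set"]
```

The design choice here is that the output is a list of broken assumptions by name. That is the thing a production error message almost never tells you.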

It takes five minutes. It has saved me hours. The goal isn't to run through the list methodically every time. It's to internalize the habit of asking whether the environment is the problem before assuming the code is.

Why local-prod parity is a product decision

The underlying problem in all three bugs was that my local environment didn't match production closely enough. Not in a way that was careless. It matched for most purposes. It just didn't match in the specific ways these three functions needed it to.

Local-prod parity is usually treated as a DevOps concern. I think it's a product decision. Every gap between your development environment and production is a place where a class of bugs can hide until a user finds them. Closing those gaps is worth the overhead, not because it's clean, but because the alternative is debugging in production.