If it isn't visible, it's probably broken
In my career building products at startups and in big tech, I've noticed a pattern: anything that isn't really visible yet is probably broken in some way.
Sometimes "broken" is an obvious bug. Sometimes it's a nonsensical UI. Sometimes it's a business process where multiple steps have quietly drifted out of sync. The exact nature doesn't matter. What matters is: if nobody looks at it, it quietly decays.
When something is visible, issues are found quickly, because there is someone or something that knows it's supposed to work and verifies that it stays that way:
- Users use your product and notice regressions because they know how it's supposed to behave.
- Integration tests expect your app to behave in a certain way and fail when it stops doing so.
- Your QA team checks features before they are launched.
Put like that, visibility sounds like an obvious win. Why wouldn't you make everything visible?
Because visibility isn't free.
- Letting your users be the integration tests creates churn and support tickets - and frankly, it's disrespectful.
- Integration tests are non-trivial to write and maintain.
- QA time is expensive and limited.
So you have to decide where to pay for visibility, and how much. As with everything, the first step is being aware of how visible something actually is.
The concept of visibility is already well established in infrastructure as observability, but my thinking here came more from product work than from SRE blogs. Still, similar to how every animal's final evolution is a crab, making things more visible (in whatever way) is the logical endpoint of many disciplines.
The visibility spectrum
I've started to think in terms of three axes for visibility. It doesn't matter if the "thing" is a feature, a data pipeline, or an internal process - you can ask the same questions:
- Who can spot issues - and who can actually debug them?
- How much effort does it take to verify?
- How often is it actually verified?
You can imagine a feature sitting somewhere on each of those axes. The more things cluster on the "only one dev knows how to check this, slowly, by running custom code, and nobody ever does" side, the more likely they are to be broken.
Let's go through these.
1. Who can spot issues (and who can investigate them)?
This is the most intuitive dimension: who could notice that this thing is broken? Just the original developer? Any teammate? An internal operations person? The end user?
A lot of bugs are not found by the team who wrote the code.
- A user who sees a weird number they don't understand.
- A support person who gets three similar tickets.
- A finance person whose export doesn't add up.
There's also an important distinction between:
- Spotting that something looks off ("this spike is weird"), and
- Verifying whether it's actually wrong ("yes, this is not valid").
You want both, but they require different skills and different levels of access. If we want to improve this axis, the first step is to increase the number of people who can spot issues. To do that, we have to make things accessible:
- If I can't see the data, I cannot spot issues in it.
- If the process is not documented anywhere, I cannot even think about it.
- If the feature cannot be toggled via a feature flag, I cannot try it myself.
Even a very crude UI or CSV export goes a long way compared to "only accessible via SQL on the production database". I've been surprised many times by how much time is saved simply by giving other people the ability to see and poke at things.
As mentioned, spotting an issue is different from being able to verify it. Just because someone thinks a number looks sketchy in a report doesn't mean they can determine why it is the way it is. Verification is often a gradual thing: over time, more people can understand, trace, and explain what is going on.
If you've ever used open banking, you probably know it has issues. With PSD2, the EU made it law that banks have to provide APIs. It did not standardize how the API should look or behave, so every bank did its own thing.
For a product, this means you don't integrate with every bank yourself - you use an open banking provider. I've worked with multiple. They all have their quirks. When a customer had trouble connecting their bank, any of these could be at fault:
- The customer (wrong credentials, wrong bank selected, etc.)
- The bank (outage, random error)
- The open banking provider (implementation bug, API change)
- Us (a bug in our integration)
Initially, when our support asked "what went wrong for this user?", I had to go dig through logs. Finding the right entries took time and was annoying.
So I started saving all relevant connection attempts into a table (bank_account_connection) that we already had around to handle webhooks anyway. Now I just had to run a simple SQL query to see all attempts and their status.
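For illustration, a minimal sketch of what reading that table could look like - the bank_account_connection name is from the story above, but the columns and the Go wrapper are made up:
package ops

import (
	"context"
	"database/sql"
	"time"
)

// ConnectionAttempt is a hypothetical row shape for the bank_account_connection table.
type ConnectionAttempt struct {
	CustomerID string
	Provider   string
	Status     string // e.g. "succeeded", "invalid_credentials", "bank_error", "provider_error"
	Detail     sql.NullString
	CreatedAt  time.Time
}

// RecentAttempts returns the latest connection attempts for one customer,
// so support/ops can see at a glance which party failed.
func RecentAttempts(ctx context.Context, db *sql.DB, customerID string) ([]ConnectionAttempt, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT customer_id, provider, status, detail, created_at
		FROM bank_account_connection
		WHERE customer_id = $1
		ORDER BY created_at DESC
		LIMIT 50`, customerID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var attempts []ConnectionAttempt
	for rows.Next() {
		var a ConnectionAttempt
		if err := rows.Scan(&a.CustomerID, &a.Provider, &a.Status, &a.Detail, &a.CreatedAt); err != nil {
			return nil, err
		}
		attempts = append(attempts, a)
	}
	return attempts, rows.Err()
}
The exact shape doesn't matter - what matters is that the answer to "what went wrong for this user?" stops living only in the logs.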
Then I added a very simple table view for this to our internal operations app. I actually only did this to make my own life easier, but it turned out to be a win for everyone:
- Non-technical team members could see these attempts.
- With a bit of guidance, ops could quickly tell which party was at fault.
- Support stopped needing a developer to investigate this for almost all issues.
The team members were able to spot issues before - a customer would reach out - but they didn't have enough access to verify anything on their own. Given access to even the crudest debug view, they were suddenly empowered to investigate themselves.
So next time you have to execute some database queries due to an internal request, think about whether you can make that path easier or even self-serve.
2. How much effort does it take to verify?
Even if people can see an issue and are allowed to investigate, there's still a question: how painful is it to actually verify that something works? The more painful it is, the less often you'll do it.
For me, "effort to verify" usually comes from four things:
- How long it takes
- Ease of access
- Representation
- Required knowledge
Let's look at each one.
2.1 How long does it take to verify?
While I was working at YouTube, there was often a rush on Fridays to get experiments launched before the weekend. Not because enabling experiments on a Friday is fun, but because:
- If you got the experiment out on Friday,
- You could look at the numbers on Monday,
- Decide quickly whether to ramp up, roll back, or iterate.
The weekend basically worked like a time-skip cheat for verification. If you only managed to launch on Monday, you'd have to wait until Wednesday to get two full days of data.
The same idea shows up everywhere:
- Running your test suite before you go to lunch.
- Running multiple agents or jobs in parallel to shorten feedback loops.
- Using a small sample dataset locally before touching the full one in CI.
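The last point even has first-class support in Go's standard testing package via the -short flag. A minimal sketch, with hypothetical file names and a stand-in for the real processing:
package pipeline

import (
	"bufio"
	"os"
	"testing"
)

// countLines stands in for whatever expensive processing the real test does.
func countLines(t *testing.T, path string) int {
	t.Helper()
	f, err := os.Open(path)
	if err != nil {
		t.Fatalf("open %q: %v", path, err)
	}
	defer f.Close()

	n := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		n++
	}
	if err := sc.Err(); err != nil {
		t.Fatalf("scan %q: %v", path, err)
	}
	return n
}

func TestPipeline(t *testing.T) {
	// Full dataset by default (e.g. in CI); a tiny sample when -short is set.
	dataset := "testdata/full_export.csv"
	if testing.Short() {
		dataset = "testdata/sample_100_rows.csv"
	}
	if got := countLines(t, dataset); got == 0 {
		t.Errorf("expected %q to contain at least one row", dataset)
	}
}
Locally you run go test -short ./... in the inner loop; CI runs the full thing.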
If checking whether something works takes weeks or months, it effectively never gets checked unless you have very disciplined automation. People forget to look, priorities shift, that "critical" dashboard tab stays closed.
Pre-commit hooks allow you to execute checks whenever you commit code. The pitch is great: never forget to run formatting, type checks, unit tests, etc. - simply put these checks into the pre-commit hook.
But if you've ever worked at a place where the pre-commit hook started taking longer and longer, you're also familiar with --no-verify, the command-line flag that skips it. Since you don't really want to wait that long, you start running the checks less often.
We faced this exact issue: more and more team members started skipping the checks, me included. So we simply removed the biggest offenders from the hook, since we ran them in CI anyway. One of them was the code formatter - which in theory should never be broken, as it runs whenever you save.
This definitely helped... but it didn't actually get PRs merged faster, because more often than not the formatting was broken. Claude Code and Codex skip your on-save hook and aren't the most reliable at following instructions, so the formatting step was often skipped.
The formatter we used was golines. It's an improvement over the default Go formatter gofmt, which doesn't restrict line length at all. But unlike gofmt, golines could take up to a minute on our codebase, while gofmt is basically instant.
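Roughly, the difference looks like this (a contrived example with made-up identifiers - gofmt and gofumpt leave line length alone, while a line-length-aware formatter like golines splits the long signature):
package example

import (
	"context"
	"time"
)

// Left untouched by gofmt/gofumpt, however long it gets:
func notifyCustomer(ctx context.Context, customerID string, template string, locale string, sendAt time.Time, dryRun bool) error {
	return nil
}

// Roughly what a line-length-aware formatter produces instead:
func notifyCustomerWrapped(
	ctx context.Context,
	customerID string,
	template string,
	locale string,
	sendAt time.Time,
	dryRun bool,
) error {
	return nil
}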
So what to do - put it back into pre-commit and never forget it, but always be annoyed? Or accept that someone has to handle the occasional breakage?
The solution was neither. We switched to gofumpt instead. The project is healthy, and while it doesn't support breaking long lines yet, that feature is on its roadmap. Breaking long lines is nice and helps, but it doesn't come up that often in Go - and the time penalty we paid for it was simply too high.
The anecdote shows that how long something takes directly correlates with how often it gets done. Developers already know how annoying long-running tests or compilers are and strive to build faster tools (thanks to everyone rewriting slow tools in Go, Rust or Zig). I still think decreasing the time-to-feedback for anything is underrated. If your whole test suite ran in a second instead of half an hour, you (and your AI agents) could develop very differently.
2.2 Ease of access
The easier it is to access the feature, data or process, the more likely you are to actually verify it. If you have to:
- Write custom SQL to read a handful of rows,
- Download and manually grep logs from a particular day,
- Or sign in with a special account using an unusual 2-factor setup,
you'll simply do it less often. It might be "possible", but it's not easy. The nice thing: improvements in "who can access this?" also usually improve ease of access in general.
This obviously correlates with how long things take, but it's mainly about friction: even small annoyances compound until you stop doing the check at all.
YouTube is one of the biggest apps ever; as such, it runs on every device that can theoretically run it. To make sure we didn't break anything when changing the mobile apps, we had plenty of test devices of different form factors and types lying around (e.g. an Android tablet, an older iPhone).
Testing on the devices was easy if your feature was already launched - simply sign in with the test accounts set up on them and check it out. But that's already too late if you want to be careful. The issue was that test accounts on these devices didn't allow manually overriding feature flags. You could do it with your own corporate account, but that meant:
- Signing in with your account on every test device you want to test on
- Doing the security challenges
- Flipping the flag and testing the new feature
- Cleaning up afterwards, as you don't want anyone else to have access to your account
Only the third step should actually be necessary. I was a bit confused about why this was so annoying and why nobody had fixed it yet. Reading the docs, it became clear that a solution actually existed, but only in the main YouTube office in San Bruno: they had a custom WiFi setup that allowed setting feature flags on test accounts.
As we were a sizeable YouTube operation in Zurich back then, I was able to get our own custom WiFi setup, making testing on devices much easier.
This is the part where I should tell you how this transformed our device testing, but sadly Covid hit and we were working from home. And since I then left to join re:cap, I never got to see the testing WiFi in its full glory.
The anecdote should show that it was mostly an annoying access problem that made you not want to test quickly on a device. The steps weren't difficult, nor did they take that long. But they were annoying enough that you really didn't want to do them often.
2.3 Representation
Representation matters a lot. The more data points you have, the more important it becomes: 1000 rows of data are not intuitive; a bar chart is.
I once had to make sure a critical piece of money-handling code was tested properly so its business logic could be changed safely. We would "buy" our customers' contracts, paying out their worth upfront at a discount (factoring). The customers then had to pay us back over the following months. There were multiple tables involved:
- The payout to the customer
- Monthly payback schedules
- Underlying contracts
- Invoices belonging to those contracts
- Future expected invoices
Every month, the data had to be updated with the current status:
- Did some contracts churn?
- Did the invoices we expected actually get paid?
- Do we need to replace a contract or move expected cash flows to a later month?
In short: a lot of data that changed in non-trivial ways.
I created an integration test that snapshotted these tables at various important stages so we could see how any code change affected the structures. On paper, this made things "visible".
In practice, when I changed the underlying code, the snapshot diffs were huge and noisy. I could see that something changed, but not whether it changed in the right way.
The solution wasn't to become a human diff engine. It was to make the data readable. I created a custom aggregated structure that summarized the important aspects instead:
{
  "financeableContracts": 62,
  "financedContracts": 0,
  "activeContracts": 62,
  "rebatedContracts": 0,
  "replacedContracts": 0,
  "activeInvoices": 1008,
  "paidInvoices": 0,
  "residualInvoices": 0,
  "payoutAmount": "428355",
  "paybackAmount": "450900",
  "financingFee": "0.05",
  "missingPaybackAmount": "0",
  "collectedPaybackAmount": "0",
  "remainingPaybackAmount": "450900",
  "payoutStatus": "requested",
  "monthlyPaybackStats": [
    {
      "paybackAmount": "68610",
      "paidAmount": "0",
      "status": "active",
      "invoices": 160
    }
  ]
}
Now I could quickly sanity-check:
- Do the totals still make sense?
- Are there unexpected churned or replaced contracts?
- Are invoices missing or mis-classified?
Any change in the underlying code showed up as a simple, readable diff in this structure. The invisibility problem wasn't "lack of tests"; it was a data representation that no human could parse.
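For completeness, a minimal sketch of how such a summary can be pinned down in a test. The struct below mirrors an abridged version of the JSON above; the golden-file mechanics and names are my own assumption, not the original setup:
package factoring

import (
	"encoding/json"
	"os"
	"testing"
)

// financingSummary mirrors an abridged version of the aggregated snapshot above.
type financingSummary struct {
	FinanceableContracts int    `json:"financeableContracts"`
	ActiveContracts      int    `json:"activeContracts"`
	ActiveInvoices       int    `json:"activeInvoices"`
	PayoutAmount         string `json:"payoutAmount"`
	PaybackAmount        string `json:"paybackAmount"`
	PayoutStatus         string `json:"payoutStatus"`
}

// assertSnapshot compares the summary against a committed golden file, so any
// change in the underlying logic shows up as a small, human-readable diff.
func assertSnapshot(t *testing.T, goldenPath string, got financingSummary) {
	t.Helper()
	gotJSON, err := json.MarshalIndent(got, "", "  ")
	if err != nil {
		t.Fatalf("marshal summary: %v", err)
	}
	want, err := os.ReadFile(goldenPath)
	if err != nil {
		t.Fatalf("read golden file %q: %v", goldenPath, err)
	}
	if string(gotJSON) != string(want) {
		t.Errorf("summary changed:\nwant (%s):\n%s\ngot:\n%s", goldenPath, want, gotJSON)
	}
}
Call it at each important stage, and regenerate the golden files only when the change is intentional.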
The anecdote shows that if the representation is lacking, visibility can completely tank, even if every other dimension is covered. So ask yourself questions like:
- Could I spot issues looking at it in its current form?
- Is my data aggregated enough?
- Does my data need a special visualization? (e.g. a graph)
2.4 Knowledge
Legacy code is often just "code where nobody on the team has a mental model anymore". This ties directly into how hard it is to verify something. If you set up the business process, you know the idea behind it and whether it still makes sense. If you wrote that weird part of the code with the cryptic comments, you have a better chance of understanding your past self.
If there's a playbook for recurring issues, more people can help.
I won't tell you "just write documentation" - docs have their own problems and are not a silver bullet. But you should keep a paper trail:
- Pull requests should have a description and link to a ticket.
- Commits should have real messages.
- Somewhere, you should write down why you did something a certain way.
The easier it is to find that context, the better, but even a small breadcrumb helps. You will forget your own reasoning, and it's a humbling feeling to stare at code you wrote two years ago and think "why on earth did I do this?".
At Google, they're very good at this via tooling. They don't have much traditional documentation, but they have excellent code history tools and strong habits around leaving traces in code reviews and commits. You quickly learn to navigate through the history of a file and understand why something looks the way it does. That's also a form of visibility.
3. How often is it actually verified?
This is where everything comes together. Even if something is easy and quick in theory, the important bit is: how often does anyone actually do it?
Some examples:
- If my test suite takes 1 second, I'll run it on every save.
- If it takes 5 minutes, I'll probably rely on CI.
- If it takes hours, maybe I'll run it once a day, or just before releases.
At system level:
- Your uptime monitor probably checks your website every few seconds.
- Your off-site backup restoration probably happens... quarterly, if you're lucky.
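That gap in frequency is the whole point: the website is exercised constantly, the backup almost never. Even a toy checker keeps something exercised - a minimal sketch, with a placeholder URL and interval:
package main

import (
	"log"
	"net/http"
	"time"
)

// A toy uptime check: hit a health endpoint on a fixed interval and log failures.
func main() {
	const target = "https://example.com/healthz" // placeholder endpoint
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		resp, err := http.Get(target)
		if err != nil {
			log.Printf("uptime check failed: %v", err)
			continue
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			log.Printf("uptime check: unexpected status %d", resp.StatusCode)
		}
	}
}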
Features and code that nobody runs, nobody clicks, and no test touches effectively live in a dark corner. They might technically "work" today, but you'd be brave to bet your business on it. A useful rule of thumb:
If something isn't exercised by either users or automation, assume it's broken.
Which leads to the conclusion: you should probably delete more things.
If I ever get tattoos of phrases that are important to me, they would be YAGNI ("You Ain't Gonna Need It") and KISS ("Keep It Simple, Stupid"). But let's go back in time, to before experience had ingrained those in me.
At my first job, I was somehow responsible for the frontend as a working student. Not great for me, not great for the company, but it had one strong side effect: I got very attached to "my" code.
Like any product, we built experiments and features that later turned out to be unneeded. But when the time came to delete something, it felt... wrong.
- I'd spent my limited hours building it.
- It worked (at least at some point).
- Deleting it felt like throwing away food that's still fine.
So I kept code around "just in case" and maintained code paths that the product didn't actually use anymore. You can probably guess how this ended: one day, someone needed that old feature again. We flipped the switch back on.
And it was broken.
Not dramatically broken; just subtly incompatible. Over the months, everything around it had changed:
- Different data shapes
- Different assumptions
- Different authentication flows
- New invariants that the old code didn't respect
We now had to:
- Understand how it worked back then.
- Understand how the system changed since.
- Patch it back into shape.
In the end, it would have been faster and safer to rebuild it from scratch with current constraints in mind. The painful insight: code that nobody runs and nobody looks at isn't "sleeping". It's decaying.
Once I started looking at features through the visibility spectrum, the rule became simple:
- If something is unused and invisible, it's already broken - you just haven't observed it yet.
- If you ever need it again, you won't trust it anyway.
- So delete it. Be glad to delete it.
You ain't gonna need it. Keep your codebase simple, stupid.
Making things visible on purpose
So what do you do with all of this? When you work on a feature, piece of data, or internal process, ask yourself:
- Who can tell if this is broken?
  - Only me?
  - Any engineer?
  - Support, ops, finance?
  - The end user?
- How much effort does it take to verify it, without talking to the original author?
  - Is there a debug view?
  - Is there a clear representation (aggregate, graph, table)?
  - Is there a quick path to the relevant data and history?
- How often does this actually get exercised?
  - Tests on every commit?
  - Dashboards someone looks at weekly?
  - A manual run once per quarter?
  - Never?
And one extra question that often decides everything:
- If nobody uses this anymore, why is it still here?
  - Can we delete it now?
  - If we keep it, are we willing to pay the visibility cost?
Often, problems you've been fighting for months ("that report that is always wrong", "that feature that breaks every second release") are just symptoms of low visibility:
- Nobody can easily see when it drifts.
- Or the only person who can isn't looking anymore.
Making something visible doesn't magically fix it, but it changes the odds:
- It gives people a chance to notice.
- It gives them a representation that matches their mental model.
- And it makes deletion an explicit option when nobody looks at it at all.
If it isn't visible, assume it's broken - and then decide whether it's worth making visible or worth deleting. Both are better than pretending it's fine in the dark.