Prefer duplication over the wrong abstraction (2016)
256 points - today at 4:08 PM
Of course, the worst abstractions are the ones you don't need at all.
But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If it starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
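A minimal sketch of the `solve(f, stopping_criteria)` idea, assuming a bisection solver. The extra `lo`/`hi` bracket arguments, the `StoppingCriteria` protocol, and the `ToleranceCriteria` class are hypothetical and are not from the comment. The three tolerance flags move into one object, and the solver only asks it "should I stop?":

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class StoppingCriteria(Protocol):
    """Anything that can tell an iterative solver when to stop."""

    def should_stop(self, iteration: int, x_prev: float, x_curr: float) -> bool:
        ...


@dataclass
class ToleranceCriteria:
    """The knobs that would otherwise be separate keyword flags on solve()."""

    max_iters: int = 99
    x_abs_tol: float = 1e-15
    x_rel_tol: float = 1e-15

    def should_stop(self, iteration: int, x_prev: float, x_curr: float) -> bool:
        step = abs(x_curr - x_prev)
        return (
            iteration >= self.max_iters
            or step <= self.x_abs_tol
            or step <= self.x_rel_tol * abs(x_curr)
        )


def solve(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    stopping_criteria: StoppingCriteria,
) -> float:
    """Bisection root finder on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must bracket a root")
    x_prev = lo
    iteration = 0
    while True:
        iteration += 1
        mid = (lo + hi) / 2
        # Keep the half-interval that still contains a sign change.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if stopping_criteria.should_stop(iteration, x_prev, mid):
            return mid
        x_prev = mid
```

A different stopping rule, such as a wall-clock budget, would be another class with a `should_stop` method. `solve` itself never grows a new flag.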
With AI, we really need to rethink the clean code principles.
At the very least, duplication is not cheaper once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and fewer than a hundred), maintaining duplicated code that should have been abstracted and modularised only seems cheap if you don't mind burning through even junior employees at a rapid pace.
And in the LLM era the wrong kind of scale appears in different ways: code is generated and duplicated without proper abstraction, then maintained by an LLM that cannot be trusted to make the same modification each time it encounters a pattern, or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (I mean one that is in active maintenance) is better than code duplication once you are past a de minimis threshold.
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
Yes, okay. But with both you will have a bad time cleaning up.
There is a third option: good abstractions.
I have seen the pattern described in the blog a lot in practice (and have fallen victim to it myself), and I think that in general it comes down to inexperienced programmers. Object-oriented programming makes it worse.
Teaching these programmers that they should not abstract is not the solution. It is blocking their growth.
Teach them how to make better interfaces instead.
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Brian's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
Part of being a good engineer is finding the right balance.
I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.
I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.
So much of wisdom is in finding balance and not being dogmatic about rules.
It wasn't received well: a senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code' and that he would clean up my mess.
Long story short, his refactoring turned what was otherwise a stable system into a complete mess, and it reminded me of Nassim Taleb's book.
Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.
At the same time I’m happy they exist because it means we’ll always have a job.
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=35927149 - May 2023 (69 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=27095503 - May 2021 (17 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=23739596 - July 2020 (240 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=17578714 - July 2018 (207 comments)
Prefer duplication over the wrong abstraction - https://news.ycombinator.com/item?id=12061453 - July 2016 (96 comments)
The Wrong Abstraction - https://news.ycombinator.com/item?id=11032296 - Feb 2016 (119 comments)
I've worked on too many projects where every new feature had to be built on top of existing abstractions, which often led to severe restrictions whenever something slightly different was required. I always try to create reusable units/components that can either be used as intended or replaced by something that behaves slightly differently if needed.
Components are not necessarily frontend components; this also extends to backend logic.
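A rough sketch of a component that is either used as intended or swapped for a slightly different one, rather than extended with flags. All class names here (`Notifier`, `EmailNotifier`, `SmsNotifier`, `SignupService`) are hypothetical:

```python
from typing import Optional, Protocol


class Notifier(Protocol):
    def send(self, recipient: str, message: str) -> str:
        ...


class EmailNotifier:
    """Default component: used as intended by most callers."""

    def send(self, recipient: str, message: str) -> str:
        return f"email to {recipient}: {message}"


class SmsNotifier:
    """Replacement that behaves slightly differently: caps messages at 160 chars."""

    def send(self, recipient: str, message: str) -> str:
        return f"sms to {recipient}: {message[:160]}"


class SignupService:
    def __init__(self, notifier: Optional[Notifier] = None) -> None:
        # Fall back to the default component unless a replacement is supplied.
        self.notifier = notifier if notifier is not None else EmailNotifier()

    def register(self, user: str) -> str:
        return self.notifier.send(user, "Welcome!")
```

The service never learns about SMS. A caller with different needs passes in a different component, and `EmailNotifier` stays untouched.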
Copy and paste once is fine, twice, not so much.
Often I've seen two totally different things exist in one bit of code, no overlap!
Premature generification is bad, and leads the developer to believe that two things are the same, making it harder to see they are not.
It can also make it much harder to see that a different abstraction would give a cleaner outcome.
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
Code duplication and 'wrong' abstractions both count among the other foibles of programming. But they don't directly produce a cost that can be cheap or expensive.
They produce some other high dimensional intermediate value which can then produce highly variable cost dependent on the domain, goals, and scenario.
As ever, it depends.
The 'depends' is quantifiable, but it doesn't fit in a blog post. Think more along the lines of War and Peace.
The Node ecosystem is full of wrong abstractions.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
Also, I’ve seen the kind of codebase that seems to be LZW-packed due to the sheer desire to DRY everything out. Not a pleasant thing: by the time you go 10 layers deep into some “helper” function, you’ve forgotten why you were in there.
On the other hand, it is pretty difficult and error-prone to consolidate duplicated code that has drifted apart over time.
If in doubt, choose the approach that is simplest and least risky to revert if you discover in the future that you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
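A small illustration of the compression/abstraction distinction, using hypothetical order references and dates. The compressed version shares a helper only because two strings happen to look alike. The abstracted version gives each concept its own home:

```python
from dataclasses import dataclass
from datetime import date


# Compression: two unrelated strings happen to share a shape, so a helper
# is extracted purely to save lines. Its name says nothing about meaning.
def _join3(a: str, b: str, c: str) -> str:
    return f"{a}-{b}-{c}"


def order_reference_compressed(region: str, year: str, seq: str) -> str:
    return _join3(region, year, seq)


def iso_date_compressed(year: str, month: str, day: str) -> str:
    return _join3(year, month, day)


# Abstraction: each concept owns its own rules, so changing the order
# reference format can never silently change how dates are printed.
@dataclass(frozen=True)
class OrderReference:
    region: str
    year: int
    seq: int

    def __str__(self) -> str:
        return f"{self.region}-{self.year}-{self.seq:05d}"


def iso_date(d: date) -> str:
    return d.isoformat()
```

If order references ever switch to zero-padded sequence numbers, the compressed `_join3` needs a flag or a split. `OrderReference` just changes its own format string.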
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
Having a wrong abstraction means you end up with a class/function/module with a huge number of configuration options passed as boolean/enum parameters. It's not even clear that all combinations of configurations are valid. This situation may be simplified by duplicating and then eliminating code, creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (e.g. JSON serialization is stupid and needs a hacky workaround), but it keeps business logic changes simple. They may be a bit more numerous, but the code surfaces all the scenarios to consider.
Having no abstraction means you may have to change business logic consistently in multiple places, or fix exactly the same misconception (aka a bug) in multiple places, e.g. tax rate management in a multinational context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you've fixed one bug and missed 12 others. This can, however, allow you to discover a true abstraction. Maybe these 12 places should all call just one place?
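A sketch of the "duplicate, then eliminate" move described above, with a hypothetical report exporter. The flag-driven `export_report` accepts combinations that mean nothing. The two streamlined functions each take only the options that apply to them, at the cost of some repeated CSV code:

```python
import csv
import io
import json
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


# The "wrong abstraction": one entry point steered by flags. `pretty` only
# matters for JSON and `include_header` only for CSV, yet every combination
# of the three flags is accepted.
def export_report(rows: Rows, as_json: bool = False,
                  include_header: bool = True, pretty: bool = False) -> str:
    if as_json:
        return json.dumps(rows, indent=2 if pretty else None, default=str)
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    if include_header:
        writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# After duplicating and then deleting the dead branches: one streamlined
# function per use case, each taking only the options that apply to it.
def export_json(rows: Rows, pretty: bool = False) -> str:
    # default=str is the cross-cutting serialization workaround (e.g. for
    # dates); any other exporter needing it would repeat this fix.
    return json.dumps(rows, indent=2 if pretty else None, default=str)


def export_csv(rows: Rows, include_header: bool = True) -> str:
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    if include_header:
        writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```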
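A minimal sketch of the "12 places should call one place" outcome for the tax example. The country codes and rates are placeholders made up for illustration, not real tax figures. Every call site routes through a single `tax_for` function, so a rounding or rate fix lands everywhere at once:

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates under placeholder country codes; not real tax figures.
TAX_RATES = {
    "XA": Decimal("0.20"),
    "XB": Decimal("0.075"),
}
CENT = Decimal("0.01")


def tax_for(country: str, net: Decimal) -> Decimal:
    """The single place where tax rules live; every caller goes through it."""
    rate = TAX_RATES[country]  # an unknown country fails loudly (KeyError)
    return (net * rate).quantize(CENT, rounding=ROUND_HALF_UP)


# Call sites that previously each carried their own copy of the rate logic.
def invoice_total(country: str, net: Decimal) -> Decimal:
    return net + tax_for(country, net)


def credit_note_total(country: str, net: Decimal) -> Decimal:
    return -(net + tax_for(country, net))
```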
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
Otherwise what is better is better and we don't know what we don't know
Do you want to iterate using a for loop or using .iter().step_by(2).map()?
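The two styles in the question, sketched in Python rather than Rust (`itertools.islice` with a step of 2 stands in for `step_by(2)`). The function names are hypothetical. Both compute the same thing. The commenter's point is to pick one style and use it consistently:

```python
from itertools import islice


def every_other_doubled_loop(xs: list[int]) -> list[int]:
    # Index-based for loop: the stride and the transformation are spelled out.
    out = []
    for i in range(0, len(xs), 2):
        out.append(xs[i] * 2)
    return out


def every_other_doubled_chain(xs: list[int]) -> list[int]:
    # Iterator chain: islice(..., 0, None, 2) plays the role of step_by(2).
    return list(map(lambda x: x * 2, islice(xs, 0, None, 2)))
```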
I would rather have consistency than a mixed bag of levels of abstractions.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs an “allow forged header” flag and an “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production data loss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted that something for all users instead.
I can’t remember how the data loss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
Getting a design approved that required a user id to be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
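A rough sketch of the fix the commenter describes: every request goes through one auth check, and every handler declares its auth requirement in the same way. The route table, `SESSIONS` token store, and handler names are all hypothetical. The user id comes from the session exactly once, so a handler can never run "for all users" because a client omitted a parameter:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Auth(Enum):
    PUBLIC = "public"  # no identity required
    USER = "user"      # must act on behalf of an authenticated user


@dataclass(frozen=True)
class Request:
    path: str
    token: Optional[str]
    params: Dict[str, str]


@dataclass(frozen=True)
class Route:
    auth: Auth
    handler: Callable[..., str]


class Unauthorized(Exception):
    pass


# Hypothetical token store standing in for a real session/identity service.
SESSIONS: Dict[str, str] = {"tok-alice": "alice"}
ROUTES: Dict[str, Route] = {}


def route(path: str, auth: Auth):
    """Every handler must declare its auth requirement when it is registered."""
    def register(fn):
        ROUTES[path] = Route(auth, fn)
        return fn
    return register


def dispatch(req: Request) -> str:
    """The one auth check that every request goes through."""
    r = ROUTES[req.path]
    if r.auth is Auth.PUBLIC:
        return r.handler(req.params)
    user_id = SESSIONS.get(req.token) if req.token else None
    if user_id is None:
        raise Unauthorized(req.path)
    # The user id is taken from the session exactly once and passed in;
    # handlers never read it from client-supplied params.
    return r.handler(user_id, req.params)


@route("/health", Auth.PUBLIC)
def health(params: Dict[str, str]) -> str:
    return "ok"


@route("/items/delete", Auth.USER)
def delete_item(user_id: str, params: Dict[str, str]) -> str:
    # Scoped to user_id by construction; there is no "all users" path.
    return f"deleted {params['item']} for {user_id}"
```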
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you gnaw through the corner.
Some previous discussions:
2023 https://news.ycombinator.com/item?id=35927149
2021 https://news.ycombinator.com/item?id=27095503
2020 https://news.ycombinator.com/item?id=23739596
Generalizing this in the abstract is a wrong abstraction.
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).