7 types of technical debt and what they say about your team
The 4 main decisions that lead to technical debt
Martin Fowler's article on technical debt has been out for more than 10 years and is still the definitive piece on how to categorize technical debt and qualitatively weigh its value against its cost. Fowler effectively creates a decision tree that serves as a good high-level framework for how technical teams can make decisions about managing technical debt.
- Do we know the right way to do this but know that there's a faster way to get it done now? This is deliberate/prudent technical debt. With this type of debt, the team understands the tradeoffs and can make a justifiable decision that also includes a plan to remediate the debt at a later date. High-performing teams will often use this type of debt to constrain timelines while holding themselves accountable to paying it down. With my teams, it's a general rule that if a work item makes a tradeoff to deliver on time, new work items get added to the queue to account for paying down the debt, akin to paying off your credit card balance at the end of each month.
- Are we skipping key steps in our process in order to ship now? This is deliberate/reckless debt. With this type of technical debt, the impact is far less constrained to a single work item and may have significant consequences in the future. Teams under significant deadline pressure, with poor leadership and little agency, will often take shortcuts in best practices or workflow to try to cheat a deadline. Almost without fail, this is coupled with management that enables these behaviors or even mandates them. Because the debt is not discrete, it can be far-reaching and difficult to correct - like someone who doesn't have a budget or routinely maxes out their credit cards. When I've encountered teams with a history of deliberate/reckless debt, the only solution is often a reset.
- Are we not knowledgeable enough to do this the right way now? This is inadvertent/reckless debt. When a team encounters a structural problem for the first time, it often will not have planned for the time needed to learn the right answer. This is not always a smell of inexperienced teams. It happens frequently with more experienced teams that have accumulated a fair amount of deliberate/reckless debt by not using tools like spikes or tracer bullets to build workable patterns in past tasks. While a lack of knowledge and experience often creates this type of debt, it is just as often the result of poor planning or design investment in past work. Think of this like having auto insurance with poor coverage - it could be a lack of understanding of how the policy works, but it can equally be the result of not researching the policy enough to begin with.
- Man, it seems obvious now that we should have done it this way. This is the more mystifying inadvertent/prudent debt. In the course of a team's work, there are dozens if not hundreds of smaller, seemingly inconsequential decisions made every day that seem benign. In retrospect, a team can look back and say "geez, did we really do that?" The sum total of those decisions is likely a net gain in cycle time, but a net loss in technical assets, leading to some more debt. As Fowler concludes, this debt is likely inevitable. But it may be quantifiable like its deliberate cousin, as we'll discuss in the examples below.
Common examples of debt and how to manage them
Now that we have a framework for thinking about technical debt, we can talk about some discrete examples of how this debt manifests itself in projects. The goal is never to have zero technical debt - the goal is to be able to identify it on the project balance sheet and have a plan for managing it as a liability, understanding that it is the result of a tradeoff that prioritizes near-term business value. It is incorrect, however, to categorize certain other problems as technical debt:
- Bugs are not technical debt. Bugs are the result of poor communication and understanding of requirements that results in an incorrect implementation. A poorly or not at all understood edge case that leads to a defect in the software is not a tradeoff that was deliberately or inadvertently made - it is something that was not accounted for in design (this may be debt in some other area of the business, but it is not technical).
- Staff performance is not an indicator of technical debt. Teams and individuals that struggle are no more or less likely to have technical debt than their higher-performing peers. While a team with significant unpaid technical debt may struggle to maintain its productivity and work ethos without the bandwidth to remediate it, there are teams and individuals who can seemingly perform at a high level while continuing to accrue massive amounts of technical debt. This is the 10x engineer fallacy.
- Businesses do not fail because they had too much technical debt. Businesses fail because they did not manage their software as an asset like other assets.
Now - what are typical types of technical debt that we should be on the lookout for?
Outdated foundations and dependencies
Have you applied the latest updates for your operating system today? No? Then congratulations - you have taken on the most common form of technical debt!
As the software world has evolved and we have adopted package managers and nightly updates and patches of operating systems, the benefits of this constant change - more secure software, better-performing software, rapid bug fixes - must be weighed against the potential risks: breaking changes, unplanned downtime, and knowledge risk. As a result, controls are often put in place that balance the almost instantaneous remediation of outdated-software debt by slowing down the rate of that change. Your NPM package-lock.json file is a great example of how this is done.
The goal should certainly be to always be using the most recent operating system, runtime, API, language version, or package dependency. This is not to be cutting edge, but rather to ensure that the technical debt created by other teams has a limited lifetime on your team's balance sheet. Teams should be familiar with semantic versioning to be able to quickly identify and plan for updates from external software. Changelogs should be read carefully. Vendors or software providers that don't follow these conventions should offer the appropriate free or commercial support to allow their users or customers to upgrade as needed without introducing project risk.
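The semantic versioning convention mentioned above can be made concrete with a small sketch. The function below classifies a proposed dependency update by comparing version components, assuming plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata; the function name is my own for illustration.

```python
# Classify a dependency update by semantic-versioning rules.
# Assumes simple MAJOR.MINOR.PATCH strings (no pre-release/build metadata).

def classify_update(current: str, candidate: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a proposed upgrade."""
    cur = tuple(int(part) for part in current.split("."))
    new = tuple(int(part) for part in candidate.split("."))
    if new <= cur:
        return "none"
    if new[0] > cur[0]:
        return "major"   # breaking changes possible: read the changelog first
    if new[1] > cur[1]:
        return "minor"   # new features, backwards compatible
    return "patch"       # bug fixes only

print(classify_update("2.4.1", "3.0.0"))  # major
print(classify_update("2.4.1", "2.5.0"))  # minor
```

A check like this can gate an automated update pipeline: take patch and minor updates routinely, and queue major updates as explicit work items for changelog review.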
What this says about your team: For a lot of teams, keeping things updated - from the desktop OS to the smallest functional package dependency - seems like a low-value afterthought. But taking each small update as it arrives, while creating incremental effort, saves the team from the sea-change effort of absorbing multiple major updates later. If your team is reluctant to take software updates, it's likely been a while since the last update, and a plan needs to be put in place to get current.
No user experience (UX) design
Yes! You need some level of user experience design for your software project! UX is not solely about software being pretty - it is about designing information and workflows that make the users of the software productive, which is the whole point. UX isn't just about widgets and buttons either - a poorly thought-out API is also a poor user experience for the developers who will be integrating it into their software.
This does not mean that you need to hire some fancy agency to do your UX. Simply giving your teams the support to interview users and understand their process and what they think would be an improvement is, in fact, user experience design. Many years ago, working at a well-known hospital in Boston, I was part of a small team building a messaging system to communicate patient needs to doctors at a particular practice. The existing system - literally shoeboxes and paper - was going to be replaced by an online tool. But how should it work? The other two people on the project and I met with the staff at the practice to find out how the shoebox-notes "system" worked and what was difficult about it. Once we knew what the users needed, we built a prototype for them to use. When we built the actual software, the entire practice adopted it immediately.
Poor user adoption is a smell of poor or non-existent UX, which in turn is a form of technical debt. It is typically the result of a lack of process or a short-circuiting of process, and can either be devastatingly reckless or just the result of inexperience or ignorance.
What this says about your team: Not applying a UX process to almost any software project, regardless of who the end user is, is a recipe for disaster. Unless your team is the only user of the software, they need to get "out of the building." Many software teams are more comfortable taking direction, and particularly in end user facing applications - software developers may not be the best to do wireframing and design. Consider adding a UX analyst to your team.
No test-driven development methodology
If I had a dime for every time I heard a manager say "TDD takes too much time," I'd be a rather wealthy man. TDD is often mistaken for a method of functional testing when in fact it is a way of writing code and developing software. The benefits of TDD are clear: fewer bugs on delivery, more flexible code for later refactoring, earlier detection of regressions, measurable code quality, etc. Unfortunately, teams are often dissuaded from practicing any part of test-driven development, and many are poorly educated in how to execute TDD to begin with. If there is any speed benefit to skipping TDD, it is almost immediately accumulated as debt in the form of poor systems design, lack of maintainability, reduced future cycle times, and frustrated software teams repeatedly having to recount why something works the way it does. TDD is mind-numbingly simple to execute, and yet teams forego it all the time thinking it has no benefit.
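To show just how simple the red-green loop is, here is a minimal sketch. The `slugify` function and its behavior are hypothetical examples of mine, not from any particular project: the tests were conceptually written first (and failed), and the implementation exists only to make them pass.

```python
# A minimal red-green TDD loop: the tests drive the design of the function.
import unittest

def slugify(title: str) -> str:
    """Turn a page title into a URL slug (written to make the tests pass)."""
    words = "".join(ch if ch.isalnum() else " " for ch in title.lower()).split()
    return "-".join(words)

class TestSlugify(unittest.TestCase):
    # Step 1 (red): this test existed before slugify did, and failed.
    def test_strips_punctuation_and_spaces(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    # Step 2 (green): the simplest implementation above makes it pass.
    def test_collapses_repeated_separators(self):
        self.assertEqual(slugify("a -- b"), "a-b")
```

Run with `python -m unittest` in the file's directory. The discipline is in the ordering, not the tooling: each new behavior starts life as a failing test.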
Sometimes teams will attempt to wallpaper over the lack of development-level testing by insisting upon higher-order testing exercises. I once worked with a team that was required to add end-to-end functional testing for all new development, covering all documented core, edge, and corner cases. While E2E testing can be beneficial, it is orders of magnitude more complicated and costly than implementing unit tests as part of a TDD approach. E2E testing also tells a team nothing about the design of their software or how maintainable it will be, and does not surface anti-patterns that will delay the team later. It is also a longer feedback loop that will only discover gaps between the requirements and the implementation at the very end of the development cycle. While the business is right to require functional test automation, it is NOT a replacement for TDD.
Teams that measure code coverage and make prudent decisions about where to NOT apply TDD succeed more often than their counterparts who don't. I have had clients who have used TDD for prototype software that was likely going to be thrown out once it had been used to learn several things about the behavior of its users. The intent of this project was not to create maintainable software, and therefore the infrastructure and workflow changes required to support the developers in using TDD had very little value add to the project. For any production-ready software project, TDD is a powerful tool to stop a lot of inadvertent/prudent debt.
What this says about your team: Teams that don't practice TDD often lack several competencies - the ability to refactor, layering of logic, etc. TDD gives developers the ability to detect smells in code that can otherwise become monstrous, unmanageable problems later. Implementing TDD on an existing project often meets resistance because it creates a window into the project's issues. Starting in small steps, with incremental code-coverage improvements as a goal, can help a team get on the right track.
No documentation
While the agile manifesto values working software over comprehensive documentation, it does not say "thou shalt not create documentation." Teams that lack documentation - be it comments in code or even the most simplistic of README files - will thrash more, onboard new team members more slowly, deviate from standards, and generally communicate poorly. Documentation can be overdone, but having some degree of directive, specific, targeted documentation is essential, and the lack of it not only qualifies as technical debt but reliably leads to future technical debt.
Good documentation answers questions in a priority order of importance to the team:
- What does this software do? Good documentation concisely says why the software exists by describing the problem it solves.
- How does this software work? Good documentation allows someone unfamiliar with the software to quickly get it running or ready for further development without intervention from another person.
- Who built this software? Knowing who is responsible for building and maintaining the software allows a user to quickly get help as needed, which often leads to beneficial updates to the documentation.
- Why was the software built this way? Less important but directionally helpful is to understand patterns, standards, and idioms that are fundamental to the software so that deviation is controlled.
It is not critical to have binders of documentation. Good documentation is eventually the foundation of future automation, so it should be instructive and directive towards a deterministic outcome.
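A module docstring can answer all four questions in a dozen lines. The sketch below is purely illustrative: the module name, command, and team references are invented stand-ins, not a real project.

```python
"""inventory_sync: reconcile warehouse counts with the storefront catalog.

What: solves nightly drift between the warehouse database and the catalog,
      so stock shown to customers matches what can actually ship.
How:  install requirements, then run `python -m inventory_sync --dry-run`
      against a staging database before pointing it at production.
Who:  maintained by the fulfillment team; see CODEOWNERS for current contacts.
Why:  uses plain SQL rather than an ORM so that audits can replay every
      statement; deviations from this pattern should be discussed first.
"""
```

Short, directive, and deterministic: a new team member can run the software and know who to ask without a meeting.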
What this says about your team: A lot of teams that lack documentation have a few members who know the most about how the underlying codebase works. Hero-worship behaviors - like a single team member acting as an oracle to the rest of the team - are usually the result. The team members who hold the knowledge need to be compelled to get it into a form that frees the rest of the team from relying on them.
No automation
Human-intensive tasks are the bane of the fractional CTO's existence (or any CTO's, for that matter). When software teams fail to automate human-intensive tasks, they create debt by slowing down repetitive processes while creating room for human error. Software quality is only maintainable through automated auditing of standards and, better yet, automated enforcement of those standards.
Code linters are the simplest but most impactful automation for a software development team. While your .editorconfig file can guide a developer to follow some basic standards when writing code, good linters not only report deviations from the standards but also correct them, which removes the dependency on the developer's fingers for simplistic quality issues entirely. Linters also allow code reviews to focus on structure and logic rather than organization and cosmetics.
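The report-and-fix pattern is worth seeing in miniature. This toy "linter" (my own illustration, far simpler than a real tool like ESLint or Ruff) detects one violation class, trailing whitespace, and returns both the report and the corrected text, the way a real linter's fix mode removes the issue instead of merely flagging it.

```python
# Toy report-and-fix linter: flag trailing whitespace AND return fixed text.

def lint_trailing_whitespace(source: str) -> tuple[list[str], str]:
    """Return (violations, corrected source) for one simple rule."""
    violations, fixed_lines = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.rstrip()
        if stripped != line:
            violations.append(f"line {number}: trailing whitespace")
        fixed_lines.append(stripped)
    return violations, "\n".join(fixed_lines)

report, fixed = lint_trailing_whitespace("x = 1   \ny = 2\n")
print(report)  # ['line 1: trailing whitespace']
```

Wire something like this into a pre-commit hook and the violation class disappears from code review permanently.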
In general, my rule for my teams is this: when you find yourself doing the same thing over and over again, script it away. Not only will it save you time, it future-proofs the code so that its maintainers have less process replication to account for later.
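As a sketch of "script it away": suppose a team manually skims changed files for leftover TODO markers before every release. The chore becomes one command, assuming the team's convention is a literal `TODO` string in Python files (both the convention and file names here are illustrative).

```python
# "Script it away": turn a manual pre-release TODO sweep into one command.
import pathlib
import tempfile

def files_with_todos(root: pathlib.Path) -> list[str]:
    """Return Python files under root that still contain TODO markers."""
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*.py")
        if "TODO" in path.read_text(encoding="utf-8")
    )

# Demo on a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "done.py").write_text("x = 1\n")
    (root / "pending.py").write_text("# TODO: handle errors\n")
    offenders = files_with_todos(root)

print(offenders)  # ['pending.py']
```

Ten lines written once, versus a few error-prone minutes per release forever.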
What this says about your team: Teams that absorb the pain of performing the same tasks over and over again are likely overburdened and unproductive. They are likely not creative at solving problems, and hide a lot of the issues that could lead them to be more productive. Regular retrospectives that account for the drag on productivity and schedule work to remove the drudgework can be helpful to pay off this debt.
No infrastructure as code
If you are in the cloud and can't reproduce your entire infrastructure from a piece of code (or tear down all of the infrastructure) - you are certainly playing with fire. I asked a DevOps colleague once if he thought that using the AWS console to create infrastructure was ever a good idea, and he came back and said people should be fired for lesser wrongs.
While the "no automation" form of technical debt is a directive to take human-intensive processes and make them robotic, not having infrastructure as code should be as unacceptable today as software without source control was in 2010. I use Terraform to manage the infrastructure behind this website. If I didn't, I would be ripping my hair out over things like:
- Patching infrastructure with the latest AWS changes. I just got an email from them the other day about how a certain IAM role in my infrastructure was being retired. Without infrastructure as code - I would have to search through the notoriously difficult to use AWS console to find the resource in question.
- Understanding how the configuration of my cloud resources had changed over time. I once was updating permissions on a load balancer and inadvertently directed traffic to the wrong EC2 instances by a simple keystroke error. With source controlled infrastructure I was able to revert the change in minutes.
- Taking resources offline to save on costs. At Vitals, as we moved our infrastructure after the sale to WebMD, we spent a number of weeks reviewing our AWS bills and comparing them to the infrastructure, as not all of it was managed by code. We ended up finding two unmanaged m4.xlarge instances. Had they been in our codified infrastructure, we could have saved thousands of dollars sooner.
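The reconciliation in that last bullet reduces to a set difference: what the cloud account reports running, minus what the IaC state declares. The sketch below uses simplified stand-in data shapes, not a real AWS or Terraform API; in practice the inputs would come from `describe-instances` output and the Terraform state file.

```python
# Sketch of IaC reconciliation: find instances running in the account
# that are absent from the infrastructure-as-code state (cost strays).
# Data shapes are simplified illustrations, not a real cloud API.

def unmanaged_instances(running: list[dict], managed_ids: set[str]) -> list[dict]:
    """Instances that exist in the account but not in the IaC state."""
    return [inst for inst in running if inst["id"] not in managed_ids]

running = [
    {"id": "i-0a1", "type": "m4.xlarge"},
    {"id": "i-0b2", "type": "t3.micro"},
    {"id": "i-0c3", "type": "m4.xlarge"},
]
managed = {"i-0b2"}

strays = unmanaged_instances(running, managed)
print([inst["id"] for inst in strays])  # ['i-0a1', 'i-0c3']
```

Run on a schedule, a report like this would have surfaced those m4.xlarge instances in the first week rather than after a manual bill review.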
Even if you are just getting into using cloud infrastructure, using an infrastructure as code solution can be extremely helpful. Even a simple AWS Lambda tutorial may require you to create quite a few resources that unless you remember to tear down when your free tier time ends, you will get charged for.
What this says about your team: Infrastructure as code is still somewhat new, but as cloud adoption continues its endless acceleration, it's important that teams that have not yet adopted a tool like Terraform or CloudFormation build the competencies to ensure that infrastructure can be built up and torn down as needed.
Differences between environments
By now, unless you are doing mobile application development, there is basically no reason not to be using Docker. And with the explosion of container orchestration - whether Kubernetes, Amazon ECS, or Rancher - there's no reason why your development, test, and production infrastructure should not be the same. While virtual machine solutions like Vagrant made the "it works on my machine" joke less frequent, containerizing applications eliminates the issue entirely.
Remember the days when you started on a new team with a new project and you spent the first two days just getting a working development environment running? Gone. Remember when code went to production and it turns out that the production server had a version of PHP that your code couldn't support? Gone. All of these things have been largely remedied by moving to containers.
A lot of the resistance to moving to containers comes from the perceived learning curve. There are many software developers who simply will not be bothered with operations and would prefer to just write code. This is the type of deliberate/reckless debt that can become truly systemic. There's not a lot of disagreement about the value of containers to a team - but the time to get everyone comfortable with them is often viewed as not adding business value. This is the kind of myopia that creates burdensome technical debt that falls on the shoulders of the team later in the project lifecycle, when it impacts time to market and productivity.
What this says about your team: "It works on my machine" should become an unacceptable idiom for teams today. It wastes an incredible amount of time and is indicative of risks that are hidden. Having a long-term plan to container adoption and eventually container deployments may require hiring outside expertise - but the team needs to be open to learning how containers and container orchestration works.
There are many, many types of technical debt, and technical debt is not a particularly bad thing provided that it is identified and managed. Teams that deliver frequently and with immediate business value will often make educated trade-offs that lead to debt that must be serviced later. Teams that scale with this approach make time with their stakeholders to regularly pay down debt so that they can continue to move with speed.
Ryan is the former Chief Product Officer at Medullan, CTO at Be the Partner, and CTO and General Manager at Vitals. He currently works as a fractional CTO offering strategy as a service to growth-stage companies in health care and education.