Sustainable Engineering Practices: Balancing your Team's Portfolio
I was fortunate to build and maintain a product portfolio for over 10 years in my previous job. There were multiple rewrites, architecture initiatives, new languages, data center migrations, launches, deprecations, and sunsets along the way. I saw numerous architectural decisions and the ripple effects they had many years down the line.
One of the things that comes with a longer tenure in an Engineering leadership position is an intentional focus on managing the things you've built along the way. We went through multiple iterations of how to examine how the team is currently spending its time and how that impacts the planned work it can do in the future. We landed on the following high-level way to sort the work that a Software Engineering team performs into three categories:
- Run the Business - This is what you need to keep the systems up and running. Think responding to and addressing on-call or customer support issues or ensuring that you're running the latest versions of libraries, dependencies, and operating systems.
- Technical Debt - The work that must be done by the team to maintain a healthy foundation and build for the future. Think rearchitecting systems for better scale, introducing new technologies that are better suited for your use case, or addressing unstable or hard-to-understand parts of the system.
- Feature Work - This is the net new work where the team is building features, products, and experiences to deliver to customers.
I find introducing this shared language helpful for the Product and Engineering teams to stay on the same page about the total throughput and capacity of any given Engineering team.
In the early days of a startup or new team, you'd expect most of your work to fall into the Feature Work category. If you're lucky enough to find Product-Market fit and your business grows and scales, you'll see the need to focus on Run the Business and Tech Debt to keep up with demand and address the increased complexity that comes from the introduction of new Products, Services, and Customers. If you ignore these buckets, you can see the Engineering team's shipping cadence grind to a halt.
You'll likely see your team's membership grow as well. With that growth, it becomes essential to pay attention to and make visible the work being done by the team, so you can appropriately set expectations about the planned work you'd like to do in the future. With that in mind, I've found it useful for the team to generate the data for and review the following artifact together on a regular cadence to build shared context:
Template: Your Engineering Team's Portfolio
To keep this process lightweight, I often host a meeting with the team and fill out this template based on their intuition through facilitated discussion. That said, you could likely automate a lot of this reporting by examining things like Mean Time to Resolve for incident resolution, inbound customer support requests, story points shipped, etc. for each Product.
Capture the current state of your team's Products and Services
First, you need to take a snapshot that outlines the current state of the Products and Services owned by your team. For each Product that your team owns, enumerate the underlying services that power that Product. For each Service in the list, we assign a letter grade A - F for:
- Stability - How stable is the Service? 'A' means it's rock solid and has performed well with minimal intervention for quite some time and 'F' indicates the Service is falling over regularly and not performing well.
- Team Knowledge - How well does the team understand this Service? 'A' means everyone on the team knows the Service well, and 'F' indicates that only one or two people understand it well.
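If you want to keep the snapshot somewhere more durable than a whiteboard, here is a minimal sketch in Python of one way to record it. Everything in it is an assumption for illustration: the `ServiceGrade` and `needs_attention` names, the product and service names, the choice to skip 'E' in the grade scale, and the 'C' threshold for flagging a Service.

```python
from dataclasses import dataclass

# Assumed grade scale, best to worst. The article says "A - F";
# skipping 'E' is an assumption made for this sketch.
GRADES = ("A", "B", "C", "D", "F")


@dataclass(frozen=True)
class ServiceGrade:
    """One row of the snapshot: a Service and its two letter grades."""
    product: str
    service: str
    stability: str
    team_knowledge: str

    def __post_init__(self):
        # Reject anything that is not on the assumed grade scale.
        for grade in (self.stability, self.team_knowledge):
            if grade not in GRADES:
                raise ValueError(f"unknown grade: {grade!r}")


def needs_attention(snapshot, threshold="C"):
    """Return Services graded worse than `threshold` on either dimension."""
    limit = GRADES.index(threshold)
    return [
        row for row in snapshot
        if GRADES.index(row.stability) > limit
        or GRADES.index(row.team_knowledge) > limit
    ]


# Hypothetical example data.
snapshot = [
    ServiceGrade("Checkout", "payments-api", "A", "F"),
    ServiceGrade("Checkout", "cart-service", "B", "B"),
    ServiceGrade("Search", "indexer", "D", "C"),
]
flagged = needs_attention(snapshot)
```

In this made-up example, `payments-api` is flagged even though it is rock solid, because only a couple of people understand it.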
Look Back to Define Current Portfolio allocation
With the visibility you gathered above as input, take the time to outline how your team is currently spending their time. As a general rule, I ask the group to consider the last three months when determining the percentages. It's usually a good time to remind the crew about recency bias as well. Some questions to consider:
- How much time was spent on Feature Work? Was this less or more than we expected? Did we have a flood of inbound support requests that caused us to miss our expected delivery dates?
- Pull up your alerting graphs. Was there a sustained spike in paging/alerts recently? What could be causing this?
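If your team already tags its work by category, the percentages can be computed rather than estimated. The sketch below assumes you can export the last three months of work as `(category, effort)` pairs, where effort is whatever unit you track (story points, hours, ticket counts). The `current_allocation` function and the sample numbers are hypothetical.

```python
from collections import defaultdict

CATEGORIES = ("Run the Business", "Technical Debt", "Feature Work")


def current_allocation(work_items):
    """Turn (category, effort) pairs into a percentage per category."""
    totals = defaultdict(float)
    for category, effort in work_items:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += effort

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No recorded work: report zero everywhere instead of dividing by zero.
        return {category: 0.0 for category in CATEGORIES}
    return {
        category: round(100 * totals[category] / grand_total, 1)
        for category in CATEGORIES
    }


# Hypothetical effort logged over the last three months.
work_items = [
    ("Feature Work", 30),
    ("Run the Business", 35),
    ("Technical Debt", 20),
    ("Run the Business", 15),
]
allocation = current_allocation(work_items)
```

Treat the result as a conversation starter for the team discussion, not as a replacement for it.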
Look Ahead to Set Target Portfolio allocation
Now that you understand how you're currently spending your time, does this align with where you'd like to be as a team? With what is needed by the business? Some helpful questions to consider:
- Why was our RTB percentage so high last month? Are there Tech Debt initiatives we should prioritize soon to address this? How do we expect that to bring down RTB %?
- Should we intentionally slow down on Feature Work in the next few months to address underlying architecture concerns?
- Do we agree as a team that we should pause non-essential architecture investments for the time being to meet the "make it or break it" milestone for our company next month? (think Black Friday, CES, Singles Day, etc.)
- Do we have enough people on the team to meet our desired target portfolio allocation?
- When do we expect our Services to begin to encounter scale issues? Are there user growth or sales forecasts that can help us understand any anticipated growth here soon?
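Once the team agrees on a target, it can help to see the gap between where you are and where you want to be. This is a minimal sketch, assuming both allocations are dictionaries of percentages keyed by category. The `allocation_gap` name and the numbers are made up for illustration.

```python
def allocation_gap(current, target):
    """Return target minus current, in percentage points, per category."""
    if abs(sum(target.values()) - 100) > 0.01:
        raise ValueError("target percentages should add up to 100")
    return {
        category: round(target[category] - current.get(category, 0.0), 1)
        for category in target
    }


# Hypothetical numbers: where the team is today and where it wants to be.
current = {"Run the Business": 50.0, "Technical Debt": 20.0, "Feature Work": 30.0}
target = {"Run the Business": 30.0, "Technical Debt": 30.0, "Feature Work": 40.0}
gap = allocation_gap(current, target)
```

A negative number means the team wants to spend less time in that category, and a positive number means more. Here the team would be looking to shift 20 points out of Run the Business.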
Take Action
With this new visibility, you can take specific and intentional action to address any problems or opportunities you see:
- Lack of Team Knowledge - Perhaps you discover the rock-solid Service that has performed well for three years is mostly unknown by your team because there hasn't been much active development. Maybe there is only one person on the team that understands this Service well? Perhaps this is a critical service that brings your business to a halt if it goes down. Are you ok with this?
- Unstable Services - Maybe you discover your team has spent the past month receiving alerts about an unstable service that seem to resolve on their own, with no root cause analysis performed. Should you prioritize some intentional investigation or training to address this so that you're not caught flat-footed if there's a problem with the Service at 3 am?
- Unbalanced Service Ownership - You can use this information to distribute ownership of Products and Services across your Engineering organization more effectively. On the surface, maybe you think that because Team A owns two Products and Team B owns two Products, things are evenly distributed. This exercise can show you that Team A owns five stable services under the covers and has been crushing their Feature Work delivery milestones, while Team B owns 30 underlying services, half of which are unstable, causing them to miss deadlines.
- RTB is too high - Leaving the Run the Business component high without addressing it over the long term is going to result in reasonably uninspiring work that will likely lead to burnout or high turnover. Can you reduce RTB work through Tech Debt initiatives? Can you outsource some of this work to a front line support team?
- Missing Critical Launch Milestone - Is the timing of our milestone that important? (See Using Release Dates Effectively) If so, are there opportunities to add more engineers with relevant experience to the team to help swarm on the problem? (Fully acknowledging that The Mythical Man-Month is real and you can't just throw uninformed people at a situation to speed things up.) As a team, can you agree that it makes sense to pause all other work and move towards "all hands on deck" to hit your milestone? It's a much better outcome when the team comes to this decision themselves, rather than having this come as a top-down decision that usually lacks context.
- Faster Onboarding - You can incorporate this exercise into your onboarding process to give new employees a bird's eye view of the services owned by the team.
- Taking stock of a new Team or Product - Perhaps you're the leader of a new team, or you just inherited some new Products and Services? Running through this exercise as a team can be useful in building a sense of shared understanding.
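For the Unbalanced Service Ownership check in particular, a per-team tally is often enough to start the conversation. The sketch below assumes each Service is recorded as a `(team, service, stability_grade)` tuple. Treating 'D' and 'F' as unstable is an assumption, as are the `ownership_summary` name and the sample data.

```python
from collections import Counter


def ownership_summary(services, unstable_grades=("D", "F")):
    """Count total and unstable Services per team."""
    total = Counter()
    unstable = Counter()
    for team, _service, stability in services:
        total[team] += 1
        if stability in unstable_grades:
            unstable[team] += 1
    return {
        team: {"services": total[team], "unstable": unstable[team]}
        for team in total
    }


# Hypothetical data: both teams might own the same number of Products,
# but the Services underneath tell a different story.
services = [
    ("Team A", "a1", "A"),
    ("Team A", "a2", "B"),
    ("Team B", "b1", "F"),
    ("Team B", "b2", "D"),
    ("Team B", "b3", "A"),
]
summary = ownership_summary(services)
```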
There are more precise ways to automate the gathering and reporting of this information. However you collect the data, I've found reviewing this regularly across your organization helps create a shared understanding for work that is often invisible. In turn, this can help ensure you're employing sustainable engineering practices that will set your team and business up for success in the long term.