Data engineering teams spend 70–80% of their time on maintenance, not new builds. Here's why that ratio matters for team planning and what engineering leaders should do differently.
By Lakshmikanth Paleti, Data Engineering
After five years working across data engineering and analytics teams, one pattern shows up everywhere: most data engineering time is not spent building new pipelines.
Only about 20–30% of the time goes into creating new data jobs, transformations, or models. The other 70–80% is spent maintaining what already exists.
What 'maintenance' actually means for data engineering teams
Maintenance isn't one thing. It covers a wide range of work that rarely makes it onto a roadmap but never stops arriving in the queue:
Fixing broken pipelines caused by upstream schema changes or source system updates
Diagnosing and resolving data quality issues before they reach reporting
Improving pipeline performance as data volumes grow
Updating tools, libraries, and dependencies to stay current and secure
Answering questions from analytics and business teams about data accuracy, definitions, and freshness
None of this work is optional. And as data grows and more teams build on top of it, this operational surface area keeps expanding.
Pipelines stop being 'nice to have' and become business-critical. When they fail, someone notices immediately and the expectation is that they should never fail at all.
Why this creates constant stress for data teams
Many data engineering teams are still structured and evaluated like feature teams, judged primarily on how many new pipelines they ship. But when the majority of the actual work is operational, that framing creates an impossible situation.
Engineers are expected to build new things while simultaneously keeping existing systems running. Both demands are treated as equally urgent. Neither can be deprioritized without consequences.
Over time, this produces a predictable pattern of problems:
Pipelines become fragile because there is no dedicated time to refactor or harden them
On-call burden increases as more critical systems mean more incidents
New development slows because engineers are context-switching between operational firefighting and project work
Engineers burn out and leave, taking institutional knowledge of legacy systems with them
What strong data engineering teams do differently
The teams that handle this well don't try to minimize maintenance work; they plan for it honestly and treat it as real engineering.
Reliability work, monitoring improvements, schema change handling, and data quality checks are not treated as distractions from 'real' engineering. They are the real engineering: the work that keeps business-critical systems running at the quality and consistency that stakeholders depend on.
This shift in framing matters because it changes what gets resourced, what gets prioritized, and how engineers are evaluated. A team that proactively hardens pipelines, builds robust monitoring, and reduces incident frequency is delivering significant engineering value even if they shipped fewer new pipelines that quarter.
Practical advice for engineering leaders
If you lead a data engineering team, here is where to start:
Plan capacity around the real ratio
Assume 70–80% of engineering time will go to maintenance and reliability work. Build that into your roadmaps, your sprint planning, and your hiring model. If your team is consistently failing to deliver new features, the answer may not be more engineers; it may be that the maintenance load simply hasn't been accounted for.
Assign clear ownership of production pipelines
Every production pipeline should have a named owner who is responsible for its reliability, performance, and documentation. Unowned pipelines become everyone's problem when they break and no one's priority when they need improvement.
Invest in automation, monitoring, and data quality
Time spent building robust monitoring, automated data quality checks, and self-healing pipelines compounds over time. Every alert that fires before a business user notices a problem, and every quality check that catches a schema change before it corrupts a report, is operational work that doesn't become an incident.
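As a concrete illustration, here is a minimal sketch of a pre-load quality gate. The column names, the `check_batch` helper, and the rules it applies are hypothetical, not any specific team's implementation; the point is that a few lines of validation run before loading can turn a silent report corruption into a clear list of issues.

```python
# Minimal pre-load data quality gate (hypothetical schema and rules).
# Run against each incoming batch before it reaches reporting tables.

EXPECTED_COLUMNS = {"order_id", "customer_id", "order_total", "created_at"}

def check_batch(records):
    """Return a list of human-readable issues; an empty list means the batch is safe to load."""
    issues = []
    for i, row in enumerate(records):
        missing = EXPECTED_COLUMNS - row.keys()
        extra = row.keys() - EXPECTED_COLUMNS
        if missing:
            issues.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            # An unexpected column is often the first sign of an upstream schema change.
            issues.append(f"row {i}: unexpected columns {sorted(extra)}")
        if not missing and row["order_total"] is not None and float(row["order_total"]) < 0:
            issues.append(f"row {i}: negative order_total")
    return issues

batch = [
    {"order_id": 1, "customer_id": 7, "order_total": "19.99", "created_at": "2024-05-01"},
    {"order_id": 2, "customer_id": 8, "order_total": "-5.00", "created_at": "2024-05-01"},
]
problems = check_batch(batch)
```

In practice a gate like this would alert or quarantine the batch rather than just return a list, and dedicated tools exist for the job, but even this much logic catches schema drift before a business user does.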
Measure operational health alongside delivery
Track pipeline reliability, incident frequency, mean time to resolution, and data quality defect rates alongside new pipeline delivery. If you only measure output, you'll build a team that optimizes for output and inherits a fragile system in return.
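Mean time to resolution, for example, falls straight out of an incident log. The log format below is a hypothetical sketch; most incident trackers can export equivalent opened/resolved timestamps.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (opened, resolved) timestamp pairs
# exported from whatever tracker the team already uses.
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 10, 30)),
    (datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 8, 14, 45)),
    (datetime(2024, 5, 20, 2, 0), datetime(2024, 5, 20, 5, 0)),
]

def mean_time_to_resolution(log):
    """Average time from incident opened to incident resolved, as a timedelta."""
    total = sum((resolved - opened for opened, resolved in log), timedelta())
    return total / len(log)

mttr = mean_time_to_resolution(incidents)
```

Trending this number month over month, alongside incident count, gives leadership a reliability signal that is as concrete as a delivery count.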
For more helpful advice, schedule a consultation with a member of our data engineering team today.
Frequently Asked Questions
How much time should a data engineering team realistically spend on maintenance vs. new development?
Based on experience across multiple teams, a realistic baseline is 70–80% maintenance and 20–30% new development. The exact ratio depends on the maturity of your data platform: newer platforms skew toward development, while more established ones with larger pipeline inventories skew heavily toward maintenance. The mistake most leaders make is planning as if the ratio is the reverse, then wondering why new project timelines keep slipping.
How do I make the case to leadership that maintenance work deserves dedicated capacity?
Quantify what unplanned maintenance actually costs. Track how many engineering hours per month go to incident response, ad hoc data quality investigations, and emergency pipeline fixes. Then compare that to what structured investment in monitoring, automation, and reliability tooling would cost. In most cases, the reactive cost far exceeds the proactive investment, and that story is easy to tell in business terms. Stable data systems also directly affect the speed and confidence of business decision-making, which connects maintenance investment to outcomes leadership already cares about.
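A back-of-the-envelope comparison is usually enough to start that conversation. Every figure below is a placeholder assumption to be replaced with your own tracked numbers:

```python
# Hypothetical figures; substitute numbers from your own incident tracking.
reactive_hours_per_month = 60    # incident response + ad hoc investigations + emergency fixes
loaded_hourly_cost = 120         # fully loaded cost per engineering hour (USD)
proactive_investment = 40_000    # one-time spend on monitoring/automation tooling (USD)
expected_reduction = 0.5         # fraction of reactive hours eliminated (assumption)

annual_reactive_cost = reactive_hours_per_month * 12 * loaded_hourly_cost
annual_savings = annual_reactive_cost * expected_reduction
payback_months = proactive_investment / (annual_savings / 12)
```

Even with a conservative reduction estimate, the payback period is often measured in months, which is exactly the framing that lands with leadership.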
What's the difference between a fragile data pipeline and a resilient one, and how do I know which I have?
A fragile pipeline requires human intervention to recover from routine events: upstream schema changes, late-arriving data, source system downtime, or volume spikes. A resilient pipeline handles these gracefully, either automatically or by failing cleanly with a clear alert and a known recovery path. To assess your current state, look at your incident log for the past 90 days. If the same types of failures recur, if incidents are discovered by business users rather than your monitoring, or if recovery requires tribal knowledge rather than runbooks, your pipelines are fragile. That's not a judgment; it's a starting point.
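The "fail cleanly with a clear alert" behavior can be sketched in a few lines. The `run_with_recovery` wrapper below is a hypothetical illustration, not a replacement for an orchestrator's built-in retry handling: retry transient failures with backoff, and when retries are exhausted, send one clear alert and surface the failure instead of limping on.

```python
import time

def run_with_recovery(step, retries=3, backoff_seconds=1.0, alert=print):
    """Run a pipeline step; retry transient failures, then fail cleanly with an alert.

    `step` is any zero-argument callable. `alert` stands in for a real
    paging or alerting integration (hypothetical here).
    """
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == retries:
                # Fail cleanly: one clear alert, then re-raise so the
                # orchestrator records the failure rather than silently continuing.
                alert(f"pipeline step failed after {retries} attempts: {exc}")
                raise
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
```

A late-arriving file or a brief source outage becomes a retry instead of a 2 a.m. page, and a genuine failure produces an alert with a known recovery path rather than a corrupted downstream table.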
Should data engineering teams have on-call rotations?
Yes, for any team managing business-critical pipelines, but on-call should be an engineering practice, not a tax on the team's development capacity. If on-call hours are consistently high, that's a signal that the reliability investment is too low, not that the team needs to staff more engineers into a broken rotation. The goal is to make on-call boring: infrequent, well-documented, and resolvable without heroics. That requires investment in monitoring, runbooks, and automated recovery, which brings the conversation back to treating reliability as real engineering work.

