Data engineering career advice from Azure Databricks expert

Data engineering career advice from Kavya Kumari. Learn about ETL mistakes, debugging pipelines, and what matters more than tools.

"I found myself enjoying things like building pipelines, handling messy data, and figuring out why something broke at 12 AM."

That's how Kavya Kumari, Senior Data Engineer at Tenjumps, describes what pulled her into data engineering. It wasn't the tools or the technology stack. It was solving real problems when systems broke.

Kavya brings over three years of hands-on experience building ETL pipelines (Extract, Transform, Load processes that move data between systems), integrating Azure data services, and optimizing Spark jobs. She recently joined Tenjumps after spending nearly three years at UST, where she automated data ingestion pipelines and built scalable solutions using Azure Databricks.

In this Q&A, Kavya shares practical data engineering career advice, common ETL mistakes, and what surprised her most about working with Databricks.

From developer to data engineer

What made you choose data engineering as a career?

I actually started working as a developer and slowly moved into data work over time. I found myself enjoying things like building pipelines, handling messy data, and figuring out why something broke at 12 AM more than writing pure application code.

The work felt closer to real business problems. That gradual shift is what naturally pulled me into data engineering.

The most common ETL mistake

What's the most common ETL mistake you see people make?

Making the pipeline too complicated before really understanding the data.

People jump straight into fancy tools without understanding the data or the business use case, and assume the source data is "good enough," which it almost never is.

Then duplicates, missing values, or schema issues show up downstream, and everyone starts firefighting. A little upfront data validation and simpler design save a lot of pain later.
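The upfront validation Kavya describes can be sketched in a few lines. This is a minimal, stdlib-only illustration (the field names `order_id`, `customer`, and `amount` are invented for the example, not from the interview):

```python
# A minimal sketch of upfront data validation: check schema, missing
# values, and duplicates BEFORE any transformation runs downstream.
# The expected schema and field names here are illustrative assumptions.

EXPECTED_SCHEMA = {"order_id", "customer", "amount"}

def validate_rows(rows):
    """Return a dict mapping each problem type to the offending row indexes."""
    problems = {"schema_mismatch": [], "missing_values": [], "duplicates": []}
    seen_ids = set()
    for i, row in enumerate(rows):
        # Schema check: every row must have exactly the expected fields.
        if set(row) != EXPECTED_SCHEMA:
            problems["schema_mismatch"].append(i)
            continue
        # Missing-value check: None or empty string counts as missing.
        if any(v in (None, "") for v in row.values()):
            problems["missing_values"].append(i)
        # Duplicate check on the natural key.
        if row["order_id"] in seen_ids:
            problems["duplicates"].append(i)
        seen_ids.add(row["order_id"])
    return problems

rows = [
    {"order_id": 1, "customer": "acme", "amount": 10.0},
    {"order_id": 1, "customer": "acme", "amount": 10.0},   # duplicate
    {"order_id": 2, "customer": None, "amount": 5.0},      # missing value
    {"order_id": 3, "customer": "beta"},                   # schema drift
]
print(validate_rows(rows))
# → {'schema_mismatch': [3], 'missing_values': [2], 'duplicates': [1]}
```

A check like this at the start of a pipeline turns downstream firefighting into an upfront, actionable report.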

"What really matters is knowing how data is stored, how it moves through systems, how real-time data works, and how pipelines are scheduled and monitored. Once you get those fundamentals, tools become helpers—not the goal."

How to debug a failed pipeline

Walk me through how you approach debugging a failed pipeline.

  1. Check pipeline/job run status

  2. Open failed activity or task

  3. Read error message and logs

  4. Decide whether it's an infrastructure issue or a data issue

  5. Fix the root cause
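Step 4 above, deciding between an infrastructure issue and a data issue, can be sketched as a simple error-message classifier. The keyword lists below are illustrative guesses, not an official taxonomy from any Azure service:

```python
# A hedged sketch of triaging a failed pipeline run: route the error
# message to "infrastructure" or "data" based on common signal phrases.
# The keyword lists are assumptions for illustration only.

INFRA_SIGNALS = ("timeout", "connection refused", "throttl",
                 "out of memory", "cluster terminated", "permission denied")
DATA_SIGNALS = ("schema mismatch", "null value", "duplicate key",
                "cannot cast", "malformed record")

def triage(error_message: str) -> str:
    """Best-effort routing of a pipeline error to the right kind of fix."""
    msg = error_message.lower()
    if any(s in msg for s in INFRA_SIGNALS):
        return "infrastructure"   # retry, scale up, or fix connectivity
    if any(s in msg for s in DATA_SIGNALS):
        return "data"             # fix the source data or add validation
    return "unknown"              # read the full logs before deciding

print(triage("Connection refused by SQL endpoint"))   # infrastructure
print(triage("Cannot cast 'N/A' to DoubleType"))      # data
```

In practice this kind of routing is a starting point for reading the logs, not a replacement for it.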

Favorite Azure Data Factory feature

What's your favorite Azure Data Factory feature and why?

Personally, I like the Copy Activity along with the visual workflow in Azure Data Factory (ADF), Microsoft's cloud-based ETL and data integration service.

It sounds basic, but Copy Activity does a lot of the heavy lifting—handling schema mapping, compression, and different connectors, so you don't have to write complex queries just to move data.

When you're moving data between systems, it saves a lot of time. On top of that, being able to see the whole pipeline visually and reuse it with parameters across environments makes orchestration feel way less scary.

Clean, visual pipelines also make debugging much easier later.

What surprised her about Databricks

What surprised you most when you started working with Databricks?

Coming from tools like ADF, the lack of a visual flow surprised me at first.

But once you start using it, you realize how powerful it is. Data engineers, data scientists, and analysts can all work together in the same notebooks.

On top of that, features like Delta Lake and built-in governance make it feel much more production-ready. It's less about dragging boxes and more about building scalable data systems.

Data engineering career advice for beginners

What advice would you give someone just starting in data engineering?

Mastering data engineering isn't really about tools. It's about understanding how the whole data ecosystem works.

Most people think becoming a data engineer means learning a bunch of tools, but that's a myth. What really matters is knowing:

  • How data is stored

  • How it moves through systems

  • How real-time data works

  • How pipelines are scheduled and monitored

Once you get those fundamentals, tools like Spark, ADF, or Databricks (a unified data analytics platform) become helpers, not the goal.

About the author

Kavya Kumari is a Senior Data Engineer at Tenjumps, specializing in ETL pipelines, cloud data integration, and performance optimization. She has hands-on experience with Azure Databricks, Spark, SQL, and Python, focusing on automating workflows and enhancing data accessibility.

Her recent work includes:

Automated data ingestion pipeline: Designed and implemented an ETL pipeline using Python and SQL to extract real-time data from an API, transform it using pandas/Spark, and load it into a PostgreSQL database, fully automating data ingestion and eliminating manual effort entirely.

Scalable ETL pipeline with Azure Databricks: Built a high-performance ETL pipeline using Azure Databricks Workflows, fully automating data ingestion, transformation, and loading, improving data availability by 100% and reducing pipeline execution time.

Cloud data integration with Azure: Integrated Azure Data Lake Storage (ADLS) and Azure Blob Storage with Databricks and Spark, performing large-scale data transformations and loading curated datasets into Azure Synapse Analytics/SQL Database, enhancing query performance and analytics capabilities.

Performance optimization and cost efficiency: Enhanced pipeline efficiency by optimizing Spark jobs, partitioning large datasets, and implementing caching strategies, reducing processing time by up to 50% and improving cost efficiency.
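The automated ingestion pipeline described above follows a common extract → transform → load shape. A minimal stdlib sketch of that shape, using `sqlite3` as a stand-in for PostgreSQL and a hard-coded JSON payload as a stand-in for the live API (the real pipeline used pandas/Spark for the transform step):

```python
import json
import sqlite3

# A minimal extract → transform → load sketch. sqlite3 stands in for
# PostgreSQL and a hard-coded payload stands in for the API response;
# the schema and the Fahrenheit-to-Celsius transform are illustrative.

def extract() -> list[dict]:
    # In production this would be an HTTP call to the source API.
    payload = '[{"id": 1, "temp_f": 68.0}, {"id": 2, "temp_f": 86.0}]'
    return json.loads(payload)

def transform(records: list[dict]) -> list[tuple]:
    # Example transformation: convert Fahrenheit to Celsius.
    return [(r["id"], round((r["temp_f"] - 32) * 5 / 9, 1)) for r in records]

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS readings (id INTEGER, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM readings").fetchall())
# → [(1, 20.0), (2, 30.0)]
```

Keeping each stage a separate function, as here, is what makes pipelines like these easy to schedule, monitor, and debug step by step.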

Kavya holds a Master of Computer Applications in Computer Science from Ramaiah Institute of Technology and is working toward Databricks certification.