Future of Data Engineering

Note: This post was first published on LinkedIn.

This is probably controversial, but Data Engineering as a role will perhaps come under some pressure by 2025.

In the last few years, higher-order abstractions and modern data tooling have dramatically lowered the barrier to entry for setting up and managing data infrastructure. SQL now also enables complex transformations (including streaming) that earlier required specialized skills in distributed systems. One no longer needs to know how to fine-tune Spark internals to process data. Following the success of cloud Data Warehouses, there is a push towards general-purpose processing engines that can handle both interactive and batch workloads seamlessly.

In my opinion, the classic needs of data engineering (i.e., ETL pipelining) will see declining demand in the coming years.

This is also in line with the growing popularity of Analytics Engineering and a push towards autonomous self-serve teams.

Of course, there will always be glue work to be performed (e.g., stitching all the disparate platform components and subsystems together, CI/CD, platform maintenance and upgrades, access management, etc.), but this is platform engineering and requires a very different set of skills.

Even so, in the aftermath of a bad economy, it would perhaps be difficult to make a strong case for large DE teams (unless you are caught in the sunk cost of managing legacy ‘big data’ infrastructure, but even this will be replaced in most places in the next 3-5 years). Unlike what we saw in the last bull run, I don’t think new scale-ups will opt for fully self-managed, home-grown data platforms. Instead, most data platforms will be built around one of the popular commercial solutions (Snowflake, Databricks, BigQuery, etc.).

I hope new use cases will emerge that will still require good engineering skills in data.

My advice to DEs would be to acquire more general technical skills. Learn a wider set of skills, not just in data but also outside it (for example, software engineering in general).

I hope that as data gets used for more operational purposes (beyond Analytics and Data Science), the line between DE and SE will blur.

Also, don’t pin yourself down to one specific programming language, tool, or framework. That was never prudent anyway, but it may be an even worse strategy going forward.