POWER READ

What to Know Before Building a Data Engineering Team

Gain actionable insights into:

Getting clarity on your business needs before you start hiring
Choosing between open source and managed services
Key qualities your data engineering team should have

A Clear Start

There’s no one-size-fits-all approach to building a data engineering team. What you’ll need will depend on your industry, your company’s size and stage, where you’re headed, and the solution you choose to use. You’ll also need to consider how your data team fits in with the rest of the company and what kind of influence it may have. All these factors are important, and understandably overwhelming.

If you don’t start on the right foot, you’ll only struggle more as you try to expand your team. Before you begin your search, make sure you have specific answers about where your product and company are headed.

When Do You Need a Data Engineering Team?

The data engineering layout comprises the following stages: source, production, storage, analytics. Generally, the initial and final stages don’t change.

Say your product is an application. The analytics stage measures how users interact with it. At first, you don’t need a data engineer to collate and analyse this data. With time, as users become more tech-savvy, their needs become more complex than what your application can currently offer. This is where data engineers come in: they build layers within the production stage to upgrade the application.

Over time, as the volume of data grows, you’ll also need to build layers within the storage stage. Your data engineering team should help you build data lakes, warehouses and marts, as well as the pipelines that tie the stages together.

New requirements will demand that different layers be built. What needs to be built, and the stage your company is at, determine the skills your team needs. As you work out where your company is now, it’s just as important to define where you want it to be six months or a year from today. If you’re at a start-up, I suggest setting clear monthly goals. This gives you a better sense of the layers that will need to be built, and the skills your team will need to build them.

The ‘Right’ Solution For You

Even if you’ve already made up your mind about which solutions you’ll use, don’t skip this section. It’s worth reviewing your options. At worst, you’ll get confirmation that you’re on the right track. At best, you may learn of alternative solutions that are now a better fit for your needs. For instance, the cost of open source solutions has fallen further, and they’re now almost as good as managed services. If making the switch makes sense for you, you could cut costs significantly.

At this stage, you should already have defined your business needs and have a clear idea of how your company should grow. You can then explore whether open source or managed solutions better fit your requirements.

If you’re really not sure where to start, select an existing, prominent use case and test the different solutions against it. Your endpoint can be something as simple as: how many users are likely to drop their subscription next month? To determine this, will you need data in real time? Or is it enough to know a week in advance and load the data incrementally? Add these details to the use case and explore which solutions can help you reach your endpoint.
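To make the incremental option concrete, here is a minimal Python sketch. It assumes a hypothetical event feed in which each record holds a user ID and the day that user was active. The names `Event`, `incremental_load`, `update_last_seen` and `at_risk_users`, and the 14-day inactivity rule standing in for churn risk, are illustrative assumptions, not a recommended churn model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    user_id: str
    day: date  # day the user was active (hypothetical schema)


def incremental_load(source, watermark):
    """Return only events newer than the watermark, plus the updated watermark."""
    new_rows = [e for e in source if e.day > watermark]
    new_watermark = max((e.day for e in new_rows), default=watermark)
    return new_rows, new_watermark


def update_last_seen(last_seen, rows):
    """Keep the most recent activity day per user."""
    for e in rows:
        if e.user_id not in last_seen or e.day > last_seen[e.user_id]:
            last_seen[e.user_id] = e.day
    return last_seen


def at_risk_users(last_seen, as_of, inactive_days=14):
    """Hypothetical churn proxy: users inactive for at least `inactive_days`."""
    cutoff = as_of - timedelta(days=inactive_days)
    return {user for user, day in last_seen.items() if day <= cutoff}


# First weekly run: load everything after the initial watermark.
source = [
    Event("a", date(2024, 1, 1)),
    Event("b", date(2024, 1, 20)),
    Event("a", date(2024, 1, 25)),
]
last_seen = {}
rows, watermark = incremental_load(source, date.min)
update_last_seen(last_seen, rows)
print(at_risk_users(last_seen, as_of=date(2024, 2, 5)))  # {'b'}

# Next weekly run: only the new event is read.
source.append(Event("b", date(2024, 2, 1)))
rows, watermark = incremental_load(source, watermark)
update_last_seen(last_seen, rows)
print(len(rows), at_risk_users(last_seen, as_of=date(2024, 2, 5)))  # 1 set()
```

Running this once a week corresponds to the incremental option: each run reads only records newer than the stored watermark. A real-time setup would instead update `last_seen` as each event arrives. Note that a date watermark like this one skips late-arriving records dated on or before it, which is the kind of trade-off worth testing during a pilot.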

Most managed-service providers offer a pilot of about a month. For smaller companies, the pilot is usually free. Larger companies, given their much larger data volumes, should expect to pay at most a few thousand dollars for cloud resources. Whichever pilot you run, take care not to get locked in.

If your company already has data engineers on board, great. Let them test the solution to see how comfortable they are with it. If you’re working with new hires, use the pilot to evaluate whether they have the required skills.

As you test these solutions, consider your overall budget as well. I work mainly with open source, but have taken on a few managed services simply because they’re more cost-effective. If a managed service can, for a relatively low price, perform a task that would take your engineers five days, go for it. Conversely, if a managed service is prohibitively expensive, you may be better off hiring a good engineer to handle the task instead. Beyond cost, your solution should let you scale to 10 to 20 times your current needs.
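The cost reasoning above can be written down as a back-of-the-envelope check. This is a minimal Python sketch; the monthly fee, engineer-days saved and daily engineer cost are placeholder figures to replace with your own, and comparing both sides on a per-month basis is a simplifying assumption. The function names are hypothetical.

```python
def prefer_managed(monthly_fee, engineer_days_saved_per_month, daily_engineer_cost):
    """True if the service costs less than the engineer time it saves each month."""
    engineer_cost = engineer_days_saved_per_month * daily_engineer_cost
    return monthly_fee < engineer_cost


def has_headroom(current_load, max_supported_load, factor=10):
    """True if the solution can handle `factor` times today's load."""
    return max_supported_load >= current_load * factor


# Placeholder figures: 5 engineer-days saved per month at 400 per day = 2,000.
print(prefer_managed(1_500, 5, 400))  # True: the service is cheaper
print(prefer_managed(5_000, 5, 400))  # False: hiring may be the better option

# Placeholder load units (use whatever metric matters to you, e.g. events per second).
print(has_headroom(1_000, 15_000, factor=10))  # True
print(has_headroom(1_000, 15_000, factor=20))  # False
```

Checking headroom at both 10x and 20x shows where a candidate solution sits within the range suggested above.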

Once you’ve identified the solution, the skills to look for when hiring your data engineering team will also become clearer. If you can’t find anyone with these skills in your geographic market, expand your search. Engineers from countries like Indonesia, Malaysia and Singapore generally use managed services such as AWS and Google Cloud, which don’t require them to be as tech-savvy as engineers who use open source tech stacks. Because one of my projects required a data engineer skilled in open source stacks, I hired data engineers from Vietnam and India to work remotely with my team instead.

Whatever your choice, I highly recommend that at least one person on your team be an expert in data engineering, even if this doesn’t seem necessary with the managed service you’ve chosen. If something goes wrong, it’s far better to have someone on your team investigate what might be happening than to wait in line for your support ticket to be addressed.

Plan & Search

Now that you’re clear on your business needs, pinpoint the exact skills your team should have. Think of your data engineering team in terms of these skills rather than job titles, which can be quite fluid. Most of the people you’re looking for will literally be data engineers: people who have worked with databases and data warehouses and who understand data pipelines.

That said, if you’re new to the field and have been tasked with screening applications, you’ll likely see applicants who are DevOps engineers. Their work may seem similar to that of data engineers, but where data engineers work to optimise solutions, DevOps engineers prioritise usability. It’s a fine line, but an important distinction, because your goal is optimisation. As much as possible, make this clear during the interview.

A quick note on planning: when I first started building my team, I assumed that getting the right talent ready to work on production independently would take two to three months. This assumption was wrong and threw off the timelines I had envisioned.

Instead, hiring and onboarding the right talent took about three to six months. Depending on the company, this can extend to a whole year. When budgeting time to build your team, note that finding and recruiting candidates takes at least two months. Once hired, new team members will need time to build a strong understanding of your business and of the stacks they’ll be working on, which typically takes 30 to 60 days. If the person you hire isn’t a good fit, this cycle restarts.

These milestones are important to bear in mind, even if you need everything to go much quicker. They help you form more realistic expectations of what you, and your new hire, can actually achieve.

In the interview, you’re also trying to verify that the skills listed on candidates’ resumes are truly in line with what you’re looking for. You also need your data engineers to understand the ‘why’ behind the work they’ve done: whether they understand the logic behind their decisions and how their work helps your product and, in turn, your business. So get them to share their previous use cases and experiences, ideally ones demonstrating the layers you hope to build. How they frame their answers will help you assess whether they do indeed have these qualities.

Logic > Skills

In building your team, you’ll likely review candidates who have spent ten years building expertise in their field. Let’s say you’re looking for someone who has been working with open source software. They may have been conditioned to a certain way of working, and that’s okay. What you need to find out in the interview is why they’ve chosen to work this way. Is it simply out of habit? Or do they actually understand the logic behind their approach? The latter ensures that your company isn’t restricted to a specific tool.

Logic is also something you need to assess when speaking to candidates with “trending” skills. I don’t mean to discount their skills in any way, only to caution that if the logic behind these skills is not sound, they won’t be of much help to you and your team. New tools and technologies are rolled out almost every other week; a firm grasp of the logic is what allows your team to adapt and use different technologies. Everything else is a bonus.

Understanding Business Needs

In working hard to optimise solutions, it’s easy to get tunnel vision and focus solely on the task at hand. Your data engineers need to be able to take a wider view and see how their work integrates with the rest of the business. They need to be able to explain what optimisation means for the business.

If a candidate starts to speak about a use case from an extremely technical standpoint, consider this a red flag. It’s usually a sign that they lack the basic business knowledge your engineers should have. Someone else would then need to hand-hold them, or translate evolving business needs for them. There are far better ways to spend your resources.

By understanding the business, data engineers also better understand their role within the company. For example, in bigger companies their role would focus more on maintaining existing solutions. In start-ups and smaller companies, new data engineers need to understand that the infrastructure may not be settled, and that it isn’t like an assembly line where the product is considered ‘finished’ once they’ve written the code. They need to be prepared to build new pipelines as your product evolves. Of course, you need to communicate this clearly. But if candidates are to truly grasp the reality of your expectations, they also need to understand the business.

Actionable Steps To Take in 24 Hours

1 Define Clear Business Needs

What does your company need? Where is it headed? What are your immediate priorities? As a data team, where do you stand in your company’s hierarchy? Different infrastructures, phases of the product cycle and goals demand different talents. Take time to get clear on where you are and where you want to be. Then, identify the skills, experiences and people (not titles) you need to get you there.

2 Learn Interview Warning Signs

It’s key, especially if you’re at a start-up, that the data engineers on your team understand business needs and the logic behind their work. To assess this, check whether the candidates you’re interviewing understand the ‘why’ behind their past work. Do they describe use cases from an extremely technical standpoint? Do they offer only vague explanations of how they approached the case? Both are red flags.

3 Set Realistic Expectations

Building a new team takes at least six months, so be realistic about what these milestones look like, even if that much time seems like a luxury. Also, be honest about what the role can really offer and whether it truly matches what the candidate wants from the role. Don’t make expensive mistakes simply because you skipped this step.