There’s no one-size-fits-all approach to building your data engineering team. What you need will depend on the industry you’re in, your company’s size and stage, where you’re headed, and the solution you choose to use. You’ll also need to consider how your data team fits in with the rest of the company and what kind of influence it may have. All these factors are important, and understandably overwhelming.
If you don’t start on the right foot, you’ll only struggle more as you try to expand your team. Before you begin your search, make sure you have specific answers as to where your product and company are headed.
The data engineering layout comprises four stages: source, production, storage and analytics. Generally, the initial and final stages (source and analytics) don’t change.
Say your product is an application. The analytics stage measures how users interact with your application. You don’t need a data engineer to collate and analyse this data. With time, as users become more tech savvy, their needs become more complex than what your application can currently offer. This is where data engineers come in: to build layers within the production stage to upgrade the application.
Over time, as the amount of data involved grows, layers will need to be built within the storage stage. Your data engineering team should help you build data lakes, warehouses and marts, as well as the pipelines that tie the stages together.
New requirements will demand that different layers be built. What needs to be built, and the stage your company is at, determine the skills you need from your team. As you work out where your company is today, it’s just as important to define where you want it to be six months or a year from now. If you’re in a startup, I suggest being clear on monthly goals. This gives you a better sense of the layers that will need to be built, and the skills your team will need to build them.
Even if you have your mind made up about what solutions you’ll use, don’t skip this section. I think it’s good to review your options. At worst, you’ll get confirmation that you’re on the right track. And if not, you may learn of alternative solutions that are now a better fit for your needs. For instance, the price of open source solutions has fallen even more and they’re now almost as good as managed services. If making the switch makes sense for you, you could cut costs significantly.
At this stage, you should also have defined your business needs and have a clear idea of how your company should grow. You’ll then explore your options to see whether open source or managed solutions are a better fit for your requirements.
If you’re really not sure where to start, select an existing prominent use case and test the different solutions. Your endpoint can be something as simple as: how many users are likely going to drop their subscription next month? To determine this, will you need data in real time? Or should you know it one week in advance and load it incrementally? Add these details to the use case and explore which solutions out there can help you get to your endpoint.
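To make the real-time versus incremental question more concrete, here is a minimal sketch of what a weekly incremental load could look like for the churn use case. It is only an illustration: the function name, the `event_date` field and the sample records are all assumptions, and a real pipeline would read from and write to whatever storage your chosen solution provides.

```python
from datetime import date

def incremental_batch(events, last_loaded):
    """Return events newer than the last loaded date, plus the new watermark.

    `events` is a list of dicts with a hypothetical `event_date` field;
    `last_loaded` is the date up to which data has already been loaded.
    """
    new_rows = [e for e in events if e["event_date"] > last_loaded]
    # Advance the watermark only if something new arrived.
    new_watermark = max((e["event_date"] for e in new_rows), default=last_loaded)
    return new_rows, new_watermark

# Hypothetical subscription events, for illustration only.
sample_events = [
    {"user_id": 1, "event_date": date(2024, 1, 1)},
    {"user_id": 2, "event_date": date(2024, 1, 9)},
    {"user_id": 3, "event_date": date(2024, 1, 12)},
]
rows, watermark = incremental_batch(sample_events, date(2024, 1, 7))
```

If a weekly run like this answers the question in time, you probably don’t need a real-time setup for this use case, which narrows down the solutions worth testing.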
Most companies that offer managed services offer a pilot for about a month. If you’re a smaller company, this service is usually free. If you’re with a larger company, you should expect to pay at most a few thousand dollars for cloud resources, given your vast amounts of data. The thing to keep in mind is not to get locked in.
If your company already has data engineers on board, great. Let them test the solution to see how comfortable they are with it. If you’re working with new hires, use this to evaluate if they have the required skills or not.
As you test these solutions, consider your overall budget as well. I work mainly with open source, but have taken on a few managed services simply because they’re more cost effective. If a managed service can, for a relatively low price, perform a task that takes your engineers five days to complete, go for it. Conversely, if a managed service is ridiculously expensive, you may be better off hiring a good engineer to take care of it instead. In addition to costs, your solution should allow you to scale your current needs by 10 to 20 times.
After you’ve identified the solution, the skills you need to look for when hiring your data engineering team will also be clearer. If you can’t find anyone with these skills within your geographic market, expand your search. Engineers from countries like Indonesia, Malaysia and Singapore generally use managed services such as AWS and Google Cloud that don’t require them to be as tech savvy as those who use open source tech stacks. As one of my projects required a data engineer skilled in using the latter, I hired data engineers from Vietnam and India to work remotely with my team instead.
Whatever your choice, I highly recommend that at least one person on your team be an expert in data engineering, even if this doesn’t seem all that necessary with the managed service you’ve chosen. If something goes wrong, it’s far better to have someone on your team validate what might be happening than to wait in line for your support ticket to be addressed.
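The cost comparison above boils down to simple arithmetic, which can be sketched as follows. This is a rough illustration under stated assumptions: the function name, the day rate and the service fee are hypothetical, and it ignores costs such as onboarding, maintenance and support.

```python
def cheaper_option(engineer_days, engineer_day_rate, managed_fee):
    """Compare doing a task in-house against paying for a managed service.

    All inputs are hypothetical estimates in the same currency.
    Returns the cheaper option and how much it saves.
    """
    in_house_cost = engineer_days * engineer_day_rate
    if managed_fee < in_house_cost:
        return "managed", in_house_cost - managed_fee
    return "in_house", managed_fee - in_house_cost

# Example: a task that takes five engineer days at an assumed rate of 400 per day,
# compared with an assumed managed service fee of 500.
choice, saving = cheaper_option(engineer_days=5, engineer_day_rate=400, managed_fee=500)
```

Rerunning the same comparison with the fee you would expect to pay at 10 to 20 times your current data volume is a quick way to check whether the choice still holds as you scale.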
Head of Strategy and Data | Former Head of Data