Disclaimer: this is my opinion, not necessarily that of my employer or any other organisation.
There is a clear difference between what I expected of the job and what I know now, after 2.5 years in it. Not every item stated below may apply to you, but some might.
This might help you understand the points stated below.
I completed my Master's in Mathematical Engineering at the KU Leuven (Belgium), focussing on high-performance computing and machine learning. …
The official Azure Machine Learning Studio documentation, the Python SDK reference and the notebook examples are often out of date, don’t cover all important aspects, or don’t provide a compelling end-to-end example. This guide is an attempt to cover the necessary basics and, hopefully, help you build a machine learning pipeline on Azure faster.
The recent explosion of tools, including task and data orchestration tools, should make you wonder if you’re still doing the right thing. Judging purely by the GitHub stars of the open-source frameworks, Airflow is still the most popular one. This does not take into account the popularity of closed-source or cloud vendor tools. Where these tools overlap or differ has been described fairly well by others (this one, or that one).
As companies grow, as regulations become stricter, or as senior IT architects get up to speed with the latest trends, the need (or obligation) for data processing entities to mitigate privacy and leakage risks grows stronger.
Data anonymization and data tokenization techniques are widely used in this context, even though they can still divulge private information (see https://mostly.ai/why-synthetic-data/ for an easy explanation of why this is).
Synthetic data is fundamentally different. The goal is to come up with a data generator that shows the same global statistics as the original data. …
First, I’m going to assume that you have chosen a cloud service provider (CSP), or are in a position to choose one for your organisation. Second, I’m assuming that you need to be able to build, train, tune, evaluate and deploy a machine learning model. In that case, the first thing you are most likely to do is check out the ML platform of your CSP of choice. Or should you look at all those third-party vendors? And how do you compare them?
Let’s look at what actually matters, namely, the bigger picture.
In each ML or even data science project, there are two phases…
Data Engineer at Data Minded BE