Validate the correctness and performance of machine learning systems throughout the ML product lifecycle.



Testing in machine learning systems

Testing is a well-researched and well-established area of the software industry. Good practices learned from countless failed projects help us release frequently and see fewer defects in production. Common industry practices such as CI, test coverage, and TDD are widely adopted and tailored to each project.

However, when we try to apply the SWE testing philosophy to machine learning, we have to solve some unique issues. …

After a couple of years of repeatedly writing use-case-specific feature engineering code coupled to business logic, I started wondering whether there is a model-agnostic (in both algorithms and frameworks), production-ready, data-scientist-friendly way to do feature engineering. Here I summarize what we did for use cases in financial services and healthcare, and propose some best practices (the term is not quite accurate: we have seen good and better, but never best) for handling common problems in the lifecycle of ML models.

The purpose of feature engineering is to prepare features for ML models to process…

ML projects in the real world. Photo by elCarito on Unsplash

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It has been an important part of computer science for decades. In recent years, better algorithms, abundant data, and more computing power have significantly improved the performance of ML models, so ML now contributes more and more to business and industry use cases.

Unlike research practice, ML in industry requires a more standardized processing pipeline, more robust experiment analysis, and more affordable deployment, leading to the creation of tools that help companies bring theoretical ML…

Nowadays, more and more organizations want a uniform tool to manage their large numbers of features. For a simple POC product, feature governance and lineage may seem like overkill, but in big, complex, and continuously evolving projects they do promote feature quality and execution efficiency in subsequent data science projects.

To summarize, why do we need a feature store?

Cited from:

Time series forecasting is a well-studied branch of statistics and machine learning and a common statistical task in business. In the real world, time-series data sometimes needs to be combined with other data sources to build more powerful machine learning models.

In this article, I summarize common ways to combine time-series data and tabular data in a machine learning project.

1. Problem analysis

Before diving into methods, let us review the scenarios where we have time-series data and tabular data at the same time.



Config management (to be clear, not software configuration management) for Python applications is a common but nontrivial task. A well-designed config management approach should support flexible parameter setting and dynamic loading. Here, I summarize frequently used configuration management strategies. Hopefully, you can find the most suitable method for your applications.
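As a rough illustration of "flexible parameter setting and dynamic loading", here is a minimal sketch using only the Python standard library. It is not necessarily one of the strategies the article covers. The `AppConfig` fields, the `load_config` helper, and the `APP_` environment-variable prefix are all hypothetical names chosen for this example. The sketch layers three sources: dataclass defaults, then a JSON document, then environment variables.

```python
import json
import os
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AppConfig:
    # Hypothetical settings, chosen only for illustration.
    model_path: str = "model.bin"
    batch_size: int = 32
    debug: bool = False


def _cast(raw, target):
    """Convert an environment-variable string to the field's declared type."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def load_config(json_text="{}", env=None, prefix="APP_"):
    """Layer settings: dataclass defaults < JSON document < environment variables.

    The sources are read at call time, so calling this function again picks up
    changed values. Unknown JSON keys raise TypeError, which surfaces typos early.
    """
    env = os.environ if env is None else env
    config = AppConfig(**json.loads(json_text))
    overrides = {}
    for field in fields(AppConfig):
        key = prefix + field.name.upper()
        if key in env:
            overrides[field.name] = _cast(env[key], field.type)
    return replace(config, **overrides)


# Usage: the JSON document sets batch_size, the environment enables debug.
cfg = load_config('{"batch_size": 64}', env={"APP_DEBUG": "true"})
print(cfg)
```

Keeping the config object frozen means the rest of the application cannot mutate settings by accident. Reloading is an explicit call to `load_config` rather than an in-place change.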

Kevin Du

Data scientist & MLE & SWE
