Feature engineering is one of the most important parts of building good machine learning models, and handling missing data is among its most basic steps. Missing data can badly degrade your models, so it has to be handled properly. Here I’m going to explain multiple methods for handling missing data in different scenarios.

What is missing data?

Missing data occurs when one or more features (columns) of a record (row) have not been recorded. The result is incomplete records, which may hurt the performance of a machine learning model trained on this data.
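
To make this concrete, here is a minimal sketch of how missing values can be spotted with Pandas. The dataset, its column names, and its values are made up for illustration; in Pandas, unrecorded values typically show up as NaN or None.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: NaN / None mark values that were never recorded.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Kochi", "Delhi", None, "Mumbai"],
    "salary": [50000, 62000, np.nan, np.nan],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Fraction of missing values in each column.
missing_fraction = df.isna().mean()

# Rows that have at least one missing value (the incomplete records).
incomplete_rows = df[df.isna().any(axis=1)]

print(missing_per_column)
print(missing_fraction)
print(incomplete_rows)
```

Checking the per-column counts first is usually a good habit, because the share of missing values in a column often decides how you want to handle it.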

Python has a reputation for being slow compared to optimized C. But compared to C, Python is very easy to write, flexible, and has a wide variety of uses. So how do you combine the flexibility of Python with the speed of C? This is where the packages Pandas and NumPy come in. If you have done any sort of data analysis or machine learning using Python, I’m pretty sure you have used these packages. They make it very convenient to deal with huge datasets.


Hello fellow developers, my name's Pranoy. I'm a 24-year-old programmer living in Kerala, India.
