Top 10+ Missing Data Imputation Strategies in Pandas

Debanjan Saha
9 min read · Jan 25, 2023

Is your dataset full of missing values, and are you unsure how to fill them in? Sit tight: in this article we look at the common strategies you can apply when dealing with missing data.

Introduction

Missing values, also known as missing data, occur when no value is recorded for a variable in a particular observation of a dataset. This can happen for various reasons, such as:

  • Data collection errors: For example, the surveyor might have accidentally skipped a question or the data entry person might have made a mistake while entering the data.
  • Non-response: Respondents might have refused to answer certain questions or dropped out of the survey before it was completed.
  • Measurement errors: The data collection instrument might not have been able to accurately measure the variable of interest.
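Whatever the cause, missing entries usually show up in pandas as `NaN` (or `None`). A minimal sketch of how one might count them before choosing a strategy is given below; the `survey` DataFrame and its column names are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data (an assumption for illustration only).
# np.nan and None both mark answers that were never recorded.
survey = pd.DataFrame(
    {
        "age": [34, np.nan, 29, 41, np.nan],
        "income": [52000, 61000, np.nan, 58000, 47000],
        "city": ["Pune", None, "Delhi", "Mumbai", "Kolkata"],
    }
)

# Number of missing values in each column.
missing_counts = survey.isna().sum()

# Fraction of missing values in each column.
missing_share = survey.isna().mean()

# Rows that contain at least one missing value.
incomplete_rows = survey[survey.isna().any(axis=1)]

print(missing_counts)
print(missing_share)
print(incomplete_rows)
```

Checking the per-column share first is a reasonable habit: a column that is mostly empty may call for a different treatment than one with a handful of gaps.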

Missing data can have a significant impact on the analysis of a dataset, as it can lead to biased or inaccurate results. For example, if missing data is handled by simply discarding incomplete observations, the sample size shrinks, which reduces the power and precision of the analysis. Additionally, if the data is not missing completely at random (MCAR), it can introduce bias…
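The effect on sample size and bias can be illustrated with a small sketch. The numbers below are invented, and the scenario (the highest earners decline to report income, so missingness depends on the value itself) is an assumption chosen to show a non-MCAR case, not an example taken from the article.

```python
import pandas as pd

# Hypothetical "true" incomes (invented for illustration).
true_income = pd.Series([30000, 40000, 50000, 90000, 120000], dtype="float64")

# Assumed non-MCAR mechanism: incomes above 80,000 go unreported.
# Series.mask replaces values with NaN where the condition is True.
observed_income = true_income.mask(true_income > 80000)

df = pd.DataFrame({"income": observed_income, "age": [25, 31, 38, 45, 52]})

# Dropping incomplete rows shrinks the sample.
complete_cases = df.dropna()
rows_before = len(df)
rows_after = len(complete_cases)

# The mean of the observed values (NaN is skipped by default)
# underestimates the true mean because high earners are missing.
true_mean = true_income.mean()
observed_mean = df["income"].mean()

print(f"rows: {rows_before} -> {rows_after}")
print(f"true mean: {true_mean:.0f}, observed mean: {observed_mean:.0f}")
```

In this toy case two of five rows are lost and the observed mean falls well below the true mean, which is the kind of distortion a careful imputation strategy tries to limit.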

Written by Debanjan Saha

Trying to solve a variety of issues with an emphasis on computer vision as a budding data scientist, ML engineer, and data engineering veteran.
