
Top 5 Pandas EDA Value Replacement Techniques

Debanjan Saha
4 min read, Sep 24, 2022


How do you replace one or more specific values, or a range of values, with one or more other values in a Pandas DataFrame?

EDA practitioners run into this situation regularly. For example, we may want to replace certain values (or a range of values) in some columns. There are several ways to do this, and this post walks through five of them. There are also more complex cases, where several ranges of values must be replaced with different sets of values within the same column or across multiple columns.
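As a rough illustration of the more complex case (several ranges mapped to several different values), here is a minimal sketch using NumPy's `np.select`, which is one common way to do this and not necessarily one of the techniques covered in this post. The column names `price` and `weight_kg`, the range boundaries and the labels are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "price": [250, 480, 900, 1500, 3200],
    "weight_kg": [1.1, 1.8, 2.4, 3.0, 4.2],
})

# Each condition selects one range of values in the "price" column...
conditions = [
    df["price"] < 500,
    df["price"].between(500, 2000),  # inclusive on both ends by default
    df["price"] > 2000,
]
# ...and each choice is the replacement value for the matching range.
choices = ["budget", "mid-range", "premium"]

# np.select takes the first matching condition per row; `default` covers rows with no match.
df["price_band"] = np.select(conditions, choices, default="unknown")

# The same pattern applied to a second column, with its own ranges and labels.
df["weight_class"] = np.select(
    [df["weight_kg"] < 2, df["weight_kg"] >= 2],
    ["light", "heavy"],
    default="unknown",
)
print(df)
```

Passing an explicit string `default` keeps the result a pure string column; leaving it at the numeric default of `0` alongside string choices can cause dtype-promotion problems.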

Replacing data based on its values is also a form of encoding: categorical variables can be converted into booleans (1s and 0s) for machine learning applications, numeric variables can be converted into categories or strings for data analysis, and so on.
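A minimal sketch of both kinds of encoding, assuming a hypothetical DataFrame with an `in_stock` text column and a numeric `rating` column. `Series.map` turns labels into 1/0, and `pd.cut` bins numbers into named categories. The bin edges and labels here are illustrative choices.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "in_stock": ["yes", "no", "yes", "no"],
    "rating": [1.5, 3.2, 4.1, 4.9],
})

# Categorical -> boolean-style integers: map each label to 1 or 0.
# Any label missing from the dict would become NaN.
df["in_stock_flag"] = df["in_stock"].map({"yes": 1, "no": 0})

# Numeric -> categorical: pd.cut bins values into labeled intervals.
# bins=[0, 2, 4, 5] yields the right-closed intervals (0, 2], (2, 4], (4, 5].
df["rating_label"] = pd.cut(
    df["rating"], bins=[0, 2, 4, 5], labels=["low", "medium", "high"]
)
print(df)
```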

Suppose we have a DataFrame in which the manufacturer name is ‘Hewlett-Packard’, and we want to replace it with the abbreviation ‘HP’. Let’s see how this can be done in Pandas.
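To make the scenario concrete, here is a small hypothetical DataFrame. The column names `manufacturer` and `model` and the rows are assumptions for illustration. As a baseline, Pandas' built-in `Series.replace` swaps exact matches in a single call:

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "manufacturer": ["Hewlett-Packard", "Dell", "Hewlett-Packard", "Lenovo"],
    "model": ["EliteBook", "XPS", "Pavilion", "ThinkPad"],
})

# Series.replace substitutes exact matches and returns a new Series,
# so assign the result back to the column to keep the change.
df["manufacturer"] = df["manufacturer"].replace("Hewlett-Packard", "HP")
print(df)
```

To replace several values at once, `replace` also accepts a dict such as `{"Hewlett-Packard": "HP", "Dell Inc.": "Dell"}`. The second pair is a made-up example.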

1. Using the Pandas loc indexer

This is the simplest approach: first we filter the dataset based on the condition (in this case, the manufacturer name is ‘Hewlett-Packard’), which we…
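The description above is cut off, so the following is only a minimal sketch of the general `loc` pattern, not the author's exact code. It builds a boolean mask for the condition and then assigns the new value to the matching rows of the column. The data is hypothetical, and the price cap is an illustrative extra that applies the same pattern to a range of values.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for illustration.
df = pd.DataFrame({
    "manufacturer": ["Hewlett-Packard", "Dell", "Hewlett-Packard", "Lenovo"],
    "price": [950, 1200, 700, 1100],
})

# Boolean mask: True for rows whose manufacturer is 'Hewlett-Packard'.
mask = df["manufacturer"] == "Hewlett-Packard"

# .loc[rows, column] selects only the matching cells and assigns to them in place.
df.loc[mask, "manufacturer"] = "HP"

# The same pattern works for a range of values, e.g. capping prices above 1000.
df.loc[df["price"] > 1000, "price"] = 1000
print(df)
```

Assigning through `df.loc[mask, column]` in a single step writes to the original DataFrame. Chained indexing such as `df[mask]["manufacturer"] = "HP"` may only modify a temporary copy.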


Written by Debanjan Saha

Trying to solve a variety of issues with an emphasis on computer vision as a budding data scientist, ML engineer, and data engineering veteran.
