Pandas Merge: The only article that you’ll ever need!

Debanjan Saha
10 min read · Sep 22, 2022

As a data analyst or data scientist, you will frequently need to analyze several dataframes or datasets (I will use the two terms interchangeably from now on!) by connecting them on specific keys. This matters because modern data warehouses store data across multiple dimension and fact tables, each of which serves a specific purpose and holds the data pertaining to it.

Merging multiple dataframes can sometimes be tedious or complex, but don’t fret! This article explains the different strategies and combinations that you may need when merging two dataframes. In pandas, merge performs database-style join operations, so from here on we will use the two terms interchangeably.

What is a “Merge” or “Join”?

If you are familiar with any SQL dialect, you presumably know what a join is. If not, it is simply what the term implies: it connects one thing to another. Think of a Lego set, where you combine many little pieces to form a larger structure. A join is no different.
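To make the idea concrete, here is a minimal sketch of a merge. The two tables (`orders`, `customers`) and their columns are hypothetical data invented for illustration; only `DataFrame.merge` and its `on` parameter come from pandas itself.

```python
import pandas as pd

# Hypothetical example data: a small fact table and a dimension table.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 20, 10],
    "amount": [250.0, 75.5, 120.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "name": ["Asha", "Ben"],
})

# Connect the two tables on their shared key column.
# With no `how` argument, pandas performs an inner join.
orders_with_names = orders.merge(customers, on="customer_id")
print(orders_with_names)
```

Each order row picks up the matching customer's name, so the result has one row per order and the columns of both tables.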

What are the various types of joins?

There are several kinds of joins, some of which are more complicated variants of the fundamental ones. For simplicity, we will address only the four fundamental forms: left join, right join, inner join, and outer join. As previously stated, there are other joins such as…
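As a rough sketch of how the four fundamental joins differ, the snippet below runs the same merge with each value of the `how` parameter. The frames `left` and `right` and their values are made up for illustration; the key `"a"` exists only on the left and `"d"` only on the right.

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

results = {}
for how in ["left", "right", "inner", "outer"]:
    # `how` selects which keys survive the merge.
    results[how] = left.merge(right, on="key", how=how)
    print(f"--- how={how!r} ---")
    print(results[how])
```

A left join keeps every key from `left`, a right join keeps every key from `right`, an inner join keeps only the keys present in both, and an outer join keeps the union of the keys. Where a key has no match on the other side, pandas fills the missing columns with `NaN`.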


Written by Debanjan Saha

Trying to solve a variety of issues with an emphasis on computer vision as a budding data scientist, ML engineer, and data engineering veteran.
