w3resource

Pandas: Joining columns on columns (potentially a many-to-many join)

Pandas Joining and merging DataFrame: Exercise-12 with Solution

Write a Pandas program to join two DataFrames on a key column whose values are repeated in both DataFrames (a potentially many-to-many join).

Test Data:

data1:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
data2:
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3
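To see why this counts as a many-to-many case, you can count how often each `key1` value occurs in each frame. The sketch below rebuilds the test data above and uses `Series.value_counts()` and `Series.duplicated()`. The names `counts1`, `counts2`, `dups1` and `dups2` are illustrative only. Note that the repeats fall on different keys: `K0` repeats in data1 and `K1` repeats in data2.

```python
import pandas as pd

# Rebuild the test data shown above
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

# Number of occurrences of each key1 value in each frame
counts1 = data1['key1'].value_counts()
counts2 = data2['key1'].value_counts()
print(counts1.sort_index())
print(counts2.sort_index())

# key1 values that appear more than once within each frame
dups1 = sorted(data1.loc[data1['key1'].duplicated(), 'key1'].unique())
dups2 = sorted(data2.loc[data2['key1'].duplicated(), 'key1'].unique())
print("Repeated in data1:", dups1)  # ['K0']
print("Repeated in data2:", dups2)  # ['K1']
```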

Sample Solution:

Python Code:

import pandas as pd
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
print("Original DataFrames:")
print(data1)
print("--------------------")
print(data2)
print("\nMerged Data (many-to-many join case):")
# Inner join on key1 only. key2 exists in both frames but is not a join key,
# so pandas renames it to key2_x (from data1) and key2_y (from data2).
result = pd.merge(data1, data2, on='key1')
print(result)

Sample Output:

Original DataFrames:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
--------------------
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Merged Data (many-to-many join case):
  key1 key2_x   P   Q key2_y   R   S
0   K0     K0  P0  Q0     K0  R0  S0
1   K0     K1  P1  Q1     K0  R0  S0
2   K1     K0  P2  Q2     K0  R1  S1
3   K1     K0  P2  Q2     K0  R2  S2
4   K2     K1  P3  Q3     K0  R3  S3    
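Each row of the merged output pairs one matching row from each side, so for every `key1` value an inner merge produces (occurrences in data1) × (occurrences in data2) rows: K0 gives 2 × 1, K1 gives 1 × 2 and K2 gives 1 × 1, five rows in total. The sketch below checks that arithmetic and also tries the `validate` argument of `pd.merge`, which raises `pandas.errors.MergeError` when the keys are not unique on the side a check requires. The names `expected_rows` and `failed` are illustrative only.

```python
import pandas as pd

data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

result = pd.merge(data1, data2, on='key1')

# Inner-join size: sum over shared keys of (left count * right count)
counts1 = data1['key1'].value_counts()
counts2 = data2['key1'].value_counts()
shared = counts1.index.intersection(counts2.index)
expected_rows = int((counts1.loc[shared] * counts2.loc[shared]).sum())
print(expected_rows, len(result))  # 5 5

# 'many_to_many' performs no uniqueness check, so this merge succeeds
pd.merge(data1, data2, on='key1', validate='many_to_many')

# Stricter checks fail: key1 repeats in data1 (K0) and in data2 (K1)
failed = []
for check in ['one_to_one', 'one_to_many', 'many_to_one']:
    try:
        pd.merge(data1, data2, on='key1', validate=check)
        print(check, "passed")
    except pd.errors.MergeError as err:
        failed.append(check)
        print(check, "failed:", err)
```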
