w3resource

Pandas: Joining columns on columns (potentially a many-to-many join)

Pandas Joining and merging DataFrame: Exercise-12 with Solution

Write a Pandas program to create a combination from two dataframes where a column id combination appears more than once in both dataframes.

Test Data:

data1:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
data2:
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Sample Solution:

Python Code :

import pandas as pd
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']}) 
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
print("Original DataFrames:")
print(data1)
print("--------------------")
print(data2)
print("\nMerged Data (many-to-many join case):")
result = pd.merge(data1, data2, on='key1')
print(result)

Test Data:

Original DataFrames:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
--------------------
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Merged Data (many-to-many join case):
  key1 key2_x   P   Q key2_y   R   S
0   K0     K0  P0  Q0     K0  R0  S0
1   K0     K1  P1  Q1     K0  R0  S0
2   K1     K0  P2  Q2     K0  R1  S1
3   K1     K0  P2  Q2     K0  R2  S2
4   K2     K1  P3  Q3     K0  R3  S3    

Python Code Editor:


Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to create a new DataFrame based on existing series, using specified argument and override the existing columns names.
Next: Write a Pandas program to combine the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

What is the difficulty level of this exercise?

Test your Python skills with w3resource's quiz



Python: Tips of the Day

How to sort a Python dict by value

Example:

x1 = {'a': 5, 'b': 7, 'c': 9, 'd': 1}

sorted(x1.items(), key=lambda x: x[1])
[('d', 1), ('c', 9), ('b', 7), ('a', 5)]

# Or:

import operator
print(sorted(x1.items(), key=operator.itemgetter(1)))

Output:

[('d', 1), ('a', 5), ('b', 7), ('c', 9)]