w3resource

Pandas: Join two dataframes along rows and merge with another dataframe along common id

Pandas Joining and merging DataFrame: Exercise-5 with Solution

Write a Pandas program to join the two given dataframes along rows and merge with another dataframe along the common column id.

Test Data:

student_data1:
  student_id              name  marks
0         S1  Danniella Fenton    200
1         S2      Ryder Storey    210
2         S3      Bryce Jensen    190
3         S4         Ed Bernal    222
4         S5       Kwame Morin    199
student_data2:
  student_id              name  marks
0         S4  Scarlette Fisher    201
1         S5  Carla Williamson    200
2         S6       Dante Morse    198
3         S7    Kaiser William    219
4         S8   Madeeha Preston    201
exam_data:
   student_id  exam_id
0          S1       23
1          S2       45
2          S3       12
3          S4       67
4          S5       21
5          S7       55
6          S8       33
7          S9       14
8         S10       56
9         S11       83
10        S12       88
11        S13       12

Sample Solution:

Python Code :

import pandas as pd
student_data1 = pd.DataFrame({
        'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
         'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'], 
        'marks': [200, 210, 190, 222, 199]})

student_data2 = pd.DataFrame({
        'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
        'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'], 
        'marks': [201, 200, 198, 219, 201]})

exam_data = pd.DataFrame({
        'student_id': ['S1', 'S2', 'S3', 'S4', 'S5', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13'],
        'exam_id': [23, 45, 12, 67, 21, 55, 33, 14, 56, 83, 88, 12]})

print("Original DataFrames:")
print(student_data1)
print(student_data2)
print(exam_data)

print("\nJoin first two said dataframes along rows:")
result_data = pd.concat([student_data1, student_data2])
print(result_data)

print("\nNow join the said result_data and df_exam_data along student_id:")
final_merged_data = pd.merge(result_data, exam_data, on='student_id')
print(final_merged_data)

Sample Output:

 Original DataFrames:
  student_id              name  marks
0         S1  Danniella Fenton    200
1         S2      Ryder Storey    210
2         S3      Bryce Jensen    190
3         S4         Ed Bernal    222
4         S5       Kwame Morin    199
  student_id              name  marks
0         S4  Scarlette Fisher    201
1         S5  Carla Williamson    200
2         S6       Dante Morse    198
3         S7    Kaiser William    219
4         S8   Madeeha Preston    201
   student_id  exam_id
0          S1       23
1          S2       45
2          S3       12
3          S4       67
4          S5       21
5          S7       55
6          S8       33
7          S9       14
8         S10       56
9         S11       83
10        S12       88
11        S13       12

Join first two said dataframes along rows:
  student_id              name  marks
0         S1  Danniella Fenton    200
1         S2      Ryder Storey    210
2         S3      Bryce Jensen    190
3         S4         Ed Bernal    222
4         S5       Kwame Morin    199
0         S4  Scarlette Fisher    201
1         S5  Carla Williamson    200
2         S6       Dante Morse    198
3         S7    Kaiser William    219
4         S8   Madeeha Preston    201

Now join the said result_data and df_exam_data along student_id:
  student_id              name  marks  exam_id
0         S1  Danniella Fenton    200       23
1         S2      Ryder Storey    210       45
2         S3      Bryce Jensen    190       12
3         S4         Ed Bernal    222       67
4         S4  Scarlette Fisher    201       67
5         S5       Kwame Morin    199       21
6         S5  Carla Williamson    200       21
7         S7    Kaiser William    219       55
8         S8   Madeeha Preston    201       33          

Python Code Editor:


Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to append a list of dictioneries or series to a existing DataFrame and display the combined data.
Next: Write a Pandas program to join the two dataframes using the common column of both dataframes.

What is the difficulty level of this exercise?

Test your Python skills with w3resource's quiz



Python: Tips of the Day

How to sort a Python dict by value

Example:

x1 = {'a': 5, 'b': 7, 'c': 9, 'd': 1}

sorted(x1.items(), key=lambda x: x[1])
[('d', 1), ('c', 9), ('b', 7), ('a', 5)]

# Or:

import operator
print(sorted(x1.items(), key=operator.itemgetter(1)))

Output:

[('d', 1), ('a', 5), ('b', 7), ('c', 9)]