Merge multiple subject Excel grade files using Python + Pandas
Suppose you have several Excel sheets with grades for different subjects. Each file contains the information of the same classmates, but each Excel stores the grades of different subjects. Now we want to integrate the scores of each subject of the same class, we use Python's Pandas suite with the DataFrame Merge function to achieve it.
Development environment: We will use Anaconda with Jupyter Notebook, if you use other tools, such as Spyder, VSCode, PyCharm, etc., it is also possible.
And the Anaconda I used in the demonstration has Pandas installed. In addition, Jupyter Notebook can also provide a more readable visualization environment when dealing with tabular data, so I will use Anaconda with Jupyter Notebook to explain.
First, I prepared five Excel files. In each Excel file, there are graded grades and total grades of subjects.
For example, the following is the data of Chinese .xlsx, and the rest of the subjects have the same structure.
Our goal is to integrate the total scores of each subject in these Excel files into one file. as follows:
For the sake of convenience, we prepare a files list and record the names of each file, which will be read in the program for convenience.
# A list of .xlsx files to merge files = [ "Chinese", "English", "math", "physics", "Chemical", ]
Note: If you want to read all Excel files more dynamically in the future, consider using glob
.
load first file
With pd.read_excel, Excel files can be imported into DataFrames for further processing. The Excel of each subject has its own "Total Score" column, and the total score column of each subject can be renamed to each subject name through rename
.
In the end, we only need the two fields of student number and subject total grade. So use df[["學生編號", files[0]]]
to get it.
df = pd.read_excel(files[0] + ".xlsx") # Rename the total grade to the subject name, otherwise all subjects will be called "total grade" df = df.rename(columns={ "Total Score": files[0] }) # We only take the student number and the original total score. df = df[["student number", files[0]]] df
load second file
Then we load the second file and do similar processing.
# Load second file df2 = pd.read_excel(files[1] + ".xlsx") df2 = df2.rename(columns={ "Total Score": files[1] }) df2 = df2[["student number", files[1]]] df2
The code for each file will be similar in the future, so we will eventually use a for-loop to achieve this. But before that, we manually process these two files before copying to all files after success.
Integrate two files
We use the merge function of the DataFrame to integrate, which uses on
to specify the field on which the two DataFrames are based. Since the amount of data on both sides is the same, they are records of the same class of students, so there is no need to use how
to specify left/right/innter/outer.
# Use DataFrame Merge to integrate df and df2 two sets of data df_result = df.merge(df2, on="student number")
After merging, we arrive at the following data.
Merge all Excel files
With the above experience of merging two files, we turn df2 of df and df2 into reading each file in turn and merging.
for topic in files[1:]: df2 = pd.read_excel(topic + ".xlsx") df2 = df2.rename(columns={ "Total Score": topic }) df2 = df2[["student number", topic]] df = df.merge(df2, on="student number") df = df.round() df["student number"] = df["student number"].astype(int) df = df.set_index("student number") df
resulting in the following dataset.
final code
Putting all the code together gives the final code:
import numpy as np import pandas as pd files = [ "Chinese", "English", "Math", "Physics", "Chemistry" ] # The first file topic = files[0] df = pd.read_excel(topic + ".xlsx") df = df.rename(columns={ "Total score": topic }) df = df[["student number", topic]] # All other files for topic in files[1:]: df2 = pd.read_excel(topic + ".xlsx") df2 = df2.rename(columns={ "Total Score": topic }) df2 = df2[["student number", topic]] df = df.merge(df2, on="student number") df = df.round() df["student number"] = df["student number"].astype(int) df = df.set_index("student number") df.to_excel("result.xlsx")
Finally, put on YouTube's demonstration demonstration:
— Makzan by Mai Mai , 2022-06-23.
Like my work? Don't forget to support and clap, let me know that you are with me on the road of creation. Keep this enthusiasm together!
- Author
- More