Reshaping Pandas Data frames with Melt & Pivot

Pandas is a wonderful data manipulation library in python. Working in the field of Data science and Machine learning, I find myself using Pandas pretty much everyday. It's an invaluable tool for data analysis and manipulation.

In this short article, I will show you what Melt and Pivot (Reverse melt or Unmelt) are in Pandas, and how you can use them for reshaping and manipulating data frames.

Happy Panda

Say, I have the data of the closing prices of stock market data of stock market closing prices of two major companies for the last week as follows:

Day	Google	Apple
MON	1129	191
TUE	1132	192
WED	1134	190
THU	1152	190
FRI	1152	188

For an analysis I want to do I need the names of the companies Google & Apple to appear in a single column with the stock price as another column, something like this:

Day	Company	Closing Price
MON	Google	1129
TUE	Google	1132
WED	Google	1134
THU	Google	1152
FRI	Google	1152
MON	Apple	191
TUE	Apple	192
WED	Apple	190
THU	Apple	190
FRI	Apple	188

This is exactly where Melt comes into picture. Melt is used for converting multiple columns into a single column, which is exactly what I need here.

Let's see how we can do this.

Melt

First we need to import pandas.

            import pandas as pd

Then, we'll create the Dataframe with the data.

df = pd.DataFrame(data = {
    'Day' : ['MON', 'TUE', 'WED', 'THU', 'FRI'], 
    'Google' : [1129,1132,1134,1152,1152], 
    'Apple' : [191,192,190,190,188] 
})

And this will get us the dataframe we need as follows:

	Day	Google	Apple
0	MON	1129	191
1	TUE	1132	192
2	WED	1134	190
3	THU	1152	190
4	FRI	1152	188

Let's melt this now. To melt this dataframe, you call the melt() method on the dataframe with the id_vars parameter set.

            reshaped_df = df.melt(id_vars=['Day']) # id_vars is the column you do not want to change

And you're done. Your reshaped_df would like this now.

	Day	variable	value
0	MON	Apple	191
1	TUE	Apple	192
2	WED	Apple	190
3	THU	Apple	190
4	FRI	Apple	188
5	MON	Google	1129
6	TUE	Google	1132
7	WED	Google	1134
8	THU	Google	1152
9	FRI	Google	1152

The id_vars you've passed into the melt() method is to specify which column you want to leave untouched. Since we want the Day column to stay the same even after the melt, we set id_vars=['Day'].

Also, you would have noticed that the output dataframe of melt has the columns variable and value. These are the default names given by pandas for the columns. We can change this either manually with something like

reshaped_df.columns = [['Day', 'Company', 'Closing Price']]

Or, we can specify the values for these columns in the melt() itself. Melt takes arguments var_name and value_name apart from id_vars. These options specify the names for the variable column and the value column respectively.

            reshaped_df = df.melt(id_vars=['Day'], var_name='Company', value_name='Closing Price')

That will give us:

	Day	Company	Closing Price
0	MON	Apple	191
1	TUE	Apple	192
2	WED	Apple	190
3	THU	Apple	190
4	FRI	Apple	188
5	MON	Google	1129
6	TUE	Google	1132
7	WED	Google	1134
8	THU	Google	1152
9	FRI	Google	1152

Unmelt/Reverse Melt/Pivot

We can also do the reverse of the melt operation which is also called as Pivoting. In Pivoting or Reverse Melting, we convert a column with multiple values into several columns of their own.

The pivot() method on the dataframe takes two main arguments index and columns. The index parameter is similar to id_vars we have seen before i.e., It is used to specify the column you don't want to touch. The columns parameter is to specify which column should be used to create the new columns.

reshaped_df.pivot(index='Day', columns='Company')

Running the above command gives you the following:

            +---------+-----------------------+
|         |     Closing Price     |
+=========+:=============:+:=====:+
| Company |     Google    | Apple |
+---------+---------------+-------+
|   Day   |               |       |
+---------+---------------+-------+
|   MON   |     1129      |  191  |
+---------+---------------+-------+
|   TUE   |     1132      |  192  |
+---------+---------------+-------+
|   WED   |     1134      |  190  |
+---------+---------------+-------+
|   THU   |     1152      |  190  |
+---------+---------------+-------+
|   FRI   |     1152      |  188  |
+---------+---------------+-------+

# (Showing in textual format as multi-level columns are not posible in Markdown)

This is close, but probably not exactly what you wanted. The Closing Price is an extra stacked column (index) on top of Google & Apple. So to get exactly the reverse of melt and get the original df dataframe we started with, we do the following:

original_df = reshaped_df.pivot(index='Day', columns='Company')['Closing Price'].reset_index()
original_df.columns.name = None

And that gets us back to what we have started with.

Day	Google	Apple
MON	1129	191
TUE	1132	192
WED	1134	190
THU	1152	190
FRI	1152	188

That is all for this article. I hope this was useful for you and that you'll try to use this in your data processing workflow.

For more programming articles, checkout Freblogg, Freblogg/Java, Freblogg/Spark

Thanks for reading. See you again in the next article.

Reshaping Pandas Data frames with Melt & Pivot

Melt

Unmelt/Reverse Melt/Pivot

Published

Category

Tags