Pandas is a wonderful data manipulation library in python. Working in the field of Data science and Machine learning, I find myself using Pandas pretty much everyday. It's an invaluable tool for data analysis and manipulation.
In this short article, I will show you what Melt and Pivot (Reverse melt or Unmelt) are in Pandas, and how you can use them for reshaping and manipulating data frames.
Say, I have the data of the closing prices of stock market data of stock market closing prices of two major companies for the last week as follows:
Day | Apple | |
---|---|---|
MON | 1129 | 191 |
TUE | 1132 | 192 |
WED | 1134 | 190 |
THU | 1152 | 190 |
FRI | 1152 | 188 |
For an analysis I want to do I need the names of the companies Google & Apple to appear in a single column with the stock price as another column, something like this:
Day | Company | Closing Price |
---|---|---|
MON | 1129 | |
TUE | 1132 | |
WED | 1134 | |
THU | 1152 | |
FRI | 1152 | |
MON | Apple | 191 |
TUE | Apple | 192 |
WED | Apple | 190 |
THU | Apple | 190 |
FRI | Apple | 188 |
This is exactly where Melt
comes into picture. Melt is used for converting multiple columns into a single column, which is exactly what I need here.
Let's see how we can do this.
Melt
First we need to import pandas.
import pandas as pd
Then, we'll create the Dataframe with the data.
df = pd.DataFrame(data = {
'Day' : ['MON', 'TUE', 'WED', 'THU', 'FRI'],
'Google' : [1129,1132,1134,1152,1152],
'Apple' : [191,192,190,190,188]
})
And this will get us the dataframe we need as follows:
Day | Apple | ||
---|---|---|---|
0 | MON | 1129 | 191 |
1 | TUE | 1132 | 192 |
2 | WED | 1134 | 190 |
3 | THU | 1152 | 190 |
4 | FRI | 1152 | 188 |
Let's melt this now. To melt this dataframe, you call the melt()
method on the dataframe with the id_vars
parameter set.
reshaped_df = df.melt(id_vars=['Day']) # id_vars is the column you do not want to change
And you're done. Your reshaped_df would like this now.
Day | variable | value | |
---|---|---|---|
0 | MON | Apple | 191 |
1 | TUE | Apple | 192 |
2 | WED | Apple | 190 |
3 | THU | Apple | 190 |
4 | FRI | Apple | 188 |
5 | MON | 1129 | |
6 | TUE | 1132 | |
7 | WED | 1134 | |
8 | THU | 1152 | |
9 | FRI | 1152 |
The id_vars
you've passed into the melt() method is to specify which column you want to leave untouched. Since we want the Day column to stay the same even after the melt, we set id_vars=['Day']
.
Also, you would have noticed that the output dataframe of melt has the columns variable
and value
. These are the default names given by pandas for the columns. We can change this either manually with something like
reshaped_df.columns = [['Day', 'Company', 'Closing Price']]
Or, we can specify the values for these columns in the melt()
itself. Melt takes arguments var_name
and value_name
apart from id_vars
. These options specify the names for the variable
column and the value
column respectively.
reshaped_df = df.melt(id_vars=['Day'], var_name='Company', value_name='Closing Price')
That will give us:
Day | Company | Closing Price | |
---|---|---|---|
0 | MON | Apple | 191 |
1 | TUE | Apple | 192 |
2 | WED | Apple | 190 |
3 | THU | Apple | 190 |
4 | FRI | Apple | 188 |
5 | MON | 1129 | |
6 | TUE | 1132 | |
7 | WED | 1134 | |
8 | THU | 1152 | |
9 | FRI | 1152 |
Unmelt/Reverse Melt/Pivot
We can also do the reverse of the melt operation which is also called as Pivoting. In Pivoting or Reverse Melting, we convert a column with multiple values into several columns of their own.
The pivot()
method on the dataframe takes two main arguments index
and columns
. The index
parameter is similar to id_vars
we have seen before i.e., It is used to specify the column you don't want to touch. The columns parameter is to specify which column should be used to create the new columns.
reshaped_df.pivot(index='Day', columns='Company')
Running the above command gives you the following:
+---------+-----------------------+
| | Closing Price |
+=========+:=============:+:=====:+
| Company | Google | Apple |
+---------+---------------+-------+
| Day | | |
+---------+---------------+-------+
| MON | 1129 | 191 |
+---------+---------------+-------+
| TUE | 1132 | 192 |
+---------+---------------+-------+
| WED | 1134 | 190 |
+---------+---------------+-------+
| THU | 1152 | 190 |
+---------+---------------+-------+
| FRI | 1152 | 188 |
+---------+---------------+-------+
# (Showing in textual format as multi-level columns are not posible in Markdown)
This is close, but probably not exactly what you wanted. The Closing Price
is an extra stacked column (index) on top of Google & Apple. So to get exactly the reverse of melt and get the original df
dataframe we started with, we do the following:
original_df = reshaped_df.pivot(index='Day', columns='Company')['Closing Price'].reset_index()
original_df.columns.name = None
And that gets us back to what we have started with.
Day | Apple | |
---|---|---|
MON | 1129 | 191 |
TUE | 1132 | 192 |
WED | 1134 | 190 |
THU | 1152 | 190 |
FRI | 1152 | 188 |
That is all for this article. I hope this was useful for you and that you'll try to use this in your data processing workflow.
For more programming articles, checkout Freblogg, Freblogg/Java, Freblogg/Spark
Thanks for reading. See you again in the next article.