Reshaping :

Stack

In [1]:
import numpy as np
import pandas as pd
In [9]:
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
                         'foo', 'foo'],
                        ['one', 'two', 'one', 'two',
                         'one', 'two']]))
In [10]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [11]:
df = pd.DataFrame(np.random.randn(6, 2), index=index, columns=['M', 'N'])
In [12]:
df2 = df[:4]
In [13]:
df2
Out[13]:
M N
first second
bar one -1.043213 -0.491027
two -0.646234 -2.849286
baz one 0.731973 0.637868
two -1.012989 0.554054

The stack() method “compresses” a level in the DataFrame’s columns.

In [14]:
stacked = df2.stack()
In [15]:
stacked
Out[15]:
first  second   
bar    one     M   -1.043213
               N   -0.491027
       two     M   -0.646234
               N   -2.849286
baz    one     M    0.731973
               N    0.637868
       two     M   -1.012989
               N    0.554054
dtype: float64
In [16]:
stacked.unstack()
Out[16]:
M N
first second
bar one -1.043213 -0.491027
two -0.646234 -2.849286
baz one 0.731973 0.637868
two -1.012989 0.554054
In [17]:
stacked.unstack(1)
Out[17]:
second one two
first
bar M -1.043213 -0.646234
N -0.491027 -2.849286
baz M 0.731973 -1.012989
N 0.637868 0.554054
In [18]:
stacked.unstack(0)
Out[18]:
first bar baz
second
one M -1.043213 0.731973
N -0.491027 0.637868
two M -0.646234 -1.012989
N -2.849286 0.554054

Pivot tables

In [19]:
 df = pd.DataFrame({'M': ['one', 'one', 'two', 'three'] * 2,
                       'N': ['A', 'B'] * 4,
                       'O': ['foo', 'foo', 'bar', 'bar'] * 2,
                       'P': np.random.randn(8),
                       'Q': np.random.randn(8)})
In [20]:
df
Out[20]:
M N O P Q
0 one A foo 0.596137 0.462104
1 one B foo 0.598470 0.012078
2 two A bar 0.223138 -1.541724
3 three B bar 1.573414 -0.468205
4 one A foo -0.562009 -0.893717
5 one B foo -1.022035 -0.879408
6 two A bar 1.061792 1.702140
7 three B bar 0.109434 1.156828

You can produce pivot tables from this data very easily:

In [23]:
pd.pivot_table(df, values='P', index=['M', 'N'], columns=['O'])
Out[23]:
O bar foo
M N
one A NaN 0.017064
B NaN -0.211783
three B 0.841424 NaN
two A 0.642465 NaN