View the top and bottom rows of a data frame:

In [1]:
import numpy as np
import pandas as pd
In [2]:
dates = pd.date_range('20190101', periods=8)
In [3]:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))
In [4]:
df.head()
Out[4]:
                   P         Q         R         S
2019-01-01  0.010378 -1.084898 -0.442402 -0.278284
2019-01-02  1.863264  0.626363  0.639877 -0.741764
2019-01-03 -0.727480 -0.473615  0.299163 -2.374019
2019-01-04 -0.114543  1.873071 -0.721372 -0.759984
2019-01-05  0.097422 -0.428159  0.056089 -0.035998

In [5]:
df.tail(3)
Out[5]:
                   P         Q         R         S
2019-01-06 -0.030784  0.007262 -0.007135  1.159472
2019-01-07 -0.281074 -0.595091  1.025124 -2.602746
2019-01-08  0.178235  0.038130  0.305273 -1.551300
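
Both methods take an optional row count n, which defaults to 5 when omitted. The following is a minimal sketch of that behaviour; it rebuilds a frame shaped like the one above, so the random values will differ from the printed output, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one above (values are random and will differ).
dates = pd.date_range('20190101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))

first_two = df.head(2)            # first 2 rows
last_five = df.tail()             # n defaults to 5
all_but_last_three = df.head(-3)  # negative n: every row except the last 3

print(first_two.shape, last_five.shape, all_but_last_three.shape)
# (2, 4) (5, 4) (5, 4)
```

A negative n is handy when the number of rows to drop is known but the total length is not.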

Display the index and the columns:

In [6]:
df.index
Out[6]:
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08'],
              dtype='datetime64[ns]', freq='D')
In [7]:
df.columns
Out[7]:
Index(['P', 'Q', 'R', 'S'], dtype='object')

DataFrame.to_numpy() gives a NumPy representation of the underlying data.

For df, a DataFrame whose columns are all floating-point, DataFrame.to_numpy() is fast and typically doesn't require copying data.

In [8]:
df.to_numpy()
Out[8]:
array([[ 0.01037757, -1.08489787, -0.44240246, -0.27828417],
       [ 1.86326425,  0.62636265,  0.63987705, -0.74176401],
       [-0.72747971, -0.47361484,  0.29916325, -2.37401899],
       [-0.11454301,  1.87307102, -0.7213719 , -0.75998405],
       [ 0.09742166, -0.42815908,  0.05608862, -0.03599818],
       [-0.03078364,  0.0072619 , -0.00713497,  1.15947185],
       [-0.28107373, -0.59509149,  1.02512422, -2.60274602],
       [ 0.17823484,  0.03812998,  0.30527348, -1.55129971]])

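For df2, a DataFrame with multiple dtypes, DataFrame.to_numpy() is relatively expensive. A NumPy array has a single dtype for the whole array, while a DataFrame has one dtype per column, so pandas must convert every value to a dtype that can hold all of them, which here is object.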

In [9]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20190102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2
Out[9]:
     A          B    C  D      E    F
0  1.0 2019-01-02  1.0  3   test  foo
1  1.0 2019-01-02  1.0  3  train  foo
2  1.0 2019-01-02  1.0  3   test  foo
3  1.0 2019-01-02  1.0  3  train  foo
In [10]:
df2.to_numpy()
Out[10]:
array([[1.0, Timestamp('2019-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2019-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2019-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2019-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)

Note: DataFrame.to_numpy() does not include the index or column labels in the output.
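
The dtype of the resulting array is a quick way to see which case applies. The sketch below uses two small hypothetical frames (floats and mixed are not part of the session above) and only inspects the dtype; it makes no claim about whether a copy was made, since that can depend on the pandas version and settings.

```python
import numpy as np
import pandas as pd

# Hypothetical frames for illustration: one homogeneous, one with mixed dtypes.
floats = pd.DataFrame(np.random.randn(3, 2), columns=['P', 'Q'])
mixed = pd.DataFrame({'A': 1.0,
                      'D': np.array([3] * 3, dtype='int32'),
                      'F': 'foo'})

floats_arr = floats.to_numpy()
mixed_arr = mixed.to_numpy()
numeric_arr = mixed[['A', 'D']].to_numpy()  # float64 + int32 columns only

print(floats_arr.dtype)   # float64
print(mixed_arr.dtype)    # object
print(numeric_arr.dtype)  # float64
```

Selecting only the numeric columns before converting avoids the object array, at the cost of upcasting the int32 column to float64.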

describe() shows a quick statistical summary of your data:

In [11]:
df.describe()
Out[11]:
              P         Q         R         S
count  8.000000  8.000000  8.000000  8.000000
mean   0.124427 -0.004617  0.144327 -0.898078
std    0.757020  0.913457  0.560059  1.248733
min   -0.727480 -1.084898 -0.721372 -2.602746
25%   -0.156176 -0.503984 -0.115952 -1.756980
50%   -0.010203 -0.210449  0.177626 -0.750874
75%    0.117625  0.185188  0.388924 -0.217713
max    1.863264  1.873071  1.025124  1.159472
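
The quartile rows can be swapped for other cut points through the percentiles argument. This is a minimal sketch on a freshly generated frame, so the numbers will not match the table above; the choice of the 10th and 90th percentiles is only an example.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one above (values are random and will differ).
dates = pd.date_range('20190101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])
print(summary.loc[['10%', '90%']])

# Each summary row matches the corresponding reduction on the frame.
print(np.allclose(summary.loc['mean'], df.mean()))  # True
```

The result is itself a DataFrame, so individual statistics can be picked out with .loc as shown.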

Transposing data:

In [12]:
df.T
Out[12]:
   2019-01-01  2019-01-02  2019-01-03  2019-01-04  2019-01-05  2019-01-06  2019-01-07  2019-01-08
P    0.010378    1.863264   -0.727480   -0.114543    0.097422   -0.030784   -0.281074    0.178235
Q   -1.084898    0.626363   -0.473615    1.873071   -0.428159    0.007262   -0.595091    0.038130
R   -0.442402    0.639877    0.299163   -0.721372    0.056089   -0.007135    1.025124    0.305273
S   -0.278284   -0.741764   -2.374019   -0.759984   -0.035998    1.159472   -2.602746   -1.551300
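
Transposing swaps the row and column labels, which is easy to confirm from the shape. This is a small sketch on a regenerated frame (random values, so they differ from the output above); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one above (values are random and will differ).
dates = pd.date_range('20190101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))

transposed = df.T
print(df.shape, transposed.shape)  # (8, 4) (4, 8)
print(transposed.index.tolist())   # ['P', 'Q', 'R', 'S']

# Transposing twice gives back the original values.
round_trip = transposed.T
print(np.array_equal(round_trip.to_numpy(), df.to_numpy()))  # True
```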

Sorting data by an axis:

In [13]:
df.sort_index(axis=1, ascending=False)
Out[13]:
                   S         R         Q         P
2019-01-01 -0.278284 -0.442402 -1.084898  0.010378
2019-01-02 -0.741764  0.639877  0.626363  1.863264
2019-01-03 -2.374019  0.299163 -0.473615 -0.727480
2019-01-04 -0.759984 -0.721372  1.873071 -0.114543
2019-01-05 -0.035998  0.056089 -0.428159  0.097422
2019-01-06  1.159472 -0.007135  0.007262 -0.030784
2019-01-07 -2.602746  1.025124 -0.595091 -0.281074
2019-01-08 -1.551300  0.305273  0.038130  0.178235
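
With axis=1 the column labels are sorted; axis=0 (the default) sorts the row labels instead. The sketch below, on a regenerated frame with random values, orders the dates newest first and shows that sort_index returns a new object rather than changing df.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one above (values are random and will differ).
dates = pd.date_range('20190101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))

# Sort the row labels (axis=0) in descending order.
newest_first = df.sort_index(axis=0, ascending=False)

print(newest_first.index[0])  # 2019-01-08 00:00:00
print(df.index[0])            # 2019-01-01 00:00:00 (original is unchanged)
```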

Sorting by values:

In [14]:
df.sort_values(by='Q')
Out[14]:
                   P         Q         R         S
2019-01-01  0.010378 -1.084898 -0.442402 -0.278284
2019-01-07 -0.281074 -0.595091  1.025124 -2.602746
2019-01-03 -0.727480 -0.473615  0.299163 -2.374019
2019-01-05  0.097422 -0.428159  0.056089 -0.035998
2019-01-06 -0.030784  0.007262 -0.007135  1.159472
2019-01-08  0.178235  0.038130  0.305273 -1.551300
2019-01-02  1.863264  0.626363  0.639877 -0.741764
2019-01-04 -0.114543  1.873071 -0.721372 -0.759984
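
sort_values also accepts a list of columns and a matching list of sort directions. The sketch below uses a small hypothetical frame (scores, with made-up group and value columns) so that the resulting order is deterministic.

```python
import pandas as pd

# Hypothetical data for illustration only.
scores = pd.DataFrame({'group': ['b', 'a', 'b', 'a'],
                       'value': [2, 3, 1, 4]})

# Sort by group ascending, then by value descending within each group.
ordered = scores.sort_values(by=['group', 'value'], ascending=[True, False])

print(ordered.index.tolist())     # [3, 1, 0, 2]
print(ordered['value'].tolist())  # [4, 3, 2, 1]
```

The original row labels travel with the rows, which is why the index comes out as 3, 1, 0, 2 rather than being renumbered.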