Select a single column, which yields a Series, equivalent to df.A:

In [1]:
import numpy as np
import pandas as pd
In [2]:
dates = pd.date_range('20190101', periods=8)
In [3]:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))
In [4]:
df
Out[4]:
P Q R S
2019-01-01 -1.464165 1.010614 1.109461 0.503447
2019-01-02 -0.587479 -0.624253 1.647541 0.824041
2019-01-03 -0.361385 -2.603255 -0.945360 1.200653
2019-01-04 0.750404 -0.674213 0.199929 0.831605
2019-01-05 0.358591 -0.323865 -0.494276 0.772347
2019-01-06 -1.229054 -1.514850 0.336238 0.428171
2019-01-07 1.232194 -1.814151 0.352850 0.352128
2019-01-08 -0.338794 -1.800993 -0.192666 -0.604790
In [5]:
df['P']
Out[5]:
2019-01-01   -1.464165
2019-01-02   -0.587479
2019-01-03   -0.361385
2019-01-04    0.750404
2019-01-05    0.358591
2019-01-06   -1.229054
2019-01-07    1.232194
2019-01-08   -0.338794
Freq: D, Name: P, dtype: float64

Selecting via [], which slices the rows.

In [6]:
df[0:3]
Out[6]:
P Q R S
2019-01-01 -1.464165 1.010614 1.109461 0.503447
2019-01-02 -0.587479 -0.624253 1.647541 0.824041
2019-01-03 -0.361385 -2.603255 -0.945360 1.200653
In [7]:
df['20190102':'20190104']
Out[7]:
P Q R S
2019-01-02 -0.587479 -0.624253 1.647541 0.824041
2019-01-03 -0.361385 -2.603255 -0.945360 1.200653
2019-01-04 0.750404 -0.674213 0.199929 0.831605

Selection by label

For getting a cross section using a label:

In [8]:
df.loc[dates[0]]
Out[8]:
P   -1.464165
Q    1.010614
R    1.109461
S    0.503447
Name: 2019-01-01 00:00:00, dtype: float64

Selecting on a multi-axis by label:

In [9]:
df.loc[:, ['P', 'Q']]
Out[9]:
P Q
2019-01-01 -1.464165 1.010614
2019-01-02 -0.587479 -0.624253
2019-01-03 -0.361385 -2.603255
2019-01-04 0.750404 -0.674213
2019-01-05 0.358591 -0.323865
2019-01-06 -1.229054 -1.514850
2019-01-07 1.232194 -1.814151
2019-01-08 -0.338794 -1.800993

Show label slicing, both endpoints are included:

In [10]:
df.loc['20190102':'20190104', ['P', 'Q']]
Out[10]:
P Q
2019-01-02 -0.587479 -0.624253
2019-01-03 -0.361385 -2.603255
2019-01-04 0.750404 -0.674213

Reduction in the dimensions of the returned object:

In [11]:
df.loc['20190102', ['P', 'Q']]
Out[11]:
P   -0.587479
Q   -0.624253
Name: 2019-01-02 00:00:00, dtype: float64

For getting a scalar value:

In [12]:
df.loc[dates[0], 'P']
Out[12]:
-1.464165465219312

For getting fast access to a scalar (equivalent to the prior method):

In [13]:
df.at[dates[0], 'P']
Out[13]:
-1.464165465219312

Selection by position

In [14]:
df.iloc[3]
Out[14]:
P    0.750404
Q   -0.674213
R    0.199929
S    0.831605
Name: 2019-01-04 00:00:00, dtype: float64

By integer slices, acting similar to numpy/python:

In [15]:
df.iloc[3:5, 0:2]
Out[15]:
P Q
2019-01-04 0.750404 -0.674213
2019-01-05 0.358591 -0.323865

By lists of integer position locations, similar to the numpy/python style:

In [16]:
df.iloc[[1, 2, 4], [0, 2]]
Out[16]:
P R
2019-01-02 -0.587479 1.647541
2019-01-03 -0.361385 -0.945360
2019-01-05 0.358591 -0.494276

Slice rows explicitly:

In [17]:
df.iloc[1:3, :]
Out[17]:
P Q R S
2019-01-02 -0.587479 -0.624253 1.647541 0.824041
2019-01-03 -0.361385 -2.603255 -0.945360 1.200653

Slice columns explicitly:

In [18]:
df.iloc[:, 1:3]
Out[18]:
Q R
2019-01-01 1.010614 1.109461
2019-01-02 -0.624253 1.647541
2019-01-03 -2.603255 -0.945360
2019-01-04 -0.674213 0.199929
2019-01-05 -0.323865 -0.494276
2019-01-06 -1.514850 0.336238
2019-01-07 -1.814151 0.352850
2019-01-08 -1.800993 -0.192666

Get a value explicitly:

In [19]:
df.iloc[1, 1]
Out[19]:
-0.6242526622088572

Get fast access to a scalar:

In [20]:
df.iat[1, 1]
Out[20]:
-0.6242526622088572

Boolean indexing:
Using a single column’s values to select data.

In [21]:
df[df.P > 0]
Out[21]:
P Q R S
2019-01-04 0.750404 -0.674213 0.199929 0.831605
2019-01-05 0.358591 -0.323865 -0.494276 0.772347
2019-01-07 1.232194 -1.814151 0.352850 0.352128

Select values from a DataFrame where a boolean condition is met.

In [22]:
df[df > 0]
Out[22]:
P Q R S
2019-01-01 NaN 1.010614 1.109461 0.503447
2019-01-02 NaN NaN 1.647541 0.824041
2019-01-03 NaN NaN NaN 1.200653
2019-01-04 0.750404 NaN 0.199929 0.831605
2019-01-05 0.358591 NaN NaN 0.772347
2019-01-06 NaN NaN 0.336238 0.428171
2019-01-07 1.232194 NaN 0.352850 0.352128
2019-01-08 NaN NaN NaN NaN

Using the isin() method for filtering:

In [23]:
df2 = pd.DataFrame({'A': 1.,
                        'B': pd.Timestamp('20190102'),
                        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                        'D': np.array([3] * 4, dtype='int32'),
                        'E': pd.Categorical(["test", "train", "test", "train"]),
                        'F': 'foo'})
In [24]:
df2 = df.copy()
In [25]:
import pandas as pd
dates = pd.date_range('20190101', periods=4)
df = pd.DataFrame(np.random.randn(4, 4), index=dates, columns=list('PQRS'))
df2 = df.copy()
df2['E'] = ['test', 'train', 'test', 'train']
df2
Out[25]:
P Q R S E
2019-01-01 -0.829298 -0.653406 2.365732 0.665950 test
2019-01-02 -1.150448 -0.557560 0.643302 1.998400 train
2019-01-03 0.359970 1.771774 -1.112879 -1.536518 test
2019-01-04 0.295762 0.367110 0.120878 -1.088377 train
In [26]:
df2[df2['E'].isin(['test', 'train'])]
Out[26]:
P Q R S E
2019-01-01 -0.829298 -0.653406 2.365732 0.665950 test
2019-01-02 -1.150448 -0.557560 0.643302 1.998400 train
2019-01-03 0.359970 1.771774 -1.112879 -1.536518 test
2019-01-04 0.295762 0.367110 0.120878 -1.088377 train

Setting

Setting a new column automatically aligns the data by the indexes.

In [27]:
s = pd.Series([1, 4, np.nan, 6, 8])
In [28]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
In [29]:
s1
Out[29]:
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

Setting values by label:

In [30]:
df.at[dates[0], 'P'] = 0

Setting values by position:

In [31]:
In [49]: df.iat[0, 1] = 0

Setting by assigning with a NumPy array:

In [32]:
In [50]: df.loc[:, 'S'] = np.array([5] * len(df))

The result of the prior setting operations.

In [33]:
df
Out[33]:
P Q R S
2019-01-01 0.000000 0.000000 2.365732 5
2019-01-02 -1.150448 -0.557560 0.643302 5
2019-01-03 0.359970 1.771774 -1.112879 5
2019-01-04 0.295762 0.367110 0.120878 5

A where operation with setting.

In [34]:
df2 = df.copy()
In [35]:
df2[df2 > 0] = -df2
In [36]:
df2
Out[36]:
P Q R S
2019-01-01 0.000000 0.000000 -2.365732 -5
2019-01-02 -1.150448 -0.557560 -0.643302 -5
2019-01-03 -0.359970 -1.771774 -1.112879 -5
2019-01-04 -0.295762 -0.367110 -0.120878 -5