Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
# /// script
# requires-python = ">=3.13"
# dependencies = [
# "pandas==2.3.3",
# ]
# ///
import numpy as np
import pandas as pd
# Create a Series with non-contiguous integer index (step of 5)
# Index: 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, ...
# Values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)
# Without step: returns all labels from 10 to 50 inclusive (label-based)
result_no_step = series.loc[10:50]
print("series.loc[10:50] (no step):")
print(result_no_step)
print()
# With step=1: same as no step
print("series.loc[10:50:1] (step=1):")
print(series.loc[10:50:1])
print()
# With step=2: start/stop are matched against labels, but step is applied positionally
print("series.loc[10:50:2] (step=2):")
print(series.loc[10:50:2])
print()
# With step=5: same behavior as step=2 (step counts positions, not label values)
print("series.loc[10:50:5] (step=5):")
print(series.loc[10:50:5])
print()
# Using np.arange with the same arguments as the slice
# (note: np.arange excludes the stop value 50, while .loc label slices include it)
print("series.loc[np.arange(10,50,5)] (step=5):")
print(series.loc[np.arange(10, 50, 5)])
print()
Issue Description
When using .loc with a slice, start/stop are applied in a different space than step, which I found very counterintuitive. I was also unable to find any documentation for this (admittedly niche) use case:
- start/stop are applied to the label values
- step is applied positionally over the index
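A minimal sketch of the behavior described above, assuming pandas 2.x: the label bounds appear to be converted to a positional slice first (here via the public `Index.slice_indexer`), and the step is then applied to positions. That is why `series.loc[10:50:2]` matches `series.loc[10:50].iloc[::2]`.

```python
import numpy as np
import pandas as pd

# Same setup as the reproducible example: labels 0, 5, 10, ..., 95
T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)

# Index.slice_indexer turns inclusive label bounds into a positional slice;
# the step is passed through as-is, so it counts positions, not label values.
pos = series.index.slice_indexer(10, 50, 2)

# Both selections take every second *position* inside the label window 10..50
via_positions = series.iloc[pos]
via_window = series.loc[10:50].iloc[::2]

print(via_positions.index.tolist())  # [10, 20, 30, 40, 50]
print(via_window.equals(series.loc[10:50:2]))  # True
```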
The output of the script above is:
series.loc[10:50] (no step):
10 2
15 3
20 4
25 5
30 6
35 7
40 8
45 9
50 10
dtype: int64
series.loc[10:50:1] (step=1):
10 2
15 3
20 4
25 5
30 6
35 7
40 8
45 9
50 10
dtype: int64
series.loc[10:50:2] (step=2):
10 2
20 4
30 6
40 8
50 10
dtype: int64
series.loc[10:50:5] (step=5):
10 2
35 7
dtype: int64
series.loc[np.arange(10,50,5)] (step=5):
10 2
15 3
20 4
25 5
30 6
35 7
40 8
45 9
dtype: int64
Expected Behavior
I would have expected either of the following:
Error
Raise an error saying that step is ambiguous and cannot be used here. This appears to be the approach taken by IntervalIndex:
pandas/pandas/core/indexes/interval.py
Lines 978 to 982 in 499c5d4
def _convert_slice_indexer(self, key: slice, kind: Literal["loc", "getitem"]):
    if not (key.step is None or key.step == 1):
        # GH#31658 if label-based, we require step == 1,
        # if positional, we disallow float start/stop
        msg = "label-based slicing with step!=1 is not supported for IntervalIndex"
(As a side note, I wasn't able to hit that code path.)
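A minimal sketch of what the "error" option could look like from the user's side. `strict_loc_slice` is a hypothetical wrapper, not a pandas API; it only mirrors the `step is None or step == 1` condition from the IntervalIndex snippet above and then defers to `.loc`.

```python
import numpy as np
import pandas as pd

def strict_loc_slice(obj, key):
    """Hypothetical guard (not a pandas API): apply a .loc slice only when
    its step is None or 1, mirroring the IntervalIndex check quoted above."""
    if not isinstance(key, slice):
        raise TypeError(f"expected a slice, got {type(key).__name__}")
    if not (key.step is None or key.step == 1):
        raise ValueError(
            "label-based slicing with step != 1 is ambiguous; "
            "use .iloc for a positional step or pass the labels explicitly"
        )
    return obj.loc[key]

T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)

print(strict_loc_slice(series, slice(10, 50)).index.tolist())
# [10, 15, 20, 25, 30, 35, 40, 45, 50]
try:
    strict_loc_slice(series, slice(10, 50, 5))
except ValueError as err:
    print("ValueError:", err)
```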
Step applies to Label Space
In my example, I would expect the slice with step=5 to behave the same as step=1, since it should hit each of the same labels. My mental model is that, for integer labels as in my example,
series.loc[slice(start, stop, step)]
should be equivalent to
series.loc[np.arange(start, stop, step)]
and in more ambiguous cases, e.g. slice("a", "f", 3), an error should be thrown.
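A minimal sketch of the "step applies to label space" model, written as a hypothetical helper `label_step_slice` (not a pandas API). It assumes a sorted integer index and an explicit integer `start`, and it raises `TypeError` for non-integer indexes, in line with the suggestion that ambiguous cases should error. One difference from `np.arange`: `.loc` label slices include the stop label, so this sketch keeps `stop` when it lies on the step grid.

```python
import numpy as np
import pandas as pd

def label_step_slice(obj, start, stop, step):
    """Hypothetical helper (not a pandas API): select labels in [start, stop]
    whose distance from start is a multiple of step, i.e. step in label space.

    Assumes a sorted integer index and an explicit integer start; non-integer
    indexes raise, following the suggestion that ambiguous cases should error.
    """
    if not pd.api.types.is_integer_dtype(obj.index.dtype):
        raise TypeError("label-space step is only defined here for integer indexes")
    if step <= 0:
        raise ValueError("step must be a positive integer")
    window = obj.loc[start:stop]  # label-based; both endpoints inclusive
    offsets = window.index.to_numpy() - start
    return window.loc[offsets % step == 0]

T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)

# step=5 hits every label from 10 through 50, like step=1
print(label_step_slice(series, 10, 50, 5).index.tolist())
# step=10 keeps 10, 20, 30, 40, 50
print(label_step_slice(series, 10, 50, 10).index.tolist())
```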
Installed Versions
INSTALLED VERSIONS
------------------
commit : 9c8bc3e55188c8aff37207a74f1dd144980b8874
python : 3.13.0
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.3.3
numpy : 2.3.5
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None