The code I used. I think it's the best so far:
def tail(f, n, offset=None):
    """Reads n lines from f with an offset of offset lines. The return
    value is a tuple in the form ``(lines, has_more)`` where `has_more` is
    an indicator that is `True` if there are more lines in the file.

    f should be opened in binary mode so that seeking relative to the
    end of the file works reliably.
    """
    avg_line_length = 74
    to_read = n + (offset or 0)
    while True:
        try:
            # Seek back to roughly where the wanted lines should start
            # (whence=2 means relative to the end of the file).
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # Whoops, apparently the file is smaller than what we want
            # to step back, so go to the beginning instead.
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return (lines[-to_read:offset and -offset or None],
                    len(lines) > to_read or pos > 0)
        # The guess was too short; grow it and try again. Keeping the
        # value an int keeps the seek offset integral.
        avg_line_length = int(avg_line_length * 1.3)
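A minimal usage sketch; the file name "app.log" is a placeholder. The file is opened in binary mode, so the returned lines are bytes and need decoding:

with open("app.log", "rb") as f:
    # Last 10 lines; has_more tells us whether the file has more lines.
    lines, has_more = tail(f, 10)
    for line in lines:
        print(line.decode("utf-8", errors="replace"))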
Use DataFrameGroupBy.cumsum with the selected columns after groupby:
# if the frame has a DatetimeIndex
idx = data_aggregated.index.date
# if the dates live in a column instead
# idx = data_aggregated['creationDateTime'].dt.date

data_aggregated[['RollingOK', 'RollingFail']] = (
    data_aggregated.groupby(idx)[['OK', 'Fail']].cumsum())
print(data_aggregated)
                     OK  Fail  RollingOK  RollingFail
creationDateTime
2017-01-06 21:30:00   4     0          4            0
2017-01-06 21:35:00   4     0          8            0
2017-01-06 21:36:00   4     0         12            0
2017-01-07 21:48:00   3     1          3            1
2017-01-07 21:53:00   4     0          7            1
2017-01-08 21:22:00   3     1          3            1
2017-01-08 21:27:00   3     1          6            2
2017-01-09 21:49:00   3     1          3            1
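For reference, a minimal sketch that reconstructs an input frame like the one above (the values are copied from the printed output, not the asker's original data), so the examples can be run end to end:

import pandas as pd

data_aggregated = pd.DataFrame(
    {'OK':   [4, 4, 4, 3, 4, 3, 3, 3],
     'Fail': [0, 0, 0, 1, 0, 1, 1, 1]},
    index=pd.to_datetime(['2017-01-06 21:30:00', '2017-01-06 21:35:00',
                          '2017-01-06 21:36:00', '2017-01-07 21:48:00',
                          '2017-01-07 21:53:00', '2017-01-08 21:22:00',
                          '2017-01-08 21:27:00', '2017-01-09 21:49:00']))
data_aggregated.index.name = 'creationDateTime'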
You can also work with all columns at once:
data_aggregated = (data_aggregated.join(data_aggregated.groupby(idx)
                                                       .cumsum()
                                                       .add_prefix('Rolling')))
print(data_aggregated)
                     OK  Fail  RollingOK  RollingFail
creationDateTime
2017-01-06 21:30:00   4     0          4            0
2017-01-06 21:35:00   4     0          8            0
2017-01-06 21:36:00   4     0         12            0
2017-01-07 21:48:00   3     1          3            1
2017-01-07 21:53:00   4     0          7            1
2017-01-08 21:22:00   3     1          3            1
2017-01-08 21:27:00   3     1          6            2
2017-01-09 21:49:00   3     1          3            1
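One caveat, as a hedged sketch (the presence of non-numeric columns is an assumption, not part of the data above): cumsum cannot accumulate non-numeric columns, so restrict the groupby to numeric ones first.

# Assumption: the frame may also carry non-numeric columns
# (e.g. a string label), which a cumulative sum cannot handle.
num_cols = data_aggregated.select_dtypes('number').columns
data_aggregated = data_aggregated.join(
    data_aggregated.groupby(idx)[list(num_cols)].cumsum().add_prefix('Rolling'))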
Your solution would need to be changed to:
data_aggregated[['RollingOK', 'RollingFail']] = (
    data_aggregated.groupby(idx)[['OK', 'Fail']]
                   .expanding(0)
                   .sum()
                   .reset_index(level=0, drop=True))
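A quick sketch of a sanity check, assuming the frame built above (note that expanding().sum() returns floats, while cumsum keeps the integer dtype):

# Both approaches compute per-day running totals, so they should agree
# once the float result is cast back to int64. Assumes the frame is
# already sorted by date, as in the example.
rolling_cumsum = data_aggregated.groupby(idx)[['OK', 'Fail']].cumsum()
rolling_expanding = (data_aggregated.groupby(idx)[['OK', 'Fail']]
                                    .expanding(0)
                                    .sum()
                                    .reset_index(level=0, drop=True))
assert rolling_cumsum.equals(rolling_expanding.astype('int64'))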