You can use numpy.where, which you can wrap in a simple function that fits your requirements:
import numpy as np

def max_by_index(idx, arr):
    # indices (within arr[idx]) of every element equal to the maximum
    return np.where(arr[idx] == np.max(arr[idx]))
In action:
>>> max_by_index(0, a)
(array([1], dtype=int64), array([2], dtype=int64))
You can index your array with this result to access the maximum value:
>>> a[0][max_by_index(0, a)]
array([0.7])
This returns every location of the maximum value. If you only need one, you can use np.argmax instead: it returns the flat index of the first maximum, which np.unravel_index converts back to coordinates.
Comments in the code:
#!/bin/bash
# Scenario 1
echo 'filename filesize data_received_dt tname createdt
ccaa/01APR2018-revised/ 0 2019-01-17T06:16:59.000Z sample 2018-03-15T09:51:36.000Z
ccaa/01APR2018/content_01APR2018-00000.csv 115814528 2018-12-05T23:38:10.000Z live 2018-03-15T09:51:36.000Z
ccaa/01APR2018-revised/content_01APR2018-00001.csv 116584541 2018-12-05T23:38:09.000Z test 2018-03-15T09:51:36.000Z
ccaa/01JUN2018-revised/content_01JUN2018-00002.csv 117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z
ccaa/10JUL2018/content_10JUL2018-00002.csv 117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z
ccaa/21AUG2018-revised/content_21AUG2018-00002.csv 117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z
' |
# remove first line with headers
tail -n +2 |
# for each line
while
    IFS=' ' read -r name size received_dt tname create_dt &&
    # stop on empty lines
    [ -n "$name" ]
do
    # get the second directory name from the name
    # this is a smart way of getting the second field from the right
    dir=$(<<<"$name" rev | cut -d'/' -f2 | rev)
    # if the 2nd dir doesn't end with -revised, add -revised
    # (I think this could be just one sed command)
    if ! <<<"$dir" grep -q -- "-revised$"; then
        dir2=$(dirname "$(dirname "$name")")
        dir="${dir}-revised"
        name=$dir2/$dir/$(basename "$name")
    fi
    # extract date data from the dir
    day_from_dir=${dir:0:2}
    month_from_dir="${dir:2:1}$(<<<"${dir:3:2}" tr '[:upper:]' '[:lower:]')"
    year_from_dir="20${dir:7:2}"
    # get start and end dates
    # (%y is the calendar year; %g would be the ISO week-based year)
    start_dt=$(
        LC_ALL=C date \
            --date="${day_from_dir} ${month_from_dir} ${year_from_dir} 00:00:00" \
            +%-d-%b-%y
    )
    end_dt=$start_dt
    # printf output
    printf "%s %s %s %s %s %s %s\n" \
        "$name" \
        "$start_dt" \
        "$end_dt" \
        "$size" \
        "$received_dt" \
        "$tname" \
        "$create_dt"
done |
# format the output - left justify and set column names
column -t -s ' ' -o ' ' -N \
    "filename,start_dt,end_dt,filesize,data_received_dt,name,createdt"

# Scenario 2
echo 'filename size date tname
ccaa/201802/ 0 2019-01-17T06:16:34.000Z sample
ccaa/201802/Feb2018000000_0.csv 32602738 2018-09-11T04:05:38.000Z live
ccaa/201804/Feb2018000001_0.csv 32602738 2018-09-11T04:05:38.000Z test
ccaa/201805/Feb2018000002_0.csv 32602738 2018-09-11T04:05:38.000Z sample
ccaa/201806/Feb2018000003_0.csv 32602187 2018-09-11T04:05:38.000Z sample
' |
tail -n +2 |
while
    IFS=' ' read -r name size date tname && [ -n "$name" ]
do
    # get the second directory name from the name
    # this is a smart way of getting the second field from the right
    dir=$(<<<"$name" rev | cut -d'/' -f2 | rev)
    # extract date from dir
    year=${dir:0:4}
    month=${dir:4:2}
    ts="${year}-${month}-01T00:00:00-00:00"
    # set date format
    # (--utc keeps the UTC timestamp from shifting to the previous day in local time)
    start_dt=$(
        LC_ALL=C date --utc \
            --date="$ts" \
            +%-d-%b-%y
    )
    # we want last month day - add 1 month and subtract 1 day
    end_dt=$(
        LC_ALL=C date --utc \
            --date="$ts +1 month -1 day" \
            +%-d-%b-%y
    )
    # and output
    printf "%s %s %s %s %s %s\n" \
        "$name" \
        "$start_dt" \
        "$end_dt" \
        "$size" \
        "$date" \
        "$tname"
done |
column -t -s ' ' -o ' ' -N \
    "filename,start_dt,end_dt,size,date,tname"

# Scenario 3 - same as 2, but with a different naming scheme
echo 'filename size date tname
ccaa/201802/ 0 2019-01-17T06:16:34.000Z sample
ccaa/201802/Feb2018000000_0.csv 32602738 2018-09-11T04:05:38.000Z live
ccaa/201804/Feb2018000001_0.csv 32602738 2018-09-11T04:05:38.000Z test
ccaa/201805/Feb2018000002_0.csv 32602738 2018-09-11T04:05:38.000Z sample
ccaa/201808/Feb2018000003_0.csv 32602187 2018-09-11T04:05:38.000Z sample
' |
tail -n +2 |
while
    IFS=' ' read -r name size date tname &&
    [ -n "$name" ]
do
    # get the second directory name from the name
    # this is a smart way of getting the second field from the right
    dir=$(<<<"$name" rev | cut -d'/' -f2 | rev)
    # extract date from dir
    year=${dir:0:4}
    month=${dir:4:2}
    ts="${year}-${month}-01T00:00:00-00:00"
    # (--utc keeps the UTC timestamp from shifting to the previous day in local time)
    quarter=$(
        LC_ALL=C date --utc \
            --date="$ts" \
            +%q
    )
    # rename the file with the year-Qq (directory-only entries are left as they are)
    tmp="$(dirname "$(dirname "$name")")/${year}-Q${quarter}/"
    if ! <<<"$name" grep -q "/$"; then
        name="${tmp}$(basename "$name")"
    fi
    # set date format
    start_dt=$(
        LC_ALL=C date --utc \
            --date="$ts" \
            +%-d-%b-%y
    )
    # we want last month day - add 1 month and subtract 1 day
    end_dt=$(
        LC_ALL=C date --utc \
            --date="$ts +1 month -1 day" \
            +%-d-%b-%y
    )
    # and output
    printf "%s %s %s %s %s %s\n" \
        "$name" \
        "$start_dt" \
        "$end_dt" \
        "$size" \
        "$date" \
        "$tname"
done |
# create a nice-looking table with header names
column -t -s ' ' -o ' ' -N \
    "filename,start_dt,end_dt,size,date,tname"
Output from jdoodle:
filename                                           start_dt  end_dt    filesize  data_received_dt         name   createdt
ccaa/01APR2018-revised/                            1-Apr-18  1-Apr-18  0         2019-01-17T06:16:59.000Z sample 2018-03-15T09:51:36.000Z
ccaa/01APR2018-revised/content_01APR2018-00000.csv 1-Apr-18  1-Apr-18  115814528 2018-12-05T23:38:10.000Z live   2018-03-15T09:51:36.000Z
ccaa/01APR2018-revised/content_01APR2018-00001.csv 1-Apr-18  1-Apr-18  116584541 2018-12-05T23:38:09.000Z test   2018-03-15T09:51:36.000Z
ccaa/01JUN2018-revised/content_01JUN2018-00002.csv 1-Jun-18  1-Jun-18  117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z
ccaa/10JUL2018-revised/content_10JUL2018-00002.csv 10-Jul-18 10-Jul-18 117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z
ccaa/21AUG2018-revised/content_21AUG2018-00002.csv 21-Aug-18 21-Aug-18 117363985 2018-12-05T23:38:09.000Z sample 2018-03-15T09:51:36.000Z

filename                        start_dt end_dt    size     date                     tname
ccaa/201802/                    1-Feb-18 28-Feb-18 0        2019-01-17T06:16:34.000Z sample
ccaa/201802/Feb2018000000_0.csv 1-Feb-18 28-Feb-18 32602738 2018-09-11T04:05:38.000Z live
ccaa/201804/Feb2018000001_0.csv 1-Apr-18 30-Apr-18 32602738 2018-09-11T04:05:38.000Z test
ccaa/201805/Feb2018000002_0.csv 1-May-18 31-May-18 32602738 2018-09-11T04:05:38.000Z sample
ccaa/201806/Feb2018000003_0.csv 1-Jun-18 30-Jun-18 32602187 2018-09-11T04:05:38.000Z sample

filename                         start_dt end_dt    size     date                     tname
ccaa/201802/                     1-Feb-18 28-Feb-18 0        2019-01-17T06:16:34.000Z sample
ccaa/2018-Q1/Feb2018000000_0.csv 1-Feb-18 28-Feb-18 32602738 2018-09-11T04:05:38.000Z live
ccaa/2018-Q2/Feb2018000001_0.csv 1-Apr-18 30-Apr-18 32602738 2018-09-11T04:05:38.000Z test
ccaa/2018-Q2/Feb2018000002_0.csv 1-May-18 31-May-18 32602738 2018-09-11T04:05:38.000Z sample
ccaa/2018-Q3/Feb2018000003_0.csv 1-Aug-18 31-Aug-18 32602187 2018-09-11T04:05:38.000Z sample
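
The rev | cut | rev pipeline in the scripts starts several external processes for every input line just to pick out one path component. As a rough sketch, the same component can probably be obtained with Bash parameter expansion alone. The function name second_dir is hypothetical and not part of the original answer; it is meant to return the same value as the pipeline for the kinds of paths shown in the sample data.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not from the original scripts):
# print the second path component counted from the right.
second_dir() {
    # remove the last "/" and everything after it
    local parent=${1%/*}
    # keep only what follows the last remaining "/"
    printf '%s\n' "${parent##*/}"
}

second_dir 'ccaa/201802/'                      # directory entry with a trailing slash
second_dir 'ccaa/201802/Feb2018000000_0.csv'   # regular file entry
```

Both calls print 201802. Inside the loops this would replace the line that assigns dir, for example dir=$(second_dir "$name").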
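
The month-boundary logic used in scenarios 2 and 3 can be looked at in isolation. The sketch below assumes GNU date (for --date with relative items and the %-d padding flag); month_range is a hypothetical helper name, and the input is assumed to be a YYYYMM string like the directory names in the sample data. It uses -u so the result does not depend on the local time zone, and %y for the two-digit calendar year.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not from the original scripts):
# print the first and last day of a month given as YYYYMM.
month_range() {
    local ym=$1
    local first="${ym:0:4}-${ym:4:2}-01"
    local start end
    # first day of the month
    start=$(LC_ALL=C date -u --date="$first" +%-d-%b-%y)
    # last day of the month: first day + 1 month - 1 day
    end=$(LC_ALL=C date -u --date="$first +1 month -1 day" +%-d-%b-%y)
    printf '%s %s\n' "$start" "$end"
}

month_range 201802   # prints: 1-Feb-18 28-Feb-18
month_range 202002   # prints: 1-Feb-20 29-Feb-20 (leap year)
```

Adding a month before subtracting a day avoids hard-coding month lengths, so leap years are handled by date itself.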