Pipeline-based processing in pypillometry¶
[1]:
import sys
sys.path.insert(0,"..")
import pypillometry as pp
pypillometry
implements a pipeline-like approach in which each operation executed on a PupilData
object returns a (modified) copy of the object. This enables the “chaining” of commands as follows:
[2]:
d=pp.PupilData.from_file("../data/test.pd")\
.blinks_detect()\
.blinks_merge()\
.lowpass_filter(3)\
.downsample(50)
This command loads a data-file (test.pd
), detects blinks in the signal, merges short, successive blinks together, applies a 3 Hz low-pass filter, and downsamples the signal to 50 Hz. The final result of this processing pipeline is stored in the object d
.
Here, for better readability, we put each operation on a separate line. For that to work, we need to tell Python that the statement continues on the next line, which we achieve by putting a backslash \
at the end of each (non-final) line.
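The copy-on-modify pattern that makes this chaining possible can be sketched with a minimal stand-in class (a toy illustration, not the actual PupilData implementation). It also shows an alternative to backslashes: wrapping the chained expression in parentheses, which lets Python continue the statement across lines implicitly.

```python
import copy

class MiniData:
    """Toy stand-in for the copy-on-modify pattern behind
    PupilData's chainable methods (not the real class)."""
    def __init__(self, signal):
        self.signal = list(signal)
        self.history = []

    def _copy_with(self, op):
        # every operation works on a deep copy and records itself
        new = copy.deepcopy(self)
        new.history.append(op)
        return new

    def scale(self, factor):
        new = self._copy_with(f"scale({factor})")
        new.signal = [x * factor for x in new.signal]
        return new

    def shift(self, offset):
        new = self._copy_with(f"shift({offset})")
        new.signal = [x + offset for x in new.signal]
        return new

# parentheses allow line breaks without trailing backslashes
d = (MiniData([1, 2, 3])
     .scale(2)
     .shift(1))
print(d.signal)   # [3, 5, 7]
print(d.history)  # ['scale(2)', 'shift(1)']
```

Because each call returns a fresh copy, the intermediate objects remain untouched and the applied operations accumulate in the result's history.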
We can get a useful summary of the dataset and the operations applied to it by simply printing it:
[3]:
print(d)
PupilData(test_ro_ka_si_hu_re_vu_vi_be, 331.3KiB):
n : 6001
nmiss : 117.2
perc_miss : 1.9530078320279955
nevents : 56
nblinks : 24
ninterpolated : 0.0
blinks_per_min : 11.998000333277787
fs : 50
duration_minutes : 2.0003333333333333
start_min : 4.00015
end_min : 6.0
baseline_estimated: False
response_estimated: False
History:
*
└ reset_time()
└ blinks_detect()
└ sub_slice(4,6,units=min)
└ drop_original()
└ blinks_detect()
└ blinks_merge()
└ lowpass_filter(3)
└ downsample(50)
We see that the sampling rate, the number of datapoints, and more are automatically printed along with the history of all operations applied to the dataset. This information can also be retrieved separately, in a form useful for further processing, using the function summary()
, which returns the information in the form of a dict
:
[4]:
d.summary()
[4]:
{'name': 'test_ro_ka_si_hu_re_vu_vi_be',
'n': 6001,
'nmiss': 117.2,
'perc_miss': 1.9530078320279955,
'nevents': 56,
'nblinks': 24,
'ninterpolated': 0.0,
'blinks_per_min': 11.998000333277787,
'fs': 50,
'duration_minutes': 2.0003333333333333,
'start_min': 4.00015,
'end_min': 6.0,
'baseline_estimated': False,
'response_estimated': False}
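Since summary() returns a plain dict, summaries from several datasets can easily be combined into a table. Here is a small sketch using stand-in dicts (the keys are a subset of what summary() actually returns):

```python
# Stand-ins for summary() dicts from two datasets
summaries = [
    {"name": "subj01", "fs": 50, "nblinks": 24},
    {"name": "subj02", "fs": 50, "nblinks": 31},
]

# Build a simple tab-separated table from the shared keys
header = list(summaries[0].keys())
rows = [[str(s[k]) for k in header] for s in summaries]
table = "\n".join(["\t".join(header)] + ["\t".join(r) for r in rows])
print(table)
```

In practice, such a list of dicts can also be handed directly to a tabular library (e.g. `pandas.DataFrame(summaries)`) to get one row per dataset.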
The history is internally stored in PupilData
’s history
member and can, for convenience, be applied to another object. That way, a pipeline can be developed on a single dataset and later transferred to a whole folder of other (similar) datasets.
As an example, we create several “fake” datasets representing data from several subjects (each with 10 trials):
[5]:
nsubj=10 # number of subjects
data={k:pp.create_fake_pupildata(ntrials=10, fs=500) for k in range(1,nsubj+1)}
The dict
data
now contains ten PupilData
datasets. We will now use the data from the first subject to create a pipeline of processing operations:
[6]:
template=data[1].lowpass_filter(5).downsample(100)
template.print_history()
* fake_bomitime_ni_fu
└ lowpass_filter(5)
└ downsample(100)
We have stored the result of these operations in a new dataset template
, which contains a record of these operations. We can now easily apply identical operations to all the datasets using the apply_history()
function:
[7]:
preproc_data={k:template.apply_history(d) for k,d in data.items()}
preproc_data[5].print_history()
* fake_kowelale_wu_ni
└ lowpass_filter(5)
└ downsample(100)
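Conceptually, this record-and-replay mechanism can be sketched as follows. This is a simplified illustration of the idea, not pypillometry's actual implementation: each operation is recorded as a (method name, arguments) pair, and replaying means calling the same methods with the same arguments on another object.

```python
class Recordable:
    """Toy object that records its operations and can replay
    them on another object (sketch, not pypillometry's code)."""
    def __init__(self, values):
        self.values = list(values)
        self.history = []          # list of (method_name, args) tuples

    def scale(self, factor):
        new = Recordable([v * factor for v in self.values])
        new.history = self.history + [("scale", (factor,))]
        return new

    def apply_history(self, other):
        """Replay this object's recorded operations on another object."""
        out = other
        for name, args in self.history:
            out = getattr(out, name)(*args)
        return out

# develop a "pipeline" on one object ...
template = Recordable([1, 2]).scale(3)
# ... and replay it on another
replayed = template.apply_history(Recordable([10, 20]))
print(replayed.values)   # [30, 60]
print(replayed.history)  # [('scale', (3,))]
```

The replayed object ends up with the same recorded history as the template, which matches the identical `print_history()` output seen above for subjects 1 and 5.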