This file was created from the following Jupyter-notebook: docs/modeling.ipynb
Interactive version: Binder badge

Modeling the pupillometric signal

WARNING: This functionality is experimental. Some of the provided algorithms are unpublished (or in the process of being published) and may not work well.

The idea behind the algorithms is detailed in this notebook (slides from a symposium talk).

This notebook also includes results from a simulation study showing that the algorithm outperforms more traditional approaches.

[1]:
import sys
sys.path.insert(0,"..") ## not necessary if pypillometry is installed on your system
import pypillometry as pp
import pylab as plt
import numpy as np
import scipy

It can sometimes be useful to think of the pupillometric signal as being composed of different components. One common assumption, based on the finding that the pupil reflects activity in the norepinephrinergic system, is that the signal consists of slow, tonic (baseline) fluctuations and faster, phasic (response) fluctuations.

pypillometry comes with functions to create artificial data. These functions are built on such a model where stimulus- or event-induced responses are superimposed on a slow, baseline-like component:

[2]:
faked=pp.create_fake_pupildata(ntrials=20)
plt.figure(figsize=(15,5))
faked.plot()
../_images/docs_modeling_3_0.png

The orange line is the tonic, baseline component. At the time of each event (grey lines), a scaled version of a “response kernel” (Hoeks & Levelt, 1993) is added. Finally, random noise is added on top of the modeled data. The challenge is to recover both the size of the response and the baseline value at each stimulus using only the raw pupillometric signal (the blue line).
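The response kernel of Hoeks & Levelt (1993) has the form h(t) = t^n · exp(−n·t/t_max) (t in milliseconds), which peaks at t = t_max. The following self-contained numpy sketch illustrates its shape; the default parameter values (n = 10.1, t_max = 930 ms) are the group-level estimates reported by Hoeks & Levelt and are used here for illustration only:

```python
import numpy as np

def pupil_kernel(t, npar=10.1, tmax=930.0):
    """Hoeks & Levelt (1993) kernel h(t) = t**npar * exp(-npar*t/tmax),
    scaled to a maximum of 1; t and tmax are in milliseconds."""
    h = t**npar * np.exp(-npar * t / tmax)
    return h / h.max()

t = np.arange(0.0, 4000.0)       # 4 s at 1-ms resolution
h = pupil_kernel(t)
print(t[h.argmax()])             # the kernel peaks at t = tmax (930 ms)
```

Because the peak latency is on the order of a second and the decay is slow, a single response spans several seconds of signal.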

Traditional approaches for disentangling tonic and phasic components

One common way to solve this problem and analyse pupillometric data at the trial-by-trial level is to extract the average pupillometric signal just before a stimulus (as a measure of the baseline) and just after the stimulus (as a measure of the pupil’s response). In pypillometry, this functionality is implemented by PupilData.stat_per_event(), which extracts a summary of the signal relative to the events in the dataset.

For example, the following code extracts

- the average signal in the time window from 200 ms before each event until the event itself (0) as the baseline, and
- the average signal in the time window from 800 ms until 1200 ms after each event as a measure of the response.

Often, the baseline is subtracted from the response:

[3]:
baseline=faked.stat_per_event( (-200, 0   ) )
response=faked.stat_per_event( ( 800, 1200) )-baseline
baseline,response
[3]:
(array([5497.22338969, 5505.84230613, 5504.95766177, 5502.83389561,
        5519.26795734, 5500.93556967, 5521.47146776, 5501.94256506,
        5508.15286254, 5502.7746573 , 5507.29647592, 5516.96014844,
        5510.85424104, 5515.54653477, 5518.08930806, 5504.37508904,
        5526.7414329 , 5507.19514677, 5507.87170839, 5509.50337549,
        5502.12178611, 5505.54231241, 5502.98219096, 5500.63723808,
        5500.5047791 , 5501.95173174, 5501.98036015, 5499.75653658,
        5505.69011558, 5507.55745466, 5516.96108655, 5510.85644785,
        5511.06295341, 5506.1358251 , 5504.84753547, 5506.00476406,
        5519.09558608, 5504.34079763, 5510.49458483, 5502.97984779]),
 array([  7.87399393,  -2.33877461,  -3.16018652,  13.57307738,
        -18.91961534,  18.66346135, -20.29346888,   4.90085667,
         -5.43644688,   3.31891129,   8.1205739 ,  -6.9479222 ,
          3.82577502,   1.20349654, -14.09887951,  21.19385231,
        -20.06858291,   0.25081461,   0.72956157,  -4.55229805,
          3.19235136,  -2.93086936,  14.50920921,  19.22592926,
         16.11029615,   5.47806116,   5.20489091,   6.21127191,
          5.57229406,   8.38191287,  -7.08207666,   3.79996363,
         -1.66852599,  15.08431094,  20.87224058,   6.45305353,
        -15.01720952,   6.72919367,  -0.8283899 ,   2.7633361 ]))

Note that PupilData.stat_per_event() supports selecting specific events, accepts any summary function (default is numpy.mean), and provides functionality for handling missing data in the selected time windows, e.g.:

[4]:
faked.stat_per_event( (-200,0), event_select="event", statfct=np.median, return_missing="nmiss")
[4]:
(array([5497.23314426, 5505.74146081, 5504.96472822, 5502.80653328,
        5519.34855095, 5500.9466973 , 5521.61377533, 5501.9995312 ,
        5508.18492429, 5502.80102799, 5507.32622705, 5516.99380643,
        5510.86912668, 5515.56038628, 5518.10239105, 5504.39980492,
        5526.80368604, 5507.20711887, 5507.81666783, 5509.55677362,
        5502.05091102, 5505.53396316, 5502.99002577, 5500.60428626,
        5500.49712934, 5501.99230174, 5502.01976366, 5499.76123856,
        5505.68953437, 5507.56270336, 5517.00246874, 5510.86912668,
        5511.03050428, 5506.11351642, 5504.79115636, 5505.76330355,
        5519.40998699, 5504.28362604, 5510.45780027, 5503.00292442]),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0.]))

Advanced methods for tonic/phasic component estimation

The method detailed above is appealing for its simplicity but has severe limitations. Most importantly, multiple overlapping pupillary responses can “look like” baseline fluctuations when added together, thereby artificially inflating baseline estimates, particularly when events are spaced closely in time (“fast paradigms”). For that reason, we developed specialized algorithms to disentangle the tonic and phasic components of the pupillometric signal.
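This inflation is easy to demonstrate: with one event per second and a response kernel that is still near its peak 800–1000 ms after onset, the “pre-event” window mainly picks up the previous trial’s response. A self-contained numpy illustration (kernel parameters and event spacing are invented for this example, not taken from the library):

```python
import numpy as np

def kernel(tau, npar=10.1, tmax=930.0):
    """Hoeks & Levelt-style response kernel, scaled to peak 1 at tau = tmax."""
    return tau**npar * np.exp(-npar * tau / tmax) / (tmax**npar * np.exp(-npar))

t = np.arange(0.0, 20000.0, 10.0)            # 20 s at 10-ms resolution
onsets = np.arange(2000.0, 18000.0, 1000.0)  # one event per second ("fast paradigm")
signal = np.zeros_like(t)                    # the true baseline is flat at 0
for on in onsets:
    idx = t >= on
    signal[idx] += kernel(t[idx] - on)       # unit-amplitude response per event

# "traditional" baseline estimate: mean of the 200 ms before each event
pre = np.array([signal[(t >= on - 200) & (t < on)].mean() for on in onsets])
print(pre.round(2))  # first event: 0 (correct); later events: close to 1 (inflated)
```

Even though the true baseline is exactly zero throughout, the pre-event averages for all but the first event are dominated by the tail of the preceding response.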

The algorithm iteratively removes an estimate of the responses from the signal in order to refine the estimate of the underlying baseline. Details about how the algorithm works and which parameters it supports are available in this notebook and will be covered in a forthcoming publication.
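The actual algorithm (see pypillometry.baseline.baseline_envelope_iter_bspline()) fits B-splines to the signal’s envelope; the following toy numpy sketch only illustrates the general subtract-and-re-estimate idea, with a simple moving average standing in for the spline fit and all parameter values invented:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(1)
t = np.arange(0, 30000, 10.0)                       # 30 s at 10-ms resolution
tonic = 5000 + 20 * np.sin(2 * np.pi * t / 15000)   # slow "true" baseline
npar, tmax = 10.1, 930.0
kern = t**npar * np.exp(-npar * t / tmax)           # Hoeks & Levelt-style kernel
kern /= kern.max()
phasic = np.zeros_like(t)
for onset in np.arange(1000, 28000, 1500):          # densely spaced events
    idx = t >= onset
    phasic[idx] += rng.uniform(5, 15) * kern[:idx.sum()]
signal = tonic + phasic

win = 300                                           # 3-s moving-average window
naive = uniform_filter1d(signal, size=win)          # plain smoothing: biased upward
baseline = naive
for _ in range(5):                                  # iterate: subtract responses, refit
    responses = np.clip(signal - baseline, 0, None)
    baseline = uniform_filter1d(signal - responses, size=win)

err_naive = np.abs(naive - tonic).mean()
err_iter = np.abs(baseline - tonic).mean()
print(err_naive, err_iter)                          # iterative estimate is closer
```

Plain smoothing absorbs part of the summed responses into the baseline; repeatedly subtracting the current response estimate before re-smoothing pulls the baseline back toward the true tonic component.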

In practice, the functionality is implemented in PupilData.estimate_baseline() and PupilData.estimate_response(). The response estimation depends on the estimated baseline; hence, estimate_baseline() should always be called first. To speed up processing, we low-pass filter the data and downsample it to 50 Hz before running the baseline- and response-estimation functions:

[5]:
d=faked.lowpass_filter(cutoff=5)\
    .downsample(fsd=50)\
    .estimate_baseline()\
    .estimate_response()
WARNING:pystan:Automatic Differentiation Variational Inference (ADVI) is an EXPERIMENTAL ALGORITHM.
WARNING:pystan:ADVI samples may be found on the filesystem in the file `/var/folders/28/_ftmv1_n41n48znrymwflm940000gp/T/tmp0twejg2w/output.csv`
Using cached StanModel
WARNING:pystan:Automatic Differentiation Variational Inference (ADVI) is an EXPERIMENTAL ALGORITHM.
WARNING:pystan:ADVI samples may be found on the filesystem in the file `/var/folders/28/_ftmv1_n41n48znrymwflm940000gp/T/tmpu2rv_yv0/output.csv`
MSG: optimizing both npar and tmax, might take a while...
......................................................................................................................................................................................................................................................

Here, we allowed the two shape parameters npar and tmax to vary freely together with the amplitudes of the responses. This allows an individualized shape of the pupillary response for each subject (which appears reasonable given the results in Hoeks & Levelt, 1993), but it may take a long time to optimize and can result in pathological solutions. In that case, one or both parameters can be fixed, for example to reasonable group-level values.
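Once the kernel shape (npar, tmax) is fixed, estimating the per-event amplitudes (the coef values above) reduces to a constrained least-squares problem: the signal is modeled as a non-negative combination of kernels shifted to the event onsets. A toy numpy/scipy sketch of this idea (illustrative only, not the library’s implementation; all values are invented):

```python
import numpy as np
from scipy.optimize import nnls

npar, tmax = 10.1, 930.0                     # fixed example shape parameters (ms)
t = np.arange(0.0, 10000.0, 10.0)            # 10 s at 100 Hz
onsets = np.array([1000.0, 2500.0, 3200.0, 6000.0])
true_amp = np.array([2.0, 0.5, 1.5, 1.0])

def kern(tau):
    """Hoeks & Levelt-style kernel, normalized to peak 1 at tau = tmax."""
    tau = np.clip(tau, 0.0, None)
    return tau**npar * np.exp(-npar * tau / tmax) / (tmax**npar * np.exp(-npar))

X = np.column_stack([kern(t - on) for on in onsets])  # one column per event
y = X @ true_amp                                      # noise-free synthetic signal
amp, _ = nnls(X, y)                                   # non-negative least squares
print(amp.round(3))                                   # recovers true_amp
```

The non-negativity constraint also explains why some of the fitted coef values above are exactly 0: events whose response does not improve the fit are assigned zero amplitude.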

After running these methods, the baseline is stored in the PupilData.baseline variable:

[6]:
d.baseline
[6]:
array([5498.57714444, 5498.6606226 , 5498.7422532 , ..., 5491.61364603,
       5491.29801098, 5490.99052775])

and the estimated response in PupilData.response_pars:

[7]:
d.response_pars
[7]:
{'npar': 11.872666808731674,
 'npar_free': True,
 'tmax': 899.6180651831687,
 'tmax_free': True,
 'coef': array([ 7.69816302,  4.28928702,  3.8821806 ,  9.52004979,  3.60587198,
        15.93467607,  4.31018922,  2.38161115,  3.5997683 ,  5.8555845 ,
        12.65606094,  7.67183007,  8.4533453 , 10.55249302,  1.67096352,
        23.38023839,  5.53552921,  4.1318766 ,  4.97613487,  6.11872951,
         3.74244261,  2.41996112, 13.48671774,  4.22908343,  5.0185794 ,
         0.14989147,  9.72097211,  3.30870393,  0.        ,  3.11073058,
         0.        ,  0.        ,  0.17421374,  6.88414573,  0.        ,
         8.97973124,  2.15049561,  4.66517684,  8.81778523,  9.65322884]),
 'bounds': {'npar': (1, 20), 'tmax': (100, 2000)}}

The resulting baseline-estimation and the estimated full model (baseline+response) can be plotted:

[8]:
plt.figure(figsize=(15,5))
d.plot()
../_images/docs_modeling_17_0.png

The orange curve (estimated baseline) is still more wiggly than the real baseline (green) but gives a considerably better estimate than the traditional method. The overall fit of the model (red) is excellent (which is to be expected, as this is simulated data). The remaining misfit comes from randomly interspersed “spurious” events in the randomly generated data.

We can quantify how well the novel baseline estimation works relative to the traditional method by comparing both to the “ground truth”, which is available for artificial data.

We calculate the ground-truth and the traditional and novel estimates for each event-onset (=trial):

[9]:
real_baseline=pp.stat_event_interval(d.tx, d.sim_baseline, d.event_onsets, [0,0])
real_response=d.sim_response_coef

traditional_baseline=d.stat_per_event( (-200,0) )
traditional_response=d.stat_per_event( ( 800,1200) )-traditional_baseline

novel_baseline=pp.stat_event_interval(d.tx, d.baseline, d.event_onsets, [0,0])
novel_response=d.response_pars["coef"]

And compare them by means of the correlation between the estimated and real values:

[10]:
print("Traditional method:")
print("Baseline: Corr(trad,real)=",scipy.stats.pearsonr(traditional_baseline, real_baseline)[0])
print("Response: Corr(trad,real)=",scipy.stats.pearsonr(traditional_response, real_response)[0])
print("\n")
print("Novel method:")
print("Baseline: Corr(nov, real)=",scipy.stats.pearsonr(novel_baseline, real_baseline)[0])
print("Response: Corr(nov, real)=",scipy.stats.pearsonr(novel_response, real_response)[0])
Traditional method:
Baseline: Corr(trad,real)= 0.37434828350870303
Response: Corr(trad,real)= 0.38109869050417233


Novel method:
Baseline: Corr(nov, real)= 0.7623237267947978
Response: Corr(nov, real)= 0.6876359509086555

We see that the correlations are much higher for the novel method than for the traditional method. More sophisticated simulation studies are reported in this notebook.

The parameters for the baseline-estimation function are described in the API-documentation for pypillometry.baseline.baseline_envelope_iter_bspline() and, in more detail, in this notebook.

The parameters for the response-estimation function are described in the API-documentation for pypillometry.pupil.pupil_response().
