This file was created from the following Jupyter-notebook: importdata.ipynb
Interactive version: Binder badge

Importing Data Example

In order to import data into pypillometry, we have to load the data from the source using other packages and then wrap it into PupilData objects.

Here we will show an example where we translate a file recorded in Eyelink's EDF-format to a file readable by pandas.read_table().

First, we import the needed modules.

[1]:
import sys, os
sys.path.insert(0,"..") # this is not needed if you have installed pypillometry
import pypillometry as pp
import pandas as pd
import numpy as np
import pylab as plt

In this example, we use data recorded with an Eyelink eyetracker. These eyetrackers store their recordings in binary files with the extension .edf. Some information about this file-format is here. We use a command-line utility released by Eyelink to convert this proprietary format into a more easily read .asc file, a whitespace-separated plain-text format. The converter, edf2asc, can be downloaded for different platforms from the Eyelink support forum. There is a GUI-based program for Windows and command-line programs for Linux and Mac. Binaries of the command-line tools for Linux and Mac are included in pypillometry under this link.

On Linux, we call the converter twice on an example edf-file: once with -s to extract the samples and once with -e to extract the events.

[2]:
!../external/edf2asc-linux -y -s ../data/test.edf ../data/test_samples.asc
!../external/edf2asc-linux -y -e ../data/test.edf ../data/test_events.asc

EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.0 Linux Dec  1 2008
(c)1995-2007 by SR Research, last modified Dec  1 2008

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 0 events, 1245363 samples, 6 blocks.

EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.0 Linux Dec  1 2008
(c)1995-2007 by SR Research, last modified Dec  1 2008

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 37139 events, 0 samples, 6 blocks.

This results in two files, one containing all the samples and one all the recorded events.
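
The two conversion calls can also be scripted from Python, which is convenient when many recordings have to be converted. The following is a minimal sketch, assuming the Linux binary sits at ../external/edf2asc-linux as above. The helper names edf2asc_commands and convert_edf are made up for this illustration, and only the flags already used above (-y, -s, -e) are passed.

```python
import os
import subprocess

def edf2asc_commands(edffile, binary="../external/edf2asc-linux"):
    """Build the two edf2asc command lines (samples, events) for one EDF file."""
    base = os.path.splitext(edffile)[0]  # strip the .edf extension
    return [
        [binary, "-y", "-s", edffile, base + "_samples.asc"],
        [binary, "-y", "-e", edffile, base + "_events.asc"],
    ]

def convert_edf(edffile, binary="../external/edf2asc-linux"):
    """Run both conversions and return the two exit codes."""
    # the exit codes are only returned here, not interpreted
    return [subprocess.run(cmd).returncode
            for cmd in edf2asc_commands(edffile, binary)]

# show the commands without running them
for cmd in edf2asc_commands("../data/test.edf"):
    print(" ".join(cmd))
```

Calling convert_edf("../data/test.edf") would then run both commands in sequence.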

[3]:
fname_samples="../data/test_samples.asc"
fname_events="../data/test_events.asc"

The samples-file contains a large table with the timestamp, the x/y-coordinates of the eye position and the pupil-area for both the left and the right eye. Here are the first few rows of this file:

[4]:
!head $fname_samples
3385900   817.3   345.2  1707.0   860.6   375.2  1738.0 .....
3385902   817.0   343.5  1706.0   860.7   375.9  1739.0 .....
3385904   816.7   341.6  1705.0   861.2   376.6  1739.0 .....
3385906   816.7   340.4  1706.0   861.7   376.8  1740.0 .....
3385908   816.7   340.2  1707.0   861.6   376.9  1742.0 .....
3385910   816.8   340.2  1708.0   861.1   377.1  1743.0 .....
3385912   816.9   340.9  1708.0   860.7   377.5  1744.0 .....
3385914   816.1   342.1  1710.0   861.1   378.7  1745.0 .....
3385916   815.2   343.2  1712.0   862.5   380.0  1746.0 .....
3385918   814.4   343.6  1713.0   863.9   380.7  1747.0 .....

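We can easily read this file using pandas.read_table(). Missing gaze coordinates are stored as "   ." in the file, so we replace those entries by NaN and convert the columns to float.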

[5]:
df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])
# missing gaze coordinates are stored as "   ."; replace them by NaN
for col in ["left_x", "left_y", "right_x", "right_y"]:
    values = df[col].values
    values[values == "   ."] = np.nan
    df[col] = values.astype(float)


df
/tmp/ipykernel_1346565/154614805.py:1: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
  df=pd.read_table(fname_samples, index_col=False,
[5]:
time left_x left_y left_p right_x right_y right_p
0 3385900 817.3 345.2 1707.0 860.6 375.2 1738.0
1 3385902 817.0 343.5 1706.0 860.7 375.9 1739.0
2 3385904 816.7 341.6 1705.0 861.2 376.6 1739.0
3 3385906 816.7 340.4 1706.0 861.7 376.8 1740.0
4 3385908 816.7 340.2 1707.0 861.6 376.9 1742.0
... ... ... ... ... ... ... ...
1245358 5923060 NaN NaN 0.0 NaN NaN 0.0
1245359 5923062 NaN NaN 0.0 NaN NaN 0.0
1245360 5923064 NaN NaN 0.0 NaN NaN 0.0
1245361 5923066 NaN NaN 0.0 NaN NaN 0.0
1245362 5923068 NaN NaN 0.0 NaN NaN 0.0

1245363 rows × 7 columns
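
To make the handling of missing values explicit, here is a dependency-free sketch of what happens to a single row. It is not how pandas parses the file; parse_sample_line is a hypothetical helper, and the second example row is constructed to resemble the NaN rows at the end of the table above.

```python
import math

def parse_sample_line(line):
    """Split one samples-file row into a timestamp and six floats."""
    fields = line.split()
    time = int(fields[0])
    # a lone "." marks a missing value and becomes NaN
    values = [float("nan") if f == "." else float(f) for f in fields[1:7]]
    return time, values

good = "3385900   817.3   345.2  1707.0   860.6   375.2  1738.0 ....."
# hypothetical row without gaze position, modelled on the NaN rows above
missing = "5923060   .   .   0.0   .   .   0.0 ....."

print(parse_sample_line(good))
print(parse_sample_line(missing))
```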

We can already use this information to create our PupilData-object. We simply pass in the pupil-area of both eyes (columns left_p and right_p) and the timestamp-array from the samples-file (Note: we could just as easily have used only one eye by passing just left_pupil or right_pupil):

[6]:
pp.PupilData(right_pupil=df.right_p, left_pupil=df.left_p, time=df.time, name="test")
pp: 17:57:11 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:11 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds
[6]:
PupilData(test, 19.1MiB):
 n               : 1268585
 sampling_rate   : 500.0
 eyes            : ['left', 'right']
 data            : ['left_pupil', 'right_pupil']
 nevents         : 0
 nblinks         : {}
 blinks          : {'left': None, 'right': None}
 duration_minutes: 42.28616666666667
 start_min       : 56.431666666666665
 end_min         : 98.7178
 params          : {}
 glimpse         : EyeDataDict(vars=2,n=310151,shape=(310151,)):
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 History:
 *
 └ fill_time_discontinuities()
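
The log output reports that 5 gaps in the recording were filled in, which is why n (1268585) is larger than the 1245363 rows read from the file. The sketch below illustrates how such gaps can be located from the timestamps alone. It is not pypillometry's implementation; find_gaps is a hypothetical helper, and the toy timestamps assume milliseconds at 500 Hz (one sample every 2 ms).

```python
def find_gaps(times, step):
    """Return (start, end, n_missing) for every jump larger than `step`."""
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > step:
            # number of samples that would fit between prev and cur
            n_missing = (cur - prev) // step - 1
            gaps.append((prev, cur, n_missing))
    return gaps

# toy timestamps in ms with one 10 ms jump between 104 and 114
times = [100, 102, 104, 114, 116]
print(find_gaps(times, step=2))  # [(104, 114, 4)]
```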

We can also import the eye-tracking data from the same file if desired. In that case, we would use the EyeData class:

[7]:
pp.EyeData(right_pupil=df.right_p, left_pupil=df.left_p,
             right_x=df.right_x, right_y=df.right_y,
             left_x=df.left_x, left_y=df.left_y,
             time=df.time, name="test")
pp: 17:57:15 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:15 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds
[7]:
EyeData(test, 38.1MiB):
 n                   : 1268585
 sampling_rate       : 500.0
 data                : ['left_x', 'left_y', 'left_pupil', 'right_x', 'right_y', 'right_pupil']
 nevents             : 0
 screen_limits       : not set
 physical_screen_size: not set
 screen_eye_distance : not set
 duration_minutes    : 42.28616666666667
 start_min           : 56.431666666666665
 end_min             : 98.7178
 parameters          : {}
 glimpse             : EyeDataDict(vars=6,n=310151,shape=(310151,)):
  left_x (float64): 817.3, 817.0, 816.7, 816.7, 816.7...
  left_y (float64): 345.2, 343.5, 341.6, 340.4, 340.2...
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_x (float64): 860.6, 860.7, 861.2, 861.7, 861.6...
  right_y (float64): 375.2, 375.9, 376.6, 376.8, 376.9...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 eyes                : ['left', 'right']
 nblinks             : {}
 blinks              : {'left': None, 'right': None}
 params              : {}
 History:
 *
 └ fill_time_discontinuities()

Of course, this dataset is still missing the important information contained in the event-file which we will use for analysing trial-related pupil-diameter data. For that, we will have to read the events-file, which has a more complicated structure than the samples-file:

[8]:
!head -20 $fname_events
** CONVERTED FROM ../data/test.edf using edfapi 3.0 Linux Dec  1 2008 on Mon May 12 17:57:07 2025
** DATE: Fri Feb 14 08:48:33 2020
** TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED
** VERSION: EYELINK II 1
** SOURCE: EYELINK CL
** EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)
** CAMERA: EyeLink USBCAM Version 1.01
** SERIAL NUMBER: CLU-DAC49
** CAMERA_CONFIG: DAC49200.SCD
** Psychopy GC demo
**

MSG     2728855 DISPLAY_COORDS = 0 0 1919 1079
INPUT   2767568 0
MSG     2784000 !CAL
>>>>>>> CALIBRATION (HV9,P-CR) FOR LEFT: <<<<<<<<<
MSG     2784000 !CAL Calibration points:
MSG     2784000 !CAL -29.4, -23.5        -0,     -2
MSG     2784000 !CAL -29.3, -35.7        -0,  -1544
MSG     2784000 !CAL -32.9, -10.4        -0,   1559

After a header (lines starting with ‘**’) containing meta-information, we get a sequence of “events” whose format differs from row to row. We are interested in lines starting with “MSG” because those contain our experimental markers. Therefore, we first read the events-file and remove all rows that do not start with “MSG”:

[9]:
# read the whole file into variable `events` (list with one entry per line)
with open(fname_events) as f:
    events=f.readlines()

# keep only lines starting with "MSG"
events=[ev for ev in events if ev.startswith("MSG")]
events[0:10]
[9]:
['MSG\t2728855 DISPLAY_COORDS = 0 0 1919 1079\n',
 'MSG\t2784000 !CAL \n',
 'MSG\t2784000 !CAL Calibration points:  \n',
 'MSG\t2784000 !CAL -29.4, -23.5        -0,     -2   \n',
 'MSG\t2784000 !CAL -29.3, -35.7        -0,  -1544   \n',
 'MSG\t2784000 !CAL -32.9, -10.4        -0,   1559   \n',
 'MSG\t2784000 !CAL -49.7, -23.0     -2835,     -2   \n',
 'MSG\t2784000 !CAL -10.8, -27.4      2835,     -2   \n',
 'MSG\t2784000 !CAL -48.3, -33.3     -2818,  -1544   \n',
 'MSG\t2784000 !CAL -11.0, -34.2      2818,  -1544   \n']
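
Each of these lines consists of the keyword MSG, a timestamp and free-form text. As a small sketch, a single line can be taken apart like this (parse_msg is a hypothetical helper, not part of pypillometry):

```python
def parse_msg(line):
    """Split an 'MSG' line into (timestamp, message text)."""
    # maxsplit=2 keeps the message intact even if it contains spaces
    _, timestamp, *rest = line.split(maxsplit=2)
    message = rest[0].strip() if rest else ""
    return int(timestamp), message

print(parse_msg("MSG\t2728855 DISPLAY_COORDS = 0 0 1919 1079\n"))
print(parse_msg("MSG\t2784000 !CAL \n"))
```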

In our experiment, we sent a marker called experiment_start when the experiment started. Hence, we can remove all events up to and including this marker.

[10]:
experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
events=events[experiment_start_index+1:]
events[0:10]
[10]:
['MSG\t3387245 C_GW_1_1_UD_UD\n',
 'MSG\t3390421 F_GW_1_1_10_0\n',
 'MSG\t3392759 C_NW_1_2_UD_UD\n',
 'MSG\t3394293 R_NW_1_2_UD_UD\n',
 'MSG\t3395952 F_NW_1_2_-1_0\n',
 'MSG\t3397974 C_NA_1_3_UD_UD\n',
 'MSG\t3399892 R_NA_1_3_UD_UD\n',
 'MSG\t3400999 F_NA_1_3_-11_0\n',
 'MSG\t3403206 C_GA_1_4_UD_UD\n',
 'MSG\t3404640 R_GA_1_4_UD_UD\n']
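
The lookup with np.where(...)[0][0] raises an IndexError if a file does not contain the marker. A slightly more defensive variant could look like the following sketch; drop_before_marker is a hypothetical helper and the toy event list is made up, only the marker name is taken from the recording.

```python
def drop_before_marker(events, marker="experiment_start"):
    """Return the events that follow the first line containing `marker`."""
    for i, ev in enumerate(events):
        if marker in ev:
            return events[i + 1:]
    raise ValueError("marker %r not found in events" % marker)

# toy event list for illustration
toy = [
    "MSG\t10 !CAL \n",
    "MSG\t20 experiment_start\n",
    "MSG\t30 C_GW_1_1_UD_UD\n",
]
print(drop_before_marker(toy))
```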

This is in a format where we can convert it into a pandas.DataFrame object for further processing.

[11]:
df_ev=pd.DataFrame([ev.split() for ev in events])
df_ev
[11]:
0 1 2 3 4 5 6 7 8
0 MSG 3387245 C_GW_1_1_UD_UD None None None None None None
1 MSG 3390421 F_GW_1_1_10_0 None None None None None None
2 MSG 3392759 C_NW_1_2_UD_UD None None None None None None
3 MSG 3394293 R_NW_1_2_UD_UD None None None None None None
4 MSG 3395952 F_NW_1_2_-1_0 None None None None None None
... ... ... ... ... ... ... ... ... ...
1065 MSG 5893078 V_UD_UD_16_UD_UD None None None None None None
1066 MSG 5899076 V_UD_UD_17_UD_UD None None None None None None
1067 MSG 5905073 V_UD_UD_18_UD_UD None None None None None None
1068 MSG 5911072 V_UD_UD_19_UD_UD None None None None None None
1069 MSG 5917071 V_UD_UD_20_UD_UD None None None None None None

1070 rows × 9 columns

In this table, the second column contains the time-stamp (identical to the time-stamp in the samples file), and the third column contains our custom markers (the format like “C_GW_1_1_UD_UD” and so on is specific for our experimental design). There are many more columns which seem to contain no information in our samples. Let’s check what those columns are for by printing the rows in our data-frame where these columns are not None:

[12]:
df_ev[np.array(df_ev[4])!=None].head()
[12]:
0 1 2 3 4 5 6 7 8
209 MSG 3900393 RECCFG CR 500 2 1 LR None
211 MSG 3900393 GAZE_COORDS 0.00 0.00 1919.00 1079.00 None None
212 MSG 3900393 THRESHOLDS L 56 231 R 66 239
213 MSG 3900393 ELCL_WINDOW_SIZES 176 188 0 0 None None
215 MSG 3900393 ELCL_PROC CENTROID (3) None None None None

Apparently, there are more eye-tracker specific markers in our files (in this case due to drift-checks during the experiments). We can safely drop those from our set of interesting events by dropping all rows in which the column labelled 4 (the fifth column) is not None and then dropping all non-interesting columns.

[13]:
df_ev=df_ev[np.array(df_ev[4])==None][[1,2]]
df_ev.columns=["time", "event"]
df_ev
[13]:
time event
0 3387245 C_GW_1_1_UD_UD
1 3390421 F_GW_1_1_10_0
2 3392759 C_NW_1_2_UD_UD
3 3394293 R_NW_1_2_UD_UD
4 3395952 F_NW_1_2_-1_0
... ... ...
1065 5893078 V_UD_UD_16_UD_UD
1066 5899076 V_UD_UD_17_UD_UD
1067 5905073 V_UD_UD_18_UD_UD
1068 5911072 V_UD_UD_19_UD_UD
1069 5917071 V_UD_UD_20_UD_UD

1035 rows × 2 columns
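
The same filter can be expressed without a data-frame: column 4 is None exactly when a line has fewer than five whitespace-separated tokens. The following sketch shows this on a few lines taken from the tables above; keep_short_messages is a hypothetical helper.

```python
def keep_short_messages(events, max_tokens=4):
    """Keep (time, label) for MSG lines with at most `max_tokens` tokens."""
    kept = []
    for ev in events:
        tokens = ev.split()
        # lines with five or more tokens are eye-tracker specific messages
        if 3 <= len(tokens) <= max_tokens:
            kept.append((int(tokens[1]), tokens[2]))
    return kept

toy = [
    "MSG\t3387245 C_GW_1_1_UD_UD\n",
    "MSG\t3900393 RECCFG CR 500 2 1 LR\n",
    "MSG\t3900393 THRESHOLDS L 56 231 R 66 239\n",
    "MSG\t3390421 F_GW_1_1_10_0\n",
]
print(keep_short_messages(toy))
```

Note that this sketch converts the timestamps to int, whereas the data-frame above keeps them as strings.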

Finally, we can pass those event-markers into our PupilData, EyeData or GazeData-object.

[14]:
d=pp.PupilData(right_pupil=df.right_p, time=df.time, event_onsets=df_ev.time, event_labels=df_ev.event, name="test")
d
pp: 17:57:21 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:21 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds
[14]:
PupilData(test, 14.4MiB):
 n               : 1268585
 sampling_rate   : 500.0
 eyes            : ['right']
 data            : ['right_pupil']
 nevents         : 1035
 nblinks         : {}
 blinks          : {'right': None}
 duration_minutes: 42.28616666666667
 start_min       : 56.431666666666665
 end_min         : 98.7178
 params          : {}
 glimpse         : EyeDataDict(vars=1,n=310151,shape=(310151,)):
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 History:
 *
 └ fill_time_discontinuities()

The summary of the dataset shows us that the eyetracker started recording at time=56.4 minutes. We can reset the time index to start at 0 by using the reset_time() function. In the same step, we run pupil_blinks_detect() to detect blinks.

[15]:
d=d.reset_time().pupil_blinks_detect()
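
The values of start_min and end_min in the summary follow directly from the raw timestamps. The sketch below reproduces them, assuming the timestamps are in milliseconds (which matches the 2 ms step at 500 Hz); to_minutes is a hypothetical helper and does not reflect how reset_time() is implemented.

```python
def to_minutes(timestamps_ms, reset=False):
    """Convert millisecond timestamps to minutes, optionally starting at 0."""
    offset = timestamps_ms[0] if reset else 0
    return [(t - offset) / 60000.0 for t in timestamps_ms]

# first and last timestamp of the samples-file shown above
first, last = 3385900, 5923068
print(to_minutes([first, last]))              # original time axis
print(to_minutes([first, last], reset=True))  # time axis starting at 0
```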

Now we can store away this dataset in pypillometry-format and use all the pypillometry-functions on it, e.g., plot a minute of this dataset.

[16]:
d.sub_slice(4, 6, units="min")
[16]:
PupilData(test, 1.4MiB):
 n               : 60001
 sampling_rate   : 500.0
 eyes            : ['right']
 data            : ['right_pupil']
 nevents         : 56
 nblinks         : {'right_pupil': 150}
 blinks          : {'right': 150 intervals, 1455.23 +/- 11522.47, [22.00, 141022.00]}
 duration_minutes: 2.0000333333333336
 start_min       : 4.0
 end_min         : 6.0
 params          : {}
 glimpse         : EyeDataDict(vars=1,n=60001,shape=(60001,)):
  right_pupil (float64): 1684.0, 1683.0, 1682.0, 1681.0, 1682.0...

 History:
 *
 └ fill_time_discontinuities()
  └ reset_time()
   └ pupil_blinks_detect()
    └ sub_slice(4,6,units=min)
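
The number of samples in this slice can be checked by hand. The following sketch assumes that both endpoints of the time window are included, which is consistent with n above; n_samples_in_window is a hypothetical helper.

```python
def n_samples_in_window(start_min, end_min, sampling_rate):
    """Samples in an inclusive time window at `sampling_rate` Hz."""
    return int(round((end_min - start_min) * 60 * sampling_rate)) + 1

print(n_samples_in_window(4, 6, 500.0))  # 60001
```
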
[17]:
plt.figure(figsize=(15,5));
d.plot.pupil_plot(plot_range=(4, 5), units="min")
[Figure: pupil plot of the right eye for minutes 4 to 5 of the recording (_images/importdata_32_0.png)]

Generalize to multiple similar datasets

Now that we have successfully found a way to create our PupilData structure from the raw .EDF files, we can wrap the code from this notebook into an easily accessible function that creates PupilData objects for a given .EDF file that has the same structure.

We simply create a function that takes the name of an EDF-file as input and runs all the code above, returning the final object. Because we also want to keep the gaze coordinates, the function returns an EyeData object. For convenience, we will assume that the EDF2ASC utility has already run such that .asc files are already available (see above for details).

[18]:
datapath="../data" ## this is where the datafiles are located

def read_dataset(edffile):
    basename=os.path.splitext(edffile)[0] ## remove .edf from filename
    fname_samples=os.path.join(datapath, basename+"_samples.asc")
    fname_events=os.path.join(datapath, basename+"_events.asc")

    print("> Attempt loading '%s' and '%s'"%(fname_samples, fname_events))
    ## read samples-file
    df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])
    # missing gaze coordinates are stored as "   ."; replace them by NaN
    for col in ["left_x", "left_y", "right_x", "right_y"]:
        values = df[col].values
        values[values == "   ."] = np.nan
        df[col] = values.astype(float)

    ## read events-file
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    # remove events before experiment start
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]

    # re-arrange as described above
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev[np.array(df_ev[4])==None][[1,2]]
    df_ev.columns=["time", "event"]

    # create `EyeData`-object
    d=pp.EyeData(right_pupil=df.right_p, left_pupil=df.left_p, time=df.time,
                   right_x=df.right_x, right_y=df.right_y,
                   left_x=df.left_x, left_y=df.left_y,
                   event_onsets=df_ev.time, event_labels=df_ev.event, name=edffile)
    return d

We can test this code by simply running the function with a certain filename located in datapath:

[19]:
d=read_dataset("test.edf")
> Attempt loading '../data/test_samples.asc' and '../data/test_events.asc'
/tmp/ipykernel_1346565/89281874.py:10: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
  df=pd.read_table(fname_samples, index_col=False,
pp: 17:57:29 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:29 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds
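
To apply the function to a whole folder of recordings, we first need the list of datasets for which both .asc files exist. A minimal sketch, assuming the naming scheme used above (name_samples.asc and name_events.asc); list_datasets is a hypothetical helper.

```python
import glob
import os

def list_datasets(datapath):
    """Return EDF-style names for which both .asc files exist in `datapath`."""
    suffix = "_samples.asc"
    names = []
    for fname in sorted(glob.glob(os.path.join(datapath, "*" + suffix))):
        base = os.path.basename(fname)[:-len(suffix)]
        # only keep datasets that also have an events-file
        if os.path.exists(os.path.join(datapath, base + "_events.asc")):
            names.append(base + ".edf")
    return names

# usage with the function defined above (not executed here):
# datasets = {name: read_dataset(name) for name in list_datasets(datapath)}
```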

After that, we might want to save the final PupilData/EyeData objects as binary files that can be readily loaded back. This can be done using EyeData.write_file or pypillometry.write_pickle():

[20]:
fname=os.path.join(datapath, d.name+".pd")
print(fname)
d.write_file(fname)
../data/test.edf.pd

These datasets can be read back using the EyeData.from_file() method:

[21]:
d2=pp.EyeData.from_file(fname)
d2
[21]:
EyeData(test.edf, 38.1MiB):
 n                   : 1268585
 sampling_rate       : 500.0
 data                : ['left_x', 'left_y', 'left_pupil', 'right_x', 'right_y', 'right_pupil']
 nevents             : 1035
 screen_limits       : not set
 physical_screen_size: not set
 screen_eye_distance : not set
 duration_minutes    : 42.28616666666667
 start_min           : 56.431666666666665
 end_min             : 98.7178
 parameters          : {}
 glimpse             : EyeDataDict(vars=6,n=310151,shape=(310151,)):
  left_x (float64): 817.3, 817.0, 816.7, 816.7, 816.7...
  left_y (float64): 345.2, 343.5, 341.6, 340.4, 340.2...
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_x (float64): 860.6, 860.7, 861.2, 861.7, 861.6...
  right_y (float64): 375.2, 375.9, 376.6, 376.8, 376.9...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 eyes                : ['left', 'right']
 nblinks             : {}
 blinks              : {'left': None, 'right': None}
 params              : {}
 History:
 *
 └ fill_time_discontinuities()