This file was created from the following Jupyter-notebook: importdata.ipynb
Interactive version:

Importing Data Example

In order to import data into pypillometry, we have to load the data from the source using other packages and then wrap it into PupilData objects.

Here we show an example in which we convert a file recorded in EyeLink's EDF format into a file that can be read with pandas.read_table().

First, we import the needed modules.

[1]:

import sys, os
sys.path.insert(0,"..") # this is not needed if you have installed pypillometry
import pypillometry as pp
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In this example, we use data recorded with an EyeLink eyetracker. These eyetrackers store recordings in binary files with the extension .edf. We use a command-line utility released by SR Research (the maker of EyeLink) to convert this proprietary format into a more easily readable .asc file, a whitespace-separated plain-text format. The converter, edf2asc, can be downloaded for different platforms from the EyeLink support forum: there is a GUI-based program for Windows and command-line programs for Linux and Mac. Binaries of the command-line tools for Linux and Mac are included in pypillometry (in the external/ directory used below).

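On Linux, we call the converter twice on an example EDF file: once with -s to extract only the samples and once with -e to extract only the events (the summary line at the end of each output confirms this).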

[2]:

!../external/edf2asc-linux -y -s ../data/test.edf ../data/test_samples.asc
!../external/edf2asc-linux -y -e ../data/test.edf ../data/test_events.asc


EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.0 Linux Dec  1 2008
(c)1995-2007 by SR Research, last modified Dec  1 2008

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 0 events, 1245363 samples, 6 blocks.

EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.0 Linux Dec  1 2008
(c)1995-2007 by SR Research, last modified Dec  1 2008

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 37139 events, 0 samples, 6 blocks.

This results in two files: one containing all samples and one containing all recorded events.
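If you prefer to run the conversion from Python (for example, for many files at once) rather than with the notebook's `!` shell commands, a minimal sketch using the standard `subprocess` module could look like this. It only reproduces the two calls shown above; the converter path and the `_samples.asc`/`_events.asc` naming are taken from this example, and the function names `edf2asc_commands` and `convert_edf` are hypothetical. We make no assumption about what edf2asc's exit codes mean, so the return codes are simply passed back.

```python
import subprocess
from pathlib import Path

# Converter binary as used above (assumption: adjust path/platform as needed).
EDF2ASC = "../external/edf2asc-linux"

def edf2asc_commands(edf_path, converter=EDF2ASC):
    """Build the two calls shown above for one EDF file: -s writes only the
    samples to <name>_samples.asc, -e only the events to <name>_events.asc."""
    edf_path = Path(edf_path)
    stem = edf_path.with_suffix("")  # e.g. ../data/test.edf -> ../data/test
    return [
        [converter, "-y", "-s", str(edf_path), f"{stem}_samples.asc"],
        [converter, "-y", "-e", str(edf_path), f"{stem}_events.asc"],
    ]

def convert_edf(edf_path, converter=EDF2ASC):
    """Run both conversions and return the converter's return codes."""
    return [subprocess.run(cmd).returncode
            for cmd in edf2asc_commands(edf_path, converter)]
```

For a whole directory, something like `for f in Path("../data").glob("*.edf"): convert_edf(f)` would convert every EDF file in place.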

[3]:

fname_samples="../data/test_samples.asc"
fname_events="../data/test_events.asc"

The samples file contains a large table with the timestamp, the x/y coordinates of the eye position, and the pupil area for both the left and the right eye, followed by a column of status flags (shown as “.....”) that we do not use. Here are the first few rows of this file:

[4]:

!head $fname_samples

3385900   817.3   345.2  1707.0   860.6   375.2  1738.0 .....
3385902   817.0   343.5  1706.0   860.7   375.9  1739.0 .....
3385904   816.7   341.6  1705.0   861.2   376.6  1739.0 .....
3385906   816.7   340.4  1706.0   861.7   376.8  1740.0 .....
3385908   816.7   340.2  1707.0   861.6   376.9  1742.0 .....
3385910   816.8   340.2  1708.0   861.1   377.1  1743.0 .....
3385912   816.9   340.9  1708.0   860.7   377.5  1744.0 .....
3385914   816.1   342.1  1710.0   861.1   378.7  1745.0 .....
3385916   815.2   343.2  1712.0   862.5   380.0  1746.0 .....
3385918   814.4   343.6  1713.0   863.9   380.7  1747.0 .....

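We can read this file using pandas.read_table(). Missing gaze coordinates are coded as “.” (padded with spaces), so the affected columns are first read as text and have to be converted to numbers. Because each row has an eighth column (the “.....” at the end) that is not in our list of column names, pandas emits a ParserWarning and drops that column.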

[5]:

df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])

# missing gaze coordinates ("   .") turn these columns into text (object dtype);
# convert them to float, which turns every non-numeric entry such as "." into NaN
for col in ["left_x", "left_y", "right_x", "right_y"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")


df

/tmp/ipykernel_1346565/154614805.py:1: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
  df=pd.read_table(fname_samples, index_col=False,

[5]:

	time	left_x	left_y	left_p	right_x	right_y	right_p
0	3385900	817.3	345.2	1707.0	860.6	375.2	1738.0
1	3385902	817.0	343.5	1706.0	860.7	375.9	1739.0
2	3385904	816.7	341.6	1705.0	861.2	376.6	1739.0
3	3385906	816.7	340.4	1706.0	861.7	376.8	1740.0
4	3385908	816.7	340.2	1707.0	861.6	376.9	1742.0
...	...	...	...	...	...	...	...
1245358	5923060	NaN	NaN	0.0	NaN	NaN	0.0
1245359	5923062	NaN	NaN	0.0	NaN	NaN	0.0
1245360	5923064	NaN	NaN	0.0	NaN	NaN	0.0
1245361	5923066	NaN	NaN	0.0	NaN	NaN	0.0
1245362	5923068	NaN	NaN	0.0	NaN	NaN	0.0

1245363 rows × 7 columns
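To see both parsing details in isolation, here is a sketch on two simplified toy lines modeled on the samples file (tab-separated, with the trailing status column). The second line is a hypothetical sample with lost tracking, following the pattern in the table above: “.” for the gaze coordinates and 0.0 for the pupil area. Instead of `names` plus `index_col=False`, it reads only the first seven columns with `usecols`, which skips the status column explicitly and therefore does not trigger the ParserWarning. This is one possible alternative, not the only way to do it.

```python
import io

import pandas as pd

# Simplified toy lines (hypothetical): 7 data columns + a status column.
toy = (
    "3385900\t817.3\t345.2\t1707.0\t860.6\t375.2\t1738.0\t.....\n"
    "3385902\t   .\t   .\t0.0\t   .\t   .\t0.0\t.....\n"
)
names = ["time", "left_x", "left_y", "left_p", "right_x", "right_y", "right_p"]

# Read only the first 7 columns, then attach the column names.
df_toy = pd.read_table(io.StringIO(toy), header=None, usecols=list(range(7)))
df_toy.columns = names
print(df_toy.dtypes["left_x"])  # object: the column still holds text

# Convert the gaze columns; anything non-numeric such as "." becomes NaN.
for col in ["left_x", "left_y", "right_x", "right_y"]:
    df_toy[col] = pd.to_numeric(df_toy[col], errors="coerce")
```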

We can already use this information to create our PupilData object. We pass in the pupil area of both eyes (columns left_p and right_p) together with the timestamp array from the samples file (a single eye, or the mean of both eyes, would work just as well):

[6]:

pp.PupilData(right_pupil=df.right_p, left_pupil=df.left_p, time=df.time, name="test")

pp: 17:57:11 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:11 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds

[6]:

PupilData(test, 19.1MiB):
 n               : 1268585
 sampling_rate   : 500.0
 eyes            : ['left', 'right']
 data            : ['left_pupil', 'right_pupil']
 nevents         : 0
 nblinks         : {}
 blinks          : {'left': None, 'right': None}
 duration_minutes: 42.28616666666667
 start_min       : 56.431666666666665
 end_min         : 98.7178
 params          : {}
 glimpse         : EyeDataDict(vars=2,n=310151,shape=(310151,)):
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 History:
 *
 └ fill_time_discontinuities()
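The log above reports that pypillometry filled in 5 gaps in the timestamps (presumably the pauses between the 6 recording blocks reported by edf2asc), and the summary shows n = 1268585, more than the 1245363 rows of the samples table. The following is a minimal sketch of how such gaps can be located with NumPy, assuming timestamps in milliseconds and a sampling rate of 500 Hz; it illustrates the idea and is not pypillometry's implementation of fill_time_discontinuities(). The last lines check our reading of the log: if a jump of Δt seconds is filled at 2 ms spacing, it needs Δt·500 - 1 new samples, and for the five logged gaps this adds up to exactly 1268585 - 1245363 = 23222.

```python
import numpy as np

def find_time_gaps(time_ms, sampling_rate=500.0):
    """Locate jumps in the timestamps larger than one sampling interval.

    Returns the jump sizes in seconds and how many samples would have to be
    inserted at the regular spacing to close each jump (timestamps in ms).
    """
    t = np.asarray(time_ms, dtype=float)
    step_ms = 1000.0 / sampling_rate
    dt = np.diff(t)
    jumps = dt[dt > step_ms]
    n_insert = np.round(jumps / step_ms).astype(int) - 1
    return jumps / 1000.0, n_insert

# Toy check: one 6 ms jump at 500 Hz needs 2 extra samples (at 6 and 8 ms).
gaps_s, n_insert = find_time_gaps([0, 2, 4, 10, 12])

# Cross-check with the log above (gap durations in seconds).
logged = np.array([32.35, 4.012, 6.21, 2.02, 1.862])
print(int((np.round(logged * 500).astype(int) - 1).sum()))  # 23222
```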

We can also import the gaze positions (x/y coordinates) from the same file. In that case, we use the EyeData class:

[7]:

pp.EyeData(right_pupil=df.right_p, left_pupil=df.left_p,
             right_x=df.right_x, right_y=df.right_y,
             left_x=df.left_x, left_y=df.left_y,
             time=df.time, name="test")

pp: 17:57:15 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:15 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds

[7]:

EyeData(test, 38.1MiB):
 n                   : 1268585
 sampling_rate       : 500.0
 data                : ['left_x', 'left_y', 'left_pupil', 'right_x', 'right_y', 'right_pupil']
 nevents             : 0
 screen_limits       : not set
 physical_screen_size: not set
 screen_eye_distance : not set
 duration_minutes    : 42.28616666666667
 start_min           : 56.431666666666665
 end_min             : 98.7178
 parameters          : {}
 glimpse             : EyeDataDict(vars=6,n=310151,shape=(310151,)):
  left_x (float64): 817.3, 817.0, 816.7, 816.7, 816.7...
  left_y (float64): 345.2, 343.5, 341.6, 340.4, 340.2...
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_x (float64): 860.6, 860.7, 861.2, 861.7, 861.6...
  right_y (float64): 375.2, 375.9, 376.6, 376.8, 376.9...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 eyes                : ['left', 'right']
 nblinks             : {}
 blinks              : {'left': None, 'right': None}
 params              : {}
 History:
 *
 └ fill_time_discontinuities()

Of course, this dataset still lacks the information contained in the events file, which we need for analysing trial-related pupil data. For that, we have to read the events file, which has a more complicated structure than the samples file:

[8]:

!head -20 $fname_events

** CONVERTED FROM ../data/test.edf using edfapi 3.0 Linux Dec  1 2008 on Mon May 12 17:57:07 2025
** DATE: Fri Feb 14 08:48:33 2020
** TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED
** VERSION: EYELINK II 1
** SOURCE: EYELINK CL
** EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)
** CAMERA: EyeLink USBCAM Version 1.01
** SERIAL NUMBER: CLU-DAC49
** CAMERA_CONFIG: DAC49200.SCD
** Psychopy GC demo
**

MSG     2728855 DISPLAY_COORDS = 0 0 1919 1079
INPUT   2767568 0
MSG     2784000 !CAL
>>>>>>> CALIBRATION (HV9,P-CR) FOR LEFT: <<<<<<<<<
MSG     2784000 !CAL Calibration points:
MSG     2784000 !CAL -29.4, -23.5        -0,     -2
MSG     2784000 !CAL -29.3, -35.7        -0,  -1544
MSG     2784000 !CAL -32.9, -10.4        -0,   1559

After a header (lines starting with ‘**’) containing meta-information, we get a sequence of “events” whose format differs from row to row depending on the type of event. We are interested in lines starting with “MSG” because those contain our experimental markers. Therefore, we first read the events file and remove all rows that do not start with “MSG”:

[9]:

# read the whole file into variable `events` (list with one entry per line)
with open(fname_events) as f:
    events=f.readlines()

# keep only lines starting with "MSG"
events=[ev for ev in events if ev.startswith("MSG")]
events[0:10]

[9]:

['MSG\t2728855 DISPLAY_COORDS = 0 0 1919 1079\n',
 'MSG\t2784000 !CAL \n',
 'MSG\t2784000 !CAL Calibration points:  \n',
 'MSG\t2784000 !CAL -29.4, -23.5        -0,     -2   \n',
 'MSG\t2784000 !CAL -29.3, -35.7        -0,  -1544   \n',
 'MSG\t2784000 !CAL -32.9, -10.4        -0,   1559   \n',
 'MSG\t2784000 !CAL -49.7, -23.0     -2835,     -2   \n',
 'MSG\t2784000 !CAL -10.8, -27.4      2835,     -2   \n',
 'MSG\t2784000 !CAL -48.3, -33.3     -2818,  -1544   \n',
 'MSG\t2784000 !CAL -11.0, -34.2      2818,  -1544   \n']
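Each remaining line consists of the keyword “MSG”, a timestamp and the message text, which may itself contain spaces (as in the “!CAL” lines above). As an alternative to splitting on every space, a small hypothetical helper can split only twice, so the message text is kept in one piece. This is a sketch, not part of pypillometry; if you use it, the filtering step further below has to be adapted, because the message text would then no longer be spread over several columns.

```python
def parse_msg(line):
    """Split an EyeLink "MSG" line into (timestamp, message text).

    split(maxsplit=2) splits on the first two runs of whitespace only, so the
    message text stays in one piece even if it contains spaces.
    """
    parts = line.split(maxsplit=2)
    if len(parts) < 2 or parts[0] != "MSG":
        raise ValueError(f"not a MSG line: {line!r}")
    text = parts[2].strip() if len(parts) > 2 else ""
    return int(parts[1]), text
```

Usage: `pd.DataFrame([parse_msg(ev) for ev in events], columns=["time", "message"])`.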

In our experiment, a marker called experiment_start was sent when the experiment began. Hence, we can remove this marker and all events before it.

[10]:

experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
events=events[experiment_start_index+1:]
events[0:10]

[10]:

['MSG\t3387245 C_GW_1_1_UD_UD\n',
 'MSG\t3390421 F_GW_1_1_10_0\n',
 'MSG\t3392759 C_NW_1_2_UD_UD\n',
 'MSG\t3394293 R_NW_1_2_UD_UD\n',
 'MSG\t3395952 F_NW_1_2_-1_0\n',
 'MSG\t3397974 C_NA_1_3_UD_UD\n',
 'MSG\t3399892 R_NA_1_3_UD_UD\n',
 'MSG\t3400999 F_NA_1_3_-11_0\n',
 'MSG\t3403206 C_GA_1_4_UD_UD\n',
 'MSG\t3404640 R_GA_1_4_UD_UD\n']
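When the same code is applied to many files, `np.where(...)[0][0]` fails with a bare IndexError if a file lacks the experiment_start marker. A hypothetical helper that does the same search but reports a descriptive error could look like this (a sketch; the marker name is the one used in this experiment):

```python
def drop_until_marker(events, marker="experiment_start"):
    """Return all events after the first one that contains `marker`."""
    for i, ev in enumerate(events):
        if marker in ev:
            return events[i + 1:]
    # np.where(...)[0][0] would fail here with an uninformative IndexError
    raise ValueError(f"marker {marker!r} not found in {len(events)} events")
```

It would replace the two lines above with `events = drop_until_marker(events)`.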

Splitting each line on whitespace gives a list of fields, which we can convert into a pandas.DataFrame for further processing.

[11]:

df_ev=pd.DataFrame([ev.split() for ev in events])
df_ev

[11]:

	0	1	2	3	4	5	6	7	8
0	MSG	3387245	C_GW_1_1_UD_UD	None	None	None	None	None	None
1	MSG	3390421	F_GW_1_1_10_0	None	None	None	None	None	None
2	MSG	3392759	C_NW_1_2_UD_UD	None	None	None	None	None	None
3	MSG	3394293	R_NW_1_2_UD_UD	None	None	None	None	None	None
4	MSG	3395952	F_NW_1_2_-1_0	None	None	None	None	None	None
...	...	...	...	...	...	...	...	...	...
1065	MSG	5893078	V_UD_UD_16_UD_UD	None	None	None	None	None	None
1066	MSG	5899076	V_UD_UD_17_UD_UD	None	None	None	None	None	None
1067	MSG	5905073	V_UD_UD_18_UD_UD	None	None	None	None	None	None
1068	MSG	5911072	V_UD_UD_19_UD_UD	None	None	None	None	None	None
1069	MSG	5917071	V_UD_UD_20_UD_UD	None	None	None	None	None	None

1070 rows × 9 columns

In this table, the second column (1) contains the timestamp (on the same clock as the timestamps in the samples file), and the third column (2) contains our custom markers (the format, e.g. “C_GW_1_1_UD_UD”, is specific to our experimental design). The remaining columns appear to be empty (None) for these rows. Let’s check what they are used for by printing the rows of our data frame in which column 4 is not None:

[12]:

df_ev[df_ev[4].notna()].head()

[12]:

	0	1	2	3	4	5	6	7	8
209	MSG	3900393	RECCFG	CR	500	2	1	LR	None
211	MSG	3900393	GAZE_COORDS	0.00	0.00	1919.00	1079.00	None	None
212	MSG	3900393	THRESHOLDS	L	56	231	R	66	239
213	MSG	3900393	ELCL_WINDOW_SIZES	176	188	0	0	None	None
215	MSG	3900393	ELCL_PROC	CENTROID	(3)	None	None	None	None

Apparently, the file also contains eyetracker-specific messages (in this case due to drift checks during the experiment). We can safely drop them from our set of relevant events by removing all rows in which column 4 is not None and then keeping only the timestamp and marker columns (1 and 2).

[13]:

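# keep rows where column 4 is empty, then only the timestamp and marker columns
df_ev=df_ev.loc[df_ev[4].isna(), [1, 2]]
df_ev.columns=["time", "event"]
# ev.split() produced strings; convert the timestamps to integers (ms)
df_ev["time"]=df_ev["time"].astype(int)
df_ev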

[13]:

	time	event
0	3387245	C_GW_1_1_UD_UD
1	3390421	F_GW_1_1_10_0
2	3392759	C_NW_1_2_UD_UD
3	3394293	R_NW_1_2_UD_UD
4	3395952	F_NW_1_2_-1_0
...	...	...
1065	5893078	V_UD_UD_16_UD_UD
1066	5899076	V_UD_UD_17_UD_UD
1067	5905073	V_UD_UD_18_UD_UD
1068	5911072	V_UD_UD_19_UD_UD
1069	5917071	V_UD_UD_20_UD_UD

1035 rows × 2 columns
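Before passing the markers on, a quick sanity check that every event onset lies within the recorded sample times can catch mismatched samples and events files. For the rows shown above, the first (3387245) and last (5917071) onsets fall between the first (3385900) and last (5923068) sample timestamps. A minimal sketch (the function name is hypothetical); it converts the onsets with `pd.to_numeric`, since `ev.split()` yields strings:

```python
import numpy as np
import pandas as pd

def onsets_within_recording(event_times, sample_times):
    """Boolean mask: True where an event onset lies between the first and
    last sample timestamp (inclusive)."""
    onsets = pd.to_numeric(pd.Series(event_times)).to_numpy()
    t = np.asarray(sample_times)
    return (onsets >= t.min()) & (onsets <= t.max())
```

Usage: `onsets_within_recording(df_ev.time, df.time).all()` should be True for a matching pair of files.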

Finally, we can pass these event markers into our PupilData, EyeData or GazeData object.

[14]:

d=pp.PupilData(right_pupil=df.right_p, time=df.time, event_onsets=df_ev.time, event_labels=df_ev.event, name="test")
d

pp: 17:57:21 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:21 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds

[14]:

PupilData(test, 14.4MiB):
 n               : 1268585
 sampling_rate   : 500.0
 eyes            : ['right']
 data            : ['right_pupil']
 nevents         : 1035
 nblinks         : {}
 blinks          : {'right': None}
 duration_minutes: 42.28616666666667
 start_min       : 56.431666666666665
 end_min         : 98.7178
 params          : {}
 glimpse         : EyeDataDict(vars=1,n=310151,shape=(310151,)):
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 History:
 *
 └ fill_time_discontinuities()

The summary of the dataset shows that the eyetracker started recording at 56.4 minutes (the timestamps are on the eyetracker's own clock). We can reset the time axis to start at 0 with reset_time(); in the same chained call, we also detect blinks with pupil_blinks_detect().

[15]:

d=d.reset_time().pupil_blinks_detect()
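The start_min, end_min and duration_minutes values in the summaries above can be reproduced by hand, assuming EyeLink timestamps are in milliseconds (consistent with the 2 ms spacing at 500 Hz). The duration matches n divided by the sampling rate. The last lines illustrate the effect described for reset_time() (shifting by the first timestamp); this is not its actual implementation.

```python
import numpy as np

# First/last sample timestamps (ms) as shown in the samples table above
first_ms, last_ms = 3385900, 5923068

start_min = first_ms / 60_000          # ms -> min, about 56.43 (start_min)
end_min = last_ms / 60_000             # 98.7178 (end_min)
duration_min = 1268585 / 500.0 / 60    # n / sampling_rate, in minutes

# Subtracting the first timestamp makes the time axis start at 0.
time_ms = np.array([first_ms, first_ms + 2, first_ms + 4, last_ms])
time_reset = time_ms - time_ms[0]
```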

Now we can store this dataset in pypillometry format (shown at the end of this notebook) and use all pypillometry functions on it, e.g., extract the data between minutes 4 and 6 with sub_slice() or plot one minute of the data.

[16]:

d.sub_slice(4, 6, units="min")

[16]:

PupilData(test, 1.4MiB):
 n               : 60001
 sampling_rate   : 500.0
 eyes            : ['right']
 data            : ['right_pupil']
 nevents         : 56
 nblinks         : {'right_pupil': 150}
 blinks          : {'right': 150 intervals, 1455.23 +/- 11522.47, [22.00, 141022.00]}
 duration_minutes: 2.0000333333333336
 start_min       : 4.0
 end_min         : 6.0
 params          : {}
 glimpse         : EyeDataDict(vars=1,n=60001,shape=(60001,)):
  right_pupil (float64): 1684.0, 1683.0, 1682.0, 1681.0, 1682.0...

 History:
 *
 └ fill_time_discontinuities()
  └ reset_time()
   └ pupil_blinks_detect()
    └ sub_slice(4,6,units=min)
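The slice contains n = 60001 samples (duration 2.0000333 minutes) rather than exactly 2 × 60 × 500 = 60000, which is consistent with both endpoints (4 and 6 minutes) being included. A quick check of this arithmetic on a regular 2 ms time grid (a sketch of the counting only, not how sub_slice() is implemented):

```python
import numpy as np

fs = 500.0                                     # sampling rate (Hz)
t_ms = np.arange(0, 10 * 60_000, 1000 / fs)    # 10 min of regular 2 ms timestamps
start, end = 4 * 60_000, 6 * 60_000            # 4 and 6 minutes in ms
mask = (t_ms >= start) & (t_ms <= end)         # both endpoints included
print(mask.sum())                              # 60001
print(mask.sum() / fs / 60)                    # duration in minutes
```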

[17]:

plt.figure(figsize=(15,5));
d.plot.pupil_plot(plot_range=(4, 5), units="min")

Generalize to multiple similar datasets

Now that we have found a way to create our PupilData structure from the raw .EDF files, we can wrap the code from this notebook into a function that creates a dataset object for any .EDF file with the same structure.

We create a function that takes the name of an EDF file as input, runs all the code above, and returns the final EyeData object (pupil and gaze data). For convenience, we assume that edf2asc has already been run, so that the .asc files are available (see above for details).

[18]:

datapath="../data" ## this is where the datafiles are located

def read_dataset(edffile):
    basename=os.path.splitext(edffile)[0] ## remove .edf from filename
    fname_samples=os.path.join(datapath, basename+"_samples.asc")
    fname_events=os.path.join(datapath, basename+"_events.asc")

    print("> Attempt loading '%s' and '%s'"%(fname_samples, fname_events))
    ## read samples-file
    df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])
    # missing gaze coordinates ("   .") -> NaN, as described above
    for col in ["left_x", "left_y", "right_x", "right_y"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    ## read events-file
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    # remove events before experiment start
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]

    # re-arrange as described above
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev.loc[df_ev[4].isna(), [1, 2]]
    df_ev.columns=["time", "event"]
    df_ev["time"]=df_ev["time"].astype(int)  # timestamps were read as strings

    # create `EyeData`-object (pupil and gaze data)
    d=pp.EyeData(right_pupil=df.right_p, left_pupil=df.left_p, time=df.time,
                   right_x=df.right_x, right_y=df.right_y,
                   left_x=df.left_x, left_y=df.left_y,
                   event_onsets=df_ev.time, event_labels=df_ev.event, name=edffile)
    return d

We can test this code by running the function on a file located in datapath:

[19]:

d=read_dataset("test.edf")

> Attempt loading '../data/test_samples.asc' and '../data/test_events.asc'

/tmp/ipykernel_1346565/89281874.py:10: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
  df=pd.read_table(fname_samples, index_col=False,
pp: 17:57:29 | INFO     | fill_time_discontinuities:636 | Filling in 5 gaps
pp: 17:57:29 | INFO     | fill_time_discontinuities:638 | [32.35   4.012  6.21   2.02   1.862] seconds
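To apply read_dataset() to several recordings, one option is to first collect all EDF files in datapath for which both converted .asc files exist. A minimal sketch with a hypothetical helper, assuming the `<name>_samples.asc`/`<name>_events.asc` naming used in this notebook:

```python
import os

def find_converted_datasets(datapath):
    """Return EDF file names in `datapath` whose _samples.asc and
    _events.asc files (as produced by edf2asc above) both exist."""
    found = []
    for fname in sorted(os.listdir(datapath)):
        base, ext = os.path.splitext(fname)
        if ext.lower() != ".edf":
            continue
        samples = os.path.join(datapath, base + "_samples.asc")
        events = os.path.join(datapath, base + "_events.asc")
        if os.path.exists(samples) and os.path.exists(events):
            found.append(fname)
    return found
```

Usage: `datasets = {f: read_dataset(f) for f in find_converted_datasets(datapath)}`.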

After that, we might want to save the final PupilData/EyeData objects as binary files that can be readily loaded back. This can be done using EyeData.write_file or pypillometry.write_pickle():

[20]:

fname=os.path.join(datapath, d.name+".pd")
print(fname)
d.write_file(fname)

../data/test.edf.pd

These datasets can be read back using the EyeData.from_file() method:

[21]:

d2=pp.EyeData.from_file(fname)
d2

[21]:

EyeData(test.edf, 38.1MiB):
 n                   : 1268585
 sampling_rate       : 500.0
 data                : ['left_x', 'left_y', 'left_pupil', 'right_x', 'right_y', 'right_pupil']
 nevents             : 1035
 screen_limits       : not set
 physical_screen_size: not set
 screen_eye_distance : not set
 duration_minutes    : 42.28616666666667
 start_min           : 56.431666666666665
 end_min             : 98.7178
 parameters          : {}
 glimpse             : EyeDataDict(vars=6,n=310151,shape=(310151,)):
  left_x (float64): 817.3, 817.0, 816.7, 816.7, 816.7...
  left_y (float64): 345.2, 343.5, 341.6, 340.4, 340.2...
  left_pupil (float64): 1707.0, 1706.0, 1705.0, 1706.0, 1707.0...
  right_x (float64): 860.6, 860.7, 861.2, 861.7, 861.6...
  right_y (float64): 375.2, 375.9, 376.6, 376.8, 376.9...
  right_pupil (float64): 1738.0, 1739.0, 1739.0, 1740.0, 1742.0...

 eyes                : ['left', 'right']
 nblinks             : {}
 blinks              : {'left': None, 'right': None}
 params              : {}
 History:
 *
 └ fill_time_discontinuities()
