This file was created from the following Jupyter-notebook: docs/importdata.ipynb

Importing Data Example

In order to import data into pypillometry, we have to load the data from its source using other packages and then wrap it in a PupilData object.

Here we will show an example where we translate a file recorded in EyeLink's EDF format into a file readable by pandas.read_table().

First, we import the needed modules.

[2]:
import sys, os
sys.path.insert(0,"..") # this is not needed if you have installed pypillometry
import pypillometry as pp
import pandas as pd
import numpy as np
import pylab as plt

In this example, we use data recorded with an EyeLink eye-tracker. These eye-trackers store their recordings in binary files with the extension .edf. Some information about this file format is available from SR Research. We use a command-line utility released by EyeLink to convert this proprietary format into a more easily readable .asc file, which is a whitespace-separated plain-text format. The converter, edf2asc, can be downloaded for different platforms from the EyeLink support forum; there is a GUI-based program for Windows and command-line programs for Linux and macOS. Binaries of the command-line tools for Linux and macOS are included in the pypillometry repository (in the external/ directory, as used below).

We call this converter twice on an example EDF file: once with -s to extract the samples, and once with -e to extract the events (here using the macOS binary).

[1]:
!../external/edf2asc-mac -y -s ../data/test.edf ../data/test_samples.asc
!../external/edf2asc-mac -y -e ../data/test.edf ../data/test_events.asc

EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.1 MacOS X Jul 13 2010
(c)1995-2009 by SR Research, last modified Jul 13 2010

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 0 events, 1245363 samples, 6 blocks.

EDF2ASC: EyeLink EDF file -> ASCII (text) file translator
EDF2ASC version 3.1 MacOS X Jul 13 2010
(c)1995-2009 by SR Research, last modified Jul 13 2010

processing file ../data/test.edf
=======================Preamble of file ../data/test.edf=======================
| DATE: Fri Feb 14 08:48:33 2020                                              |
| TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED                                   |
| VERSION: EYELINK II 1                                                       |
| SOURCE: EYELINK CL                                                          |
| EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)                      |
| CAMERA: EyeLink USBCAM Version 1.01                                         |
| SERIAL NUMBER: CLU-DAC49                                                    |
| CAMERA_CONFIG: DAC49200.SCD                                                 |
| Psychopy GC demo                                                            |
===============================================================================

Converted successfully: 36371 events, 0 samples, 6 blocks.

This results in two files: one containing all the samples, and one containing all the recorded events.

[3]:
fname_samples="../data/test_samples.asc"
fname_events="../data/test_events.asc"

The samples file contains a large table with the timestamp, the x/y coordinates of the eye position, and the pupil area for both the left and the right eye. Here are the first few rows of this file:

[4]:
!head $fname_samples
3385900   817.3   345.2  1707.0   860.6   375.2  1738.0 .....
3385902   817.0   343.5  1706.0   860.7   375.9  1739.0 .....
3385904   816.7   341.6  1705.0   861.2   376.6  1739.0 .....
3385906   816.7   340.4  1706.0   861.7   376.8  1740.0 .....
3385908   816.7   340.2  1707.0   861.6   376.9  1742.0 .....
3385910   816.8   340.2  1708.0   861.1   377.1  1743.0 .....
3385912   816.9   340.9  1708.0   860.7   377.5  1744.0 .....
3385914   816.1   342.1  1710.0   861.1   378.7  1745.0 .....
3385916   815.2   343.2  1712.0   862.5   380.0  1746.0 .....
3385918   814.4   343.6  1713.0   863.9   380.7  1747.0 .....

We can easily read this file using pandas.read_table().

[5]:
df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])
df
[5]:
time left_x left_y left_p right_x right_y right_p
0 3385900 817.3 345.2 1707.0 860.6 375.2 1738.0
1 3385902 817.0 343.5 1706.0 860.7 375.9 1739.0
2 3385904 816.7 341.6 1705.0 861.2 376.6 1739.0
3 3385906 816.7 340.4 1706.0 861.7 376.8 1740.0
4 3385908 816.7 340.2 1707.0 861.6 376.9 1742.0
... ... ... ... ... ... ... ...
1245358 5923060 . . 0.0 . . 0.0
1245359 5923062 . . 0.0 . . 0.0
1245360 5923064 . . 0.0 . . 0.0
1245361 5923066 . . 0.0 . . 0.0
1245362 5923068 . . 0.0 . . 0.0

1245363 rows × 7 columns

We can already use this information to create our PupilData object. We simply pass in the pupil area of the right eye (column right_p) and the timestamp array from the samples file (note: we could just as easily have used the left eye, or the mean of both eyes; see the sketch after the output below):

[6]:
pp.PupilData(df.right_p, time=df.time, name="test")
> Filling in 5 gaps
[32.35   4.012  6.21   2.02   1.862] seconds
[6]:
PupilData(test, 135.5MiB):
 n                 : 1268585
 nmiss             : 212551
 perc_miss         : 16.75496714843704
 nevents           : 0
 nblinks           : 0
 ninterpolated     : 0
 blinks_per_min    : 0.0
 fs                : 500.0
 duration_minutes  : 42.28616666666667
 start_min         : 56.431666666666665
 end_min           : 98.7178
 baseline_estimated: False
 response_estimated: False
 History:
 *
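
As an illustration of the last point, here is a minimal sketch of the mean-of-both-eyes variant. It assumes that a pupil area of 0.0 marks a missing sample (as in the last rows of the table above); pupil_mean is our own variable name, and how PupilData expects missing samples to be coded should be checked for the pypillometry version you use:

# treat a pupil area of 0.0 as missing
left = df.left_p.replace(0.0, np.nan)
right = df.right_p.replace(0.0, np.nan)
# average both eyes; the result is NaN wherever either eye is missing
pupil_mean = (left + right) / 2
pp.PupilData(pupil_mean, time=df.time, name="test_mean")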

Of course, this dataset is still missing the important information contained in the events file, which we will need for analysing trial-related pupil-diameter data. For that, we have to read the events file, which has a more complicated structure than the samples file:

[7]:
!head -20 $fname_events
** CONVERTED FROM ../data/test.edf using edfapi 3.1 MacOS X Jul 13 2010 on Wed May 27 16:45:20 2020
** DATE: Fri Feb 14 08:48:33 2020
** TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED
** VERSION: EYELINK II 1
** SOURCE: EYELINK CL
** EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)
** CAMERA: EyeLink USBCAM Version 1.01
** SERIAL NUMBER: CLU-DAC49
** CAMERA_CONFIG: DAC49200.SCD
** Psychopy GC demo
**

INPUT   2767568 0
MSG     2784000 !CAL
>>>>>>> CALIBRATION (HV9,P-CR) FOR LEFT: <<<<<<<<<
MSG     2784000 !CAL Calibration points:
MSG     2784000 !CAL -29.4, -23.5        -0,     -2
MSG     2784000 !CAL -29.3, -35.7        -0,  -1544
MSG     2784000 !CAL -32.9, -10.4        -0,   1559
MSG     2784000 !CAL -49.7, -23.0     -2835,     -2

After a header (lines starting with "**") containing meta-information, we get a sequence of "events", each of which can have a different format. We are interested in lines starting with "MSG", because those contain our experimental markers. Therefore, we read the events file and first remove all rows that do not start with "MSG":

[8]:
# read the whole file into variable `events` (list with one entry per line)
with open(fname_events) as f:
    events=f.readlines()

# keep only lines starting with "MSG"
events=[ev for ev in events if ev.startswith("MSG")]
events[0:10]
[8]:
['MSG\t2784000 !CAL \n',
 'MSG\t2784000 !CAL Calibration points:  \n',
 'MSG\t2784000 !CAL -29.4, -23.5        -0,     -2   \n',
 'MSG\t2784000 !CAL -29.3, -35.7        -0,  -1544   \n',
 'MSG\t2784000 !CAL -32.9, -10.4        -0,   1559   \n',
 'MSG\t2784000 !CAL -49.7, -23.0     -2835,     -2   \n',
 'MSG\t2784000 !CAL -10.8, -27.4      2835,     -2   \n',
 'MSG\t2784000 !CAL -48.3, -33.3     -2818,  -1544   \n',
 'MSG\t2784000 !CAL -11.0, -34.2      2818,  -1544   \n',
 'MSG\t2784000 !CAL -56.2, -9.2     -2852,   1559   \n']

During recording, we sent an experimental marker called experiment_start when the experiment began. Hence, we can remove all events before this marker.

[9]:
experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
events=events[experiment_start_index+1:]
events[0:10]
[9]:
['MSG\t3387245 C_GW_1_1_UD_UD\n',
 'MSG\t3390421 F_GW_1_1_10_0\n',
 'MSG\t3392759 C_NW_1_2_UD_UD\n',
 'MSG\t3394293 R_NW_1_2_UD_UD\n',
 'MSG\t3395952 F_NW_1_2_-1_0\n',
 'MSG\t3397974 C_NA_1_3_UD_UD\n',
 'MSG\t3399892 R_NA_1_3_UD_UD\n',
 'MSG\t3400999 F_NA_1_3_-11_0\n',
 'MSG\t3403206 C_GA_1_4_UD_UD\n',
 'MSG\t3404640 R_GA_1_4_UD_UD\n']

The events are now in a format that we can convert into a pandas.DataFrame object for further processing.

[10]:
df_ev=pd.DataFrame([ev.split() for ev in events])
df_ev
[10]:
0 1 2 3 4 5 6 7 8
0 MSG 3387245 C_GW_1_1_UD_UD None None None None None None
1 MSG 3390421 F_GW_1_1_10_0 None None None None None None
2 MSG 3392759 C_NW_1_2_UD_UD None None None None None None
3 MSG 3394293 R_NW_1_2_UD_UD None None None None None None
4 MSG 3395952 F_NW_1_2_-1_0 None None None None None None
... ... ... ... ... ... ... ... ... ...
1065 MSG 5893078 V_UD_UD_16_UD_UD None None None None None None
1066 MSG 5899076 V_UD_UD_17_UD_UD None None None None None None
1067 MSG 5905073 V_UD_UD_18_UD_UD None None None None None None
1068 MSG 5911072 V_UD_UD_19_UD_UD None None None None None None
1069 MSG 5917071 V_UD_UD_20_UD_UD None None None None None None

1070 rows × 9 columns

In this table, the second column contains the timestamp (on the same clock as the timestamps in the samples file), and the third column contains our custom markers (the format, like "C_GW_1_1_UD_UD", is specific to our experimental design). There are several more columns that seem to contain no information in our case. Let's check what those columns are for by printing the rows of the data frame in which these columns are not None:

[11]:
# show rows in which column 4 contains something (is not None)
df_ev[np.array(df_ev[4])!=None].head()
[11]:
0 1 2 3 4 5 6 7 8
209 MSG 3900393 RECCFG CR 500 2 1 LR None
211 MSG 3900393 GAZE_COORDS 0.00 0.00 1919.00 1079.00 None None
212 MSG 3900393 THRESHOLDS L 56 231 R 66 239
213 MSG 3900393 ELCL_WINDOW_SIZES 176 188 0 0 None None
215 MSG 3900393 ELCL_PROC CENTROID (3) None None None None

Apparently, there are additional eye-tracker-specific messages in our file (in this case due to drift-checks during the experiment). We can safely drop those from our set of interesting events by removing all rows in which column 4 is not None, and then dropping the remaining uninteresting columns.

[12]:
# keep rows where column 4 is None (our own markers), retaining only
# column 1 (timestamp) and column 2 (marker)
df_ev=df_ev[np.array(df_ev[4])==None][[1,2]]
df_ev.columns=["time", "event"]
df_ev
[12]:
time event
0 3387245 C_GW_1_1_UD_UD
1 3390421 F_GW_1_1_10_0
2 3392759 C_NW_1_2_UD_UD
3 3394293 R_NW_1_2_UD_UD
4 3395952 F_NW_1_2_-1_0
... ... ...
1065 5893078 V_UD_UD_16_UD_UD
1066 5899076 V_UD_UD_17_UD_UD
1067 5905073 V_UD_UD_18_UD_UD
1068 5911072 V_UD_UD_19_UD_UD
1069 5917071 V_UD_UD_20_UD_UD

1035 rows × 2 columns
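
As an aside, because the labels are underscore-separated, they could be split into separate columns for later trial-wise processing. Here is a minimal sketch, assuming all labels have six fields as in this experiment; the column names are hypothetical and need to be adapted to your own design:

# split e.g. "C_GW_1_1_UD_UD" into its six underscore-separated fields
fields = df_ev.event.str.split("_", expand=True)
fields.columns = ["marker", "cond", "block", "trial", "info1", "info2"]
df_ev_split = pd.concat([df_ev, fields], axis=1)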

Finally, we can pass those event markers into our PupilData object.

[13]:
d=pp.PupilData(df.right_p, time=df.time, event_onsets=df_ev.time, event_labels=df_ev.event, name="test")
d
> Filling in 5 gaps
[32.35   4.012  6.21   2.02   1.862] seconds
[13]:
PupilData(test, 135.5MiB):
 n                 : 1268585
 nmiss             : 212551
 perc_miss         : 16.75496714843704
 nevents           : 1035
 nblinks           : 0
 ninterpolated     : 0
 blinks_per_min    : 0.0
 fs                : 500.0
 duration_minutes  : 42.28616666666667
 start_min         : 56.431666666666665
 end_min           : 98.7178
 baseline_estimated: False
 response_estimated: False
 History:
 *

The summary of the dataset shows that the eye-tracker started recording at time = 56.4 minutes. We can reset the time index to start at 0 using the reset_time() function; here, we also detect blinks right away with blinks_detect():

[14]:
d=d.reset_time().blinks_detect()

Now we can store away this dataset in pypillometry format and use all of the pypillometry functions on it. Below, we extract the slice from minute 4 to minute 6, drop the stored original signal to reduce file size, and write the result to disk; we then plot one minute of the dataset:

[16]:
d.sub_slice(4, 6, units="min").drop_original().write_file("../data/test.pd")
[15]:
plt.figure(figsize=(15,5));
d.plot((4, 5), units="min")
[Figure: pupil-size trace of the dataset between minutes 4 and 5]

Generalize to multiple similar datasets

Now that we have found a way to create our PupilData structure from the raw .edf files, we can wrap the code from this notebook into an easily accessible function that creates a PupilData object for any .edf file with the same structure.

We simply create a function that takes the name of an EDF file as input and runs all of the code above, returning the final PupilData object. For convenience, we assume that the edf2asc utility has already been run, so that the .asc files are available (see above for details).
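
If the .asc files have not yet been created, the conversion step could also be scripted from Python. A rough sketch, assuming the edf2asc command-line binary is available on the PATH (the -y/-s/-e flags are the ones used above):

import subprocess

def convert_edf(edffile):
    basename = os.path.splitext(edffile)[0]
    # -s extracts samples, -e extracts events, -y overwrites existing output
    subprocess.run(["edf2asc", "-y", "-s", edffile, basename + "_samples.asc"], check=True)
    subprocess.run(["edf2asc", "-y", "-e", edffile, basename + "_events.asc"], check=True)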

[42]:
datapath="../data" ## this is where the datafiles are located

def read_dataset(edffile):
    basename=os.path.splitext(edffile)[0] ## remove .edf from filename
    fname_samples=os.path.join(datapath, basename+"_samples.asc")
    fname_events=os.path.join(datapath, basename+"_events.asc")

    print("> Attempt loading '%s' and '%s'"%(fname_samples, fname_events))
    ## read samples-file
    df=pd.read_table(fname_samples, index_col=False,
                  names=["time", "left_x", "left_y", "left_p",
                         "right_x", "right_y", "right_p"])

    ## read events-file
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    # remove events before experiment start
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]

    # re-arrange as described above
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev[np.array(df_ev[4])==None][[1,2]]
    df_ev.columns=["time", "event"]

    # create `PupilData`-object
    d=pp.PupilData(df.right_p, time=df.time, event_onsets=df_ev.time, event_labels=df_ev.event, name=edffile)
    return d

We can test this code by simply running the function on a filename located in datapath:

[43]:
read_dataset("test.edf")
> Attempt loading '../data/test_samples.asc' and '../data/test_events.asc'
> Filling in 5 gaps
[32.35   4.012  6.21   2.02   1.862] seconds
[43]:
PupilData(test.edf, 135.5MiB):
 n                 : 1268585
 nmiss             : 212551
 perc_miss         : 16.75496714843704
 nevents           : 1035
 nblinks           : 0
 ninterpolated     : 0
 blinks_per_min    : 0.0
 fs                : 500.0
 duration_minutes  : 42.28616666666667
 start_min         : 56.431666666666665
 end_min           : 98.7178
 baseline_estimated: False
 response_estimated: False
 History:
 *

Storing/Loading several datasets

It is now easy to read a set of datasets from the same experimental setup into a Python list with a simple loop, e.g.,

[46]:
files=["test.edf", "test2.edf", "test3.edf"]
datasets=[read_dataset(fname) for fname in files]
> Attempt loading '../data/test_samples.asc' and '../data/test_events.asc'
> Filling in 5 gaps
[32.35   4.012  6.21   2.02   1.862] seconds

After that, we may want to save the final PupilData objects as .pd files that can readily be loaded back. Here, we loop through the list of datasets and store each of them in a separate file, using the name attribute of the object as the filename.

[47]:
for ds in datasets:
    fname=os.path.join(datapath, ds.name+".pd")
    ds.write_file(fname)

These datasets can be read back using the PupilData.from_file() method:

[58]:
# all filenames in `datapath` that end with `.pd`
pd_files=[fname for fname in os.listdir(datapath) if fname.endswith(".pd")]
datasets=[]
for fname in pd_files:
    fname=os.path.join(datapath, fname)
    d=pp.PupilData.from_file(fname)
    datasets.append(d)

It is also possible to store the whole list in a single file by using the pd_write_pickle() function:

[60]:
pp.pd_write_pickle(datasets, "full_dataset.pd")

which can be read back using the pd_read_pickle() function like so:

[61]:
datasets=pp.pd_read_pickle("full_dataset.pd")
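
As a quick sanity check after loading, we can loop over the list and print each dataset's name (the name attribute was set from the EDF filename in read_dataset() above):

for d in datasets:
    print(d.name)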