Sharing/Importing study data

pypillometry provides functionality to easily load a complete study from a local cache directory and a user-provided configuration file. That way, scripts can avoid including lengthy code for parsing the raw data files. Once the configuration file is in place, the user can load the data using the load_study_local() function:

import pypillometry as pp
study,conf = pp.load_study_local(path="./data", config="pypillometry_conf.py")

In addition, you can upload the data and configuration file to the Open Science Framework (OSF) and use the following function to load the data:

study,conf = pp.load_study_osf(osf_id="your_project_id", path="./data")

(In that case, the data will be downloaded and cached in the specified path.)

Creating a configuration file

A configuration file is a Python script that defines the data files, contains metadata about the study, and implements a function to read a single dataset. The standard name for the configuration file is pypillometry_conf.py, but you can use any name you want.

The configuration file (pypillometry_conf.py) should define:

  • raw_data: a dictionary mapping subject IDs to their data files

  • read_subject(): a function that processes the raw data files

Here is a template for a configuration file (pypillometry_conf.py); see examples/pypillometry_conf.py for a complete example:

"""
Configuration file for my study.
"""
import pypillometry as pp
# other relevant imports

# study metadata (optional)
study_info = {
    "name": "My new study",
    "description": "...",
    ... # other metadata
}

# Dictionary of raw data files to be downloaded (required)
# Keys are participant IDs, values are dictionaries containing paths to raw data files
raw_data = {
    "001": {
        "events": "data/eyedata/asc/001_rlmw_events.asc",
        "samples": "data/eyedata/asc/001_rlmw_samples.asc"
    },
    "002": {
        ...
    }
}

# Function to read a subject's data from the raw data files (required)
def read_subject(info):
    """
    Read a subject's data from the raw data files.
    """
    # code for reading the data
    return pp.EyeData(...)

Once you load the study data, the function read_subject() will be called for each entry of the raw_data dictionary (usually one per subject, but there can also be multiple entries per subject, e.g., if you have multiple tasks per subject). How the raw_data dictionary and the read_subject() function are defined is completely up to you; just make sure that read_subject() can process the data files specified in the raw_data dictionary.
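For instance, if each subject completed two tasks, one possible layout is one raw_data entry per subject/task combination. The key format and file names below are purely illustrative (they are not from an actual study):

```python
# Hypothetical layout: one raw_data entry per subject/task combination.
# Keys and file paths are made up for illustration.
raw_data = {
    "001_learning": {
        "events": "data/001_learning_events.asc",
        "samples": "data/001_learning_samples.asc",
    },
    "001_test": {
        "events": "data/001_test_events.asc",
        "samples": "data/001_test_samples.asc",
    },
}

# read_subject() receives one of these entry dictionaries at a time,
# so it only needs to handle the keys you defined above.
entry = raw_data["001_learning"]
```

Because read_subject() only ever sees one entry dictionary, splitting subjects into several entries requires no other changes to the configuration file.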

The configuration file will be imported as a module and returned as the second argument of the load_study_local() and load_study_osf() functions. Therefore, you can access all variables and functions defined in the configuration file:

study,conf = pp.load_study_local(path="./data", config="pypillometry_conf.py")
study["001"] # access data for subject 001
conf.study_info # access study metadata
conf.read_subject # access function to read data

The functions load_study_local() and load_study_osf() also allow you to specify a list of subject codes to load (the default is to load all subjects).

Sharing Your Study on OSF

To share your study on OSF:

  1. Create a new project on [Open Science Framework](https://osf.io)

  2. Upload your study data files and configuration file (pypillometry_conf.py) to the project
    • the data files must be arranged in a folder structure that matches the one specified in the raw_data dictionary of the configuration file

    • NOTE: It is completely fine to upload the data files to an already existing OSF project (simply add a subfolder for the data/configuration file)

  3. Note down your project’s OSF ID (found in the project URL)

To load a shared study from OSF, use load_study_osf():

from pypillometry import load_study_osf

# Load all subjects
study_data, config = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path"
)

# Load specific subjects
study_data, config = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path",
    subjects=["sub01", "sub02"]
)

The function will:

  1. Download the project’s configuration file

  2. Download the required data files for each subject

  3. Process the data using the configuration’s read_subject() function

  4. Return a dictionary mapping subject IDs to their processed data (together with the configuration module)

Files are cached locally in the specified path to avoid repeated downloads.

Example configuration file

Here is an example configuration file (pypillometry_conf.py) that could be used to share a study:

"""
Configuration file for the RLMW study.
This file contains information about the raw data files and how to read them.
"""
import pypillometry as pp
import pandas as pd
import os
import numpy as np


# Additional study metadata
study_info = {
    "name": "RLMW Study",
    "osf_id": "ca95r",
    "description": "Reinforcement learning study with mind wandering probes",
    "author": "Matthias Mittner",
    "doi": "",
    "date": "2024-04-10",
    "sampling_rate": 1000.0,  # Hz
    "time_unit": "ms",
    "screen_eye_distance": 60, # cm (distance between screen and eye)
    "screen_resolution": (1280,1024),  # pixels (width, height)
    "physical_screen_size": (30, 20) # cm (width, height)
}


# Dictionary of raw data files to be downloaded
# Keys are participant IDs, values are dictionaries containing paths to .asc files
raw_data = {
    "001": {
        "events": "data/eyedata/asc/001_rlmw_events.asc",
        "samples": "data/eyedata/asc/001_rlmw_samples.asc"
    },
    "002": {
        "events": "data/eyedata/asc/002_rlmw_events.asc",
        "samples": "data/eyedata/asc/002_rlmw_samples.asc"
    },
    "003": {
        "events": "data/eyedata/asc/003_rlmw_events.asc",
        "samples": "data/eyedata/asc/003_rlmw_samples.asc"
    },
    "004": {
        "events": "data/eyedata/asc/004_rlmw_events.asc",
        "samples": "data/eyedata/asc/004_rlmw_samples.asc"
    },
    "005": {
        "events": "data/eyedata/asc/005_rlmw_events.asc",
        "samples": "data/eyedata/asc/005_rlmw_samples.asc"
    },
    "006": {
        "events": "data/eyedata/asc/006_rlmw_events.asc",
        "samples": "data/eyedata/asc/006_rlmw_samples.asc"
    },
    "007": {
        "events": "data/eyedata/asc/007_rlmw_events.asc",
        "samples": "data/eyedata/asc/007_rlmw_samples.asc"
    },
    "008": {
        "events": "data/eyedata/asc/008_rlmw_events.asc",
        "samples": "data/eyedata/asc/008_rlmw_samples.asc"
    },
    "009": {
        "events": "data/eyedata/asc/009_rlmw_events.asc",
        "samples": "data/eyedata/asc/009_rlmw_samples.asc"
    },
    "010": {
        "events": "data/eyedata/asc/010_rlmw_events.asc",
        "samples": "data/eyedata/asc/010_rlmw_samples.asc"
    },
    "011": {
        "events": "data/eyedata/asc/011_rlmw_events.asc",
        "samples": "data/eyedata/asc/011_rlmw_samples.asc"
    },
    "012": {
        "events": "data/eyedata/asc/012_rlmw_events.asc",
        "samples": "data/eyedata/asc/012_rlmw_samples.asc"
    },
    "013": {
        "events": "data/eyedata/asc/013_rlmw_events.asc",
        "samples": "data/eyedata/asc/013_rlmw_samples.asc"
    },
    "014": {
        "events": "data/eyedata/asc/014_rlmw_events.asc",
        "samples": "data/eyedata/asc/014_rlmw_samples.asc"
    },
    "015": {
        "events": "data/eyedata/asc/015_rlmw_events.asc",
        "samples": "data/eyedata/asc/015_rlmw_samples.asc"
    },
    "016": {
        "events": "data/eyedata/asc/016_rlmw_events.asc",
        "samples": "data/eyedata/asc/016_rlmw_samples.asc"
    },
    "017": {
        "events": "data/eyedata/asc/017_rlmw_events.asc",
        "samples": "data/eyedata/asc/017_rlmw_samples.asc"
    },
    "018": {
        "events": "data/eyedata/asc/018_rlmw_events.asc",
        "samples": "data/eyedata/asc/018_rlmw_samples.asc"
    },
    "019": {
        "events": "data/eyedata/asc/019_rlmw_events.asc",
        "samples": "data/eyedata/asc/019_rlmw_samples.asc"
    },
    "020": {
        "events": "data/eyedata/asc/020_rlmw_events.asc",
        "samples": "data/eyedata/asc/020_rlmw_samples.asc"
    },
    "021": {
        "events": "data/eyedata/asc/021_rlmw_events.asc",
        "samples": "data/eyedata/asc/021_rlmw_samples.asc"
    },
    "022": {
        "events": "data/eyedata/asc/022_rlmw_events.asc",
        "samples": "data/eyedata/asc/022_rlmw_samples.asc"
    },
    "023": {
        "events": "data/eyedata/asc/023_rlmw_events.asc",
        "samples": "data/eyedata/asc/023_rlmw_samples.asc"
    },
    "024": {
        "events": "data/eyedata/asc/024_rlmw_events.asc",
        "samples": "data/eyedata/asc/024_rlmw_samples.asc"
    },
    "025": {
        "events": "data/eyedata/asc/025_rlmw_events.asc",
        "samples": "data/eyedata/asc/025_rlmw_samples.asc"
    },
    "026": {
        "events": "data/eyedata/asc/026_rlmw_events.asc",
        "samples": "data/eyedata/asc/026_rlmw_samples.asc"
    },
    "027": {
        "events": "data/eyedata/asc/027_rlmw_events.asc",
        "samples": "data/eyedata/asc/027_rlmw_samples.asc"
    },
    "028": {
        "events": "data/eyedata/asc/028_rlmw_events.asc",
        "samples": "data/eyedata/asc/028_rlmw_samples.asc"
    },
    "029": {
        "events": "data/eyedata/asc/029_rlmw_events.asc",
        "samples": "data/eyedata/asc/029_rlmw_samples.asc"
    },
    "030": {
        "events": "data/eyedata/asc/030_rlmw_events.asc",
        "samples": "data/eyedata/asc/030_rlmw_samples.asc"
    },
    "031": {
        "events": "data/eyedata/asc/031_rlmw_events.asc",
        "samples": "data/eyedata/asc/031_rlmw_samples.asc"
    },
    "032": {
        "events": "data/eyedata/asc/032_rlmw_events.asc",
        "samples": "data/eyedata/asc/032_rlmw_samples.asc"
    },
    "033": {
        "events": "data/eyedata/asc/033_rlmw_events.asc",
        "samples": "data/eyedata/asc/033_rlmw_samples.asc"
    },
    "034": {
        "events": "data/eyedata/asc/034_rlmw_events.asc",
        "samples": "data/eyedata/asc/034_rlmw_samples.asc"
    },
    "035": {
        "events": "data/eyedata/asc/035_rlmw_events.asc",
        "samples": "data/eyedata/asc/035_rlmw_samples.asc"
    },
    "036": {
        "events": "data/eyedata/asc/036_rlmw_events.asc",
        "samples": "data/eyedata/asc/036_rlmw_samples.asc"
    },
    "037": {
        "events": "data/eyedata/asc/037_rlmw_events.asc",
        "samples": "data/eyedata/asc/037_rlmw_samples.asc"
    },
    "038": {
        "events": "data/eyedata/asc/038_rlmw_events.asc",
        "samples": "data/eyedata/asc/038_rlmw_samples.asc"
    },
    "039": {
        "events": "data/eyedata/asc/039_rlmw_events.asc",
        "samples": "data/eyedata/asc/039_rlmw_samples.asc"
    },
    "040": {
        "events": "data/eyedata/asc/040_rlmw_events.asc",
        "samples": "data/eyedata/asc/040_rlmw_samples.asc"
    },
    "041": {
        "events": "data/eyedata/asc/041_rlmw_events.asc",
        "samples": "data/eyedata/asc/041_rlmw_samples.asc"
    },
    "042": {
        "events": "data/eyedata/asc/042_rlmw_events.asc",
        "samples": "data/eyedata/asc/042_rlmw_samples.asc"
    },
    "043": {
        "events": "data/eyedata/asc/043_rlmw_events.asc",
        "samples": "data/eyedata/asc/043_rlmw_samples.asc"
    },
    "044": {
        "events": "data/eyedata/asc/044_rlmw_events.asc",
        "samples": "data/eyedata/asc/044_rlmw_samples.asc"
    },
    "045": {
        "events": "data/eyedata/asc/045_rlmw_events.asc",
        "samples": "data/eyedata/asc/045_rlmw_samples.asc"
    },
    "046": {
        "events": "data/eyedata/asc/046_rlmw_events.asc",
        "samples": "data/eyedata/asc/046_rlmw_samples.asc"
    },
    "047": {
        "events": "data/eyedata/asc/047_rlmw_events.asc",
        "samples": "data/eyedata/asc/047_rlmw_samples.asc"
    },
    "048": {
        "events": "data/eyedata/asc/048_rlmw_events.asc",
        "samples": "data/eyedata/asc/048_rlmw_samples.asc"
    },
    "049": {
        "events": "data/eyedata/asc/049_rlmw_events.asc",
        "samples": "data/eyedata/asc/049_rlmw_samples.asc"
    },
    "050": {
        "events": "data/eyedata/asc/050_rlmw_events.asc",
        "samples": "data/eyedata/asc/050_rlmw_samples.asc"
    }
}

# notes about each subject, written down while going through the preprocessing
notes={
    "001":"good",
    "002":"good but many blinks",
    "003":"ok, but some segments with many blinks (min 8, 12, ...)",
    "004":"ok, beginning is crap, many 'double blinks', a few 'dip blinks' which don't go all the way to zero but filter is ok",
    "005":"ok, some pretty long blinks and some dips but recovery mostly ok",
    "006":"ok, near ideal in the beginning, more blinks later",
    "007":"ok",
    "008":"ok but problems around 8.7-14 mins",
    "009":"ok, many multi-blinks but good recovery",
    "010":"ok, very slow opening of eye, difficult to correct - used rather large margin",
    "011":"very nice and regular blinking",
    "012":"ok, there are some weird 'spikes' upwards in the data, saccades? filter seems ok",
    "013":"ok",
    "014":"nice and regular",
    "015":"ok but not great, especially later parts",
    "016":"ok but not great",
    "017":"ok",
    "018":"consider exclusion, another one with slow opening of eye, used large margin, lots of missings",
    "019":"amazing, almost no blinks. Some spikes but filtered out ok",
    "020":"ok",
    "021":"ok, somewhat more messy during second half",
    "022":"ok",
    "023":"ok",
    "024":"ok but not great, some double-blinks",
    "025":"ok",
    "026":"ok",
    "027":"ok but data quality in last part is getting worse",
    "028":"ok",
    "029":"consider exclusion, pretty bad",
    "030":"ok",
    "031":"ok but not great",
    "032":"consider exclusion, but not super bad",
    "033":"ok",
    "034":"ok",
    "035":"ok but not great",
    "036":"ok; very many but very short blinks... signal looks ok, though",
    "037":"ok",
    "038":"ok",
    "039":"exclude: ok quality but saccades are super prominent in this subject",
    "040":"ok",
    "041":"ok",
    "042":"ok",
    "043":"ok, many blinks but ok recovery",
    "044":"exclude: not great qual and saccades are prominent",
    "045":"ok",
    "046":"ok; qual not great but preproc seems to deal with it",
    "047":"exclude: getting a lot worse at the end",
    "048":"ok, slow opening of eye, used large margin, but signal looks ok",
    "049":"exclude: huge saccades",
    "050":"ok"
}
exclude = ["018", "029", "032", "039", "044", "047", "049"]

# Function to read the data files for one entry of `raw_data`
def read_subject(info):
    """
    Read the data for a single subject. Input is each element of `raw_data`.
    """
    ## loading the raw samples from the asc file
    fname_samples=os.path.join(info["samples"])
    df=pd.read_table(fname_samples, index_col=False,
                    names=["time", "left_x", "left_y", "left_p",
                            "right_x", "right_y", "right_p"])

    ## Eyelink tracker puts "   ." when no data is available for x/y coordinates
    left_x=df.left_x.values
    left_x[left_x=="   ."] = np.nan
    left_x = left_x.astype(float)

    left_y=df.left_y.values
    left_y[left_y=="   ."] = np.nan
    left_y = left_y.astype(float)

    right_x=df.right_x.values
    right_x[right_x=="   ."] = np.nan
    right_x = right_x.astype(float)

    right_y=df.right_y.values
    right_y[right_y=="   ."] = np.nan
    right_y = right_y.astype(float)

    ## Loading the events from the events file
    fname_events=os.path.join(info["events"])
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev[[1,2]]
    df_ev.columns=["time", "event"]

    # Creating EyeData object that contains both X-Y coordinates
    # and pupil data
    d = pp.EyeData(time=df.time, name=info["subject"],
                screen_resolution=study_info["screen_resolution"],
                physical_screen_size=study_info["physical_screen_size"],
                screen_eye_distance=study_info["screen_eye_distance"],
                left_x=left_x, left_y=left_y, left_pupil=df.left_p,
                right_x=right_x, right_y=right_y, right_pupil=df.right_p,
                event_onsets=df_ev.time, event_labels=df_ev.event, notes=notes[info["subject"]],
                keep_orig=True)\
                .reset_time()
    d.set_experiment_info(screen_eye_distance=study_info["screen_eye_distance"],
                        screen_resolution=study_info["screen_resolution"],
                        physical_screen_size=study_info["physical_screen_size"])
    return d
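The missing-sample handling inside read_subject() can be tried in isolation. EyeLink ASC exports mark missing x/y coordinates with a lone dot (e.g. "   ."); the standalone sketch below mirrors that replacement using only the standard library (the sample values are made up):

```python
import math

def asc_to_float(values):
    """Convert EyeLink ASC sample strings to floats.

    Entries consisting of a lone "." (possibly padded with spaces) mark
    missing data and become NaN, mirroring the replacement of "   ."
    with np.nan in read_subject() above.
    """
    return [float("nan") if v.strip() == "." else float(v) for v in values]

# Made-up x-coordinate column with one missing sample
clean = asc_to_float(["512.3", "   .", "498.7"])
```

Stripping whitespace before the comparison makes the conversion robust to the exact amount of padding the tracker writes around the dot.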

This configuration file comes from a real study that was shared on OSF (<https://osf.io/ca95r/>), and you can download the corresponding data using the following code (note that the dataset is around 3 GB):

from pypillometry import load_study_osf
study_data, config = load_study_osf("ca95r", path="./data")

The data will be downloaded and cached in the ./data directory, and study_data will be a dictionary mapping subject IDs to their data. The config object will be a module exposing the variables and functions defined in the configuration file.