Sharing/Importing study data from OSF

pypillometry provides functionality to share and import study data via the Open Science Framework (OSF). The load_study_osf() function allows you to easily download and load study data that has been shared on OSF.

Sharing Your Study

To share your study on OSF:

  1. Create a new project on OSF

  2. Upload your study data files and configuration file (pypillometry_conf.py) to the project
    • see the example configuration file on this page

  3. Note down your project’s OSF ID (found in the project URL)

The configuration file (pypillometry_conf.py) should define:

  • raw_data: Dictionary mapping subject IDs to their data files

  • A read_subject() function that processes the raw data files
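
When files follow a regular naming scheme, the raw_data dictionary does not have to be typed by hand. The following is a minimal sketch that assumes zero-padded subject IDs and the file layout used in the example configuration file on this page; build_raw_data() and its parameters are hypothetical names chosen for illustration and are not part of pypillometry.

```python
def build_raw_data(n_subjects, data_dir="data/eyedata/asc", task="rlmw"):
    """Return a raw_data-style dict for subjects "001" .. n_subjects.

    Hypothetical helper: the directory and file-name pattern are
    assumptions and must be adapted to your own project layout.
    """
    raw_data = {}
    for i in range(1, n_subjects + 1):
        subject = f"{i:03d}"  # zero-padded ID, e.g. "007"
        raw_data[subject] = {
            "events": f"{data_dir}/{subject}_{task}_events.asc",
            "samples": f"{data_dir}/{subject}_{task}_samples.asc",
        }
    return raw_data


raw_data = build_raw_data(3)
print(raw_data["002"]["events"])
```

Generating the dictionary keeps the configuration file short and avoids copy-paste errors in the paths; an explicit dictionary remains the better choice when file names are irregular.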

Loading Shared Data

To load a shared study, use load_study_osf():

from pypillometry.io import load_study_osf

# Load all subjects
study_data = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path"
)

# Load specific subjects
study_data = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path",
    subjects=["sub01", "sub02"]
)

Parameters

  • osf_id (str): The OSF project ID

  • path (str): Local path where files should be downloaded/stored

  • subjects (list[str], optional): List of specific subject IDs to load

  • force_download (bool, optional): Force re-download of files even if they exist locally

The function will:

  1. Download the project’s configuration file

  2. Download the required data files for each subject

  3. Process the data using the configuration’s read_subject() function

  4. Return a dictionary mapping subject IDs to their processed data
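
Steps 3 and 4 above can be pictured with the following sketch. It is not the actual implementation of load_study_osf(); it only assumes that the configuration file is an ordinary Python module exposing raw_data and read_subject(). The helper names load_config() and read_study() are hypothetical, and the toy configuration written to a temporary directory stands in for a downloaded pypillometry_conf.py.

```python
import importlib.util
import os
import tempfile


def load_config(conf_path):
    """Import a configuration file from an arbitrary path as a module."""
    spec = importlib.util.spec_from_file_location("study_conf", conf_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def read_study(conf, subjects=None):
    """Call conf.read_subject() for every requested subject ID."""
    ids = list(conf.raw_data) if subjects is None else subjects
    return {sid: conf.read_subject(conf.raw_data[sid]) for sid in ids}


# Toy configuration file used only for this demonstration
demo_conf = '''
raw_data = {
    "sub01": {"samples": "sub01.asc"},
    "sub02": {"samples": "sub02.asc"},
}

def read_subject(info):
    return "processed " + info["samples"]
'''

with tempfile.TemporaryDirectory() as tmp:
    conf_path = os.path.join(tmp, "pypillometry_conf.py")
    with open(conf_path, "w") as f:
        f.write(demo_conf)
    conf = load_config(conf_path)
    result = read_study(conf, subjects=["sub02"])
    all_results = read_study(conf)

print(result)
```

Because the configuration file is executed as code, only load studies from sources you trust.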

Files are cached locally in the specified path to avoid repeated downloads.
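
The caching behaviour, including the effect of force_download, can be illustrated with a small sketch. This is an assumption about the general pattern (skip the download when the file already exists locally), not pypillometry's actual code; cached_download() and the fetch callable are hypothetical, and fake_fetch() replaces a real network request.

```python
import os
import tempfile


def cached_download(remote_name, cache_dir, fetch, force_download=False):
    """Store fetch(remote_name) under cache_dir unless it is already there."""
    local_path = os.path.join(cache_dir, remote_name)
    if force_download or not os.path.exists(local_path):
        # Create intermediate directories such as "data/" on demand
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(fetch(remote_name))
    return local_path


calls = []


def fake_fetch(name):
    """Stand-in for a network download; records every call."""
    calls.append(name)
    return b"dummy content"


with tempfile.TemporaryDirectory() as cache:
    cached_download("data/001_samples.asc", cache, fake_fetch)
    cached_download("data/001_samples.asc", cache, fake_fetch)  # served from cache
    cached_download("data/001_samples.asc", cache, fake_fetch,
                    force_download=True)  # fetched again

print(len(calls))
```

In this sketch the second call does not touch the network stand-in, while the third call does because force_download is set.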

Example configuration file

Here is an example configuration file (pypillometry_conf.py) that could be used to share a study:

"""
Configuration file for the RLMW study.
This file contains information about the raw data files and how to read them.
"""
import pypillometry as pp
import pandas as pd
import os
import numpy as np


# Additional study metadata
study_info = {
    "name": "RLMW Study",
    "osf_id": "ca95r",
    "description": "Reinforcement learning study with mind wandering probes",
    "author": "Matthias Mittner",
    "doi": "",
    "date": "2024-04-10",
    "sampling_rate": 1000.0,  # Hz
    "time_unit": "ms",
    "screen_eye_distance": 60, # cm (distance between screen and eye)
    "screen_resolution": (1280,1024),  # pixels (width, height)
    "physical_screen_size": (30, 20) # cm (width, height)
}


# Dictionary of raw data files to be downloaded
# Keys are participant IDs, values are dictionaries containing paths to .asc files
raw_data = {
    "001": {
        "events": "data/eyedata/asc/001_rlmw_events.asc",
        "samples": "data/eyedata/asc/001_rlmw_samples.asc"
    },
    "002": {
        "events": "data/eyedata/asc/002_rlmw_events.asc",
        "samples": "data/eyedata/asc/002_rlmw_samples.asc"
    },
    "003": {
        "events": "data/eyedata/asc/003_rlmw_events.asc",
        "samples": "data/eyedata/asc/003_rlmw_samples.asc"
    },
    "004": {
        "events": "data/eyedata/asc/004_rlmw_events.asc",
        "samples": "data/eyedata/asc/004_rlmw_samples.asc"
    },
    "005": {
        "events": "data/eyedata/asc/005_rlmw_events.asc",
        "samples": "data/eyedata/asc/005_rlmw_samples.asc"
    },
    "006": {
        "events": "data/eyedata/asc/006_rlmw_events.asc",
        "samples": "data/eyedata/asc/006_rlmw_samples.asc"
    },
    "007": {
        "events": "data/eyedata/asc/007_rlmw_events.asc",
        "samples": "data/eyedata/asc/007_rlmw_samples.asc"
    },
    "008": {
        "events": "data/eyedata/asc/008_rlmw_events.asc",
        "samples": "data/eyedata/asc/008_rlmw_samples.asc"
    },
    "009": {
        "events": "data/eyedata/asc/009_rlmw_events.asc",
        "samples": "data/eyedata/asc/009_rlmw_samples.asc"
    },
    "010": {
        "events": "data/eyedata/asc/010_rlmw_events.asc",
        "samples": "data/eyedata/asc/010_rlmw_samples.asc"
    },
    "011": {
        "events": "data/eyedata/asc/011_rlmw_events.asc",
        "samples": "data/eyedata/asc/011_rlmw_samples.asc"
    },
    "012": {
        "events": "data/eyedata/asc/012_rlmw_events.asc",
        "samples": "data/eyedata/asc/012_rlmw_samples.asc"
    },
    "013": {
        "events": "data/eyedata/asc/013_rlmw_events.asc",
        "samples": "data/eyedata/asc/013_rlmw_samples.asc"
    },
    "014": {
        "events": "data/eyedata/asc/014_rlmw_events.asc",
        "samples": "data/eyedata/asc/014_rlmw_samples.asc"
    },
    "015": {
        "events": "data/eyedata/asc/015_rlmw_events.asc",
        "samples": "data/eyedata/asc/015_rlmw_samples.asc"
    },
    "016": {
        "events": "data/eyedata/asc/016_rlmw_events.asc",
        "samples": "data/eyedata/asc/016_rlmw_samples.asc"
    },
    "017": {
        "events": "data/eyedata/asc/017_rlmw_events.asc",
        "samples": "data/eyedata/asc/017_rlmw_samples.asc"
    },
    "018": {
        "events": "data/eyedata/asc/018_rlmw_events.asc",
        "samples": "data/eyedata/asc/018_rlmw_samples.asc"
    },
    "019": {
        "events": "data/eyedata/asc/019_rlmw_events.asc",
        "samples": "data/eyedata/asc/019_rlmw_samples.asc"
    },
    "020": {
        "events": "data/eyedata/asc/020_rlmw_events.asc",
        "samples": "data/eyedata/asc/020_rlmw_samples.asc"
    },
    "021": {
        "events": "data/eyedata/asc/021_rlmw_events.asc",
        "samples": "data/eyedata/asc/021_rlmw_samples.asc"
    },
    "022": {
        "events": "data/eyedata/asc/022_rlmw_events.asc",
        "samples": "data/eyedata/asc/022_rlmw_samples.asc"
    },
    "023": {
        "events": "data/eyedata/asc/023_rlmw_events.asc",
        "samples": "data/eyedata/asc/023_rlmw_samples.asc"
    },
    "024": {
        "events": "data/eyedata/asc/024_rlmw_events.asc",
        "samples": "data/eyedata/asc/024_rlmw_samples.asc"
    },
    "025": {
        "events": "data/eyedata/asc/025_rlmw_events.asc",
        "samples": "data/eyedata/asc/025_rlmw_samples.asc"
    },
    "026": {
        "events": "data/eyedata/asc/026_rlmw_events.asc",
        "samples": "data/eyedata/asc/026_rlmw_samples.asc"
    },
    "027": {
        "events": "data/eyedata/asc/027_rlmw_events.asc",
        "samples": "data/eyedata/asc/027_rlmw_samples.asc"
    },
    "028": {
        "events": "data/eyedata/asc/028_rlmw_events.asc",
        "samples": "data/eyedata/asc/028_rlmw_samples.asc"
    },
    "029": {
        "events": "data/eyedata/asc/029_rlmw_events.asc",
        "samples": "data/eyedata/asc/029_rlmw_samples.asc"
    },
    "030": {
        "events": "data/eyedata/asc/030_rlmw_events.asc",
        "samples": "data/eyedata/asc/030_rlmw_samples.asc"
    },
    "031": {
        "events": "data/eyedata/asc/031_rlmw_events.asc",
        "samples": "data/eyedata/asc/031_rlmw_samples.asc"
    },
    "032": {
        "events": "data/eyedata/asc/032_rlmw_events.asc",
        "samples": "data/eyedata/asc/032_rlmw_samples.asc"
    },
    "033": {
        "events": "data/eyedata/asc/033_rlmw_events.asc",
        "samples": "data/eyedata/asc/033_rlmw_samples.asc"
    },
    "034": {
        "events": "data/eyedata/asc/034_rlmw_events.asc",
        "samples": "data/eyedata/asc/034_rlmw_samples.asc"
    },
    "035": {
        "events": "data/eyedata/asc/035_rlmw_events.asc",
        "samples": "data/eyedata/asc/035_rlmw_samples.asc"
    },
    "036": {
        "events": "data/eyedata/asc/036_rlmw_events.asc",
        "samples": "data/eyedata/asc/036_rlmw_samples.asc"
    },
    "037": {
        "events": "data/eyedata/asc/037_rlmw_events.asc",
        "samples": "data/eyedata/asc/037_rlmw_samples.asc"
    },
    "038": {
        "events": "data/eyedata/asc/038_rlmw_events.asc",
        "samples": "data/eyedata/asc/038_rlmw_samples.asc"
    },
    "039": {
        "events": "data/eyedata/asc/039_rlmw_events.asc",
        "samples": "data/eyedata/asc/039_rlmw_samples.asc"
    },
    "040": {
        "events": "data/eyedata/asc/040_rlmw_events.asc",
        "samples": "data/eyedata/asc/040_rlmw_samples.asc"
    },
    "041": {
        "events": "data/eyedata/asc/041_rlmw_events.asc",
        "samples": "data/eyedata/asc/041_rlmw_samples.asc"
    },
    "042": {
        "events": "data/eyedata/asc/042_rlmw_events.asc",
        "samples": "data/eyedata/asc/042_rlmw_samples.asc"
    },
    "043": {
        "events": "data/eyedata/asc/043_rlmw_events.asc",
        "samples": "data/eyedata/asc/043_rlmw_samples.asc"
    },
    "044": {
        "events": "data/eyedata/asc/044_rlmw_events.asc",
        "samples": "data/eyedata/asc/044_rlmw_samples.asc"
    },
    "045": {
        "events": "data/eyedata/asc/045_rlmw_events.asc",
        "samples": "data/eyedata/asc/045_rlmw_samples.asc"
    },
    "046": {
        "events": "data/eyedata/asc/046_rlmw_events.asc",
        "samples": "data/eyedata/asc/046_rlmw_samples.asc"
    },
    "047": {
        "events": "data/eyedata/asc/047_rlmw_events.asc",
        "samples": "data/eyedata/asc/047_rlmw_samples.asc"
    },
    "048": {
        "events": "data/eyedata/asc/048_rlmw_events.asc",
        "samples": "data/eyedata/asc/048_rlmw_samples.asc"
    },
    "049": {
        "events": "data/eyedata/asc/049_rlmw_events.asc",
        "samples": "data/eyedata/asc/049_rlmw_samples.asc"
    },
    "050": {
        "events": "data/eyedata/asc/050_rlmw_events.asc",
        "samples": "data/eyedata/asc/050_rlmw_samples.asc"
    }
}


# write down notes about each subject when going through the preprocs
notes={
    "001":"good",
    "002":"good but many blinks",
    "003":"ok, but some segments with many blinks (min 8, 12, ...)",
    "004":"ok, beginning is crap, many 'double blinks', a few 'dip blinks' which don't go all the way to zero but filter is ok",
    "005":"ok, some pretty long blinks and some dips but recovery mostly ok",
    "006":"ok, near ideal in the beginning, more blinks later",
    "007":"ok",
    "008":"ok but problems around 8.7-14 mins",
    "009":"ok, many multi-blinks but good recovery",
    "010":"ok, very slow opening of eye, difficult to correct - used rather large margin",
    "011":"very nice and regular blinking",
    "012":"ok, there are some weird 'spikes' upwards in the data, saccades? filter seems ok",
    "013":"ok",
    "014":"nice and regular",
    "015":"ok but not great, especially later parts",
    "016":"ok but not great",
    "017":"ok",
    "018":"consider exclusion, another one with slow opening of eye, used large margin, lots of missings",
    "019":"amazing, almost no blinks. Some spikes but filtered out ok",
    "020":"ok",
    "021":"ok, somewhat more messy during second half",
    "022":"ok",
    "023":"ok",
    "024":"ok but not great, some double-blinks",
    "025":"ok",
    "026":"ok",
    "027":"ok but data quality in last part is getting worse",
    "028":"ok",
    "029":"consider exclusion, pretty bad",
    "030":"ok",
    "031":"ok but not great",
    "032":"consider exclusion, but not super bad",
    "033":"ok",
    "034":"ok",
    "035":"ok but not great",
    "036":"ok; very many but very short blinks... signal looks ok, though",
    "037":"ok",
    "038":"ok",
    "039":"exclude: ok quality but saccades are super prominent in this subject",
    "040":"ok",
    "041":"ok",
    "042":"ok",
    "043":"ok, many blinks but ok recovery",
    "044":"exclude: not great qual and saccades are prominent",
    "045":"ok",
    "046":"ok; qual not great but preproc seems to deal with it",
    "047":"exclude: getting a lot worse at the end",
    "048":"ok, slow opening of eye, used large margin, but signal looks ok",
    "049":"exclude: huge saccades",
    "050":"ok"
}
exclude = ["018", "029", "032", "039", "044", "047", "049"]

# Function used to read the data files of a single subject
def read_subject(info):
    """
    Read the data for a single subject. Input is each element of `raw_data`.
    """
    ## loading the raw samples from the asc file
    fname_samples=os.path.join(info["samples"])
    df=pd.read_table(fname_samples, index_col=False,
                    names=["time", "left_x", "left_y", "left_p",
                            "right_x", "right_y", "right_p"])

    ## Eyelink tracker puts "   ." when no data is available for x/y coordinates
    left_x=df.left_x.values
    left_x[left_x=="   ."] = np.nan
    left_x = left_x.astype(float)

    left_y=df.left_y.values
    left_y[left_y=="   ."] = np.nan
    left_y = left_y.astype(float)

    right_x=df.right_x.values
    right_x[right_x=="   ."] = np.nan
    right_x = right_x.astype(float)

    right_y=df.right_y.values
    right_y[right_y=="   ."] = np.nan
    right_y = right_y.astype(float)

    ## Loading the events from the events file
    fname_events=os.path.join(info["events"])
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev[[1,2]]
    df_ev.columns=["time", "event"]

    # Creating EyeData object that contains both X-Y coordinates
    # and pupil data
    d = pp.EyeData(time=df.time, name=info["subject"],
                screen_resolution=study_info["screen_resolution"],
                physical_screen_size=study_info["physical_screen_size"],
                screen_eye_distance=study_info["screen_eye_distance"],
                left_x=left_x, left_y=left_y, left_pupil=df.left_p,
                right_x=right_x, right_y=right_y, right_pupil=df.right_p,
                event_onsets=df_ev.time, event_labels=df_ev.event, notes=notes[info["subject"]],
                keep_orig=True)\
                .reset_time()
    d.set_experiment_info(screen_eye_distance=study_info["screen_eye_distance"],
                        screen_resolution=study_info["screen_resolution"],
                        physical_screen_size=study_info["physical_screen_size"])
    return d

This configuration file belongs to a real study that was shared on OSF. You can download the corresponding data using the following code:

from pypillometry import load_study_osf
study_data = load_study_osf("ca95r", path="./data")

The data will be downloaded and cached in the ./data directory and study_data will be a dictionary mapping subject IDs to their data.
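
The example configuration file defines an exclude list, but this page does not state whether load_study_osf() applies it automatically. Assuming it does not, the returned dictionary can be filtered by hand. The sketch below uses placeholder strings instead of real data objects so that it runs without a download; drop_excluded() is a hypothetical helper, not a pypillometry function.

```python
def drop_excluded(study_data, exclude):
    """Return a copy of study_data without the excluded subject IDs."""
    excluded = set(exclude)
    return {sid: data for sid, data in study_data.items()
            if sid not in excluded}


# Placeholder values standing in for the per-subject data objects
study_data = {"017": "data-017", "018": "data-018", "019": "data-019"}
exclude = ["018", "029"]  # IDs missing from study_data are simply ignored

kept = drop_excluded(study_data, exclude)
print(sorted(kept))
```

Returning a new dictionary leaves the original study_data untouched, so the excluded subjects remain available for inspection.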