Sharing/Importing study data from OSF
pypillometry provides functionality to share and import study data via the Open Science Framework (OSF). The load_study_osf()
function allows you to download and load study data that has been shared on OSF.
Sharing Your Study
To share your study on OSF:
1. Create a new project on OSF
2. Upload your study data files and configuration file (pypillometry_conf.py) to the project
3. Note down your project's OSF ID (found in the project URL)
The configuration file (pypillometry_conf.py) should define:
- raw_data: a dictionary mapping subject IDs to their data files
- read_subject(): a function that processes the raw data files
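A minimal skeleton of such a configuration file might look as follows (the file paths and the placeholder return value are illustrative; a real read_subject() would parse the files and return a loaded data object, as in the full example further down):

```python
# pypillometry_conf.py -- minimal illustrative skeleton

# raw_data: maps each subject ID to the files needed to load that subject
raw_data = {
    "sub01": {"samples": "data/sub01_samples.asc"},
    "sub02": {"samples": "data/sub02_samples.asc"},
}

def read_subject(info):
    """Turn one entry of `raw_data` into a loaded data object.

    `info` is the per-subject dict of file paths; a real implementation
    would parse the files and return e.g. a pypillometry EyeData object.
    """
    return {"files": info}  # placeholder return value
```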
Loading Shared Data
To load a shared study, use load_study_osf():

from pypillometry.io import load_study_osf

# Load all subjects
study_data = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path"
)

# Load specific subjects
study_data = load_study_osf(
    osf_id="your_project_id",
    path="local/cache/path",
    subjects=["sub01", "sub02"]
)
Parameters
- osf_id (str): The OSF project ID
- path (str): Local path where files should be downloaded/stored
- subjects (list[str], optional): List of specific subject IDs to load
- force_download (bool, optional): Force re-download of files even if they exist locally
The function will:
1. Download the project's configuration file
2. Download the required data files for each subject
3. Process the data using the configuration's read_subject() function
4. Return a dictionary mapping subject IDs to their processed data
Files are cached locally in the specified path to avoid repeated downloads.
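The caching behaviour can be pictured with a small download-if-missing helper (a sketch of the general pattern; load_study_osf's actual internals may differ):

```python
import os

def fetch_cached(path, filename, download):
    """Call `download(dest)` only if `dest` does not already exist.

    `download` stands in for the actual OSF transfer; the result is
    kept under `path` so repeated loads skip the network entirely.
    """
    dest = os.path.join(path, filename)
    if not os.path.exists(dest):
        os.makedirs(path, exist_ok=True)
        download(dest)
    return dest
```

Passing force_download=True to load_study_osf() corresponds to skipping the existence check and always re-fetching.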
Example configuration file
Here is an example configuration file (pypillometry_conf.py) that could be used to share a study:
"""
Configuration file for the RLMW study.
This file contains information about the raw data files and how to read them.
"""
import pypillometry as pp
import pandas as pd
import os
import numpy as np


# Additional study metadata
study_info = {
    "name": "RLMW Study",
    "osf_id": "ca95r",
    "description": "Reinforcement learning study with mind wandering probes",
    "author": "Matthias Mittner",
    "doi": "",
    "date": "2024-04-10",
    "sampling_rate": 1000.0,  # Hz
    "time_unit": "ms",
    "screen_eye_distance": 60,  # cm (distance between screen and eye)
    "screen_resolution": (1280, 1024),  # pixels (width, height)
    "physical_screen_size": (30, 20)  # cm (width, height)
}


# Dictionary of raw data files to be downloaded
# Keys are participant IDs, values are dictionaries containing paths to .asc files
raw_data = {
    "001": {
        "events": "data/eyedata/asc/001_rlmw_events.asc",
        "samples": "data/eyedata/asc/001_rlmw_samples.asc"
    },
    "002": {
        "events": "data/eyedata/asc/002_rlmw_events.asc",
        "samples": "data/eyedata/asc/002_rlmw_samples.asc"
    },
    "003": {
        "events": "data/eyedata/asc/003_rlmw_events.asc",
        "samples": "data/eyedata/asc/003_rlmw_samples.asc"
    },
    "004": {
        "events": "data/eyedata/asc/004_rlmw_events.asc",
        "samples": "data/eyedata/asc/004_rlmw_samples.asc"
    },
    "005": {
        "events": "data/eyedata/asc/005_rlmw_events.asc",
        "samples": "data/eyedata/asc/005_rlmw_samples.asc"
    },
    "006": {
        "events": "data/eyedata/asc/006_rlmw_events.asc",
        "samples": "data/eyedata/asc/006_rlmw_samples.asc"
    },
    "007": {
        "events": "data/eyedata/asc/007_rlmw_events.asc",
        "samples": "data/eyedata/asc/007_rlmw_samples.asc"
    },
    "008": {
        "events": "data/eyedata/asc/008_rlmw_events.asc",
        "samples": "data/eyedata/asc/008_rlmw_samples.asc"
    },
    "009": {
        "events": "data/eyedata/asc/009_rlmw_events.asc",
        "samples": "data/eyedata/asc/009_rlmw_samples.asc"
    },
    "010": {
        "events": "data/eyedata/asc/010_rlmw_events.asc",
        "samples": "data/eyedata/asc/010_rlmw_samples.asc"
    },
    "011": {
        "events": "data/eyedata/asc/011_rlmw_events.asc",
        "samples": "data/eyedata/asc/011_rlmw_samples.asc"
    },
    "012": {
        "events": "data/eyedata/asc/012_rlmw_events.asc",
        "samples": "data/eyedata/asc/012_rlmw_samples.asc"
    },
    "013": {
        "events": "data/eyedata/asc/013_rlmw_events.asc",
        "samples": "data/eyedata/asc/013_rlmw_samples.asc"
    },
    "014": {
        "events": "data/eyedata/asc/014_rlmw_events.asc",
        "samples": "data/eyedata/asc/014_rlmw_samples.asc"
    },
    "015": {
        "events": "data/eyedata/asc/015_rlmw_events.asc",
        "samples": "data/eyedata/asc/015_rlmw_samples.asc"
    },
    "016": {
        "events": "data/eyedata/asc/016_rlmw_events.asc",
        "samples": "data/eyedata/asc/016_rlmw_samples.asc"
    },
    "017": {
        "events": "data/eyedata/asc/017_rlmw_events.asc",
        "samples": "data/eyedata/asc/017_rlmw_samples.asc"
    },
    "018": {
        "events": "data/eyedata/asc/018_rlmw_events.asc",
        "samples": "data/eyedata/asc/018_rlmw_samples.asc"
    },
    "019": {
        "events": "data/eyedata/asc/019_rlmw_events.asc",
        "samples": "data/eyedata/asc/019_rlmw_samples.asc"
    },
    "020": {
        "events": "data/eyedata/asc/020_rlmw_events.asc",
        "samples": "data/eyedata/asc/020_rlmw_samples.asc"
    },
    "021": {
        "events": "data/eyedata/asc/021_rlmw_events.asc",
        "samples": "data/eyedata/asc/021_rlmw_samples.asc"
    },
    "022": {
        "events": "data/eyedata/asc/022_rlmw_events.asc",
        "samples": "data/eyedata/asc/022_rlmw_samples.asc"
    },
    "023": {
        "events": "data/eyedata/asc/023_rlmw_events.asc",
        "samples": "data/eyedata/asc/023_rlmw_samples.asc"
    },
    "024": {
        "events": "data/eyedata/asc/024_rlmw_events.asc",
        "samples": "data/eyedata/asc/024_rlmw_samples.asc"
    },
    "025": {
        "events": "data/eyedata/asc/025_rlmw_events.asc",
        "samples": "data/eyedata/asc/025_rlmw_samples.asc"
    },
    "026": {
        "events": "data/eyedata/asc/026_rlmw_events.asc",
        "samples": "data/eyedata/asc/026_rlmw_samples.asc"
    },
    "027": {
        "events": "data/eyedata/asc/027_rlmw_events.asc",
        "samples": "data/eyedata/asc/027_rlmw_samples.asc"
    },
    "028": {
        "events": "data/eyedata/asc/028_rlmw_events.asc",
        "samples": "data/eyedata/asc/028_rlmw_samples.asc"
    },
    "029": {
        "events": "data/eyedata/asc/029_rlmw_events.asc",
        "samples": "data/eyedata/asc/029_rlmw_samples.asc"
    },
    "030": {
        "events": "data/eyedata/asc/030_rlmw_events.asc",
        "samples": "data/eyedata/asc/030_rlmw_samples.asc"
    },
    "031": {
        "events": "data/eyedata/asc/031_rlmw_events.asc",
        "samples": "data/eyedata/asc/031_rlmw_samples.asc"
    },
    "032": {
        "events": "data/eyedata/asc/032_rlmw_events.asc",
        "samples": "data/eyedata/asc/032_rlmw_samples.asc"
    },
    "033": {
        "events": "data/eyedata/asc/033_rlmw_events.asc",
        "samples": "data/eyedata/asc/033_rlmw_samples.asc"
    },
    "034": {
        "events": "data/eyedata/asc/034_rlmw_events.asc",
        "samples": "data/eyedata/asc/034_rlmw_samples.asc"
    },
    "035": {
        "events": "data/eyedata/asc/035_rlmw_events.asc",
        "samples": "data/eyedata/asc/035_rlmw_samples.asc"
    },
    "036": {
        "events": "data/eyedata/asc/036_rlmw_events.asc",
        "samples": "data/eyedata/asc/036_rlmw_samples.asc"
    },
    "037": {
        "events": "data/eyedata/asc/037_rlmw_events.asc",
        "samples": "data/eyedata/asc/037_rlmw_samples.asc"
    },
    "038": {
        "events": "data/eyedata/asc/038_rlmw_events.asc",
        "samples": "data/eyedata/asc/038_rlmw_samples.asc"
    },
    "039": {
        "events": "data/eyedata/asc/039_rlmw_events.asc",
        "samples": "data/eyedata/asc/039_rlmw_samples.asc"
    },
    "040": {
        "events": "data/eyedata/asc/040_rlmw_events.asc",
        "samples": "data/eyedata/asc/040_rlmw_samples.asc"
    },
    "041": {
        "events": "data/eyedata/asc/041_rlmw_events.asc",
        "samples": "data/eyedata/asc/041_rlmw_samples.asc"
    },
    "042": {
        "events": "data/eyedata/asc/042_rlmw_events.asc",
        "samples": "data/eyedata/asc/042_rlmw_samples.asc"
    },
    "043": {
        "events": "data/eyedata/asc/043_rlmw_events.asc",
        "samples": "data/eyedata/asc/043_rlmw_samples.asc"
    },
    "044": {
        "events": "data/eyedata/asc/044_rlmw_events.asc",
        "samples": "data/eyedata/asc/044_rlmw_samples.asc"
    },
    "045": {
        "events": "data/eyedata/asc/045_rlmw_events.asc",
        "samples": "data/eyedata/asc/045_rlmw_samples.asc"
    },
    "046": {
        "events": "data/eyedata/asc/046_rlmw_events.asc",
        "samples": "data/eyedata/asc/046_rlmw_samples.asc"
    },
    "047": {
        "events": "data/eyedata/asc/047_rlmw_events.asc",
        "samples": "data/eyedata/asc/047_rlmw_samples.asc"
    },
    "048": {
        "events": "data/eyedata/asc/048_rlmw_events.asc",
        "samples": "data/eyedata/asc/048_rlmw_samples.asc"
    },
    "049": {
        "events": "data/eyedata/asc/049_rlmw_events.asc",
        "samples": "data/eyedata/asc/049_rlmw_samples.asc"
    },
    "050": {
        "events": "data/eyedata/asc/050_rlmw_events.asc",
        "samples": "data/eyedata/asc/050_rlmw_samples.asc"
    }
}


# write down notes about each subject when going through the preprocs
notes={
    "001":"good",
    "002":"good but many blinks",
    "003":"ok, but some segments with many blinks (min 8, 12, ...)",
    "004":"ok, beginning is crap, many 'double blinks', a few 'dip blinks' which don't go all the way to zero but filter is ok",
    "005":"ok, some pretty long blinks and some dips but recovery mostly ok",
    "006":"ok, near ideal in the beginning, more blinks later",
    "007":"ok",
    "008":"ok but problems around 8.7-14 mins",
    "009":"ok, many multi-blinks but good recovery",
    "010":"ok, very slow opening of eye, difficult to correct - used rather large margin",
    "011":"very nice and regular blinking",
    "012":"ok, there are some weird 'spikes' upwards in the data, saccades? filter seems ok",
    "013":"ok",
    "014":"nice and regular",
    "015":"ok but not great, especially later parts",
    "016":"ok but not great",
    "017":"ok",
    "018":"consider exclusion, another one with slow opening of eye, used large margin, lots of missings",
    "019":"amazing, almost no blinks. Some spikes but filtered out ok",
    "020":"ok",
    "021":"ok, somewhat more messy during second half",
    "022":"ok",
    "023":"ok",
    "024":"ok but not great, some double-blinks",
    "025":"ok",
    "026":"ok",
    "027":"ok but data quality in last part is getting worse",
    "028":"ok",
    "029":"consider exclusion, pretty bad",
    "030":"ok",
    "031":"ok but not great",
    "032":"consider exclusion, but not super bad",
    "033":"ok",
    "034":"ok",
    "035":"ok but not great",
    "036":"ok; very many but very short blinks... signal looks ok, though",
    "037":"ok",
    "038":"ok",
    "039":"exclude: ok quality but saccades are super prominent in this subject",
    "040":"ok",
    "041":"ok",
    "042":"ok",
    "043":"ok, many blinks but ok recovery",
    "044":"exclude: not great qual and saccades are prominent",
    "045":"ok",
    "046":"ok; qual not great but preproc seems to deal with it",
    "047":"exclude: getting a lot worse at the end",
    "048":"ok, slow opening of eye, used large margin, but signal looks ok",
    "049":"exclude: huge saccades",
    "050":"ok"
}
exclude = ["018", "029", "032", "039", "044", "047", "049"]

# Function used for reading the raw data files of a single subject
def read_subject(info):
    """
    Read the data for a single subject. Input is each element of `raw_data`.
    """
    ## loading the raw samples from the asc file
    fname_samples=os.path.join(info["samples"])
    df=pd.read_table(fname_samples, index_col=False,
                     names=["time", "left_x", "left_y", "left_p",
                            "right_x", "right_y", "right_p"])

    ## Eyelink tracker puts " ." when no data is available for x/y coordinates
    left_x=df.left_x.values
    left_x[left_x==" ."] = np.nan
    left_x = left_x.astype(float)

    left_y=df.left_y.values
    left_y[left_y==" ."] = np.nan
    left_y = left_y.astype(float)

    right_x=df.right_x.values
    right_x[right_x==" ."] = np.nan
    right_x = right_x.astype(float)

    right_y=df.right_y.values
    right_y[right_y==" ."] = np.nan
    right_y = right_y.astype(float)

    ## Loading the events from the events file
    fname_events=os.path.join(info["events"])
    # read the whole file into variable `events` (list with one entry per line)
    with open(fname_events) as f:
        events=f.readlines()

    # keep only lines starting with "MSG"
    events=[ev for ev in events if ev.startswith("MSG")]
    experiment_start_index=np.where(["experiment_start" in ev for ev in events])[0][0]
    events=events[experiment_start_index+1:]
    df_ev=pd.DataFrame([ev.split() for ev in events])
    df_ev=df_ev[[1,2]]
    df_ev.columns=["time", "event"]

    # Creating EyeData object that contains both X-Y coordinates
    # and pupil data
    d = pp.EyeData(time=df.time, name=info["subject"],
                   screen_resolution=study_info["screen_resolution"],
                   physical_screen_size=study_info["physical_screen_size"],
                   screen_eye_distance=study_info["screen_eye_distance"],
                   left_x=left_x, left_y=left_y, left_pupil=df.left_p,
                   right_x=right_x, right_y=right_y, right_pupil=df.right_p,
                   event_onsets=df_ev.time, event_labels=df_ev.event,
                   notes=notes[info["subject"]],
                   keep_orig=True)\
        .reset_time()
    d.set_experiment_info(screen_eye_distance=study_info["screen_eye_distance"],
                          screen_resolution=study_info["screen_resolution"],
                          physical_screen_size=study_info["physical_screen_size"])
    return d
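The per-column " ."-to-NaN conversion inside read_subject() can also be written more compactly with pandas (an equivalent sketch, not part of the shipped file; the sample values below are made up):

```python
import numpy as np
import pandas as pd

# Eyelink ASCII exports write " ." where no sample is available;
# to_numeric(errors="coerce") maps any unparseable entry to NaN in one step.
s = pd.Series(["512.3", " .", "514.0"])
vals = pd.to_numeric(s, errors="coerce").to_numpy()
# vals is a float array with NaN at the missing position
```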
The configuration file above comes from a real study shared on OSF; you can download the corresponding data using the following code:

from pypillometry import load_study_osf
study_data = load_study_osf("ca95r", path="./data")

The data will be downloaded and cached in the ./data directory, and study_data will be a dictionary mapping subject IDs to their data.
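Since the configuration file also ships an exclude list, a natural follow-up after loading is to drop those subjects from the returned dictionary (a sketch; study_data stands for the dict returned by load_study_osf, and the exclude list is copied from the example above):

```python
# exclude list as defined in the example pypillometry_conf.py
exclude = ["018", "029", "032", "039", "044", "047", "049"]

def drop_excluded(study_data, exclude):
    """Return a copy of study_data without the excluded subject IDs."""
    return {sid: d for sid, d in study_data.items() if sid not in exclude}
```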