Dataset setup
March 4, 2025
- to quickstart, just download the responses / wordsequences for 3 subjects from the encoding scaling laws paper
- this is all the data you need if you only want to analyze 3 subjects and don't want to make flatmaps
- to run Eng1000, need to grab the `em_data` directory from here and move its contents to `{root_dir}/em_data`
- for more, download data with `python experiments/00_load_dataset.py`
  - this creates a `data` dir under wherever you run it and will use datalad to download the preprocessed data as well as the feature spaces needed for fitting semantic encoding models
- to make flatmaps, need to set the [pycortex filestore] to `{root_dir}/ds003020/derivative/pycortex-db/`
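Before fitting anything, it can help to sanity-check that the downloads landed where the loaders expect them. A minimal sketch, assuming the directory layout listed above; the helper name `missing_data_dirs` is an illustration, not part of the package:

```python
import tempfile
from pathlib import Path

# Expected subdirectories under root_dir, mirroring the paths listed above.
EXPECTED_SUBDIRS = [
    "ds003020/derivative/preprocessed_data",
    "ds003020/derivative/TextGrids",
    "ds003020/derivative/pycortex-db",
    "em_data",
]


def missing_data_dirs(root_dir):
    """Return the expected subdirectories that are absent under root_dir."""
    root = Path(root_dir)
    return [d for d in EXPECTED_SUBDIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Demo against a throwaway directory: everything is missing until created.
    root = tempfile.mkdtemp()
    assert missing_data_dirs(root) == EXPECTED_SUBDIRS
    for d in EXPECTED_SUBDIRS:
        (Path(root) / d).mkdir(parents=True)
    assert missing_data_dirs(root) == []
```

Running this against your `root_dir` before a long fitting job gives a quick list of anything datalad failed to fetch.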
Code install
- `pip install ridge_utils` (for full control, could alternatively `pip install -e ridge_utils_frozen` from the repo directory)
- `pip install -e .` from the repo directory to locally install the `neuro` package
- set `neuro.config.root_dir/data` to where you put all the data
- loading responses
  - `neuro.data.response_utils`, function `load_response`
  - loads responses from `{neuro.config.root_dir}/ds003020/derivative/preprocessed_data/{subject}`, where they are stored in an h5 file for each story, e.g. `wheretheressmoke.h5`
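A minimal sketch of reading one story's responses from that layout, assuming each h5 file holds a single (TRs x voxels) array; the dataset key `"data"` and the function name are assumptions for illustration, not the package's actual API:

```python
import tempfile

import h5py
import numpy as np


def load_story_response(preproc_dir, story, key="data"):
    """Load the (n_TRs, n_voxels) response array for one story's h5 file."""
    with h5py.File(f"{preproc_dir}/{story}.h5", "r") as f:
        return f[key][:]


if __name__ == "__main__":
    # Demo: round-trip a dummy story file in a temp directory.
    d = tempfile.mkdtemp()
    resp = np.random.randn(300, 10)  # e.g. 300 TRs x 10 voxels
    with h5py.File(f"{d}/wheretheressmoke.h5", "w") as f:
        f.create_dataset("data", data=resp)
    loaded = load_story_response(d, "wheretheressmoke")
    assert np.allclose(loaded, resp)
```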
- loading stimulus
  - `ridge_utils.features.stim_utils`, function `load_story_wordseqs`
  - loads textgrids from `{root_dir}/ds003020/derivative/TextGrids`, where each story has a TextGrid file, e.g. `wheretheressmoke.TextGrid`
  - uses `{root_dir}/ds003020/derivative/respdict.json` to get the length of each story
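The `respdict.json` lookup can be sketched as follows, assuming the file maps each story name to its response length in TRs (the exact schema is an assumption for illustration):

```python
import json
import tempfile
from pathlib import Path


def load_story_lengths(root_dir):
    """Read respdict.json, assumed to map story name -> response length (TRs)."""
    path = Path(root_dir) / "ds003020" / "derivative" / "respdict.json"
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Demo: write a dummy respdict.json into a temp root_dir and read it back.
    root = Path(tempfile.mkdtemp())
    (root / "ds003020" / "derivative").mkdir(parents=True)
    (root / "ds003020" / "derivative" / "respdict.json").write_text(
        json.dumps({"wheretheressmoke": 291})
    )
    assert load_story_lengths(root) == {"wheretheressmoke": 291}
```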
- fitting encoding models
  - `python experiments/02_fit_encoding.py`
  - this script takes many relevant arguments through argparse
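A hedged sketch of what such an argparse interface might look like; the flag names, defaults, and subject ids below are hypothetical illustrations, not the script's actual arguments (check `python experiments/02_fit_encoding.py --help` for the real ones):

```python
import argparse


def build_parser():
    """Illustrative subset of flags an encoding-fitting script might expose."""
    parser = argparse.ArgumentParser(description="Fit a semantic encoding model")
    parser.add_argument("--subject", default="UTS03", help="subject id")
    parser.add_argument("--feature_space", default="eng1000",
                        help="stimulus feature space to use")
    parser.add_argument("--ndelays", type=int, default=4,
                        help="number of FIR delays to model the HRF")
    return parser


# Parsing an explicit argument list (rather than sys.argv) for the demo:
args = build_parser().parse_args(["--subject", "UTS01", "--ndelays", "8"])
# args.subject == "UTS01", args.feature_space == "eng1000", args.ndelays == 8
```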
Reference
- builds on the dataset repo from the Huth lab. See that wonderful repo!
- uses data from OpenNeuro.
- builds on code from encoding-model-scaling-laws, the repo for the paper "Scaling laws for language encoding models in fMRI" (Antonello, Vaidya, & Huth, 2023). See the cool results there!
- also copies a lot of code from the repo for SASC