Input/Output Module

The spine.io module handles data ingress and egress for SPINE jobs. It provides readers and writers for event data, parsers that translate raw storage products into SPINE parser objects, and the dataset/collation tools used during model training and inference.

Readers and writers are storage-format adapters. Parsers convert source formats into framework-neutral parser products. Datasets, collation, augmentation, and sampling are generic data pipeline tools used by training and inference configurations.

Overview

The I/O layer is organized into a few cooperating pieces:

  • Readers expose event products from on-disk formats such as HDF5 and LArCV.

  • Writers persist flat outputs and staged cache products.

  • Parsers convert raw reader outputs into SPINE parser products used by downstream code.

  • Datasets and pipeline utilities bridge readers/parsers into PyTorch data loading workflows.

The I/O layer is the first stage of the driver pipeline: it is where external detector data is mapped into SPINE’s internal data structures.
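The division of labor between readers, parsers, and datasets can be pictured with a minimal, self-contained sketch. It does not use the SPINE API: `ToyReader`, `parse_points`, `PointCloud`, `ToyDataset`, and the product key `points` are hypothetical names, and an in-memory list stands in for an HDF5 or LArCV file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PointCloud:
    """Hypothetical parser product: voxel coordinates and their values."""
    coords: List[tuple]
    values: List[float]


class ToyReader:
    """Hypothetical reader: exposes raw event products by entry index."""

    def __init__(self, events: List[dict]):
        self._events = events  # stand-in for an on-disk file

    def __len__(self) -> int:
        return len(self._events)

    def get(self, index: int) -> dict:
        return self._events[index]


def parse_points(raw: dict) -> PointCloud:
    """Hypothetical parser: converts one raw product into a PointCloud."""
    rows = raw["points"]  # assumed layout: each row is (x, y, z, value)
    return PointCloud(coords=[tuple(r[:3]) for r in rows],
                      values=[float(r[3]) for r in rows])


class ToyDataset:
    """Map-style dataset: reads one entry, then applies each parser."""

    def __init__(self, reader: ToyReader, schema: Dict[str, Callable]):
        self.reader = reader
        self.schema = schema  # output name -> parser function

    def __len__(self) -> int:
        return len(self.reader)

    def __getitem__(self, index: int) -> dict:
        raw = self.reader.get(index)
        return {name: parser(raw) for name, parser in self.schema.items()}


events = [{"points": [(0, 1, 2, 0.5), (3, 4, 5, 1.5)]},
          {"points": [(6, 7, 8, 2.0)]}]
dataset = ToyDataset(ToyReader(events), schema={"input": parse_points})
sample = dataset[0]
```

Because `ToyDataset` implements `__len__` and `__getitem__`, an object of this shape follows the map-style dataset protocol that PyTorch data loading expects; the real SPINE datasets may differ in how they are configured and what they return.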

File Readers

read.HDF5Reader([file_keys, file_list, ...])

Class that reads information stored in HDF5 files.

read.LArCVReader([file_keys, file_list, ...])

Class that reads information stored in LArCV files.

read.StageHDF5Reader([stage, file_keys, ...])

Read products stored under one or more stage groups in a cache file.

File Writers

write.HDF5Writer([file_name, directory, ...])

Writes data to an HDF5 file.

write.CSVWriter([file_name, directory, ...])

Writes data to a CSV file with optimized performance.

write.StageHDF5Writer([file_name, ...])

Write additive stage caches to one HDF5 file per source file.

Datasets

The dataset layer bridges low-level readers and parser logic into PyTorch Dataset objects. The staged cache workflow is exposed through the HDF5 dataset and the mixed LArCV/HDF5 dataset.

dataset.LArCVDataset(schema, dtype[, ...])

Torch dataset that parses LArCV entries into SPINE products.

dataset.HDF5Dataset([dtype, staged, stage, ...])

Torch dataset wrapper around flat or staged HDF5 readers.

dataset.MixedDataset(larcv, hdf5, dtype[, ...])

Torch dataset that merges aligned samples from LArCV and HDF5.
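One way to picture "merging aligned samples" is a dataset that reads entry i from two sources and combines the resulting dictionaries. The sketch below is an illustration under stated assumptions, not the `MixedDataset` implementation: the class name `AlignedMergeDataset`, the `entry_id` key used to check alignment, and the product names are all hypothetical.

```python
from typing import Sequence


class AlignedMergeDataset:
    """Hypothetical dataset that merges entry i of two aligned sources."""

    def __init__(self, first: Sequence[dict], second: Sequence[dict]):
        if len(first) != len(second):
            raise ValueError("sources must have the same number of entries")
        self.first = first
        self.second = second

    def __len__(self) -> int:
        return len(self.first)

    def __getitem__(self, index: int) -> dict:
        a, b = self.first[index], self.second[index]
        # Assumed alignment check: both sources carry the same entry id.
        if a["entry_id"] != b["entry_id"]:
            raise ValueError(f"entry {index} is misaligned")
        # Refuse to silently overwrite a product present in both sources.
        clash = (a.keys() & b.keys()) - {"entry_id"}
        if clash:
            raise KeyError(f"duplicate product names: {sorted(clash)}")
        return {**a, **b}


raw_source = [{"entry_id": 0, "input": [1.0, 2.0]},
              {"entry_id": 1, "input": [3.0]}]
cached_source = [{"entry_id": 0, "clusters": [[0, 1]]},
                 {"entry_id": 1, "clusters": [[0]]}]
merged = AlignedMergeDataset(raw_source, cached_source)
```

Failing loudly on a length mismatch, a misaligned entry, or a duplicated product name is a design choice of this sketch; it keeps a stale cache from being paired with the wrong source events without notice.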

Parsers

Parsers translate raw reader outputs into framework-neutral parser products. The HDF5 parser layer includes generic tensor, index, and object parsers for cached data products.

parse.base

Shared parser base classes and input-data plumbing.

parse.data

Data structures used as canonical outputs of IO parsers.

parse.clean_data

Module which contains functions used to clean up cluster data.

parse.hdf5.tensor

Lightweight parsers for cached HDF5 tensor products.

parse.hdf5.index

Lightweight parsers for cached HDF5 index products.

parse.hdf5.object

Lightweight parsers for cached HDF5 object products.

parse.larcv.misc

Module that contains parsers that do not fit in other categories.

parse.larcv.sparse

Module that contains all parsers related to LArCV sparse data.

parse.larcv.cluster

Module that contains all parsers related to LArCV cluster data.

parse.larcv.particle

Module that contains all parsers related to LArCV particle data.

Data Pipeline Utilities

Tools for dataset preparation, augmentation, collation, and sampling.

collate

Contains implementations of data collation classes.

sample

Defines which dataset entries to load at each iteration.

augment

Data augmentation managers and modules.

overlay

Module with methods to overlay multiple events.

factories

Functions that instantiate IO tools from configuration blocks.
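The roles of sampling, collation, and factories can be sketched together in plain Python. Everything here is hypothetical and independent of the SPINE API: the sampler classes, the `collate` function, the `SAMPLERS` registry, the `name` key in the configuration block, and the convention of prepending a batch index to each row are assumptions made for illustration.

```python
import random
from typing import Dict, Iterator, List


class SequentialSampler:
    """Hypothetical sampler: yields entry indices in order."""

    def __init__(self, num_entries: int):
        self.num_entries = num_entries

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.num_entries))


class ShuffledSampler(SequentialSampler):
    """Hypothetical sampler: yields a seeded random permutation."""

    def __init__(self, num_entries: int, seed: int = 0):
        super().__init__(num_entries)
        self.seed = seed

    def __iter__(self) -> Iterator[int]:
        order = list(range(self.num_entries))
        random.Random(self.seed).shuffle(order)
        return iter(order)


def collate(samples: List[Dict[str, list]]) -> Dict[str, list]:
    """Stack per-entry rows and prepend the batch index to each row."""
    batch: Dict[str, list] = {}
    for batch_id, sample in enumerate(samples):
        for key, rows in sample.items():
            batch.setdefault(key, []).extend(
                [batch_id, *row] for row in rows)
    return batch


# Registry used by the factory: configuration name -> class.
SAMPLERS = {"sequential": SequentialSampler, "shuffled": ShuffledSampler}


def sampler_factory(cfg: dict, num_entries: int):
    """Instantiate a sampler from a configuration block."""
    cfg = dict(cfg)         # copy so the caller's block is not mutated
    name = cfg.pop("name")  # remaining keys are constructor arguments
    if name not in SAMPLERS:
        raise ValueError(f"unknown sampler: {name!r}")
    return SAMPLERS[name](num_entries, **cfg)


data = [{"points": [[0, 0, 0]]}, {"points": [[1, 1, 1], [2, 2, 2]]}]
sampler = sampler_factory({"name": "sequential"}, num_entries=len(data))
batch = collate([data[i] for i in sampler])
```

Entries of different sizes cannot be stacked into a fixed-shape array, so this sketch concatenates their rows and keeps a batch index in the first column to record which entry each row came from.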