Input/Output Module

The spine.io module handles data ingress and egress for SPINE jobs. It provides readers and writers for event data, parsers that translate raw storage products into SPINE parser products, and the dataset and collation tools used during model training and inference.

Readers and writers are storage-format adapters. Parsers convert source formats into framework-neutral parser products. Datasets, collation, augmentation, and sampling are generic data pipeline tools used by training and inference configurations.

Overview

The I/O layer is organized into a few cooperating pieces:

Readers expose event products from on-disk formats such as HDF5 and LArCV.
Writers persist flat outputs and staged cache products.
Parsers convert raw reader outputs into SPINE parser products used by downstream code.
Datasets and pipeline utilities bridge readers/parsers into PyTorch data loading workflows.

The I/O layer is the first stage of the driver pipeline. It is the point where external detector data is mapped into SPINE’s internal data structures.
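The reader, parser, and dataset roles described above can be sketched with plain Python stand-ins. This is a minimal illustration only: `ToyReader`, `toy_parser`, `ToyProduct`, and `ToyDataset` are hypothetical names invented for this sketch and are not part of the spine.io API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyProduct:
    """Hypothetical stand-in for a parser product."""
    name: str
    values: List[float]


class ToyReader:
    """Hypothetical reader: exposes raw entries by index."""

    def __init__(self, entries: List[Dict[str, List[float]]]):
        self._entries = entries

    def __len__(self) -> int:
        return len(self._entries)

    def __getitem__(self, idx: int) -> Dict[str, List[float]]:
        return self._entries[idx]


def toy_parser(raw: Dict[str, List[float]], key: str) -> ToyProduct:
    """Hypothetical parser: turns one raw field into a product."""
    return ToyProduct(name=key, values=list(raw[key]))


class ToyDataset:
    """Hypothetical dataset: applies a parser to each requested raw key."""

    def __init__(self, reader: ToyReader, schema: Dict[str, str]):
        self.reader = reader
        self.schema = schema  # output name -> raw key (assumed layout)

    def __len__(self) -> int:
        return len(self.reader)

    def __getitem__(self, idx: int) -> Dict[str, ToyProduct]:
        raw = self.reader[idx]
        return {out: toy_parser(raw, key) for out, key in self.schema.items()}


reader = ToyReader([{"charge": [1.0, 2.0]}, {"charge": [3.0]}])
dataset = ToyDataset(reader, schema={"input_data": "charge"})
sample = dataset[1]
```

Keeping the three roles separate means that, in a layout like this one, a new storage format only needs a new reader, while the parser and dataset code stay unchanged.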

File Readers

`read.HDF5Reader`([file_keys, file_list, ...])	Class which reads information stored in HDF5 files.
`read.LArCVReader`([file_keys, file_list, ...])	Class which reads information stored in LArCV files.
`read.StageHDF5Reader`([stage, file_keys, ...])	Read products stored under one or more stage groups in a cache file.
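The readers accept a list of files. One way a reader could expose such a list as a single sequence of entries is to map a global entry index onto a file and a local index. The sketch below illustrates that bookkeeping only; `ToyMultiFileIndex` is a hypothetical helper, and the actual SPINE readers may organize their indexing differently.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class ToyMultiFileIndex:
    """Hypothetical helper: maps a global index to (file index, local index)."""

    def __init__(self, entries_per_file: List[int]):
        # Cumulative entry counts, e.g. [3, 2, 4] becomes [3, 5, 9]
        self._offsets = list(accumulate(entries_per_file))

    def __len__(self) -> int:
        return self._offsets[-1] if self._offsets else 0

    def locate(self, idx: int) -> Tuple[int, int]:
        if not 0 <= idx < len(self):
            raise IndexError(f"entry {idx} out of range")
        # First file whose cumulative count exceeds the global index
        file_idx = bisect_right(self._offsets, idx)
        start = self._offsets[file_idx - 1] if file_idx > 0 else 0
        return file_idx, idx - start


index = ToyMultiFileIndex([3, 2, 4])
location = index.locate(5)  # first entry of the third file
```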

File Writers

`write.HDF5Writer`([file_name, directory, ...])	Writes data to an HDF5 file.
`write.CSVWriter`([file_name, directory, ...])	Writes data to a CSV file with optimized performance.
`write.StageHDF5Writer`([file_name, ...])	Write additive stage caches to one HDF5 file per source file.
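As a rough illustration of what writing flat outputs involves, the sketch below appends dictionary rows to a CSV file using only the standard library. `ToyCSVWriter` is a hypothetical class; it borrows the `file_name` and `directory` parameter names from the signatures above, but their exact semantics in `write.CSVWriter` are an assumption here.

```python
import csv
import os
import tempfile
from typing import Dict, Optional


class ToyCSVWriter:
    """Hypothetical flat writer: header written once, then one row per call."""

    def __init__(self, file_name: str, directory: str):
        self.path = os.path.join(directory, file_name)
        self._handle = open(self.path, "w", newline="")
        self._writer: Optional[csv.DictWriter] = None

    def append(self, row: Dict[str, object]) -> None:
        if self._writer is None:
            # The first row fixes the column names for the whole file
            self._writer = csv.DictWriter(self._handle, fieldnames=list(row))
            self._writer.writeheader()
        self._writer.writerow(row)

    def close(self) -> None:
        self._handle.close()


with tempfile.TemporaryDirectory() as tmp:
    writer = ToyCSVWriter("log.csv", tmp)
    writer.append({"entry": 0, "loss": 0.5})
    writer.append({"entry": 1, "loss": 0.25})
    writer.close()
    with open(writer.path, newline="") as handle:
        rows = list(csv.DictReader(handle))
```

Holding one file handle open for the lifetime of the writer avoids reopening the file on every row, which is one simple way a CSV writer can keep per-entry overhead low.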

Datasets

The dataset layer bridges low-level readers and parser logic into PyTorch Dataset objects. The staged cache workflow is exposed through the HDF5 dataset and the mixed LArCV/HDF5 dataset.

`dataset.LArCVDataset`(schema, dtype[, ...])	Torch dataset that parses LArCV entries into SPINE products.
`dataset.HDF5Dataset`([dtype, staged, stage, ...])	Torch dataset wrapper around flat or staged HDF5 readers.
`dataset.MixedDataset`(larcv, hdf5, dtype[, ...])	Torch dataset that merges aligned samples from LArCV and HDF5.
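Merging aligned samples from two sources can be pictured as follows. The sketch uses the map-style protocol (`__len__` and `__getitem__`) that PyTorch datasets follow, but it does not import torch. `ToyMixedDataset` is hypothetical, and it assumes that the two sources store the same entries in the same order, which may not match how `dataset.MixedDataset` checks alignment.

```python
from typing import Any, Dict, Sequence


class ToyMixedDataset:
    """Hypothetical dataset: merges same-index samples from two sources."""

    def __init__(self,
                 first: Sequence[Dict[str, Any]],
                 second: Sequence[Dict[str, Any]]):
        # Assumed alignment rule: identical length and entry order
        if len(first) != len(second):
            raise ValueError("sources must have the same number of entries")
        self.first = first
        self.second = second

    def __len__(self) -> int:
        return len(self.first)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        a, b = self.first[idx], self.second[idx]
        # Refuse to silently overwrite a product present in both sources
        overlap = a.keys() & b.keys()
        if overlap:
            raise KeyError(f"duplicate product names: {sorted(overlap)}")
        return {**a, **b}


larcv_like = [{"input_data": [1.0]}, {"input_data": [2.0]}]
hdf5_like = [{"cached_labels": [0]}, {"cached_labels": [1]}]
mixed = ToyMixedDataset(larcv_like, hdf5_like)
merged = mixed[1]
```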

Parsers

Parsers translate raw reader outputs into framework-neutral parser products. The HDF5 parser layer includes generic tensor, index, and object parsers for cached data products.

`parse.base`	Shared parser base classes and input-data plumbing.
`parse.data`	Data structures used as canonical outputs of IO parsers.
`parse.clean_data`	Functions used to clean up cluster data.
`parse.hdf5.tensor`	Lightweight parsers for cached HDF5 tensor products.
`parse.hdf5.index`	Lightweight parsers for cached HDF5 index products.
`parse.hdf5.object`	Lightweight parsers for cached HDF5 object products.
`parse.larcv.misc`	Parsers that do not fit in other categories.
`parse.larcv.sparse`	Parsers for LArCV sparse data.
`parse.larcv.cluster`	Parsers for LArCV cluster data.
`parse.larcv.particle`	Parsers for LArCV particle data.
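The split between shared input plumbing and format-specific parsing might look like the sketch below. All names (`ToyParserBase`, `ToySparseParser`, `ToyTensorProduct`, the `raw_voxels` key) are hypothetical and do not reflect the actual classes in `parse.base` or `parse.data`. The base class resolves which raw products a parser asks for, and the subclass converts them into a canonical product.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ToyTensorProduct:
    """Hypothetical canonical output: voxel coordinates plus one feature."""
    coords: List[Tuple[int, int, int]]
    features: List[float]


class ToyParserBase:
    """Hypothetical base: fetches the raw products a parser requests."""

    def __init__(self, **data_keys: str):
        self.data_keys = data_keys  # argument name -> raw product key

    def __call__(self, entry: Dict[str, Any]) -> Any:
        missing = [k for k in self.data_keys.values() if k not in entry]
        if missing:
            raise KeyError(f"missing raw products: {missing}")
        kwargs = {arg: entry[key] for arg, key in self.data_keys.items()}
        return self.process(**kwargs)

    def process(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class ToySparseParser(ToyParserBase):
    """Hypothetical parser: splits (x, y, z, value) rows into a product."""

    def process(self, voxels: List[Tuple[int, int, int, float]]) -> ToyTensorProduct:
        coords = [(x, y, z) for x, y, z, _ in voxels]
        features = [float(value) for *_, value in voxels]
        return ToyTensorProduct(coords, features)


parser = ToySparseParser(voxels="raw_voxels")
product = parser({"raw_voxels": [(0, 1, 2, 0.5), (3, 4, 5, 1.5)]})
```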

Data Pipeline Utilities

Tools for dataset preparation, augmentation, collation, and sampling.

`collate`	Contains implementations of data collation classes.
`sample`	Defines which dataset entries to load at each iteration.
`augment`	Data augmentation managers and modules.
`overlay`	Module with methods to overlay multiple events.
`factories`	Functions that instantiate IO tools from configuration blocks.
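Two of these roles, collation and configuration-driven instantiation, can be sketched in a few lines. The functions below are hypothetical and use plain Python lists. They assume a common sparse-data convention in which variable-length samples are concatenated and each row is prefixed with its batch index, and a configuration block in which a `name` key selects the tool. Neither is confirmed as the behavior of `collate` or `factories`.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def toy_collate(batch: List[List[Tuple]]) -> Tuple[List[Tuple], List[int]]:
    """Concatenate variable-length samples, prefixing rows with a batch index."""
    rows: List[Tuple] = []
    counts: List[int] = []
    for batch_id, sample in enumerate(batch):
        rows.extend((batch_id, *point) for point in sample)
        counts.append(len(sample))
    return rows, counts


def sequential_indices(num_entries: int) -> List[int]:
    """Visit entries in storage order."""
    return list(range(num_entries))


def shuffled_indices(num_entries: int, seed: int = 0) -> List[int]:
    """Visit entries in a reproducible random order."""
    indices = list(range(num_entries))
    random.Random(seed).shuffle(indices)
    return indices


# Hypothetical registry: configuration name -> sampler function
TOY_SAMPLERS: Dict[str, Callable[..., List[int]]] = {
    "sequential": sequential_indices,
    "shuffled": shuffled_indices,
}


def toy_sampler_factory(cfg: Dict[str, Any]) -> List[int]:
    """Instantiate a sampler from a configuration block."""
    kwargs = dict(cfg)  # copy so the caller's block is left untouched
    name = kwargs.pop("name")
    if name not in TOY_SAMPLERS:
        raise ValueError(f"unknown sampler: {name}")
    return TOY_SAMPLERS[name](**kwargs)


order = toy_sampler_factory({"name": "shuffled", "num_entries": 4, "seed": 1})
samples = [[(0, 0, 0.5)], [(1, 1, 1.0), (2, 2, 2.0)]]
rows, counts = toy_collate(samples)
```

Returning the per-sample counts alongside the concatenated rows lets downstream code split the batch back into individual events.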