spine.io.dataset.HDF5Dataset
- class spine.io.dataset.HDF5Dataset(dtype: str | None = None, staged: bool = False, stage: str | None = None, schema: Mapping[str, Mapping[str, Any]] | None = None, keys: Sequence[str] | None = None, skip_keys: Sequence[str] | None = None, data_types: Mapping[str, str] | None = None, overlay_methods: Mapping[str, str] | None = None, augment: Mapping[str, Any] | None = None, **kwargs: Any)
Torch dataset wrapper around flat or staged HDF5 readers.
The dataset can operate in two modes:
- flat HDF5 mode, backed by spine.io.read.HDF5Reader
- staged cache mode, backed by spine.io.read.StageHDF5Reader
In both cases the dataset exposes a uniform parser-driven interface to the DataLoader layer. Reader-produced metadata, such as entry indexes and source provenance, is forwarded automatically alongside any parsed products.
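As a minimal sketch, the two modes can be selected with the keyword sets below. Only the parameter names (`staged`, `stage`, `keys`) come from the signature above; the product names `"points"` and `"labels"` and the stage name `"reco"` are hypothetical, and any reader-specific arguments (for example, where the input files live) would still have to be supplied through `**kwargs`.

```python
# Flat HDF5 mode: backed by spine.io.read.HDF5Reader (staged defaults to False).
flat_kwargs = {
    "staged": False,
    "keys": ["points", "labels"],  # hypothetical raw product names to keep
}

# Staged cache mode: backed by spine.io.read.StageHDF5Reader.
staged_kwargs = {
    "staged": True,
    "stage": "reco",  # hypothetical default stage name
}

# With SPINE installed and the reader-specific arguments added, either set
# would be unpacked into the constructor, e.g. HDF5Dataset(**flat_kwargs).
for name, kwargs in (("flat", flat_kwargs), ("staged", staged_kwargs)):
    print(name, sorted(kwargs))
```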
- Attributes:
data_keys – Return the names of all data products exposed by the dataset.
data_types – Return the collate type for each exposed HDF5 product.
overlay_methods – Return the overlay method for each exposed HDF5 product.
Methods
apply_augmenter(data) – Apply the configured augmenter, if present.
build_augmenter(augment) – Instantiate the configured augmenter, if any.
index_data_types() – Return the standard collate types for metadata keys.
index_overlay_methods() – Return the standard overlay methods for metadata keys.
metadata_dict(data) – Extract standard dataset metadata from one reader output.
- __init__(dtype: str | None = None, staged: bool = False, stage: str | None = None, schema: Mapping[str, Mapping[str, Any]] | None = None, keys: Sequence[str] | None = None, skip_keys: Sequence[str] | None = None, data_types: Mapping[str, str] | None = None, overlay_methods: Mapping[str, str] | None = None, augment: Mapping[str, Any] | None = None, **kwargs: Any) -> None
Instantiate the HDF5-backed dataset.
- Parameters:
dtype (str, optional) – Floating-point dtype forwarded to parser factories
staged (bool, default False) – If True, use StageHDF5Reader as the backend instead of the flat HDF5Reader
stage (str, optional) – Default stage name to read when staged=True. Individual schema entries may override this with their own stage field.
schema (mapping, optional) – Parser schema used to reconstruct higher-level products
keys (sequence[str], optional) – Explicit list of raw HDF5 products to keep
skip_keys (sequence[str], optional) – Explicit list of raw HDF5 products to drop
data_types (mapping, optional) – Explicit collate type overrides for raw-product mode
overlay_methods (mapping, optional) – Explicit overlay-method overrides for raw-product mode
augment (mapping, optional) – Augmentation applied to each loaded sample
**kwargs (Any) – Reader-specific keyword arguments forwarded to the selected HDF5 backend reader
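The interplay between the default `stage` and per-entry overrides can be pictured as a simple lookup. This is an illustrative sketch only, not SPINE's implementation: the function `resolve_stages`, the schema entry names, the `parser` fields, and the stage names are all hypothetical. Only the `stage` field name comes from the parameter description above.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_stages(schema: Mapping[str, Mapping[str, Any]],
                   default_stage: Optional[str]) -> Dict[str, Optional[str]]:
    """Return the stage each schema entry would read from.

    An entry's own "stage" field takes precedence; otherwise the
    dataset-level default applies.
    """
    return {name: entry.get("stage", default_stage)
            for name, entry in schema.items()}


# Hypothetical schema: one entry relies on the default, one overrides it.
schema = {
    "clusters": {"parser": "cluster"},
    "truth": {"parser": "particle", "stage": "truth"},
}
stages = resolve_stages(schema, default_stage="reco")
print(stages)  # {'clusters': 'reco', 'truth': 'truth'}
```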
Attributes
data_keys – Return the names of all data products exposed by the dataset.
data_types – Return the collate type for each exposed HDF5 product.
overlay_methods – Return the overlay method for each exposed HDF5 product.
augmenter
- name: ClassVar[str] = 'hdf5'
- parsers: dict[str, Any]
- reader: HDF5Reader | StageHDF5Reader
- property data_types: dict[str, str]
Return the collate type for each exposed HDF5 product.
- Returns:
Mapping from dataset output key to collate type.
- Return type:
dict[str, str]
- property overlay_methods: dict[str, str]
Return the overlay method for each exposed HDF5 product.
- Returns:
Mapping from dataset output key to overlay strategy.
- Return type:
dict[str, str]
- property data_keys: tuple[str, ...]
Return the names of all data products exposed by the dataset.
- Returns:
Ordered tuple of metadata and parser-product keys.
- Return type:
tuple[str, ...]
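According to the descriptions above, `data_types` and `overlay_methods` are both keyed by dataset output key, so they can be inspected alongside `data_keys`. The helper below is a sketch under assumptions: `describe_products` is a hypothetical name, it does not assume every key is present in both mappings (missing entries fall back to `"?"`), and the stand-in object with its key names and values is invented for illustration rather than taken from SPINE.

```python
from types import SimpleNamespace


def describe_products(dataset):
    """List (key, collate type, overlay method) for each exposed product."""
    rows = []
    for key in dataset.data_keys:
        rows.append((key,
                     dataset.data_types.get(key, "?"),
                     dataset.overlay_methods.get(key, "?")))
    return rows


# Stand-in with made-up values; a real HDF5Dataset instance would be
# passed instead, since it exposes the same three properties.
fake = SimpleNamespace(
    data_keys=("index", "points"),
    data_types={"index": "scalar", "points": "tensor"},
    overlay_methods={"index": "cat", "points": "cat"},
)
for row in describe_products(fake):
    print(row)
```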