spine.io.read.StageHDF5Reader

class spine.io.read.StageHDF5Reader(stage: str | None = None, file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, build_classes: bool = True, skip_unknown_attrs: bool = False, allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False, stage_map: Mapping[str, str] | None = None, keys: Sequence[str] | None = None)

Read products stored under one or more stage groups in a cache file.

The reader exposes the same event-level interface as HDF5Reader, but resolves requested product keys under /stages/<stage> instead of the flat top-level namespace.

Attributes:
run_info
run_map

Methods

check_stage_complete(stage_group, path, stage)

Reject incomplete stages unless explicitly allowed.

close()

Close any persistent HDF5 handles owned by this reader.

get(idx)

Return one merged cache entry.

get_file_entry_index(idx)

Returns the index of an entry within the file it lives in, provided a global index over the list of files.

get_file_index(idx)

Returns the index of the file corresponding to a specific entry.

get_file_path(idx)

Returns the path to the file corresponding to a specific entry.

get_run_event(run, subrun, event)

Returns an entry corresponding to a specific (run, subrun, event) triplet.

get_run_event_index(run, subrun, event)

Returns an entry index corresponding to a specific (run, subrun, event) triplet.

get_source_provenance(file_idx, file_entry_idx)

Return lightweight source-file provenance for one entry.

get_stage_group(in_file, path, stage)

Return one named stage group.

get_stage_lengths(in_file, path, ...)

Return the event count of each referenced stage.

get_stages_group(in_file, path)

Return the top-level stages group.

is_remote_path(path)

Checks whether a path points to a remote resource.

list_stage_keys(stage_group)

List product keys stored in one stage group.

load_key(in_file, event, data, key)

Fetch a specific key for a specific event.

parse_entry_list(list_source)

Parses a list into an np.ndarray.

parse_run_event_list(list_source)

Parses a list of (run, subrun, event) triplets into an np.ndarray.

process_cfg()

Return the stored configuration for the referenced stage(s), if any.

process_entry_list([n_entry, n_skip, ...])

Create a list of entries that can be accessed by __getitem__().

process_file_paths([file_keys, file_list, ...])

Process list of files.

process_run_info()

Process the run information.

process_version()

Returns the SPINE release version used to produce the HDF5 file.

read_source_info(in_file)

Return top-level source provenance stored in the cache file.

resolve_object_class(class_name, array)

Resolve an HDF5 object class name to the concrete SPINE class.

resolve_product_stages(in_file, path)

Resolve each requested product key to one stage.

validate_stage_lengths(path, stage_lengths)

Ensure all referenced stages in one file have the same length.

__init__(stage: str | None = None, file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, build_classes: bool = True, skip_unknown_attrs: bool = False, allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False, stage_map: Mapping[str, str] | None = None, keys: Sequence[str] | None = None) → None

Initialize the stage-cache reader.

Parameters:
  • stage (str, optional) – Default stage from which to load products. If omitted, keys are searched across all stages and must resolve uniquely.

  • stage_map (mapping, optional) – Explicit map from product keys to stage names. This overrides the default stage on a per-product basis.

  • keys (sequence[str], optional) – Product keys that should be exposed by the reader. If omitted, all products from the selected stage(s) are exposed.

  • file_keys, file_list, limit_num_files, max_print_files, n_entry, n_skip, entry_list, skip_entry_list, build_classes, skip_unknown_attrs, allow_missing, keep_open, swmr, ignore_incomplete – See spine.io.read.HDF5Reader. These options control file discovery, entry selection, object reconstruction, file-handle lifetime, and incomplete-stage handling.
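
As a rough illustration of how the stage-specific options combine, the sketch below assembles keyword arguments for the constructor. The stage names ("reco", "truth"), the product keys, and the file name are hypothetical placeholders rather than names defined by SPINE. The constructor call is left commented out so that the snippet runs without SPINE or a cache file.

```python
# Hypothetical stage names, product keys, and file path, for illustration only.
reader_kwargs = {
    "file_keys": "cache.h5",           # see spine.io.read.HDF5Reader
    "stage": "reco",                   # default stage for keys without a stage_map entry
    "stage_map": {"labels": "truth"},  # per-product override of the default stage
    "keys": ["clusters", "labels"],    # products the reader should expose
}

# With SPINE installed and a matching cache file available:
# from spine.io.read import StageHDF5Reader
# reader = StageHDF5Reader(**reader_kwargs)
# entry = reader.get(0)
```

With these arguments, "labels" would be read from the "truth" stage and "clusters" from the default "reco" stage.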

Attributes

name

run_info

run_map

source_keys

file_paths

file_index

file_offsets

entry_index

num_entries

name: str = 'stage_hdf5'

static get_stages_group(in_file: File, path: str) → Group

Return the top-level stages group.

Parameters:
  • in_file (h5py.File) – Open cache file handle.

  • path (str) – File path used to build informative error messages.

classmethod get_stage_group(in_file: File, path: str, stage: str) → Group

Return one named stage group.

Parameters:
  • in_file (h5py.File) – Open cache file handle.

  • path (str) – File path used to build informative error messages.

  • stage (str) – Name of the stage group to load under /stages.

static read_source_info(in_file: File) → dict[str, object]

Return top-level source provenance stored in the cache file.

Parameters:

in_file (h5py.File) – Open cache file handle.

Returns:

File-level provenance dictionary. If the cache predates the source group convention, this returns an empty dictionary.

Return type:

dict[str, object]
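
The empty-dictionary fallback can be pictured with a minimal sketch in which plain dictionaries stand in for the open h5py.File. The member name "source" and the "file_name" field are assumptions made for illustration; the actual layout and fields are defined by the SPINE implementation.

```python
from collections.abc import Mapping


def read_source_info_sketch(in_file: Mapping[str, Mapping[str, object]]) -> dict[str, object]:
    """Return a copy of the (assumed) 'source' member, or {} when it is absent."""
    source = in_file.get("source")
    if source is None:
        # Older cache written before the source group convention existed
        return {}
    return dict(source)


# Dictionaries standing in for two cache files (hypothetical contents)
new_cache = {"source": {"file_name": "input.root"}, "stages": {}}
old_cache = {"stages": {}}

new_info = read_source_info_sketch(new_cache)
old_info = read_source_info_sketch(old_cache)
```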

list_stage_keys(stage_group: Group) → tuple[str, ...]

List product keys stored in one stage group.

This excludes the administrative info and events members.
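
A minimal sketch of this filtering, with a plain dictionary standing in for the h5py.Group. The product names are hypothetical, and the real method may order or validate keys differently.

```python
# Administrative members named in the description above
ADMIN_MEMBERS = ("info", "events")


def list_stage_keys_sketch(stage_group) -> tuple[str, ...]:
    """Return every member name of the group except the administrative ones."""
    return tuple(key for key in stage_group if key not in ADMIN_MEMBERS)


# Dictionary standing in for one stage group (hypothetical product names)
stage_group = {"info": {}, "events": [], "clusters": [], "particles": []}
product_keys = list_stage_keys_sketch(stage_group)
```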

resolve_product_stages(in_file: File, path: str) → dict[str, str]

Resolve each requested product key to one stage.

Resolution order is:

  1. explicit stage_map entry for the key

  2. dataset-level default stage (the stage argument)

  3. automatic discovery across all available stages

Automatic discovery requires a unique match. If the same product name appears in multiple stages, the caller must disambiguate it.
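
The resolution order above can be sketched as follows. This is an illustrative reimplementation over plain dictionaries, not the SPINE code: the function name, its arguments, the stage names, and the choice of KeyError are all assumptions, and the sketch does not check that a mapped or default stage actually contains the key.

```python
def resolve_product_stages_sketch(keys, stages, stage_map=None, default_stage=None) -> dict[str, str]:
    """Map each requested key to a stage name following the three-step order."""
    stage_map = stage_map or {}
    resolved = {}
    for key in keys:
        if key in stage_map:
            # 1. An explicit stage_map entry wins
            resolved[key] = stage_map[key]
        elif default_stage is not None:
            # 2. Otherwise fall back to the default stage
            resolved[key] = default_stage
        else:
            # 3. Otherwise search every stage and require exactly one match
            matches = [name for name, products in stages.items() if key in products]
            if len(matches) != 1:
                raise KeyError(f"Cannot resolve '{key}' uniquely; candidates: {matches}")
            resolved[key] = matches[0]
    return resolved


# Hypothetical stages and the product keys each one stores
stages = {"reco": {"clusters", "particles"}, "truth": {"labels", "particles"}}

auto = resolve_product_stages_sketch(["clusters", "labels"], stages)
mapped = resolve_product_stages_sketch(["particles"], stages, stage_map={"particles": "truth"})
```

In this sketch, requesting "particles" with neither a stage_map entry nor a default stage raises an error, because the key appears in both stages.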

check_stage_complete(stage_group: Group, path: str, stage: str) → None

Reject incomplete stages unless explicitly allowed.

Parameters:
  • stage_group (h5py.Group) – Resolved stage group.

  • path (str) – Cache file path.

  • stage (str) – Stage name used in the error message.
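
A rough sketch of such a completeness check is shown below. The "complete" attribute, the allow_incomplete flag, and the RuntimeError are assumptions made for illustration; the source does not say how completeness is recorded, and the override is presumably tied to the reader's ignore_incomplete option.

```python
def check_stage_complete_sketch(stage_attrs, path, stage, allow_incomplete=False) -> None:
    """Raise if the stage is not marked complete, unless incomplete stages are allowed."""
    # 'complete' is a hypothetical attribute name standing in for the real marker
    if not stage_attrs.get("complete", False) and not allow_incomplete:
        raise RuntimeError(f"Stage '{stage}' in {path} is incomplete.")


# A finished stage passes silently
check_stage_complete_sketch({"complete": True}, "cache.h5", "reco")

# An unfinished stage passes only when explicitly allowed
check_stage_complete_sketch({"complete": False}, "cache.h5", "reco", allow_incomplete=True)
```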

get_stage_lengths(in_file: File, path: str, product_stage_map: Mapping[str, str]) → dict[str, int]

Return the event count of each referenced stage.

Parameters:
  • in_file (h5py.File) – Open cache file handle.

  • path (str) – Cache file path.

  • product_stage_map (mapping) – Mapping from requested raw product key to resolved stage name.

static validate_stage_lengths(path: str, stage_lengths: Mapping[str, int]) → int

Ensure all referenced stages in one file have the same length.

Returns:

Shared number of entries across all referenced stages.

Return type:

int
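
The length check can be sketched in a few lines. The exception type and message are assumptions, as are the stage names; only the behavior described above (all referenced stages must share one length, which is returned) is taken from the documentation.

```python
def validate_stage_lengths_sketch(path, stage_lengths) -> int:
    """Return the shared entry count, or raise if the stages disagree."""
    lengths = set(stage_lengths.values())
    if len(lengths) != 1:
        raise ValueError(f"Stages in {path} have mismatched lengths: {dict(stage_lengths)}")
    return lengths.pop()


# Two hypothetical stages holding the same number of events
num_entries = validate_stage_lengths_sketch("cache.h5", {"reco": 100, "truth": 100})
```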

process_cfg() → dict[str, object] | None

Return the stored configuration for the referenced stage(s), if any.

Returns:

Parsed YAML configuration stored under stage metadata. A single stage yields its parsed object directly; multiple stages return a mapping from stage name to parsed object.

Return type:

dict or object or None
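
The single-stage versus multi-stage return shape can be illustrated with a small sketch. It assumes the per-stage configurations have already been parsed from YAML and collected into a mapping; the helper name and the example configurations are hypothetical.

```python
from collections.abc import Mapping


def collapse_stage_cfgs(stage_cfgs: Mapping[str, object]):
    """Collapse per-stage configurations into the documented return shape."""
    if not stage_cfgs:
        # No stored configuration for any referenced stage
        return None
    if len(stage_cfgs) == 1:
        # One stage: return its parsed object directly
        return next(iter(stage_cfgs.values()))
    # Several stages: return a mapping from stage name to parsed object
    return dict(stage_cfgs)


single = collapse_stage_cfgs({"reco": {"model": "a"}})
multiple = collapse_stage_cfgs({"reco": {"model": "a"}, "truth": {"model": "b"}})
```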

get(idx: int) → dict[str, object]

Return one merged cache entry.

Parameters:

idx (int) – Dataset entry index in the staged cache.

Returns:

Raw merged event dictionary containing standard metadata plus all requested stage products for the selected entry.

Return type:

dict[str, object]
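
Conceptually, the merge collects each requested product from the stage it resolved to and places it in a single dictionary. The sketch below uses nested dictionaries and lists in place of HDF5 groups and datasets. The "index" key is a hypothetical stand-in for the standard metadata, and the stage and product names are illustrative.

```python
def get_entry_sketch(idx, stages, product_stage_map) -> dict[str, object]:
    """Build one merged entry from the stages that hold the requested products."""
    entry = {"index": idx}  # hypothetical stand-in for the standard metadata
    for key, stage in product_stage_map.items():
        # Read the idx-th element of this product from its resolved stage
        entry[key] = stages[stage][key][idx]
    return entry


# Two hypothetical stages, each storing one product for two events
stages = {
    "reco": {"clusters": [[0, 1], [2]]},
    "truth": {"labels": ["a", "b"]},
}
product_stage_map = {"clusters": "reco", "labels": "truth"}

entry = get_entry_sketch(1, stages, product_stage_map)
```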