ftag.vds

Functions

parse_args([args])

get_virtual_layout(→ h5py.VirtualLayout)

Concatenate group from multiple files into a single VirtualDataset.

glob_re(→ list[str] | None)

Return list of filenames that match REGEX pattern inside regex_path.

regex_files_from_dir(→ list[str] | None)

Turn a list of basenames into full paths; dive into sub-dirs if needed.

sum_counts_once(→ numpy.ndarray)

Reduce the arrays in the counts dataset for one file to a scalar via summation.

check_subgroups(→ list[str])

Check which subgroups are available for the bookkeeper.

aggregate_cutbookkeeper(→ dict[str, numpy.ndarray] | None)

Aggregate the cutBookkeeper in the input files.

create_virtual_file(→ pathlib.Path)

Create the virtual dataset file for the given inputs.

main(→ None)

Module Contents

ftag.vds.parse_args(args=None)
ftag.vds.get_virtual_layout(fnames: list[str], group: str) → h5py.VirtualLayout

Concatenate group from multiple files into a single VirtualDataset.

Parameters:
  • fnames (list[str]) – List with the file names

  • group (str) – Name of the group that is concatenated

Returns:

Virtual layout of the new virtual dataset

Return type:

h5py.VirtualLayout
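
The concatenation described above can be sketched with plain h5py calls. This is a minimal illustration, not the ftag implementation: the name `sketch_virtual_layout` is hypothetical, and it assumes that `group` names a dataset present in every file, that all sources share a dtype, and that they differ only in the length of the first axis.

```python
from __future__ import annotations

import h5py


def sketch_virtual_layout(fnames: list[str], group: str) -> h5py.VirtualLayout:
    """Build a layout that stacks `group` from every file along axis 0."""
    # First pass: record each source's shape and the (assumed shared) dtype.
    shapes = []
    dtype = None
    for fname in fnames:
        with h5py.File(fname, "r") as f:
            dset = f[group]
            shapes.append(dset.shape)
            dtype = dset.dtype

    # Total length along the first axis; trailing dimensions are assumed equal.
    total = sum(shape[0] for shape in shapes)
    layout = h5py.VirtualLayout(shape=(total, *shapes[0][1:]), dtype=dtype)

    # Second pass: map each source file onto its own slice of the layout.
    offset = 0
    for fname, shape in zip(fnames, shapes):
        layout[offset : offset + shape[0]] = h5py.VirtualSource(fname, group, shape=shape)
        offset += shape[0]
    return layout
```

A layout built this way can be written to an output file with `h5py.File.create_virtual_dataset(name, layout)`; the source files are referenced, not copied.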

ftag.vds.glob_re(pattern: str | None, regex_path: str | None) → list[str] | None

Return list of filenames that match REGEX pattern inside regex_path.

Parameters:
  • pattern (str) – Pattern for the input files

  • regex_path (str) – Path in which file names are matched against the pattern

Returns:

List of the file basenames that matched the regex pattern

Return type:

list[str]
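
As a rough illustration of regex-based file matching, the sketch below lists the entries of a directory and keeps those whose names match the pattern. The name `sketch_glob_re`, the use of `re.match`, the sorted output, and the `None` handling are all assumptions made for this example; the actual function may differ.

```python
from __future__ import annotations

import os
import re


def sketch_glob_re(pattern: str | None, regex_path: str | None) -> list[str] | None:
    """Return the entry names in regex_path that match pattern."""
    # Assumption: a missing pattern or path yields None instead of an error.
    if pattern is None or regex_path is None:
        return None
    # re.match anchors at the start of the name only, not at the end.
    return sorted(name for name in os.listdir(regex_path) if re.match(pattern, name))
```

Note that the result contains basenames only, matching the documented return value; turning them into full paths is a separate step.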

ftag.vds.regex_files_from_dir(reg_matched_fnames: list[str] | None, regex_path: str | None) → list[str] | None

Turn a list of basenames into full paths; dive into sub-dirs if needed.

Parameters:
  • reg_matched_fnames (list[str]) – List of the regex matched file names

  • regex_path (str) – Regex path for the input files

Returns:

List of file paths (as strings) that matched the regex and any subsequent globbing inside matched directories.

Return type:

list[str]
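
The path expansion can be pictured as follows. This is a sketch under stated assumptions: the name `sketch_files_from_dir` is hypothetical, and the choice to collect every `*.h5` file inside a matched directory is a guess at what "dive into sub-dirs" means here.

```python
from __future__ import annotations

from pathlib import Path


def sketch_files_from_dir(basenames: list[str] | None, regex_path: str | None) -> list[str] | None:
    """Join basenames onto regex_path; expand matched directories."""
    if basenames is None or regex_path is None:
        return None
    paths: list[str] = []
    for name in basenames:
        full = Path(regex_path) / name
        if full.is_dir():
            # Assumption: every *.h5 file directly inside the directory is wanted.
            paths.extend(str(p) for p in sorted(full.glob("*.h5")))
        else:
            paths.append(str(full))
    return paths
```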

ftag.vds.sum_counts_once(counts: numpy.ndarray) → numpy.ndarray

Reduce the arrays in the counts dataset for one file to a scalar via summation.

Parameters:

counts (np.ndarray) – Array from the h5py dataset (counts) from the cutBookkeeper groups

Returns:

Array with the summed variables for the file

Return type:

np.ndarray
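
To make the reduction concrete, the sketch below sums each field of a NumPy structured array into a single zero-dimensional record. The function name `sketch_sum_counts` is hypothetical, and the field names used in the usage note are invented for illustration; the real counts dataset may use different ones.

```python
import numpy as np


def sketch_sum_counts(counts: np.ndarray) -> np.ndarray:
    """Collapse a structured array to one record by summing each field."""
    # A 0-d structured array holds exactly one record with the same fields.
    total = np.zeros((), dtype=counts.dtype)
    for name in counts.dtype.names:
        total[name] = counts[name].sum()
    return total
```

For example, a four-row array with hypothetical fields `nEvents` and `sumOfWeights` reduces to a single record whose fields are the column sums, and the result keeps the input dtype.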

ftag.vds.check_subgroups(fnames: list[str], group_name: str = 'cutBookkeeper') → list[str]

Check which subgroups are available for the bookkeeper.

Find the intersection of sub-group names that have a ‘counts’ dataset in every input file. (Using the intersection makes the script robust even if a few files are missing a variation.)

Parameters:
  • fnames (list[str]) – List of the input files

  • group_name (str, optional) – Group name in the h5 files of the bookkeeper, by default “cutBookkeeper”

Returns:

Names of the sub-groups that are common to all input files

Return type:

set[str]

Raises:
  • KeyError – When a file does not have a bookkeeper

  • ValueError – When no common bookkeeper sub-groups were found
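
The intersection logic and the two error cases can be sketched without touching HDF5 at all. In the example below, nested dictionaries stand in for opened files (an assumption made so the sketch is self-contained), and the name `sketch_common_subgroups` is hypothetical.

```python
from __future__ import annotations


def sketch_common_subgroups(files: dict[str, dict], group_name: str = "cutBookkeeper") -> set[str]:
    """Intersect the sub-group names that carry a 'counts' entry in every file."""
    common = None
    for fname, content in files.items():
        if group_name not in content:
            raise KeyError(f"{fname} has no group '{group_name}'")
        # Keep only the sub-groups that actually contain a 'counts' entry.
        names = {name for name, sub in content[group_name].items() if "counts" in sub}
        common = names if common is None else common & names
    if not common:
        raise ValueError("No common bookkeeper sub-groups found")
    return common
```

Because the result is an intersection, a variation that is missing from even one file is dropped for all files, which is what keeps the later aggregation consistent.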

ftag.vds.aggregate_cutbookkeeper(fnames: list[str], group_name: str = 'cutBookkeeper') → dict[str, numpy.ndarray] | None

Aggregate the cutBookkeeper in the input files.

For every input file and every sub-group (nominal, sysUp, sysDown, …):

  1. Sum the 4-entry record array inside each file into one record.

  2. Add those records from all files together into a grand total.

Returns a dict {subgroup_name: scalar-record-array}.

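Parameters:
  • fnames (list[str]) – List of the input files

  • group_name (str, optional) – Group name in the h5 files of the bookkeeper, by default “cutBookkeeper”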

Returns:

Dict with the accumulated cutBookkeeper groups. If the cut bookkeeper is not in the files, return None.

Return type:

dict[str, np.ndarray] | None
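
The two-step aggregation can be illustrated with in-memory structured arrays. This is a sketch only: the name `sketch_aggregate` is hypothetical, the input is a list of per-file dictionaries instead of file names (so that no HDF5 files are needed), and the field names in the usage note are invented.

```python
from __future__ import annotations

import numpy as np


def sketch_aggregate(per_file: list[dict[str, np.ndarray]]) -> dict[str, np.ndarray]:
    """Sum the records per file, then accumulate the results across files."""
    totals: dict[str, np.ndarray] = {}
    for subgroups in per_file:
        for name, counts in subgroups.items():
            # Step 1: collapse this file's records into one 0-d record.
            record = np.zeros((), dtype=counts.dtype)
            for field in counts.dtype.names:
                record[field] = counts[field].sum()
            # Step 2: add the record to the running total for this sub-group.
            if name not in totals:
                totals[name] = record
            else:
                for field in record.dtype.names:
                    totals[name][field] += record[field]
    return totals
```

With two files that each hold a `nominal` array with a hypothetical `nEvents` field, the returned dictionary has one `nominal` entry whose `nEvents` value is the sum over all rows of both files.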

ftag.vds.create_virtual_file(pattern: pathlib.Path | str, out_fname: pathlib.Path | str | None = None, use_regex: bool = False, regex_path: str | None = None, overwrite: bool = False, bookkeeper_name: str = 'cutBookkeeper') → pathlib.Path

Create the virtual dataset file for the given inputs.

Parameters:
  • pattern (Path | str) – Pattern of the input files. Wildcards are supported

  • out_fname (Path | str | None, optional) – Output path to which the virtual dataset file is written. By default None

  • use_regex (bool, optional) – Use regex instead of glob to match the input files, by default False

  • regex_path (str | None, optional) – Path in which file names are matched against the regex pattern, by default None

  • overwrite (bool, optional) – Whether an existing output file is overwritten, by default False

  • bookkeeper_name (str, optional) – Name of the cut bookkeeper in the h5 files, by default “cutBookkeeper”

Returns:

Path to which the output file is written

Return type:

Path

Raises:
  • FileNotFoundError – If no input files were found for the given pattern

  • ValueError – If no output file is given and the input comes from multiple directories
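
The two documented error conditions suggest a check along the following lines. This is a sketch, not the library's code: the name `sketch_resolve_output` is hypothetical, and the default location `vds/vds.h5` next to the inputs is an assumption chosen for the example.

```python
from __future__ import annotations

from pathlib import Path


def sketch_resolve_output(fnames: list[str], out_fname: Path | str | None = None) -> Path:
    """Pick the output path, or raise if it cannot be chosen unambiguously."""
    if not fnames:
        raise FileNotFoundError("No input files found for the given pattern")
    if out_fname is not None:
        return Path(out_fname)
    # Without an explicit output, all inputs must share one parent directory.
    parents = {Path(f).parent for f in fnames}
    if len(parents) > 1:
        raise ValueError("Inputs come from multiple directories; pass out_fname explicitly")
    # Hypothetical default location, not taken from the ftag source.
    return parents.pop() / "vds" / "vds.h5"
```

Passing `out_fname` explicitly sidesteps the ambiguity whenever the inputs are spread over several directories.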

ftag.vds.main(args=None) → None