ftag.vds
========

.. py:module:: ftag.vds


Functions
---------

.. autoapisummary::

   ftag.vds.parse_args
   ftag.vds.get_virtual_layout
   ftag.vds.glob_re
   ftag.vds.regex_files_from_dir
   ftag.vds.sum_counts_once
   ftag.vds.check_subgroups
   ftag.vds.aggregate_cutbookkeeper
   ftag.vds.create_virtual_file
   ftag.vds.main


Module Contents
---------------

.. py:function:: parse_args(args=None)

.. py:function:: get_virtual_layout(fnames: list[str], group: str) -> h5py.VirtualLayout

   Concatenate ``group`` from multiple files into a single virtual dataset layout.

   :param fnames: List with the file names
   :type fnames: list[str]
   :param group: Name of the group that is concatenated
   :type group: str

   :returns: Virtual layout of the new virtual dataset
   :rtype: h5py.VirtualLayout
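
   A minimal sketch of how such a layout can be assembled with the public
   h5py virtual dataset API. The helper name ``concat_layout`` is hypothetical,
   and the sketch assumes that ``group`` names a dataset with the same dtype
   and trailing shape in every file; it is not necessarily how this function
   is implemented.

   .. code-block:: python

      import h5py


      def concat_layout(fnames: list[str], group: str) -> h5py.VirtualLayout:
          """Stack ``group`` from every file along the first axis (illustrative)."""
          sources = []
          total = 0
          for fname in fnames:
              with h5py.File(fname, "r") as f:
                  dset = f[group]
                  # A VirtualSource keeps the file path, dataset name and shape,
                  # so it stays usable after the file is closed.
                  sources.append(h5py.VirtualSource(dset))
                  total += dset.shape[0]
                  dtype = dset.dtype
                  tail = dset.shape[1:]

          layout = h5py.VirtualLayout(shape=(total, *tail), dtype=dtype)

          # Map each source onto its own contiguous slice of the layout.
          start = 0
          for src in sources:
              stop = start + src.shape[0]
              layout[start:stop] = src
              start = stop
          return layout

   The returned layout can then be written with
   ``h5py.File.create_virtual_dataset(name, layout)``.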


.. py:function:: glob_re(pattern: str | None, regex_path: str | None) -> list[str] | None

   Return the list of file names inside ``regex_path`` that match the regex ``pattern``.

   :param pattern: Regex pattern for the input files
   :type pattern: str | None
   :param regex_path: Path of the directory that is searched for matching files
   :type regex_path: str | None

   :returns: List of the file basenames that matched the regex pattern
   :rtype: list[str] | None
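
   A minimal sketch of regex matching on directory entries, using only the
   standard library. The helper name ``match_basenames`` is hypothetical, and
   the use of ``re.match`` (which anchors at the start of the name only) and
   the sorted output are assumptions, not confirmed behaviour of ``glob_re``.

   .. code-block:: python

      import os
      import re


      def match_basenames(pattern: str, directory: str) -> list[str]:
          """Return the entries of ``directory`` whose names match ``pattern``."""
          regex = re.compile(pattern)
          # os.listdir yields basenames only, so the result has no directory part.
          return sorted(name for name in os.listdir(directory) if regex.match(name))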


.. py:function:: regex_files_from_dir(reg_matched_fnames: list[str] | None, regex_path: str | None) -> list[str] | None

   Turn a list of basenames into full paths, descending into matched directories where needed.

   :param reg_matched_fnames: List of the regex-matched file names
   :type reg_matched_fnames: list[str] | None
   :param regex_path: Path of the directory that contains the matched names
   :type regex_path: str | None

   :returns: List of file paths (as strings) that matched the regex and any subsequent
             globbing inside matched directories.
   :rtype: list[str] | None
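
   The expansion described above could look roughly like the following sketch.
   The helper name ``expand_basenames`` and the ``*.h5`` glob applied inside
   matched directories are assumptions made for illustration.

   .. code-block:: python

      import glob
      import os


      def expand_basenames(basenames: list[str], parent_dir: str) -> list[str]:
          """Join basenames with ``parent_dir``; expand directories to their h5 files."""
          paths: list[str] = []
          for name in basenames:
              full = os.path.join(parent_dir, name)
              if os.path.isdir(full):
                  # Assumed convention: a matched directory contributes its *.h5 files.
                  paths.extend(sorted(glob.glob(os.path.join(full, "*.h5"))))
              else:
                  paths.append(full)
          return paths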


.. py:function:: sum_counts_once(counts: numpy.ndarray) -> numpy.ndarray

   Reduce the arrays in the counts dataset for one file to a scalar via summation.

   :param counts: Array from the h5py dataset (counts) from the cutBookkeeper groups
   :type counts: np.ndarray

   :returns: Array with the summed variables for the file
   :rtype: np.ndarray
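
   A minimal sketch of a field-wise reduction of a NumPy structured array to
   a single record. The helper name ``sum_fields`` and the field names in the
   usage note are hypothetical; the real ``counts`` dtype is defined by the
   input files.

   .. code-block:: python

      import numpy as np


      def sum_fields(counts: np.ndarray) -> np.ndarray:
          """Sum every field of a structured array into one zero-dimensional record."""
          # Keep the input dtype so the result exposes the same field names.
          summed = np.zeros((), dtype=counts.dtype)
          for field in counts.dtype.names:
              summed[field] = counts[field].sum()
          return summed

   For an array with hypothetical fields ``nEvents`` and ``sumOfWeights``, the
   result is a single record that holds the sum of each field.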


.. py:function:: check_subgroups(fnames: list[str], group_name: str = 'cutBookkeeper') -> list[str]

   Check which subgroups are available for the bookkeeper.

   Find the intersection of sub-group names that have a 'counts' dataset
   in every input file. (Using the intersection makes the script robust
   even if a few files are missing a variation.)

   :param fnames: List of the input files
   :type fnames: list[str]
   :param group_name: Group name in the h5 files of the bookkeeper, by default "cutBookkeeper"
   :type group_name: str, optional

   :returns: Names of the sub-groups that have a 'counts' dataset in every input file
   :rtype: list[str]

   :raises KeyError: When a file does not have a bookkeeper
   :raises ValueError: When no common bookkeeper sub-groups were found
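
   The intersection logic can be sketched without any file access. Here a
   plain dict stands in for the sub-group names read from each file; the
   helper name ``common_subgroups`` and the file names in the usage note are
   hypothetical.

   .. code-block:: python

      def common_subgroups(subgroups_per_file: dict[str, set[str]]) -> list[str]:
          """Return the sub-group names that are present in every file."""
          if not subgroups_per_file:
              raise ValueError("No input files given")
          common = set.intersection(*subgroups_per_file.values())
          if not common:
              raise ValueError("No common bookkeeper sub-groups were found")
          # Sorting gives a deterministic order for the caller.
          return sorted(common)

   With ``{"a.h5": {"nominal", "sysUp", "sysDown"}, "b.h5": {"nominal", "sysUp"}}``
   only ``nominal`` and ``sysUp`` survive, because ``sysDown`` is missing from
   the second file.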


.. py:function:: aggregate_cutbookkeeper(fnames: list[str], group_name: str = 'cutBookkeeper') -> dict[str, numpy.ndarray] | None

   Aggregate the cutBookkeeper in the input files.

   For every input file and every sub-group (nominal, sysUp, sysDown, …):

   1. Sum the 4-entry record array inside each file into one record.
   2. Add those records from all files together into a grand total.

   Returns a dict ``{subgroup_name: scalar-record-array}``.

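   :param fnames: List of the input files
   :type fnames: list[str]
   :param group_name: Group name in the h5 files of the bookkeeper, by default "cutBookkeeper"
   :type group_name: str, optional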

   :returns: Dict with the accumulated cutBookkeeper groups. If the cut bookkeeper
             is not in the files, return None.
   :rtype: dict[str, np.ndarray] | None
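
   The accumulation across files can be sketched on in-memory data. Each
   list entry below stands in for the ``{subgroup_name: counts}`` content of
   one file; the helper name ``accumulate`` is hypothetical, and reading the
   arrays from the h5 files is left out.

   .. code-block:: python

      import numpy as np


      def accumulate(per_file: list[dict[str, np.ndarray]]) -> dict[str, np.ndarray]:
          """Sum structured ``counts`` arrays per sub-group over all files."""
          totals: dict[str, np.ndarray] = {}
          for groups in per_file:
              for name, counts in groups.items():
                  # Start each sub-group from an all-zero record of the same dtype.
                  if name not in totals:
                      totals[name] = np.zeros((), dtype=counts.dtype)
                  for field in counts.dtype.names:
                      # Collapse this file's records and add them to the grand total.
                      totals[name][field] = totals[name][field] + counts[field].sum()
          return totals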


.. py:function:: create_virtual_file(pattern: pathlib.Path | str, out_fname: pathlib.Path | str | None = None, use_regex: bool = False, regex_path: str | None = None, overwrite: bool = False, bookkeeper_name: str = 'cutBookkeeper') -> pathlib.Path

   Create the virtual dataset file for the given inputs.

   :param pattern: Pattern of the input files. Wildcards are supported
   :type pattern: Path | str
   :param out_fname: Output path to which the virtual dataset file is written. By default None
   :type out_fname: Path | str | None, optional
   :param use_regex: Whether to match the input files with regex instead of glob, by default False
   :type use_regex: bool, optional
   :param regex_path: Regex logic used to define the input files, by default None
   :type regex_path: str | None, optional
   :param overwrite: Whether to overwrite an existing output file, by default False
   :type overwrite: bool, optional
   :param bookkeeper_name: Name of the cut bookkeeper in the h5 files, by default "cutBookkeeper"
   :type bookkeeper_name: str, optional

   :returns: Path to which the output file is written
   :rtype: Path

   :raises FileNotFoundError: If no input files were found for the given pattern
   :raises ValueError: If no output file is given and the input comes from multiple directories
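
   The two documented error conditions can be illustrated with a small
   standard-library sketch of the input and output resolution. The helper
   name ``resolve_io`` and the default output location ``vds/vds.h5`` are
   hypothetical; the actual default is not stated here.

   .. code-block:: python

      import glob
      from pathlib import Path


      def resolve_io(pattern, out_fname=None):
          """Expand ``pattern`` and choose an output path (illustrative only)."""
          fnames = sorted(glob.glob(str(pattern)))
          if not fnames:
              raise FileNotFoundError(f"No input files found for pattern {pattern!r}")

          if out_fname is None:
              parents = {Path(f).parent for f in fnames}
              if len(parents) > 1:
                  # No unambiguous place to put the output next to the inputs.
                  raise ValueError("Inputs span multiple directories; pass out_fname")
              # Hypothetical default location, next to the inputs.
              out_fname = parents.pop() / "vds" / "vds.h5"
          return fnames, Path(out_fname)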


.. py:function:: main(args=None) -> None