ftag.hdf5.h5add_col

Functions

merge_dicts(→ dict[str, dict[str, numpy.ndarray]])

Merges a list of dictionaries.

get_shape(→ dict[str, tuple[int, Ellipsis]])

Returns a dictionary with the correct output shapes for the H5Writer.

get_all_groups(→ dict[str, None])

Returns a dictionary with all the groups in the h5 file.

h5_add_column(→ None)

Appends one or more columns to one or more groups in an h5 file.

parse_append_function(→ Callable)

Attempts to load the function specified by func_path.

get_args(args)

main([args])

Module Contents

ftag.hdf5.h5add_col.merge_dicts(dicts: list[dict[str, dict[str, numpy.ndarray]]]) → dict[str, dict[str, numpy.ndarray]]

Merges a list of dictionaries.

Each dict is of the form:

{
    group1: {
        variable_1: np.array,
        variable_2: np.array,
    },
    group2: {
        variable_1: np.array,
        variable_2: np.array,
    },
}

E.g.

dict1 = {
    "jets": {
        "pt": np.array([1, 2, 3]),
        "eta": np.array([4, 5, 6]),
    },
}
dict2 = {
    "jets": {
        "phi": np.array([7, 8, 9]),
        "energy": np.array([10, 11, 12]),
    },
}

merged = {
    "jets": {
        "pt": np.array([1, 2, 3]),
        "eta": np.array([4, 5, 6]),
        "phi": np.array([7, 8, 9]),
        "energy": np.array([10, 11, 12]),
    },
}

Parameters:

dicts (list[dict[str, dict[str, np.ndarray]]]) – List of dictionaries to merge. Each dictionary should map group names to dictionaries of variable names and arrays, as shown above.

Returns:

Merged dictionary of the form:

{
    group1: {
        variable_1: np.array,
        variable_2: np.array,
    },
    group2: {
        variable_1: np.array,
        variable_2: np.array,
    },
}

Return type:

dict[str, dict[str, np.ndarray]]

Raises:

ValueError – If a variable already exists in the merged dictionary.
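
The merge behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name merge_group_dicts is hypothetical, and the sketch assumes that a repeated variable name within the same group is what triggers the ValueError.

```python
import numpy as np


def merge_group_dicts(dicts):
    """Merge nested {group: {variable: array}} dicts, refusing duplicates."""
    merged = {}
    for d in dicts:
        for group, variables in d.items():
            # Create the group on first sight, then add its variables.
            target = merged.setdefault(group, {})
            for name, values in variables.items():
                if name in target:
                    raise ValueError(
                        f"Variable {name!r} already exists in group {group!r}"
                    )
                target[name] = values
    return merged


dict1 = {"jets": {"pt": np.array([1, 2, 3]), "eta": np.array([4, 5, 6])}}
dict2 = {"jets": {"phi": np.array([7, 8, 9]), "energy": np.array([10, 11, 12])}}
merged = merge_group_dicts([dict1, dict2])
print(sorted(merged["jets"]))
```

Raising on duplicates, rather than silently overwriting, keeps two append functions from clobbering each other's output.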

ftag.hdf5.h5add_col.get_shape(num_jets: int, batch: dict[str, numpy.ndarray]) → dict[str, tuple[int, Ellipsis]]

Returns a dictionary with the correct output shapes for the H5Writer.

Parameters:
  • num_jets (int) – Number of jets to write in total

  • batch (dict[str, np.ndarray]) – Dictionary representing the batch

Returns:

Dictionary with the shapes of the output arrays

Return type:

dict[str, tuple[int, …]]
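
One plausible way to derive such shapes is sketched below. It assumes that the leading axis of every array in the batch indexes jets, and that only this axis differs between a single batch and the full output; the actual implementation may differ. The name output_shapes and the group names are hypothetical.

```python
import numpy as np


def output_shapes(num_jets, batch):
    """Replace each array's leading (per-jet) axis with the total jet count."""
    return {name: (num_jets, *array.shape[1:]) for name, array in batch.items()}


# A batch of 100 jets, each with up to 40 associated tracks (assumed layout).
batch = {"jets": np.zeros(100), "tracks": np.zeros((100, 40))}
shapes = output_shapes(5000, batch)
print(shapes)
```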

ftag.hdf5.h5add_col.get_all_groups(file: pathlib.Path | str) → dict[str, None]

Returns a dictionary with all the groups in the h5 file.

Parameters:

file (Path | str) – Path to the h5 file

Returns:

A dictionary with all the groups in the h5 file as keys and None as values, such that h5read.stream(all_groups) will return all the groups in the file.

Return type:

dict[str, None]

ftag.hdf5.h5add_col.h5_add_column(input_file: str | pathlib.Path, output_file: str | pathlib.Path, append_function: Callable | list[Callable], num_jets: int = -1, input_groups: list[str] | None = None, output_groups: list[str] | None = None, reader_kwargs: dict | None = None, writer_kwargs: dict | None = None, overwrite: bool = False) → None

Appends one or more columns to one or more groups in an h5 file.

Parameters:
  • input_file (str | Path) – Input h5 file to read from.

  • output_file (str | Path) – Output h5 file to write to.

  • append_function (callable | list[callable]) –

    A function, or list of functions, each of which takes a batch from H5Reader and returns a dictionary of the form:

    {
        group1: {
            new_column1: data,
            new_column2: data,
        },
        group2: {
            new_column3: data,
            new_column4: data,
        },
    }

  • num_jets (int, optional) – Number of jets to read from the input file. If -1, reads all jets. By default -1.

  • input_groups (list[str] | None, optional) – List of groups to read from the input file. If None, reads all groups. By default None.

  • output_groups (list[str] | None, optional) – List of groups to write to the output file. If None, writes all groups. By default None. This must be a subset of the input groups, and must include every group that the append functions write to.

  • reader_kwargs (dict, optional) – Additional arguments to pass to the H5Reader. By default None.

  • writer_kwargs (dict, optional) – Additional arguments to pass to the H5Writer. By default None.

  • overwrite (bool, optional) – If True, overwrites the output file if it exists. If False, raises a FileExistsError if the output file exists. By default False.

Raises:
  • FileNotFoundError – If the input file does not exist.

  • FileExistsError – If the output file exists and overwrite is False.

  • ValueError – If the new variable already exists, shape is incorrect, or the output group is not in the input groups.
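
As an illustration of the append_function contract, the sketch below defines a function that derives one new jet-level column. The function name, the "jets" group, and the "pt" and "pt_squared" fields are hypothetical, and the stand-in batch assumes that a batch maps group names to NumPy structured arrays.

```python
import numpy as np


def add_pt_squared(batch):
    """Hypothetical append function: derive one new column for the jets group."""
    jets = batch["jets"]  # assumed group name
    # Return {group: {new_column: data}}, with one entry per jet.
    return {"jets": {"pt_squared": jets["pt"] ** 2}}


# Stand-in for a reader batch (assumed: group name -> structured array).
batch = {"jets": np.array([(1.0,), (2.0,), (3.0,)], dtype=[("pt", "f4")])}
new_columns = add_pt_squared(batch)
print(new_columns["jets"]["pt_squared"])

# With real files, the function would be passed along these lines
# (file names are placeholders):
# h5_add_column("input.h5", "output.h5", add_pt_squared, output_groups=["jets"])
```

The new column name must not already exist in the target group, since a clash is reported as a ValueError.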

ftag.hdf5.h5add_col.parse_append_function(func_path: str) → Callable

Attempts to load the function specified by func_path. The function should be specified as ‘path/to/file.py:function_name’.

Parameters:

func_path (str) – Path to the function to load. Should be of the form ‘path/to/file.py:function_name’.

Returns:

The function specified by func_path.

Return type:

Callable

Raises:
  • ValueError – If the function path is not of the form ‘path/to/file.py:function_name’.

  • FileNotFoundError – If the file does not exist.

  • ImportError – If the file cannot be imported.

  • AttributeError – If the function does not exist in the file.
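
Loading a function from a ‘path/to/file.py:function_name’ string can be sketched with the standard library's importlib. This is an assumption about one reasonable approach, not the module's actual code; load_function is a hypothetical name, and the error handling mirrors the exceptions listed above.

```python
import importlib.util
from pathlib import Path


def load_function(func_path):
    """Load a callable from a 'path/to/file.py:function_name' string."""
    # Split on the last colon so paths that contain a colon still parse.
    file_part, sep, func_name = func_path.rpartition(":")
    if not sep or not file_part or not func_name:
        raise ValueError(
            f"Expected 'path/to/file.py:function_name', got {func_path!r}"
        )

    path = Path(file_part)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")

    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {path}")

    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, func_name)  # AttributeError if the name is missing
```

Because the file is executed on import, only trusted paths should be passed to a loader like this.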

ftag.hdf5.h5add_col.get_args(args)

ftag.hdf5.h5add_col.main(args=None)