ftag.hdf5.h5add_col
Functions
- merge_dicts: Merges a list of dictionaries.
- get_shape: Returns a dictionary with the correct output shapes for the H5Writer.
- get_all_groups: Returns a dictionary with all the groups in the h5 file.
- h5_add_column: Appends one or more columns to one or more groups in an h5 file.
- parse_append_function: Attempts to load the function specified by func_path.
- get_args
- main
Module Contents
- ftag.hdf5.h5add_col.merge_dicts(dicts: list[dict[str, dict[str, numpy.ndarray]]]) -> dict[str, dict[str, numpy.ndarray]]
Merges a list of dictionaries.
Each dict is of the form:
    {
        group1: {
            variable_1: np.array,
            variable_2: np.array,
        },
        group2: {
            variable_1: np.array,
            variable_2: np.array,
        },
    }
E.g.
    dict1 = {
        "jets": {
            "pt": np.array([1, 2, 3]),
            "eta": np.array([4, 5, 6]),
        },
    }
    dict2 = {
        "jets": {
            "phi": np.array([7, 8, 9]),
            "energy": np.array([10, 11, 12]),
        },
    }
    merged = {
        "jets": {
            "pt": np.array([1, 2, 3]),
            "eta": np.array([4, 5, 6]),
            "phi": np.array([7, 8, 9]),
            "energy": np.array([10, 11, 12]),
        },
    }
- Parameters:
dicts (list[dict[str, dict[str, np.ndarray]]]) – List of dictionaries to merge. Each dictionary should map group names to dictionaries of variable names and arrays, in the form described above.
- Returns:
Merged dictionary of the form:
    {
        group1: {
            variable_1: np.array,
            variable_2: np.array,
        },
        group2: {
            variable_1: np.array,
            variable_2: np.array,
        },
    }
- Return type:
dict[str, dict[str, np.ndarray]]
- Raises:
ValueError – If a variable already exists in the merged dictionary.
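The merge behaviour described above can be sketched as follows. This is a minimal illustration, not the ftag implementation: `merge_group_dicts` is a hypothetical stand-in name, and the duplicate check is assumed to apply per group.

```python
import numpy as np


def merge_group_dicts(dicts):
    """Hypothetical stand-in for merge_dicts; not the ftag implementation."""
    merged = {}
    for d in dicts:
        for group, variables in d.items():
            target = merged.setdefault(group, {})
            for name, array in variables.items():
                # Assumption: a variable clashes only within the same group.
                if name in target:
                    raise ValueError(
                        f"Variable {name!r} already exists in group {group!r}"
                    )
                target[name] = array
    return merged


dict1 = {"jets": {"pt": np.array([1, 2, 3]), "eta": np.array([4, 5, 6])}}
dict2 = {"jets": {"phi": np.array([7, 8, 9]), "energy": np.array([10, 11, 12])}}
merged = merge_group_dicts([dict1, dict2])
print(sorted(merged["jets"]))  # ['energy', 'eta', 'phi', 'pt']
```

Merging `dict1` with itself would raise a ValueError in this sketch, since "pt" and "eta" would already be present in "jets".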
- ftag.hdf5.h5add_col.get_shape(num_jets: int, batch: dict[str, numpy.ndarray]) -> dict[str, tuple[int, ...]]
Returns a dictionary with the correct output shapes for the H5Writer.
- Parameters:
num_jets (int) – Number of jets to write in total
batch (dict[str, np.ndarray]) – Dictionary representing the batch
- Returns:
Dictionary with the shapes of the output arrays
- Return type:
dict[str, tuple[int, …]]
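The page does not say how the output shapes are derived. One plausible reading, sketched below, is that each array keeps its trailing dimensions while the leading batch dimension is replaced by the total number of jets. The function name `output_shapes` and the example group names are hypothetical.

```python
import numpy as np


def output_shapes(num_jets, batch):
    """Hypothetical sketch: swap the batch dimension for the total jet count."""
    return {name: (num_jets, *array.shape[1:]) for name, array in batch.items()}


batch = {
    "jets": np.zeros(100),          # one entry per jet in the batch
    "tracks": np.zeros((100, 40)),  # 40 track slots per jet
}
shapes = output_shapes(1000, batch)
print(shapes)  # {'jets': (1000,), 'tracks': (1000, 40)}
```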
- ftag.hdf5.h5add_col.get_all_groups(file: pathlib.Path | str) -> dict[str, None]
Returns a dictionary with all the groups in the h5 file.
- Parameters:
file (Path | str) – Path to the h5 file
- Returns:
A dictionary with all the groups in the h5 file as keys and None as values, such that h5read.stream(all_groups) will return all the groups in the file.
- Return type:
dict[str, None]
- ftag.hdf5.h5add_col.h5_add_column(input_file: str | pathlib.Path, output_file: str | pathlib.Path, append_function: Callable | list[Callable], num_jets: int = -1, input_groups: list[str] | None = None, output_groups: list[str] | None = None, reader_kwargs: dict | None = None, writer_kwargs: dict | None = None, overwrite: bool = False) -> None
Appends one or more columns to one or more groups in an h5 file.
- Parameters:
input_file (str | Path) – Input h5 file to read from.
output_file (str | Path) – Output h5 file to write to.
append_function (callable | list[callable]) –
A function, or list of functions, each of which takes a batch from H5Reader and returns a dictionary of the form:
    {
        group1: {
            new_column1: data,
            new_column2: data,
        },
        group2: {
            new_column3: data,
            new_column4: data,
        },
    }
num_jets (int, optional) – Number of jets to read from the input file. If -1, reads all jets. By default -1.
input_groups (list[str] | None, optional) – List of groups to read from the input file. If None, reads all groups. By default None.
output_groups (list[str] | None, optional) – List of groups to write to the output file. If None, writes all groups. By default None. Note that this must be a subset of the input groups, and must include every group that the append functions write to.
reader_kwargs (dict, optional) – Additional arguments to pass to the H5Reader. By default None.
writer_kwargs (dict, optional) – Additional arguments to pass to the H5Writer. By default None.
overwrite (bool, optional) – If True, overwrites the output file if it exists. If False, raises a FileExistsError if the output file exists. By default False.
- Raises:
FileNotFoundError – If the input file does not exist.
FileExistsError – If the output file exists and overwrite is False.
ValueError – If the new variable already exists, shape is incorrect, or the output group is not in the input groups.
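As an illustration of an append function, the sketch below counts valid tracks per jet and returns the result as a new column in the "jets" group. It assumes the batch is a dictionary of NumPy structured arrays and that a "tracks" group with a boolean "valid" field exists; those names, the `run` wrapper, and the stand-in batch are assumptions made for the example.

```python
import numpy as np


def add_number_of_tracks(batch):
    """Count valid tracks per jet and return them as a new jets column."""
    num_tracks = batch["tracks"]["valid"].sum(axis=1)
    return {"jets": {"number_of_tracks": num_tracks}}


def run(input_file, output_file):
    # Imported lazily so the append function above can be tried without ftag.
    from ftag.hdf5.h5add_col import h5_add_column

    h5_add_column(input_file, output_file, add_number_of_tracks, overwrite=True)


# Stand-in batch: 2 jets with 3 track slots each (assumed structured layout).
tracks = np.zeros((2, 3), dtype=[("valid", "?")])
tracks["valid"] = [[True, True, False], [True, False, False]]
demo_batch = {"jets": np.zeros(2, dtype=[("pt", "f4")]), "tracks": tracks}
new_columns = add_number_of_tracks(demo_batch)
```

Because the function writes to "jets", that group would need to be among the output groups when `output_groups` is given explicitly.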
- ftag.hdf5.h5add_col.parse_append_function(func_path: str) -> Callable
Attempts to load the function specified by func_path. The function should be specified as ‘path/to/file.py:function_name’.
- Parameters:
func_path (str) – Path to the function to load. Should be of the form ‘path/to/file.py:function_name’.
- Returns:
The function specified by func_path.
- Return type:
Callable
- Raises:
ValueError – If the function path is not of the form ‘path/to/file.py:function_name’.
FileNotFoundError – If the file does not exist.
ImportError – If the file cannot be imported.
AttributeError – If the function does not exist in the file.
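A loader for the 'path/to/file.py:function_name' format could look roughly like the sketch below, built on the standard library's importlib. It is not the ftag implementation; `load_function` is a hypothetical name, and the mapping of failures to exception types simply follows the list above.

```python
import importlib.util
from pathlib import Path


def load_function(func_path):
    """Hypothetical loader for 'path/to/file.py:function_name' strings."""
    # Split on the last colon so that paths containing colons still work.
    file_part, sep, func_name = func_path.rpartition(":")
    if not sep or not file_part.endswith(".py") or not func_name:
        raise ValueError(
            f"Expected 'path/to/file.py:function_name', got {func_path!r}"
        )
    file_path = Path(file_part)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {file_path}")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:
        raise ImportError(f"Cannot import {file_path}") from exc
    # getattr raises AttributeError if the function is missing from the file.
    return getattr(module, func_name)
```

With a file `my_funcs.py` that defines `double`, calling `load_function("my_funcs.py:double")` would return that function object.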
- ftag.hdf5.h5add_col.get_args(args)
- ftag.hdf5.h5add_col.main(args=None)