ftag#
atlas-ftag-tools - Common tools for ATLAS flavour tagging software.
Submodules#
Attributes#
Exceptions#
GitError | Raised when a Git-related precondition is not satisfied.
Classes#
Functions#
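calculate_best_fraction_values() | Calculate the best fraction values for a given tagger and working point.
check_for_fork() | Ensure the local clone’s origin remote is a fork of upstream.
check_for_uncommitted_changes() | Raise if the repository at path has uncommitted changes.
create_and_push_tag() | Create an annotated Git tag and push it to origin.
get_git_hash() | Return the short commit hash for HEAD at path, if available.
is_git_repo() | Return whether path is inside a Git working tree.
get_mock_file() | Get a mock file for testing.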
Package Contents#
- ftag.__version__ = 'v0.3.2'#
- class ftag.Cuts#
Cuts dataclass to store multiple Cut instances and apply them.
- __post_init__()#
- property variables: list[str]#
- ignore(variables: list[str])#
- __call__(array: numpy.ndarray) -> CutsResult#
- __len__() -> int#
- __iter__() -> collections.abc.Iterator#
- __getitem__(variable)#
- __repr__() -> str#
- ftag.calculate_best_fraction_values(jets: numpy.ndarray, tagger: str, signal: ftag.labels.Label, flavours: ftag.labels.LabelContainer, working_point: float, rejection_weights: dict | None = None, optimizer_method: str = 'Powell') -> dict#
Calculate the best fraction values for a given tagger and working point.
- Parameters:
jets (np.ndarray) – Loaded jets
tagger (str) – Name of the tagger
signal (Label) – Label instance of the signal
flavours (LabelContainer) – LabelContainer with all flavours
working_point (float) – Working point that is used
rejection_weights (dict | None, optional) – Rejection weights for the background classes, by default None
optimizer_method (str, optional) – Optimizer method for the minimization, by default “Powell”
- Returns:
Dict with the best fraction values
- Return type:
dict
- exception ftag.GitError#
Bases: Exception
Raised when a Git-related precondition is not satisfied.
- ftag.check_for_fork(path: str | os.PathLike[str], upstream: str) -> None#
Ensure the local clone’s origin remote is a fork of upstream.
- Parameters:
path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.
upstream (str) – Expected upstream repository URL substring (e.g. 'github.com/org/repo').
- Raises:
GitError – If the repository is present but its origin URL does not contain upstream.
Notes
If path is not a Git repository, the function returns silently.
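The substring rule stated under Raises can be illustrated without touching Git at all. The sketch below is not the library's implementation: GitError is a local stand-in for ftag.GitError, and assert_origin_matches is a hypothetical helper that receives the origin URL as an argument instead of querying the repository.

```python
class GitError(Exception):
    """Local stand-in for ftag.GitError so the sketch runs without ftag."""


def assert_origin_matches(origin_url: str, upstream: str) -> None:
    """Raise GitError unless `upstream` occurs as a substring of `origin_url`."""
    if upstream not in origin_url:
        raise GitError(f"origin {origin_url!r} does not contain {upstream!r}")


# Passes silently: the upstream substring is part of the origin URL.
assert_origin_matches("https://github.com/org/repo.git", "github.com/org/repo")
```

Because the rule is a plain substring test, the value passed as upstream has to match the URL form that origin actually uses.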
- ftag.check_for_uncommitted_changes(path: str | os.PathLike[str]) -> None#
Raise if the repository at path has uncommitted changes.
- Parameters:
path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.
- Raises:
GitError – If path is a Git repository and there are uncommitted changes.
Notes
If path is not a Git repository, the function returns silently.
If the current process is running under pytest (detected via sys.modules), the check is skipped and the function returns.
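A common way to detect pytest through sys.modules is to test whether the pytest module has already been imported into the current process. The helper name running_under_pytest is hypothetical, and the exact condition used inside ftag may differ from this sketch.

```python
import sys


def running_under_pytest() -> bool:
    """Return True if the pytest module has been imported in this process."""
    return "pytest" in sys.modules


if running_under_pytest():
    print("pytest detected: an uncommitted-changes check would be skipped")
else:
    print("pytest not detected: an uncommitted-changes check would run")
```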
- ftag.create_and_push_tag(path: str | os.PathLike[str], upstream: str, tagname: str, msg: str) -> None#
Create an annotated Git tag and push it to origin.
- Parameters:
path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.
upstream (str) – Expected upstream repository URL substring; passed to check_for_fork().
tagname (str) – Name of the tag to create.
msg (str) – Annotation message for the tag (git tag -m).
Notes
If path is not a Git repository, the function returns silently.
- ftag.get_git_hash(path: str | os.PathLike[str]) -> str | None#
Return the short commit hash for HEAD at path, if available.
- Parameters:
path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.
- Returns:
The short (--short) commit hash as a string, or None if path is not a Git repository.
- Return type:
str | None
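As a rough illustration of the behaviour described here, the sketch below asks Git for the short hash of HEAD and returns None when the command fails. The helper name short_head_hash is hypothetical, and mapping every non-zero exit status to None is an assumption of this sketch, not a statement about how ftag implements get_git_hash.

```python
from __future__ import annotations

import subprocess


def short_head_hash(path: str) -> str | None:
    """Return the short hash of HEAD for the repository at `path`, or None."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=path,              # run Git with `path` as the working directory
        capture_output=True,
        text=True,
        check=False,           # inspect the exit status instead of raising
    )
    if result.returncode != 0:
        # Assumption: any failure is reported as "no hash available".
        return None
    return result.stdout.strip()
```

As with any subprocess call, an OSError is raised if the git executable cannot be found.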
- ftag.is_git_repo(path: str | os.PathLike[str]) -> bool#
Return whether path is inside a Git working tree.
- Parameters:
path (str | PathLike[str]) – Filesystem path used as the current working directory for the Git command.
- Returns:
True if path is inside a Git working tree, False otherwise.
- Return type:
bool
Notes
This function runs:
git rev-parse --is-inside-work-tree HEAD
Any non-zero exit status is treated as “not a Git repository”. If Git is not available on the system, an OSError may be raised by subprocess.
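The check in the Notes can be reproduced with the standard library. This is a minimal sketch under the assumption that only the exit status matters; inside_work_tree is a hypothetical name and the code is not taken from ftag.

```python
import subprocess


def inside_work_tree(path: str) -> bool:
    """Run the documented Git command with `path` as the working directory."""
    result = subprocess.run(
        ["git", "rev-parse", "--is-inside-work-tree", "HEAD"],
        cwd=path,
        stdout=subprocess.DEVNULL,   # output is not needed, only the exit status
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # Any non-zero exit status is treated as "not a Git repository".
    return result.returncode == 0
```

Because the OSError from a missing git executable is not caught, callers that must run on machines without Git need to handle it themselves.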
- ftag.get_mock_file(num_jets: int = 1000, fname: str | None = None, tracks_name: str = 'tracks', num_tracks: int = 40) -> tuple[str, h5py.File]#
Get a mock file for testing.
- Parameters:
num_jets (int, optional) – Number of jets in the file, by default 1000
fname (str | None, optional) – Name of the file, by default None
tracks_name (str, optional) – Name of the tracks dataset, by default “tracks”
num_tracks (int, optional) – Number of tracks per jet, by default 40
- Returns:
Tuple with the path and the h5 file
- Return type:
tuple[str, h5py.File]
- class ftag.Sample#
Dataclass which holds info about a specific sample.
- pattern#
File pattern for the h5 files
- Type:
Path | str | tuple[Path | str, …]
- ntuple_dir#
Ntuple directory where the h5 files are stored, by default None
- Type:
Path | str | None, optional
- name#
Name of the sample, for internal identification, by default None
- Type:
str | None, optional
- weights#
List of weights for this sample, by default None
- Type:
list[float] | None, optional
- skip_checks#
Whether certain checks are skipped, by default False
- Type:
bool, optional
- vds_dir#
Directory where virtual datasets will be stored if wildcard is used, by default None. If None, the virtual files will be created in the same directory as the input files.
- Type:
Path | str | None, optional
- pattern: pathlib.Path | str | tuple[pathlib.Path | str, Ellipsis]#
- ntuple_dir: pathlib.Path | str | None = None#
- name: str | None = None#
- weights: list[float] | None = None#
- skip_checks: bool = False#
- vds_dir: pathlib.Path | str | None = None#
- __post_init__() -> None#
- property path: tuple[pathlib.Path, Ellipsis]#
- property files: list[str]#
- property num_files: int#
- property dsid: list[str]#
- property sample_id: list[str]#
- property tags: list[str]#
- property ptag: list[str]#
- property rtag: list[str]#
- property dumper_tag: list[str]#
- virtual_file(**kwargs) -> list[pathlib.Path | str]#
- __str__()#
- __lt__(other)#
- __eq__(other)#
- class ftag.Transform#
Apply variable name remapping, integer remapping, and float transformations.
The Transform class provides a unified mechanism to perform:
- variable renaming (variable_map)
- integer value remapping (ints_map)
- float transformations (floats_map)
Each transformation is applied to a batch consisting of a dictionary of structured numpy arrays.
- variable_map#
A nested mapping where variable_map[group][old] = new specifies how variable names should be renamed inside a given group. If None, no variable renaming is applied.
- Type:
dict[str, dict[str, str]]
- ints_map#
A nested mapping where ints_map[group][variable][old] = new specifies how integer values should be remapped. If None, no integer remapping is applied.
- Type:
dict[str, dict[str, dict[int, int]]]
- floats_map#
A nested mapping where floats_map[group][variable] = func specifies a float transformation function. func may be either:
- a callable
- a string giving the name of a numpy function (e.g. “log”)
Strings are resolved to numpy.<func> automatically.
- Type:
dict[str, dict[str, str | Callable]]
- variable_map_inv#
Automatically generated inverse of variable_map, used for reverse variable lookup in map_variable_names().
- Type:
dict[str, dict[str, str]]
- variable_map: dict[str, dict[str, str]]#
- ints_map: dict[str, dict[str, dict[int, int]]]#
- floats_map: dict[str, dict[str, str | collections.abc.Callable]]#
- variable_map_inv: dict[str, dict[str, str]]#
- __post_init__() -> None#
Initialize internal maps and convert float transformation strings.
This method ensures that variable_map, ints_map, and floats_map are always dictionaries (never None), constructs the inverse variable map, and converts any string-based float transformations into their numpy equivalents.
- __call__(batch: Batch) -> Batch#
Apply integer remapping, float transformations, and variable renaming.
- Parameters:
batch (Batch) – A mapping from group name to structured numpy arrays.
- Returns:
The transformed batch.
- Return type:
Batch
- map_variables(batch: Batch) -> Batch#
Rename variables in each group according to variable_map.
- Parameters:
batch (Batch) – Dictionary mapping group names to structured numpy arrays.
- Returns:
The batch with variables renamed where applicable.
- Return type:
Batch
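One way to rename fields of structured arrays group by group is numpy.lib.recfunctions.rename_fields. The sketch below assumes a hypothetical batch and variable_map and is only an illustration of the idea; whether ftag uses this helper internally is not stated in this page.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical mapping and batch, for illustration only.
variable_map = {"jets": {"pt_btagJes": "pt"}}
batch = {
    "jets": np.array(
        [(25.0, 0.1), (40.0, -1.3)],
        dtype=[("pt_btagJes", "f4"), ("eta", "f4")],
    ),
}

# Rename fields per group; groups without an entry keep their field names.
renamed = {
    group: rfn.rename_fields(array, variable_map.get(group, {}))
    for group, array in batch.items()
}

print(renamed["jets"].dtype.names)  # ('pt', 'eta')
```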
- map_ints(batch: Batch) -> Batch#
Remap integer values for specified variables inside each group.
- Parameters:
batch (Batch) – Dictionary mapping group names to structured numpy arrays.
- Returns:
The batch with integer values remapped.
- Return type:
Batch
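The nested layout ints_map[group][variable][old] = new can be applied to a structured array as sketched below. The batch and mapping are hypothetical. Leaving unlisted values untouched, and comparing against a copy of the original values so that remaps do not cascade, are choices of this sketch and may differ from what ftag does.

```python
import numpy as np

# Hypothetical batch and mapping, for illustration only.
batch = {
    "jets": np.array([(0,), (4,), (5,), (15,)], dtype=[("flavour_label", "i4")]),
}
ints_map = {"jets": {"flavour_label": {4: 5, 5: 4}}}

for group, variables in ints_map.items():
    for var, mapping in variables.items():
        # Compare against the original values so 4 -> 5 is not undone by 5 -> 4.
        original = batch[group][var].copy()
        for old, new in mapping.items():
            batch[group][var][original == old] = new

print(batch["jets"]["flavour_label"].tolist())  # [0, 5, 4, 15]
```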
- map_floats(batch: Batch) -> Batch#
Apply float transformations to selected variables.
- Parameters:
batch (Batch) – Dictionary mapping group names to structured numpy arrays.
- Returns:
The batch with float transformations applied.
- Return type:
Batch
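To make the floats_map[group][variable] = func convention concrete, the sketch below resolves string entries with getattr on the numpy module and applies the resulting callables field by field. The batch and map contents are hypothetical, and the loop is an assumption about one reasonable way to do this, not ftag's own code.

```python
import numpy as np

# Hypothetical batch and mapping, for illustration only.
floats_map = {"jets": {"pt": "log", "eta": np.abs}}
batch = {
    "jets": np.array(
        [(1.0, 0.5), (100.0, -1.2)],
        dtype=[("pt", "f8"), ("eta", "f8")],
    ),
}

for group, funcs in floats_map.items():
    for var, func in funcs.items():
        # Strings name a numpy function, e.g. "log" resolves to numpy.log.
        fn = getattr(np, func) if isinstance(func, str) else func
        batch[group][var] = fn(batch[group][var])

print(batch["jets"])
```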
- map_dtype(name: str, dtype: numpy.dtype) -> numpy.dtype#
Compute a new dtype with renamed fields according to variable_map.
- Parameters:
name (str) – Group name associated with the dtype.
dtype (np.dtype) – Structured dtype whose field names may be modified.
- Returns:
A dtype with renamed fields where required.
- Return type:
np.dtype
- Raises:
ValueError – When the variables already exist in the dataset.
- map_variable_names(name: str, variables: list[str], inverse: bool = False) -> list[str]#
Map a list of variable names using variable_map or variable_map_inv.
- Parameters:
name (str) – Group name used to select the appropriate name-mapping dictionary.
variables (list[str]) – List of variable names to be mapped.
inverse (bool, optional) – If False (default), apply variable_map. If True, apply the inverse mapping variable_map_inv.
- Returns:
A new list of mapped variable names.
- Return type:
list[str]
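The forward and inverse lookups can be pictured with plain dictionaries. In this sketch the map contents and the helper map_names are hypothetical, and passing unmapped names through unchanged is an assumption, since the behaviour for names missing from the map is not stated above.

```python
# Hypothetical forward map: variable_map[group][old] = new
variable_map = {"jets": {"pt_btagJes": "pt", "eta_btagJes": "eta"}}

# Inverse map built by swapping keys and values within each group.
variable_map_inv = {
    group: {new: old for old, new in names.items()}
    for group, names in variable_map.items()
}


def map_names(name, variables, inverse=False):
    """Map variable names for group `name`; unmapped names pass through."""
    mapping = (variable_map_inv if inverse else variable_map).get(name, {})
    return [mapping.get(v, v) for v in variables]


forward = map_names("jets", ["pt_btagJes", "mass"])
backward = map_names("jets", forward, inverse=True)
print(forward)   # ['pt', 'mass']
print(backward)  # ['pt_btagJes', 'mass']
```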