ftag

atlas-ftag-tools - Common tools for ATLAS flavour tagging software.

Submodules

Attributes

Exceptions

GitError

Raised when a Git-related precondition is not satisfied.

Classes

Cuts

Cuts dataclass to store multiple Cut instances and apply them.

Sample

Dataclass which holds info about a specific sample.

Transform

Apply variable name remapping, integer remapping, and float transformations.

Functions

calculate_best_fraction_values(→ dict)

Calculate the best fraction values for a given tagger and working point.

check_for_fork(→ None)

Ensure the local clone's origin remote is a fork of upstream.

check_for_uncommitted_changes(→ None)

Raise if the repository at path has uncommitted changes.

create_and_push_tag(→ None)

Create an annotated Git tag and push it to origin.

get_git_hash(→ str | None)

Return the short commit hash for HEAD at path, if available.

is_git_repo(→ bool)

Return whether path is inside a Git working tree.

get_mock_file(→ tuple[str, h5py.File])

Get a mock file for testing.

Package Contents

ftag.__version__ = 'v0.3.2'
class ftag.Cuts

Cuts dataclass to store multiple Cut instances and apply them.

cuts

Tuple with the Cut instances

Type:

tuple[Cut, …]

cuts: tuple[Cut, ...]
classmethod from_list(cuts: list) → Cuts
classmethod empty() → Cuts
__post_init__()
property variables: list[str]
ignore(variables: list[str])
__call__(array: numpy.ndarray) → CutsResult
__add__(other: Cuts)
__len__() → int
__iter__() → collections.abc.Iterator
__getitem__(variable)
__repr__() → str
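
The idea of applying several cuts to a structured array and getting back both the passing indices and the passing rows can be sketched in plain NumPy. This is only an illustration under assumptions: apply_cuts, the (variable, operator, value) tuples, and the field names pt and eta are hypothetical and not part of ftag, and the real Cut and CutsResult types may behave differently.

```python
import operator

import numpy as np

# Hypothetical operator table; this is not ftag's cut syntax.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def apply_cuts(array, cuts):
    """Return (indices, rows) of the entries that pass every cut."""
    keep = np.ones(len(array), dtype=bool)
    for variable, op, value in cuts:
        # Combine each cut with a logical AND.
        keep &= OPS[op](array[variable], value)
    idx = np.flatnonzero(keep)
    return idx, array[idx]


# Toy structured array with made-up field names.
jets = np.array(
    [(25e3, 0.5), (15e3, 1.0), (40e3, 3.0)],
    dtype=[("pt", "f4"), ("eta", "f4")],
)
idx, selected = apply_cuts(jets, [("pt", ">", 20e3), ("eta", "<", 2.5)])
```

Returning the indices alongside the rows makes it possible to apply the same selection to other arrays that are aligned with the first one.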
ftag.calculate_best_fraction_values(jets: numpy.ndarray, tagger: str, signal: ftag.labels.Label, flavours: ftag.labels.LabelContainer, working_point: float, rejection_weights: dict | None = None, optimizer_method: str = 'Powell') → dict

Calculate the best fraction values for a given tagger and working point.

Parameters:
  • jets (np.ndarray) – Loaded jets

  • tagger (str) – Name of the tagger

  • signal (Label) – Label instance of the signal

  • flavours (LabelContainer) – LabelContainer with all flavours

  • working_point (float) – Working point that is used

  • rejection_weights (dict | None, optional) – Rejection weights for the background classes, by default None

  • optimizer_method (str, optional) – Optimizer method for the minimization, by default “Powell”

Returns:

Dict with the best fraction values

Return type:

dict

exception ftag.GitError

Bases: Exception

Raised when a Git-related precondition is not satisfied.


ftag.check_for_fork(path: str | os.PathLike[str], upstream: str) → None

Ensure the local clone’s origin remote is a fork of upstream.

Parameters:
  • path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.

  • upstream (str) – Expected upstream repository URL substring (e.g. 'github.com/org/repo').

Raises:

GitError – If the repository is present but its origin URL does not contain upstream.

Notes

If path is not a Git repository, the function returns silently.
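
The documented behaviour (silent return outside a repository, GitError when the origin URL does not contain upstream) can be sketched as a pure string check. This is an assumption-laden illustration: ensure_origin_matches is a hypothetical helper, the GitError class below is a local stand-in so the snippet is self-contained, and how ftag actually obtains the origin URL is not shown here.

```python
from typing import Optional


class GitError(Exception):
    """Local stand-in for ftag.GitError, defined only for this sketch."""


def ensure_origin_matches(origin_url: Optional[str], upstream: str) -> None:
    """Raise GitError if origin_url is set but does not contain upstream."""
    # None models "path is not a Git repository": return silently.
    if origin_url is None:
        return
    if upstream not in origin_url:
        raise GitError(
            f"origin {origin_url!r} does not contain {upstream!r}"
        )


# Passes: the substring is present in the origin URL.
ensure_origin_matches("https://github.com/org/repo.git", "github.com/org/repo")

# Fails: the origin points somewhere else.
try:
    ensure_origin_matches("https://example.com/other.git", "github.com/org/repo")
    mismatch_raised = False
except GitError:
    mismatch_raised = True
```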

ftag.check_for_uncommitted_changes(path: str | os.PathLike[str]) → None

Raise if the repository at path has uncommitted changes.

Parameters:

path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.

Raises:

GitError – If path is a Git repository and there are uncommitted changes.

Notes

  • If path is not a Git repository, the function returns silently.

  • If the current process is running under pytest (detected via sys.modules), the check is skipped and the function returns.
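
Detecting pytest through sys.modules can be sketched as below. The helper name running_under_pytest and its modules parameter are hypothetical, and the module name "pytest" is an assumption about what is looked up; the parameter exists only so the check can be exercised without depending on how the snippet itself is run.

```python
import sys


def running_under_pytest(modules=None) -> bool:
    """Return True if a module named 'pytest' has been imported."""
    # Default to the interpreter's table of imported modules.
    modules = sys.modules if modules is None else modules
    return "pytest" in modules


# Simulated module tables, so the result does not depend on the caller.
with_pytest = running_under_pytest({"pytest": object()})
without_pytest = running_under_pytest({})
```

A check of this kind is true whenever pytest has been imported in the process, not only while a test is actively running.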

ftag.create_and_push_tag(path: str | os.PathLike[str], upstream: str, tagname: str, msg: str) → None

Create an annotated Git tag and push it to origin.

Parameters:
  • path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.

  • upstream (str) – Expected upstream repository URL substring; passed to check_for_fork().

  • tagname (str) – Name of the tag to create.

  • msg (str) – Annotation message for the tag (git tag -m).

Notes

If path is not a Git repository, the function returns silently.

ftag.get_git_hash(path: str | os.PathLike[str]) → str | None

Return the short commit hash for HEAD at path, if available.

Parameters:

path (str | PathLike[str]) – Filesystem path to the repository root or any directory within it.

Returns:

The short (--short) commit hash as a string, or None if path is not a Git repository.

Return type:

str | None

ftag.is_git_repo(path: str | os.PathLike[str]) → bool

Return whether path is inside a Git working tree.

Parameters:

path (str | PathLike[str]) – Filesystem path used as the current working directory for the Git command.

Returns:

True if path is inside a Git working tree, False otherwise.

Return type:

bool

Notes

This function runs:

git rev-parse --is-inside-work-tree HEAD

Any non-zero exit status is treated as “not a Git repository”. If Git is not available on the system, an OSError may be raised by subprocess.
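
The command above can be wrapped with subprocess roughly as follows. This is a minimal sketch, not ftag's implementation: inside_work_tree and its run parameter are hypothetical, and discarding stdout and stderr is a choice made here. The injectable run argument lets the logic be tried without Git installed.

```python
import subprocess
from pathlib import Path


def inside_work_tree(path, run=subprocess.run) -> bool:
    """Return True if the Git command exits with status 0 in `path`."""
    result = run(
        ["git", "rev-parse", "--is-inside-work-tree", "HEAD"],
        cwd=Path(path),              # run the command from the given directory
        stdout=subprocess.DEVNULL,   # only the exit status matters here
        stderr=subprocess.DEVNULL,
        check=False,                 # do not raise on a non-zero exit status
    )
    return result.returncode == 0


class FakeResult:
    """Stand-in for subprocess.CompletedProcess, used for demonstration."""

    def __init__(self, returncode):
        self.returncode = returncode


# Simulate a successful and a failing Git invocation.
ok = inside_work_tree(".", run=lambda *args, **kwargs: FakeResult(0))
not_ok = inside_work_tree(".", run=lambda *args, **kwargs: FakeResult(128))
```

With the default run=subprocess.run, a missing git executable surfaces as an OSError (FileNotFoundError), consistent with the note above.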

ftag.get_mock_file(num_jets: int = 1000, fname: str | None = None, tracks_name: str = 'tracks', num_tracks: int = 40) → tuple[str, h5py.File]

Get a mock file for testing.

Parameters:
  • num_jets (int, optional) – Number of jets in the file, by default 1000

  • fname (str | None, optional) – Name of the file, by default None

  • tracks_name (str, optional) – Name of the tracks dataset, by default “tracks”

  • num_tracks (int, optional) – Number of tracks per jet, by default 40

Returns:

Tuple with the path and the h5 file

Return type:

tuple[str, h5py.File]

class ftag.Sample

Dataclass which holds info about a specific sample.

pattern

Filepattern for the h5 files

Type:

Path | str | tuple[Path | str, …]

ntuple_dir

Ntuple directory where the h5 files are stored, by default None

Type:

Path | str | None, optional

name

Name of the sample, for internal identification, by default None

Type:

str | None, optional

weights

List of weights for this sample, by default None

Type:

list[float] | None, optional

skip_checks

Whether to skip certain checks, by default False

Type:

bool, optional

vds_dir

Directory where virtual datasets will be stored if wildcard is used, by default None. If None, the virtual files will be created in the same directory as the input files.

Type:

Path | str | None, optional

pattern: pathlib.Path | str | tuple[pathlib.Path | str, ...]
ntuple_dir: pathlib.Path | str | None = None
name: str | None = None
weights: list[float] | None = None
skip_checks: bool = False
vds_dir: pathlib.Path | str | None = None
__post_init__() → None
property path: tuple[pathlib.Path, ...]
property files: list[str]
property num_files: int
property dsid: list[str]
property sample_id: list[str]
property tags: list[str]
property ptag: list[str]
property rtag: list[str]
property dumper_tag: list[str]
virtual_file(**kwargs) → list[pathlib.Path | str]
__str__()
__lt__(other)
__eq__(other)
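
How one or more file patterns, optionally relative to an ntuple directory, might expand to a list of h5 files can be sketched with the standard library. This is purely illustrative: resolve_files is a hypothetical helper, and the use of glob and the joining of ntuple_dir with each pattern are assumptions, not a description of how Sample resolves its files.

```python
import glob
import tempfile
from pathlib import Path


def resolve_files(pattern, ntuple_dir=None):
    """Expand one pattern or a tuple of patterns into sorted file paths."""
    patterns = pattern if isinstance(pattern, (tuple, list)) else (pattern,)
    files = []
    for entry in patterns:
        # Assumption: patterns are interpreted relative to ntuple_dir if given.
        full = Path(ntuple_dir) / entry if ntuple_dir is not None else Path(entry)
        files.extend(sorted(glob.glob(str(full))))
    return files


# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.h5", "b.h5", "notes.txt"):
        (Path(tmp) / name).touch()
    matched = [Path(f).name for f in resolve_files("*.h5", ntuple_dir=tmp)]
```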
class ftag.Transform

Apply variable name remapping, integer remapping, and float transformations.

The Transform class provides a unified mechanism to perform:

  • variable renaming (variable_map)

  • integer value remapping (ints_map)

  • float transformations (floats_map)

Each transformation is applied to a batch consisting of a dictionary of structured numpy arrays.

variable_map

A nested mapping where variable_map[group][old] = new specifies how variable names should be renamed inside a given group. If None, no variable renaming is applied.

Type:

dict[str, dict[str, str]]

ints_map

A nested mapping where ints_map[group][variable][old] = new specifies how integer values should be remapped. If None, no integer remapping is applied.

Type:

dict[str, dict[str, dict[int, int]]]

floats_map

A nested mapping where floats_map[group][variable] = func specifies a float transformation function. func may either be:

  • a callable

  • a string giving the name of a numpy function (e.g. “log”)

Strings are resolved to numpy.<func> automatically.

Type:

dict[str, dict[str, str | Callable]]

variable_map_inv

Automatically generated inverse of variable_map used for reverse variable lookup in map_variable_names().

Type:

dict[str, dict[str, str]]

variable_map: dict[str, dict[str, str]]
ints_map: dict[str, dict[str, dict[int, int]]]
floats_map: dict[str, dict[str, str | collections.abc.Callable]]
variable_map_inv: dict[str, dict[str, str]]
__post_init__() → None

Initialize internal maps and convert float transformation strings.

This method ensures that variable_map, ints_map, and floats_map are always dictionaries (never None), constructs the inverse variable map, and converts any string-based float transformations into their numpy equivalents.
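
Two of the steps described here, replacing None with an empty dictionary and resolving function names to numpy callables, can be sketched as follows. The helper resolve_floats_map and the variable names pt and mass are hypothetical; using getattr on the numpy module is one plausible way to do the lookup and is an assumption about the implementation.

```python
import numpy as np


def resolve_floats_map(floats_map=None):
    """Return a copy of floats_map with string entries turned into callables."""
    # Treat None as "no float transformations".
    floats_map = floats_map or {}
    return {
        group: {
            # Strings are looked up as attributes of the numpy module.
            var: getattr(np, func) if isinstance(func, str) else func
            for var, func in funcs.items()
        }
        for group, funcs in floats_map.items()
    }


resolved = resolve_floats_map({"jets": {"pt": "log", "mass": np.sqrt}})
```

In this sketch a misspelled function name fails early with an AttributeError when the map is resolved, rather than later when a batch is transformed.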

__call__(batch: Batch) → Batch

Apply integer remapping, float transformations, and variable renaming.

Parameters:

batch (Batch) – A mapping from group name to structured numpy arrays.

Returns:

The transformed batch.

Return type:

Batch
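
The three steps, in the order listed above (integer remapping, float transformations, variable renaming), can be sketched on a dictionary of structured arrays. This is a simplified illustration rather than the library's code: apply_transform and the field names are hypothetical, the float functions are assumed to be callables already, and copying each array is a choice made here.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def apply_transform(batch, ints_map, floats_map, variable_map):
    """Apply integer remapping, float functions, then field renaming."""
    out = {}
    for group, array in batch.items():
        arr = array.copy()
        for var, mapping in ints_map.get(group, {}).items():
            original = array[var]
            for old, new in mapping.items():
                # Masks come from the untouched input, so 1 -> 2 and 2 -> 3
                # do not chain into 1 -> 3.
                arr[var][original == old] = new
        for var, func in floats_map.get(group, {}).items():
            arr[var] = func(arr[var])
        # rename_fields maps old field names to new ones.
        out[group] = rfn.rename_fields(arr, variable_map.get(group, {}))
    return out


batch = {
    "jets": np.array(
        [(1, 1.0), (2, np.e)], dtype=[("flavour", "i4"), ("pt", "f8")]
    )
}
result = apply_transform(
    batch,
    ints_map={"jets": {"flavour": {1: 2, 2: 3}}},
    floats_map={"jets": {"pt": np.log}},
    variable_map={"jets": {"pt": "log_pt"}},
)
```

Renaming last means the integer and float maps in this sketch are keyed by the original variable names.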

map_variables(batch: Batch) → Batch

Rename variables in each group according to variable_map.

Parameters:

batch (Batch) – Dictionary mapping group names to structured numpy arrays.

Returns:

The batch with variables renamed where applicable.

Return type:

Batch

map_ints(batch: Batch) → Batch

Remap integer values for specified variables inside each group.

Parameters:

batch (Batch) – Dictionary mapping group names to structured numpy arrays.

Returns:

The batch with integer values remapped.

Return type:

Batch

map_floats(batch: Batch) → Batch

Apply float transformations to selected variables.

Parameters:

batch (Batch) – Dictionary mapping group names to structured numpy arrays.

Returns:

The batch with float transformations applied.

Return type:

Batch

map_dtype(name: str, dtype: numpy.dtype) → numpy.dtype

Compute a new dtype with renamed fields according to variable_map.

Parameters:
  • name (str) – Group name associated with the dtype.

  • dtype (np.dtype) – Structured dtype whose field names may be modified.

Returns:

A dtype with renamed fields where required.

Return type:

np.dtype

Raises:

ValueError – If a renamed variable already exists in the dataset.
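
Renaming the fields of a structured dtype while guarding against name collisions can be sketched like this. The helper rename_dtype and the field names are hypothetical, and the duplicate-name check is one interpretation of the ValueError condition above, not a confirmed description of map_dtype.

```python
import numpy as np


def rename_dtype(dtype, mapping):
    """Return a new structured dtype with fields renamed via `mapping`."""
    new_names = [mapping.get(name, name) for name in dtype.names]
    # A rename must not collide with a field that is already present.
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"Renaming produces duplicate field names: {new_names}")
    return np.dtype(
        [(new, dtype[old]) for old, new in zip(dtype.names, new_names)]
    )


dtype = np.dtype([("pt", "f4"), ("eta", "f4")])
renamed = rename_dtype(dtype, {"pt": "pt_btagJes"})

# Renaming "pt" to the existing name "eta" is rejected.
try:
    rename_dtype(dtype, {"pt": "eta"})
    collision_raised = False
except ValueError:
    collision_raised = True
```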

map_variable_names(name: str, variables: list[str], inverse: bool = False) → list[str]

Map a list of variable names using variable_map or variable_map_inv.

Parameters:
  • name (str) – Group name used to select the appropriate name-mapping dictionary.

  • variables (list[str]) – List of variable names to be mapped.

  • inverse (bool, optional) – If False (default), apply variable_map. If True, apply the inverse mapping variable_map_inv.

Returns:

A new list of mapped variable names.

Return type:

list[str]
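
Forward and inverse name mapping can be sketched with plain dictionaries. The helper map_names and the example names are hypothetical, and leaving unmapped names unchanged is an assumption; the actual method may handle them differently.

```python
def map_names(mapping, variables):
    """Map each name through `mapping`, keeping unmapped names as they are."""
    return [mapping.get(name, name) for name in variables]


variable_map = {"jets": {"pt": "pt_btagJes"}}

# Invert each group's mapping: new name -> old name.
variable_map_inv = {
    group: {new: old for old, new in names.items()}
    for group, names in variable_map.items()
}

forward = map_names(variable_map["jets"], ["pt", "eta"])
back = map_names(variable_map_inv["jets"], forward)
```

The round trip recovers the original names only when the forward mapping is one-to-one within a group, since two old names mapped to the same new name cannot both be restored.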