ftag.find_metadata

Attributes

XSECDB_MAP

XSECDB_URL_BASE

Functions

validate_url_scheme(→ urllib.parse.ParseResult)

Validate the scheme of a given URL, ensuring it is http or https.

download_xsecdb_files(→ None)

Download the PMG xsecDB files from CERN if they are not present locally.

extract_taskid_from_filename(→ str | None)

Extract the BigPanDA Task ID (8-digit) from an HDF5 filename.

fetch_taskinfo_from_bigpanda(→ dict[str, Any] | None)

Fetch task information from BigPanDA for a given Task ID.

extract_mc_container_from_json(→ str | None)

Extract the MC container name (e.g., mc16_13TeV.<something>) from a task JSON.

parse_line_from_taskname(→ tuple[int | None, str | None])

Extract DSID and etag from a task name string.

parse_campaign_from_taskname(→ str | None)

Derive campaign (mc15/mc16/etc.) from a task or container name.

extract_info_from_container(→ tuple[int, str, str] | None)

Extract DSID, etag, and campaign name from a container string.

query_xsecdb(→ dict[str, Any] | None)

Look up cross-section metadata in the PMG xsecDB.

write_metadata_to_h5(→ None)

Write metadata values into an HDF5 file under metadata/<DSID>.

handle_yaml_fallback(→ None)

Use fallback metadata from YAML if automatic lookup fails.

parse_args_and_yaml(→ tuple[list[str], dict[str, Any]])

Parse CLI arguments and load YAML metadata if provided.

process_single_file(→ None)

Process a single .h5 file by attempting a BigPanDA lookup, then falling back to YAML.

main(→ None)

Entry point: parse arguments, download xsecDBs, process each file, and clean up.

Module Contents

ftag.find_metadata.XSECDB_MAP: dict[str, str]
ftag.find_metadata.XSECDB_URL_BASE: str = 'https://atlas-groupdata.web.cern.ch/atlas-groupdata/dev/PMGTools/'
ftag.find_metadata.validate_url_scheme(url: str) → urllib.parse.ParseResult

Validate the scheme of a given URL, ensuring it is http or https.

Parameters:

url (str) – URL string to validate.

Returns:

Parsed URL object.

Return type:

ParseResult

Raises:

ValueError – If the URL scheme is not supported.
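
The check described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual source: the `ALLOWED_SCHEMES` constant and the wording of the error message are assumptions.

```python
from urllib.parse import ParseResult, urlparse

# Assumption: the set of accepted schemes is kept in a module-level constant.
ALLOWED_SCHEMES = {"http", "https"}


def validate_url_scheme(url: str) -> ParseResult:
    """Parse `url` and reject anything that is not http or https."""
    parsed = urlparse(url)  # urlparse normalises the scheme to lower case
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    return parsed
```

Validating the scheme before opening a URL guards against inputs such as `file://` or `ftp://` paths reaching the download code.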

ftag.find_metadata.download_xsecdb_files() → None

Download the PMG xsecDB files from CERN if they are not present locally.

ftag.find_metadata.extract_taskid_from_filename(h5_path: pathlib.Path) → str | None

Extract the BigPanDA Task ID (8-digit) from an HDF5 filename.

Parameters:

h5_path (Path) – Path object pointing to the .h5 file.

Returns:

The Task ID as a string if found, otherwise None.

Return type:

str | None
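
One plausible way to pull an 8-digit ID out of a filename is a regular expression that refuses longer digit runs. The sketch below assumes the Task ID appears as a standalone run of exactly eight digits; the helper name `find_taskid` and the example filename are hypothetical, and the real function may use a different pattern.

```python
import re
from pathlib import Path
from typing import Optional

# Exactly eight digits, not embedded in a longer run of digits.
TASKID_RE = re.compile(r"(?<!\d)\d{8}(?!\d)")


def find_taskid(h5_path: Path) -> Optional[str]:
    """Return the first standalone 8-digit run in the file name, if any."""
    match = TASKID_RE.search(h5_path.name)  # only the file name, not parent dirs
    return match.group(0) if match else None
```

Searching only `h5_path.name` avoids picking up digits from parent directories.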

ftag.find_metadata.fetch_taskinfo_from_bigpanda(taskid: str) → dict[str, Any] | None

Fetch task information from BigPanDA for a given Task ID.

Parameters:

taskid (str) – BigPanDA task ID.

Returns:

Task info as a dictionary if found, otherwise None.

Return type:

dict[str, Any] | None

ftag.find_metadata.extract_mc_container_from_json(data: dict[str, Any]) → str | None

Extract the MC container name (e.g., mc16_13TeV.<something>) from a task JSON.

Parameters:

data (dict[str, Any]) – Task info dictionary from BigPanDA.

Returns:

The container string if found, otherwise None.

Return type:

str | None
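
The layout of the BigPanDA task JSON is not described here, so the sketch below makes no assumption about which key holds the container. Instead it walks the whole structure and returns the first string that looks like an MC container. The helper name `find_mc_container`, the regular expression, and the keys in the usage example are all illustrative assumptions.

```python
import re
from typing import Any, Optional

# Assumption: a container looks like "mc<NN>_<energy>TeV.<rest>".
MC_CONTAINER_RE = re.compile(r"mc\d{2}_\d+TeV\.\S+")


def find_mc_container(data: Any) -> Optional[str]:
    """Depth-first search for the first string resembling an MC container."""
    if isinstance(data, str):
        match = MC_CONTAINER_RE.search(data)
        return match.group(0) if match else None
    if isinstance(data, dict):
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None  # numbers, booleans, None, ...
    for child in children:
        found = find_mc_container(child)
        if found is not None:
            return found
    return None
```

For example, `find_mc_container({"datasets": [{"containername": "mc16_13TeV.410470.example.e6337_s3126"}]})` returns the container string regardless of where it is nested.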

ftag.find_metadata.parse_line_from_taskname(taskname: str) → tuple[int | None, str | None]

Extract DSID and etag from a task name string.

Parameters:

taskname (str) – Full task name.

Returns:

A tuple of (DSID as int, etag as string), or (None, None) if not found.

Return type:

tuple[int | None, str | None]
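
As an illustration, the sketch below assumes the DSID is a six-digit field delimited by dots and the etag is an `e` followed by digits at the start of the tag chain. Both patterns and the helper name `parse_dsid_and_etag` are assumptions about the naming scheme, not the module's actual regular expressions.

```python
import re
from typing import Optional, Tuple

DSID_RE = re.compile(r"\.(\d{6})\.")           # e.g. ".410470."
ETAG_RE = re.compile(r"[._](e\d+)(?=_|/|$)")   # e.g. ".e6337_"


def parse_dsid_and_etag(taskname: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (DSID, etag), or (None, None) unless both are present."""
    dsid_match = DSID_RE.search(taskname)
    etag_match = ETAG_RE.search(taskname)
    if dsid_match is None or etag_match is None:
        return None, None
    return int(dsid_match.group(1)), etag_match.group(1)
```

In this sketch a partial match is treated as a failure, so callers only need to test one element of the tuple.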

ftag.find_metadata.parse_campaign_from_taskname(taskname: str) → str | None

Derive campaign (mc15/mc16/etc.) from a task or container name.

Parameters:

taskname (str) – The name string.

Returns:

Campaign string, or None if not found.

Return type:

str | None
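
A minimal sketch of the campaign lookup, assuming the campaign is the `mc<NN>` project prefix that precedes the underscore in names such as `mc16_13TeV`. The helper name `find_campaign` and the pattern are illustrative.

```python
import re
from typing import Optional

# "mc" plus two digits, followed by "_" and not preceded by another
# alphanumeric character (so "xmc16_" would not match).
CAMPAIGN_RE = re.compile(r"(?<![A-Za-z0-9])(mc\d{2})_")


def find_campaign(name: str) -> Optional[str]:
    """Return the campaign prefix (e.g. "mc16") or None."""
    match = CAMPAIGN_RE.search(name)
    return match.group(1) if match else None
```

Using `search` rather than `match` lets the same helper handle names with a leading user scope, such as `user.jdoe.mc20_13TeV...`.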

ftag.find_metadata.extract_info_from_container(container: str) → tuple[int, str, str] | None

Extract DSID, etag, and campaign name from a container string.

Parameters:

container (str) – The MC container string.

Returns:

A tuple of (DSID, etag, campaign), or None if parsing fails.

Return type:

tuple[int, str, str] | None
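
The three fields can also be captured in one pass with named groups. The pattern below is an assumption about the container naming scheme (campaign prefix, six-digit DSID, then a tag chain that starts with the etag), and `parse_container` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

CONTAINER_RE = re.compile(
    r"(?P<campaign>mc\d{2})_\d+TeV"   # project prefix, e.g. mc16_13TeV
    r"\.(?P<dsid>\d{6})\."            # six-digit dataset ID
    r"\S*?[._](?P<etag>e\d+)"         # first e-tag after the DSID
    r"(?=_|/|$)"
)


def parse_container(container: str) -> Optional[Tuple[int, str, str]]:
    """Return (DSID, etag, campaign), or None if the string does not parse."""
    match = CONTAINER_RE.search(container)
    if match is None:
        return None
    return int(match["dsid"]), match["etag"], match["campaign"]
```

The tuple order follows the documented return value: DSID first, then etag, then campaign.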

ftag.find_metadata.query_xsecdb(campaign: str, dsid: int, etag: str) → dict[str, Any] | None

Look up cross-section metadata in the PMG xsecDB.

Parameters:
  • campaign (str) – Campaign name (e.g., mc16).

  • dsid (int) – Dataset ID.

  • etag (str) – Event tag.

Returns:

Dictionary with cross_section_pb, genFiltEff, kfactor, and etag if found, otherwise None.

Return type:

dict[str, Any] | None
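
To show the shape of the lookup, the sketch below scans whitespace-separated text for a row matching the DSID and etag. The column order (DSID first, cross-section, filter efficiency and k-factor in columns 3 to 5, etag last) is an assumption about the xsecDB file layout, the rows in `SAMPLE_DB` are synthetic placeholders rather than real cross-sections, and `query_xsec_text` is a hypothetical helper that takes the file contents directly instead of selecting a file by campaign.

```python
from typing import Any, Dict, Optional

# Synthetic stand-in for an xsecDB file; values are NOT real physics numbers.
SAMPLE_DB = """\
dataset_number/I:physics_short/C:crossSection_pb/D:genFiltEff/D:kFactor/D:relUncertUP/D:relUncertDOWN/D:generator_name/C:etag/C
999999 Example_process 1.5 0.5 1.0 0.0 0.0 ExampleGen e0001
999999 Example_process 2.5 0.25 1.1 0.0 0.0 ExampleGen e0002
"""


def query_xsec_text(db_text: str, dsid: int, etag: str) -> Optional[Dict[str, Any]]:
    """Return the metadata of the first row matching both DSID and etag."""
    for line in db_text.splitlines():
        fields = line.split()
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # header line or malformed row
        if int(fields[0]) == dsid and fields[-1] == etag:
            return {
                "cross_section_pb": float(fields[2]),
                "genFiltEff": float(fields[3]),
                "kfactor": float(fields[4]),
                "etag": fields[-1],
            }
    return None
```

Matching on the etag as well as the DSID matters because, as in the synthetic sample, the same DSID can appear with different values under different tags.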

ftag.find_metadata.write_metadata_to_h5(h5_filename: str, dsid: int, metadata_dict: dict[str, Any]) → None

Write metadata values into an HDF5 file under metadata/<DSID>.

Parameters:
  • h5_filename (str) – Target HDF5 file.

  • dsid (int) – Dataset ID to write metadata for.

  • metadata_dict (dict[str, Any]) – Dictionary of metadata to inject.

ftag.find_metadata.handle_yaml_fallback(h5_path: pathlib.Path, yaml_data: dict[str, Any]) → None

Use fallback metadata from YAML if automatic lookup fails.

Parameters:
  • h5_path (Path) – Path to the HDF5 file.

  • yaml_data (dict[str, Any]) – Metadata dictionary loaded from YAML.

Raises:

ValueError – If YAML is invalid, empty, or missing required fields.
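
The validation that precedes the fallback write might look like the sketch below. The required fields are not listed in this reference, so `REQUIRED_FIELDS` is a guess based on the keys returned by `query_xsecdb` plus a DSID; a flat mapping is also assumed, and `validate_fallback_metadata` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional

# Hypothetical: the documentation does not enumerate the required fields.
REQUIRED_FIELDS = ("dsid", "cross_section_pb", "genFiltEff", "kfactor")


def validate_fallback_metadata(yaml_data: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Raise ValueError for empty, non-mapping, or incomplete fallback data."""
    if not isinstance(yaml_data, dict) or not yaml_data:
        raise ValueError("Fallback YAML is empty or not a mapping")
    missing = [key for key in REQUIRED_FIELDS if key not in yaml_data]
    if missing:
        raise ValueError(f"Fallback YAML is missing fields: {', '.join(missing)}")
    return yaml_data
```

Reporting all missing keys in one message saves the user from fixing the YAML file one field at a time.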

ftag.find_metadata.parse_args_and_yaml() → tuple[list[str], dict[str, Any]]

Parse CLI arguments and load YAML metadata if provided.

Returns:

A tuple of (list of HDF5 file paths, YAML metadata dict).

Return type:

tuple[list[str], dict[str, Any]]

ftag.find_metadata.process_single_file(path: pathlib.Path, yaml_data: dict[str, Any]) → None

Process a single .h5 file by attempting a BigPanDA lookup, then falling back to YAML.

Parameters:
  • path (Path) – Path to the HDF5 file.

  • yaml_data (dict[str, Any]) – Optional fallback metadata.
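
The lookup-then-fallback control flow can be sketched independently of BigPanDA and HDF5 by injecting the two strategies as callables. Everything here is illustrative: `process_with_fallback` is a hypothetical helper, and unlike the documented function (which returns `None`) it returns the source that produced the metadata so the behaviour is easy to inspect.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple

Metadata = Dict[str, Any]


def process_with_fallback(
    path: Path,
    lookup: Callable[[Path], Optional[Metadata]],
    fallback: Callable[[Path], Metadata],
) -> Tuple[str, Metadata]:
    """Try the automatic lookup first; use the fallback only if it yields None."""
    metadata = lookup(path)
    if metadata is not None:
        return "lookup", metadata
    # The fallback is expected to raise if it cannot provide metadata either.
    return "fallback", fallback(path)
```

Keeping the fallback as the last resort means a file with a resolvable Task ID never silently picks up stale values from a YAML file.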

ftag.find_metadata.main() → None

Entry point: parse arguments, download xsecDBs, process each file, and clean up.