Dummy Data

Dummy data is sufficient for testing and demonstrating the puma API.

puma provides three functions to generate dummy data: get_dummy_2_taggers, get_dummy_multiclass_scores, and get_dummy_tagger_aux.

The first function, get_dummy_2_taggers, directly returns a pandas.DataFrame with the following columns:

  • HadronConeExclTruthLabelID
  • rnnip_pu
  • rnnip_pc
  • rnnip_pb
  • dips_pu
  • dips_pc
  • dips_pb

which can be used in the following manner:

from puma.utils import get_dummy_2_taggers

df = get_dummy_2_taggers()
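
The per-class columns listed above are typically combined into a single b-tagging discriminant per tagger. The following is a minimal sketch of that step, not puma's own implementation: it uses a small hand-written stand-in DataFrame with the same column names instead of calling get_dummy_2_taggers, and the helper name `b_discriminant` and the charm fraction `fc=0.018` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for get_dummy_2_taggers(): hand-written values, same column names.
df = pd.DataFrame({
    "HadronConeExclTruthLabelID": [5, 4, 0, 5],
    "rnnip_pu": [0.10, 0.30, 0.80, 0.20],
    "rnnip_pc": [0.20, 0.50, 0.10, 0.20],
    "rnnip_pb": [0.70, 0.20, 0.10, 0.60],
    "dips_pu": [0.05, 0.25, 0.85, 0.10],
    "dips_pc": [0.15, 0.60, 0.10, 0.10],
    "dips_pb": [0.80, 0.15, 0.05, 0.80],
})


def b_discriminant(frame, tagger, fc=0.018):
    """Hypothetical helper: log-likelihood-ratio b-tagging discriminant.

    fc is the assumed charm fraction in the background hypothesis.
    """
    pu, pc, pb = (frame[f"{tagger}_{flav}"] for flav in ("pu", "pc", "pb"))
    return np.log(pb / (fc * pc + (1 - fc) * pu))


# Add one discriminant column per tagger.
for tagger in ("rnnip", "dips"):
    df[f"disc_{tagger}"] = b_discriminant(df, tagger)

# Select truth b-jets (label 5) and inspect their discriminant values.
is_b = df["HadronConeExclTruthLabelID"] == 5
print(df.loc[is_b, ["disc_rnnip", "disc_dips"]])
```

With the real dummy DataFrame, the same helper should apply unchanged as long as the column names match the list above.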

The second function, get_dummy_multiclass_scores, returns an output array of shape (size, 3), which is the usual output of our multi-class classifiers such as DIPS, together with labels that follow the HadronConeExclTruthLabelID convention.

from puma.utils import get_dummy_multiclass_scores

output, labels = get_dummy_multiclass_scores()
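
To illustrate how the (size, 3) scores and the labels fit together, the sketch below builds stand-in arrays with NumPy and splits them by flavour. It does not call puma and does not reproduce how puma generates its scores; the softmax over random logits and the column order (light, c, b) are assumptions and should be checked against your puma version.

```python
import numpy as np

rng = np.random.default_rng(42)
size = 1000

# Stand-in for the score array: random logits pushed through a softmax,
# so every row holds three probabilities that sum to 1.
logits = rng.normal(size=(size, 3))
exp_logits = np.exp(logits)
output = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# HadronConeExclTruthLabelID convention: 0 = light, 4 = c, 5 = b.
labels = rng.choice([0, 4, 5], size=size)

# Assumed column order (light, c, b): the last column is the b-jet score.
p_b = output[:, 2]

# Use the labels as boolean masks to look at each flavour separately.
for flavour, name in [(0, "light"), (4, "c"), (5, "b")]:
    mask = labels == flavour
    print(f"{name:>5}-jets: {int(mask.sum()):4d}, mean p_b = {p_b[mask].mean():.3f}")
```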

Finally, the get_dummy_tagger_aux function returns an h5 file containing both jet and track collections (the tracks are needed for aux task plots). The collections include the following columns; aux task information is generated for both vertexing and track origin classification:

jets:

  • HadronConeExclTruthLabelID
  • GN2_pu
  • GN2_pc
  • GN2_pb
  • pt
  • eta
  • n_truth_promptLepton

tracks:

  • ftagTruthVertexIndex
  • GN2_VertexIndex
  • ftagTruthOriginLabel
  • GN2_TrackOrigin

which can be used in the following manner:

from puma.utils import get_dummy_tagger_aux

h5_file = get_dummy_tagger_aux()
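
Because the return value is an h5 file rather than a DataFrame, the columns are read per collection. The sketch below shows one way to do that with h5py on an in-memory stand-in file, so it runs without puma. The dataset names "jets" and "tracks", the structured-array layout, and the track array shape (jets, tracks per jet) are assumptions based on the lists above, and only a subset of the columns is included.

```python
import h5py
import numpy as np

n_jets, n_tracks = 4, 3

# Stand-in structured arrays carrying a subset of the columns listed above.
jets = np.zeros(
    n_jets,
    dtype=[("HadronConeExclTruthLabelID", "i4"), ("pt", "f4"), ("GN2_pb", "f4")],
)
jets["HadronConeExclTruthLabelID"] = [5, 4, 0, 5]
jets["pt"] = [50.0, 80.0, 30.0, 120.0]

tracks = np.zeros(
    (n_jets, n_tracks),
    dtype=[("ftagTruthVertexIndex", "i4"), ("GN2_VertexIndex", "i4")],
)

# In-memory HDF5 file: with backing_store=False nothing is written to disk.
with h5py.File("stand_in.h5", "w", driver="core", backing_store=False) as h5_file:
    h5_file.create_dataset("jets", data=jets)      # assumed dataset name
    h5_file.create_dataset("tracks", data=tracks)  # assumed dataset name

    # Indexing a compound dataset with a field name reads just that column.
    jet_pt = h5_file["jets"]["pt"]
    vertex_idx = h5_file["tracks"]["GN2_VertexIndex"]

# The arrays were copied into memory, so they stay valid after the file closes.
print(jet_pt.shape, vertex_idx.shape)
```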