
Version 5 (modified by dgrell, 10 years ago)


Database discussions

Separation into two sub-packages: Toolchain Tagging and Model Database

Model Database

One entry in the model DB contains:

  • a unique tag, analogous to arXiv identifiers, e.g. moDel:1003.0123
  • title
  • author list
  • free text abstract
  • a classification of the model. It should not be hierarchical, since a hierarchy makes it hard to attach one model to several categories. Options are:
    • entirely free tagging
    • choice from a pre-set list of tags
  • model description in terms of the (versioned) input file for a specific (versioned) package
    • this combination should already have a toolchain ID
  • list of validations that have been performed
    • the validation format is described below
  • thumbnails of plots of MC comparisons, each with the toolchain ID that was used to make it
  • pointer to the main arXiv paper
  • pointers to downstream papers
  • a "supersedes moDel:0909.3210" entry
  • a sharing licence (maybe pre-defined such that only models adhering to a certain licence can enter)
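One entry could be held in memory as a simple record. The following is a minimal Python illustration, assuming hypothetical field names (`ModelEntry`, `toolchain_id`, `plots`, ...) and a tag format inferred only from the two example tags on this page; it is not a proposed schema.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical tag format mirroring the examples moDel:1003.0123 and
# moDel:0909.3210: "moDel:" + four digits + "." + four digits.
TAG_PATTERN = re.compile(r"moDel:\d{4}\.\d{4}")


@dataclass
class ModelEntry:
    tag: str                   # unique, unchangeable identifier
    title: str
    authors: List[str]
    abstract: str              # free text
    keywords: List[str]        # flat (non-hierarchical) classification
    input_file: str            # versioned model input file
    package: str               # versioned package that reads the input file
    toolchain_id: str          # ID of the input-file/package combination
    validations: List[str] = field(default_factory=list)
    plots: Dict[str, str] = field(default_factory=dict)  # thumbnail -> toolchain ID
    main_paper: Optional[str] = None                     # arXiv ID of the main paper
    downstream_papers: List[str] = field(default_factory=list)
    supersedes: Optional[str] = None                     # tag of an older entry
    licence: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject malformed tags, including a malformed "supersedes" pointer.
        for t in (self.tag, self.supersedes):
            if t is not None and not TAG_PATTERN.fullmatch(t):
                raise ValueError(f"malformed model tag: {t!r}")
```

Validating the "supersedes" pointer with the same pattern keeps the chain of superseded entries well-formed; whether the pointed-to entry must already exist is left open.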

features of the interaction:

  • allow a 24-hour period for corrections after the initial upload
  • authority to modify entries lies with author
    • author can grant modification rights to validators
    • author can relinquish access rights
    • automatic opening of the entry after a time limit / maximum idle time?
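The access rules above could be checked with a small helper. This is a minimal sketch assuming that relinquishing rights opens the entry to everyone and that the idle limit is one year; the `EntryRights` name and the 365-day value are hypothetical, since the page leaves these points open. How the 24-hour window is enforced is also undecided, so the sketch only reports whether it is still running.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Set

CORRECTION_WINDOW = timedelta(hours=24)  # the 24h correction period proposed above
MAX_IDLE = timedelta(days=365)           # hypothetical idle limit; not fixed on the page


@dataclass
class EntryRights:
    author: str
    uploaded: datetime
    last_modified: datetime
    editors: Set[str] = field(default_factory=set)  # e.g. validators granted rights
    relinquished: bool = False                     # assumed: author opened the entry

    def in_correction_window(self, now: datetime) -> bool:
        # True while the initial correction period is still running.
        return now - self.uploaded <= CORRECTION_WINDOW

    def can_modify(self, user: str, now: datetime) -> bool:
        # The entry opens to everyone if relinquished or idle for too long.
        if self.relinquished or now - self.last_modified > MAX_IDLE:
            return True
        # Otherwise only the author and explicitly granted editors may modify it.
        return user == self.author or user in self.editors
```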
  • a system for citation handling?

possible uses:

QUESTION                              ANSWER
specific BSM model                    which papers refer to it, which validations have been done
specific particle (maybe PDG code)    which models have it
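Both lookups in the table are inverted indexes. A minimal Python sketch, assuming each model entry records the PDG codes of the particles it contains and each paper records the model tags it cites; neither field is in the entry list above, and the paper names and data are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def invert(pairs: Iterable[Tuple[str, Iterable[Hashable]]]) -> Dict[Hashable, Set[str]]:
    """Turn (owner, items) pairs into an item -> {owners} lookup table."""
    index: Dict[Hashable, Set[str]] = defaultdict(set)
    for owner, items in pairs:
        for item in items:
            index[item].add(owner)
    return dict(index)


# Hypothetical data: PDG codes contained in each model
# (1000022 and 1000023 are the two lightest neutralinos) ...
model_particles = [("moDel:1003.0123", [1000022, 1000023]),
                   ("moDel:0909.3210", [1000022])]
# ... and the model tags cited by each (placeholder) paper.
paper_refs = [("paper-A", ["moDel:1003.0123"]),
              ("paper-B", ["moDel:1003.0123", "moDel:0909.3210"])]

models_with_particle = invert(model_particles)  # "which models have it"
papers_citing_model = invert(paper_refs)        # "which papers refer to it"
```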

Validation

Proposal for 4-star ratings:

  1. Documentation
  2. Theory
  3. 1 MC verification
  4. n MC verifications
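One possible reading of the rating proposal, sketched in Python: stars are cumulative, so a level counts only if all lower levels are met, and "n MC verifications" is taken to mean at least two. Both choices are assumptions; the page does not say how the levels combine or what n is.

```python
def star_rating(documented: bool, theory_checked: bool, mc_verifications: int) -> int:
    """Hypothetical cumulative reading of the 4-star proposal."""
    levels = (
        documented,              # 1. Documentation
        theory_checked,          # 2. Theory
        mc_verifications >= 1,   # 3. 1 MC verification
        mc_verifications >= 2,   # 4. n MC verifications (assumed n >= 2)
    )
    stars = 0
    for reached in levels:
        if not reached:
            break  # a level only counts if every lower level is met
        stars += 1
    return stars
```

Under this reading, a model with three MC verifications but no theory check still gets one star; a non-cumulative scheme would count each level independently instead.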

Toolchain tagging

A toolchain is an arbitrarily long list of versioned software packages, together with actual copies of their parameter-settings files. It should be specific enough to allow a fully scripted reconstruction of the whole chain's workflow.

A central interactive system which

  • assigns a unique, unchangeable tag for a submitted toolchain, and
  • allows retrieval of the toolchain from that tag.

or

  • decentralized, user-specific hashing. Problem: retrieving the chain from its hash.
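The two options differ mainly in retrieval. A content hash lets anyone compute a tag locally, but a hash cannot be inverted, so the chain can only be recovered if someone has stored it under that hash. The minimal Python sketch below combines the two: `toolchain_hash` is the decentralized part (SHA-256 over a canonical JSON serialization), and the hypothetical `Registry` class plays the central system that stores chains for retrieval. Using the content hash as the central tag is a choice made for this sketch; a central system could equally hand out sequential IDs.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# A toolchain as an ordered list of (versioned tool, parameter-file contents).
Toolchain = List[Tuple[str, str]]


def toolchain_hash(chain: Toolchain) -> str:
    """Decentralized option: derive a tag from the chain's content alone."""
    canonical = json.dumps(chain, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Registry:
    """Central option: store chains so that a tag can be resolved back to a chain."""

    def __init__(self) -> None:
        self._chains: Dict[str, Toolchain] = {}

    def submit(self, chain: Toolchain) -> str:
        tag = toolchain_hash(chain)
        # setdefault keeps the first stored copy, so a tag never changes meaning.
        self._chains.setdefault(tag, [tuple(step) for step in chain])
        return tag

    def retrieve(self, tag: str) -> Toolchain:
        return list(self._chains[tag])
```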

necessary features:

  • a tag for a shorter toolchain can stand as an abbreviation of one part of a longer chain
    • Example: user A uses an LHE input file with tag XYZ, generated earlier by someone else. Without needing to know the actual sequence of tools represented by XYZ, a new tag can be generated for the extended workflow.
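The abbreviation mechanism can be sketched by letting a chain step be either a concrete tool or a reference to an existing tag. In this hypothetical Python sketch, `extend` tags the extended workflow while treating XYZ as an opaque step, and `expand` resolves references only when the referenced chains are available. All names and step layouts are assumptions for illustration.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Each step is ("tool", "<name-version>", "<settings>") or ("ref", "<tag>", "").
Step = Tuple[str, str, str]


def tag_of(steps: List[Step]) -> str:
    data = json.dumps(steps, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def extend(prefix_tag: str, new_steps: List[Step]) -> Tuple[str, List[Step]]:
    """Tag a longer chain whose first part is an existing tag, without expanding it."""
    steps = [("ref", prefix_tag, "")] + new_steps
    return tag_of(steps), steps


def expand(steps: List[Step], known: Dict[str, List[Step]]) -> List[Step]:
    """Recursively replace references by the chains they stand for."""
    out: List[Step] = []
    for kind, name, settings in steps:
        if kind == "ref":
            out.extend(expand(known[name], known))
        else:
            out.append((kind, name, settings))
    return out
```

One consequence of this design: a chain tagged via a reference and the same chain written out in full get different tags, so a lookup service would need to expand references before comparing workflows.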

possible uses:

  • unique identification of reproducible workflows
  • inclusion in LHE header blocks, where a tag can substitute for explicit parameter listings
  • inclusion in the model database, to refer uniquely to
    • a given model-card/tool combination
    • the workflows that have created the validation plots/tables
  • Anywhere that a detailed specification of a set of tools and inputs is required
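The LHE format leaves the content of the header block largely free, so a tag could be carried there as a single element. A minimal sketch using Python's standard `xml.etree.ElementTree`, operating on the header block only; the `<toolchain tag="..."/>` element and both function names are hypothetical and not part of any LHE convention.

```python
import xml.etree.ElementTree as ET


def header_with_tag(tag: str) -> str:
    """Build an LHE <header> block carrying a toolchain tag (element name hypothetical)."""
    header = ET.Element("header")
    ET.SubElement(header, "toolchain", {"tag": tag})
    return ET.tostring(header, encoding="unicode")


def tag_from_header(header_xml: str) -> str:
    """Read the toolchain tag back out of a header block."""
    node = ET.fromstring(header_xml).find("toolchain")
    if node is None:
        raise KeyError("no toolchain tag in header")
    return node.attrib["tag"]
```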

questions / answers

  • "which chains start with mSUGRA?"
  • "which tags contain Herwig-2.4.2 ?"
  • "given the SPIRES ID of an experimental paper, which chains were used in that paper?"
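The three example questions reduce to simple filters over a tag-to-chain table. A minimal Python sketch with made-up data, assuming the first step of a chain names the model (so "starting with mSUGRA" becomes a prefix match on that step) and that a separate, hypothetical table maps SPIRES IDs to the tags used in each paper.

```python
from typing import Dict, List, Set, Tuple

# Tag -> ordered list of (versioned tool or model, settings).
Toolchain = List[Tuple[str, str]]


def chains_starting_with(chains: Dict[str, Toolchain], prefix: str) -> Set[str]:
    # Assumed: the first step names the model, e.g. "mSUGRA-SPS1a".
    return {tag for tag, chain in chains.items() if chain and chain[0][0].startswith(prefix)}


def chains_containing(chains: Dict[str, Toolchain], tool: str) -> Set[str]:
    # Exact match on the versioned tool name anywhere in the chain.
    return {tag for tag, chain in chains.items() if any(name == tool for name, _ in chain)}


def chains_for_paper(paper_chains: Dict[str, Set[str]], spires_id: str) -> Set[str]:
    # Hypothetical SPIRES ID -> {tags} table, filled when papers are registered.
    return paper_chains.get(spires_id, set())
```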