DESY 2011 Terascale School

Statistical Methods for Discovery and Limits

Workshop on Data Combination and Limit Setting 2011

Glen Cowan, Physics Department, Royal Holloway, University of London, e-mail: g.cowan@rhul.ac.uk

Dates and times: See here.

Course description: The series of lectures will cover the statistical methods used in searches for new phenomena in particle physics experiments. Statistical tests will be formally defined and used to quantify the level of agreement between a specified model and the observed data. Specifically, one tries to reject the Standard Model in such a test, since rejecting it would indicate the discovery of something new. Even in the absence of a discovery, we would like to say which signal models can be excluded by setting limits on their parameters. Several procedures for doing this will be discussed, including CLs, Power-Constrained Limits (PCL), Bayesian, and Feldman-Cousins methods. The lectures will focus on frequentist methods, but the Bayesian approach will be addressed as well. In both cases the role of systematic uncertainties will be emphasized. Computer tutorials will provide practical exposure to the procedures covered in the lectures.

Lecture Notes:

Computer Exercises: Some standalone C++ code to compute discovery and exclusion significances is here. You can download everything from this tarball. There is a note that describes the mathematics behind the exercises, and more details will be given in the session.

You can also try using the routine runSigCalc_MC.cc instead of runSigCalc.cc (edit the makefile to link the new one). This routine calculates the distribution of q_mu using Monte Carlo and from this finds the p-value p_mu. By computing p_mu as a function of mu, one can find the value of mu for which p_mu = 5%, which gives the 95% confidence level upper limit on mu.
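The Monte Carlo procedure can be illustrated for the simplest case, a single-bin counting experiment with n ~ Poisson(mu*s + b). The following is a minimal sketch under that assumption and is not the code in runSigCalc_MC.cc: the function names qmu, pmuMC and upperLimitMC are hypothetical, b is taken as known, the fitted signal strength is allowed to be negative, and the limit is found with a simple grid scan in mu.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Test statistic for a one-bin counting experiment, n ~ Poisson(mu*s + b):
// q_mu = -2 ln[L(mu)/L(mu_hat)] if mu_hat <= mu, and 0 otherwise.
// Assumption: s > 0, b known, mu_hat = (n - b)/s may be negative.
double qmu(int n, double mu, double s, double b) {
    const double mean = mu * s + b;
    assert(n >= 0 && mean > 0.0);
    if (n >= mean) return 0.0;  // mu_hat >= mu: no evidence against mu from below
    double q = 2.0 * (mean - n);
    if (n > 0) q += 2.0 * n * std::log(n / mean);
    return q;
}

// Monte Carlo estimate of p_mu = P(q_mu >= q_mu,obs | mu) from nToys toy experiments.
double pmuMC(int nObs, double mu, double s, double b, int nToys, std::mt19937& rng) {
    const double qObs = qmu(nObs, mu, s, b);
    std::poisson_distribution<int> pois(mu * s + b);
    int nAbove = 0;
    for (int i = 0; i < nToys; ++i) {
        if (qmu(pois(rng), mu, s, b) >= qObs) ++nAbove;
    }
    return static_cast<double>(nAbove) / nToys;
}

// Scan mu upward in steps and return the first grid point with p_mu <= alpha.
// Returns -1 if no such point is found up to muMax.
double upperLimitMC(int nObs, double s, double b, double muMax, double step,
                    int nToys, std::mt19937& rng, double alpha = 0.05) {
    for (double mu = step; mu <= muMax; mu += step) {
        if (pmuMC(nObs, mu, s, b, nToys, rng) <= alpha) return mu;
    }
    return -1.0;
}
```

For example, with nObs = 0, s = 1 and b = 0 the exact p-value is exp(-mu), so the scan should stop near mu = 3. The result fluctuates with the number of toys and is only as fine as the grid step.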

RooStats: We may also have time to solve some problems using the RooStats package, which is based on RooFit and ROOT. Here are some useful links:

The RooStats Wiki.

Information on RooFit

The RooStats class definitions

The RooFit pdf class definitions

The RooFit core class definitions

The directory $ROOTSYS/tutorials/roostats contains many tutorials, e.g., IntervalExamples.C. The file IntervalExamples.cc shows the necessary modification to make this a standalone C++ program, which can be built with the makefile here (download files and type gmake).

SimpleCount.C is a RooStats macro that illustrates the problem of observing n events assumed to follow a Poisson distribution with mean s + b. Here s is the expected number of signal events (the parameter of interest) and b is the expected number of background events. In the present version, b is treated as a constant, and the macro finds, for a given observed value of n, the p-value of the background-only hypothesis and also limits on the signal parameter s based on a two-sided test.
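When b is a known constant, the p-value of the background-only hypothesis (s = 0) is a Poisson tail probability, p = P(n >= n_obs | b), and it can be converted to an equivalent Gaussian significance Z. The standalone sketch below shows this arithmetic without RooStats; the function names poissonPValue and significance are illustrative and do not come from the macro, and the bisection range limits Z to the interval [-10, 10].

```cpp
#include <cassert>
#include <cmath>

// p-value of the background-only hypothesis: P(n >= nObs | b) for n ~ Poisson(b).
// Computed as 1 - P(n <= nObs - 1); loses precision when p is extremely small.
double poissonPValue(int nObs, double b) {
    assert(nObs >= 0 && b > 0.0);
    double term = std::exp(-b);  // P(n = 0)
    double cdfBelow = 0.0;       // running sum of P(n = k) for k < nObs
    for (int k = 0; k < nObs; ++k) {
        cdfBelow += term;
        term *= b / (k + 1);     // P(n = k + 1) from P(n = k)
    }
    return 1.0 - cdfBelow;
}

// Significance Z defined by p = 1 - Phi(Z), found by bisection on the
// upper tail of the standard Gaussian, 0.5 * erfc(Z / sqrt(2)).
double significance(double p) {
    assert(p > 0.0 && p < 1.0);
    double lo = -10.0, hi = 10.0;
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        const double tail = 0.5 * std::erfc(mid / std::sqrt(2.0));
        if (tail > p) lo = mid;  // tail too large: Z must be bigger
        else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

Calling significance(poissonPValue(n, b)) for the observed n and the assumed b gives the discovery significance for this simple counting model.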

SimpleCount2.C adds to the previous example a calculation of the one-sided upper limit by using Monte Carlo. The stand-alone C++ version of this macro is SimpleCount.cc, which can be built with this GNUmakefile.

In the practical sessions we will extend this example to the case where b is not known exactly but is constrained by a measurement m, which is assumed to follow a Poisson distribution with mean tau*b, where the scale factor tau is a known constant.
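For this two-measurement model the profile likelihood ratio for testing s = 0 has a closed form, because both the unconditional estimators (s_hat = n - m/tau, b_hat = m/tau) and the conditional estimator of b for s = 0, (n + m)/(1 + tau), are available analytically. The sketch below is an illustration of that formula, not the RooStats macro used in the sessions; the names q0OnOff and asymptoticZ are hypothetical, and Z = sqrt(q0) is the usual large-sample approximation.

```cpp
#include <cassert>
#include <cmath>

// Discovery test statistic q0 = -2 ln[L(0, b_hathat) / L(s_hat, b_hat)]
// for n ~ Poisson(s + b) and m ~ Poisson(tau * b), with q0 = 0 if s_hat <= 0.
double q0OnOff(int n, int m, double tau) {
    assert(n >= 0 && m >= 0 && tau > 0.0);
    const double sHat = n - m / tau;              // unconditional estimate of s
    if (sHat <= 0.0) return 0.0;                  // no excess over background
    const double bHatHat = (n + m) / (1.0 + tau); // best-fit b when s = 0
    double q0 = 0.0;
    if (n > 0) q0 += 2.0 * n * std::log(n / bHatHat);
    if (m > 0) q0 += 2.0 * m * std::log(m / (tau * bHatHat));
    return q0;
}

// Asymptotic discovery significance, valid for sufficiently large samples.
double asymptoticZ(int n, int m, double tau) {
    return std::sqrt(q0OnOff(n, m, tau));
}
```

For n = 20, m = 10 and tau = 1 this gives q0 of about 3.40, i.e. Z of about 1.84. For small counts the asymptotic value should be checked against a Monte Carlo calculation.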

The complete macro for computing upper limits according to the full frequentist procedure can be found here.

Some material has been adapted from a course for postgraduate students at the University of London. The complete set of lecture notes for that course plus other resources can be found here.

Glen Cowan