Freiburg 2011 Statistics for HEP

Statistical Data Analysis for High Energy Physics

Graduiertenkolleg der Uni Freiburg "Physik an Hadron-Beschleunigern"

Glen Cowan, Physics Department, Royal Holloway, University of London, e-mail: g.cowan@rhul.ac.uk

Dates and times: See here.

Course description: The series of lectures will cover the statistical methods used in searches for new phenomena in a particle physics experiment. Statistical tests will be formally defined and used to quantify the level of agreement between a specified model and the observed data. Specifically, one tries to reject the Standard Model in such a test, as rejecting it would indicate the discovery of something new. Even in the absence of a discovery, we would like to say which signal models can be excluded by setting limits on their parameters. Several procedures for doing this will be discussed, including CLs, Power-Constrained Limits (PCL), Bayesian, and Feldman-Cousins methods. The lectures will focus on frequentist methods, but the Bayesian approach will be addressed as well. In both cases the role of systematic uncertainties will be emphasized. Computer tutorials will provide practical exposure to the procedures covered in the lectures.

Lecture Notes (approx. by day and still evolving):

Here is a short draft note discussing two alternative procedures for treating nuisance parameters: marginalization and profiling.
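For orientation, the two procedures can be written down compactly. The notation below is an assumption made here and is not taken from the note: L(μ, θ) is the likelihood, μ is the parameter of interest, θ stands for the nuisance parameters, and π(θ) is a prior density for θ.

```latex
% Profiling: replace theta by the value that maximizes L for each fixed mu
L_{\mathrm{p}}(\mu) = L\bigl(\mu, \hat{\hat{\theta}}(\mu)\bigr),
\qquad
\hat{\hat{\theta}}(\mu) = \operatorname*{arg\,max}_{\theta} \, L(\mu, \theta)

% Marginalization: average L over the prior for theta
L_{\mathrm{m}}(\mu) = \int L(\mu, \theta) \, \pi(\theta) \, d\theta
```

Profiling requires only a maximization for each value of μ, whereas marginalization requires a prior and an integral over θ.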

Exercises: Here are the statistics exercises used for the London course (problem sheets 1 to 3 were on computing). We will select from these:

Problem sheet 4: ps, pdf

Problem sheet 5: ps, pdf

Problem sheet 6: ps, pdf. The materials for this problem sheet can be found here.

Problem sheet 7: ps, pdf. For problem 3 you need the programs here (see also the file readme.txt).

Problem sheet 8: ps, pdf. For problem 2 you need the programs makeData and expFit (download the files and type gmake).

Optional problem: look at simpleFit.C and the related files here. simpleFit.C is a simple ROOT macro for doing a least-squares fit of a user-supplied function to a set of x,y points, which are read from a file. Try to run the fit, and then try modifying the fit function (e.g., change the order of the polynomial).
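To see what such a fit does without ROOT, here is a minimal sketch in plain Python. It is not the code in simpleFit.C: the function names, the optional per-point errors, and the use of the normal equations are all assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, order, sigmas=None):
    """Least-squares fit of y = a_0 + a_1 x + ... + a_order x^order.

    Minimizes chi2 = sum_i ((y_i - f(x_i)) / sigma_i)^2 by solving the
    normal equations. If sigmas is None, all points get unit weight.
    """
    if sigmas is None:
        sigmas = [1.0] * len(xs)
    n_par = order + 1
    matrix = [[0.0] * n_par for _ in range(n_par)]
    vector = [0.0] * n_par
    for x, y, sigma in zip(xs, ys, sigmas):
        w = 1.0 / sigma ** 2
        for i in range(n_par):
            vector[i] += w * y * x ** i
            for j in range(n_par):
                matrix[i][j] += w * x ** (i + j)
    return solve_linear_system(matrix, vector)


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_polynomial(xs, ys, order=1))
```

Changing the order of the polynomial only changes the `order` argument. The normal equations become ill-conditioned for high orders, so this sketch is only suitable for low-order fits.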

On Tuesday for the computer exercises we will compute discovery and exclusion significances using the code here (1 July 2011, bug fixed in fitPar.cc to allow negative muHat). You can download everything from this tarball. There is a note that describes the mathematics behind the exercises, and more details will be given in the session.

RooStats:

We will also solve some problems using the RooStats package, which is based on RooFit and ROOT. Here are some useful links:

The RooStats Wiki.

Information on RooFit

The RooStats class definitions

The RooFit pdf class definitions

The RooFit core class definitions

The directory $ROOTSYS/tutorials/roostats contains many tutorials, e.g., IntervalExamples.C. The file IntervalExamples.cc shows the necessary modification to make this a standalone C++ program, which can be built with the makefile here (download files and type gmake).

SimpleCount.C is a RooStats macro that illustrates the problem of observing n events assumed to follow a Poisson distribution with mean s + b. Here s is the expected number of signal events (the parameter of interest) and b is the expected number of background events. In the present version, b is treated as a constant, and the macro finds, for a given observed value of n, the p-value of the background-only hypothesis and also limits on the signal parameter s based on a two-sided test.
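The p-value of the background-only hypothesis in this counting problem can be checked by hand. The following is a minimal sketch in plain Python rather than RooStats; the function names and the example numbers (n = 5, b = 0.5) are assumptions chosen for illustration and are not taken from SimpleCount.C.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, mean):
    """Poisson probability P(k; mean), evaluated in log space."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def p_value_background_only(n_obs, b):
    """p-value of the s = 0 hypothesis: P(n >= n_obs) for mean b.

    Computed as 1 - P(n < n_obs), which is adequate unless the
    p-value is close to machine precision.
    """
    return 1.0 - sum(poisson_pmf(k, b) for k in range(n_obs))


def significance(p_value):
    """Equivalent one-sided Gaussian significance Z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p_value)


# Hypothetical example: 5 events observed with 0.5 expected from background.
p0 = p_value_background_only(n_obs=5, b=0.5)
print("p-value =", p0, " Z =", significance(p0))
```

Here the number of events n itself serves as the test statistic, so larger n counts as less compatible with the background-only hypothesis.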

SimpleCount2.C adds to the previous example a calculation of the one-sided upper limit by using Monte Carlo. The stand-alone C++ version of this macro is SimpleCount.cc, which can be built with this GNUmakefile.
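The idea of a Monte Carlo upper limit can be sketched without RooStats. The code below is an illustration under stated assumptions, not the procedure coded in SimpleCount2.C: it uses n itself as the test statistic, scans s on a fixed grid, and reports the first value of s for which the fraction of toy experiments with n ≤ n_obs drops below α = 0.05. All names and default settings are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method.

    Simple but slow for large means; adequate for the small means used here.
    """
    threshold = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def p_value_mc(s, b, n_obs, n_toys, rng):
    """Fraction of toy experiments with n <= n_obs when the mean is s + b."""
    count = sum(1 for _ in range(n_toys)
                if sample_poisson(s + b, rng) <= n_obs)
    return count / n_toys


def upper_limit_mc(n_obs, b, alpha=0.05, s_max=20.0, step=0.1,
                   n_toys=20000, seed=1):
    """First scanned s whose Monte Carlo p-value is below alpha.

    Returns None if no scanned value up to s_max is excluded.
    """
    rng = random.Random(seed)
    s = 0.0
    while s <= s_max:
        if p_value_mc(s, b, n_obs, n_toys, rng) < alpha:
            return s
        s += step
    return None


# Hypothetical example: no events observed and no background expected.
# The exact 95% CL limit in this case is -ln(0.05), about 3.0.
print(upper_limit_mc(n_obs=0, b=0.0))
```

The result carries a statistical error from the finite number of toys and a discretization error from the scan step, so it should only agree with the exact value to within about one step.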

In the practical sessions we will extend this example to the case where b is not known exactly but is constrained by a measurement m, which is assumed to follow a Poisson distribution with mean tau*b, where the scale factor tau is a known constant.
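For this model the profile likelihood ratio for testing s = 0 has a closed form, which makes a quick cross-check possible. The sketch below is an assumption-laden illustration and not the course code: it takes the discovery significance as Z = sqrt(q0), which is an asymptotic (large-sample) approximation, and the function names are hypothetical.

```python
import math


def log_likelihood(s, b, n, m, tau):
    """ln L(s, b) for n ~ Poisson(s + b) and m ~ Poisson(tau * b).

    Terms that do not depend on s or b are dropped.
    """
    def term(k, mean):
        if mean <= 0:
            # A zero mean is only compatible with zero observed events.
            return 0.0 if (k == 0 and mean == 0) else -math.inf
        return k * math.log(mean) - mean

    return term(n, s + b) + term(m, tau * b)


def discovery_significance(n, m, tau):
    """Asymptotic significance Z = sqrt(q0) for rejecting s = 0."""
    s_hat = n - m / tau          # unconditional estimators
    b_hat = m / tau
    if s_hat <= 0:
        return 0.0               # no excess: q0 is defined to be zero
    # Conditional estimator of b under the hypothesis s = 0.
    b_hat_hat = (n + m) / (1.0 + tau)
    q0 = 2.0 * (log_likelihood(s_hat, b_hat, n, m, tau)
                - log_likelihood(0.0, b_hat_hat, n, m, tau))
    return math.sqrt(max(q0, 0.0))


# Hypothetical example: n = 10 events in the signal region and m = 5 in a
# control region of equal size (tau = 1).
print(discovery_significance(n=10, m=5, tau=1.0))
```

Under the hypothesis s = 0 both n and m constrain b, which is why the conditional estimator pools the two counts.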

The full macro that can be used to compute upper limits according to the full frequentist procedure can be found here.

Some material has been adapted from a course for postgraduate students at the University of London. The complete set of lecture notes for that course plus other resources can be found here.

If we have time, we may also look in more depth at multivariate methods, e.g., using the lectures here (shown earlier at CERN and the University of Mainz).

Some books:

G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.

R.J. Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989.

F. James, Statistical Methods in Experimental Physics, 2nd Edition, World Scientific, 2006.

S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998.

L. Lyons, Statistics for Nuclear and Particle Physics, Cambridge University Press, 1986.

You can also download the sections on probability, statistics, and Monte Carlo (pdf files) from the Review of Particle Physics by the Particle Data Group.
