Statistical Data Analysis for High Energy Physics
Dates and times: See here.
Course description: The lectures will cover the statistical methods used in searches for new phenomena in particle physics experiments. Statistical tests will be formally defined and used to quantify the level of agreement between a specified model and the observed data. Specifically, one tries to reject the Standard Model in such a test, since rejecting it would indicate the discovery of something new. Even in the absence of a discovery, we would like to say which signal models can be excluded by setting limits on their parameters. Several procedures for doing this will be discussed, including the CLs, Power-Constrained Limits (PCL), Bayesian, and Feldman-Cousins methods. The lectures will focus on frequentist methods, but the Bayesian approach will be addressed as well. In both cases the role of systematic uncertainties will be emphasized. Computer tutorials will provide practical exposure to the procedures covered in the lectures.
Lecture Notes (approx. by day and still evolving):
Exercises: Here are the statistics exercises used for the London course (problem sheets 1 to 3 were on computing). We will select from these:
On Tuesday, for the computer exercises, we will compute discovery and exclusion significances using the code here (1 July 2011, bug fixed in fitPar.cc to allow negative muHat). You can download everything from this tarball. There is a note that describes the mathematics behind the exercises, and more details will be given in the session.
We will also solve some problems using the RooStats package, which is based on RooFit and ROOT. Here are some useful links:
The directory $ROOTSYS/tutorials/roostats contains many tutorials, e.g., IntervalExamples.C. The file IntervalExamples.cc shows the necessary modification to make this a standalone C++ program, which can be built with the makefile here (download files and type gmake).
SimpleCount.C is a RooStats macro that illustrates the problem of observing n events assumed to follow a Poisson distribution with mean s + b. Here s is the expected number of signal events (the parameter of interest) and b is the expected number of background events. In the present version, b is treated as a constant, and the macro finds, for a given observed value of n, the p-value of the background-only hypothesis and also limits on the signal parameter s based on a two-sided test.
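The p-value of the background-only hypothesis in this counting problem can be illustrated without ROOT. The following is a minimal sketch, not the content of SimpleCount.C: it assumes that the p-value is the probability P(n >= n_obs) for n following a Poisson distribution with mean b > 0, and it converts that p-value into an equivalent one-sided Gaussian significance. The function names and the example values n_obs = 5 and b = 0.5 are chosen here for illustration only.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, mean):
    """Poisson probability P(n = k) for mean > 0, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def p_value_background_only(n_obs, b):
    """P(n >= n_obs) for n ~ Poisson(b); assumes b is known exactly."""
    return 1.0 - sum(poisson_pmf(k, b) for k in range(n_obs))


def significance(p):
    """Equivalent one-sided Gaussian significance Z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)


# Illustrative numbers only (not taken from the course material).
p = p_value_background_only(n_obs=5, b=0.5)
print("p-value =", p, " Z =", significance(p))
```

For large n_obs the subtraction from 1 loses precision, so a production calculation would sum the upper tail directly or use a library routine.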
SimpleCount2.C adds to the previous example a calculation of the one-sided upper limit by using Monte Carlo. The stand-alone C++ version of this macro is SimpleCount.cc, which can be built with this GNUmakefile.
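For comparison with the Monte Carlo result, the one-sided upper limit for this simple counting problem can also be computed by summing Poisson probabilities directly. The sketch below is not the method used in SimpleCount2.C: it replaces the Monte Carlo step with an exact sum and assumes the classical definition in which the upper limit s_up at confidence level CL solves P(n <= n_obs; s_up + b) = 1 - CL. The function names and the bisection tolerance are illustrative choices.

```python
import math


def poisson_cdf(n, mean):
    """P(k <= n) for k ~ Poisson(mean), with mean >= 0."""
    term = math.exp(-mean)  # k = 0 term
    total = term
    for k in range(1, n + 1):
        term *= mean / k  # recurrence P(k) = P(k - 1) * mean / k
        total += term
    return total


def upper_limit(n_obs, b, cl=0.95, tol=1e-9):
    """Classical one-sided upper limit on s for n ~ Poisson(s + b), b known."""
    alpha = 1.0 - cl
    if poisson_cdf(n_obs, b) <= alpha:
        return 0.0  # even s = 0 is excluded at this confidence level
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi + b) > alpha:
        hi *= 2.0  # expand the bracket until s = hi is excluded
    while hi - lo > tol:  # bisection on the monotonically falling CDF
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(upper_limit(n_obs=0, b=0.0))  # equals -ln(0.05) up to the tolerance
```

With a downward fluctuation of n relative to b, this definition can give a very small or zero limit, which is one motivation for the CLs and PCL procedures discussed in the lectures.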
In the practical sessions we will extend this example to the case where b is not known exactly but is constrained by a measurement m, which is assumed to follow a Poisson distribution with mean tau*b, where the scale factor tau is a known constant.
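A rough sketch of how the extended model can be used for a discovery test is given below. It assumes the likelihood L(s, b) = Poisson(n; s + b) x Poisson(m; tau*b) and a profile likelihood ratio test of s = 0, with the large-sample approximation Z = sqrt(q0) for the significance. This is one possible approach and not necessarily the one implemented in the practical sessions; the function names and the example numbers are hypothetical.

```python
import math


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood(n, m, s, b, tau):
    """ln L(s, b) for n ~ Poisson(s + b), m ~ Poisson(tau * b), dropping constants."""
    return xlogy(n, s + b) - (s + b) + xlogy(m, tau * b) - tau * b


def q0(n, m, tau):
    """Profile likelihood ratio statistic for testing s = 0."""
    b_hat = m / tau              # unconditional ML estimate of b
    s_hat = n - b_hat            # unconditional ML estimate of s
    if s_hat <= 0:
        return 0.0               # no excess: data agree with s = 0
    b_hat_hat = (n + m) / (1.0 + tau)  # ML estimate of b when s is fixed to 0
    return 2.0 * (log_likelihood(n, m, s_hat, b_hat, tau)
                  - log_likelihood(n, m, 0.0, b_hat_hat, tau))


# Illustrative numbers only: n = 10 observed, m = 10 in the control sample, tau = 2.
q = q0(10, 10, 2.0)
print("q0 =", q, " Z (asymptotic) =", math.sqrt(q))
```

For small event counts the asymptotic formula is only approximate, and the distribution of q0 would instead be obtained from Monte Carlo.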
The complete macro for computing upper limits according to the full frequentist procedure can be found here.
Some material has been adapted from a course for postgraduate students at the University of London. The complete set of lecture notes for that course plus other resources can be found here.
If we have time, we may also look in more depth at multivariate methods, e.g., using the lectures here (shown earlier at CERN and the University of Mainz):
You can also download the sections on probability, statistics, and Monte Carlo (pdf files) from the Review of Particle Physics by the Particle Data Group.