Statistical Data Analysis

2021/22 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515

 


Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail: g.cowan@rhul.ac.uk

Time & Place: The 2021/22 course takes place Mondays 3-6 pm, starting 4 October and ending 13 December. The lectures are given in person at RHUL in Tolansky 125 and are also live-streamed online with MS Teams. Recordings of the lectures are made available and, in addition, videos of last year's lectures can be found below. The core material is presented in the first two hours; the third hour is used for examples and discussion.

Moodle page: U. of London MSc and MSci students should access the course through its RHUL Moodle page.

Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

Although the examples used in the course often relate to particle physics, they are kept relatively simple, and MSci students from other areas of physics should not find them too difficult.

Computing: The statistical methods will be practiced using computer programs in Python or C++. Students should have some familiarity with at least one of these languages or be willing to use additional resources to acquire the needed computing skills.

Syllabus: A general outline of the course topics.

Slides and notes from 2021/22:

  • Week 1 slides, discussion notes
  • Week 2 slides, discussion notes
  • Week 3 slides, discussion notes, Monte Carlo code cauchyMC.py, cauchyMC.ipynb.
  • Videos of 2021/22 lectures: Masters students should find these on the course's Moodle page. Otherwise you can access the lectures here (password required).
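
As a minimal sketch of the kind of Monte Carlo exercise covered in week 3, the following generates values that follow the standard Cauchy pdf f(x) = 1/(pi (1 + x^2)) using the inverse-transform method. It is not the course's cauchyMC.py, whose contents are not reproduced here; the function name cauchy_sample and the default seed are illustrative assumptions.

```python
import math
import random
import statistics


def cauchy_sample(n, seed=12345):
    """Return n values following the standard Cauchy pdf.

    Hypothetical helper for illustration only. The cumulative
    distribution is F(x) = 1/2 + arctan(x)/pi, so setting F(x) = r
    with r uniform in [0, 1) and solving for x gives
    x = tan(pi * (r - 1/2)).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


if __name__ == "__main__":
    x = cauchy_sample(100000)
    # The Cauchy mean is undefined, so use the median as a location check.
    print("sample median:", statistics.median(x))
    # For the standard Cauchy, P(|x| < 1) = 1/2.
    frac = sum(1 for xi in x if abs(xi) < 1.0) / len(x)
    print("fraction with |x| < 1:", frac)
```

The sample median should come out close to zero and the fraction with |x| < 1 close to one half; the sample mean is deliberately not used because the Cauchy distribution has no well-defined mean.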

    Lecture videos and slides from 2020/21:

  • Week 1 slides and videos part 1 (course intro), part 2 (probability), part 3 (interpretation of prob., Bayes' thm.), part 4 (random variables, pdfs).
  • Week 2 slides and videos part 1 (functions of r.v.s), part 2 (expectation values), part 3 (error prop.), part 4 (catalog of distributions, 1).
  • Week 3 slides and videos part 1 (uniform, exponential), part 2 (Gaussian), part 3 (further pdfs), part 4 (Monte Carlo).
  • Week 4 slides and videos part 1 (hypothesis tests), part 2 (example of test), part 3 (test statistic, N-P lemma), part 4 (multivariate methods).
  • Week 5 slides, part 1 (neural nets), part 2 (network training), part 3 (pdf estimation), part 4 (BDTs).
  • Week 6 slides, part 1 (p-values), part 2 (examples of p-values), part 3 (chi-square test), part 4 (parameter estimation).
  • Week 7 slides, part 1 (large-sample MLEs), part 2 (variance of MLEs), part 3 (2-D numerical example), part 4 (Extended ML, Bayesian est.).
  • Week 8 slides, part 1 (method of least squares), part 2 (linear LS, bias and variance), part 3 (goodness of fit with LS), part 4 (LS example, averaging).
  • Week 9 slides, part 1 (LS with histograms), part 2 (LR test, Wilks thm.), part 3 (interval estimation), part 4 (interval from likelihood), histFit.py (for fitting histograms).
  • Week 10 slides, part 1 (Poisson upper limit), part 2 (Jeffreys prior), part 3 (nuisance parameters), part 4 (Bayesian treatment of NPs, MCMC).
  • Week 11 slides, part 1 (Bayes factors), part 2 (Finding marginal likelihoods), part 3 (Errors on errors pt. 1), part 4 (Errors on errors pt. 2), simple program for Student's t average. Lectures 11-3 and 11-4 refer to G. Cowan, Eur. Phys. J. C (2019) 79:133 or arXiv:1809.05778.
  • Revision Session slides (29apr21).
  • Problem sheets: There are 9 problem sheets due on Mondays from weeks 3 through 11. Further info on these can be found in the slides for week 1 and part 1 of the corresponding video.

  • Problem Sheet 1, due 18 October 2021.
  • Problem Sheet 2, due 25 October 2021.
  • Problem Sheet 3, due 1 November 2021. Materials for problem 4 can be found here.
  • Books on statistical methods:

    Books on multivariate methods:

    Some additional notes/resources:

  • The materials from RHUL's year-3 introduction to statistics include a short program simpleFit.py for doing least-squares fits with the Python routine curve_fit; also a ROOT/C++ version simpleFit.C.
  • A note on the Jeffreys prior.
  • A note on the Poisson distribution.
  • See Sec. 40.5 of the PDG Statistics Review for a discussion of experimental sensitivity.
  • G. Cowan, Statistical Models with Uncertain Error Parameters, Eur. Phys. J. C (2019) 79:133 or arXiv:1809.05778
  • The "Asimov Paper", aka Asymptotic formulae for likelihood-based tests of new physics, by Cowan, Cranmer, Gross and Vitells, EPJC 71 (2011) 1554 or arXiv:1007.1727, for more on statistical tests for searches.
  • G. Cowan, Topics in statistical data analysis for high energy physics, arXiv:1012.3589 (2010).
  • G. Cowan, Statistics for Searches at the LHC, arXiv:1307.2487 (2013).
  • G. Cowan, Bayes Factors for Discovery (draft note).
  • Lectures at the Galileo Galilei Institute (January 2017).
  • An introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.
  • The sections on probability, statistics, and Monte Carlo from the Review of Particle Physics, P.A. Zyla et al., Prog. Theor. Exp. Phys. 2020, 083C01 (2020), by the Particle Data Group.
  • G. Cowan, A Survey of Unfolding Methods in Particle Physics, in M. Whalley and L. Lyons (eds.), Advanced Statistical Techniques in Particle Physics (Proceedings) Durham, UK, March 18-22, 2002, Conf.Proc.C 0203181 (2002) 248-257.
  • Computing:

  • Some more lectures on statistics I've given:

    Archives -- Statistical Data Analysis old lectures:

    Information on computing setup: Some info on how to log into the RHUL particle physics linux machine linappserv1 from the teaching lab or your own computer is available here.

    Once you have your account on linappserv0, you can connect from any other networked Linux machine with

    ssh -X username@linappserv0.pp.rhul.ac.uk

    where for "username" you substitute your login name, and then enter your password. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    The -X option above should allow you to open an X window. You can check this by typing at the prompt

    xclock &

    which should open up a clock in a small window. If it doesn't work, try using -Y or -XY.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.
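
As a rough way to confirm the setup described above, the following Python sketch checks whether a .bash_profile exists in the home directory and whether ROOTSYS is defined in the current environment. The function name check_setup and its arguments are illustrative assumptions, not part of the course materials.

```python
import os
from pathlib import Path


def check_setup(home=None, environ=None):
    """Report whether .bash_profile exists and what ROOTSYS is set to.

    Hypothetical helper for illustration only. By default it inspects
    the current user's home directory and environment; both can be
    overridden, which is useful for testing.
    """
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    return {
        "bash_profile_exists": (home / ".bash_profile").is_file(),
        "rootsys": environ.get("ROOTSYS"),  # None if the variable is unset
    }


if __name__ == "__main__":
    status = check_setup()
    if not status["bash_profile_exists"]:
        print("No .bash_profile found in the home directory.")
    if status["rootsys"] is None:
        print("ROOTSYS is not set; log in again after installing .bash_profile.")
    else:
        print("ROOTSYS =", status["rootsys"])
```

Run this in a fresh login shell, since environment variables defined in .bash_profile only take effect at login.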


    Glen Cowan