Statistical Data Analysis

2020/2021 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515

 



Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail: g.cowan@rhul.ac.uk

Time & Place: The 2020/21 lectures take place online (lecture videos below; discussion sessions on Mondays at 11, 3 or 5).

Moodle page: U. of London MSc and MSci students should access the course through its RHUL Moodle page.

Course structure: For 2020/21, as in recent years, the computing element of the course will not be assessed. Nevertheless, some of the statistical methods will be practiced using computer programs in python or C++. Students should have some familiarity with at least one of these languages or be willing to use additional resources to acquire the needed computing skills. In contrast to previous years, there will be no specific tuition provided in C++.

Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

Although the examples used in the course often relate to particle physics, they are kept relatively simple, and MSci students from other areas of physics should not find this a significant obstacle.

Syllabus: A general outline of the course topics.

Lecture videos and slides:

  • Week 1 slides and videos part 1 (course intro), part 2 (probability), part 3 (interpretation of prob., Bayes' thm.), part 4 (random variables, pdfs).
  • Week 2 slides and videos part 1 (functions of r.v.s), part 2 (expectation values), part 3 (error prop.), part 4 (catalog of distributions, 1), discussion session notes (12oct20).
  • Week 3 slides and videos part 1 (uniform, exponential), part 2 (Gaussian), part 3 (further pdfs), part 4 (Monte Carlo), discussion session notes (19oct20) and sample code cauchyMC.py, cauchyMC.ipynb.
  • Week 4 slides and videos part 1 (hypothesis tests), part 2 (example of test), part 3 (test statistic, N-P lemma), part 4 (multivariate methods), discussion session notes (26oct20).
  • Week 5 slides, part 1 (neural nets), part 2 (network training), part 3 (pdf estimation), part 4 (BDTs), discussion session notes (2nov20).
  • Week 6 slides, part 1 (p-values), part 2 (examples of p-values), part 3 (chi-square test), part 4 (parameter estimation), discussion session notes (9nov20).
  • Week 7 slides, part 1 (large-sample MLEs), part 2 (variance of MLEs), part 3 (2-D numerical example), part 4 (Extended ML, Bayesian est.), discussion session notes (16nov20).
  • Week 8 slides, part 1 (method of least squares), part 2 (linear LS, bias and variance), part 3 (goodness of fit with LS), part 4 (LS example, averaging), discussion session notes (23nov20).
  • Week 9 slides, part 1 (LS with histograms), part 2 (LR test, Wilks thm.), part 3 (interval estimation), part 4 (interval from likelihood), histFit.py (for fitting histograms), discussion session notes (30nov20).
  • Week 10 slides, part 1 (Poisson upper limit), part 2 (Jeffreys prior), part 3 (nuisance parameters), part 4 (Bayesian treatment of NPs, MCMC), discussion session notes (7dec20).
  • Week 11 slides, part 1 (Bayes factors), part 2 (finding marginal likelihoods), part 3 (Errors on errors pt. 1), part 4 (Errors on errors pt. 2), discussion session notes (14dec20), simple program for Student's t average. Lectures 11-3 and 11-4 refer to G. Cowan, Eur. Phys. J. C (2019) 79:133 or arXiv:1809.05778.
  • Revision Session slides (29apr21).
  • Problem sheets: There are 9 problem sheets due on Mondays from weeks 3 through 11. Further info on these can be found in the slides for week 1 and part 1 of the corresponding video.

  • Problem Sheet 1, due 19 October 2020.
  • Problem Sheet 2, due 26 October 2020.
  • Problem Sheet 3, due 2 November 2020. Materials for problem 3 can be found here.
  • Problem Sheet 4, due 9 November 2020. Materials for problems 1 and 2 can be found here.
  • Problem Sheet 5, due 16 November 2020. If you want to use scikit-learn (python), start with the code here; to use TMVA (C++), look here.
  • Problem Sheet 6, due 23 November 2020.
  • Problem Sheet 7, due 30 November 2020. For the warm-up problem 2, here are files for using iminuit with python or TMinuit with root.
  • Problem Sheet 8, due 7 December 2020. The exercise uses the routine mlFit with iminuit/python or TMinuit/root.
  • Problem Sheet 9, due 14 December 2020; solutions.
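The Week 3 Monte Carlo material comes with the sample code cauchyMC.py, which is not reproduced here. As a hedged, stand-alone illustration of the transformation method it relates to, consider the standard Cauchy pdf f(x) = 1/(π(1 + x²)). Its cumulative distribution is F(x) = 1/2 + arctan(x)/π, so setting F(x) = r and solving gives x = tan(π(r − 1/2)), which is Cauchy distributed when r is uniform in [0, 1). The function names and the seed below are hypothetical choices for this sketch, not the course's program.

```python
import math
import random


def cauchy_from_uniform(r):
    """Transformation method: invert the Cauchy cdf F(x) = 1/2 + atan(x)/pi.

    Given r uniformly distributed in [0, 1), return x = F^{-1}(r).
    """
    return math.tan(math.pi * (r - 0.5))


def generate_cauchy(n, seed=12345):
    """Generate n values from the standard Cauchy pdf 1/(pi*(1 + x^2)).

    The seed is an arbitrary illustrative choice so that runs are reproducible.
    """
    rng = random.Random(seed)
    return [cauchy_from_uniform(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    x = generate_cauchy(100000)
    # Half of the probability lies in [-1, 1], since F(1) - F(-1) = 1/2.
    frac = sum(1 for xi in x if -1.0 <= xi <= 1.0) / len(x)
    print(f"fraction in [-1, 1]: {frac:.4f} (expect about 0.5)")
    # The median is used as the location estimate: the Cauchy distribution
    # has no finite mean, so the sample mean does not settle down as n grows.
    x_sorted = sorted(x)
    print(f"sample median: {x_sorted[len(x) // 2]:.4f}")
```

Checking the fraction of values in [−1, 1] against the exact value 1/2 is a quick test that the generated sample follows the intended pdf.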
  • Books on statistical methods:

    Books on multivariate methods:

    Some additional notes/resources:

  • The materials from RHUL's year-3 introduction to statistics include a short program simpleFit.py for doing least-squares fits with the python routine curve_fit; also a root/C++ version simpleFit.C.
  • A note on the Jeffreys prior.
  • A note on the Poisson distribution.
  • See Sec. 40.5 of the PDG Statistics Review for a discussion of experimental sensitivity.
  • G. Cowan, Statistical Models with Uncertain Error Parameters, Eur. Phys. J. C (2019) 79:133 or arXiv:1809.05778
  • The "Asimov Paper", aka Asymptotic formulae for likelihood-based tests of new physics, by Cowan, Cranmer, Gross and Vitells, EPJC 71 (2011) 1554, or arXiv:1007.1727, for more on statistical tests for searches.
  • G. Cowan, Topics in statistical data analysis for high energy physics, arXiv:1012.3589 (2010).
  • G. Cowan, Statistics for Searches at the LHC, arXiv:1307.2487 (2013).
  • G. Cowan, Bayes Factors for Discovery (draft note).
  • Lectures at the Galileo Galilei Institute (January 2017).
  • An introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.
  • The sections on probability, statistics, and Monte Carlo from the Review of Particle Physics, P.A. Zyla et al., Prog. Theor. Exp. Phys. 2020, 083C01 (2020), by the Particle Data Group.
  • G. Cowan, A Survey of Unfolding Methods in Particle Physics, in M. Whalley and L. Lyons (eds.), Advanced Statistical Techniques in Particle Physics (Proceedings) Durham, UK, March 18-22, 2002, Conf.Proc.C 0203181 (2002) 248-257.
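The program simpleFit.py listed above uses curve_fit, which handles general (nonlinear) models. For the special case of a straight line y = θ0 + θ1 x with independent Gaussian errors σ_i, the least-squares estimators that minimize χ²(θ) = Σ (y_i − θ0 − θ1 x_i)²/σ_i² have a closed form. The following is a minimal sketch using only the Python standard library, assuming that linear model; the function name fit_line and the data in the usage section are hypothetical and made up for illustration. It returns the estimators, their covariance matrix, and the minimized χ² with n − 2 degrees of freedom for a goodness-of-fit check.

```python
import math


def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = theta0 + theta1 * x.

    Minimizes chi2 = sum((y_i - theta0 - theta1*x_i)^2 / sigma_i^2).
    Returns (theta0, theta1, cov, chi2_min, ndof), where cov is the 2x2
    covariance matrix of the estimators as a nested list.
    """
    if not (len(x) == len(y) == len(sigma)) or len(x) < 2:
        raise ValueError("need equal-length inputs with at least 2 points")
    w = [1.0 / s**2 for s in sigma]                 # weights 1/sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    D = S * Sxx - Sx**2                             # determinant of normal equations
    theta0 = (Sxx * Sy - Sx * Sxy) / D
    theta1 = (S * Sxy - Sx * Sy) / D
    # Covariance = inverse of (1/2) * (matrix of second derivatives of chi2).
    cov = [[Sxx / D, -Sx / D],
           [-Sx / D, S / D]]
    chi2_min = sum(wi * (yi - theta0 - theta1 * xi)**2
                   for wi, xi, yi in zip(w, x, y))
    ndof = len(x) - 2
    return theta0, theta1, cov, chi2_min, ndof


if __name__ == "__main__":
    # Made-up illustrative data, not from the course materials.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.7, 4.1, 6.3, 7.9, 10.2]
    sigma = [0.3] * 5
    t0, t1, cov, chi2, ndof = fit_line(x, y, sigma)
    print(f"theta0 = {t0:.3f} +/- {math.sqrt(cov[0][0]):.3f}")
    print(f"theta1 = {t1:.3f} +/- {math.sqrt(cov[1][1]):.3f}")
    print(f"chi2/ndof = {chi2:.2f}/{ndof}")
```

Because the model is linear in the parameters, no iterative minimization is needed here; for nonlinear models one would return to a general minimizer such as curve_fit.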
  • Computing:

  • Some more lectures on statistics I've given:

    Archives -- Statistical Data Analysis old lectures:

    Information on computing setup: Some info on how to log into the RHUL particle physics linux machine linappserv3 from the teaching lab or your own computer is available here.

    Once you have your account on linappserv3, you can connect from any other networked linux machine with

    ssh -X username@linappserv3.pp.rhul.ac.uk

    where you replace "username" with your login name and then enter your password when prompted. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    The -X option above enables X11 forwarding, which should allow programs running on the remote machine to open windows ("x-windows") on your screen. You can check this by typing at the prompt

    xclock &

    which should open up a clock in a small window. If it doesn't work, try using -Y or -XY.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.


    Glen Cowan