Statistical Data Analysis

2018/2019 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515



Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail:

* Revision sessions: There will be two revision sessions:

  • Wednesday 1 May 2019, 15:00 - 17:00 in Senate House, Malet Street, London WC1E 7HU, room 102,

  • Friday 3 May 2019, 15:00 - 17:00 in Tolansky 125 at RHUL.

  • Senate House is near UCL -- here is a map. The Tolansky building is number 21 on the RHUL map.

    You are welcome to attend either or both of the revision sessions (their content will be essentially the same).

    Time & Place: The lectures take place at UCL, Mondays 3:00 to 6:00, starting on 1 October 2018.

    Lecture location: UCL, Chandler House G10. Here is a map.

    Course structure: For 2018/19, as last year, the computing element of the course will not be assessed. Nevertheless, some of the statistical methods will be practiced using computer programs, primarily in C++. For those students without a background in C++, additional tuition will be provided. Starting this year, the coursework may also be carried out in python, but less support will be provided for this.

    The main lectures on Statistical Data Analysis will be from 3:00 to 5:00. For the first 6 weeks, the hour from 5:00 to 6:00 will be used to cover the basics of C++. There will be no assessed work on C++ per se, but it (or, optionally, python) will be used in the statistics coursework later on. From week 7, the hour from 5:00 to 6:00 will be used to review the coursework problems and provide an opportunity for additional examples and discussion. As in previous years, the exam will only cover statistics (no C++).

    If you are a non-RHUL MSc or MSci student (i.e., from UCL, KCL or QMUL), then to be enrolled for credit in the course you need to fill in this form. Section D must be completed with two signatures and a College stamp.

    Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

    Although the examples used in the course often relate to particle physics, this is done in a relatively simple way, and MSci students from other physics areas should not find this too great a difficulty.

    Syllabus: A general outline of the course topics.

    Problem sheets: The coursework will be due on the days of our lectures so you can hand it in then (on paper). Please write clearly on the top of the page your name, college, and degree programme (MSci, MSc or PhD).

    Late or emailed coursework submissions are only allowed in case of exceptional circumstances and if agreed by the lecturer. If an email submission is agreed, the entire assignment should be contained in a single pdf attachment with all of the relevant information (name, degree programme, College), and the subject line must include the words "Statistics Problem Sheet". Please do not put high-res colour photos into the pdf; use iScanner or a similar app if you need to make a pdf using your phone.

  • Problem Sheet 1, due 15 October 2018.
  • Problem Sheet 2, due 22 October 2018.
  • Problem Sheet 3, due 29 October 2018. Materials for problem 3 can be found here.
  • Problem Sheet 4, due 5 November 2018. Materials for problems 1 and 2 can be found here.
  • Problem Sheet 5, due 12 November 2018. You will need the programs here (see also the file readme.txt). You can get all of the files in the tarball here. Or if you want to use python, you can start with the code here.
  • Problem Sheet 6, due 19 November 2018.
  • Problem Sheet 7, due 26 November 2018. For problem 2 you need the programs makeData and expFit (download the files and type gmake). For an option to do the problem using python see the program here, which is based on the package iminuit.
  • Problem Sheet 8, due 3 December 2018. You will need the root macro simpleFit.C and the related files here, or if you prefer to use python, use the code here.
  • Problem Sheet 9, due 10 December 2018.
  • Lecture notes:

  • Statistical Data Analysis:

  • Computing:

    More notes, books, etc.: The statistics lectures will mainly follow

    G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.

    This book has its own web site, which contains various data analysis resources. Also useful are:

    R.J.Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989;
    Frederick James, Statistical Methods in Experimental Physics, 2nd edition, World Scientific 2006;
    S.Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998;
    Ilya Narsky and Frank Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2013.
    L.Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.

    Books on multivariate methods:

    Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
    T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009.
    Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R: free book and lectures.

    You can also download the sections on probability, statistics, and Monte Carlo from the Review of Particle Physics: M. Tanabashi et al. (Particle Data Group), Phys. Rev. D 98, 030001 (2018).

    Here is an introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.

    C++: For computing there are many other web-based references, e.g.,

    Lecture notes on C++ by Philip Blakely (Cambridge).
    Adrian Bevan's computing lectures (part of the London HEP lecture programme).
    Rob Miller's C++ Course (Imperial)
    A C++ online reference with tutorials, etc.,
    Another C++ online reference:
  • Some more lectures on statistics I've given:

    Archives: The archived course page for the 2003 lectures. Materials from the 2003 data analysis tutorial can be found here.

    Information on computing setup: Some info on how to log into the RHUL particle physics linux machine linappserv1 from the teaching lab or your own computer is available here.

    Once you have your account on linappserv1, you can connect from any other networked linux machine with

    ssh -X

    where for "username" you substitute your login name, and then enter your password. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.
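    The checks described above (that .bash_profile is present and that ROOTSYS is defined) can also be done with a short python script. This is only a minimal sketch: the function name check_login_setup and its arguments are made up for illustration and are not part of the course materials, and the only assumption about ROOTSYS is that it should point to an existing directory.

```python
import os
from pathlib import Path


def check_login_setup(home=None, environ=None):
    """Return a list of problems found with the login environment.

    home and environ default to the real home directory and os.environ;
    they are parameters only so the check can be tried out on test values.
    (Hypothetical helper, not part of the course software.)
    """
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    problems = []

    # The course page says .bash_profile should be in the home directory.
    if not (home / ".bash_profile").is_file():
        problems.append("no .bash_profile in " + str(home))

    # ROOTSYS is defined by .bash_profile and is needed for the ROOT programs.
    rootsys = environ.get("ROOTSYS")
    if not rootsys:
        problems.append("ROOTSYS is not set")
    elif not Path(rootsys).is_dir():
        problems.append("ROOTSYS points to a missing directory: " + rootsys)

    return problems


if __name__ == "__main__":
    for line in check_login_setup() or ["login setup looks fine"]:
        print(line)
```

    Run it after logging in; if it reports that ROOTSYS is not set, log out and in again after copying .bash_profile so that the file is read.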

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.

    Glen Cowan