Statistical Data Analysis

2017/2018 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515



Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail:

* Revision sessions: There will be two revision sessions:

  • Friday 20 April 2018, 15:00 - 17:00 in Tolansky 125 at RHUL.

  • Tuesday 24 April 2018, 15:00 - 17:00 in Senate House, Malet Street, London WC1E 7HU, room 102.

    Senate House is near UCL -- here is a map.

    You are welcome to attend either or both of the revision sessions (their content will be essentially the same).

    Time & Place: The lectures take place at UCL, Mondays 3:00 to 6:00, starting on 2 October 2017.

    Lecture location: UCL, Chandler House G10. Here is a map.

    Course structure: For 2017/18, as last year, the computing element of the course will not be assessed. Nevertheless, some of the statistical methods will be practiced using C++ programs. For those students without a background in C++, additional tuition will be provided.

    The main lectures on Statistical Data Analysis will be from 3:00 to 5:00. For the first 6 weeks, the hour from 5:00 to 6:00 will be used to cover the basics of C++. There will be no assessed work on C++ per se, but it will be used in the statistics coursework later on. From week 7, the hour from 5:00 to 6:00 will be used to review the coursework problems and provide an opportunity for additional examples and discussion. As in previous years, the exam will only cover statistics (no C++).

    Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

    Although the examples used in the course often relate to particle physics, this is done in a relatively simple way, and MSci students from other areas of physics should not find this too difficult.

    Syllabus: A general outline of the course topics.

    Problem sheets: Coursework is due on the days of our lectures, so you can hand it in then (on paper). Please write your name, college, and degree programme (MSci, MSc or PhD) clearly at the top of the page. Late or emailed submissions are allowed only in exceptional circumstances and only if agreed by the lecturer. If an email submission is agreed, the entire assignment should be contained in a single PDF attachment with all of the relevant information (including your name!).

  • Problem Sheet 1, due 16 October 2017.
  • Problem Sheet 2, due 23 October 2017.
  • Problem Sheet 3, due 30 October 2017. Materials for problem 3 can be found here.
  • Problem Sheet 4, due 6 November 2017. Materials for problems 1 and 2 can be found here.
  • Problem Sheet 5, due 13 November 2017. You will need the programs here (see also the file readme.txt). You can get all of the files in the tarball here.
  • Problem Sheet 6, due 20 November 2017.
  • Problem Sheet 7, due 27 November 2017. For problem 2 you need the programs makeData and expFit (download the files and type gmake).
  • Problem Sheet 8, due 4 December 2017. You will need the ROOT macro simpleFit.C and the related files here.
  • Problem Sheet 9, due 11 December 2017.

    Lecture notes:

  • Statistical Data Analysis:

  • Computing:

    More notes, books, etc.: The statistics lectures will mainly follow

    G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.

    This book has its own web site, which contains various data analysis resources. Also useful are:

    R.J.Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989;
    Frederick James, Statistical Methods in Experimental Physics, 2nd edition, World Scientific, 2006;
    S.Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998;
    Ilya Narsky and Frank Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2013;
    L.Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.

    Books on multivariate methods:

    Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
    T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009.

    You can also download the sections on probability, statistics, and Monte Carlo from the Review of Particle Physics (K.A. Olive et al., Chin. Phys. C, 38, 090001, 2014) by the Particle Data Group.

    Here is an introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.

    C++: For computing there are many other web based references, e.g.,

    Adrian Bevan's computing lectures (part of the London HEP lecture programme).
    Rob Miller's C++ Course (Imperial)
    A C++ online reference with tutorials, etc.,
    Another C++ online reference:

    Some more lectures on statistics I've given:

    Archives: The archived course page for the 2003 lectures. Materials from the 2003 data analysis tutorial can be found here.

    Information on computing setup: Information on how to log in to the RHUL particle physics Linux machine linappserv1 from the teaching lab or your own computer is available here.

    Once you have your account on linappserv1, you can connect from any other networked Linux machine with

    ssh -X

    where for "username" you substitute your login name, and then enter your password. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.

    Glen Cowan