Statistical Data Analysis

2014/2015 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515

 



Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail: g.cowan@rhul.ac.uk

* Time & Place: On 3 November 2014, lectures are from 3:00 to 6:00 in the Massey Lecture Theatre, here (future situation to be confirmed). Old info for reference: The lectures take place at UCL, Mondays 3:00 to 6:00, starting on 29 September 2014, in UCL Physics/Union Building D103. This is on the first floor of the Union building (see the map here, ref. D1).

Change to course structure from 2014/15: From this year, the computing element of the course will not be assessed. Nevertheless, some of the statistical methods will be practiced using C++ programs. For those students without a background in C++, additional tuition will be provided.

The main lectures on Statistical Data Analysis will be from 3:00 to 5:00. For the first 6 weeks, the hour from 5:00 to 6:00 will be used to cover the basics of C++. There will be no assessed work on C++ per se, but it will be used in the statistics coursework later on. From week 7, the hour from 5:00 to 6:00 will be used to review the coursework problems and provide an opportunity for additional examples and discussion. As in previous years, the exam will only cover statistics (no C++).

Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

Although the examples used in the course often relate to particle physics, this is done in a relatively simple way, and MSci students from other areas of physics should not find this too great a difficulty.

Syllabus: A general outline of the course topics.

Problem sheets: The coursework will be due on the days of our lectures so you can hand it in then (on paper). Please write your name, college, and degree programme (MSci, MSc or PhD) clearly at the top of the page. Late or emailed coursework submissions are only allowed in case of exceptional circumstances and if agreed by the lecturer. If an email submission is agreed, the entire assignment should be contained in a single PDF attachment with all of the relevant information (including your name!).

  • Problem Sheet 1, due 13 October 2014.
  • Problem Sheet 2, due 20 October 2014.
  • Problem Sheet 3, due 27 October 2014.
  • Problem Sheet 4, due 3 November 2014. You will need the programs here (see also the file readme.txt). You can get all of the files in the tarball here.
  • Problem Sheet 5, optional, aim to turn in 10 November 2014.
  • Problem Sheet 6, due 17 November 2014.
  • Problem Sheet 7, due 23 November 2014. For problem 2 you need the programs makeData and expFit (download the files and type gmake).
  • Problem Sheet 8, due 30 November 2014. For problem 2 you need the ROOT macro simpleFit.C and the related files here.
  • Lecture Notes:

  • Statistical Data Analysis:

  • Computing:

    More notes, books, etc.: The statistics lectures will mainly follow

    G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.

    This book has its own web site, which contains various data analysis resources. Also useful are:

    R.J. Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989;
    Frederick James, Statistical Methods in Experimental Physics, 2nd edition, World Scientific, 2006;
    S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998;
    Ilya Narsky and Frank Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2013;
    L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.

    Books on multivariate methods:

    Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
    T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009.

    You can also download the sections on probability, statistics, and Monte Carlo from the Review of Particle Physics (K.A. Olive et al., Chin. Phys. C, 38, 090001, 2014) by the Particle Data Group.

    Here is an introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.

    C++: For computing there are many other web-based references, e.g.,

    Adrian Bevan's computing lectures (part of the London HEP lecture programme).
    Rob Miller's C++ Course (Imperial)
    A C++ online reference with tutorials, etc., www.cplusplus.com
    Another C++ online reference: www.cppreference.com
  • Some more lectures on statistics I've given:

    Archives: The archived course page for the 2003 lectures. Materials from the 2003 data analysis tutorial can be found here.

    Information on computing setup: Some info on how to log into the RHUL particle physics Linux machine linappserv0 from the teaching lab or your own computer is available here. To set up a Unix environment on a Windows computer, you can download and install Cygwin from here. To make sure you select the required packages and install everything correctly, please look at the information here, which is based on the recent email that updates the info that was here.

    Once you have your account on linappserv0, you can connect from any other networked Linux machine with

    ssh -X username@linappserv0.pp.rhul.ac.uk

    where for "username" you substitute your login name, and then enter your password. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.
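The two checks described above (whether .bash_profile is present, and whether ROOTSYS has been defined) could be sketched as a short shell function. This is only an illustrative sketch: the helper name check_setup is hypothetical and not part of the course files, and it assumes a POSIX-style shell such as bash. It reports only whether the file exists and whether the variable is set, not whether ROOT itself works.

```shell
# check_setup [DIR]: hypothetical helper (not part of the course files).
# Reports whether DIR (default: your home directory) contains .bash_profile
# and whether the environment variable ROOTSYS is currently set.
check_setup() {
    setup_dir="${1:-$HOME}"

    # Equivalent to looking for .bash_profile in the output of "ls -la".
    if [ -f "$setup_dir/.bash_profile" ]; then
        echo "found: $setup_dir/.bash_profile"
    else
        echo "missing: $setup_dir/.bash_profile"
    fi

    # ROOTSYS should be defined after login if .bash_profile was read.
    if [ -n "${ROOTSYS:-}" ]; then
        echo "ROOTSYS=$ROOTSYS"
    else
        echo "ROOTSYS is not set"
    fi
}

# Run the check against your own home directory.
check_setup "$HOME"
```

If the file is reported missing, copy the .bash_profile mentioned above into your home directory and log in again so that it takes effect.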

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.


    Glen Cowan