Computing and Statistical Data Analysis

2013/2014 University of London Postgraduate
Lectures for Particle Physicists

University of London MSci PH4515

 



Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail: g.cowan@rhul.ac.uk

Time & Place: The lectures take place at UCL on Mondays from 3:00 to 6:00, starting on 30 September, in UCL Physics/Union Building D103. This is on the first floor of Union (see the map here, ref. D1).

Minor change to course structure: For the first four weeks, we will use the time from 3 to 4:30 for statistics and from 4:30 to 6 for computing. This should allow us to finish the C++ part of the course within 4 weeks.

The computing part of the course is optional for the PhD students (check with your supervisor) but mandatory for the MSci/MSc students.

From week five, the lectures on statistical data analysis will run from 3 to 5. The hour from 5 to 6 will be reserved for discussion and, if needed, overflow material from the lectures.

Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515. The purpose of the lectures on probability and statistics is to present the basic mathematical tools needed for the analysis of experimental data. The methods will be practiced by writing and running short computer programs.

Although the examples used in the course often relate to particle physics, they are presented in a relatively simple way, and MSci students from other areas of physics should not find this too great a difficulty.

Syllabus: A general outline of the course topics.

Problem sheets:

  • Problem Sheet 1, due 14 October 2013.
  • Problem Sheet 2, due 21 October 2013.
  • Problem Sheet 3, due 28 October 2013.
  • Problem Sheet 4, due 11 November 2013. Materials for problems 1 and 2 can be found here and info on how to set up the ROOT environment can be found here. The code for the TwoVector class can be found here. For more information on overloading += and -=, see here, and scroll down to "compound assignment operators".
  • Problem Sheet 5, due 18 November 2013. You will need the programs here (see also the file readme.txt). You can get all of the files in the tarball here.
  • Problem Sheet 6, due 25 November 2013. For problem 2 you need the programs makeData and expFit (download the files and type gmake).
  • Problem Sheet 7, due 9 December 2013. You will need the ROOT macro simpleFit.C (updated!) and the related files here.
  • The coursework will be due on the days of our lectures so you can hand it in to me then (on paper). Please write clearly on the top of the page your name, college, and degree programme (MSci, MSc or PhD). Emailed coursework submissions are only allowed if for some reason you are unable to attend the lecture, in which case the entire assignment must be contained in a single pdf file with all of the relevant information.
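For the compound assignment operators mentioned under Problem Sheet 4, the following is a minimal sketch of how += and -= are commonly overloaded in C++. The TwoVector class shown here is a hypothetical stand-in written for illustration only; the member names (m_x, m_y), the accessors, and the helper checkTwoVector are assumptions and may not match the class provided for the course.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the course's TwoVector class; the real
// interface may differ.
class TwoVector {
public:
  TwoVector(double x = 0.0, double y = 0.0) : m_x(x), m_y(y) {}

  double x() const { return m_x; }
  double y() const { return m_y; }
  double r() const { return std::sqrt(m_x * m_x + m_y * m_y); }

  // Compound assignment modifies the left operand in place and returns
  // a reference to it, so that expressions such as (a += b) += c work.
  TwoVector& operator+=(const TwoVector& rhs) {
    m_x += rhs.m_x;
    m_y += rhs.m_y;
    return *this;
  }

  TwoVector& operator-=(const TwoVector& rhs) {
    m_x -= rhs.m_x;
    m_y -= rhs.m_y;
    return *this;
  }

private:
  double m_x;
  double m_y;
};

// Binary + and - written in terms of the compound operators: the left
// operand is taken by value (a copy), modified, and returned.
inline TwoVector operator+(TwoVector lhs, const TwoVector& rhs) {
  lhs += rhs;
  return lhs;
}

inline TwoVector operator-(TwoVector lhs, const TwoVector& rhs) {
  lhs -= rhs;
  return lhs;
}

// Small self-check of the operators (illustrative helper, not course code).
inline void checkTwoVector() {
  TwoVector a(1.0, 2.0);
  const TwoVector b(2.0, 2.0);
  a += b;  // a is now (3, 4)
  assert(a.x() == 3.0 && a.y() == 4.0);
  assert(a.r() == 5.0);
  a -= b;  // a is back to (1, 2)
  assert(a.x() == 1.0 && a.y() == 2.0);
}
```

Defining + and - as non-member functions on top of += and -= is one common design choice: the arithmetic lives in one place, and the binary operators need no access to private members.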

    Notes, books, etc.: Copies of lectures are available below -- you can print them out and bring them to the lectures. For computing there are many other web-based references, e.g.,

    Adrian Bevan's computing lectures (part of the London HEP lecture programme).
    Rob Miller's C++ Course (Imperial)
    A C++ online reference with tutorials, etc., www.cplusplus.com
    Another C++ online reference: www.cppreference.com
    The statistics lectures will mainly follow

    G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.

    This book has its own web site, which contains various data analysis resources. Also useful are:

    R.J.Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989;
    W.T.Eadie et al., Statistical Methods in Experimental Physics, North-Holland, 1971;
    S.Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998;
    L.Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.

    Books on multivariate methods:

    Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
    T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2001.

    You can also download the sections on probability, statistics, and Monte Carlo (pdf files) from the Review of Particle Physics by the Particle Data Group (K. Nakamura et al., J. Phys. G 37 (2010) 075021).

    Here is an introductory paper on Bayesian statistics: G. Cowan, Data analysis: Frequently Bayesian. Physics Today, Vol. 60, No. 4. (2007), pp. 82-3.

    Archives: The archived course page for the 2003 lectures. Materials from the 2003 data analysis tutorial can be found here.

    Lecture Notes (2012):

  • Computing:

  • Statistical Data Analysis:

    The old A4 versions of the statistics lecture notes are here (lecture 10 is on unfolding).

    The 2013 lectures have by and large followed the slides from 2012 with some reshuffling. But in week 11 the lectures will have some new material; here is the lecture for week 11. Some further material relevant to the lecture can be found in arXiv:1307.2487.

  • Some more lectures on statistics I've given:

  • Computing: Some info on how to log into the RHUL particle physics Linux machine linappserv0 from the teaching lab or your own computer is available here. To set up a Unix environment on a Windows computer you can download and install Cygwin from here. To make sure you select the required packages and install everything correctly, please look at the information here, which is based on the recent email that updates the info that was here.

    Once you have your account on linappserv0 you can connect from any other networked Linux machine with

    ssh -X username@linappserv0.pp.rhul.ac.uk

    where for "username" you substitute your login name, and then enter your password. You will have been given information on computer security and on how to change your password. It is your responsibility to read and follow these rules.

    Your default shell is bash. Your account should have in the home directory a file called .bash_profile (check this with ls -la). If it isn't there, you can copy this .bash_profile to your home directory. This defines certain aliases and environment variables automatically when you log in. In particular, it defines the environment variable ROOTSYS, which you need for the ROOT programs we will use.

    You can also copy to your home directory the file .emacs, which will set some defaults for the emacs editor.

    Using ROOT: A simple standalone C++ program for creating histograms with ROOT classes can be found here. (For installation of ROOT libraries, set-up, etc. see your local particle physics & computing guru.) More information on ROOT, especially on the interactive program, can be found on the ROOT home page; the ROOT class index is also very useful. Some material from a tutorial given by Tania McMahon can be found here (see the file ROOTtutorial.pdf). And here are the slides from Adrian Bevan's lectures on Unix and ROOT.

    And here are some more resources I've found useful: