
Statistical Data Analysis



Glen Cowan,
Royal Holloway, University of London,
phone: (01784) 44 3452, email: g.cowan@rhul.ac.uk
Time & Place:
The lectures take place at UCL, Mondays 3:00 to
6:00, starting on 1 October 2018.
Lecture location: UCL, Chandler House G10. Here is a
map.
Course structure: For 2018/19, as last year,
the computing element of the course will not be assessed.
Nevertheless, some of the statistical methods will be practiced
using computer programs, primarily in C++. For those students without a
background in
C++, additional tuition will be provided. Starting this year,
the coursework may also be carried out in Python,
but less support will be provided for this.
The main lectures on Statistical Data Analysis will be from 3:00 to
5:00. For the first 6 weeks, the hour from 5:00 to 6:00 will be used
to cover the basics of C++. There will be no assessed work on C++ per
se, but it (or, optionally, Python) will be used in the statistics
coursework later on. From
week 7, the hour from 5:00 to 6:00 will be used to review the
coursework problems and provide an opportunity for additional examples
and discussion. As in previous years, the exam will only cover
statistics (no C++).
If you are a non-RHUL MSc or MSci student (i.e., from UCL, KCL or QMUL),
then to be enrolled for credit in the course you need to fill in
this form.
Section D must be completed with two signatures and a College stamp.
Aims: This series of lectures
is intended for PhD students in Particle Physics and it also forms
the University of London MSci course PH4515. The purpose of the lectures
on probability and statistics is to
present the basic mathematical tools needed for the analysis of
experimental data. The methods will be practiced by writing and
running short computer programs.
Although the examples used in the course often relate to particle
physics, this is done in a relatively simple way, and MSci students from
other physics areas should not find this too great a difficulty.
Syllabus: A general
outline of the course topics.
Problem sheets: The coursework will be due on the days of our
lectures so you can hand it in then (on paper). Please write clearly
on the top of the page your name, college, and degree programme (MSci,
MSc or PhD).
Late or emailed coursework submissions are only
allowed in case of exceptional circumstances and if agreed by the
lecturer. If an email submission is agreed, the entire assignment
should be contained in a single pdf attachment with all of the
relevant information (name, degree programme, College), and the
subject line must include the words "Statistics Problem Sheet". Please do not
put high-res colour photos into the pdf; use iScanner or a similar app if
you need to make a pdf using your phone.
Lecture notes:
Statistical Data Analysis:
 Set 1 (weeks 1,2).
 Set 2 (weeks 3,4,5); also
a note on the Poisson
distribution (optional material).
 Set 3 (weeks 6,7,8).
 Set 4 (weeks 9,10). See also
the "Asimov Paper", aka
Asymptotic formulae for likelihood-based tests of new physics, by
Cowan, Cranmer, Gross and Vitells, EPJC
71 (2011) 1554, or
arXiv:1007.1727
for more on statistical tests for searches. And
here is a short note
on the Jeffreys prior.
 Set 5 (week 11).
 Some other resources:
 G. Cowan, Topics in statistical data analysis for high energy physics,
arXiv:1012.3589 (2010).
 G. Cowan, Statistics for Searches at the LHC,
arXiv:1307.2487 (2013).
 G. Cowan, Bayes Factors for
Discovery (draft note).
 Lectures from the Galileo Galilei Institute (January 2017)
on YouTube.
 G. Cowan, Statistical
Models with Uncertain Error Parameters, arXiv:1809.05778.
Computing:
 C++ lectures all in
one file
 Some slides about the data analysis program
ROOT.
 The code for the TwoVector class can be found
here, and a 2011 problem
sheet with exercises that use it is here.
More notes, books, etc.:
The statistics lectures will mainly follow
 G. Cowan, Statistical Data Analysis,
Clarendon Press, Oxford, 1998.
This book has its own
web site, which
contains various data analysis resources. Also useful are:
 R.J. Barlow, A Guide to the Use of Statistical Methods in the Physical
Sciences, John Wiley, 1989;
 Frederick James, Statistical Methods in Experimental Physics,
2nd edition, World Scientific, 2006;
 S. Brandt, Statistical and Computational Methods in Data
Analysis, Springer, New York, 1998;
 Ilya Narsky and Frank Porter, Statistical Analysis
Techniques in Particle Physics, Wiley, 2013;
 L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.
Books on multivariate methods:
 Christopher Bishop, Pattern Recognition and Machine Learning,
Springer, 2006.
 T. Hastie, R. Tibshirani and J. Friedman,
The Elements of
Statistical Learning, 2nd edition, Springer, 2009.
 Gareth James, Daniela
Witten, Trevor Hastie and Robert Tibshirani, An Introduction to
Statistical Learning with Applications in R:
free book
and
lectures.
You can also download the sections on
probability,
statistics, and
Monte
Carlo from the Review of Particle Physics,
M. Tanabashi et al. (Particle Data Group), Phys. Rev. D 98, 030001
(2018).
Here is an introductory paper on Bayesian statistics:
G. Cowan, Data analysis: Frequently Bayesian. Physics Today,
Vol. 60, No. 4 (2007), pp. 82-83.
C++: For computing there are many other web-based
references, e.g.,

Lecture notes on C++ by Philip Blakely (Cambridge).
 Adrian Bevan's
computing lectures (part of the London HEP lecture programme).
 Rob Miller's
C++ Course (Imperial)
 A C++ online reference with tutorials, etc.,
www.cplusplus.com
 Another C++ online reference:
www.cppreference.com
Some more lectures on statistics I've given:
 Academic training lectures
on
Statistics at the LHC, CERN, 14-17 June, 2010.

Lectures on
statistical methods for particle physics at Tsinghua University,
12-16 April, 2010.
 Seminar on
recent progress in multivariate methods for particle physics,
Weizmann Institute of Science, 17 Jan 2010.
 Statistical Methods in Particle Physics at SUSSP65,
St Andrews, 16-29 August 2009:
lecture 1,
lecture 2,
lecture 3
 Two lectures on Bayesian methods given at the DESY
Statistics
School (part of the Helmholtz Alliance Physics at the Terascale initiative),
29 September - 2 October 2008:
lecture 1 (ppt,pdf),
lecture 2 (ppt,pdf).
 The materials for my lectures on
advanced statistical methods for data
analysis (multivariate methods) for the University of Mainz
(Klausurtagung des GK "Eichtheorien - experimentelle Tests...",
Bullay/Mosel, 15-17 September, 2008). These are updated versions of
my lectures on multivariate statistical
methods in particle physics (CERN Academic Training Lectures,
16-19 June, 2008).
 The statistics lectures from the CERN Summer Student Lecture programme are
here.
 Bayesian statistics
at the LHC (and elsewhere), Cavendish-DAMTP HEP phenomenology seminar,
Cambridge, 7 March 2008.
 Lectures on Statistics at the CERN-FNAL Hadron Collider Physics School:
lecture 1 and
lecture 2,
CERN, 6 and 8 June, 2007.
 My
talk on Bayesian statistics at
the Rencontres de Moriond (QCD), La Thuile, 18 March, 2007.
 The small-n problem in High
Energy Physics, talk at Statistical Challenges in Modern
Astronomy IV, Penn State Center for Astrostatistics, 12-15 June, 2006.
 Bayesian statistical methods
for parton analyses, talk at DIS2006, Tsukuba, 22 April, 2006.
 RHUL HEP group seminar 22 March, 2006 on
nuisance parameters and systematic errors.
 Nuisance parameters
and systematic errors from the
IoP Half Day Meeting on Statistics in HEP,
Manchester 17 November 2005 (pedagogical
summaries of talks at
PHYSTAT05).
 Here is a paper on unfolding I wrote for the
2002 Durham Statistics Conference
(ps,
pdf).
Archives: The archived course page for the
2003 lectures. Materials from the
2003 data analysis tutorial can be found
here.
Information on computing setup: Some info on how to log into
the RHUL particle physics Linux machine linappserv1 from the teaching
lab or your own computer is
available here.
Once you have your account on linappserv1 you can connect from any other
networked Linux machine with
ssh -X username@linappserv1.pp.rhul.ac.uk
where for "username" you substitute your login name, and then enter
your password. You will have been given information on computer
security and on how to change your password. It is your
responsibility to read and follow these rules.
Your default shell is bash. Your account should have in the
home directory a file called .bash_profile (check this with
ls -la). If it isn't there, you can copy this
.bash_profile to your
home directory. This defines certain aliases and environment
variables automatically when you log in. In particular, it
defines the environment variable ROOTSYS, which you need for
the ROOT programs we will use.
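If you want a quick check that your login environment is set up as described above, something like the following can be pasted into your shell. This is only a sketch: the function name check_root_env is made up for illustration and is not part of the course materials, and it only tests that .bash_profile exists and that ROOTSYS is non-empty, not that ROOT itself works.

```shell
# Hypothetical helper (not part of the course materials): report whether
# the login environment described above looks ready for ROOT.
check_root_env() {
    status=0
    # The home directory should contain .bash_profile
    if [ -f "$HOME/.bash_profile" ]; then
        echo ".bash_profile found"
    else
        echo ".bash_profile missing"
        status=1
    fi
    # .bash_profile is expected to define ROOTSYS at login
    if [ -n "${ROOTSYS:-}" ]; then
        echo "ROOTSYS is set to $ROOTSYS"
    else
        echo "ROOTSYS is not set"
        status=1
    fi
    return $status
}

check_root_env || echo "environment needs attention"
```

If ROOTSYS is reported as not set even though .bash_profile is present, logging out and back in (so that the file is read again) is the first thing to try.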
You can also copy to your home directory the file
.emacs, which will
set some defaults for the emacs editor.
Glen Cowan