Glen Cowan
23.10.03

Postgraduate Workshop on Statistical Data Analysis

1 Introduction
---------------

In the data analysis workshop we'll be working through parts of an
analysis of the search for the Higgs boson in e+e- collisions.  This
parallels very closely the real search that took place at LEP up to
the end of its operation in late 2000.  The exercises will give us a
chance to look at Monte Carlo event generation, detector simulation,
and the use of test variables to select signal events in the presence
of background.

The Higgs production process we will consider is

    e+e- -> HZ   with H -> bbbar and Z -> qqbar

The main background to this is from

    e+e- -> ZZ   with both Zs decaying to qqbar

We will look at highly simplified Monte Carlo generators for these
processes, and we will also simulate the response of a typical
detector in a very simplified way.

The physics behind the event generators is described in

  - V. Barger et al., Phys. Rev. D 49 (1994) 79
    (copy on http://www.pp.rhul.ac.uk/~cowan/barger_higgs.pdf),
  - D. Bardin et al., hep-ph/9406340,
  - Mikaelian et al., Phys. Rev. D 19 (1979).

2 Workshop notes
-----------------

Here are some rough notes on what to do for the data analysis
workshop.

Log in to one of the Linux machines and copy the files (probably
easiest to copy the whole directory structure) from

    www.pp.rhul.ac.uk/~cowan/stat/tut03

to your area.  There are three subdirectories: toymc, evtanl and
sigmatot.

2.1 The "Toy" Monte Carlo (toymc)
---------------------------------

toymc contains a Monte Carlo generator for:

    e+e- -> HZ   with H -> bbbar and Z -> qqbar   (the signal process)

and

    e+e- -> ZZ   with both Zs decaying to qqbar   (a background process).

Both of these event types result in four jets of hadrons.

The program includes a simple routine that simulates the response of
a detector by smearing the momenta of the jets.  It also simulates
the tagging of jets that are initiated by long-lived quarks, i.e., b
or c.  The program generates for each jet a number called "btag".
This is the p-value for the hypothesis that all of the tracks in the
jet originate from the primary vertex.  For u, d, and s jets this is
uniformly distributed in [0,1], since the hypothesis is correct.  For
c jets it is somewhat peaked towards 0 and for b jets even more so,
since these contain long-lived mesons whose decay products originate
from a secondary vertex.  For B mesons with a momentum of 30 to 40
GeV, the mean decay length is several mm.

Type

    gmake

to build the program.  Run by typing

    ./toymc

and answer the questions.  Generate two files of Monte Carlo data
with, say, Ecm = 220 GeV, M_higgs = 115 GeV, with 1000 events each.

Look at the histograms in the output file.  These give the
distributions of the various decay angles.  Compare these to the
plots in the paper by Barger et al.  (These histograms are filled
before simulation of detector effects.)

You can hack into the program and investigate the effect of the
detector simulation.  Try, for example, turning it off entirely.

The file also contains an ntuple with the four-vectors of the four
jets as well as the btag values.  Look at the distributions of the
various quantities with PAW.  Try to explain the distributions of the
btag values for the four jets.

The distributions of the momentum components by themselves are not
very informative.  To see something more meaningful we need to form
pairs of jets and calculate their invariant masses.  In principle you
can do this in PAW but really this requires a more flexible
programming environment.  One possibility is ROOT, and you can
convert the hbook ntuple file to root format by typing, say,

    h2root hz.hbook hz.root

Another alternative is to read the hbook file in with a C++ or
FORTRAN program, unpack the ntuple one event at a time and do the
analysis there.  A simple program for this is evtanl.
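
As a starting point for forming jet pairs, here is a minimal C++
sketch of the dijet invariant mass, m^2 = (E1+E2)^2 - |p1+p2|^2, and
of the three ways to split four jets into two pairs.  The names
FourVector, invariantMass and dijetMasses are hypothetical, and the
component order (E, px, py, pz) is an assumption; this is not taken
from evtanl, so adapt it to however the ntuple variables are actually
unpacked.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical container for one jet (not the evtanl data structure).
// Assumed component order: energy, then momentum components, in GeV.
struct FourVector {
    double E, px, py, pz;
};

// Invariant mass of a two-jet system: m^2 = (E1+E2)^2 - |p1+p2|^2.
double invariantMass(const FourVector& a, const FourVector& b) {
    assert(a.E >= 0.0 && b.E >= 0.0);  // sanity check on the inputs
    const double E  = a.E  + b.E;
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    const double m2 = E * E - (px * px + py * py + pz * pz);
    // Momentum smearing can push m^2 slightly below zero; clamp it.
    return std::sqrt(std::max(0.0, m2));
}

// Four jets can be split into two pairs in three ways:
// (01|23), (02|13), (03|12).  Each entry holds the two dijet masses
// of one pairing.
std::array<std::pair<double, double>, 3>
dijetMasses(const std::array<FourVector, 4>& j) {
    std::array<std::pair<double, double>, 3> m;
    m[0] = std::make_pair(invariantMass(j[0], j[1]), invariantMass(j[2], j[3]));
    m[1] = std::make_pair(invariantMass(j[0], j[2]), invariantMass(j[1], j[3]));
    m[2] = std::make_pair(invariantMass(j[0], j[3]), invariantMass(j[1], j[2]));
    return m;
}
```

One possible use is to histogram, for each event, the pairing whose
masses look most like a Z plus a second resonance, and then to see how
the btag values of the jets help decide which pair is which.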

2.2 The Event Analysis Program evtanl
-------------------------------------

evtanl is a simple C++ program which reads in the ntuple, unpacks it
and makes the variables available to the user.  From the four-vectors,
for example, you can compute the invariant masses of two-jet pairs,
and try to figure out which pair came from the Z decay and which came
from the Higgs.  Of course it will help to use the b-tagging
information.

Try to figure out what variables provide the best discriminating
power between HZ and ZZ events.  Pick a set of selection criteria and
find out your efficiency for HZ (hopefully high) and ZZ (hopefully
low).

If you have time, try to construct a simple Fisher discriminant
function (see the course notes).  Try to produce histograms of the
test statistic for HZ and ZZ events.

2.3 Total cross sections
------------------------

The number of events n_i of type i that one will obtain for a given
integrated luminosity L is a Poisson random variable with a mean value
nu_i.  This is given by

    nu_i = sigma_i * efficiency_i * L

where sigma_i is the total cross section, which you will need for both
reactions, i.e., i = HZ and i = ZZ.  These depend on parameters like
the centre-of-mass energy and on the Higgs mass.

The directory sigmatot contains a simple program for computing total
cross sections: test_sigma_tot.cc.  There is a simple script for
compiling and linking it: test_sigma_tot.lnk.  Try to get this to give
you the cross sections for the Ecm and Higgs mass values that you
choose.

3 Putting it all together
-------------------------

At the end of the day what you want is to set the selection criteria
to maximize the expected limit that you would set on the Higgs mass.
A final statement would be of the form:

    "With so-and-so much integrated luminosity (say, 100 pb^-1), we
    would find so-and-so many Higgs events at different values of the
    Higgs mass, and the expected number of events from background
    processes is so-and-so many..."
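
The event counts in a statement of this kind follow from
nu_i = sigma_i * efficiency_i * L in Section 2.3.  A minimal C++
sketch of that arithmetic is given below.  The function names and the
choice of units (pb for the cross section, pb^-1 for the luminosity)
are assumptions made for illustration; the cross sections themselves
would come from test_sigma_tot and the efficiencies from your own
selection applied to the toymc samples.

```cpp
#include <cassert>

// Selection efficiency estimated from Monte Carlo: the fraction of
// generated events that pass the selection criteria.
double selectionEfficiency(long nPassed, long nGenerated) {
    assert(nGenerated > 0 && nPassed >= 0 && nPassed <= nGenerated);
    return static_cast<double>(nPassed) / static_cast<double>(nGenerated);
}

// Expected number of selected events: nu = sigma * efficiency * L.
// Assumed units: sigma in pb, L in pb^-1, so that nu is a pure number.
double expectedEvents(double sigmaPb, double efficiency, double lumiInvPb) {
    assert(sigmaPb >= 0.0 && lumiInvPb >= 0.0);
    assert(efficiency >= 0.0 && efficiency <= 1.0);
    return sigmaPb * efficiency * lumiInvPb;
}
```

For example, with purely made-up inputs (not output of test_sigma_tot),
400 of 1000 generated events passing the cuts gives an efficiency of
0.4, and expectedEvents(0.5, 0.4, 100.0) then corresponds to 20
expected events for a 0.5 pb cross section and 100 pb^-1.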

Alternatively, you could set up a "mock data challenge" where you
prepare a sample of data with Standard Model processes mixed together
with Higgs events at a certain Higgs mass.  If your analysis finds a
significant signal you would compute the p-value of the hypothesis
that there is only background, to see if it could be rejected.

These questions go beyond the scope of today's workshop but are
discussed at length in the many papers written by the LEP Higgs groups
(see e.g. their paper submitted to the Amsterdam conference ICHEP02).
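
To make the p-value of the background-only hypothesis concrete, here
is a minimal C++ sketch for the simplest case, a pure counting
experiment: the p-value is the Poisson probability of observing nObs
or more events when only the background mean b is expected.  Treating
the analysis as a single counting experiment is an assumption made
here for illustration, and the function name poissonPValue is
hypothetical; the LEP analyses cited above use more refined test
statistics.

```cpp
#include <cassert>
#include <cmath>

// p-value of the background-only hypothesis for a counting experiment:
//   p = P(n >= nObs | b) = 1 - sum_{k=0}^{nObs-1} exp(-b) b^k / k!
double poissonPValue(int nObs, double b) {
    assert(nObs >= 0 && b > 0.0);
    double term = std::exp(-b);  // Poisson probability P(k = 0)
    double cdf  = 0.0;           // accumulates P(k <= nObs - 1)
    for (int k = 0; k < nObs; ++k) {
        cdf  += term;
        term *= b / (k + 1);     // recurrence: P(k+1) = P(k) * b / (k+1)
    }
    return 1.0 - cdf;
}
```

For instance, poissonPValue(2, 1.0) is 1 - 2/e, about 0.264, so two
observed events on an expected background of one would not be
significant.  For very small p-values the subtraction from 1 loses
precision, and summing the upper tail directly would be the safer
choice.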