Consensus on Items and Quantities of Clinical Equipment Required to Deal With a Mass Casualties Big Bang Incident

A National Delphi Study

Edward A S Duncan; Keith Colver; Nadine Dougall; Kevin Swingler; John Stephenson; Purva Abhyankar


BMC Emerg Med. 2014;14(5) 



Participants were asked to consider what would be required to provide immediate patient care for 100 people in the pre-hospital phase of a big bang mass casualties incident. The study was based on current UK planning assumptions[1,5] for such events (Table 1). The figure of 100 people was chosen firstly because it is a conceptually straightforward number of casualties to work with, and secondly because it makes scaling simple: the quantities required for incidents of other sizes can be obtained by multiplying the study's results as required.

A modified Delphi study method was used. Originally developed by the RAND Corporation in the 1950s,[6] the Delphi method has since been used extensively in healthcare research,[7–11] including emergency care research,[12–17] amongst other fields. Since its inception, many Delphi studies have varied slightly from the original RAND Corporation method, and it is therefore common to find studies described as 'modified Delphi studies', or as using a Delphi approach.[7] Delphi studies use a form of consensus methodology to develop a reliable consensus of a group of experts on a specific topic. The method involves a series of questionnaires, or 'rounds' (typically three), on a specific topic being completed by subject experts. These rounds are interspersed with controlled feedback, which includes each participant's own judgment and the overall group judgment for comparison. Participants are then given the opportunity to revise their judgment in the following round if they so desire. Participants' individual responses are unknown to the group.[18]

Given the variability of study methods that have been described as 'Delphi', it is important to outline the features that ensure the credibility of findings for this approach. These are: a clear rationale for using a Delphi method; the choice of participants that form the expert panel; transparency of the data collection procedures used; the choice of consensus level; and the means of dissemination.[19] A study reference group comprising a small number of key leaders in the field was formed to support the study. Its key tasks were to agree the study protocol, identify potential participants, and provide expert comment on the study findings.

An opinion on the status of this study was sought from the NHS Lothian Ethics Committee, which advised that, for the purposes of ethical approval, the study was classifiable as a service evaluation.[20] The Research Governance Committee of the Scottish Ambulance Service, the NHS Scotland Special Health Board for pre-hospital emergency care, granted Research and Development study approval. All data and participant information were stored securely in line with good research practice guidelines.


Participants were purposively selected according to the following criteria, which defined our 'expert participant':

  1. Individual clinical (paramedical or medical) experience of providing a professional pre-hospital emergency medical response to a mass casualties incident; or

  2. Responsibility in health emergency planning for mass casualties incidents, combined with a position of authority and influence within the sphere of health emergency planning and response.

Potential participants were identified through the study reference group and researchers' knowledge base. The researchers used a snowballing method of recruitment to increase the potential participant base by asking the initial group to identify other potential participants who met the inclusion criteria. Letters of invitation to participate in the study were sent to 141 individuals. The majority of people invited to participate were located in the UK, but a few (n = 7) were based in other countries with similar emergency response strategies.

People interested in participating in the study were asked to email the study research paramedic (KC) to note their interest. They were then provided with a unique password, log-in, and link to the study website. The password and log-in linked the individual in each round of data collection, and enabled them to exit and re-enter the study website in order to complete each round as their time allowed.

Data Collection

Data were collected using a purposively designed study website. This enabled the study to be carried out online via a web browser rather than relying on paper-based questionnaires. Although the website was developed specifically for this study, it was designed so that it could be reused in further Delphi studies with minimal adaptation. Individuals could not register and take part from the site alone: they needed the password and unique identifier sent to them by the research team. Inter-round data analysis was completed automatically, substantially reducing the administration normally required between rounds of a Delphi study.

Having logged in, the user was presented with the Delphi questionnaire and, if they had already started it, the values they had entered. Participants were prompted to save their responses as they progressed through the study and whenever they logged out of the website. This allowed a participant to return to the site and complete the questionnaire in more than one sitting. Electronic reminders were sent automatically two weeks after the commencement of each round, and again in the final stages to those individuals who had not yet completed the round. These reminders stated the final date by which the current round had to be completed. An a priori decision was made to limit the study to three rounds of data collection to minimize participant fatigue.[21]

The website was piloted for acceptability and usability by Scottish Ambulance Service Special Operations Response Team ambulance clinicians and emergency planning officers. Feedback from the pilot stage was positive, although individuals noted that the task was substantial due to the number of items included.

Round One. Items for round one (n = 232) were collated from the researchers' existing knowledge of current stock for mass casualties incidents in the UK. Because the list of items to rate was long, it was split into subsets according to purpose (i.e. items relating to Airway; Breathing; Circulation; Examination; Medicines; Splintage; Comfort; Control of Infection; Transport; Other), each with a separate tab on the web page. This made the questionnaire look less daunting and helped users find the item they had reached if they had saved partial progress and returned later.

Participants were asked to carry out two tasks for each listed item. Firstly, they were asked to rate the importance of each item on a scale of 1 to 5 (Very unimportant – 1; Quite unimportant – 2; Neither – 3; Quite important – 4; Very important – 5); secondly, they were asked to state how much of each item they believed would be required to treat 100 patients at the scene of a big bang mass casualties incident. Participants could instead click a button to declare that they had no opinion or knowledge of any given item; this also allowed an automatic check via the website that no items had been accidentally missed. The website displayed a progress bar and offered a facility to help participants find any items they had missed. Participants who had completed less than 100% of the questionnaire were automatically emailed a reminder before the end of each round. Participants were also able to add any clinical items (for inclusion in round two) which they felt were important but missing from the round one list.

Round Two. Participants were asked to review the aggregated findings for the previous round together with their previous individual ratings, as well as 16 unique items of clinical equipment added in round one. Participants were invited to reconsider their rating of importance and quantification for each item. As in the previous round, electronic reminders were sent out to all non-completing participants after two weeks.

Round Three. Participants were again asked to review the aggregated findings for the previous round together with their previous individual ratings, and were again invited to reconsider their rating and quantification for each item of equipment in respect of the results of round two. Electronic reminders were sent out after two weeks.

Data Handling

Delphi studies vary considerably in how they handle and analyze their data.[18] The computerization of the study allowed the data to be presented to participants in a novel and more meaningful way. Data from rounds two and three were presented to participants as a color histogram (or heat map), in which the depth of color indicated the frequency with which respondents in the previous round had chosen each rating. Figure 1 shows the frequency with which each of the five responses had been chosen in the previous round (dark being many, light being few). The grey circle shows the choice that the current participant made in the previous round, and the green circle shows the choice they have made in the current round (in round one each box was white, as no previous selection had been made). In this way, participants could easily see how their responses compared with the previous round's consensus and either confirm or update their response accordingly.

Figure 1.

An example from the website of a color histogram of previous responses.
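The frequency-to-shade mapping described above can be sketched as follows. This is a minimal illustration only: the study does not specify how frequencies were scaled to colors, so scaling each rating's count against the most-chosen rating is an assumption.

```python
from collections import Counter

def cell_shades(previous_ratings):
    """Color depth (0.0 = white, 1.0 = darkest) for each of the five
    rating boxes, proportional to how often that rating was chosen
    in the previous round (scaling choice is assumed, not specified)."""
    counts = Counter(previous_ratings)
    peak = max(counts.values())
    return [counts.get(rating, 0) / peak for rating in range(1, 6)]
```

For example, a panel that mostly chose rating 4 would produce a dark fourth box and paler neighbors, showing the emerging consensus at a glance.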

The second question required a numeric answer. As the user sample size in each round exceeded 30 (and the number of independent responses was therefore sufficient to assume that the central limit theorem held, with mean responses tending towards a normal distribution), we adopted a parametric approach in the iterative feedback to users between rounds. Feedback to the user was again given as a color but, in this case, the depth of color indicated the number of standard deviations between the user's response and the mean response (in other words, the z-score). As the scale for each answer differed, the normalized z-score provided a consistent measure of agreement for each question. Z-scores were calculated as

z = (x − μ) / σ

where x was the value for which the z-score was calculated, μ was the mean of the values from the previous round, and σ was the standard deviation of the values from the previous round. The z-score was translated into a color depth and shown around the input box for each item in the questionnaire. The mean value from the previous round, along with the participant's own response from the previous round, were also displayed on the questionnaire. An example of the quantity input box is given in Figure 2: the top box shows that the previous average quantity for this item was 73 and that this participant had entered 53; the light color indicates the difference. The bottom box shows an item on which the participant was in closer agreement in the previous round. The numbers in the boxes show the participant's updated responses for the current round.

Figure 2.

The quantity input box for two items as presented on the website.
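As a minimal sketch, the z-score feedback could be implemented as below. The cap of three standard deviations for the darkest shade is an assumption; the study does not state how z-scores were mapped to color depths.

```python
import statistics

def z_score(x, previous_round_values):
    """Normalized distance between a participant's quantity estimate
    and the group mean from the previous round."""
    mu = statistics.mean(previous_round_values)
    sigma = statistics.pstdev(previous_round_values)  # population SD
    return (x - mu) / sigma

def color_depth(z, max_z=3.0):
    """Map |z| onto a 0.0-1.0 shade, capped at max_z standard
    deviations (the cap is an assumed design choice)."""
    return min(abs(z), max_z) / max_z
```

A response equal to the previous round's mean would thus be shown with no shading, while increasingly divergent responses would be shaded more deeply.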


As the focus of this study was to determine which items should be included in a mass casualties response, it was important to identify which items gained consensus as being 'important' or 'very important' and, conversely, which were viewed as 'unimportant' or 'very unimportant'. An item was deemed 'important' or 'very important' if it had been rated four or five by at least 80% of respondents. Similarly, an item was deemed 'unimportant' or 'very unimportant' if it had been rated two or one by at least 80% of respondents.
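The 80% agreement rule above is straightforward to express in code; a minimal sketch:

```python
def consensus(ratings, threshold=0.80):
    """Classify an item from its list of 1-5 importance ratings,
    using the study's 80% agreement criterion."""
    n = len(ratings)
    if sum(r >= 4 for r in ratings) / n >= threshold:
        return "important"
    if sum(r <= 2 for r in ratings) / n >= threshold:
        return "unimportant"
    return "no consensus"
```

Note that items whose ratings cluster around the middle of the scale, or are split between the extremes, fall into neither category.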

Analysis Plan

Frequently in Delphi studies the mean value and standard deviation of ratings are presented. However, these are likely to be sub-optimal measures, as responses often form a skewed or multi-modal distribution. For example, if half the respondents in our study chose a score of 1 and half chose 5, reporting a mean of 3 would fail to show that the data had a bi-modal distribution. We therefore proposed the use of non-parametric approaches in the data analyses.
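The bi-modal example above can be made concrete: the mean reports a deceptively neutral panel, while the frequency distribution (and the percentage-agreement criterion used in this study) exposes the split.

```python
from collections import Counter
import statistics

# A perfectly split panel: half rate the item 1, half rate it 5.
ratings = [1] * 15 + [5] * 15

mean = statistics.mean(ratings)   # 3.0 -- suggests a neutral panel
counts = Counter(ratings)         # {1: 15, 5: 15} -- reveals the split
# Share rating the item 4 or 5: 0.5, well below the 80% threshold.
share_important = sum(r >= 4 for r in ratings) / len(ratings)
```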

Research Question 1: The total number of items that reached consensus as important (agreement by at least 80% of participants) would be summarized descriptively. Differences in consensus of item importance between rounds would be tested with the Wilcoxon Rank Sum Test for independent samples if the 20 additional items added to the list between rounds one and two achieved consensus; otherwise, the Wilcoxon Signed Rank Test for matched pairs was proposed.

Research Question 2: The recommended median quantities of items that reached consensus (agreement by at least 80% of participants) would be summarized descriptively. Differences in the median quantities of items between rounds would be tested with the Wilcoxon Signed Rank Test for matched pairs.
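As an illustrative sketch only (a real analysis would use a statistical package; tie correction and p-value computation are omitted here), the Wilcoxon signed-rank statistic for matched pairs ranks the non-zero between-round differences by absolute size and sums the ranks of the smaller-signed group:

```python
def signed_rank_statistic(round_a, round_b):
    """Wilcoxon signed-rank statistic W for matched pairs.
    Zero differences are dropped, as in the classic formulation;
    ties in |difference| are not corrected for in this sketch."""
    diffs = [b - a for a, b in zip(round_a, round_b) if a != b]
    ranked = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ranked, start=1) if d < 0)
    return min(w_plus, w_minus)
```

A small W indicates that revisions between rounds were almost all in one direction; W near its maximum indicates that upward and downward revisions were balanced.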