Using Critical Incidents to Validate the Direct Measurement of Attribute Importance and Performance When Analyzing Services

Since its introduction into the marketing literature by Martilla and James, Importance-Performance Analysis has repeatedly proven to be a cost-effective technique for measuring the importance and performance of service attributes from the customer's perspective. Additionally, it gives managers valuable hints for improving their products and services. However, despite a long list of successful applications over time, one critical issue remains: the validity of importance values obtained by direct measurement. Despite the limitations and criticism associated with stated importance techniques, many research results show that direct methods are preferable to indirect measures. Some researchers suggest measuring the customers' priority structure to compensate for the weaknesses of direct questioning. This study shows how the critical incident technique can help validate such results.


Introduction
Customer satisfaction is the central concept in marketing policy. It can be seen as the engine of purchase volume and as the trigger for repurchases [1]. There is no consistent definition of the key elements of the customer satisfaction concept [2]. Churchill and Surprenant 1982 [1], for example, stated that customer satisfaction is the result of using or buying a product or service and is based on the customer's comparison of the benefits and costs of the product or service in relation to the expected consequences. Other researchers stated that success in selling products or services depends on their design and features. Therefore, the impact of single product or service attributes needs to be studied [3][4][5]. But detecting these decisive and important attributes is not enough, because researchers and practitioners need to know which attributes determine the purchase decision [6]. Therefore, this paper uses the definition by Myers and Alpert from 1968 [7], who defined customer satisfaction as a function of expectations concerning important attributes of products and services and the evaluation of the compliance with these expectations, in other words, the attribute satisfaction or performance.
In 1977, Martilla and James [8] introduced the so-called Importance-Performance Analysis (IPA) into the marketing literature as a tool to measure attribute importance and performance. IPA is a cost-effective technique that has been in practical use for more than 35 years and is widely accepted [9]. The idea behind this technique is to develop a comprehensive list of attributes that define a product or service and to ask a sample of customers to rate these attributes on an importance scale and a performance scale. The results can be displayed on a two-dimensional grid where the attributes are positioned according to their average importance on the y-axis and their average performance on the x-axis. Using averages on both axes, the grid can be divided into four quadrants (from low importance/low performance to high importance/high performance) and, according to their position, the attributes are assigned one of four normative strategies and recommendations for action [8]. The advantages can be seen in its simplicity [8,10] and, as Lovelock, Patterson and Walker [11] suggested, in the fact that it shows the areas in which investments in improving performance have the greatest impact on customer satisfaction. Applications can be found for objects like travel quest [12], students' choice of universities [13], meeting destinations in China [14], the higher education sector [15], adult education programs [16], Tanzanian national parks [17], service quality [18] and hospitals [19]. But despite the intensive usage of IPA in service theory and practice, one critical issue within the technique has not been sufficiently answered so far: the question of whether the direct measurement of attribute importance is valid [6,9,10,20].
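The quadrant assignment described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the attribute names and values are hypothetical, the crosshairs are placed at the grand means of the ratings, higher values are assumed to mean "more important" and "better performance", and the quadrant labels follow the naming usually associated with the Martilla and James grid.

```python
from statistics import mean

def classify_ipa(ratings):
    """Assign each attribute to an IPA quadrant.

    ratings maps attribute -> (mean importance, mean performance).
    Assumption: higher numbers mean more important / better performing.
    """
    imp_cut = mean(imp for imp, _ in ratings.values())     # y-axis crosshair
    perf_cut = mean(perf for _, perf in ratings.values())  # x-axis crosshair
    quadrants = {}
    for name, (imp, perf) in ratings.items():
        if imp >= imp_cut:
            quadrants[name] = ("keep up the good work" if perf >= perf_cut
                               else "concentrate here")
        else:
            quadrants[name] = ("possible overkill" if perf >= perf_cut
                               else "low priority")
    return quadrants

# Hypothetical attribute means, not data from the studies reported here.
example = {
    "cleanliness": (4.6, 4.2),
    "staff attention": (4.4, 3.1),
    "parking": (3.2, 2.9),
    "shop": (2.8, 4.0),
}
result = classify_ipa(example)
for attribute, quadrant in result.items():
    print(f"{attribute}: {quadrant}")
```

If a scale is coded in the opposite direction (for example 1 = "very important"), the comparisons would have to be reversed or the ratings recoded first.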

Direct Importance Measurement-Problems and Advantages
The majority of IPA studies use direct methods when measuring the importance of product or service attributes [21]. The techniques used in research and practice can be seen in Figure 1.
Other researchers prefer indirect measurement of importance ratings [22]. In these studies the respondents are not asked directly for their purchase criteria. The importance ratings are gathered by qualitative research techniques and statistical methods like discriminant analysis, multiple regression, or conjoint analysis [6,10,23]. Researchers use these techniques to avoid the problems that come with direct measurement methods.
As Gustafsson and Johnson [20] suggest, in direct questionnaires the respondents need to know what is meant by importance, and they need to be clear about their own preference structure. Thus, one problem stated by Azzopardi and Nash [22] is the lack of predictive validity of direct measurement. This lack of validity stems from the multidimensional nature of the concept of importance [21,24,25], as can be seen in the paper by Myers and Alpert [7].
As Jaccard, Brinberg and Ackerman [4] suggest, the concept of importance within the customer's decision consists of five different dimensions that need to be considered during measurement. This means that the researcher should be aware of which importance dimension is to be measured and which technique is best suited to that dimension [25]. A problematic trend is that in many IPA studies the importance ratings turn out extremely high, with the consequence that the attributes are positioned in the two upper quadrants of the IPA grid [9,10,22]. One reason is that the importance ratings are measured directly and people tend to present themselves in the best possible way [26]: the majority will not reveal things that are not socially desirable [27]. Thus the danger of creating an unrealistic picture of the importance of product or service attributes because of social desirability is higher in direct questionnaires [20,26,28]. Another source of the problem lies in the recommendation by Martilla and James [8] that the first step in conducting an IPA should be to determine the aspects most decisive for the customer's choice of a product or service by using focus groups or personal interviews [8,9]. According to Wade and Eagle [17], high importance ratings for all attributes are not surprising, because the most important aspects have already been selected beforehand. Other problems can be the respondents' unfamiliarity with the service or product of interest [29] or the participants' mental overload when questionnaires are too long or too complex [10,30].
Nevertheless, IPA was not designed for an absolute measurement of importance and performance [8,9,24], and research in this field shows that direct measurement is practicable. As Bottomley, Doyle and Green [31] note, direct measurement is preferred by respondents and should be used for that reason rather than indirect methods. Additionally, they found that direct measurement results are more reliable with respect to the estimated weights and more stable in a test-retest situation. Bacon [10] stated that the underlying assumption of IPA cannot be met with statistical indirect methods. Alpert [6] also identified direct questioning as the more effective predictive model.
Returning to the problem that the importance ratings lose their informative value when they are concentrated in only one area of the Importance-Performance grid, priorities should be measured simultaneously. This would help to identify which attributes most urgently need to be improved [32]. Bacon [10] recommends validating the results of an IPA with other direct questioning methods to reveal these priority structures.
To avoid the problems discussed for direct questioning, an indirect measurement was used here to validate the results: the Critical Incident Technique.

The Critical Incident Technique (CIT)
The Critical Incident Technique is a qualitative analysis technique [33,34] that allows a stepwise analysis of complex human action in special situations [35]. It was introduced into the psychological literature in 1954 by John C. Flanagan, who developed this method for evaluating effective and ineffective patterns in the workflow [36] within the US Air Force [37]. The popularity of CIT for marketing-related research questions was stimulated by a paper by Mary Jo Bitner, Bernard H. Booms and Mary Stanfield Tetreault [36], who used CIT to analyze critical incidents in different service industries with a focus on employee-customer contact situations [38]. Following that study, more than 130 papers using CIT had been published in the marketing literature by 2003 [36]. For the period from 2004 to 2013, we found an additional 71 papers in the marketing literature. Applications can be found in various research fields, for example in health care [39,40], restaurants [41], education [42], job behavior [43] and tourism [44,45].
As Flanagan [37] explained, a CIT study should consist of five main steps, as can be seen in Figure 2, ranging from the problem definition through the data collection to the analysis and interpretation of the results. The main idea is to ask a sample of respondents to name important aspects that they liked, and did not like, during the service production. By categorizing these so-called critical incidents and counting them across the sample, the analyst obtains a list of important categories/attributes and, via the ratio of positive to negative comments for each attribute, a performance evaluation.
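The counting step can be illustrated with a short sketch. It assumes the incidents have already been coded into categories by the analyst; the category names and data are hypothetical, and using the share of positive comments as the performance indicator is only one possible reading of "the ratio of positive to negative comments".

```python
from collections import Counter

def tally_incidents(incidents):
    """Count coded critical incidents per category.

    incidents is an iterable of (category, valence) pairs, where valence
    is "positive" or "negative". Returns rows of
    (category, total mentions, share of positive mentions),
    sorted by total mentions in descending order.
    """
    counts = {}
    for category, valence in incidents:
        counts.setdefault(category, Counter())[valence] += 1
    rows = []
    for category, c in counts.items():
        total = c["positive"] + c["negative"]
        rows.append((category, total, c["positive"] / total))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Hypothetical coded incidents, not data from the studies reported here.
coded = [
    ("staff", "positive"), ("staff", "negative"), ("staff", "negative"),
    ("cleanliness", "positive"), ("cleanliness", "positive"),
    ("noise", "negative"),
]
summary = tally_incidents(coded)
for category, total, positive_share in summary:
    print(f"{category}: {total} mentions, {positive_share:.0%} positive")
```

The number of mentions serves as the importance indicator, while the positive share gives a rough performance indicator for each category.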
Despite the popularity of the method, some critical points have been discussed. Chell [33] doubts its validity and reliability, but Ronan and Latham [46], for example, used different measures of reliability and validity and found satisfactory results. The interpretation of the results and the analysis were criticized, for example, by Edvardson [47]. However, Anderson and Nilsson [48] studied the same aspects with special attention to the formulation of categories and found satisfactory results. Another problem can be the influence of the interviewer or misunderstood questions. Hence, Flanagan [37] suggests formulating the questions as precisely as possible and not commenting on the respondents' answers.
Despite the criticisms presented, only a few adaptations of the technique have been developed. Stauss and Weinlich [49], for example, devised the Sequential Incident Technique, which determines all incidents in the service process using the CIT.
Keaveney [50] developed the Switching Path Analysis Technique, a method that studies the negative critical incidents which lead to switching behavior.
The Criticality Critical Incident Technique (CCIT) presented by Edvardson and Roos [51] should also be mentioned. The researchers developed this method on the grounds that the CIT and the two adaptations mentioned above depend on incidents remembered by the respondents.
Finally, the positive aspects of the CIT are significant. The method is flexible [37] and therefore applicable to many study objectives, as the research examples mentioned above show. It gives a comprehensive view of the customers' perception [47] and shows how they really think [52]. Further advantages are described by Gremler [36]. For this study, two statements are particularly relevant because they compensate for the problems discussed in the section on direct questioning. Bitner, Booms and Tetreault [38] note: "Hence, not all service incidents were classified, only those that customers found memorable because they were particularly satisfying or dissatisfying. Examining such memorable critical incidents is likely to afford insight into the fundamentally necessary factors leading to customers' dis/satisfactory evaluations." Gremler [36] adds that "there is no a priori determination of what will be important."
Moreover, some researchers have examined the usefulness of incident-based methods in comparison with attribute-based methods and found some interesting aspects that also underline the relevance of this study.

Figure 2. Steps of a CIT study, following Flanagan [37].
Formulating aims
• Define with the help of experts the aims of the behavior of interest, the background and circumstances.
Planning
• Define the situation in which the respondents should be interviewed
• Define the respondents that should be interviewed
Data collection
• Decision for one type of questioning: personal interviews, checklists, group interviews
• Respondents report about their impressions and experiences according to the formulated question
Analysis
• Define the conception framework
• Build categories
• Formulate titles and summarize associated incidents

Open Access JSSM

Stauss and Hentschel [53] studied the applicability of attribute-based and incident-based methods for measuring service quality. They concluded that the two techniques lead to different results. Other researchers have discussed this problem as well. Stauss [54] and Stauss and Weinlich [49] presented the same problems. They analyzed the SERVQUAL method, a technique for determining the customer's perception of service quality [55], and criticized that such methods are not able to detect all the critical and decisive factors that influence the customer's purchase decision. Therefore, the CIT was proposed as an alternative measurement because of the advantages mentioned above. Matzler and Sauerwein [55] also discuss the problem that IPA does not distinguish between basic, performance, or excitement factors. Nor does IPA adequately address the respondents' interpretation of the importance of these factors. As a result, IPA can result in faulty marketing strategies.
They concentrated on the CIT as well and mentioned the same problem, but argued that most CIT studies were conducted to determine the factors that influence customer satisfaction. They conclude that it remains unclear whether the mentioned attributes, regardless of whether they are evaluated as important or not, really influence the satisfaction of the customer. However, according to Martilla and James [8], every IPA should determine the decisive importance and performance aspects and use customer or professional interviews during its development. It is therefore unlikely that factors which lead to dissatisfaction when they are not sufficiently met would remain undetected.

Research Instruments
Two studies were conducted: one in the sauna area of a giant indoor waterpark in Germany, using personal interviews with 100 randomly selected respondents, and another with 194 randomly selected visitors of a German Bundesliga soccer stadium, who were interviewed as they left the stadium. The questionnaires were developed on the basis of the relevant literature in this field and, as Martilla and James [8] specified, expert interviews were held. Furthermore, service blueprints were created for both services; blueprinting is a methodology for visualizing the complete service process [56] and for gaining a better understanding of its dynamics and critical points [57]. For the first questionnaire, 17 items were identified for the satisfaction part along with 14 items for the importance section. The scales ranged from 1 ("very satisfied") to 7 ("very dissatisfied") and from 1 ("very important") to 7 ("absolutely unimportant"). Seven-point scales have been used in other studies [12,58] as well. For the second study, 26 items were developed for both sections. These scales ranged from 1 ("unimportant") to 5 ("very important") and from 1 ("awfully bad") to 5 ("awfully well") [13,14,16,59]. In the first questionnaire the respondents answered the satisfaction section first and the importance statements afterwards, to avoid order effects as recommended by Martilla and James [8]. In the other questionnaire the respondents answered the importance question first and then the performance question for each attribute, in order to see whether there are differences. Asking first for importance and then for performance is in line with other studies [16,18]. After the importance and performance sections, the CIT was used. In the first study the visitors were asked: "What aspects or situations do you remember were very positive or very negative during your stay?" In the second study the respondents were asked to write down the first positive and the first negative aspect they remembered from their stay. As discussed in the next section, the first remembered incident is the most important one [60]. To avoid the interviewer bias mentioned above, the interviewer neither commented on any answer nor urged the respondents to answer if they could not remember any positive or negative aspect.

Data Analysis
As Figure 1 shows, direct importance can be measured with different methods; within IPA, however, means based on simple ratings [8] or Likert scales are used [12,17,61]. Therefore, means and significance values were computed for the IPA with SPSS, using the scales mentioned above. Researchers have discussed the layout of the IPA grid extensively and presented some new approaches. Slack [32] presented an alternative design of the quadrants: he developed a system that divides the grid diagonally for an improved understanding of the relationship between customers' behavior and their expectations. Another modification, the use of the dimensions "current effect on performance" and "scope of improvement" instead of performance vs. importance, comes from Easingwood and Arnott [62]. A further presentation of the grid can be seen in Abalo, Varela and Manzano [9], who used both the diagonal and the quadrant model. The problem is illustrated by Oh [24], who demonstrated that the results change when another type of scaling is used. However, the focus of this study is to demonstrate the use of an indirect measurement for validating the direct importance and performance measurement, not to discuss the exact grid. For that reason the traditional quadrant visualization is used, which is the method commonly presented in tourism studies [22], and both study subjects can be seen as branches of tourism. A further problem within the grid discussion is the design of the axes and their point of intersection. Martilla and James [8] suggested that the middle position of both axes is sound. Because the strength of IPA lies in identifying relative performance and importance evaluations, the mean values as well as the median can be used. Researchers can use the scale means, as in the study by Hawes and Rao [63], or the actual means from their data, see for example Alberty and Mihalik [16]. The results can be quite dissimilar, however, so the interpretation must be exact, because they influence the managers' decisions [24]. In this study the data-centered method, using the means of the importance and performance evaluations [10], is used.
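How strongly the choice of crosshairs can matter is easy to see in a small sketch. The numbers below are hypothetical, a 5-point scale with higher values meaning better performance is assumed, and the helper name `crosshairs` is introduced only for this illustration.

```python
from statistics import mean

def crosshairs(importance, performance, scale_min=1, scale_max=5):
    """Return the two common crosshair choices as (importance cut, performance cut)."""
    scale_mid = (scale_min + scale_max) / 2
    return {
        "scale-centered": (scale_mid, scale_mid),
        "data-centered": (mean(importance), mean(performance)),
    }

# Hypothetical attribute means; directly stated importance tends to be high.
importance = [4.6, 4.4, 4.1, 3.9]
performance = [4.2, 3.4, 3.8, 3.2]
cuts = crosshairs(importance, performance)

# The second attribute (importance 4.4, performance 3.4) changes sides
# of the performance crosshair depending on the method.
for method, (imp_cut, perf_cut) in cuts.items():
    side = "high" if performance[1] >= perf_cut else "low"
    print(f"{method}: performance cut {perf_cut:.2f} -> {side} performance")
```

With the scale midpoint, the second attribute counts as performing well; with the data means, the same attribute falls on the low-performance side and would call for managerial attention.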
Figure 2 shows the analyzed results of the CIT. The negative and positive aspects have been sorted in the order in which they were mentioned by each respondent. In the second step their occurrences have been counted. This technique is in line with the suggestion made by Swan and Rao [60]: "The importance of past events to people can be roughly estimated by assuming that the more important events will be recalled and mentioned before less important events. Since different numbers of C.I.s were mentioned by different respondents, whether or not one was mentioned first may be meaningful; the other positions have less meaning." As a consequence, the measured mean importance within IPA should be mirrored by the frequency and order of the aspects mentioned by the respondents.
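The first-mention count can be sketched as follows. The sketch assumes each respondent's incidents have already been coded into categories and kept in the order of mention; the categories and responses are hypothetical.

```python
from collections import Counter

def first_mention_ranking(responses):
    """Rank categories by how often they were mentioned first.

    responses is a list with one list per respondent, holding the coded
    categories in the order the respondent mentioned them. Respondents
    who reported no incident (empty list) are skipped.
    """
    first = Counter(mentions[0] for mentions in responses if mentions)
    return first.most_common()

# Hypothetical coded responses, not data from the studies reported here.
responses = [
    ["noise", "staff"],
    ["staff"],
    ["noise"],
    [],                      # respondent recalled no incident
    ["cleanliness", "noise"],
    ["noise", "cleanliness"],
]
ranking = first_mention_ranking(responses)
for category, count in ranking:
    print(f"{category}: mentioned first {count} time(s)")
```

The resulting order can then be set against the importance ranks obtained from the IPA.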

Results of the Demographics and the Attendants' Behavior
The first study consisted of 40% male and 60% female respondents. To make sure that all respondents were familiar with the sector and the service, and to improve the validity of the results, they were asked how often they visit a sauna. The majority visit such a service repeatedly during the year. Only 7% stated that they were using it for the first time. Therefore, the problem of inexperience as a factor for poor data, as mentioned by Gustafsson and Johnson [20], has been avoided. In the second study, 158 men and 36 women participated. This imbalance is in line with other studies [64]. To verify familiarity with the sector, the respondents were asked to state which kind of fan they consider themselves to be. From this, the intensity of attendance can be inferred [65]. The majority of the persons asked stated that they are enthusiastic, faithful fans. Just 8.76% of the respondents stated that they watch soccer games in stadiums only occasionally.

Results of the Importance-Performance Analysis
Tables 1 and 2 show the data-centered results in the first study and the second study concerning the means of the importance and performance of the defined items.

Results of the Critical Incident Technique
One of the critical aspects mentioned by Oh [24] is that the resulting Importance-Performance grid influences the managers' decisions concerning the modification of the worst performing items. For that reason, evidence for the importance structure is needed to make sure that the right aspects are improved. The method presented here is the CIT. The results, which have been analyzed according to Swan and Rao's [60] suggestions, can be seen for the first study in Table 3 and for the second study in Table 4 in the column "frequency mentioned at first". The incidents mentioned have been counted and sorted according to the order in which the respondents stated them. For the first study 181 critical incidents have been analyzed. In the second study 95 critical incidents have been reported. The relatively small number of reported incidents could be the result of the limitation explained in section 4.1.

Comparison of Both Results
In both studies the x-axis represents the mean values of the performance scale and the y-axis the mean values of the importance evaluations for each defined item. To build the four quadrants, the point of intersection in both studies represents the mean values over all items. For the first study the mean value of the performance ratings is 2.223 and for the y-axis it is 2.168. In the case of the second study the mean value for the x-axis is 3.428 and the point of intersection with the y-axis is 3.929.
The most important area is the "concentrate here" area, because the items situated there indicate that the enterprise fails to meet the customers' expectations. Because these aspects are quite important for the customer, an intensified effort should be made to improve the service there. For this reason, policy changes and strategy adaptations should concentrate on these factors [22]. To verify the urgency of the intervention and to be sure that the priorities of the customers are represented well by the Importance-Performance grid, the results were compared with the order and frequency of the critical incidents, as can be seen in Table 5. The data have been ordered by their relative ranks to see whether they are evaluated similarly.
For items 1 and 4 the results of both analyses are similar. There is just a difference of one rank up or down. The item "Attention of the service staff" was ranked as the fifth most important aspect in both analyses. Interventions to improve these factors should be prioritized according to their urgency within the ranking.
However, there is a difference for item 12. According to the IPA this aspect is the fifth most important attribute, on a level with item 7. In the critical incident analysis it is the second most important aspect. This study was conducted at the start of the winter holiday season in Germany. Many people visited the "saunapark" with their children, and therefore the normally quiet environment was disturbed. In this situation the capacities of the enterprise were exhausted and the visitors noticed it, which could be the reason for this result. As a consequence, the policy of the enterprise should be changed during the holiday seasons. This means that the capacity limit should be respected to keep the visitors satisfied.
In Figure 4, items 8, 9, 19 and 20 are positioned in the "concentrate here" area; therefore, special attention should be paid to these aspects.
To support the results mathematically as well, the Spearman rank correlation coefficient was computed, yielding a value of 0.706. This indicates a clear relationship between the two measurements and supports the hypothesis that CIT and IPA should be measured simultaneously to corroborate the results.
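As an illustration of this comparison, the Spearman coefficient between two rankings can be computed with the shortcut formula 1 - 6·Σd² / (n(n² - 1)). The sketch below assumes that neither ranking contains ties (the text does not state how ties were handled), and the rank vectors are hypothetical, not the values from the tables.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without tied ranks.

    Uses the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the rank difference per item. With tied ranks this
    shortcut is not exact and a Pearson correlation on the ranks
    should be used instead.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("at least two items are required")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks of five items: IPA importance rank vs. CIT rank.
ipa_ranks = [1, 2, 3, 4, 5]
cit_ranks = [2, 1, 3, 5, 4]
rho = spearman_rho(ipa_ranks, cit_ranks)
print(f"Spearman rho = {rho:.3f}")
```

A value close to 1 means that the two methods order the attributes almost identically; a value near 0 means the rankings are unrelated.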
In Table 4 the ranks of the importance evaluations are compared with the results of the critical incidents of the second study in the same way as for the first study.
Items 19 and 20 are evaluated similarly by both methods. Differences can be seen for aspects 8 and 9.
The "quality of the team's performance" is the most important attribute according to the CIT, but just the fifth most important one according to the Importance-Performance results. The "willingness of the team" is the second most important attribute in the IPA, but just the fourth most important in the CIT. The problem could be the differentiation and interpretation of both factors by the fans on the one side and by the researcher on the other. This problem is in line with other results [47]. Despite the differences, the results of the CIT support the items' positions in the Importance-Performance grid, because, independent of the ranking, both analyses filtered out the same 5 or 4 most important attributes. That means the management should invest in improving these attributes before giving attention to the other ones.
The Spearman rank correlation coefficient for this study is 0.517, a value which supports the result as well.

Conclusion
As Chrzan and Golovashkina [5] stated, the simple importance and performance rating is easy to handle, especially for the respondent, but on the other hand it is inferior with regard to its validity. One critical point is that the CIT is quite costly and the usability of the results depends on the respondents' participation, as shown in both studies. However, the presented technique showed that the CIT is a good instrument for testing the validity of results stemming from an IPA. As presented, for this methodology it does not matter whether importance and performance are measured simultaneously or in sequence. Further tests could examine whether the results still fit when the alternative grids for the IPA are used. Other methods like conjoint analysis could be used as a test of priorities as well.

Figure 1. Methods for direct importance measurement.

Figures 3 and 4 show the transferred data in the grid.

Figure 3. The importance-performance grid for the first study.

Figure 4. The importance-performance grid for the second study.