Comparison of Predictive Validity of Alvarado Score and Appendicitis Inflammatory Response (AIR) Score: A Hospital-Based Observational Study

Introduction: Various new risk stratification scores have been proposed to diagnose appendicitis accurately. Given the limited number of studies comparing these new scores with the existing ones, the current study compared the validity and reliability of the Alvarado score and the AIR score in the diagnosis of appendicitis in a tertiary care teaching hospital. Materials & Methods: The current study was a prospective observational study conducted in a tertiary care teaching hospital in south India between July 2015 and August 2016, for a 12-month period. A total of 297 eligible subjects were included. For each patient, the Alvarado score and the AIR score were calculated and compared with histopathological evaluation. Results: The predictive validity of the Alvarado score, as assessed by area under the ROC curve, was 0.74 (95% CI 0.62 to 0.85), as compared to 0.95 (95% CI 0.92 to 0.98) for the AIR score. The sensitivity of the AIR score was 95.7%, as compared to 87.3% for the Alvarado score. The AIR score had a specificity of 90.5%, as compared to 52.4% for the Alvarado score. Correspondingly, both the false positive (47.6% vs. 9.5%) and false negative (12.7% vs. 4.3%) rates were higher for the Alvarado score. The positive and negative predictive values of the Alvarado score were 96% and 23.9%, as compared to 99.2% and 61.3% for the AIR score. The overall diagnostic accuracy of the Alvarado score was 85%, as compared to 95% for the AIR score. Conclusions: The newly proposed appendicitis inflammatory response score displayed better validity and reliability than the Alvarado score.


Introduction
Appendicitis, even though it is one of the conditions most commonly treated by surgical intervention, can still pose a diagnostic dilemma to the surgeon [1]. Many past studies have reported varying negative appendectomy rates [2].
Negative appendectomy rates have been reported to come down drastically with the introduction of ultrasonography initially and computed tomography (CT) later [3][4][5][6]. But in resource-poor settings there is still heavy reliance on clinical judgment, as the availability and quality of ultrasonography are quite variable. Performing routine CT may be neither advisable nor feasible in these settings, considering its availability, cost and risk of radiation [1,7].
The Alvarado score, which was proposed in 1986, has been one of the most widely used and evaluated scoring systems [8]. Various new scores have been proposed in recent times, which have claimed better validity and reliability [11,13,14].
The appendicitis inflammatory response (AIR) score is one such score, proposed by Andersson, M. et al. in 2008 [9], which has claimed much superior performance compared to the Alvarado score [3,15,16,17].
Studies comparing the two scores in Indian subjects are limited. Hence the current study was planned with the objective of comparing the validity and reliability of the Alvarado score and the AIR score in the diagnosis of appendicitis in a tertiary care teaching hospital.

Materials & Methods
Study design: The current study was a prospective observational study.
Study setting: The study was conducted in a tertiary care teaching hospital in south India.
Study duration: Data collection for the study was done between July 2015 and August 2016, for a 12-month period.

Study population:
The study population included all subjects presenting to the emergency department with symptoms suggestive of acute appendicitis who underwent appendectomy after the necessary evaluation.

Inclusion & exclusion criteria:
The study included people aged above 15 years, of either gender. Patients whose condition was critical and subjects with a past history of appendectomy were excluded from the study.

Sample size and sampling method:
The study included all 297 eligible patients who satisfied the inclusion criteria and were willing to provide informed written consent; hence no sampling was done.
Ethical issues: The study was approved by the institutional human ethics committee. Informed written consent was obtained from all the study participants, after explaining the risks and benefits involved in the study.
Confidentiality of the study participants was maintained throughout the study.
Study procedure: All eligible subjects were evaluated by clinical examination, appropriate laboratory investigations and ultrasonography. For each patient, the Alvarado score [8] and the AIR score [9] were calculated. Patients were categorized as high or low risk as per the suggested cut-off values of the two risk scoring systems.
The association between the scores and the histopathological examination (HPE) findings was assessed by cross-tabulation and the chi-square test.
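The chi-square test on a 2x2 cross-tabulation (risk category against HPE result) can be sketched as follows. The study's analysis was done in SPSS; this Python version is only an illustration, the function name `chi_square_2x2` is hypothetical, and the example counts are invented for demonstration and are not study data.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: score category (high risk, low risk).
    Columns: HPE result (appendicitis, no appendicitis).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, NOT study data
chi2, p_value = chi_square_2x2(40, 5, 10, 15)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

When expected cell counts are small, Fisher's exact test is usually preferred over this approximation.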
The predictive validity of the Alvarado score and the AIR score was assessed by ROC analysis. The area under the ROC curve, along with its 95% CI and P value, was presented.
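As a minimal illustration of what the area under the ROC curve measures, the sketch below computes it as the probability that a randomly chosen patient with appendicitis has a higher score than a randomly chosen patient without it, with ties counted as half. The function name `roc_auc` and the score lists are hypothetical and are not study data; the study itself used SPSS.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC by pairwise comparison (Mann-Whitney U divided by n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical risk scores, NOT study data
hpe_positive = [9, 8, 7, 7, 5]
hpe_negative = [6, 5, 4, 3]
auc = roc_auc(hpe_positive, hpe_negative)
print(f"AUC = {auc:.3f}")
```

In this toy example, 18.5 of the 20 positive-negative pairs are ordered correctly, giving an AUC of 0.925. An AUC of 0.5 corresponds to a score that discriminates no better than chance.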
The sensitivity, specificity, predictive values and diagnostic accuracy of both risk stratification scores against the HPE findings (gold standard) were calculated and compared. IBM SPSS statistical software version 22 was used for statistical analysis [18].

International Journal of Surgery & Orthopedics
Available online at: www.surgicalreview.in

Results
A total of 297 subjects were included in the final analysis. The majority of the study subjects belonged to 21 to 0 years of age. [...] were confirmed as appendicitis by HPE (Table 1). There was a statistically significant association between the Alvarado score and AIR score categories and the HPE diagnosis of appendicitis (Table 2).


Figure-1: ROC analysis to assess the predictive validity of Alvarado and AIR scores
The sensitivity of the AIR score was 95.7%, as compared to 87.3% for the Alvarado score. The AIR score had a specificity of 90.5%, as compared to 52.4% for the Alvarado score. Correspondingly, both the false positive (47.6% vs. 9.5%) and false negative (12.7% vs. 4.3%) rates were higher for the Alvarado score. The positive and negative predictive values of the Alvarado score were 96% and 23.9%, as compared to 99.2% and 61.3% for the AIR score. The overall diagnostic accuracy of the Alvarado score was 85%, as compared to 95% for the AIR score (Table 4).
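These measures all follow from a 2x2 table of score category against HPE, and the sketch below shows the arithmetic. The cell counts are not stated in this text: they are an assumption, back-calculated so that they sum to 297 subjects and reproduce the reported percentages after rounding. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Validity measures for a 2x2 table against a gold standard (here, HPE)."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all diseased
        "specificity": tn / (tn + fp),   # true negatives among all non-diseased
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,   # overall proportion correctly classified
    }

# ASSUMED counts, reconstructed from the reported percentages (not from the tables)
alvarado = diagnostic_metrics(tp=241, fn=35, fp=10, tn=11)
air = diagnostic_metrics(tp=264, fn=12, fp=2, tn=19)

for name, metrics in (("Alvarado", alvarado), ("AIR", air)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

The false positive rate is 1 minus specificity and the false negative rate is 1 minus sensitivity, which is how the 47.6% vs. 9.5% and 12.7% vs. 4.3% figures relate to the values above.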

The reliability of the risk scores, as measured by kappa statistic was considerably higher for AIR score (0.706), compared to Alvarado score (0.256), which was statistically significant (P value < 0.001). (Table 5)
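Cohen's kappa compares the observed agreement between the score category and the HPE diagnosis with the agreement expected by chance. The sketch below is illustrative only: the function name `cohen_kappa` is hypothetical, and the cell counts are an assumption, back-calculated from the percentages reported in this article rather than taken from its tables.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for agreement between a binary score category and HPE."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)

# ASSUMED counts, reconstructed from the reported percentages (not from the tables)
kappa_alvarado = cohen_kappa(tp=241, fn=35, fp=10, tn=11)
kappa_air = cohen_kappa(tp=264, fn=12, fp=2, tn=19)
print(f"Alvarado kappa = {kappa_alvarado:.3f}, AIR kappa = {kappa_air:.3f}")
```

Under these assumed counts, the kappa values round to the reported 0.256 and 0.706, which suggests that the statistic was computed between each score's risk category and the HPE result.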

Discussion
Considering the non-availability of advanced investigations like CT, risk stratification scores are valuable tools for reducing the diagnostic dilemma in acute appendicitis in resource-poor settings [1]. But concerns regarding poor validity and reliability, and the resulting negative appendectomy rates, have prevented their widespread use in clinical practice [16].
With the advent of many new scoring systems that have claimed superiority over existing scores, it is imperative to test this claim in different population subgroups before recommending their use in routine practice [10,11,19,20,21,22]. The current study compared the validity and reliability of the newly introduced AIR score with those of the Alvarado score. In the current study, the predictive validity of the Alvarado score, as assessed by area under the ROC curve, was 0.74 (95% CI 0.62 to 0.85), as compared to 0.95 (95% CI 0.92 to 0.98) for the AIR score.
The reliability of the risk scores, as measured by the kappa statistic, was considerably higher for the AIR score (0.706) than for the Alvarado score (0.256), which was statistically significant (P value < 0.001). De Castro, S. M., et al. [15] have reported an AUC of 0.96 for the AIR score and 0.82 for the Alvarado score (p < 0.05). Macco, S., et al. [12] have reported an area under the receiver operating characteristic curve of 0.90 for the AIR score and 0.87 for the Alvarado score.
Andersson, M. and R. E. Andersson [9], while proposing the AIR score, reported an ROC area of 0.97 for advanced appendicitis and 0.93 for all appendicitis. The Alvarado score had an ROC area of 0.92 and 0.88 for advanced and all appendicitis, respectively.
The sensitivity of the AIR score was 95.7%, as compared to 87.3% for the Alvarado score. The AIR score had a specificity of 90.5%, as compared to 52.4% for the Alvarado score. Correspondingly, both the false positive (47.6% vs. 9.5%) and false negative (12.7% vs. 4.3%) rates were higher for the Alvarado score. The positive and negative predictive values of the Alvarado score were 96% and 23.9%, as compared to 99.2% and 61.3% for the AIR score. The overall diagnostic accuracy of the Alvarado score was 85%, as compared to 95% for the AIR score.
In the study by Macco, S., et al. [12], the AIR score showed better specificity and positive predictive value than the Alvarado score. In the study by Andersson, M. and R. E. Andersson [9], "Sixty-three percent of the patients were classified into the low- or high-probability group with an accuracy of 97.2%, leaving 37% for further investigation. Seventy-three percent of the nonappendicitis patients, 67% of the advanced appendicitis, and 37% of all appendicitis patients were correctly classified into the low- and high-probability zone, respectively." In the study by De Castro, S. M., et al. [15], the AIR score was reported to outperform the Alvarado score in the diagnosis of appendicitis in difficult patient groups like women, children, and the elderly. Kollar, D., et al. [3] have reported substantially higher specificity (97%) and positive predictive value (88%) for the AIR score than for the Alvarado score (76% and 65%, respectively).

Conclusions
1. The newly proposed appendicitis inflammatory response score displayed better validity and reliability than the Alvarado score.
2. Both negative appendectomy rates and missed cases of appendicitis will be reduced if the AIR score is used for treatment decisions in place of the Alvarado score.