Note, using the examples in Tables 2 and 4, that the non-reference standard did not correctly classify all 220 subjects. The non-reference standard classified 44 subjects as positive and 176 as negative (Table 4). In truth, according to Table 2, 51 subjects have the condition of interest and 169 do not. Because the non-reference standard is sometimes wrong, you cannot calculate unbiased estimates of sensitivity and specificity from Table 4; instead, you can calculate agreement. Gart, J.J., Buck, A.A. (1966). Comparison of a screening test and a reference test in epidemiologic studies. II. A probabilistic model for the comparison of diagnostic tests. American Journal of Epidemiology, 83, 593-602. When a new test is evaluated against a non-reference standard, discrepancies (disagreements) between the two methods can arise from errors in the new test or from errors in the non-reference standard. Because the non-reference standard may be wrong, sensitivity and specificity calculated against it are statistically biased.
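To make the source of the bias concrete, the following is a minimal sketch of what a new test's sensitivity and specificity would be expected to look like when it is scored against an imperfect comparator. It assumes that the two methods err independently given the true condition status, which is a simplifying assumption and may not hold in practice. The function name `apparent_accuracy` and all accuracy values are hypothetical; only the prevalence of 51/220 comes from the text above.

```python
def apparent_accuracy(prevalence, se_new, sp_new, se_ref, sp_ref):
    """Expected (apparent) sensitivity and specificity of a new test when
    it is scored against an imperfect comparator.

    Assumption: given the true condition status, the new test and the
    comparator make their errors independently of each other.
    """
    p = prevalence
    q = 1.0 - prevalence

    # Probability that both methods are positive / both are negative.
    both_pos = p * se_new * se_ref + q * (1 - sp_new) * (1 - sp_ref)
    both_neg = p * (1 - se_new) * (1 - se_ref) + q * sp_new * sp_ref

    # Probability that the comparator alone is positive / negative.
    ref_pos = p * se_ref + q * (1 - sp_ref)
    ref_neg = 1.0 - ref_pos

    return both_pos / ref_pos, both_neg / ref_neg


# Prevalence from the text (51 of 220 subjects have the condition).
# All four accuracy values below are hypothetical.
app_se, app_sp = apparent_accuracy(
    prevalence=51 / 220,
    se_new=0.90, sp_new=0.95,   # assumed true accuracy of the new test
    se_ref=0.80, sp_ref=0.98,   # assumed accuracy of the non-reference standard
)
print(f"apparent sensitivity: {app_se:.3f} (assumed true value 0.90)")
print(f"apparent specificity: {app_sp:.3f} (assumed true value 0.95)")
```

With these hypothetical inputs both apparent values fall below the assumed true values. Under other assumptions, for example when the two methods tend to make the same errors, the bias can run in the opposite direction.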

A practice called discrepant resolution has been proposed to get around this bias problem. Sensitivity and specificity estimates (and other estimates of diagnostic performance) can be subject to bias. Biased estimates are systematically too high or too low. Biased sensitivity and specificity estimates do not, on average, equal the true sensitivity and specificity. Often, the existence, size (magnitude), and direction of the bias cannot be determined. Bias creates inaccurate estimates.

Kalantri et al. examined the accuracy and reliability of pallor as a tool for detecting anemia. [5] They concluded that clinical assessment of pallor can rule out, and modestly rule in, severe anemia. However, inter-observer agreement on the detection of pallor was very poor (kappa values of -0.07 for conjunctival pallor and 0.20 for tongue pallor), which means that pallor is an unreliable sign for diagnosing anemia.

From Table 3, you can calculate several different statistical measures of agreement. A discussion by M.M. Shoukri of the different types of agreement measures appears in the Encyclopedia of Biostatistics (1998). Two commonly used measures are the overall percent agreement and (Cohen's) kappa.
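As an illustration, the two measures named above can be computed from a 2x2 table as in the sketch below. The function name `agreement_measures` and the four cell counts are assumptions made for illustration; the counts were chosen so that the non-reference standard totals 44 positives and 176 negatives out of 220 subjects, and they should be replaced with the actual cells of the table being analyzed.

```python
def agreement_measures(a, b, c, d):
    """Overall percent agreement and Cohen's kappa for a 2x2 table.

    a: new test positive, comparator positive
    b: new test positive, comparator negative
    c: new test negative, comparator positive
    d: new test negative, comparator negative
    """
    n = a + b + c + d

    # Observed agreement: proportion of subjects on which both methods agree.
    observed = (a + d) / n

    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)

    # Kappa: agreement beyond chance, rescaled so that 1 is perfect agreement.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative cell counts (assumed, not confirmed by the text).
overall, kappa = agreement_measures(a=40, b=19, c=4, d=157)
print(f"overall percent agreement: {100 * overall:.1f}%")
print(f"Cohen's kappa: {kappa:.2f}")
```

Overall percent agreement does not account for agreement that would occur by chance alone, whereas kappa does, which is why the two can give quite different impressions of the same table.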

