Saturday, May 5, 2007

Screening and Diagnostic Tests

INTRODUCTION
Technical assessment of screening tests (used for persons who are asymptomatic but who may have early disease or disease precursors) differs from the assessment of diagnostic tests (used for persons who have a specific indication of possible illness). Most aspects of the assessment of diagnostic tests also apply to the assessment of screening tests, but the differences can be important.

The first difference is that, with screening tests, the proportion of affected persons is likely to be small. Therefore, many or most positive results are false positive. This finding is not necessarily serious if the screening test procedure is included in a broader program that involves further study of each initially positive finding; evaluation should focus on the whole process rather than on the initial results. In contrast, with diagnostic tests, many patients have medical problems that require investigation; thus, more weight may be given to factors such as diagnostic precision and accuracy, and less weight may be given to the acceptability of the test to patients.
A second difference is that, with screening tests, questions are likely to arise about how and how much long-term outcomes improve. Early detection of disease is helpful only if early intervention is helpful. Early intervention is sometimes helpful (eg, in hypertension), but testing for early asymptomatic glaucoma has been widely abandoned because early detection may not affect the outcome.
A third difference between screening tests and diagnostic tests is cost. A program to screen millions of people to identify the small percentage who have early disease or its precursors cannot justify use of the financial resources that may be available to support diagnostic testing, especially when patients who have conditions requiring accurate diagnosis and relief already exist.
In addition, the arrangement of the sequence of steps in the medical investigation can vary substantially. Also, procedures for recruiting and scheduling of subjects and methods of quality control, record keeping, and follow-up may differ. These differences must be taken into account in the technical assessment of a test.

GLOSSARY OF TERMS
Sensitivity - Probability that a test or procedure result is positive when the disease is present; calculated as follows: number of true-positive results/(number of true-positive results + false-negative results)
Specificity - Probability that a test or procedure result is negative when disease is not present; calculated as follows: number of true-negative results/(number of true-negative results + false-positive results)
True-positive rate - Sensitivity (percentage)
False-positive rate - Probability that a test or procedure result is positive when the disease is not present; calculated as follows: 100% minus the specificity (percentage)
True-negative rate - Specificity (percentage)
False-negative rate - Probability that a test or procedure result is negative when the disease is present; calculated as follows: 100% minus the sensitivity (percentage)
Positive predictive value - Probability that the disease is present when the test or procedure result is positive; calculated as follows: number of true-positive results/(number of true-positive results + false-positive results)
Negative predictive value - Probability that the disease is not present when the test or procedure result is negative; calculated as follows: number of true-negative results/(number of true-negative results + false-negative results)
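These definitions can be illustrated with a short Python sketch; the 2-by-2 counts a, b, c, and d below are hypothetical values chosen only for illustration:

```python
# Hypothetical 2-by-2 counts (see Table 1):
# a = true positives, b = false positives,
# c = false negatives, d = true negatives.
a, b, c, d = 90, 45, 10, 855

sensitivity = a / (a + c)             # P(test positive | disease present)
specificity = d / (b + d)             # P(test negative | disease absent)
false_positive_rate = 1 - specificity
false_negative_rate = 1 - sensitivity
ppv = a / (a + b)                     # P(disease present | test positive)
npv = d / (c + d)                     # P(disease absent | test negative)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these counts the test is 90% sensitive and 95% specific, yet one third of positive results are false positives.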

PURPOSES OF SCREENING AND DIAGNOSTIC TESTS
Screening
Laboratory tests for screening are used in people who are asymptomatic to classify their likelihood of having a particular disease. The screening procedure is not the only basis for the diagnosis of illness. Patients with positive test results are referred for subsequent testing or examination to provide the physician with more information to determine if they have the disease in question.
Numerous attempts have been made to establish clear guidelines for the selection of appropriate patients for testing in the early detection of disease. A disease should be serious to warrant large-scale screening for it, and treatment before symptoms develop or deteriorate should be of more benefit in reducing morbidity and mortality than treatment later. The estimated prevalence of preclinical disease should be high in the population being screened. Once these criteria have been met, the issue is examined from the standpoint of laboratory tests.
An acceptable test is one that is highly accurate, ie, results are positive for almost all individuals with the disease, and the physician can be confident that the patient is actually free of the disease when test results are negative. Specificity is especially important when one is screening for rare diseases because a nonspecific test generates many false-positive results. The basic tenets of decision analysis indicate that a particular intervention is undertaken when benefits outweigh costs. Therefore, the ideal screening test is inexpensive, is easy to administer, poses little risk, and causes minimal discomfort for the patient. In addition, results of the screening test must be valid, reliable, and reproducible.

Diagnosis of disease
Diagnosis requires 2 essential steps. First, diagnostic hypotheses are established. The establishment of these hypotheses is followed by attempts to reduce the number of possible differential diagnoses by successively ruling out specific diseases. This process requires very sensitive tests. With such tests, negative results permit the physician to exclude a disease with confidence.
Second, a strong clinical suspicion is pursued. This process requires very specific tests. With such tests, abnormal findings should essentially confirm the presence of the disease. Also, the test should accurately reflect the physician's estimate of the likelihood of disease, which is based on assessment of the available clinical information. Use of a test to exclude or confirm a diagnosis should indicate that the physician's best estimate, made after careful evaluation of the patient's condition, is that the diagnosis in question is either unlikely or probable.

CHARACTERISTICS OF DIAGNOSTIC TESTS AND PROCEDURES
Tests or procedures are performed when the information from review of findings from the history, physical examination, or previous testing is considered inadequate to address the question at hand. Intelligent use of new information collected requires the physician to be aware of uncertainties associated with the test used.
Every laboratory test or diagnostic procedure has a set of characteristics that reflect the information that clinicians expect in patients with and in those without a given disease. These test characteristics lead to the following fundamental questions:

If the disease is present, what is the probability that the test result will be positive?

If the disease is absent, what is the probability that the test result will be negative?

Sensitivity and specificity are the 2 measures of validity of a test and can be displayed with a simple binary 2-by-2 table, as shown in Table 1.
Sensitivity is determined by identifying the proportion of patients with disease in whom the test result is positive, as follows: a/(a + c), where a is the number of true-positive results, and c is the number of false-negative results. As the sensitivity of a test increases, the number of persons with disease who have incorrect negative (ie, false-negative) results decreases.
Similarly, the specificity of a test is determined by identifying the proportion of patients without disease in whom the test result is negative, as follows: d/(b + d), where d is the number of true-negative results, and b is the number of false-positive results. A highly specific test rarely yields positive results in the absence of disease; therefore, only a small proportion of persons without disease have incorrect positive (ie, false-positive) results.
The ideal screening test is both highly sensitive and highly specific. Usually, the achievement of both is not possible, and a trade-off must be made between the sensitivity and specificity with a given test. With many clinical tests, some people have clearly normal results, some have clearly abnormal results, and some have intermediate results. In these situations, the cutoff point between normal and abnormal findings is arbitrary. Therefore, the result of any screening test can cause a case of disease to be missed (a sensitivity issue), or it can cause false-positive results in individuals without the disease (a specificity issue).
Altering the criteria for positive, or abnormal, findings influences the sensitivity and specificity of the test. The establishment of these criteria involves weighing the consequences of not detecting disease (false-negative cases) against the consequences of erroneously diagnosing disease in healthy persons (false-positive cases). Sensitivity may be increased at the expense of specificity when the penalty associated with missing a case is high, such as when the disease is serious and definitive treatment exists. On the other hand, specificity should be increased relative to sensitivity when the cost or risks associated with further diagnostic evaluations or mislabeling are substantial.
The operating characteristics of a test or procedure cannot, per se, be used to determine the presence or absence of disease, unless the test result is always positive when disease is present (ie, 100% sensitivity) or always negative when the disease is absent (ie, 100% specificity). Few tests, if any, have these characteristics. The likelihood that the disease is present with a positive result, or the likelihood of its absence with a negative result, must be assessed and factored with the clinician's pretest estimate of the probability that the patient has the disease (ie, prior probability).
Since few tests are both highly sensitive and highly specific, 2 or more tests are often used to evaluate a possible diagnosis. If the result of one test is positive, the combined sensitivity is higher than that of the more sensitive test, but the specificity is lower. Conversely, when the criteria for a positive test are that both exams be positive, the combined specificity is higher than the more specific of the two, but the sensitivity is lower. Therefore, multiple tests are most useful when all results are within the normal range (a finding that tends to exclude the disease) and when all results are abnormal (a finding that tends to confirm the disease). Multiple tests are least helpful when the results of one are positive and the results of the other are negative.
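Under the simplifying assumption that the two tests are conditionally independent given disease status, these combination rules can be sketched numerically; the individual sensitivities and specificities below are hypothetical:

```python
# Hypothetical operating characteristics of two independent tests.
sens1, spec1 = 0.80, 0.90
sens2, spec2 = 0.85, 0.95

# "Either positive counts as positive" rule:
# combined sensitivity rises above the more sensitive test,
# combined specificity falls below the more specific one.
sens_either = 1 - (1 - sens1) * (1 - sens2)
spec_either = spec1 * spec2

# "Both must be positive" rule:
# combined specificity rises, combined sensitivity falls.
sens_both = sens1 * sens2
spec_both = 1 - (1 - spec1) * (1 - spec2)

print(f"either positive: sens={sens_either:.3f}, spec={spec_either:.3f}")
print(f"both positive:   sens={sens_both:.3f}, spec={spec_both:.3f}")
```

The independence assumption rarely holds exactly in practice, but the direction of the trade-off is as the text describes.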


PREDICTIVE VALUE OF DIAGNOSTIC TESTS AND PROCEDURES
Knowledge of test characteristics alone does not permit accurate interpretation of test results. Test characteristics reveal only what proportion of patients with and what proportion without the disease in question have positive and negative results, respectively. Since the objective is to determine the presence or absence of the disease, the physician must address the following questions:

Given a positive test result, what is the probability that the disease is present?

Given a negative test result, what is the probability that the disease is not present?

The former probability reflects the predictive value of a positive result (ie, positive predictive value), and the latter reflects the predictive value of a negative result (ie, negative predictive value).
The estimation of post-test probabilities requires integration of the knowledge of test characteristics with the clinician's estimate of the likelihood of disease before the test is ordered (ie, the pretest probability or, with screening, the prevalence of disease). By referring to the binary table, the positive predictive value is determined by identifying the probability that the patient with a positive test result actually has the disease as follows: a/(a + b). Similarly, the negative predictive value is determined by identifying the probability that an individual with a negative test result is truly disease-free as follows: d/(c + d).
The more sensitive a test, the smaller the likelihood that the individual with a negative test has the disease; thus, the negative predictive value increases. The more specific the test, the higher the likelihood that an individual with a positive test is free from disease and the greater the positive predictive value. For rare diseases, however, the major determinant of the predictive value of the test is the prevalence of the preclinical disease in the population tested. No matter how specific the test is, if the population is at low risk of having the disease, positive results are likely to be false positive.
Bayes theorem provides a more formal model for quantifying the influence of prevalence and/or pretest probability on predictive values.

Alternative expressions of the positive predictive value:
Likelihood of a true-positive result/(likelihood of a true-positive result + likelihood of a false-positive result)

(prevalence X sensitivity)/{(prevalence X sensitivity) + [(1 - prevalence) X (1 - specificity)]}



Alternative expressions of the negative predictive value:
Likelihood of a true-negative result/(likelihood of a true-negative result + likelihood of a false-negative result)

[(1 - prevalence) X specificity]/{[(1 - prevalence) X specificity] + [prevalence X (1 - sensitivity)]}
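These two Bayes-theorem expressions can be checked numerically with a minimal Python sketch, using the 90% sensitivity and 95% specificity of Table 2 and an assumed 1% prevalence:

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) from the Bayes-theorem expressions above."""
    tp = prevalence * sensitivity              # likelihood of a true positive
    fp = (1 - prevalence) * (1 - specificity)  # likelihood of a false positive
    tn = (1 - prevalence) * specificity        # likelihood of a true negative
    fn = prevalence * (1 - sensitivity)        # likelihood of a false negative
    return tp / (tp + fp), tn / (tn + fn)

# Assumed 1% prevalence, 90% sensitivity, 95% specificity.
ppv, npv = predictive_values(0.01, 0.90, 0.95)
print(f"PPV={ppv:.3f}, NPV={npv:.4f}")  # PPV is roughly 0.154 (cf. Table 2)
```

The same function reproduces the rest of Table 2 as the prevalence argument is varied.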



Many authors suggest that the Bayes formula is cumbersome and unnecessary, because it simply extrapolates information gleaned from horizontal assessment of data in the binary table. Furthermore, in many cases, data used to determine the likelihood of disease before testing are only estimates. The effect of prevalence and/or pretest probability on the positive predictive values of a test with given sensitivity and specificity is illustrated in Table 2. When the prevalence of preclinical disease is low, the predictive value is low, even for a test with high sensitivity and specificity. Thus, for rare diseases or cases in which the probability of disease is low, a large proportion of those with positive screening test results are inevitably found, at further testing, not to have the disease.


HINTS FOR EVALUATING A STUDY ABOUT DIAGNOSTIC TESTS
Eight elements are involved in the proper clinical evaluation of a diagnostic test. These elements constitute guides for the clinical reader who evaluates a study of a diagnostic test. The following questions summarize these elements.

Was an independent blinded comparison performed with a criterion standard for diagnosis?

Did the patient sample include individuals with an appropriate spectrum of mild and severe disease, treated and untreated, and individuals with disorders commonly mistaken for the one in question?

Were the setting and the patient inclusion criteria for the study adequately described?

Was the reproducibility of the test results (precision) and of the interpretation of those results (observer variance) determined?

Was the term "normal" defined sensibly?

If the test is advocated for use as part of a cluster or sequence of tests, was its contribution to the overall validity of the cluster or sequence determined?

Was the performance of the test described in sufficient detail to permit exact replication?

Was the utility of the test determined?

SUMMARY
Confirming the presence of a disease requires a test with high specificity. When 2 or more tests are available, the one with the highest specificity is ordinarily preferred. When a test is used for screening or excluding a diagnostic possibility, it must be sensitive. When 2 or more such tests are available, the one with the highest sensitivity is ordinarily preferred.
The use of more than one test is most helpful when all results are normal, allowing the clinician to safely exclude the disease. When all test results are abnormal, they tend to confirm disease. Multiple tests are least helpful when the results of one are positive and the results of the others are normal. If 2 or more highly sensitive tests are performed to exclude disease, the gain in sensitivity obtained by ordering more than one (if the results are marginal) may be offset by the increase in the number of false-positive results.
No tests are perfect. Usually, the results for patients with and those without a specific disease overlap. Each point along the overlapping distribution of results defines a set of operating characteristics for the test. As the point used to define an abnormal result (ie, the cutoff point) is moved in the direction of patients with disease, specificity increases but sensitivity decreases. As it is moved toward patients without disease, the reverse is true.
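The cutoff trade-off described above can be sketched with two overlapping normal distributions of test results; the means and standard deviations here are hypothetical:

```python
from statistics import NormalDist

# Hypothetical result distributions for the two groups.
healthy = NormalDist(mu=100, sigma=10)   # persons without disease
diseased = NormalDist(mu=120, sigma=10)  # persons with disease

def operating_point(cutoff):
    """A result above `cutoff` is called abnormal (positive)."""
    sensitivity = 1 - diseased.cdf(cutoff)  # P(result > cutoff | disease)
    specificity = healthy.cdf(cutoff)       # P(result <= cutoff | no disease)
    return sensitivity, specificity

# Moving the cutoff toward the diseased distribution raises specificity
# and lowers sensitivity; moving it the other way does the reverse.
for cutoff in (105, 110, 115):
    sens, spec = operating_point(cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Plotting sensitivity against 1 minus specificity over all cutoffs yields the familiar receiver operating characteristic curve.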
Finally, the result of a test or procedure cannot be interpreted properly without considering the estimated likelihood of disease before the results are obtained. When the pretest likelihood of disease is high, a positive result tends to confirm the diagnosis, but an unexpected negative result is not helpful in ruling out disease. When the pretest likelihood of disease is low, a normal result tends to exclude the diagnosis, but an unexpected positive result is not helpful in confirming disease.


TABLES

Table 1. Results of Screening and/or Diagnostic Testing*

Result      Disease Present    Disease Absent    Total
Positive    a                  b                 a + b
Negative    c                  d                 c + d
Total       a + c              b + d             a + b + c + d

* Variables are defined as follows: a = true-positive results, b = false-positive results, c = false-negative results, and d = true-negative results. Sensitivity is defined as a/(a + c), while specificity is defined as d/(b + d). The positive predictive value is defined as a/(a + b), and the negative predictive value is defined as d/(c + d).

Table 2. Effect of Prevalence on the Positive Predictive Value, with 90% Sensitivity and 95% Specificity

Prevalence, %    Positive Predictive Value, %
0.1              1.80
1.0              15.4
5.0              48.6
50.0             94.7
