Sensitivity, Specificity, Likelihood Ratios… what do these terms mean?

My plan for upcoming posts is to begin to scrutinise the “orthopaedic” tests that we perform in clinical practice. When we really begin to look at the “usefulness” of many of our tests it becomes apparent that many of them have limitations.

To begin with, I think we first need to understand what terms such as Sensitivity, Specificity and Likelihood Ratios mean. These terms are commonly used when reporting on clinical tests.

Leibold et al (2008) summarise these terms well:

SnNOUT: With highly Sensitive tests, a Negative result will rule a disorder OUT.

SpPIN: With highly Specific tests, a Positive result will rule a disorder IN.

A good (useful) test is both sensitive and specific. The closer each value is to 100%, the better.
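To make the two terms concrete: sensitivity is the proportion of people who have the disorder and test positive, and specificity is the proportion of people who do not have the disorder and test negative. The short Python sketch below shows how each could be worked out from a study's results. The counts are made up for illustration and do not come from any of the papers cited in this post.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people WITH the disorder who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people WITHOUT the disorder who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: 50 people with the disorder, 50 without.
# With the disorder: 45 test positive, 5 test negative.
# Without the disorder: 30 test negative, 20 test positive.
sn = sensitivity(true_pos=45, false_neg=5)
sp = specificity(true_neg=30, false_pos=20)

print(f"Sensitivity: {sn:.0%}")  # 90%
print(f"Specificity: {sp:.0%}")  # 60%
```

In this invented example the test misses few people who have the disorder (high sensitivity) but wrongly flags many who do not (modest specificity), so a negative result is more informative than a positive one.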

But what is an acceptable percentage? Keep reading for some opinions.

There are also other values such as Likelihood Ratios (LR). These can be positive (LR+) or negative (LR-).

Cook and Hegedus (2011) explain LRs:

LR+ identifies the strength of a test in determining the presence of a finding. A higher LR+ reflects a stronger ability to detect the condition when the test is positive.

LR- identifies how much the odds of the diagnosis/disorder decrease when a test is negative. The lower the value the better the ability of the test to determine the chance the disease is actually present in the event the finding is negative.

What are “good” values for the above measures? Cook and Hegedus (2011) report:

For screening, triage, or ruling out disorders, a cut-off sensitivity of 90 and a negative likelihood ratio of less than 0.20 were considered as a necessity. Although there is no definitive cut-off score for screening or triage that is suggested for sensitivity, authors have advocated a sensitivity of 90 or greater for general musculoskeletal conditions (and higher numbers for more sinister conditions) (Cook and Hegedus, 2008).

For diagnosis, tests and measures with positive likelihood ratios of 5.0 or greater were considered useful (Jaeschke et al., 1989).

What I really like from the above paragraph is the use of the terms “screening” and “diagnosis”.

Some tests are sensitive and have a low LR- only (they lack specificity and/or a high LR+), and hence are useful only as “screening” tests.
Other tests are specific or have a high LR+, and are useful for “diagnosis”.
Some tests lack both and hence are ineffective for either “screening” or “diagnosis”.
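As a rough illustration of this sorting, here is a small Python sketch that applies the cut-offs quoted from Cook and Hegedus (2011) above. It is my own simplification rather than anything the authors published: the function name is hypothetical, I am assuming the "sensitivity of 90" means 90%, and the example values are invented.

```python
# Cut-offs taken from the Cook and Hegedus (2011) passage quoted above.
# Assumption: "a sensitivity of 90" is read as 90% (0.90).
SCREENING_MIN_SENSITIVITY = 0.90
SCREENING_MAX_LR_NEG = 0.20   # LR- must be below this value
DIAGNOSIS_MIN_LR_POS = 5.0    # LR+ must be at or above this value


def classify_test(sensitivity: float, lr_pos: float, lr_neg: float) -> str:
    """Label a test as useful for screening, diagnosis, both or neither."""
    screening = (sensitivity >= SCREENING_MIN_SENSITIVITY
                 and lr_neg < SCREENING_MAX_LR_NEG)
    diagnosis = lr_pos >= DIAGNOSIS_MIN_LR_POS

    if screening and diagnosis:
        return "screening and diagnosis"
    if screening:
        return "screening"
    if diagnosis:
        return "diagnosis"
    return "neither"


# Invented values for three imaginary tests.
print(classify_test(sensitivity=0.95, lr_pos=1.9, lr_neg=0.10))  # screening
print(classify_test(sensitivity=0.40, lr_pos=8.0, lr_neg=0.63))  # diagnosis
print(classify_test(sensitivity=0.50, lr_pos=1.2, lr_neg=0.80))  # neither
```

Real decisions would also weigh study quality and how sinister the suspected condition is, which a simple threshold check like this cannot capture.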

Reiman et al (2012) offer the following:

A LR+ identifies the strength of a test in determining the presence of a finding, and is calculated by the formula: SN/(1-SP).

A LR- is the ratio of a negative test result in people with the pathology to a negative test result in people without the pathology, and is calculated by the formula: (1-SN)/SP.
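The two formulas are easy to try out for yourself. The Python sketch below applies them exactly as written, with sensitivity (SN) and specificity (SP) entered as proportions between 0 and 1. The input values are made up for illustration only.

```python
def positive_lr(sn: float, sp: float) -> float:
    """LR+ = SN / (1 - SP). Not defined when SP is exactly 1."""
    return sn / (1 - sp)


def negative_lr(sn: float, sp: float) -> float:
    """LR- = (1 - SN) / SP. Not defined when SP is exactly 0."""
    return (1 - sn) / sp


# Hypothetical test: 90% sensitive, 60% specific.
sn, sp = 0.90, 0.60

print(f"LR+ = {positive_lr(sn, sp):.2f}")  # LR+ = 2.25
print(f"LR- = {negative_lr(sn, sp):.2f}")  # LR- = 0.17
```

Notice that a test can look impressive on sensitivity alone (90%) and still have a modest LR+ (2.25) because its specificity is poor.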

The higher the LR+ and lower the LR- the greater the post-test probability is altered. Post-test probability can be altered to a minimal degree (LR+s of 1 to 2, or LR-s of 0.5 to 1), to a small degree (LR+s of 2 to 5 and LR-s of 0.2 to 0.5), to a moderate degree (LR+s of 5 to 10, LR-s of 0.1 to 0.2) and to a significant and almost conclusive degree (LR+s greater than 10, LR-s less than 0.1).

That is, a test with a LR+ of greater than 10 and a LR- of less than 0.1 can be almost conclusive in its diagnostic and screening ability.
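To see what "altering the post-test probability" looks like in numbers, the Python sketch below uses the standard odds form of Bayes' theorem: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. It also labels an LR using the bands quoted from Reiman et al (2012). The function names and the 30% pre-test probability are my own illustrative choices, and because the quoted bands share their end points, which band a boundary value (such as exactly 5) falls into is an assumption on my part.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem. Pre-test probability must be between 0 and 1."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


def degree_of_shift(likelihood_ratio: float) -> str:
    """Band label for an LR, following the ranges quoted above.

    Assumption: shared end points (2, 5, 0.2, 0.5) go to the stronger band.
    """
    lr = likelihood_ratio
    if lr >= 1:  # positive likelihood ratios
        if lr > 10:
            return "significant"
        if lr >= 5:
            return "moderate"
        if lr >= 2:
            return "small"
        return "minimal"
    # negative likelihood ratios (values below 1)
    if lr < 0.1:
        return "significant"
    if lr <= 0.2:
        return "moderate"
    if lr <= 0.5:
        return "small"
    return "minimal"


# Hypothetical patient with a 30% pre-test probability of the disorder.
pre_test = 0.30
for lr in (1.5, 4.0, 8.0, 12.0, 0.08):
    post = post_test_probability(pre_test, lr)
    print(f"LR {lr}: {degree_of_shift(lr)} shift, "
          f"post-test probability {post:.0%}")
```

An LR of exactly 1 leaves the probability unchanged, which is why values close to 1 are described as altering it only minimally.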

As mentioned in the initial paragraph of this post, I plan to review, based on the literature, many of our “tests” with regard to their usefulness as either screening tools, diagnostic tools, both or neither!

I hope the above helps in being able to understand some of these commonly encountered terms in clinical practice.

References:

Cook C, Hegedus EJ. Orthopedic physical examination tests: an evidence based approach. Upper Saddle River, NJ: Prentice Hall; 2008.

Cook C, Hegedus E. Diagnostic utility of clinical tests for spinal dysfunction. Man Ther. 2011 Feb;16(1):21-5. doi: 10.1016/j.math.2010.07.004. Epub 2010 Aug 3.

Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407-15.

Leibold MR, Huijbregts PA, Jensen R. Concurrent criterion-related validity of physical examination tests for hip labral lesions: a systematic review. J Man Manip Ther. 2008;16(2):E24-41.

Reiman MP, Goode AP, Hegedus EJ, Cook CE, Wright AA. Diagnostic accuracy of clinical tests of the hip: a systematic review with meta-analysis. Br J Sports Med. 2012 Nov 7. [Epub ahead of print].