Thursday, May 15, 2014

How good is your classification model?

One way to measure it is accuracy, i.e. the % of correctly classified observations.
Downside:
Say your test is meant to identify whether a patient suffers from a rare disease that is found in 1% of the population.
If your test is 99% accurate, is it doing well?
Maybe, maybe not.
If the test simply classifies every patient as negative, it will achieve 99% accuracy, but we can hardly say the test is good.
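To make that pitfall concrete, here is a minimal sketch in Python (the labels are made up, with 1% positives; it assumes scikit-learn is available for accuracy_score):

import numpy as np
from sklearn.metrics import accuracy_score

# Made-up data: 1,000 patients, 1% of whom actually have the disease
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1  # 10 positive cases

# A "test" that simply calls every patient negative
y_pred = np.zeros(1000, dtype=int)

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great, yet it catches nobody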

To get around this, we can use precision and recall.

From Wikipedia:

Precision is the probability that a (randomly selected) retrieved document is relevant.

Recall is the probability that a (randomly selected) relevant document is retrieved in a search.

The F1 score combines the two into a single number (their harmonic mean):
F1 = 2 * precision * recall / (precision + recall)
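A small sketch of how these quantities fall out of raw counts of true positives (tp), false positives (fp) and false negatives (fn); the helper name is just for illustration:

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1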

Example
When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3.
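Plugging the example's counts (tp=20, fp=10, fn=40) into the helper sketched above reproduces those values and gives an F1 of 2 * (2/3) * (1/3) / (2/3 + 1/3) = 4/9 ≈ 0.44.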
