Peter Christen, David J. Hand, Nishadi Kirielle

A Review of the F-Measure: Its History, Properties, Criticism, and Alternatives

  • General Computer Science
  • Theoretical Computer Science

Methods to classify objects into two or more classes are at the core of various disciplines. When a set of objects with their true classes is available, a supervised classifier can be trained and employed to decide if, for example, a new patient has cancer or not. The choice of performance measure is critical in deciding which supervised method to use in any particular classification problem. Different measures can lead to very different choices, so the measure should match the objectives. Many performance measures have been developed, and one of them is the F-measure, the harmonic mean of precision and recall. Originally proposed in information retrieval, the F-measure has gained increasing interest in the context of classification. However, the rationale underlying this measure appears weak, and unlike other measures, it does not have a representational meaning. The use of the harmonic mean also has little theoretical justification. The F-measure also stresses one class, which seems inappropriate for general classification problems. We provide a history of the F-measure and its use in computational disciplines, describe its properties, and discuss criticism about the F-Measure. We conclude with alternatives to the F-measure, and recommendations of how to use it effectively.

Need a simple solution for managing your BibTeX entries? Explore CiteDrive!

  • Web-based, modern reference management
  • Collaborate and share with fellow researchers
  • Integration with Overleaf
  • Comprehensive BibTeX/BibLaTeX support
  • Save articles and websites directly from your browser
  • Search for new articles from a database of tens of millions of references
Try out CiteDrive

More from our Archive