How to Calculate F1 Score in Machine Learning: A Guide for Testers

Machine learning models are evaluated using various metrics, and one of the most important is the F1 Score. Whether you are an AI tester, a software tester transitioning into machine learning, or an automation testing professional, understanding the F1 Score is crucial. This metric helps assess model performance, especially when dealing with imbalanced datasets. In this blog post, we will explain what the F1 Score is, how to calculate it, and why it is essential for machine learning testing.

What is the F1 Score in Machine Learning?

A detailed tutorial video with live Python code, showing how to calculate the F1 Score and what its parameters mean, is available on our YouTube channel.

The F1 Score is a performance metric that balances precision and recall. It is particularly useful in AI testing when working with datasets where one class is more frequent than another (imbalanced data). The formula for the F1 Score is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Where:

Precision (Positive Predictive Value) = TP / (TP + FP)
Recall (Sensitivity) = TP / (TP + FN)
TP (True Positives): Positive instances correctly predicted as positive
FP (False Positives): Negative instances incorrectly predicted as positive
FN (False Negatives): Positive instances incorrectly predicted as negative
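
The definitions above translate directly into a few lines of Python. The following is a minimal sketch: the function name f1_from_counts is hypothetical, and returning 0.0 when a denominator is zero is an assumption made for illustration rather than a fixed rule.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 Score computed from raw TP, FP, and FN counts."""
    # Assumption: an undefined precision or recall (zero denominator) is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * (precision * recall) / (precision + recall)


# Illustrative counts only: precision and recall are both 0.8 here.
print("F1 Score:", f1_from_counts(tp=8, fp=2, fn=2))
```

Note that true negatives never appear in the calculation, which is why the F1 Score is not inflated by a large majority class.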

Why is the F1 Score Important for Machine Learning Testing?

The F1 Score is crucial for testers working in AI testing and automation testing because it provides a single metric that considers both false positives and false negatives. It is particularly useful when:

The dataset is imbalanced (e.g., fraud detection, medical diagnosis).
Both precision and recall are important, and you need a trade-off.
You want a more comprehensive evaluation beyond accuracy alone.
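
To see why accuracy alone can mislead on imbalanced data, consider the small sketch below. The 1,000-sample dataset and the "model" that always predicts the majority class are made up for illustration; they are not taken from a real project.

```python
# Hypothetical imbalanced dataset: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10
# Hypothetical model that always predicts the majority class (0).
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)

# Algebraically equivalent form of F1: 2*TP / (2*TP + FP + FN).
# Assumption: report 0.0 if the denominator is zero.
denominator = 2 * tp + fp + fn
f1 = 2 * tp / denominator if denominator > 0 else 0.0

print("Accuracy:", accuracy)  # 0.99
print("F1 Score:", f1)        # 0.0
```

The model scores 99% accuracy while never finding a single positive case, and the F1 Score of 0 exposes that immediately.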

Step-by-Step Guide to Calculating F1 Score

Let’s go through an example of how to compute the F1 Score in machine learning testing.

Step 1: Compute Precision and Recall

Suppose you have a binary classification model that makes predictions as follows:

True Positives (TP) = 50
False Positives (FP) = 10
False Negatives (FN) = 20

Calculate Precision:

Precision = TP / (TP + FP) = 50 / (50 + 10) = 0.8333

Calculate Recall:

Recall = TP / (TP + FN) = 50 / (50 + 20) = 0.7143

Step 2: Compute the F1 Score

Now, applying the F1 Score formula:

F1 = 2 × (0.8333 × 0.7143) / (0.8333 + 0.7143) ≈ 0.7692

Thus, the F1 Score for this model is approximately 0.7692 (76.92%).
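
If you want to double-check the arithmetic above, a short script is enough. This sketch uses Python's standard fractions module so that rounding of intermediate values does not affect the result; the variable names are just for illustration.

```python
from fractions import Fraction

# Counts from the worked example above.
tp, fp, fn = 50, 10, 20

precision = Fraction(tp, tp + fp)  # 50/60 = 5/6
recall = Fraction(tp, tp + fn)     # 50/70 = 5/7
f1 = 2 * precision * recall / (precision + recall)  # exactly 10/13

print(f"Precision = {float(precision):.4f}")  # 0.8333
print(f"Recall    = {float(recall):.4f}")     # 0.7143
print(f"F1 Score  = {float(f1):.4f}")         # 0.7692
```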

How to Calculate F1 Score in Python

If you are working with Python, you can easily compute the F1 Score using the scikit-learn library. Here’s a simple example:

from sklearn.metrics import f1_score

# True labels and predicted labels
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

# Calculate F1 Score
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)

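This approach is commonly used in automated tests for AI models to validate classification performance. By default, f1_score assumes a binary problem and treats label 1 as the positive class; for multiclass labels, pass the average parameter (for example, average="macro").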
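
To confirm what the scikit-learn example above reports, you can count TP, FP, and FN by hand from the same label lists. The sketch below uses plain Python only; the helper name binary_counts is hypothetical and simply mirrors the definitions given earlier for the binary case.

```python
# Same labels as in the scikit-learn example above.
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]


def binary_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive and actual != positive:
            fp += 1
        elif predicted != positive and actual == positive:
            fn += 1
    return tp, fp, fn


tp, fp, fn = binary_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("TP, FP, FN:", tp, fp, fn)    # 5 1 1
print("F1 Score:", round(f1, 4))    # 0.8333
```

Precision and recall are both 5/6 for these labels, so the F1 Score is also 5/6, or about 0.8333.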

Conclusion

The F1 Score is an essential metric for evaluating machine learning models, particularly when dealing with imbalanced datasets. As a software tester, AI tester, or automation testing professional, understanding and applying the F1 Score will help you assess your models effectively. Whether you compute it manually or using Python, this metric provides deeper insights into a model’s performance beyond simple accuracy.

By integrating F1 Score analysis into your machine learning testing workflow, you can catch models that look accurate overall but miss the cases that matter, which leads to more reliable AI systems.