Training and evaluating classifiers

Author
Josef Fruehwald

Published
March 21, 2024

Modified
April 8, 2024

There are some support functions in the read_data.py module.
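
The function definitions themselves live in read_data.py and aren't shown here. As a rough sketch (assuming an 80/10/10 shuffled split, and that the function returns the train, dev, and test sets in the order they're unpacked below), test_dev_train_split might look something like:

python
import random

def test_dev_train_split(data, seed=0):
    # Hypothetical sketch, not the course's actual implementation:
    # shuffle reproducibly, then carve off 80% train, 10% dev, 10% test.
    shuffled = data.copy()
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: int(n * 0.8)]
    dev = shuffled[int(n * 0.8) : int(n * 0.9)]
    test = shuffled[int(n * 0.9) :]
    return train, dev, test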

python
from pathlib import Path
from read_data import (
    read_baby_names,
    test_dev_train_split,
)
import nltk
from nltk import NaiveBayesClassifier
import numpy as np
python
# each line is a [gender, name] pair
baby_names = read_baby_names("data/baby_2017.csv")
python
def make_name_features(
        data_line: list[str]
    ) -> tuple[dict, str]:
    gender = data_line[0]
    name = data_line[1]

    features = {
        "first_letter": name[0],
        "second_letter": name[1],
        "last_letter": name[-1]
    }

    return (features, gender)
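
For instance, on a hypothetical data line (the actual rows of baby_2017.csv aren't shown here):

python
make_name_features(["F", "Julia"])
({'first_letter': 'J', 'second_letter': 'u', 'last_letter': 'a'}, 'F')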
python
all_name_features = [
    make_name_features(line)
    for line in baby_names
]
train_name, dev_name, test_name = test_dev_train_split(all_name_features)

Before we train a whole Naive Bayes classifier, let’s think about how well we might do with a much simpler classifier.

python
def joes_v_good_classifier(features: dict) -> str:
    # Ignore the features entirely and always guess "F".
    return "F"

Let’s score its accuracy on the dev set.

python
def classifier_metric(classifier, data):
    # Score a single (features, label) pair:
    # 1 if the classifier's guess matches the true label, 0 otherwise.
    guess = classifier(data[0])
    answer = data[1]

    if guess == answer:
        return 1
    return 0
python
joe_guesses = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
])
python
joe_guesses.mean()
0.566676932553126

By just always guessing "F", I did better than chance! Since about 57% of the names in the dev set are "F", this majority-class baseline beats a 50/50 coin flip.

Moving beyond accuracy

  • Recall: Of all of the names that were "F", how many did the classifier label "F"?
  • Precision: Of all of the names labelled "F", how many were "F"? (Both are sketched as general functions just below.)
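Both definitions have the same shape, each just conditioning on a different subset of the data. As a general-purpose sketch (hypothetical helpers, not part of the original notes; the cells below compute the same quantities with list comprehensions):

python
def recall(classifier, data, label: str = "F") -> float:
    # Of the pairs whose true label is `label`, what share did the classifier catch?
    relevant = [pair for pair in data if pair[1] == label]
    return sum(classifier(feats) == label for feats, _ in relevant) / len(relevant)

def precision(classifier, data, label: str = "F") -> float:
    # Of the pairs the classifier labelled `label`, what share truly were?
    labelled = [pair for pair in data if classifier(pair[0]) == label]
    return sum(lab == label for _, lab in labelled) / len(labelled)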
python
joe_recall = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
    if data[1] == "F"
])
python
joe_recall_est = joe_recall.mean()
joe_recall_est
1.0
python
joe_precision = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
    if joes_v_good_classifier(data[0]) == "F"
])
python
joe_precision_est = joe_precision.mean()
joe_precision_est
0.566676932553126

These two measures are often combined into a single score called the “F Measure”, which is the harmonic mean of precision (p) and recall (r).

\[ F = 2\frac{pr}{p+r} \]

python
joe_f = 2 * ((joe_recall_est * joe_precision_est)/(joe_recall_est + joe_precision_est))
python
joe_f
0.7234126204049538
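
Since the same formula gets reused for every classifier below, a small helper (hypothetical, not in the original notes) would save some typing:

python
def f_measure(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * (precision * recall) / (precision + recall)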

A high-precision, low-recall classifier

python
def a_classifier(features: dict) -> str:
    # Guess "F" only when the name ends in "a"; otherwise guess "M".
    if features["last_letter"] == "a":
        return "F"

    return "M"
python
a_recall = np.array([
    classifier_metric(a_classifier, data)
    for data in dev_name
    if data[1] == "F"
])
python
a_recall_est = a_recall.mean()
a_recall_est
0.3391304347826087
python
a_precision = np.array([
    classifier_metric(a_classifier, data)
    for data in dev_name
    if a_classifier(data[0]) == "F"
])
python
a_precision_est = a_precision.mean()
a_precision_est
0.9292628443782577

This classifier was really reluctant to label a name "F", but when it did, it was mostly right. For this data set, that trade-off results in a way worse F measure than just labelling every single name "F".

python
a_f = 2 * ((a_recall_est * a_precision_est)/(a_recall_est + a_precision_est))
a_f
0.49691419470435993

Training and evaluating the Naive Bayes classifier

python
nb_classifier = NaiveBayesClassifier.train(train_name)
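
NLTK classifiers come with some built-in evaluation and inspection tools. For example (outputs omitted here, since they depend on the exact train/dev split):

python
# Overall accuracy on the dev set, and the features whose
# likelihood ratios most strongly separate "F" from "M".
print(nltk.classify.accuracy(nb_classifier, dev_name))
nb_classifier.show_most_informative_features(5)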
python
nb_recall = np.array([
    classifier_metric(nb_classifier.classify, data)
    for data in dev_name
    if data[1] == "F"
])
python
nb_recall_est = nb_recall.mean()
nb_recall_est
0.7902173913043479
python
nb_precision = np.array([
    classifier_metric(nb_classifier.classify, data)
    for data in dev_name
    if nb_classifier.classify(data[0]) == "F"
])
python
nb_precision_est = nb_precision.mean()
nb_precision_est
0.7859459459459459
python
nb_f = 2 * ((nb_precision_est * nb_recall_est)/(nb_precision_est + nb_recall_est))
nb_f
0.7880758807588076
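
Putting the three dev-set F measures side by side: the always-"F" baseline scored about 0.72, the last-letter rule about 0.50, and the Naive Bayes classifier about 0.79, the best of the three and the best balance of recall and precision.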

Reuse

CC-BY-SA 4.0

Citation

BibTeX citation:
@online{fruehwald2024,
  author = {Fruehwald, Josef},
  title = {Training and Evaluating Classifiers},
  date = {2024-03-21},
  url = {https://lin511-2024.github.io/notes/programming/06_classifier.html},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. 2024. “Training and Evaluating Classifiers.” March 21, 2024. https://lin511-2024.github.io/notes/programming/06_classifier.html.