Training and evaluating classifiers

Author
Josef Fruehwald

Published
March 21, 2024

Modified
April 8, 2024

There are some support functions in the read_data.py module.
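
The function definitions themselves live in read_data.py and aren't shown here. As a rough sketch (assuming an 80/10/10 shuffled split, and that the function returns the train, dev, and test sets in the order they're unpacked below), test_dev_train_split might look something like:

python
import random

def test_dev_train_split(data, seed=0):
    # Hypothetical sketch, not the course's actual implementation:
    # shuffle reproducibly, then carve off 80% train, 10% dev, 10% test.
    shuffled = data.copy()
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: int(n * 0.8)]
    dev = shuffled[int(n * 0.8) : int(n * 0.9)]
    test = shuffled[int(n * 0.9) :]
    return train, dev, test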

python
from pathlib import Path
from read_data import (
    read_baby_names,
    test_dev_train_split,
)
import nltk
from nltk import NaiveBayesClassifier
import numpy as np
python
# each line is a [gender, name] pair
baby_names = read_baby_names("data/baby_2017.csv")
python
def make_name_features(
        data_line: list[str]
    ) -> tuple[dict, str]:
    gender = data_line[0]
    name = data_line[1]

    features = {
        "first_letter": name[0],
        "second_letter": name[1],
        "last_letter": name[-1]
    }

    return (features, gender)
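
For instance, on a hypothetical data line (the actual rows of baby_2017.csv aren't shown here):

python
make_name_features(["F", "Julia"])
({'first_letter': 'J', 'second_letter': 'u', 'last_letter': 'a'}, 'F')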
python
all_name_features = [
    make_name_features(line)
    for line in baby_names
]
train_name, dev_name, test_name = test_dev_train_split(all_name_features)

Before we train a whole Naive Bayes classifier, let’s think about how well we might do with a much simpler classifier.

python
def joes_v_good_classifier(features: dict) -> str:
    # Ignore the features entirely and always guess "F".
    return "F"

Let’s score its accuracy on the dev set.

python
def classifier_metric(classifier, data):
    # Score a single (features, label) pair:
    # 1 if the classifier's guess matches the true label, 0 otherwise.
    guess = classifier(data[0])
    answer = data[1]

    if guess == answer:
        return 1
    return 0
python
joe_guesses = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
])
python
joe_guesses.mean()
0.566676932553126

By just always guessing "F", I did better than chance! Since about 57% of the names in the dev set are "F", this majority-class baseline beats a 50/50 coin flip.

Moving beyond accuracy

  • Recall: Of all of the names that were "F", how many did the classifier label "F"?
  • Precision: Of all of the names labelled "F", how many were "F"? (Both are sketched as general functions just below.)
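Both definitions have the same shape, each just conditioning on a different subset of the data. As a general-purpose sketch (hypothetical helpers, not part of the original notes; the cells below compute the same quantities with list comprehensions):

python
def recall(classifier, data, label: str = "F") -> float:
    # Of the pairs whose true label is `label`, what share did the classifier catch?
    relevant = [pair for pair in data if pair[1] == label]
    return sum(classifier(feats) == label for feats, _ in relevant) / len(relevant)

def precision(classifier, data, label: str = "F") -> float:
    # Of the pairs the classifier labelled `label`, what share truly were?
    labelled = [pair for pair in data if classifier(pair[0]) == label]
    return sum(lab == label for _, lab in labelled) / len(labelled)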
python
joe_recall = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
    if data[1] == "F"
])
python
joe_recall_est = joe_recall.mean()
joe_recall_est
1.0
python
joe_precision = np.array([
    classifier_metric(joes_v_good_classifier, data)
    for data in dev_name
    if joes_v_good_classifier(data[0]) == "F"
])
python
joe_precision_est = joe_precision.mean()
joe_precision_est
0.566676932553126

These two measures are often combined into a single score called the “F Measure”, which is the harmonic mean of precision (p) and recall (r).

\[ F = 2\frac{pr}{p+r} \]

python
joe_f = 2 * ((joe_recall_est * joe_precision_est)/(joe_recall_est + joe_precision_est))
python
joe_f
0.7234126204049538
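
Since the same formula gets reused for every classifier below, a small helper (hypothetical, not in the original notes) would save some typing:

python
def f_measure(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * (precision * recall) / (precision + recall)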

A high-precision, low-recall classifier

python
def a_classifier(features: dict) -> str:
    # Guess "F" only when the name ends in "a"; otherwise guess "M".
    if features["last_letter"] == "a":
        return "F"

    return "M"
python
a_recall = np.array([
    classifier_metric(a_classifier, data)
    for data in dev_name
    if data[1] == "F"
])
python
a_recall_est = a_recall.mean()
a_recall_est
0.3391304347826087
python
a_precision = np.array([
    classifier_metric(a_classifier, data)
    for data in dev_name
    if a_classifier(data[0]) == "F"
])
python
a_precision_est = a_precision.mean()
a_precision_est
0.9292628443782577

This classifier was really reluctant to label a name "F", but when it did, it was mostly right. For this data set, that trade-off results in a way worse F measure than just labelling every single name "F".

python
a_f = 2 * ((a_recall_est * a_precision_est)/(a_recall_est + a_precision_est))
a_f
0.49691419470435993

Training and evaluating the Naive Bayes classifier

python
nb_classifier = NaiveBayesClassifier.train(train_name)
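
NLTK classifiers come with some built-in evaluation and inspection tools. For example (outputs omitted here, since they depend on the exact train/dev split):

python
# Overall accuracy on the dev set, and the features whose
# likelihood ratios most strongly separate "F" from "M".
print(nltk.classify.accuracy(nb_classifier, dev_name))
nb_classifier.show_most_informative_features(5)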
python
nb_recall = np.array([
    classifier_metric(nb_classifier.classify, data)
    for data in dev_name
    if data[1] == "F"
])
python
nb_recall_est = nb_recall.mean()
nb_recall_est
0.7902173913043479
python
nb_precision = np.array([
    classifier_metric(nb_classifier.classify, data)
    for data in dev_name
    if nb_classifier.classify(data[0]) == "F"
])
python
nb_precision_est = nb_precision.mean()
nb_precision_est
0.7859459459459459
python
nb_f = 2 * ((nb_precision_est * nb_recall_est)/(nb_precision_est + nb_recall_est))
nb_f
0.7880758807588076
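
Putting the three dev-set F measures side by side: the always-"F" baseline scored about 0.72, the last-letter rule about 0.50, and the Naive Bayes classifier about 0.79, the best of the three and the best balance of recall and precision.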

Reuse

CC-BY-SA 4.0

Citation

BibTeX citation:
@online{fruehwald2024,
  author = {Fruehwald, Josef},
  title = {Training and Evaluating Classifiers},
  date = {2024-03-21},
  url = {https://lin511-2024.github.io/notes/programming/06_classifier.html},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. 2024. “Training and Evaluating Classifiers.” March 21, 2024. https://lin511-2024.github.io/notes/programming/06_classifier.html.