Non-Transitive Bradley Terry Models

One thing that has always bugged me about Bradley Terry models is that it doesn’t model rock-paper-scissors.

To recap the Bradley Terry model posits that each entity has a strength \(\gamma_{i}\) and when competing the probability of victory for entity a over entity b is

\[exp(\gamma_{a}) / exp(\gamma_{a}) + exp(\gamma_{b})\]

or to make it clearer logit(γa - γb).

There’s a couple different approaches in the literature. This one augments each entity with a 2d vector vi and defines the probability as

logit(norm(v_b - v_a) - norm(v_a - v_b) + γa - γb)

This vector difference then provides the intransitivity.

I want to do a little better: in particular I’m not a huge fan of this difficult to interpret this difference of norms. You also don’t get a lot of pooling: since you have to fit this vector for each entity you won’t do well with sparse observations.

So what can we do instead? There is a natural clustering suggested by this rock-paper-scissors example so let’s use it. We augment each entity with a latent cluster Zi and a matrix Ac1, c2 of the advantage an entity of class c1 would have over an entity of class c2. Then the probability becomes

logit(AZa, Zb + γa - γb)

This is perhaps equivalent to this paper although I can’t quite tell as it’s confusingly written. I think they additionally put a DP on the γ terms which doesn’t make a ton of sense to me although they justify it by pointing that with match data certain entities are just statistically indistinguishable.]

So let’s now try it on a simulated rock-paper-scissors-like data. First let’s see what a standard Bradley Terry Model performs.

# TOOD: Plot a heatmap of win probabilities for rock-paper-scissors

For a “real” example we can use the Weedle’s cave data set from Kaggle which captures battle outcomes between different pokemon. Now those of you born in the 90s will know¹ that Pokemon have types and these types exhibit rock-paper-scissors like advantages: electric is super effective against water which is super effective against fire and so on.

Data set: https://www.kaggle.com/datasets/terminus7/pokemon-challenge/data

using AlgebraOfGraphics, CairoMakie
using Turing, Turing.RandomMeasures
using DataFrames
using CSV
using MCMCChains

struct MatchData
    lo_ids::Vector{Int}
    hi_ids::Vector{Int}
    lo_wins::Vector{Int}
    matches::Vector{Int}
    n_entities::Int
    n_pairs::Int
end

function simulated_data()
    strengths = [1,1,1,2,2,2,3,3,3]
    class = [1,2,3,1,2,3,1,2,3]

    class_strengths = zeros(3, 3)
    class_strengths[1,2] = 3
    class_strengths[2,1] = -3
    class_strengths[1,3] = -3
    class_strengths[3,1] = 3
    class_strengths[2,3] = 3
    class_strengths[3,2] = -3


    lo_ids = Int[]
    hi_ids = Int[]
    lo_wins = Int[]
    matches = Int[]


    for lo in 1:9, hi in (lo+1):9
        push!(lo_ids, lo)
        push!(hi_ids, hi)

        logit_p = strengths[lo] - strengths[hi] + class_strengths[class[lo], class[hi]]
        p = 1 / (1 + exp(-logit_p))
        push!(matches, 20)
        push!(lo_wins, rand(Binomial(20, p)))
    end

    return MatchData(lo_ids, hi_ids, lo_wins, matches, 9, 9*4)
end

function prep_match_data(df)
    pairs = Dict{Tuple{Int, Int}, Tuple{Int, Int}}()
    for row in eachrow(df)
        lo, hi = minmax(row["First_pokemon"], row["Second_pokemon"])
        lo_wins = lo == row["Winner"]
        if haskey(pairs, (lo, hi))
            pairs[lo, hi] = pairs[lo, hi] .+ (lo_wins, 1)
        else
            pairs[lo, hi] = (lo_wins, 1)
        end
    end

    lo_ids = zeros(Int, length(pairs))
    hi_ids = zeros(Int, length(pairs))
    lo_wins = zeros(Int, length(pairs))
    matches = zeros(Int, length(pairs))

    for (ii, (key, val)) in enumerate(pairs)
        lo_ids[ii] = key[1]
        hi_ids[ii] = key[2]
        lo_wins[ii] = val[1]
        matches[ii] = val[2]
    end

    n_pairs = length(pairs)
    n_entities = maximum(hi_ids)

    return MatchData(lo_ids, hi_ids, lo_wins, matches, n_entities, n_pairs)
end

@model function BradleyTerryDP(match_data)
    # Dirichlet Process for infinite mixture of clusters
    α = 1.0  # Concentration parameter
    rpm = DirichletProcess(α)
    H = Normal(0, 1)  # Base distribution for cluster effects

    n_entities = match_data.n_entities

    # Individual strengths
    # β = tzeros(n_entities)
    β ~ filldist(Normal(0, 1), n_entities)

    # Cluster assignments and strengths
    cluster = tzeros(Int, n_entities)
    max_clusters = 30
    cluster_effects = tzeros(max_clusters, max_clusters)

    for ii in 1:n_entities
        K = maximum(cluster)
        nk = Vector{Int}(map(k -> sum(cluster .== k), 1:K))
        cluster[ii] ~ ChineseRestaurantProcess(rpm, nk)

        if cluster[ii] > K
            cluster_effects[K + 1, K + 1] = 0.0
            for k in 1:K
                cluster_effects[k, K + 1] ~ Normal(0, 1)
                cluster_effects[K + 1, k] = -cluster_effects[k, K + 1] # enforce antisymmetry
            end
        end
    end

    # Likelihood
    for ii in 1:match_data.n_pairs
        lo = match_data.lo_ids[ii]
        hi = match_data.hi_ids[ii]
        logit_p = β[lo] - β[hi] + cluster_effects[cluster[lo], cluster[hi]]
        p_lo_win = 1.0 / (1.0 + exp(-logit_p))
        match_data.lo_wins[ii] ~ Binomial(match_data.matches[ii], p_lo_win)
    end
end


sim_data = simulated_data()
model = BradleyTerryDP(sim_data)

#df = DataFrame(CSV.File("assets/non-transitive-bradley-terry/combats.csv"))
#match_data = prep_match_data(df)
#model = BradleyTerryDP(match_data)


# TODO: figure out what these paramters mean
#chain = sample(model, Gibbs(PG(20, :cluster), NUTS(500, 0.65, :cluster_effects), NUTS(500, 0.65, :β)), n_iterations)

n_iterations = 1000
chain = sample(model, Gibbs(:cluster => PG(100),
                            :cluster_effects => HMC(0.01, 4),
                            :β => HMC(0.01, 4)), n_iterations)

# char_df = DataFrame(CSV.File("assets/non-transitive-bradley-terry/pokemon.csv"))
# cluster_type = DataFrame(:cluster => chain[44, namesingroup(chain, :cluster), :].value.data[:],
#                          :type => char_df[!, "Type 1"])

# cluster_counts = combine(groupby(cluster_type, [:cluster, :type]), nrow => :count)

# plt = data(cluster_counts) * mapping(:cluster => nonnumeric, :type => nonnumeric, :count) * visual(Heatmap)
# draw(plt)

sadly I know this only second-hand as my parents thought books were better than gameboys: look at the ruin that brought me ↩︎