One thing that has always bugged me about Bradley-Terry models is that they can’t model rock-paper-scissors.

To recap: the Bradley-Terry model posits that each entity has a strength \(\gamma_{i}\), and when entities \(a\) and \(b\) compete, the probability of victory for \(a\) over \(b\) is

\[\frac{\exp(\gamma_{a})}{\exp(\gamma_{a}) + \exp(\gamma_{b})}\]

or, more compactly, \(\operatorname{logit}^{-1}(\gamma_{a} - \gamma_{b})\).
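In code this is just the logistic function applied to the strength difference; a minimal sketch (`bt_win_prob` is my own name, not a library function):

```julia
# Bradley-Terry: P(a beats b) = logit⁻¹(γa − γb)
logistic(x) = 1 / (1 + exp(-x))
bt_win_prob(γa, γb) = logistic(γa - γb)

bt_win_prob(1.0, 1.0)  # evenly matched entities: 0.5
```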

There are a couple of different approaches in the literature. One augments each entity \(i\) with a pair of vectors \(u_{i}, v_{i}\) and defines the probability as

\[\operatorname{logit}^{-1}(\lVert u_{b} - v_{a}\rVert - \lVert u_{a} - v_{b}\rVert + \gamma_{a} - \gamma_{b})\]

This difference of norms is antisymmetric in \(a\) and \(b\), and that is what provides the intransitivity.

I want to do a little better: in particular, I’m not a huge fan of how hard this difference of norms is to interpret. You also don’t get much pooling: since you have to fit a vector for each entity, you won’t do well with sparse observations.

So what can we do instead? There is a natural clustering suggested by the rock-paper-scissors example, so let’s use it. We augment each entity with a latent cluster \(Z_{i}\) and a matrix \(A\) whose entry \(A_{c_{1}, c_{2}}\) gives the advantage an entity of class \(c_{1}\) has over an entity of class \(c_{2}\). Then the probability becomes

\[\operatorname{logit}^{-1}(A_{Z_{a}, Z_{b}} + \gamma_{a} - \gamma_{b})\]
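With equal strengths and a cyclic advantage matrix this recovers rock-paper-scissors exactly; a quick sketch (the advantage of 3 is an arbitrary illustrative value, matching the simulation below):

```julia
logistic(x) = 1 / (1 + exp(-x))

# Antisymmetric class advantages: class 1 beats 2, 2 beats 3, 3 beats 1
A = [ 0.0  3.0 -3.0;
     -3.0  0.0  3.0;
      3.0 -3.0  0.0]

γ = zeros(3)   # equal individual strengths
Z = [1, 2, 3]  # one entity per class

win_prob(a, b) = logistic(A[Z[a], Z[b]] + γ[a] - γ[b])

win_prob(1, 2)  # ≈ 0.95: pure class advantage, no strength difference
```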

This is perhaps equivalent to this paper, although I can’t quite tell as it’s confusingly written. I think they additionally put a DP on the \(\gamma\) terms, which doesn’t make a ton of sense to me, although they justify it by pointing out that with match data certain entities are just statistically indistinguishable.
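For reference, the Dirichlet process used in the model below induces Chinese-restaurant-process predictive probabilities over cluster assignments; a hand-rolled sketch of those probabilities (`crp_probs` is my own helper, not Turing API):

```julia
# Join existing cluster k with probability nk[k] / (n + α),
# open a new cluster with probability α / (n + α)
crp_probs(nk, α) = vcat(nk, α) ./ (sum(nk) + α)

crp_probs([3, 1], 1.0)  # [0.6, 0.2, 0.2]: rich get richer
```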

So let’s now try it on simulated rock-paper-scissors-like data. First, let’s see how a standard Bradley-Terry model performs.

# TODO: Plot a heatmap of win probabilities for rock-paper-scissors
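In the meantime, here is a sketch of the true win-probability matrix that heatmap would display, using the same parameters as `simulated_data` below (nine entities: three strength levels crossed with three classes):

```julia
logistic(x) = 1 / (1 + exp(-x))

strengths = [1, 1, 1, 2, 2, 2, 3, 3, 3]
class = [1, 2, 3, 1, 2, 3, 1, 2, 3]
A = [ 0.0  3.0 -3.0;
     -3.0  0.0  3.0;
      3.0 -3.0  0.0]

# P[i, j] = true probability that entity i beats entity j
P = [logistic(strengths[i] - strengths[j] + A[class[i], class[j]])
     for i in 1:9, j in 1:9]
```

Feeding `P` into CairoMakie’s `heatmap` shows the three-by-three block structure: within a class, stronger entities win; across classes, the cyclic advantage dominates.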

For a “real” example we can use the Weedle’s Cave data set from Kaggle, which captures battle outcomes between different Pokemon. Now, those of you born in the 90s will know1 that Pokemon have types, and these types exhibit rock-paper-scissors-like advantages: electric is super effective against water, which is super effective against fire, and so on.

Data set: https://www.kaggle.com/datasets/terminus7/pokemon-challenge/data

using AlgebraOfGraphics, CairoMakie
using Turing, Turing.RandomMeasures
using Distributions  # Binomial, Normal
using DataFrames
using CSV
using MCMCChains

struct MatchData
    lo_ids::Vector{Int}
    hi_ids::Vector{Int}
    lo_wins::Vector{Int}
    matches::Vector{Int}
    n_entities::Int
    n_pairs::Int
end

function simulated_data()
    # Nine entities: three strength levels crossed with three classes
    strengths = [1, 1, 1, 2, 2, 2, 3, 3, 3]
    class = [1, 2, 3, 1, 2, 3, 1, 2, 3]

    # Antisymmetric class advantages: 1 beats 2, 2 beats 3, 3 beats 1
    class_strengths = zeros(3, 3)
    class_strengths[1, 2] = 3
    class_strengths[2, 1] = -3
    class_strengths[1, 3] = -3
    class_strengths[3, 1] = 3
    class_strengths[2, 3] = 3
    class_strengths[3, 2] = -3

    lo_ids = Int[]
    hi_ids = Int[]
    lo_wins = Int[]
    matches = Int[]

    # 20 matches for every unordered pair of entities
    for lo in 1:9, hi in (lo+1):9
        push!(lo_ids, lo)
        push!(hi_ids, hi)

        logit_p = strengths[lo] - strengths[hi] + class_strengths[class[lo], class[hi]]
        p = 1 / (1 + exp(-logit_p))
        push!(matches, 20)
        push!(lo_wins, rand(Binomial(20, p)))
    end

    return MatchData(lo_ids, hi_ids, lo_wins, matches, 9, 9 * 4)
end

function prep_match_data(df)
    # Aggregate per-match rows into (lo wins, total matches) per unordered pair
    pairs = Dict{Tuple{Int, Int}, Tuple{Int, Int}}()
    for row in eachrow(df)
        lo, hi = minmax(row["First_pokemon"], row["Second_pokemon"])
        lo_wins = lo == row["Winner"]
        if haskey(pairs, (lo, hi))
            pairs[lo, hi] = pairs[lo, hi] .+ (lo_wins, 1)
        else
            pairs[lo, hi] = (lo_wins, 1)
        end
    end

    lo_ids = zeros(Int, length(pairs))
    hi_ids = zeros(Int, length(pairs))
    lo_wins = zeros(Int, length(pairs))
    matches = zeros(Int, length(pairs))

    for (ii, (key, val)) in enumerate(pairs)
        lo_ids[ii] = key[1]
        hi_ids[ii] = key[2]
        lo_wins[ii] = val[1]
        matches[ii] = val[2]
    end

    n_pairs = length(pairs)
    n_entities = maximum(hi_ids)

    return MatchData(lo_ids, hi_ids, lo_wins, matches, n_entities, n_pairs)
end

@model function BradleyTerryDP(match_data)
    # Dirichlet process prior over cluster assignments
    α = 1.0  # Concentration parameter
    rpm = DirichletProcess(α)

    n_entities = match_data.n_entities

    # Individual strengths
    β ~ filldist(Normal(0, 1), n_entities)

    # Cluster assignments and pairwise class advantages
    cluster = tzeros(Int, n_entities)
    max_clusters = 30  # pre-allocated upper bound on the number of clusters
    cluster_effects = tzeros(max_clusters, max_clusters)

    for ii in 1:n_entities
        K = maximum(cluster)
        nk = Vector{Int}(map(k -> sum(cluster .== k), 1:K))
        cluster[ii] ~ ChineseRestaurantProcess(rpm, nk)

        if cluster[ii] > K
            cluster_effects[K + 1, K + 1] = 0.0
            for k in 1:K
                cluster_effects[k, K + 1] ~ Normal(0, 1)
                cluster_effects[K + 1, k] = -cluster_effects[k, K + 1] # enforce antisymmetry
            end
        end
    end

    # Likelihood
    for ii in 1:match_data.n_pairs
        lo = match_data.lo_ids[ii]
        hi = match_data.hi_ids[ii]
        logit_p = β[lo] - β[hi] + cluster_effects[cluster[lo], cluster[hi]]
        p_lo_win = 1.0 / (1.0 + exp(-logit_p))
        match_data.lo_wins[ii] ~ Binomial(match_data.matches[ii], p_lo_win)
    end
end


sim_data = simulated_data()
model = BradleyTerryDP(sim_data)

#df = DataFrame(CSV.File("assets/non-transitive-bradley-terry/combats.csv"))
#match_data = prep_match_data(df)
#model = BradleyTerryDP(match_data)


# TODO: figure out what these parameters mean
#chain = sample(model, Gibbs(PG(20, :cluster), NUTS(500, 0.65, :cluster_effects), NUTS(500, 0.65, :β)), n_iterations)

n_iterations = 1000
chain = sample(model, Gibbs(:cluster => PG(100),
                            :cluster_effects => HMC(0.01, 4),
                            :β => HMC(0.01, 4)), n_iterations)

# char_df = DataFrame(CSV.File("assets/non-transitive-bradley-terry/pokemon.csv"))
# cluster_type = DataFrame(:cluster => chain[44, namesingroup(chain, :cluster), :].value.data[:],
#                          :type => char_df[!, "Type 1"])

# cluster_counts = combine(groupby(cluster_type, [:cluster, :type]), nrow => :count)

# plt = data(cluster_counts) * mapping(:cluster => nonnumeric, :type => nonnumeric, :count) * visual(Heatmap)
# draw(plt)



  1. sadly I know this only second-hand, as my parents thought books were better than Game Boys: look at the ruin that brought me ↩︎