The schools of magic plot

This is the code used to produce the “schools of magic” dendrogram/heatmap plot that I posted on social media. As always, I’ll start by loading the packages:

library(dplyr)
library(tidyr)
library(readr)
library(tibble)
library(ggplot2)
library(stringr)
library(legendry)

Data wrangling

The data wrangling for this one is slightly more elaborate than for the spell dice plot, because we need data suitable for the heatmap as well as data suitable for producing the dendrograms on each axis. We start by loading the spells data:

spells <- read_csv("./data/spells.csv", show_col_types = FALSE)

Data for the heatmap

To produce data for the heatmap, we select the relevant columns: i.e., those corresponding to the character classes, the school variable that denotes the school of magic for the spell, and the name variable because I like having an id column in my data. We then use pivot_longer() to arrange this data set in long form:

spells_long <- spells |>
  select(name, school, bard:wizard) |>
  pivot_longer(
    cols = bard:wizard,
    names_to = "class",
    values_to = "castable"
  ) 

print(spells_long)

#> # A tibble: 2,512 × 4
#>    name        school     class    castable
#>    <chr>       <chr>      <chr>    <lgl>   
#>  1 Acid Splash evocation  bard     FALSE   
#>  2 Acid Splash evocation  cleric   FALSE   
#>  3 Acid Splash evocation  druid    FALSE   
#>  4 Acid Splash evocation  paladin  FALSE   
#>  5 Acid Splash evocation  ranger   FALSE   
#>  6 Acid Splash evocation  sorcerer TRUE    
#>  7 Acid Splash evocation  warlock  FALSE   
#>  8 Acid Splash evocation  wizard   TRUE    
#>  9 Aid         abjuration bard     TRUE    
#> 10 Aid         abjuration cleric   TRUE    
#> # ℹ 2,502 more rows

Now we have a tidy data set with one row per “observation”, in the sense that each row specifies whether a spell of a specific name (which belongs to a specific school) is in fact castable by members of a particular character class. We can summarise this by aggregating over the specific spells, counting the number of castable spells for each combination of magic school and character class:

dat <- spells_long |>
  summarise(
    count = sum(castable),
    .by = c("school", "class")
  ) |>
  mutate(
    school = str_to_title(school),
    class  = str_to_title(class)
  )

print(dat)

#> # A tibble: 64 × 3
#>    school     class    count
#>    <chr>      <chr>    <int>
#>  1 Evocation  Bard         7
#>  2 Evocation  Cleric      12
#>  3 Evocation  Druid       17
#>  4 Evocation  Paladin      3
#>  5 Evocation  Ranger       3
#>  6 Evocation  Sorcerer    30
#>  7 Evocation  Warlock      4
#>  8 Evocation  Wizard      30
#>  9 Abjuration Bard        16
#> 10 Abjuration Cleric      33
#> # ℹ 54 more rows

This dat data frame is suitable for plotting as a heatmap with geom_tile(), so let’s now move to stage two of the data wrangling.

Dissimilarity data for the dendrograms

The data structure that we need at this step is slightly more complicated, because what we want to display on each axis is a hierarchical clustering, of the sort typically produced by hclust(). In a distant, distant past I actually wrote my PhD thesis on clustering and scaling tools used to represent item (dis)similarities, and as such I’m acutely aware that these tools are extremely sensitive to the way you define similarity (or dissimilarity, or distance, or association, or whatever…). So I’ll be a little careful here, because if you do this in a thoughtless way you get stupid answers.

Let’s start by reorganising the dat data frame into a matrix form. The mat matrix below contains the exact same information as the data frame: each cell in the matrix represents the number of castable spells for a specific combination of class and school.

print_truncated <- function(x) {
  if (inherits(x, "matrix")) {
    rownames(x) <- str_trunc(rownames(x), width = 6, ellipsis = ".")
    colnames(x) <- str_trunc(colnames(x), width = 6, ellipsis = ".")
  }
  if (inherits(x, "dist")) {
    attr(x, "Labels") <- str_trunc(
      attr(x, "Labels"), 
      width = 6, 
      ellipsis = "."
    )
  }
  print(round(x, digits = 3))
}

mat <- dat |>
  pivot_wider(
    names_from = "school",
    values_from = "count"
  ) |>
  as.data.frame()

rownames(mat) <- mat$class
mat$class <- NULL
mat <- as.matrix(mat)

print_truncated(mat)

#>        Evoca. Abjur. Trans. Encha. Necro. Divin. Illus. Conju.
#> Bard        7     16     18     28      5     18     22      8
#> Cleric     12     33     13      8     14     17      1     11
#> Druid      17     17     33      9      7     14      2     21
#> Palad.      3     16      3      5      3      5      0      2
#> Ranger      3     11     13      3      1      9      1      7
#> Sorce.     30      7     33     13      9      8     14     19
#> Warlo.      4      8      6     12     10      9     11      9
#> Wizard     30     22     41     15     18     19     26     24
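
As an aside, the same data-frame-to-matrix conversion can be written as a single pipeline with column_to_rownames() from the tibble package. The following is a minimal sketch on a toy stand-in for dat (the toy_dat and toy_mat names are hypothetical, and the counts are copied from the bard and cleric rows printed above):

```r
library(tidyr)
library(tibble)

# Toy stand-in for `dat`: counts for two classes and two schools
toy_dat <- tibble(
  school = c("Evocation", "Evocation", "Abjuration", "Abjuration"),
  class  = c("Bard", "Cleric", "Bard", "Cleric"),
  count  = c(7L, 12L, 16L, 33L)
)

toy_mat <- toy_dat |>
  pivot_wider(names_from = "school", values_from = "count") |>
  column_to_rownames("class") |>  # move `class` into the row names
  as.matrix()

print(toy_mat)
```

Either route should give the same matrix; the explicit rownames() assignment just makes each step visible.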

In this matrix we have a measure of “affinity”, in the sense that larger values indicate a higher affinity between a class and a school. The tricky part here is that some classes are simply better at spellwork than others: clerics and wizards can both cast lots of spells; paladins and rangers cannot cast many. The kind of similarity that I have in mind here is not the boring “clerics and wizards are similar because they can both cast lots of spells” kind. What I really want to say is something like “paladins and clerics are similar because abjuration is the strongest school for both classes”. The same applies when thinking about the schools of magic: there are lots of transmutation spells and lots of abjuration spells. That doesn’t really make those schools similar, not in the sense I care about.

What all this amounts to is an acknowledgement that we need to correct for overall prevalence or, to frame it in probabilistic terms, to describe classes in terms of a distribution over schools and describe schools in terms of a distribution over classes. That gives us the following two matrices:

class_distro  <- mat / replicate(ncol(mat), rowSums(mat))
school_distro <- t(mat) / (replicate(nrow(mat), colSums(mat)))

The class_distro matrix is the one that describes classes as a distribution over schools, and you can see in the printout here that when described in this fashion the paladin row and the cleric row do look rather similar to each other:

print_truncated(class_distro)

#>        Evoca. Abjur. Trans. Encha. Necro. Divin. Illus. Conju.
#> Bard    0.057  0.131  0.148  0.230  0.041  0.148  0.180  0.066
#> Cleric  0.110  0.303  0.119  0.073  0.128  0.156  0.009  0.101
#> Druid   0.142  0.142  0.275  0.075  0.058  0.117  0.017  0.175
#> Palad.  0.081  0.432  0.081  0.135  0.081  0.135  0.000  0.054
#> Ranger  0.062  0.229  0.271  0.062  0.021  0.188  0.021  0.146
#> Sorce.  0.226  0.053  0.248  0.098  0.068  0.060  0.105  0.143
#> Warlo.  0.058  0.116  0.087  0.174  0.145  0.130  0.159  0.130
#> Wizard  0.154  0.113  0.210  0.077  0.092  0.097  0.133  0.123

A similar phenomenon is observed in the school_distro matrix, where you can see that the rows for abjuration and divination are quite similar despite the fact that there are a lot more abjuration spells than divination spells:

print_truncated(school_distro)

#>         Bard Cleric Druid Palad. Ranger Sorce. Warlo. Wizard
#> Evoca. 0.066  0.113 0.160  0.028  0.028  0.283  0.038  0.283
#> Abjur. 0.123  0.254 0.131  0.123  0.085  0.054  0.062  0.169
#> Trans. 0.112  0.081 0.206  0.019  0.081  0.206  0.038  0.256
#> Encha. 0.301  0.086 0.097  0.054  0.032  0.140  0.129  0.161
#> Necro. 0.075  0.209 0.104  0.045  0.015  0.134  0.149  0.269
#> Divin. 0.182  0.172 0.141  0.051  0.091  0.081  0.091  0.192
#> Illus. 0.286  0.013 0.026  0.000  0.013  0.182  0.143  0.338
#> Conju. 0.079  0.109 0.208  0.020  0.069  0.188  0.089  0.238
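
For what it’s worth, the same normalisation can be expressed with base R’s prop.table(), which divides each row (margin = 1) or each column (margin = 2) by its total. Here is a small sketch that checks the two approaches agree on a toy matrix. The toy object is hypothetical and only uses three schools, so its proportions will not match the full printouts above:

```r
# Toy count matrix: first three schools of the Bard and Cleric rows
toy <- matrix(
  c( 7, 16, 18,
    12, 33, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Bard", "Cleric"),
    c("Evocation", "Abjuration", "Transmutation")
  )
)

# Classes as distributions over schools (rows sum to 1)
by_row_manual <- toy / replicate(ncol(toy), rowSums(toy))
by_row_prop   <- prop.table(toy, margin = 1)

# Schools as distributions over classes (rows sum to 1 after transposing)
by_col_manual <- t(toy) / replicate(nrow(toy), colSums(toy))
by_col_prop   <- t(prop.table(toy, margin = 2))

print(all.equal(by_row_manual, by_row_prop))
print(all.equal(by_col_manual, by_col_prop))
print(rowSums(by_row_prop))
```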

We are now in a position to convert both of these to distance/dissimilarity matrices. Notwithstanding the fact that it’s probably not the ideal way to describe similarity between distributions, I’ll call dist() using the default Euclidean distance measure. I mean, sure, I could probably do something fancy with Jensen-Shannon divergence here, but in my experience the metric you use to measure distributional similarity is far less important than the manner in which you construct the distributions from raw features in the first place, so I’m not going to sweat this one. Here’s our measure of class dissimilarity:

class_dissim  <- dist(class_distro)
print_truncated(class_dissim)

#>         Bard Cleric Druid Palad. Ranger Sorce. Warlo.
#> Cleric 0.309                                         
#> Druid  0.296  0.251                                  
#> Palad. 0.373  0.167 0.381                            
#> Ranger 0.294  0.213 0.146  0.313                     
#> Sorce. 0.286  0.342 0.168  0.468  0.292              
#> Warlo. 0.151  0.270 0.288  0.371  0.312  0.279       
#> Wizard 0.218  0.259 0.152  0.389  0.228  0.118  0.196
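
To see why the normalisation step matters, here is a small sketch comparing Euclidean distances on raw counts with distances on the normalised rows, using the cleric, paladin, and ranger rows copied from the mat printout above. The counts, raw_dist, and norm_dist names are just for illustration:

```r
# Raw counts for three classes, in the same school order as `mat`
counts <- rbind(
  Cleric  = c(12, 33, 13, 8, 14, 17, 1, 11),
  Paladin = c( 3, 16,  3, 5,  3,  5, 0,  2),
  Ranger  = c( 3, 11, 13, 3,  1,  9, 1,  7)
)

# Distances on raw counts versus distances on row-normalised counts
raw_dist  <- as.matrix(dist(counts))
norm_dist <- as.matrix(dist(counts / rowSums(counts)))

print(round(raw_dist, 1))
print(round(norm_dist, 3))
```

On the raw counts the paladin sits closer to the ranger than to the cleric, simply because both are minor spellcasters. After normalisation the ordering flips, and the paladin ends up closest to the cleric.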

Here’s our measure of school dissimilarity:

school_dissim <- dist(school_distro)
print_truncated(school_dissim)

#>        Evoca. Abjur. Trans. Encha. Necro. Divin. Illus.
#> Abjur.  0.320                                          
#> Trans.  0.122  0.279                                   
#> Encha.  0.323  0.284  0.270                            
#> Necro.  0.218  0.200  0.226  0.281                     
#> Divin.  0.271  0.133  0.203  0.181  0.179              
#> Illus.  0.319  0.409  0.301  0.217  0.313  0.303       
#> Conju.  0.134  0.251  0.073  0.273  0.178  0.184  0.319
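
For anyone who does want to try the Jensen-Shannon route mentioned earlier, here is a minimal sketch of how it might look. The js_divergence() and js_dist() helpers are hypothetical (they are not part of the original analysis or of any package used here), and the example runs on a toy matrix of distributions rather than the real data:

```r
# Jensen-Shannon divergence (base-2 logs) between two probability vectors
js_divergence <- function(p, q) {
  m <- (p + q) / 2
  kl <- function(a, b) {
    keep <- a > 0  # treat 0 * log(0) as 0
    sum(a[keep] * log2(a[keep] / b[keep]))
  }
  (kl(p, m) + kl(q, m)) / 2
}

# Pairwise Jensen-Shannon distances (square root of the divergence)
# between the rows of a matrix of distributions, as a "dist" object
js_dist <- function(distro) {
  n <- nrow(distro)
  out <- matrix(0, n, n, dimnames = list(rownames(distro), rownames(distro)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      div <- js_divergence(distro[i, ], distro[j, ])
      out[i, j] <- sqrt(max(0, div))  # guard against tiny negative rounding
    }
  }
  as.dist(out)
}

# Toy distributions: a and b are identical, c has no overlap with either
toy_distro <- rbind(
  a = c(0.5, 0.5, 0.0),
  b = c(0.5, 0.5, 0.0),
  c = c(0.0, 0.0, 1.0)
)
toy_js <- js_dist(toy_distro)
print(toy_js)
```

Because the result is an ordinary dist object, it could be handed to hclust() in place of the Euclidean version, though I would not expect the resulting tree to differ dramatically.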

Hierarchical clustering for the dendrograms

After all that effort in constructing the dissimilarity matrices, the hierarchical clustering is something of an anticlimax. The only substantive choice we need to make here is whether to use single-link, complete-link, average-link, or some other method for agglomeration. This does matter somewhat, at least in my experience, but I’m also feeling lazy so I’m going to go with average-link because it feels appropriate to me in this context:

clusters <- list(
  class = hclust(class_dissim, method = "average"),
  school = hclust(school_dissim, method = "average")
)
print(clusters)

#> $class
#> 
#> Call:
#> hclust(d = class_dissim, method = "average")
#> 
#> Cluster method   : average 
#> Distance         : euclidean 
#> Number of objects: 8 
#> 
#> 
#> $school
#> 
#> Call:
#> hclust(d = school_dissim, method = "average")
#> 
#> Cluster method   : average 
#> Distance         : euclidean 
#> Number of objects: 8 
#> 
#>
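
If you would rather not pick a linkage method on feel alone, one rough check is the cophenetic correlation, which measures how well the merge heights in a tree reproduce the original dissimilarities. The sketch below uses hypothetical toy data (five points on a line) rather than the spell data, so it only illustrates the mechanics:

```r
# Toy dissimilarities between five points on a line (hypothetical data)
toy_points <- c(a = 0, b = 1, c = 3, d = 7, e = 8)
toy_dissim <- dist(toy_points)

# Fit the same dissimilarities with three agglomeration methods
methods <- c("single", "complete", "average")
fits <- lapply(methods, function(m) hclust(toy_dissim, method = m))
names(fits) <- methods

# Cophenetic correlation for each tree (closer to 1 is a better fit)
coph <- sapply(fits, function(fit) cor(cophenetic(fit), toy_dissim))
print(round(coph, 3))

# Leaf order of the average-link tree, as it would be drawn
leaf_order <- fits$average$labels[fits$average$order]
print(leaf_order)
```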

Making the plot

Constructing the plot can also be considered a two-part process. In the first stage, we construct a base plot object that uses geom_tile() to display the class/school affinities data (i.e., dat), and add various stylistic features to make it look pretty:

base <- ggplot(dat, aes(school, class, fill = count)) +
  geom_tile() +
  scale_fill_distiller(palette = "RdPu") +
  labs(
    x = "The Schools of Magic",
    y = "The Classes of Character",
    fill = "Number of Learnable Spells"
  ) +
  coord_equal() +
  theme(
    plot.background = element_rect(
      fill = "#222", 
      color = "#222"
    ),
    plot.margin = unit(c(2, 2, 2, 2), units = "cm"),
    text = element_text(color = "#ccc", size = 14),
    axis.text = element_text(color = "#ccc"),
    axis.title = element_text(color = "#ccc"),
    axis.ticks = element_line(color = "#ccc"),
    legend.position = "bottom",
    legend.background = element_rect(
      fill = "#222", 
      color = "#222"
    )
  )

plot(base)

In this form, though, you can’t really see which schools are similar to each other, nor can you see how the classes are related in terms of their spell-casting affinities. What we really want to do is reorder the rows and columns so that the most similar schools are placed in adjacent columns, and the most similar classes are placed in adjacent rows. Until recently I’d never found a tool for doing this in R that I found satisfying, but with the release of the legendry package by Teun van den Brand (which has a lot of tools for working with plot legends and axes that I’m slowly learning…) this has changed. If we pass a hierarchical clustering to the scale_*_dendro() functions, the rows/columns are reordered appropriately, and the dendrograms themselves are shown alongside the axes:

pic <- base +
  scale_x_dendro(
    clust = clusters$school,
    guide = guide_axis_dendro(n.dodge = 2),
    expand = expansion(0, 0),
    position = "top"
  ) +
  scale_y_dendro(
    clust = clusters$class,
    expand = expansion(0, 0)
  )

plot(pic)

So much nicer!
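
If you only want the reordering and not the dendrograms, a rough approximation is to set the factor levels of the axis variable to the leaf order stored in the hclust object. This is a sketch on hypothetical toy data, and it assumes (I have not checked the legendry internals) that the ordering applied by the dendro scales corresponds to the order element of the clustering:

```r
library(ggplot2)

# Toy clustering of four hypothetical items: w/y are close, x/z are close
toy_hc <- hclust(dist(c(w = 0, x = 10, y = 1, z = 11)), method = "average")

# Leaf order of the dendrogram
leaf_order <- toy_hc$labels[toy_hc$order]
print(leaf_order)

toy_heat <- data.frame(
  item  = c("w", "x", "y", "z"),
  group = "g",
  value = c(1, 4, 2, 3)
)

# Setting the factor levels fixes the order of the discrete axis
toy_heat$item <- factor(toy_heat$item, levels = leaf_order)

toy_plot <- ggplot(toy_heat, aes(item, group, fill = value)) +
  geom_tile()
plot(toy_plot)
```

The similar items end up in adjacent columns, which is the same effect the dendro scales give, just without the tree drawn alongside the axis.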

To any D&D player, the plot is immediately interpretable: wizards and sorcerers are very similar spellcasting classes, and the spellcasting abilities of paladins are basically “clerics, but not very good at it”. The same dynamic is in play with regard to druids and rangers, in the sense that they’re both nature-focused spellcasters but rangers aren’t very good at it. The grouping of bards and warlocks surprised me a little, until it was pointed out to me that they both rely heavily on charisma in their spellcasting, so there is a kind of connection there.

On the schools side, the plot is similarly interpretable: enchantment and illusion are closely related schools, as are abjuration and divination. Necromancy feels a little bit like the darker cousin of abjuration so yeah, that tracks too. Transmutation, conjuration, and evocation are all kinda related, so you get a clustering there too.

There are some limitations to hierarchical clustering, of course, and you can see a little bit of that coming through in the plot. By design, I constructed the dissimilarities so that they’d ignore the “primary spellcaster vs secondary spellcaster” distinction, so the overall brightness of adjacent rows and columns varies wildly. But to capture that in a clustering solution while also capturing the “stylistic” similarities I’ve plotted here, you’d need to use an overlapping clustering tool rather than a hierarchical one, and those are inherently trickier to work with, and I wouldn’t be able to draw the pretty dendrograms either!