The Thrill Of The Wild: Gathering Data and Determining Coefficients, A Beginner's Guide

Lately, we've been talking a lot about showing your work, that is, showing how theorycrafters come to conclusions, etc. So I thought it might be fun to take a minute to make this guide, to show how I spend the majority of my time in WoW betas or PTRs.

Tooltips are Wrong

Something that's incredibly amazing about World of Warcraft is the amount of data we have. Sometimes I forget that not every game has a completely separate industry set up around providing data about the game. Most popular games these days have walkthroughs and guides, but nothing like the wowhead and wowdb databases. There are people whose full-time job is to sift through the information found in the game client and present it to us. That's awesome.

However, it also leads to problems if we get lazy and take that data without verifying it, because the information found in spelldata or in the tooltips for any ability is often wrong. Whether from being input incorrectly from the beginning, or from a hotfix that didn't require a client-side update, what a spell actually does is often different from what its tooltip/spelldata claims it does.

Furthermore, because we have no way of knowing which tooltips are correct and which aren't just by reading them, if we want accurate data, we need to resort to the good ol' scientific method to test and verify our data.

In-game Experiments

A very convenient part of testing in WoW is that it's really quite easy. Most variables that would have to be controlled for in a real-life experiment simply don't exist inside the game. There are still a few variables we may need to consider, which we'll get into later.

For the sake of this blog post, we're going to look at something fairly simple, Black Arrow. Black Arrow has the benefit of being scaled off of Attack Power (not weapon damage, which is a bit more complicated), and causing Shadow damage, which isn't affected by armor. We will still build something into our tests to verify that it's not being affected by armor and isn't scaling off of weapon damage, but that's just sort of a "just in case" measure.

What do you already know?

Let's start by going over what we know about Black Arrow. According to the spell data available from datamining our wow clients (or reading the spell tooltips):

Black Arrow:

Does Shadow Damage
Does 260% of Attack Power over 26 seconds
Does 1 tick every 2 seconds.
Each tick does 20% of Attack Power
Has a 1 second GCD
Has 3 charges
Cast time is instant
Is dispellable by any Magic Dispel
Costs 40 focus
Has a 40 yard range
Requires a ranged weapon
Has an 18 second cooldown

Ok, so much of that isn't important to us at the moment. The big changes from Black Arrow as we knew it in Warlords of Draenor are: it's now a Marksmanship talent instead of a Survival baseline ability; it now has 3 charges, with separate cooldowns; and its duration was increased from 20 to 26 seconds, so it should give 13 ticks instead of the 10 ticks it did during WoD.

Of those properties we learned from spelldata, there are just a few that we want to test and verify. Does each tick do 20% of Attack Power as shadow damage, and does each cast indeed do 13 ticks? So now we just take a trip over to the target dummies to test it out.
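The arithmetic behind those two expectations is simple enough to sketch. This is only an illustration of what the tooltip numbers imply, assuming one tick per full interval and no partial ticks; the function name is my own and not anything from the game or its data files.

```python
def dot_expectation(duration_s, tick_interval_s, per_tick_pct_ap):
    """Return (tick count, total % of AP) implied by tooltip values.

    Assumes one tick per full interval and no partial tick at the end.
    """
    ticks = int(duration_s // tick_interval_s)
    return ticks, ticks * per_tick_pct_ap


# Tooltip values for Black Arrow: 26 s duration, 2 s per tick, 20% AP per tick.
ticks, total_pct = dot_expectation(26, 2, 20)
print(f"{ticks} ticks, {total_pct}% of AP in total")
```

If the numbers we record at the dummies don't line up with this, either the tooltip or one of these assumptions is off.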

Recording Data

To set up our test, we want to plan in a few variables. The best way to start out is to take off all of your gear, head to Orgrimmar or whatever your favorite capital city is, and find a white quality weapon from one of the vendors there. What you're looking for is a bow like the Worn Shortbow. This is ideal for testing because the WD range is 1-3 (so you don't have to worry about the huge variance that current level bows have), and it doesn't have any other stats, so there's less to control for.

Next, you'll head over to the training dummies. The first problem we're going to run into is that training dummies have pretty much the same percent HP all the time. That is, there are some that are always at 100% HP, and some that are always at 15%, etc. There are some abilities that react to the HP % of your target, so that could be an issue we need to control for. Blizzard gave us a great tool for this a few expansions ago, over in Shattrath. If you go to Shattrath on the PTR or Beta versions of the game, there are killable target dummies. These start at full health and take damage until they're at 0% health, at which point they die (only to respawn a few seconds later). This lets you test your full rotation, with executes and over a set duration of time. However, so far we have no reason to believe that Black Arrow is affected by our target's HP, so while we may double check against a killable target dummy eventually, at this point, it's not needed (plus when not wearing any gear, and using a white quality weapon, it would take quite a long time to actually kill one of those dummies).

The last thing you need to collect data is a spreadsheet. I like to use Google Sheets, because I like to do my work from various different computers, but use whatever you want. Then open your character pane, take all of the stats available to you that could possibly have an effect on your damage, and record them in a little table like so:

Legion Black Arrow	Test 1
AP	1250
WD	1 - 3
Crit	15.00%
Mastery	4.00%
Haste	0.00%
Versatility	0.00%


Don't forget to label everything; this will allow anyone with the desire the ability to repeat your experiment—a core element of the scientific method. And let's be realistic here: eventually, someone is going to question your results, and you want to be able to quickly prove yourself right, not waste time trying to figure out what you had meant to write down...

Back in your WoW client, turn on combat logging by typing "/combatlog", then start firing your Black Arrows. We want a significant number of data points, which can get a bit boring, so now is a good time to turn on Netflix and watch Buffy the Vampire Slayer in your other monitor.

Once you have about 30 or 40 casts (you don't really need thousands of data points, but you do want a lot), hit feign death to end the fight in your log (one of the nice conveniences of testing as a hunter). Alternatively, you can just type "/combatlog" twice to turn off logging and turn it back on. Assuming you were wearing no gear for the first test, you'll next want to put on a piece of gear and record how that affects your stats. You should have slightly more AP, and a couple of secondary stats.

You're going to repeat this process several times, putting on more and more gear so that you'll have more and more AP and other stats to test between. You can also, at this point, try testing different weapons. Just be sure to end combat between each test, and write down your stats with each variation of gear you're wearing.

You can, in general, use any pieces of gear you have to create various stat levels. However, what you don't want to put on is anything that will cause stats to vary during combat. This includes many trinkets, or any weapon that has an enchant that procs. You'll also want to avoid any sort of on-use items, though of course you can still wear them, just ensure you're not on-using them.

As I mentioned earlier, we believe Black Arrow to be doing Shadow Damage, which means it should not be affected by Armor, but we still want to check, just in case. The easiest way to do this is to try your test against different types of training dummies. The raid dummies (which are set 3 levels higher than the player level cap) will have more effective armor than the dungeon test dummies, which will have more effective armor than the PvP test dummies. If your results are the same for each type of dummy, you can feel assured that armor isn't a part of the equation.

Reading Data

After running this test at several different stat levels, it's time to start interpreting the data. This is the fun part, I suppose. We're quite fortunate to have several good logging sites these days. You can use warcraftlogs.com or askmrrobot.com, or even Matlab, if you're into that kind of thing. I'm going to be using Warcraft Logs for this example, but really just use whatever logging site you prefer.

If you're using Warcraft Logs, go to your damage done tab, and hover your mouse over the ability you're testing, in this case Black Arrow. You should get a little table that looks like this:

The information we're interested in is for the regular tick, not the crits (MM critical damage has a lot of factors, and we don't need to reverse engineer it to determine the coefficient of Black Arrow, though that may be part of another test we'd want to do at another time). So I'll go back to the spreadsheet where I've been recording my stats and find the applicable test (I label my tests in order, 1, 2, 3, etc., because most logs show the fights in chronological order, so it's easy to match them up).

I like to then put this data into my spreadsheet like this:

Test 1	2016-02-08
AP	14477
WD	4796
Crit	25.36%
Mastery	11.73%
Haste	16.79%
Versatility	0.00%
Black Arrow
min	3765
max	3766
avg	3765.3
(m+M)/2	3765.5
%APmin
%APmax
%APavg
%AP(m+M)/2

The first thing I want to mention is the cell called "(m+M)/2". I have to call it that because I've yet to hear of any terminology for that type of average (but if you know one, let me know in the comments). It means the minimum result plus the maximum result, divided by two. I'm not going to get into the reasoning in too much detail here, but the basic idea is this: in a Normal (or Gaussian) distribution, taking the mean will generally yield the most accurate result, though occasionally you may use the median or mode for some data sets. However, our results in WoW do not have a Gaussian distribution; they have a uniform distribution. So you're more likely to find the true average by looking at the halfway point between your smallest and largest data points than by taking a traditional average. For more information on this, check out Hamlet's article on WoW distributions.
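If you'd rather see that claim than take my word for it, here is a rough simulation. It's a sketch under assumptions of my own: damage rolls are drawn from a continuous uniform range (real WoW hits are whole numbers), the 900 to 1100 range is made up, and the function names are just for illustration. It repeats a 30-hit test many times and compares how far the plain mean and (m+M)/2 land from the true center.

```python
import random
import statistics


def midrange(values):
    """(m+M)/2: halfway between the smallest and largest value."""
    return (min(values) + max(values)) / 2


def compare_estimators(low, high, samples_per_test=30, trials=2000, seed=0):
    """Return (mean squared error of the mean, mean squared error of the midrange).

    Each trial simulates one test of `samples_per_test` hits drawn uniformly
    from [low, high]. A fixed seed keeps the run repeatable.
    """
    rng = random.Random(seed)
    true_center = (low + high) / 2
    mean_sq_err = 0.0
    mid_sq_err = 0.0
    for _ in range(trials):
        hits = [rng.uniform(low, high) for _ in range(samples_per_test)]
        mean_sq_err += (statistics.fmean(hits) - true_center) ** 2
        mid_sq_err += (midrange(hits) - true_center) ** 2
    return mean_sq_err / trials, mid_sq_err / trials


# Hypothetical damage range, chosen only for the demonstration.
mean_mse, mid_mse = compare_estimators(900.0, 1100.0)
print(f"mean MSE: {mean_mse:.2f}  midrange MSE: {mid_mse:.2f}")
```

The smaller error belongs to the better estimator for this kind of data. Swap in a different range or sample count to see how quickly each one settles.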

The final four cells are where we put our formulas. Because the spelldata we looked at earlier tells us that this ability is based on an AP coefficient, we're going to at least start by working off of that (it may be helpful to look at a possible weapon damage coefficient later, if having an AP coefficient leads to inconsistencies).

I like to keep a cell open for all four of the min, max, average and (m+M)/2. To get an accurate result, you really only need to look at (m+M)/2, though. I just happen to find it helpful to look at all four if something doesn't work out right. In the case of abilities modified by an AP percentage, there's generally a very small range, so in this example, as you'll see, each cell gives nearly identical results.

So in the bottom four cells, I'm simply going to take the relevant result (min, max, etc.) and divide it by the AP I wrote down above. If I had any versatility, I would need to multiply my AP by "1+v" first (where v is my versatility percentage expressed as a decimal), but in this case I have none, so it's not part of the equation at the moment.
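For anyone who prefers code to spreadsheet cells, the same division can be sketched like this. The function name is my own, and the versatility handling is just the "1+v" adjustment described above; the numbers are the Test 1 values from my table.

```python
def ap_coefficient(damage, attack_power, versatility=0.0):
    """Damage divided by AP, with AP first scaled by (1 + versatility).

    `versatility` is a decimal fraction, e.g. 0.05 for 5%.
    """
    return damage / (attack_power * (1.0 + versatility))


# Test 1: AP 14477, 0% versatility, regular (non-crit) Black Arrow ticks.
test_1 = {"min": 3765, "max": 3766, "avg": 3765.3}
test_1["(m+M)/2"] = (test_1["min"] + test_1["max"]) / 2

for label, damage in test_1.items():
    print(f"%AP{label}: {ap_coefficient(damage, 14477):.10f}")
```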

Doing this in all four cells gives me a result like this:

Black Arrow
min	3765
max	3766
avg	3765.3
(m+M)/2	3765.5
%APmin	0.2600676936
%APmax	0.2601367687
%APavg	0.2600884161
%AP(m+M)/2	0.2601022311

What that's showing is that the coefficient is 26%, not the 20% we were expecting. I won't display here all of the test results I got, but a couple looked like this:

Test 2	2016-02-08
AP	4020
WD	3303
Black Arrow
min	1046
max	1047
avg	1046.5
(m+M)/2	1046.5
%APmin	0.260199005
%APmax	0.2604477612
%APavg	0.2603233831
%AP(m+M)/2	0.2603233831

Test 3	2016-02-08
AP	3479
WD	3303
Black Arrow
min	905
max	906
avg	905.8
(m+M)/2	905.5
%APmin	0.2601322219
%APmax	0.2604196608
%APavg	0.260362173
%AP(m+M)/2	0.2602759414

Something we can also try at this point is checking whether there could be a weapon damage coefficient. However, doing so quickly shows that you get wildly different coefficients with different weapon damage values, whereas with an AP coefficient we get the same result regardless of the AP value or the WD value. So we can feel quite assured that we're dealing with an AP coefficient here.
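Here's a small sketch of that comparison using the three tests shown above. The "relative spread" measure is just something I picked to illustrate agreement between tests, not a standard anyone else uses: it's 0 when every test gives the same coefficient and grows as they disagree.

```python
# AP, WD, and (m+M)/2 tick damage copied from Tests 1-3.
tests = [
    {"name": "Test 1", "ap": 14477, "wd": 4796, "midrange": 3765.5},
    {"name": "Test 2", "ap": 4020, "wd": 3303, "midrange": 1046.5},
    {"name": "Test 3", "ap": 3479, "wd": 3303, "midrange": 905.5},
]


def relative_spread(values):
    """(max - min) / min: 0 means every test agrees exactly."""
    return (max(values) - min(values)) / min(values)


ap_coeffs = [t["midrange"] / t["ap"] for t in tests]
wd_coeffs = [t["midrange"] / t["wd"] for t in tests]

for t, ap_c, wd_c in zip(tests, ap_coeffs, wd_coeffs):
    print(f"{t['name']}: damage/AP = {ap_c:.4f}, damage/WD = {wd_c:.4f}")

print(f"AP coefficient spread: {relative_spread(ap_coeffs):.2%}")
print(f"WD coefficient spread: {relative_spread(wd_coeffs):.2%}")
```

A candidate stat whose coefficient barely moves between gear sets is the one the ability is actually scaling from.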

Declaring a Result

In this particular case, all of my tests gave me the result of an AP coefficient of 26%. Our scientific definition of "correct" is a hypothesis that can accurately predict results. By this standard, with enough data points and variations of the test, I'm convinced that we've got the correct coefficient.
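To make "accurately predicts results" a bit more concrete, here's a sketch that treats the tooltip's 20% and my measured 26% as two competing hypotheses and checks how far each one's predicted tick lands from what I recorded. The function name is hypothetical, and the observations are the AP and (m+M)/2 values from Tests 1 to 3.

```python
# (attack power, observed (m+M)/2 tick damage) from Tests 1-3.
observed = [(14477, 3765.5), (4020, 1046.5), (3479, 905.5)]


def worst_relative_error(coefficient, observations):
    """Largest relative gap between predicted and observed tick damage."""
    return max(
        abs(coefficient * ap - damage) / damage for ap, damage in observations
    )


for coefficient in (0.20, 0.26):
    error = worst_relative_error(coefficient, observed)
    print(f"{coefficient:.0%} AP per tick: worst error {error:.2%}")
```

The hypothesis with the tiny error across every gear set is the one worth keeping.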

It's worth noting here that there is a significant difference between 20% AP per tick and 26% AP per tick. Over the ability's full 13 ticks, it means a total of 338% AP instead of 260% AP. In a spec that may end up with too much focus, at least during certain points in the rotation, like when building stacks of vulnerability, this may end up being a worthwhile talent. Regardless, we want to know how much damage it actually does, or there's no point trying to make comparisons with other abilities or talents.

Other Resources

If this is the type of thinking/experimenting you're interested in, check out these articles by people who've been doing theorycrafting way longer than I.

From Theck:

TC101: Intro to Theorycrafting

TC101: Experimental Design

TC101: Testing Simulationcraft

TC101: How Stats are Calculated

From Frostheim:

Theorycrafting 101: Calculating Coefficients

Theorycrafting 101: Gathering Data Part 1

Theorycrafting 101: Gathering Data Part 2

Theorycrafting 101: Some Comments on Testing

4 comments:

Unknown, February 11, 2016 at 5:39 AM
My statistics book, Essentials of Statistics, calls (min+Max)/2 the midrange. My professor isn't a big fan of that name though, so he calls it the midpoint. Do with that what you will.

And thanks for the fantastic post! I've wondered about theorycrafting before and everything you have here makes total sense.

Fiannor, February 11, 2016 at 8:19 AM
Awesome post, thanks so much for doing it. I am quite math-phobic, but on the other hand I don't like having to rely on some web site's unknown math to make decisions about my talent and gear selections, "optimal" rotations, etc. I would much rather have the baseline knowledge to figure those things out for myself, at least as a starting point I can modify based on my play style and individual fights.

Even though I am not an alpha tester for Legion, I may crank up some of my own spreadsheets now and practice a bit with the live stuff. Thanks for giving me a way to get started.