The ArctEX baseball statistics system

Here is the sample PDF. It shows some series that I like - there are instructions below to generate one for any series. Sorry if the fonts look bad, but the problem is your PDF reader software (firefox is the worst I think).

See the bottom of this page for the new software download's release notes, updated 16 Aug 2024. The PDF has also been updated, with the same games shown but with all the new stats and features and bugfixes and updated retrosheet database. The 14 Aug update was an important bug fix.

I have to apologize in advance for my crappy HTML formatting. This is a very preliminary introduction to a cool project based on Retrosheet to sort of reinvent basic baseball statistics using a somewhat more mathematical framework. What's different about it? In a word, consequentialism. All arctex statistics are based solely on the umpires' decisions about what happened in the game. There are no earned runs or RBIs or at bats, or even hits or errors except as descriptions. It aims to be a complete and reasonably familiar description of baseball using these methods, although the terminology is mostly new.

The name is from "arctangent of expectation values", about which more later after some actual baseball stuff. Skip the rest of this paragraph if you don't want a brief history of the project. This project is a work in progress but close to something interesting (to me at least). It was going along great up through 2019 and then a number of things happened. One is they introduced some new rules that required a bit of a software rewrite - the runner on second base, the 7-inning game, the new postseason format. For the most part that's easy enough (see a complication below), but I delayed too long and accumulated too many irresistible ideas for new stuff, and so the update became too big to tackle at once etc. And then in 2020 I had a number of new ideas all at once that demanded following up, which eventually resulted in the other texts I wrote. Finally in late 2023 I got around to fixing all that and updating so that I can follow retrosheet again - at last!

As you can see, it starts with a funny looking graph, which is the main point of why I wanted to do this in the first place. On page 1 you see game 1 of the 1955 world series. If you know baseball reasonably well, you should be able to follow the action by looking at the two thick lines on the graph and the bold labels above it. The light gray line shows the Yankees and the black line shows the Dodgers. The Yankees are light because they're at home, like the uniforms, and you should be able to see clearly enough that they bat second. You can see home runs in the top and bottom of the second, the top of the third and the bottom of the fourth. Having identified those, you should then be able to tell when runs are scored in other innings, and to tell what the score is in the middle of the fifth. On page two you should be able to see the big home run in the sixth, and to see who won and what the score was.

Doesn't that look like a pretty good summary of a baseball game at a glance? The rest requires a bit of explanation of course. Go to page 17 of the pdf and, sorry, rotate clockwise please. Now you can see the same game 1 in a very different way, and also game 2 on the same page. Incidentally, the cryptic second line of letters above the graph receives a bit of an explanation here. The letters refer to players, and they're indicated in this table. The numbers are for bases, with 0 added to mean player out and 4 to mean player scored a run. The letters are also used under the graph to indicate substitutions, this time with the numbers used to indicate fielding position (plus offensive codes 10 11 12 for DH PH PR as in retrosheet). The other codes under the graph will be explained later.

Let me proceed to my basic mathematical way of describing baseball. At the top of the header for each game, you can see the same names again and again: pa o br r ro xo nr lb. pa is plate appearances, a familiar baseball statistic, and o and r are outs and runs. A br is defined as a plate appearance in which no out is made off the final pitch. This may seem a curious definition, but it's called br because it adds one to the total quantity of baserunners. This basic system of variables is an accounting system for baserunners. The other four numbers are defined from the first four:

lb = pa - ( o + r )

xo = o + br - pa

ro = o - xo

nr = br - xo

I guess you should already know lb if you're a baseball algebra nerd - that's runners left on base at the ends of innings or games. In fact these equations are true for any continuous stretch of the game. The next curious concept is "extra outs", or outs on the bases - pickoffs and the like, and also the second out of a double play (and third of a triple). "Regular outs" are the rest, that is the normal first out of a plate appearance. Finally we have "net runners", the number of runners left after double plays and caught stealings have had their effect. The point is that these quantities are subject to a number of interesting continuity and categorization relations:

o = ro + xo

pa = o + r + lb (ultimately)

pa = ro + br (immediately)

br = r + lb + xo (ultimately)

nr = r + lb
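As a quick check of the bookkeeping, here is a minimal Python sketch (an illustration only, not part of the arctex code, which is Perl; the names `derive` and `check` are made up here). It computes the four derived quantities from pa, o, br and r and verifies the relations above for one made-up half-inning: single, double play, walk, home run, strikeout.

```python
def derive(pa, o, br, r):
    """Derive lb, xo, ro and nr from the four basic counts."""
    lb = pa - (o + r)      # runners left on base
    xo = o + br - pa       # extra outs (outs on the bases)
    ro = o - xo            # regular outs
    nr = br - xo           # net runners
    return {"lb": lb, "xo": xo, "ro": ro, "nr": nr}

def check(pa, o, br, r):
    """Verify the continuity and categorization relations for one stretch of play."""
    d = derive(pa, o, br, r)
    assert o == d["ro"] + d["xo"]
    assert pa == o + r + d["lb"]
    assert pa == d["ro"] + br
    assert br == r + d["lb"] + d["xo"]
    assert d["nr"] == r + d["lb"]
    return d

# 5 plate appearances, 3 outs, 3 batters who made no out, 2 runs
print(check(pa=5, o=3, br=3, r=2))   # {'lb': 0, 'xo': 1, 'ro': 2, 'nr': 2}
```

The one extra out is the runner erased on the double play, and both net runners scored on the home run.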

To drive home how these numbers can be an interesting summary of the basic mechanics of the game, look at these numbers from all of retrosheet up to 2019:

average pa 76.7 o 53.6 br 25.6 r 8.9

average ro 51.08 xo 2.56 nr 23.09 lb 14.19

average br/pa 0.33 pa/3o 4.29 r/3o 0.50 nr/br 0.90 lb/nr 0.61 br/r 2.88

winner pa 39.4 o 26.0 br 14.7 r 6.1

winner ro 24.73 xo 1.29 nr 13.39 lb 7.26

winner br/pa 0.37 pa/3o 4.54 r/3o 0.71 nr/br 0.91 lb/nr 0.54 br/r 2.40

loser pa 37.2 o 27.5 br 10.9 r 2.8

loser ro 26.24 xo 1.27 nr 9.66 lb 6.89

loser br/pa 0.29 pa/3o 4.05 r/3o 0.30 nr/br 0.88 lb/nr 0.71 br/r 3.95

o = ro 95.2% + xo 4.8%

pa = ro 66.6% + br 33.4%

br = lb 55.3% + r 34.7% + xo 10.0%

pa = o 69.9% (ro 66.6% + xo 3.3%) + lb 18.5% + r 11.6%

This also introduces the complication I referred to above - that darn automatic runner on second base. It certainly doesn't fit anywhere in that mess above. Fortunately I'm still researching how to best use these equations in my more advanced statistics, and the ones I have so far won't be much affected. But I do use this as a game summary, and having looked at these printouts for three years I like it for that purpose. My current thought is that a new quantity will be introduced, xr which are neither br nor nr. The new equations would look like (incidentally, this shows how inherited runners would be accounted for if this analysis was ever used as the basis of an advanced set of pitching metrics):

lb = pa + xr - ( o + r )

xo = o + br - pa (unchanged)

ro = o - xo (unchanged)

nr = br - xo (unchanged)

o = ro + xo (unchanged)

pa + xr = o + r + lb (ultimately)

pa = ro + br (immediately) (unchanged)

br + xr = r + lb + xo (ultimately)

nr + xr = r + lb

There are also the following inequalities.

o >= ro, xo

br >= nr, (r-xr), (lb-xr), (xo-xr)

nr >= (r-xr), (lb-xr)

pa >= br, ro, nr, (o-xr), (r-xr), (xo-xr), (lb-xr)

The new nr is a strange quantity which can be negative. This is annoying because nr was very closely correlated with another stat called +.2 ("plus point twos").
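To see how nr can go negative, here is a small Python sketch of the proposed xr bookkeeping (illustrative only; the function names are made up, and the two half-innings are invented examples rather than real games). It checks the revised equations and the inequalities listed above.

```python
def derive_xr(pa, o, br, r, xr=0):
    """Derived quantities when xr automatic runners are placed on base."""
    lb = pa + xr - (o + r)
    xo = o + br - pa
    ro = o - xo
    nr = br - xo
    return lb, xo, ro, nr

def check_xr(pa, o, br, r, xr=0):
    lb, xo, ro, nr = derive_xr(pa, o, br, r, xr)
    # equations
    assert o == ro + xo
    assert pa + xr == o + r + lb
    assert pa == ro + br
    assert br + xr == r + lb + xo
    assert nr + xr == r + lb
    # inequalities
    assert o >= max(ro, xo)
    assert br >= max(nr, r - xr, lb - xr, xo - xr)
    assert nr >= max(r - xr, lb - xr)
    assert pa >= max(br, ro, nr, o - xr, r - xr, xo - xr, lb - xr)
    return {"lb": lb, "xo": xo, "ro": ro, "nr": nr}

# Automatic runner is picked off, then two strikeouts: nr comes out as -1.
print(check_xr(pa=2, o=3, br=0, r=0, xr=1))
# Strikeout, single scoring the automatic runner, double play: nr is 0.
print(check_xr(pa=3, o=3, br=1, r=1, xr=1))
```

In the first case no batter reached base, yet one out was made on the bases, which is exactly the situation the old definitions could not produce.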

Now for one more easy feature to understand. The graph is a great summary of a game, but what about the series? Or a lot of series? After some pages of summary stats, and after the player career summaries on pp. 27-28, on page 29 you come to a whole-year summary. At the top is a display that baseball stats people should be able to interpret with a little snooping, except for ACR ACOR TAR (explained briefly later). Below that is a summary of the 1955 postseason (a little short by modern standards). The winning team of each series is identified on the left, followed by the losing team, and then there is a list of the games in the form of single decorated letters called the willow. Simply, w is a win for the series winner, and l is a loss. Uppercase letters indicate a home team victory and lowercase a home team loss. A high dot to the left means it was a complete game by the winning pitcher. A comma to the right indicates a walkoff, and the middle dot to the right indicates extra innings. A semicolon means extra inning walkoff. In the text file the CG is indicated by a ^ symbol to the left. You can see the entire history of the baseball postseason in this form:

The big willow 1903-2023
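The willow notation is simple enough to sketch in a few lines of Python. This is only an illustration of the rules described above, not the arctex code; the function name and arguments are invented, and using a middle dot for extra innings in the text form is an assumption (only the ^ for complete games is stated for the text file).

```python
def willow_mark(series_winner_won, home_team_won, complete_game=False,
                walkoff=False, extra_innings=False):
    """Return the decorated willow letter for one game, in text form."""
    if walkoff and not home_team_won:
        raise ValueError("a walkoff is always a home team win")
    letter = "w" if series_winner_won else "l"
    if home_team_won:
        letter = letter.upper()
    prefix = "^" if complete_game else ""   # CG by the winning pitcher
    if walkoff and extra_innings:
        suffix = ";"
    elif walkoff:
        suffix = ","
    elif extra_innings:
        suffix = "\u00b7"                    # middle dot
    else:
        suffix = ""
    return prefix + letter + suffix

# A made-up four-game series: road CG win, home win, extra-inning walkoff loss, home win
games = [
    willow_mark(True, False, complete_game=True),
    willow_mark(True, True),
    willow_mark(False, True, walkoff=True, extra_innings=True),
    willow_mark(True, True),
]
print(" ".join(games))   # ^w W L; W
```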

I think you can see a lot of interesting patterns at a glance there. Finally on page 28 you see the last part of the presented statistics - the rest of the file is more of my favorite series. This is a table of the 25 best hitters and pitchers in each league for that year according to my present metrics. This has just been updated with a ton of new data. It looks... a bit crowded now. But it's very informative. I don't have time yet to explain it very well. There is a new calculation of hitting and pitching MVPs which is very briefly summarized in this table. The column labelled "ru" is the runner-up number of the MVP award, with a * indicating the actual winner. ACR and ACOR are the basic hitting and pitching metrics - "average contribution to runs" and "to opponents' runs", so plus is good for hitting and minus for pitching. Basically those are averages of the numbers under the graph, for when that player is hitting or pitching. ACW is the new log-probability-ratio average stat (see below), which is good when positive.

Finally, what is that wavy graph all about? Ignore the thinner lines for the moment. The graph is based on expectation values or what are sometimes called run values. These are expectation values for runs to be scored in the remainder of the half-inning based on the number of outs and the occupation of each base. This is an old idea that has been around baseball for most of my lifetime, but I didn't hear about it until around 2017. I immediately set to making a graph like this to properly show what it "knew" about the game. There turn out to be some wrinkles in this, but there are good answers also. At the bottom of each graph there are a couple of lines of text. The bottom-most numerical line shows the difference in this expectation value (plus the actual score) for each play, and also the initial "free lift" at the start of each half-inning. The line of text above this "free lift" amount is a text identifier of the table of expectation values in use. As you look through the other games in the file, you can see that these vary quite a bit, although all are the same in that first game. This variation is the main part of the wrinkle, which I'll describe later.

The next thing to note is the rest of the line above the numbers. Each play has a code like 0p0010. These are the basis for calculating my statistics, and I translate retrosheet into a format based on these codes. They're called ERD codes (expected run difference) because they're used to calculate the run value for each play, as well as to calculate the ERV tables (expected run value) that are needed in the first place to calculate an ERD. The numbers are easier to understand, so at first simply note that the letter "p" means plate appearance, and "n" means not a plate appearance, i.e. a play on the bases on a pitch that does not complete a plate appearance. The first number is a digit to indicate the occupation of all 3 bases at the start of the play - an octal digit 0-7. 1 is first, 2 is second, 4 is third, and you add them together. The digit after the letter is the same for after the play. The next two numbers are the number of outs 0-3 before and after the play, and the final digit is the number of runs scored. If that all seems slightly redundant, you're right because in fact the runs digit is also a check digit which can be computed from the previous information. This explains the need to indicate p or n, because whether the batter counts in the sum changes the result. This is part of the extensive error-checking I had to do to parse retrosheet correctly. Their syntax is a little whiplash-inducing at times. Anyway, as you should see, the example code means the first batter of the inning was out. This is the most common play in baseball figured in this manner.
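Here is a minimal Python sketch of reading an ERD code and recomputing the check digit. It is an illustration and not the arctex parser; the function names are invented, and the check is assumed to be a simple conservation count (runners before, plus the batter on a plate appearance, must equal runners after plus outs made plus runs). The extra letters q, r, w and u from the next paragraph are treated like p and n here; automatic runners and the newer x/X codes are not handled.

```python
PA_LETTERS = {"p", "q", "w"}       # plate appearances
NON_PA_LETTERS = {"n", "r", "u"}   # plays on the bases

def parse_erd(code):
    """Split a six-character ERD code such as '0p0010' into its fields."""
    if len(code) != 6:
        raise ValueError("expected six characters: %r" % code)
    letter = code[1]
    if letter not in PA_LETTERS | NON_PA_LETTERS:
        raise ValueError("unknown play letter: %r" % letter)
    return {
        "bases_before": int(code[0], 8),   # octal digit: 1 first, 2 second, 4 third
        "is_pa": letter in PA_LETTERS,
        "bases_after": int(code[2], 8),
        "outs_before": int(code[3]),
        "outs_after": int(code[4]),
        "runs": int(code[5]),
    }

def check_digit(play):
    """Runs implied by the rest of the code (assumed conservation rule)."""
    runners_before = bin(play["bases_before"]).count("1")
    runners_after = bin(play["bases_after"]).count("1")
    outs_made = play["outs_after"] - play["outs_before"]
    return runners_before + int(play["is_pa"]) - runners_after - outs_made

def is_consistent(code):
    play = parse_erd(code)
    return check_digit(play) == play["runs"]

print(is_consistent("0p0010"))   # leadoff batter out: True
print(is_consistent("7p0004"))   # grand slam with nobody out: True
print(is_consistent("1n2000"))   # runner goes from first to second: True
```

Dropping the batter from the sum for a "p" code, or adding one for an "n" code, makes the count come out wrong by exactly one run, which is why the letter matters.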

A further complication is that there are more letters. "q" indicates a plate appearance in the bottom of the ninth or later inning, and "r" is the equivalent of "n" for that situation. "w" is a walkoff plate appearance, and "u" is the rare walkoff non-plate appearance to complete the set. Walkoffs are cool and all, but why is all this necessary? This has to do with an interesting wrinkle in the expectation value system and the rules of baseball. In an earlier version, I noticed a problem at the end of some particularly exciting games - sometimes the winning play had a negative run value, or ERD. In the bottom of the ninth in a tied game, with the bases loaded and nobody out, the batter hits a ball off the outfield wall that lodges in the outfielder's mouth or something. Normally four runs would score, but in this situation the umpires declare that the batter hit a single and one run scores. The expectation value naively expects the same thing as the kid in the stands who doesn't know that and wants to see the game continue, and the negative ERD is in effect its disappointment. The fix is to use a separate expectation value table for such situations. There are names for all of the various tables, and most of them are for different ballparks. This is how a "ballpark factor" is included in this system. The tables for each ballpark are rolled over every several years to update the stats, on an irregular schedule, indicated by a digit at the end. Finally to accommodate the game-ending scenario, there are tables called PW for potential walkoff. This is also followed by a digit, this time indicating the score difference at the start of the half-inning, from 0 to 4, where 4 is for 4 or higher. It turns out that all five of these PW tables are structured quite differently. Table 4 on the other hand is very similar to the all-time table for normal innings. 
All games throughout retrosheet history use the same PW tables regardless of year or ballpark, for the simple reason that this is required to collect an adequate statistical base for these tables. To give the worst example, the 4-0 entry (runner on 3rd nobody out) for PW3 has a denominator of 146 as of 2019 for all of retrosheet, which is pretty small.
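Picking the PW table is then a one-liner. The Python sketch below is illustrative only; the function name is made up, and it assumes the "score difference" is the number of runs the home team trails by when its half-inning starts (0 for a tie).

```python
def pw_table(home_deficit):
    """Name of the potential-walkoff table for a given deficit (assumed meaning)."""
    if home_deficit < 0:
        raise ValueError("the home team does not bat when it is ahead")
    return "PW%d" % min(home_deficit, 4)   # 4 stands for 4 or more

print(pw_table(0), pw_table(3), pw_table(7))   # PW0 PW3 PW4
```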

As I briefly described above, ACR and ACOR are the primary hitting and pitching stats. They can be calculated either for players or for teams, or indeed for any other thing you can assign plate appearances to. ACR is offensive oriented, so plus is better than minus, whereas ACOR is the opposite. In summaries for teams, TAR is ACR minus ACOR, a combination that's better when positive. The sign of TAR turns out to be a pretty good predictor of the sign of W-L on the season. The basic number is a per plate appearance average of the run value differences attributed to the unit, multiplied by a constant equal to the average number of plate appearances per game per team for either hitters or pitchers as appropriate. This adjustment makes the number easy to (slightly mis)understand as what this player or team does, in terms of runs per game, compared to the average (of all players who played in the same ballparks in those eras). In fact the way the "free lift" at the start of each half-inning is excluded has an effect where players far from the average (at 0) have slightly inflated values.
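In code, the basic calculation might look like the following Python sketch. It is not the arctex implementation: the names are invented, the ERD values are made up, and the scaling constant is simply assumed to be half of the 76.7 plate appearances per game quoted earlier on this page (the real constants may differ for hitters and pitchers).

```python
PA_PER_TEAM_GAME = 76.7 / 2   # assumed scaling constant, about 38.35

def average_contribution(erds, pa_count, pa_per_game=PA_PER_TEAM_GAME):
    """Per-plate-appearance average of the attributed ERDs, scaled to runs per game."""
    if pa_count == 0:
        return 0.0
    return sum(erds) / pa_count * pa_per_game

def tar(acr, acor):
    """Team summary: positive is good."""
    return acr - acor

# Made-up ERD values for four plate appearances by one hitter
hitter_erds = [0.40, -0.25, -0.25, 1.10]
acr = average_contribution(hitter_erds, pa_count=4)
print(round(acr, 2))
```

The same function serves for ACOR if it is fed the ERDs of the plays a pitcher was on the mound for, with the sign convention reversed when reading the result.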

I also have similar stats for baserunning and fielding, called BCR and FCOR, which can be seen in tables in the summaries. These are simply ERD averages for whenever a player is on base or in the field. The current versions are only calculated for the entire year and are somewhat crudely corrected to the team's overall offensive and defensive performance. They're still pretty interesting numbers, but in the update I will be correcting them with the hitter's offensive average and the pitcher's average on a per plate appearance basis, which should make them very accurate indeed. In fact I'm considering a set of telescoping corrections for fielding: hitting and pitching for all fielders, a catcher average correction for everyone else, and then an infield average correction for outfielders. That should make the numbers pop. The reason I want to go to all this trouble is that the existing numbers make it look like some very interesting stuff is going on here. Update: OK, I have the new numbers now. It took a while as there were some bugs in the corrections, but those are now fixed. The new FCOR numbers are generally smaller in magnitude than the old experimental ones (without per-plate-appearance corrections), so in a sense they kind of diffuse the interest a little. There are still some curiosities to be examined, and it will take a while to get used to the new fielding rankings.

In the per-game summaries, there is another statistic R which has a fractional value that can be positive or negative. This is the number of runs assigned to that player based on their total offensive performance for that game. For both pitchers and hitters on each side, they add up exactly to the total score, because they have the "free lift" value distributed among them. You might wonder why I need both ACR and R, but they fill different roles. ACR is better for long-term averages and R is better for looking at a single game's internal mechanics.

Speaking of which, there is a new pitching decision, indicated on the per-game player summary by a + or - next to the PA number. The details of this are complicated, and may change in the next update. I think it works pretty well, and has a freer decision tree than the standard, e.g. the winning pitcher is not required to pitch any specific length of time, nor does the timing of the scoring of runs by the pitcher's own team have any effect. On rare occasions the decision can be split, like when the decision devolves onto two relievers who both pitch a perfect inning. The basis of the decision is a comparison of values in terms of R/pa for both the individual pitcher and the team's offense for that entire game. There are marks beside the PA numbers to indicate which players qualify for the decision based on this measure. As you can see, it is possible to run the decision algorithm on hitters as well (slightly modified), and I display a result of this type. I'm considering splitting the hitting decision further using k-means clustering on (R PA BR), but the existing decision is not too shabby. As yet it is only displayed per-game and not tabulated, but that will change in the release. My current pitching decision is the same as the official one roughly 50% of the time. Update: the new software is now about half done, and one interesting result is here: the complete list of 30-game regular season winners (from end-2023 retrosheet). It's a short list. The numbers given below are wins, losses, and complete games on the year. Note the letter b there - Babe Ruth is the only 30-game winning batter in baseball history!

runs/1912total:p:RS:WS1:johnw102:Walter Johnson:31:10:34
runs/1912total:p:RS:BOS:woodj108:Smoky Joe Wood:33:5:35
runs/1915total:p:RS:PHI:alexg102:Pete Alexander:30:7:32
runs/1917total:p:RS:PHI:alexg102:Pete Alexander:30:13:34
runs/1920total:b:RS:NYA:ruthb101:Babe Ruth:31:3
runs/1934total:p:RS:SLN:deand102:Dizzy Dean:30:6:24
runs/1963total:p:RS:LAN:koufs101:Sandy Koufax:31:5:20
runs/1968total:p:RS:DET:mclad101:Denny McLain:32:7:28
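The lines above are colon-separated, so they are easy to pick apart. The Python sketch below is illustrative; only the last three numbers (wins, losses, complete games) are documented above, and the meaning given to the other fields (source file, p or b for pitcher or batter, RS for regular season, team code, player id, name) is a guess from their appearance.

```python
def parse_winner(line):
    """Parse one line of the 30-game winner list into a dict."""
    fields = line.strip().split(":")
    source, role, season_type, team, player_id, name = fields[:6]
    numbers = [int(x) for x in fields[6:]]
    return {
        "year": int(source.split("/")[1][:4]),   # e.g. 'runs/1912total' -> 1912
        "role": role,                            # 'p' or 'b'
        "season_type": season_type,
        "team": team,
        "player_id": player_id,
        "name": name,
        "wins": numbers[0],
        "losses": numbers[1],
        # batters have no complete-game count
        "complete_games": numbers[2] if len(numbers) > 2 else None,
    }

ruth = parse_winner("runs/1920total:b:RS:NYA:ruthb101:Babe Ruth:31:3")
print(ruth["year"], ruth["name"], ruth["wins"], ruth["losses"], ruth["complete_games"])
# 1920 Babe Ruth 31 3 None
```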

Finally we come to those thin lines on the graph, and also to the name arctEX. After I had got my run value graphs properly straightened out and printed out on fancy paper at a print shop, I decided they were a pretty canny summary of a baseball game. So I decided to make a probability estimator based on them. The thin black line shows the estimate (on a scale that's always 0-100%) of the home team's win probability. This is calculated according to the following equation:

P = (1/2) + arctan( (( X - Ao ) / Bo ) + Zg )/pi

where X is the difference in scores plus expectation values, in other words the distance on the graph between the light and dark thick lines. So the name is "arctangent of expectation values". The numbers Ao and Bo are indexed by the number of outs so far in the game o. These are calculated from the entire history of retrosheet by a dead-simple simulated annealer I open-coded in 16 lines of Perl, given below. The index numbers should go from 0 to 59, as extra innings are all considered a repeat of the 10th in this scheme. There is no 60th out because the game is considered over at that point, and the estimate is replaced with the result. Actually my current version tries to cut a further corner by only going to 53, but this doesn't quite work as the beginning of the 9th and 10th are not the same. Occasional small glitches will be seen in the graphs as a result of this, but that will be fixed. The number B tends to decrease as the game progresses, magnifying the effect of the score difference. A in effect tracks the shifting value of the home team advantage. I have done a preliminary check by binning the estimator's estimates in bins 0-10% 10-20% etc., and checking their a posteriori values, and they're basically accurate. This is a good mathematical estimator in that it is most accurate in the center of its range. In the tails it tends to overestimate the losing team's chances a little. This is to say it's good for using as an input to other statistics but it wouldn't be worth much to a gambler (I assume). The series included in the above pdf were chosen in part to show off the estimator in exciting and unusual circumstances.
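The estimator itself is a one-line formula. Here is a Python sketch of it (illustrative, not the arctex code): the function names are invented, X is assumed to be home minus visitor, the A and B values in the example are placeholders rather than fitted parameters, and the index function is one reading of "extra innings are all considered a repeat of the 10th".

```python
import math

def home_win_probability(x, a, b, zg=0.0):
    """P = 1/2 + arctan((X - A)/B + Zg) / pi"""
    return 0.5 + math.atan((x - a) / b + zg) / math.pi

def param_index(outs_so_far):
    """Index 0-59 into the A and B tables; outs 60 and up wrap into the 10th inning."""
    if outs_so_far < 60:
        return outs_so_far
    return 54 + (outs_so_far - 54) % 6

# Placeholder parameters: B shrinks late in the game, so the same lead counts for more.
early = home_win_probability(x=1.0, a=-0.1, b=3.0)
late = home_win_probability(x=1.0, a=-0.1, b=0.8)
print(round(early, 3), round(late, 3))
print(param_index(53), param_index(60), param_index(65), param_index(66))   # 53 54 59 54
```

The arctangent keeps the estimate strictly between 0 and 1 however lopsided X gets, which is what lets the thin line stay on a fixed 0-100% scale.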

The light thin line is an estimator for 2-3-2 format series, showing the probability for the series home advantage team to win (the same for all graphs in the series). This is very simply calculated by taking the series state to be (home advantage team wins, other team wins). Likelihoods to win the series are tabulated for all of these states, and the estimate is from using the game probability estimate to interpolate between two of these numbers. This raises an issue which explains the final term in the equation above, Zg - the game adjustment. The table of series win probabilities based on those state assignments implies game win probabilities via continuity equations. The game probability estimator has a default value for the start of the game (about 53%). In order to avoid glitches in the series estimate, the game start probability must be adjusted to the appropriate value calculated from what I call the 232 table. This is Zg, the starting probability adjustment which is a constant for a game. Its value naturally gets swamped by X by the end of a non-tie game. In fact this method of adjusting the initial probability works so well, I'm planning to calculate it for ballparks and use it to calculate home team win probability estimates for all regular season games (right now it's only postseason).
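The series estimate and the Zg adjustment can be sketched in Python as follows. This is a reconstruction under stated assumptions and not the arctex code: the interpolation is taken to be linear, the game probability passed to `series_estimate` is taken from the point of view of the series home advantage team (so it is 1 - P when that team is visiting), the series table values in the example are invented, and all function names are made up.

```python
import math

def estimate(x, a, b, zg=0.0):
    return 0.5 + math.atan((x - a) / b + zg) / math.pi

def series_estimate(p_game, s_if_win, s_if_loss):
    """Interpolate between the series probabilities of the two possible next states."""
    return p_game * s_if_win + (1.0 - p_game) * s_if_loss

def implied_game_probability(s_now, s_if_win, s_if_loss):
    """Game win probability that makes the table continuous at the start of a game."""
    return (s_now - s_if_loss) / (s_if_win - s_if_loss)

def solve_zg(p_start, x0, a0, b0):
    """Zg that makes the estimator equal p_start when the game starts with X = x0."""
    return math.tan(math.pi * (p_start - 0.5)) - (x0 - a0) / b0

# Invented table entries for the current state and its two successors
p_start = implied_game_probability(s_now=0.52, s_if_win=0.66, s_if_loss=0.37)
zg = solve_zg(p_start, x0=0.0, a0=-0.1, b0=3.0)
print(round(p_start, 3), round(estimate(0.0, -0.1, 3.0, zg), 3))
```

With Zg chosen this way the series line starts each game exactly at the tabulated value for the current state, so there is no jump between the end of one game and the start of the next.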

Having an estimator in hand, wouldn't it be cool to have a stat that showed the average game win probability at a pitcher's entrance, and at his exit? In fact mathematically you really just want to work with probability ratios, or a logarithmic proxy. This has an interesting wrinkle of its own. You want to take the log of the probability estimates and then make per plate appearance averages of the differences between these log values. But the estimator can go either way at the end of the game, which means the log goes infinite in one direction. But you have to include that infinite term in your average somehow. For this statistic, you need to identify the winning team and use probabilities for that team to win for all such stats computed for the game, negating the values for the losing side. But that means a live game has two ambiguous values of the stat! You may find this a bit confounding, but it is a mathematical reflection of reality. I think the more interested sort of fans would be able to understand the "player's contribution if they win/lose" numbers - the numbers should make sense. After all, the point of calculating these stats is that they zoom in on "what have you done to win this game/all games" like nothing else. Obviously I need to produce a demo of this. Update: which I have now done. These numbers look really interesting. They're labelled ACW in the summary tables (average contribution to wins). If they have a potential weakness, it's that they may in some cases be dominated by very large individual terms, more so than averages like ACR/ACOR. But arguably that concentration is part of what makes this stat unique and interesting.
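A rough Python sketch of the log-difference idea follows. It is one possible reading of the paragraph above and not the ACW code: the function names are invented, natural logs are assumed, and the probabilities are made up. Because the probabilities are always those of the eventual winner, the final value is 1 and every log stays finite.

```python
import math

def log_terms(winner_probs):
    """Per-play differences of log win probability, from the winner's point of view.

    winner_probs holds the winner's probability before the first play and
    after each play, ending at 1.0 when the game is decided.
    """
    return [math.log(after) - math.log(before)
            for before, after in zip(winner_probs, winner_probs[1:])]

def credit(term, player_on_winning_team):
    """Winning-side players get the term as is; losing-side players get it negated."""
    return term if player_on_winning_team else -term

probs = [0.47, 0.55, 0.40, 0.90, 1.0]    # made-up trajectory for the winner
terms = log_terms(probs)
print([round(t, 3) for t in terms])
# The terms telescope: their sum is -log of the starting probability.
print(round(sum(terms), 3), round(-math.log(0.47), 3))
```

The large individual terms mentioned above show up here too: the jump from 0.40 to 0.90 outweighs the other three plays combined.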

Simulated annealer included to show people it's not super complicated or anything - the code to compute an ERA or RBI is similar in length. The function sqerr simply uses the trial values of a and b (called $anx and $bnx here), together with the "game out" number $go (referred to above as o), to go through all situations represented in all of retrosheet with this many outs in the game, and to use the a and b numbers to produce estimates for all of those occasions and then to sum the squares of all the "errors", i.e. the difference between the estimate and either 1 or 0 depending on whether the home team eventually won or lost on that occasion (its code is basically two lines - the long part is turning the entire history into the data structure it uses to accelerate the computation). The annealer just tries random small changes and prefers values that fit the history better. It only takes a few minutes on a desktop computer to do the whole thing.

In the following code, the variables $N, $C, $T, $an, $bn, $en, $besta, $bestb, and $beste are initialized, and there is a loop over all values of $go, all omitted for clarity. The variables $besta and $bestb contain the result for this value of $go.

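use List::Util qw(min);         # min() is not a Perl builtin

for $n (1..$N) {
        $D = $C * (rand(1) - .5);       # random step, at most $C/2 either way
        if($n%2) {                      # odd steps move a, even steps move b
                $anx = $an + $D;
                $bnx = $bn;
        } else {
                $anx = $an;
                $bnx = $bn + $D;
        }
        $enx = sqerr($go, $anx, $bnx);  # summed squared error of the trial values
        $X = rand(1);
        $P = min(1, exp(($en - $enx) / $T) );   # an improvement is always accepted
        if($X <= $P) { $an = $anx; $bn = $bnx; $en = $enx; }
        if($enx < $beste) { $besta = $anx; $bestb = $bnx; $beste = $enx; }     # best seen so far
        $T *= .99;                      # cool down
}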
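For readers who would rather experiment in Python, here is a self-contained sketch of the same idea, including a stand-in for sqerr. It is not the arctex code: the data are synthetic (generated from known a and b values), the data structure is a plain list of (X, home team won) pairs for a single out number, and the step size, temperature and iteration count are arbitrary choices.

```python
import math
import random

def estimate(x, a, b):
    return 0.5 + math.atan((x - a) / b) / math.pi

def sqerr(situations, a, b):
    """Sum of squared differences between the estimate and the 1/0 outcome."""
    if b == 0:
        return float("inf")
    return sum((estimate(x, a, b) - won) ** 2 for x, won in situations)

def anneal(situations, a=0.0, b=1.0, n_steps=1000, step=0.5, temp=1.0, rng=random):
    err = sqerr(situations, a, b)
    best = (a, b, err)
    for n in range(1, n_steps + 1):
        d = step * (rng.random() - 0.5)
        a_new, b_new = (a + d, b) if n % 2 else (a, b + d)
        err_new = sqerr(situations, a_new, b_new)
        # accept improvements always, worse fits with probability exp(-increase/temp)
        if err_new <= err or rng.random() <= math.exp((err - err_new) / temp):
            a, b, err = a_new, b_new, err_new
        if err_new < best[2]:
            best = (a_new, b_new, err_new)
        temp *= 0.99
    return best

# Synthetic "history": outcomes drawn from the estimator with a = 0.3, b = 2.0
rng = random.Random(1)
situations = []
for _ in range(300):
    x = rng.uniform(-6.0, 6.0)
    situations.append((x, 1 if rng.random() < estimate(x, 0.3, 2.0) else 0))

start_err = sqerr(situations, 0.0, 1.0)
best_a, best_b, best_err = anneal(situations, rng=rng)
print(start_err, best_err, best_a, best_b)
```

The fitted values will not match 0.3 and 2.0 exactly, since the outcomes are random, but the best error is never worse than the starting error.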

OK, so hopefully in a few months there will be a major expansion of what's available here. The source code is all available under the GNU GPL. There will be a longer document describing the algorithms in more detail, and also the reasoning behind them. For example, the ensemble of statistics is intended to give the player only team-positive incentives. There will also be a substantial update to the code, to fix bugs and add necessary features. There will be some updates to show some more things in the charts and graphs. Some other random things on the to-do list for an even later update include a pitching calendar showing a team's pitchers' appearances in the last week of the season and the postseason, a probability estimator for (another) extra inning(s), a detailed decision tool for "is a bunt worth it", and a "lineup construction kit". But the goal for the next update is to produce a basic set of statistics that I think can last as a foundation (for a lasting hobby anyway).

Play ball! The arctex download is here. There is a script called run_everything which will do what it says, starting by downloading the Retrosheet big zip file, and finishing by generating the exact PDF that appears above, with all the analyses run in between. There are also some tools for looking at the stats in a number of ways, the most important of which is called bb-erd. This is an interactive command-line tool to query the ERV tables, calculate ERD code values for different ballparks, and to run the game and series probability estimators for live games. You have to run_everything first. bb-erd does at least have a help screen, which you can get by entering ? or h. To run the software, you need: bash perl wget unzip ps2pdf (from ghostscript), i.e. a normal Unix environment. Mac should probably work, but I haven't tried it. The download is around 100KB, but it expands to 5GB when you run_everything, which should take about an hour on a fast machine. Have fun! See below for new features and release notes.

Here's a quick guide to making PDF files with the software. Say you want a single PDF for the 2000 world series. I admit some of these could be automated a little better.

./guide-graph 2000 WS
./guide-roster 2000 WS
./guide-career 2000 WS|perl -C7 ./text2ps -c -b -s 8 -m 20 -T >guides/2000-WS-career.ps
./series-match -p 2000 >guides/2000-season.ps
./pscat -c guides/2000-WS-graph.ps guides/2000-WS-roster.ps guides/2000-WS-career.ps guides/2000-season.ps >2000-WS-unified.ps
ps2pdf 2000-WS-unified.ps
And that will give you a file called 2000-WS-unified.pdf. Other series than the world series can be got by substituting NLCS or whatever. Also you can put in a team code to get the files for regular season home games for that team.

The new features are here, and a lot of bug fixes! There is even a little preliminary software documentation included in the release. You can see the current download file has a bizarre sort of version number which looks like 24_6B2_82. This is a date in my calendar called Denarius Verus. You can find out all about that on my Home page. Don't worry, the software never uses dates like that - only the version number.

New features:

- A new series MVP is determined for every postseason series.

- Similar to the calculation of the yearly and series MVPs, an all-time "hall of fame" is introduced, ranking all players by a single combined metric.

- A new stat for games and series calculated basically as sqrt ( sum ( ACW^2 ) ). This is called X for excitement factor. It is tiny for early inning blowouts and huge for games with many late inning lead changes. Or X could be for expectation - it's essentially a confusion metric. When it's low, you know what comes next, and when it's high you don't. There is an all-time ranking of games and postseason series by X. Also there's now a display of X for the entire regular season for each team in the season summary.

X has been computed and its scale decided. In effect it goes from 1 to 100, although there isn't an upper limit in theory. The highest X game was BOS193808232 with a final score 12-14 and a walkoff grand slam, with an X of 85.1. The worst is CLE199010020 which was 0-9 after the first inning and finished 3-13, with an X of 3.2. The game with the most lead changes (7 - there's a tool for this) was MON200005140 which is #15 all time in X with 76.4. For series, the 1986 ALCS had an X of 92.6, whereas the 2013 NLWC game had only 5.7.

- As always, there are any number of minor graphical improvements, and some new things to display, like the length of each game.
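The excitement factor X from the list above is described as "basically" a root-sum-square, which is easy to sketch in Python. This is an illustration and not the arctex calculation: the function name is invented, the terms are assumed to be per-play ACW contributions, the example numbers are made up, and the scaling that puts real games on the roughly 1 to 100 range is unknown, so it is left as a parameter.

```python
import math

def excitement(acw_terms, scale=1.0):
    """Root-sum-square of the ACW terms, times an unspecified scale factor."""
    return scale * math.sqrt(sum(t * t for t in acw_terms))

blowout = [0.30, 0.05, 0.02, 0.01, 0.0, 0.0]      # decided early, nothing happens late
seesaw = [0.10, -0.20, 0.35, -0.60, 0.80, 0.90]   # the lead keeps changing hands
print(round(excitement(blowout), 3), round(excitement(seesaw), 3))
```

Squaring the terms means a few large swings count for much more than many small ones, which fits the description of blowouts scoring low and late lead changes scoring high.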

And yes, there are probably still bugs lingering. A bug-hunting effort is ongoing. Some days, the ball hits you.

24_6B2_82
run_everything:
        - winp-params is now distributed, so don't actually run the optimizer
guide-roster:
        - separate header for T/X/!
        - print 100 * X / game time as '!'
        - indicate pinch hit/run/field by dots:
                '.' for pinch hitter
                ',' for pinch runner
                center dot for fielding position after start
                ':' and ';' for combinations

24_6B1_84
sm-hof: reduce weight of FCOR by factor of 3 in hof (this will be documented later)

24_6B0_80
sm-hof:
        - add sqrt(pa) factor to move short-timers down the list a little
generate:
        - add version number to PDF

24_6B0_51
hwhps:  bug in sign of last ACW term in every game - yowch! (actually only home team losses).
	16 regular season MVP titles changed as a result, making this the new worst bug ever.

24_6A5_98
rs-notice: was omitted from dist - apologies for debugging the release process

24_6A5_72
winp-params:
	- update for new retrosheet release

24_6A5_52
bb-erd:
        - fix for plays-erd errors
generate:
        - add rs-notice to bb-post
        - add bb-post to sample.pdf
guide-roster:
        - adjust spacing of game headers
winp-params:
        - include file in distribution to initialize and stabilize ACW
rs-notice:
        - retrosheet copyright notice

24_6A4_92
Beware lots of new untested code could have bugs!
sm-hof:
        - new
        - hof
        - smvp
rs2erd:
        - add new xX codes for games arbitrarily ended early
bb-erd:
        - update for xX
erd-parse:
        - update for xX
erv-tab:
        - update for xX
generate:
        - update for xX
        - update for homewins -3
        - make games-by-x and series-by-x
        - update 'best' for new hw-stats format
        - make hof and smvp
run_everything:
        - update for homewins -3
        - make gacw/
        - make games-by-x and series-by-x
guide-graph:
        - update for xX
guide-roster:
        - update for xX
        - update for new hw-stats format
        - update for new run-stats format
        - display ACW and w-l per series and ACW per game
        - display game X and length in minutes (slightly hacky, could change a little)
        - incorporate series MVPs
        - add new hall of fame table
homewins:
        - update for xX
        - use 'all' erv table
        - skip ties entirely
        - fix for out 57 erv table
        - cleanups
        - add -3 option to improve performance (much faster now)
hps:
        - update for xX
hwhps:
        - update for xX
        - use 'all' erv table
        - skip ties entirely
        - fix for out 57 erv table
        - ACR n-play rules for hitting ACW
        - compute X
        - game stats in gacw/ dir
        - separate stats by season/series code
hwprob:
        - update for xX
        - use 'all' erv table
        - skip ties entirely
        - fix for out 57 erv table
mplhi:
        - update for xX
run-stats:
        - update for xX
        - final NPIT fix, affects 1 game.  run-stats is now verifiably correct for every game.
        - split decisions up by team and series in runs/*total
series-match:
        - update for xX
        - update for new hw-stats format
        - update for new run-stats format
career:
        - update for new hw-stats format
30wins:
        - update for new hw-stats format
viewdec:
        - update for new hw-stats format
hpmvp:
        - update for new hw-stats format
        - update for new run-stats format
documentation:
        - new (not too much there yet)

That's plenty for now! I will have to find time to do all of this, but I hope to have it done by the end of 2024 at the latest. Bugfix-only releases may happen sooner.


Home page