Useful tips for the data-oriented researcher

There was a nice post yesterday on the Impact of Social Sciences blog by Carly Strasser on data management for the data-oriented researcher:

The post is written primarily for social scientists, particularly those pursuing an academic career, but many of its points apply to work with any kind of data, including sports data, especially:

1. Learn to code in some language. Any language.
2. Stop using Excel. Or at least stop ONLY using Excel.
3. Learn about how to properly care for your data.
4. Write a data management plan.
5. Read Reinventing Discovery by Michael Nielsen.
6. Learn version control.

8. Let everyone watch.

These echo closely the core values of Chadwick Bureau, and how we go about producing and maintaining quality datasets. We hope to come back to some of these points in products and projects we have planned for 2015, so do stay tuned.

Baseball-reference adds winter league stats, Cuban stats

Baseball-reference has added winter league and Arizona Fall League stats, provided by Chadwick Bureau:

In addition, historical Cuban National Series stats are up, thanks to Brian Cartwright; we are pleased to have been part of facilitating their publication.

Chadwick Persons Register, release 2015-01-30

A new release of the Chadwick Persons Register is now available.

In addition to the usual additions and demographic updates for historical players, this release confirms the Retrosheet IDs issued for new debutants in 2014.

We expect there will likely be one more iteration of the public register before or right around Opening Day.

Chadwick tools 0.6.4 released

We have released version 0.6.4 of the Chadwick tools for manipulating play-by-play and game-level data.

This release adds support for the new umpire and manager review flags (/UREV and /MREV) which appear in the 2014 Retrosheet release, as well as improving support for courtesy runners and player re-entry.
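As a rough illustration of what these flags look like to someone reading the event files directly, the sketch below pulls the slash-separated modifiers out of a Retrosheet-style event string and checks for the two review flags. This is not how the Chadwick tools parse events. The function names and the sample event strings are made up for the example, and it assumes modifiers are the "/"-separated parts that come before any "." advance section.

```python
# Review flags named in the release notes above.
REVIEW_FLAGS = {"UREV", "MREV"}


def event_modifiers(event_text):
    """Return the '/'-separated modifiers of an event string.

    Assumption: anything after the first '.' is the runner-advance
    section and carries no modifiers of interest here.
    """
    description = event_text.split(".", 1)[0]
    parts = description.split("/")
    # parts[0] is the basic play; the rest are modifiers.
    return parts[1:]


def review_flags(event_text):
    """Return the review flags (UREV/MREV) present in an event string."""
    return [m for m in event_modifiers(event_text) if m in REVIEW_FLAGS]


if __name__ == "__main__":
    # Hypothetical event strings, for illustration only.
    for text in ["S7/L/MREV.1-3", "64(1)3/GDP", "K/UREV"]:
        print(text, review_flags(text))
```

A real parser has to handle much more of the event syntax; the point here is only that the flags travel as ordinary modifiers on the play.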

As usual, full source code is available as well as pre-built binaries for Windows users.

git repository of Retrosheet data updated

We have now updated our git repository containing all the downloadable Retrosheet files at

Advantages of using this repository include:

  1. You can download all the Retrosheet files in one go, rather than individually downloading each archive from the Retrosheet site.
  2. You can see what changed in the files with each release, since the repository carries several years of history you can look back at.
  3. In the master branch, we also maintain some patches to the Retrosheet files to correct minor formatting or other data errors.
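For readers who would rather script this than type git commands by hand, here is a minimal sketch of the second point: listing the history of a file and summarising what changed between two revisions. It assumes git is installed and that you already have a local clone; the helper names, the clone directory, and the file path in the usage note are placeholders invented for this example, not part of the repository.

```python
import subprocess


def history_command(path=None):
    """Build a 'git log' command, optionally limited to one file."""
    cmd = ["git", "log", "--oneline"]
    if path:
        cmd += ["--", path]
    return cmd


def diff_command(old_rev, new_rev, path=None):
    """Build a 'git diff --stat' command between two revisions."""
    cmd = ["git", "diff", "--stat", f"{old_rev}..{new_rev}"]
    if path:
        cmd += ["--", path]
    return cmd


def run_git(repo_dir, cmd):
    """Run a git command inside a local clone and return its output."""
    result = subprocess.run(
        cmd, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, run_git("path/to/clone", diff_command("HEAD~1", "HEAD")) would summarise the files touched by the most recent commit, and history_command("some/file") builds the command that lists every commit touching that file.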


Chadwick Persons Register updated 2014-11-03

With the World Series closing out the (North American summer) seasons, we have just posted a new version of the Chadwick Persons Register.

This includes all players to appear in North American affiliated leagues, North American independent leagues, NPB, and the KBO in 2014, as well as identifier cross-references where known.

It also includes provisional baseball-databank IDs for players who made their MLB debuts in 2014. We will post an update over the off-season once Retrosheet identifiers have been confirmed for the debutants.

As always, enjoy!

KBO stats on baseball-reference

We’re very pleased to have collaborated with, Brian Cartwright, and Patrick Bourgo and SABR’s Korea chapter, to help bring a first version of Korea Baseball Organization stats to the bb-ref website.

Chadwick Persons Register updated 2014-03-22; now with college players

To celebrate Opening Day, we have just posted a new version of the Chadwick Persons Register.

New inclusions:

  • An ID column for players on the Korea Baseball Organization site.
  • Cuban National Series players from the 2013/4 season.

We’d like to acknowledge the kind assistance of Brian Cartwright in improving these two areas.

Our other new development is that the Register will now start to carry identifications for collegiate players and coaches as well. In this version, there are IDs for all players who appeared in an NCAA Division I game in 2013, as well as selected others. In addition, collegiate playing spans are provided for MLB players, as well as for other players where known. (For non-MLBers, these are extremely sparse.) Metadata for most collegiate-only players is also very limited at the moment. College identifications are an area we hope to develop gradually over the coming years.

Enjoy, and welcome to 2014!


2014 Spring Training play-by-play updated daily

The Bureau is pleased to make available to the community a daily build of play-by-play data from Spring Training 2014.  Visit our data page for details and the download link.

Chadwick Persons Register updated 2014-03-04

We have just posted a new version of the Chadwick Persons Register.  In addition to various demographic updates:

  • Records for Cuban National Series players are now complete back to the 2006-2007 campaign.  (Records for players who debuted in the current campaign will be added later this spring.)
  • Entries for players in the 2013/4 fall and winter leagues (Arizona Fall, Australian, Dominican Winter, Mexican Pacific, Roberto Clemente/Puerto Rican, and Venezuelan Winter) have been added.