There was a nice post yesterday on the Impact of Social Sciences blog by Carly Strasser on data management for data-oriented research.
The OP is written primarily for social scientists, and primarily those pursuing an academic career, but many of the points apply to doing work with any kind of data, including sports data, especially:
1. Learn to code in some language. Any language.
2. Stop using Excel. Or at least stop ONLY using Excel.
3. Learn about how to properly care for your data.
4. Write a data management plan.
5. Read Reinventing Discovery by Michael Nielsen.
6. Learn version control.
8. Let everyone watch.
These echo closely the core values of Chadwick Bureau, and how we go about producing and maintaining quality datasets. We hope to come back to some of these points in products and projects we have planned for 2015, so do stay tuned.
Baseball-reference.com has added winter league and Arizona Fall League stats, provided by Chadwick Bureau.
In addition, historical Cuban National Series stats are up, thanks to Brian Cartwright; we are pleased to have been part of facilitating their publication.
A new release of the Chadwick Persons Register is now available.
In addition to the usual additions and demographic updates for historical players, this release confirms the Retrosheet IDs issued to new debutants in 2014.
We expect there will likely be one more iteration of the public register before or right around Opening Day.
We have released version 0.6.4 of the Chadwick tools for manipulating play-by-play and game-level data.
This release adds support for the new umpire and manager review flags (/UREV and /MREV) which appear in the 2014 Retrosheet release, as well as improving support for courtesy runners and player re-entry.
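For readers who work with the raw event strings directly, here is a minimal sketch of spotting the review flags in Python. It is not how the Chadwick tools do it internally; it simply assumes the flags show up as slash-delimited modifiers in the play description, ahead of any runner advances. The function names and the sample event strings are ours, made up for illustration.

```python
# Hypothetical helper names; the flag spellings come from the release note above.
REVIEW_FLAGS = {"UREV": "umpire review", "MREV": "manager review"}


def event_modifiers(event):
    """Return the slash-delimited modifiers of a play description."""
    # Runner advances follow the first "."; drop them before splitting.
    description = event.split(".", 1)[0]
    # The first piece is the basic play; everything after it is a modifier.
    return description.split("/")[1:]


def review_flags(event):
    """Return a label for each review flag found among the modifiers."""
    return [REVIEW_FLAGS[m] for m in event_modifiers(event) if m in REVIEW_FLAGS]


if __name__ == "__main__":
    # Illustrative event strings, not taken from any actual game file.
    for sample in ("S7/L/MREV", "64(1)3/GDP/UREV.2-3", "S8/L.1-3"):
        print(sample, review_flags(sample))
```

Splitting off the advances first keeps slashes inside advance annotations from being mistaken for modifiers.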
As usual, full source code is available as well as pre-built binaries for Windows users.
We have now updated our git repository containing all the downloadable Retrosheet files at https://github.com/chadwickbureau/retrosheet.
Advantages of using this repository include:
- You can download all the Retrosheet files in one go, rather than individually downloading each archive from the Retrosheet site.
- You can see what changed in the files with each release, since the repository holds several years of history to look back through.
- In the master branch, we also maintain some patches to the Retrosheet files to correct minor formatting or other data errors.
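Once you have a local copy of the files, a few lines of Python are enough to start pulling plays out of an event file. The sketch below assumes the standard Retrosheet play record layout (record type, inning, side, batter ID, count, pitches, event) and takes the file contents as a string, so it makes no assumption about how the repository is laid out. The function name and the sample records are invented for illustration.

```python
import csv
import io


def play_records(event_text):
    """Yield (inning, is_home, batter_id, event) for each play record."""
    for row in csv.reader(io.StringIO(event_text)):
        # Skip id, info, start, sub, com, data and any malformed lines.
        if len(row) >= 7 and row[0] == "play":
            # Layout: play,inning,side,batter,count,pitches,event
            _, inning, side, batter, _count, _pitches, event = row[:7]
            # Side is "0" for the visiting team and "1" for the home team.
            yield int(inning), side == "1", batter, event


# Made-up sample records with placeholder IDs, not real game data.
SAMPLE = """id,XXX201404010
info,visteam,AAA
play,1,0,batr0001,12,CBFX,S8/L
play,1,1,batr0002,00,X,64(1)3/GDP
"""

if __name__ == "__main__":
    for record in play_records(SAMPLE):
        print(record)
```

For anything beyond quick inspection, the Chadwick tools are the better route, since they handle substitutions, runner tracking, and the other bookkeeping this sketch ignores.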
With the World Series closing out the (North American summer) seasons, we have just posted a new version of the Chadwick Persons Register.
This includes all players to appear in North American affiliated leagues, North American independent leagues, NPB, and the KBO in 2014, as well as identifier cross-references where known.
It also includes provisional baseball-databank IDs for players who made their MLB debuts in 2014. We will post an update over the off-season once Retrosheet identifiers have been confirmed for the debutants.
As always, enjoy!
We’re very pleased to have collaborated with baseball-reference.com, Brian Cartwright, and Patrick Bourgo and SABR’s Korea chapter to help bring a first version of Korea Baseball Organization stats to the bb-ref website.
To celebrate Opening Day, we have just posted a new version of the Chadwick Persons Register. New in this version:
- An ID column for players on the Korea Baseball Organization site
- Cuban National Series players from the 2013/4 season
We’d like to acknowledge the kind assistance of Brian Cartwright in improving these two areas.
Our other new development is that the Register will now start to carry identifications for collegiate players and coaches as well. In this version, there are IDs for all players who appeared in an NCAA Division I game in 2013, as well as selected others. In addition, collegiate playing spans are provided for MLB players, as well as for other players where known. (For non-MLBers, these are extremely sparse.) Metadata for most collegiate-only players is also very limited at the moment. College identification is an area we hope to develop gradually over the coming years.
Enjoy, and welcome to 2014!
The Bureau is pleased to make available to the community a daily build of play-by-play data from Spring Training 2014. Visit our data page for details and the download link.
We have just posted a new version of the Chadwick Persons Register. In addition to various demographic updates:
- Records for Cuban National Series players are now complete back to the 2006-2007 campaign. (Records for players who debuted in the current campaign will be added later this spring.)
- Entries for players in the 2013/4 fall and winter leagues (Arizona Fall, Australian, Dominican Winter, Mexican Pacific, Roberto Clemente/Puerto Rican, and Venezuelan Winter) have been added.