Ted Turocy, technical coordinator for the SABR Minor League Encyclopedia, has posted a request for volunteers to work on data contributed by Paul Porter:
Hi again all,
I'd like to put out a "revised" call for volunteers to help move forward the process of incorporating Paul's spreadsheets into the database.
I have been working on some tools to automate much of the ID-mapping work I was originally looking for help with. It turns out some pretty simple tests can identify around 85-90% of the players in a typical spreadsheet. Most of the rest are spelling variations or other ambiguities (two players with the same name on the same club, etc.).
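Ted's actual tests aren't described, so the following is only a hypothetical sketch of what a "pretty simple test" could look like: an exact lookup on (name, team, year). All field names are invented for illustration. A row is accepted only when exactly one database player fits; zero hits (e.g. a spelling variant) or several hits (e.g. two same-named players on one club) are set aside for a human.

```python
from collections import defaultdict

def match_players(rows, database):
    """Hypothetical exact-match pass over one spreadsheet.

    rows: dicts with assumed keys "name", "team", "year".
    database: dicts with assumed keys "player_id", "name", "team", "year".
    Returns (matched, unmatched): matched maps row index -> player_id,
    unmatched lists row indices that need human review.
    """
    # Index database players by a case-insensitive (name, team, year) key.
    index = defaultdict(list)
    for rec in database:
        index[(rec["name"].lower(), rec["team"], rec["year"])].append(rec["player_id"])

    matched, unmatched = {}, []
    for i, row in enumerate(rows):
        # .get() does not insert a key into the defaultdict.
        candidates = index.get((row["name"].lower(), row["team"], row["year"]), [])
        if len(candidates) == 1:
            matched[i] = candidates[0]
        else:
            # No hit (spelling variant) or ambiguous hit (same name, same club).
            unmatched.append(i)
    return matched, unmatched
```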
So, I'd like to break up the process into two parts. The first thing we need to do is restructure the team and league columns in the spreadsheets. I've just uploaded a file called "1903-stdclubs.xls", which illustrates the process using 1903.
What has been done:
(1) Create three columns for team names, Team1, Team2, Team3. For multi-club players, enter the teams he appeared for one per column. (There are very rare instances of 4-club players. If this happens, just leave the original club entry in Team1. I will process these specially, there are so few.)
(2) Make sure the team names match the names in the database. Some common divergences include things like "St Paul" (we use "St." with a period), "Ft Worth" (we always spell out "Fort"). Check the website if you're not sure.
(3) Expand the league abbreviation out to the full league name as we have it in the database.
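For anyone who would rather script the three steps above than do them by hand, here is a rough sketch. It assumes the original columns are called "Club" and "League" and that multi-club entries are separated by "/"; those column names, the separator, and the abbreviation table are all hypothetical, and the league names in particular must be checked against the database. A slash is used rather than a hyphen because some real club names (e.g. Wilkes-Barre) contain hyphens.

```python
import re

# Spelling conventions from step (2): "St" takes a period, "Ft" becomes "Fort".
TEAM_FIXES = [
    (re.compile(r"^St\.?\s+"), "St. "),
    (re.compile(r"^Ft\.?\s+"), "Fort "),
]

# Hypothetical abbreviation table for step (3); verify names against the database.
LEAGUE_NAMES = {
    "AA": "American Association",
    "PCL": "Pacific Coast League",
}

def normalize_team(name):
    """Apply the spelling fixes to a single club name."""
    name = name.strip()
    for pattern, replacement in TEAM_FIXES:
        name = pattern.sub(replacement, name)
    return name

def restructure(row, sep="/"):
    """Split the combined club entry into Team1-Team3 and expand the league.

    Players with more than three clubs keep the original entry in Team1,
    as step (1) asks, so they can be processed specially later.
    """
    clubs = [c for c in row["Club"].split(sep) if c.strip()]
    out = dict(row)
    if len(clubs) > 3:
        teams = [row["Club"], "", ""]
    else:
        teams = [normalize_team(c) for c in clubs] + [""] * (3 - len(clubs))
    out["Team1"], out["Team2"], out["Team3"] = teams
    # Leave unknown abbreviations untouched so they are easy to spot.
    out["League"] = LEAGUE_NAMES.get(row["League"], row["League"])
    return out
```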
You should feel free to sort the spreadsheet if it helps; just make sure that if you do, your selection includes all the rows and columns. Some spreadsheets have column C blank, which can cause the spreadsheet to select only part of the data and scramble the rows. (By the way, if this does happen, my programs will freak out later on, so we will know something went wrong. Don't be afraid that a bad sort will rewrite history!)
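The point about selecting every column is that each row must move as a unit. In a script this comes for free if each row is kept as one record, as in this small sketch (the column names are hypothetical):

```python
def sort_full_rows(rows, *keys):
    """Sort whole records by the given columns.

    Each row is a dict, so every column travels with its row: the
    programmatic equivalent of selecting all rows and columns before sorting.
    """
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys))
```

For example, `sort_full_rows(rows, "Team1", "Name")` orders players by club and then by name without ever separating a name from its team.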
I am going to be traveling for the next few days. If you're interested in doing this, feel free to just post to the list which years you've taken. If you're any good with spreadsheets, you can probably manipulate one year in 15-30 minutes at most. Once you've finished, send the spreadsheets to me off-list; otherwise we will use up most of our 100 MB limit in the Files section pretty quickly.
Once this is done, I will run the spreadsheets against the database to map players. Then the next task will be for a human to look at the unmatched players and determine why they failed to match. These should be a small fraction of the total players in the spreadsheets, which is much easier than having to do the whole shebang.
Ted Turocy
Technical Coordinator, SABR Minor League Encyclopedia