Baseball for the Thinking Fan

Login | Register | Feedback

btf_logo
You are here > Home > Hall of Merit > Discussion
Hall of Merit
— A Look at Baseball's All-Time Best

Thursday, December 18, 2008

Hall of Merit Plaque Room Website

Hey everyone, I wanted to share some news and also ask for some help.

Shock and TangoTiger (Awe was unavailable) have offered to help with building a website for the Hall of Merit’s plaque room. Shock is going to take the lead on designing the site and TangoTiger is going to take the lead on the back end database. I’ve reserved the domain name www.hallofmerit.com for this.

We’ll need some help populating the database if you have some spare time and want to help out, that would be great.

What I’m envisioning is something where you could search on a voter’s history, a player’s ballot history, a particular year’s election results, etc.. I’d like to have a page for every player who has ever received a vote, and and every voter who has ever voted. We could also have things like DanR’s WARP; Chris Cobb’s Negro League translations, my pitcher WARP, projected military service credit, links back to their discussion pages, etc..

I’ll leave it Shock and TangoTiger to explain what will specifically be needed, etc.. I’m really excited about this, and cannot thank them enough for offering to work on this.

Joey Numbaz (Scruff) Posted: December 18, 2008 at 04:47 PM | 224 comment(s) Login to Bookmark
  Related News:

Reader Comments and Retorts

Go to end of page

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

Page 1 of 3 pages  1 2 3 > 
   1. Joey Numbaz (Scruff) Posted: December 18, 2008 at 04:59 PM (#3032580)
Another great thing about this is that the back end database will be quite a nifty research tool, once it's built.
   2. Tango Posted: December 18, 2008 at 05:04 PM (#3032587)
Here's what the database will look like:

Voters
- voter_id
- voter_name
- voter_??
- voter_??

What other information do you want to capture for each voter? Note: if you can get the information from the other tables (like when he first voted, or how often he voted, etc, then we do NOT want to capture it).

Years
- year_id
- electee_count

Players
- player_id
- player_name
- player_bio (large text field)

No other player information is needed, as we can then link to the BDB/Lahman database to get playing and other bio data. Indeed, you can provide links to Baseball-Reference.com or Retrosheet.org or BaseballProspectus.com player pages.

You can, as an add-on, provide career stats based on the BDB database. For example:
Player_Voter_Evaluations
- player_id
- voter_id
- score_value

And

Player_Voter_Year_Evaluations
- player_id
- voter_id
- season_id
- score_value

So, Joe can provide say a career "wins" total for any (or all) players, and we'll find his data in the Player_Voter_Evaluations table. Dan can do similarly on an annual basis and provide it here: Player_Voter_Year_Evaluations . Other voters can do likewise. And if that above structure doesn't work for them, then we'll build an add-on table to support them.

Ballots
- voter_id
- year_id
- player_id
- ballot_rank
- voter_comment
- selection_rejected_switch

This last table is basically the one that links all the other tables. You can, at its best, be able to link this table to all the other tables, and if someone asks:

"Give me the Joe Dimino evaluation score for each player on the 1998 ballot, along with all the bio commentary for those players, and how many times each player has appeared on that ballot, and what was his highest ranking until that point, and when was he ultimately elected, if he was. And only give me for those players that John voted on."

Then with a bit of SQL, you can get your answer.

If there are other pieces of data that voters would like to capture, then what needs to be done is it ITEMIZE EVERY single piece of data that you want to record, and then we can construct a design to hold that data.

So, what needs to happen is for each voter out there to list the data you'd like to record. You should not, at this moment, consider "lists" or "totals" or "reports" or the like. Think of it like instructions for building that table: itemize all the tools and materials you need. LATER, we'll worry about how we're going to put these things together.

Tom
   3. Joey Numbaz (Scruff) Posted: December 18, 2008 at 05:29 PM (#3032617)
OCFs consensus scores will be another piece of data we can add for each election for each voter.
   4. Joey Numbaz (Scruff) Posted: December 18, 2008 at 05:31 PM (#3032619)
Although, I suppose the consensus scores can just be calculated by the DB, based on the ballot info, so they probably don't need to be captured.
   5. Tango Posted: December 18, 2008 at 05:37 PM (#3032629)
I don't know what "OCF" means, but if it can be calculated, then we definitely don't need to discuss it for the database.

Those discussions can happen afterwards, when discussing reports and anything else that can be handled with instructions or algorithms.

Right now, we are only worrying about "atomic" things (things that are so basic that you can't divide it and can't be derivable from other things).
   6. Joey Numbaz (Scruff) Posted: December 18, 2008 at 05:41 PM (#3032632)
OCF is a contributor to the Hall of Merit that calculates consensus scores for each voter after each election.
   7. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 18, 2008 at 05:53 PM (#3032652)
This is, like, RIDICULOUSLY awesome. I am uber psyched. Please let me know how I can be of use. And Tango, while we have you here, could you weigh in on the Shortstops thread (linked from the right of this page) where your analysis was requested?
   8. Tango Posted: December 18, 2008 at 05:55 PM (#3032657)
Ok, if it's a straight calculation, something that has finite rules, and requires no manual or human input, then, correct, we don't need to capture this information in the database, and will instead be something that we can generate from the data itself.

***

Is there anything else about the voters or ballots that need to be captured? Do you want to know the voter's birth year, or favorite team? Do you want to know if his expertise is history, sabermetrics, or knitting?

For the Years table, I think you'll want to add:
election_date

Try to go through some of these election ballot threads, and see what kind of data voters are providing about players or ballots, or what kind of information you are providing.

I just thought of another. For the Players table:
bio_url
plaque_url

For the Years table
discussion_url
results_url

Again, just think about anything that you can capture, so that all reference information is here.
   9. OCF Posted: December 18, 2008 at 05:59 PM (#3032666)
Hi, Tango. You and I exchanged emails about this a few days ago, but since the email was in my real name, the screen name might not have stuck in your head. What I have is incomplete, (only going back to 1923 or 1921) but I'll eventually send you something you can use. I suspect that Howie Menckel has some stuff that probably complements what I have to some degree.

I'll be able to get you specific votes by specific voters but only back to the early 20's. Does someone else have that data for the early years of the project?
   10.  Hey Gurl Posted: December 18, 2008 at 06:02 PM (#3032671)
Good news everyone!

I just spent 20 minutes attempting to drive to work and getting about 10 feet, so I guess I'll stay home today and work on this. ;-)

I can also do any database stuff required, and what have you.

I was thinking we have a players table and a ballots table, and a bridge table so we can store the balloting history of every player. eg.

Players
* player_id
- name

Ballots
* year
- 1stPlayer
- 1stPoints
- 2ndPlayer
- 2ndPoints
etc.

Player_Ballots
* player_id
* year
- 1stPlaceVotes
- 2ndPlaceVotes
...etc...

I am just thinking out loud here.

The one thing I cannot really do, by the way, is graphics. So if anyone out there wants to make logos, banners and what have you, go nuts.

This is going to be so cool. :-)
   11. Joey Numbaz (Scruff) Posted: December 18, 2008 at 06:03 PM (#3032673)
OCF - if we can't pull it from the threads here, due to the post garbling that happened on the changeover a few years ago; we can probably get them from archive.com.

I know John Murphy went through there and rebuilt many previously destroyed posts, but I don't know if he wasn't able to find them all, or didn't have the time to find them all. If it's the later, I assume we should be OK.

Another thing that could be calculated from the individual voter ballots is an 'as voted' personal Hall of Merit, especially for voters that have participated in every election.
   12. rawagman Posted: December 18, 2008 at 06:21 PM (#3032697)
Joe - I know my last attempt to help bring a website to fruition didn't work out so well, but I'm happy to help wherever I can
   13. Tango Posted: December 18, 2008 at 06:54 PM (#3032741)
Shock/10: your post seems to be in similar spirit to my post 2.
- Your Players table is a repeat of mine.

- Your Ballots post is a cumulative table of all voters results. If we do not have voter history for a particular ballot year, for whatever reason, then we need a table to handle that, similar to what you are stating. Is it the case that we have results, but not individual voter tallies for some years?

- Your Player_Ballots has a similar issue to the Ballots table.
   14.  Hey Gurl Posted: December 18, 2008 at 07:10 PM (#3032773)
You're right, Tango.

Is it the case that we have results, but not individual voter tallies for some years?


This was my impression. But I suppose if we have individual results for every year then there's no need for a ballots table as the voters table would have all the necessary data.
   15. Tango Posted: December 18, 2008 at 07:38 PM (#3032852)
I just want to highlight this part, if everyone out there wants one takeaway from my posts:
Right now, we are only worrying about "atomic" things (things that are so basic that you can't divide it and can't be derivable from other things).


We're only interested in the periodic table. Molecules, figuring out if you want H2O or CO2, those are derivative of the periodic table, and we're not interested for periodic table purposes. That's for later, for all the Dr. Frankenstein's out there.
   16.  Hey Gurl Posted: December 18, 2008 at 07:40 PM (#3032856)

I just thought of another. For the Players table:
bio_url
plaque_url

For the Years table
discussion_url
results_url


discussion_url should be in the players table as well.

The discussions about individual players is my absolute favorite part of the HOM. I can't tell you guys how much I've learned about players from before my time from here!
   17. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 18, 2008 at 07:42 PM (#3032861)
Needless to say, I'd be happy to put my WARP archive up on the site. Is there any chance it could spit out web pages with my stats, like baseball-reference or BP's DT cards?
   18. Tango Posted: December 18, 2008 at 07:46 PM (#3032873)
If the data is there, we can make it spit out anything we want.

Just as an example, when you look at the Fans Scouting Report, all that is is about 10 columns of data from each player, with some calculation behind the scenes.

So, if you have basic data like that, then it's a snap to generate web pages for them.
   19. Tango Posted: December 18, 2008 at 07:49 PM (#3032881)
Shock: ok, updated my file.

Dan: if you have an example of the kind of data (or formula) that you would supply, that would be helpful.
   20.  Hey Gurl Posted: December 18, 2008 at 07:58 PM (#3032905)
What Tango said.

Building reports, etc. is the easy part. The only reason I haven't made the site yet is because I haven't had the data. :-)

I made a quick little er diagram just to make sure we're on teh same page:

http://img120.imageshack.us/my.php?image=homdbsv0.jpg[/img]]Diagram

...yes?
   21. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 18, 2008 at 07:58 PM (#3032906)
Tango, I'll email you my archive now.
   22.  Hey Gurl Posted: December 18, 2008 at 07:59 PM (#3032907)
Bah mangled:

Diagram
   23. Tango Posted: December 18, 2008 at 08:19 PM (#3032932)
Shock: same page.

***

Dan supplied me with his files. So, I would create a table called Rosenhack or Dan or whatever, with these columns:
YEAR
Team
Lg
Player
Age
Pos
SFrac
BWAA1
BRWAA1
FWAA1
Rep1
WARP1
WARP1/Yr
LgAdj
BWAA2
BRWAA2
FWAA2
Rep2
WARP2
WARP2/Yr
PennAdd
Salary

(Though I wouldn't use the /)

Since he has multiple versions of his data, I would include a column called "Version", as he has three different versions (so far), all showing the same data.

You'd like a DanVersions table, so that we can show what each version is:
- version_Id
- version_comment

We'd need an additional table called DanBDBTeams, so that we can link Dan's teamid to the BDB team ID, since Dan uses a different team id.
- dan_team_id
- bdb_team_id

He also has a fielding table, so we'd have DanFielding with these columns:
Year
Team
FWAA1
NRA-DERA Adjustment

All this is pretty straightforward.
   24. Tango Posted: December 18, 2008 at 08:21 PM (#3032936)
Let's put a deadline of say Mon, Dec 22, for people to offer their inputs as to what data we should be recording. Then we can come up with an initial working design before the end of the year.
   25. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 18, 2008 at 08:27 PM (#3032944)
Rosen-HECK, not hack! I know, I'm a journalist and it's a tempting pun, but please, some decorum here! :)
   26. Joey Numbaz (Scruff) Posted: December 18, 2008 at 08:29 PM (#3032948)
Tango I already have Dan's stuff in an Access DB if that will save you some time . . . I've added some of the war credit seasons. As far as I know his data has seasons played during the war downward adjusted, but does not have info based on his formulas for estimating war credit for players that served in the military.

I will also send you my pitcher spreadsheet.
   27. Joey Numbaz (Scruff) Posted: December 18, 2008 at 08:30 PM (#3032952)
Let's put a deadline of say Mon, Dec 22, for people to offer their inputs as to what data we should be recording. Then we can come up with an initial working design before the end of the year.


One thing to note, I will have limited internet access from the afternoon of 12/24 through the morning of January 5. I'm going to be traveling, visiting family and friends back east, etc..
   28. Tango Posted: December 18, 2008 at 08:37 PM (#3032965)
Joe,

One good thing about Access is that it lets you "link" databases, rather than forcing you to import them. We could easily have an HOM database, and a RosenhEck database, and then just link the two. The user won't know the difference, and it will make life VERY EASY whenever Dan has updates. You simply delete the old database, and upload the new, all the while never touching the HOM database, as it'll simply link in to the new one automatically.

This is preferable, as you can also do the same with the BDB database, simply linking in to the MASTER, BATTING, PITCHING, FIELDING and whatever other tables are out there. And when the BDB gets updated, you don't do a thing to HOM.
   29. Tango Posted: December 18, 2008 at 08:38 PM (#3032969)
So, you can send me the Dan database(s), and when I create the HOM, you'll see how snazzy it'll be.
   30. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 18, 2008 at 08:46 PM (#3032983)
Tango Posted: December 18, 2008 at 03:38 PM (#3032969)
when I create the HOM


Umm...we all know you're a miracle worker, Tango, but I think you're about six years late to the party here. :) Shame you never voted, by the way.
   31. Chris Cobb Posted: December 18, 2008 at 08:50 PM (#3032986)
This database will be a great resource! Thanks to all who are giving time to make it happen!

Joe mentioned in the thread header that my Negro-League translations might be part of the database.

If that is to happen, in what form would you want or need the data for Negro League MLEs?

Compared to folks who build databases, my data management and manipulation skills are VERY primitive, so I am concerned that the data I have are far from being in form suited to being uploaded into a database. With time, I can get the data into other forms (my excel spreadsheet skills are passable, and I can learn new skills as need for them arises), but I would need guidance on what format will be needed for the database.

Would I be correct in assuming that as long as the database is structured to receive the data, and the data is structured to be entered into the database, the actual uploading could happen either a bit at a time or quite a while after the initial launch of the database?
   32. Tango Posted: December 18, 2008 at 09:02 PM (#3033001)
Kent sent me a file, so I'll need to add a new table:

VoterAliases
- voter_id
- screen_name
- effective_date (or current_indicator)

It seems that some people have used more than one screen name. So, while you can maintain one voter_id that will never change (and it might as well be your Primer ID), we can show all the screen names of that voter. While effective_date would be the preferred field, I'm realistic that the best we can do here is to note what the voters current screen name is.

***

Chris, the only requirement is that for players that are not in the BDB database, then we need to make sure they exist somewhere to link them. So, it would seem that would need to not only link to the BDB MASTER table, but also include other master tables, so that we have a "super duper" master table that the Players table will link to.

So, you should create your own master table with:
- player_id
- player_name
and everything else that you want

When it comes time for HOM to create its own master table, it will LINK to BDB and to yours, and then do something like:
select player_id, player_name
from bdb.master
union
select player_id, player_name
from chriscobb.master

And we'd have to add validation to remove any duplicates that the UNION did not catch.
   33.  Hey Gurl Posted: December 18, 2008 at 09:15 PM (#3033016)
Should I assume for PLayerID we are using LAhman syntax? So the linking is easier.

So VoterID in the Voters table and Aliases table is the number of their primer reg? So Chris Cobb = 354?

Makes sense to me, although I am not sure it is necessary. Can't we just add a ScreenName column to the Voters table?
   34.  Hey Gurl Posted: December 18, 2008 at 09:19 PM (#3033020)
Unless we ARE storing multiple aliases. Then it would make sense to have two tables. EG:

Voters
------
ID: 354
Name: Chris Cobb
Awesome?: Yes

VoterAliases
------------
ID: 354
Alias: Chris Cobb
------------
ID: 354
Alias: Jim Sandwich
------------
ID: 354
Alias: Cocky Baloney
------------
ID: 6542
Name: Sexy Beast

And what have you. But if we are only storing the most recent ScreenName, I don't see the reason for two tables.
   35. Tango Posted: December 18, 2008 at 09:40 PM (#3033054)
Right, if we want to track multiple screen names, then you need the extra table. It's whatever you guys want to record.
   36.  Hey Gurl Posted: December 18, 2008 at 09:45 PM (#3033065)
I'm just working on some web layout #### right now. Just basically grabbed a template and threw some stuff up. Let me know what you guys want/need and what you're looking for, what you like, don't like, etc.

Here is what I have now:

http://ryanjlind.07x.net/index.html
   37. Joey Numbaz (Scruff) Posted: December 18, 2008 at 09:51 PM (#3033079)
Joe - I know my last attempt to help bring a website to fruition didn't work out so well, but I'm happy to help wherever I can


Thanks Ryan! That attempt didn't work out, but it got me to reserve the URL, which saved time now, etc. . . . there's definitely some butterfly effect there :-)
   38. Tango Posted: December 18, 2008 at 10:05 PM (#3033096)
Looking at shock's link, then either in the Players table, you'll need a player's primary position, or, it needs to be calculated from the Lahman/Fielding table, or it needs to be provided externally, like Dan's file.
   39.  Hey Gurl Posted: December 18, 2008 at 10:09 PM (#3033101)
I was thinking maybe a Plaques table might be in order. To track which position they are "inducted as" as well as maybe the team

Plaques
-------
Player_id
year_elected
position_elected
team_elected

? Something like that. I think in the current plaque room there is already designations for these.
   40. bjhanke Posted: December 18, 2008 at 10:23 PM (#3033114)
Tom and Shock -

I'm going to be a bit hesitant here, because I haven't programmed for money since the 1990s, but at least it was partially in Access and its predecessor, Microsoft Basic Professional Development System. Does your DBMS have ways to avoid what we used to call "column creep?" That is, if we want, say, extrapolated seasons for people like the Wright brothers (Harry and George) who played professionally before 1871, and also estimated negro league MLEs, does that inevitably involve two columns, both of which serve only small numbers of players, when there are no players who have both types of entry, so no player needs more than one of these two columns? If so, does the DBMS have ways of avoiding the large number of columns slowing the database to a crawl? This was a big issue when I last worked in Access, or DB2, or whatever. My understanding is that Oracle was able to get around this by going relational, but I've never worked in Oracle, and understand that it can be a b**ch.

If column creep is a problem, then we do have a practical limit on how many different types of data we can include, if we don't want slow response and crashed servers. If there is a way around it, we can have really specialized columns, like a war credit column just for 1918. If this is a ridiculously dated question, and all modern DBMS have solved column creep, than please forgive me for having not worked in the field for a decade. I just want to know the answer before I start making too many column suggestions.

Thanks! This is a huge project you've offered to take on, and I'll gladly help any way I can. Hopefully, this was help, and not just dated nonsense.

- Brock Hanke
   41. Tango Posted: December 18, 2008 at 10:25 PM (#3033119)
If the position is already in the Players or similar table, then you don't need it in the plaques.

For team_elected, for cases like Rickey or Winfield or (eventually) Raines, since that designation will only happen upon election, then putting it here is fine. Preferably, we should put it on an Players or similar table, since the HoM is not going to ask these guys what teams they want to go in as, right? It'll be predetermined.

For year_elected, you should be able to derive it from the YEARS and BALLOTS tables, and therefore, is not atomic.

Therefore, we might have a need for a PLAQUES-type table, but for now, it seems that this information can be handled elsewhere.
   42. Tango Posted: December 18, 2008 at 10:32 PM (#3033124)
Brock, I don't think we have to worry about too many columns. I work with Retrosheet data, and I've got 150 columns and 8 million rows, and my slowest queries take, I dunno, 10 minutes?

For what we are doing here, 10% the columns, and, I dunno, 1% the records, I'd be terribly disappointed if even the most complicated query comes back in more than 20 seconds.

Even in the event that it is slow, we can always generate a data warehouse of denormalized data, similar to how you are suggesting it, while continuing to maintain the system of record as normalized data.
   43.  Hey Gurl Posted: December 18, 2008 at 10:36 PM (#3033131)
Agreed with Tango.

Our biggest table would be the ballots table which would be years * ballots * players = about 100,000 rows or something. Which is nothing of course provided you use indexes and are smart. And the problems that you raise wouldn't affect this table anyway.
   44.  Hey Gurl Posted: December 18, 2008 at 10:47 PM (#3033145)
Actually, thinking about that table. Does it make sense to do it the way we have it, or does it make more sense to list each ballot as its own record?

An example:

The way we have it, some rows might look like so:

Year VoterID Rank PlayerID
1930 123 1 Ruthba01
1930 123 2 Gehrlo01
1930 123 3 Lajona01

But alternatively, since there are always 10 spots on a ballot, we could have the "ranks" be their own columns. So 3 rows would be 3 ballots:

Year VoterID Player1 Player2 Player3
1930 123 Ruthba01 Gehrlo01 Lajona01
1930 5423 Gehrlo01 Ruthba01 Sislge01
1930 5685 Ruthba01 Gehrlo01 Stovha01

The former way makes the most intuitive sense to me, but the latter way would cut our record by a factor of 10 and would also work I think.

Again, just sort of thinking out loud here.
   45. Joey Numbaz (Scruff) Posted: December 18, 2008 at 10:52 PM (#3033156)
15 Shock - we have 15 players on our ballots.

Dan's data has a primary position for every player already in a column, but he considers LF and RF the same position post-deadball era.
   46. Joey Numbaz (Scruff) Posted: December 18, 2008 at 10:56 PM (#3033162)
I meant Dan has position for every player year by year.

Tango - they way I've been calculating positions and 'hats' for players is by taking his pennants added, as opposed to straight playing time. But we did have a few players who were manually adjusted - guys like Harry Stovey and Stan Musial for example - who played more games at 1b than anywhere else, but played many more games in the OF combined than at 1B. For those guys we've used the OF position they played the most as their primary position.

So if you think it would be easier to just code it in a column, than to calculate it with that kind of intricacy, that should give your answer.
   47.  Hey Gurl Posted: December 18, 2008 at 11:02 PM (#3033171)
Well there is no reason why a player has to be coded with one specific position. For example he could be put in with "LF,RF" or something.

When I do my queries on the website, I just have to use a like rather than a specific match. For example, I could do:

"Select * from players where elected=yes and position like '%C%'"

Not only would that pull up all the obvious catchers like Bench, but it would also pull up someone like Torre who might be a "C, 3B." Then Torre would come up whether you clicked on C or 3B.

The guys with "OF" would just be put in as "LF,CF,RF" Personally I think it would be better to enter all these manually.
   48. OCF Posted: December 19, 2008 at 12:54 AM (#3033320)
The website will be great and all, but perhaps we should also keep the "Plaque Room" thread here current? We don't have Henderson, McGraw, and Smith up there yet.
   49. Joey Numbaz (Scruff) Posted: December 19, 2008 at 01:01 AM (#3033327)
I think John's planning to have them done by the end of the week . . .
   50. Tango Posted: December 19, 2008 at 03:15 AM (#3033445)
Shock/44:

No, you always never want to expand horizontally; almost always you want to go vertical.

If you want to do a join to another table, what are you going to do? You'll need 10 joins to 10 copies of the Players table just to get the player's name.

When in doubt: vertical.

***

where pos like '%C%'"
and pos not like '?%'

Using digits would be better. So, you can have a player that's a 79 or an 8 or a 2 or a 37 or whatnot.

But, I'd favor having a "primPos" and "secPos".
   51. Tango Posted: December 19, 2008 at 03:15 AM (#3033447)
that should be percent CF percent
   52.  Hey Gurl Posted: December 19, 2008 at 04:54 AM (#3033549)
So as far as data goes, what do we have and what do we need?
   53. bjhanke Posted: December 19, 2008 at 05:11 PM (#3033923)
Tom says, "Brock, I don't think we have to worry about too many columns. I work with Retrosheet data, and I've got 150 columns and 8 million rows, and my slowest queries take, I dunno, 10 minutes?"

Thanks. That was comforting. I worked on a DB2 system for the old Ralston Purina once. When it got done, they ran a test query. It came up correct, but it took - I'm not exaggerating - 24 hours! What had happened was that every division of the company had several one-customer-only items that they wanted an entire column for. The tables just got way out of hand. They had to redesign the whole thing. That's what I was afraid of.

BTW, I agree with your opinion regarding vertical and horizontal approaches. The one thing I learned documenting the one Oracle project I worked on was that joins are bad. Almost anything you can do to decrease the joins is good. I know one whole Oracle DBA, and he said that's still true. At least in Oracle. I don't know about modern Access.

- Brock

PS - I spent the last decade doing systems documentation instead of programing. If you need that kind of help, as opposed to coding help, ring me up. My coding is out of date because I don't know the languages any more. English hasn't changed that much. I can read code in languages I can't write in, because I don't have to know the details of small grammar and syntax.
   54. Tango Posted: December 19, 2008 at 07:35 PM (#3034148)
Shock/52: what we need is to populate each table:

Voters
Years
Players
Ballots

As it stands, someone sent me a voter list.

What I would prefer is that we get four people (none of them me) to take ownership of each of these 4 tables. Once I have the data, I can handle the design and cleansing and uploading.

The Years table should be the easiest, the Voters the next easiest, the Players one of moderate effort, and the Ballots one is the killer. You'll probably need three or thirteen people just for that one.

Ideally, each voter would supply his own votes for each year. That would be just lovely.
   55. OCF Posted: December 19, 2008 at 08:08 PM (#3034201)
I just now emailed Tango a file with what I have for raw data on ballots. It's incomplete, as it only goes back to 1920 (with partial results for 1916 and 1917).

Howie Menckel: what do you have that you use to report cumulative totals? Is that something you could send to Tango?
   56. Paul Wendt Posted: December 19, 2008 at 08:28 PM (#3034220)
24. Tango Posted: December 18, 2008 at 03:21 PM (#3032936)
Let's put a deadline of say Mon, Dec 22, for people to offer their inputs as to what data we should be recording. Then we can come up with an initial working design before the end of the year.

Hi, Tom.
I'll try to make sure I "think on it" before your deadline.

When I google my own name a site called WikiGonzalez comes up first. Presuming that Kent was not the author, it may be useful for someone to track down the site, or a cached version. The author covered HallofMerit aliases from 1898 until yyyy. I did use it a couple of years ago to look up a Hall of Merit voter or participant.

By the way, Tom, have you voted here?

== quoting Google

Paul Wendt - Wiki Gonzalez
Paul Wendt. From Wiki Gonzalez. A contributor and mentor for the Hall of Merit ( though never a voter). Retrieved from "http://digamma.net/btfwiki/Paul_Wendt ...
www.digamma.net/btfwiki/Paul_Wendt - 8k - Cached - Similar pages -


probably btfwiki/ includes the data that I have in mind

== just in case you were wondering

Paul Wendt - Boston, MA | Facebook
Paul Wendt (Boston, MA) is on Facebook. Facebook gives people the power to share and makes the world more open and connected.
www.facebook.com/people/Paul_Wendt/1033821430 - 21k - Cached - Similar pages -


no he isn't
   57. Paul Wendt Posted: December 19, 2008 at 08:40 PM (#3034243)
But, I'd favor having a "primPos" and "secPos".

If primary fielding position is derived --for career by method identical to its derivation for single season-- then Hall of Merit fielding position "homPos" should also be in the database.

For Hall of Merit members, homPos is equivalent to the key for the unique 2008 special election, "Ranking [HOM members by fieldpos]" in which he appears. Is homPos null for every nonmember? Maybe so.


53. bjhanke Posted: December 19, 2008 at 12:11 PM (#3033923)
Tom says, "Brock, I don't think we have to worry about too many columns. I work with Retrosheet data, and I've got 150 columns and 8 million rows, and my slowest queries take, I dunno, 10 minutes?"

and the computing power at hallofmerit.com, wherever that will be?

(and that may be my last contribution to the the website :-)
   58. Joey Numbaz (Scruff) Posted: December 19, 2008 at 08:49 PM (#3034250)
I think I can 'own' the players table. I can help with the others, especially the 'ballots' one as needed.
   59.  Hey Gurl Posted: December 19, 2008 at 08:57 PM (#3034256)
I'm working on the years table right now.
   60. Los Angeles El Hombre of Anaheim Posted: December 19, 2008 at 08:59 PM (#3034258)
Wix.com has a free Flash WYSIWYG layout app that lets you build some really nifty websites. Might want to take a quick look-see.
   61. Devin has a deep burning passion for fuzzy socks Posted: December 19, 2008 at 09:00 PM (#3034259)
I have all of my ballots going back to the beginning.

Another thing that could be calculated from the individual voter ballots is an 'as voted' personal Hall of Merit, especially for voters that have participated in every election.

Not really, Joe. You have to deal with players who are in the HOM but haven't made a person's PHoM yet. How can we decide where to slot them when they're not on the ballot anymore?
   62. Joey Numbaz (Scruff) Posted: December 19, 2008 at 09:15 PM (#3034268)
Good point Devin - I suppose you'd have to go based off where they were when they were on the ballot.
   63. Joey Numbaz (Scruff) Posted: December 19, 2008 at 09:19 PM (#3034273)
I'm just working on some web layout #### right now. Just basically grabbed a template and threw some stuff up. Let me know what you guys want/need and what you're looking for, what you like, don't like, etc.

Here is what I have now:

http://ryanjlind.07x.net/index.html


Thanks shock!

I think this has promise, one thing I'm not sure of though, is the picture taking up so much real estate . . . I tend to like sites like B-R and BTF more than ESPN.com for example . . . what do others think?
   64. Joey Numbaz (Scruff) Posted: December 19, 2008 at 09:20 PM (#3034276)
I do think it would be great if we can have a picture (pictures) of each player on his page, to go with everything else we're planning.
   65.  Hey Gurl Posted: December 19, 2008 at 09:29 PM (#3034279)
How much hosting space do you have?
   66. Tango Posted: December 19, 2008 at 09:46 PM (#3034292)
We also need a Points table it seems.

Points
- electee_count
- rank
- points

So, this would look like this:
2,1,24
2,2,23
2,3,18
2,4,17
...
2,15,6
3,1,24
3,2,23
...

Or whatever the point system is.

I can own this table. I saw in your constitution what the point system is, so it'll be a snap.

This way, when you link the Ballot table to the Years table to this Points table, it will automatically tell you how many points each player gets.
   67. Joey Numbaz (Scruff) Posted: December 19, 2008 at 09:53 PM (#3034300)
Tom, our points are 24-23-22-17-16-15-14-13-12-11-10-9-8-7-6 at this point in time.

But when we had a year where we only elected one, it was 24-19-18-17 etc. . . . elect two was 24-23-18-17-16, elect 4 (the first year) was 24-23-22-21-16-15.

Does that change anything?
   68. Joey Numbaz (Scruff) Posted: December 19, 2008 at 09:54 PM (#3034302)
I will have to check on the hosting space shock . . .
   69.  Hey Gurl Posted: December 19, 2008 at 10:05 PM (#3034317)
One person for the ballots table seems impossible. Maybe we could divide by years. Have one person do each decade or something, depending on how many volunteers we can get (looking like not many...) I'd be willing to tackle some but definitely not all.
   70. Joey Numbaz (Scruff) Posted: December 19, 2008 at 10:35 PM (#3034336)
depending on how many volunteers we can get (looking like not many...) I'd be willing to tackle some but definitely not all.


It's Friday afternoon, kind of a dead time on the interwebby thingamajiggy . . . not to mention half the country is covered with ice and snow, so I'm guessing that not too many people are working, and if they aren't working, they aren't surfing the web.

The next two weeks will be even worse I imagine . . . we'll get it done I'm sure.

So if I guess right, the ballots table will have columns like year, voter, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Then each ballot will be a row.

I think inputting by voter would be much easier, as most ballots are fairly stable from year to year, which could mean a lot of copying and pasting, as opposed to typing, right? Especially if OCFs data has all of this post-1920?

We had fewer voters in the early years also . . . so the 'manual' years will have fewer entries at least.
   71.  Hey Gurl Posted: December 19, 2008 at 10:48 PM (#3034342)
You guess wrong ;-)

Each row is a vote...

It will look something like:

Year, Voter, Rank, Player, (explanation)

"Voter" will be the Primer ID number.

I'm not sure what format we are using for player names. Will have to decide that I guess.

If OCF's data has all this then that is excellent!
   72. Tango Posted: December 19, 2008 at 10:50 PM (#3034343)
You guys can input the data however you like. As long as it has order to it. I'll convert it for DB purposes as I need it.
   73.  Hey Gurl Posted: December 19, 2008 at 11:02 PM (#3034351)
the hard part will be the typoes and spelling mistakes.
   74. Joey Numbaz (Scruff) Posted: December 19, 2008 at 11:05 PM (#3034355)
Another reason why any ability to cut and paste is preferable.

Why is each vote a row?

Oh wait, so we can include the explanations too . . . wow. Maybe we add those later? That is massive.

But could you just easily make each ballot a row . . . and just add columns like player1, explanation1, player2, explanation2 . . .
   75. OCF Posted: December 19, 2008 at 11:08 PM (#3034356)
In what I sent to Tango (I assume he got it), the player format was just last name, with a first initial where needed, like C. Jones or L. Smith. I didn't even use first initials to separate out Mays, DiMaggio, or Evans.

The voter ID was a the handle used at the time of the vote, or a shortened version of that handle. That's going to require some hand intervention to cover cases like Marc/sunnyday2 or Don F/Pedro Feliz Navidad. I also sent Tango a document with consensus scores in it that may help a little with that part.
   76. Joey Numbaz (Scruff) Posted: December 19, 2008 at 11:10 PM (#3034358)
In what I sent to Tango (I assume he got it), the player format was just last name, with a first initial where needed, like C. Jones or L. Smith. I didn't even use first initials to separate out Mays, DiMaggio, or Evans.


That's easy to match to LahmanID. We can just do a vlookup to catch most of them (with the complete set of LahmanIDs as a table). Manually correct the guys that it misses due to multiple C.Jones.

As far as the consensus scores, if we have the formula, the DB should be able to calculate those, right?
   77.  Hey Gurl Posted: December 19, 2008 at 11:12 PM (#3034361)
Sorry, Joe. I meant taht in the database each votw will be a row. If you want to input the data where each ballot is a row I'm sure tango can do the transformation.
   78. Joey Numbaz (Scruff) Posted: December 19, 2008 at 11:17 PM (#3034364)
Whatever works for you guys - each ballot as a row just seemed more intuitive to me, that's all. Why would each vote need to be a row in the DB? I'm not questioning it, just curious, and trying to learn better how to build DBs.
   79.  Hey Gurl Posted: December 19, 2008 at 11:20 PM (#3034367)
See Tango's post 50.

It's so that when we join tables together we only have to look at one column; the players column. If we had one ballot be a row then we would have 10 columns to store players which means 10 joins every time we reference the table, which means poor performance.
   80.  Hey Gurl Posted: December 19, 2008 at 11:36 PM (#3034375)
An example:

Suppose we wanted to query the DB to find out how many times a voter voted for a 1B that had 200 career HR or less.

With the current design, it's pretty simple. Something like:

select count(b.playerID) from players p, ballots b, playerStats s
where
p.priPos = '1B'
and
s.HR <= 200
and
p.playerID = s.playerID
and
p.playerID = b.playerID

If you only wanted to count instances where a voter had that player in his top 5, simply add:

and
b.rank <= 5

But with one ballot being a row, how would you do it? Like this?

select count(p.playerID) from players p, ballots b, playerStats s
where
p.priPos = '1B'
and
s.HR <= 200
and
p.playerID = s.playerID
and
p.playerID = b.player1
or
p.playerID = b.player2
or
p.playerID = b.player3
or
p.playerID = b.player4
or
p.playerID = b.player5
or
p.playerID = b.player6
...

it's ugly...

And even that wouldn't give the right answer I don't think because you are counting players instead of votes.
   81. Joey Numbaz (Scruff) Posted: December 19, 2008 at 11:40 PM (#3034376)
Cool, thanks!
   82. Joey Numbaz (Scruff) Posted: December 20, 2008 at 12:06 AM (#3034391)
I have 150 GB of space. I assume that's plenty, is it?
   83. Tango Posted: December 20, 2008 at 12:49 AM (#3034416)
Joe, yes, more than enough.

And it would be like this:
players p1, players p2, players p3..., players p15

And then in your where clause you match b.player1 to p1.player and b.player2 to p2.player, etc.
   84. Tango Posted: December 20, 2008 at 12:51 AM (#3034417)
Well, shock's thing would work if you are only interested in 1 player, but what if you were interested in 2 or all 15? Then you need 15 players tables.
   85. Paul Wendt Posted: December 20, 2008 at 02:30 AM (#3034463)
73. Shock Posted: December 19, 2008 at 06:02 PM (#3034351)
the hard part will be the typoes and spelling mistakes.


That is the hard part only because all the face and font variation is "hopeless", right? In effect, participants take the forum software for granted, its limitations as well as its strengths. Regarding the strengths: comments rely on typog devices such as boldface and "code" font. The database uses "printing characters" from ASCII or is it some larger set?
   86. Paul Wendt Posted: December 20, 2008 at 03:29 AM (#3034485)
I haven't programmed for money since the 1990s

There is no money here, Brock :-)

[on "column creep] . . . does the DBMS have ways of avoiding the large number of columns slowing the database to a crawl? This was a big issue when I last worked in Access, or DB2, or whatever. My understanding is that Oracle was able to get around this by going relational, but I've never worked in Oracle, and understand that it can be a b**ch.

They all went relational and call themselves RDBMs or RDBMSes, but I think that's a different problem.

If column creep is a problem, then we do have a practical limit on how many different types of data we can include, if we don't want slow response and crashed servers. If there is a way around it, we can have really specialized columns, like a war credit column just for 1918. If this is a ridiculously dated question, and all modern DBMS have solved column creep

At this stage, I believe, (this fortnight) the HOM database should not include any of this extra stuff except an Extras table, at most, or perhaps no more than an note field in the Voters table that covers the Extras.

voter_id, voter_name, comment, note
33, Joe Dimino, founder and bbtf owner of the Hall of Merit, Pennants Added for mlb Pitchers (maybe forthcoming)
354, Chris Cobb, , MLE for Negro Leagues and other play outside MLB (maybe forthcoming)

"maybe forthcoming" would mean that it may be forthcoming as part of the website, probably linked to but not part of the database. Later those words might be replaced by the URL or pathname of the other database.
   87. Paul Wendt Posted: December 20, 2008 at 04:19 AM (#3034503)
VOTERS
- voter_id
- voter_name
- voter_??
- voter_??


Perhaps that should be a people table. For example, if "Shock" is a never-voter then may yet be a listing as website designer.

: contact
Will some provide contact data here? if not directly, then bbtf username provides indirect contact in some cases; contact data may be eddress or current bbtf alias

: comment
Most tables in most databases should include a comment field. Just above I have suggested and exemplified two fields, comment and note.


YEARS
- year_id
- electee_count


: comment
Example for illustration only
year_id,electee_count,comment
1950, orginally elect two; elect two by 1930 revision
Another example


PLAYERS
- player_id
- player_name
- player_bio (large text field)

No other player information is needed,


famous last words :-)
: comment
: YearEligible


PLAYER_VOTER_EVALUATIONS
PLAYER_VOTER_YEAR_EVALUATIONS

The suggested "score_value" (one numerical dimension) is too restrictive but I suggest that this be postponed. Handle voter ratings as voters make them available.

The single most valuable "ratings" by far may be called year1 and year2 representing a kind of begin and end of the careers of those who are not in the mlb databases. I daresay a kind of begin and end date although I have not forgotten that HOM evaluations of some major league players --not only Dickey Pearce and Satchel Paige but Earl Averill-- rely on consideration of years outside their major league career timespans, but any begin or end dates for Frank Grant is more valuable, one order of magnitude more valuable, than a nonstandard begin date for Earl Averill.

(Retrosheet does not provide a page for Frank Grant and baseball-reference does not provide any but vital data --what SABR calls "biographical data" rather than "baseball records"-- and the Hall of Fame election year, which is 2006 for Grant and many others.)


BALLOTS
- voter_id
- year_id
- player_id
- ballot_rank
- voter_comment
- selection_rejected_switch

This last table is basically the one that links all the other tables.

"selection_rejected_switch"? Does that provide scope for invalid selections? I think we have only invalid ballots.

--
Unfortunately, there were some tie votes in 1898 if not thereafter, evidently two two-man ties for 15th and one five-way tie for 15th, concerning seven players because two appeared in two ties.
1898 Results (table mis-aligned)

--
Regarding tie results the question is which should be considered official ties, which may differ from which were reported as ties in the posted results. Maybe the rank-order results should be ignored for now --left out of the database design so that we are initially able to derive ties but not break them.


== note

POINTS
On the recent suggestion by Tom Tango, what data do we have for each year after joining the Years and Points tables? Something like this.
2009,3,1,24
2009,3,2,23
2009,3,3,22
. . .
2009,3,15,6
(right? )
   88. Paul Wendt Posted: December 20, 2008 at 04:32 AM (#3034507)
YEARS
- year_id
- electee_count


: comment
Example for illustration only
year_id,electee_count,comment
1950, orginally elect two; elect two by 1930 revision
Another example
<<

Excuse me for that. Maybe it is sufficient to illustrate the "comment" field, whose value in the example is "orginally elect two; elect two by 1930 revision". (I skipped the electee_count and the illustration is fictitious anyway and I didn't provide "Another example" as it says.

If ordered by year then electee_count shows some anomalies. Eg, why Elmer Flick elected all alone? At least some of the anomalies were introduced by revision along the way, not part of the original design. The "comment" may be used to note such things.

At stage two, multiple "comment" and "note" and such-like fields may be used to generate various footnotes if the website will have footnotes.

For example Sean Forman at baseball-reference uses the nameNote field to hold the text value of the slightly-free form note on player pages.
Bill Dinneen: "commonly misspelled Dineen"
Frank Selman: "Born with last name Sellman, Also Played Under Name Of Frank C. Williams 1871-75"
Hal Trosky: "born Harold Arthur Trojovsky"

(At the Hall of Merit, an explanation for revision and perhaps anomaly in electee_count is something like "re-calibrated to match Clark Griffith as a Hall of Fame player" or something like that.)
   89. Howie Menckel Posted: December 20, 2008 at 04:32 AM (#3034508)
90 pct of the posts mystify me, but that's ok.
:)

What I'd be able to offer is perhaps an arrangement of the players that more appeals to math-phobic people.

For instance, I have "Hall of Merit starting pitcher tandems," with every example of two HOM SPs on the same team in the same year. I think there are a few with 3-4-5, and that shows up there.
Also 2B-SS combos, 1B-3B combos, 3 OF HOMers on same team, etc.

And other approaches - list of total number of HOMers per league per year, both as long lists and as short ones listing only the number figure and not the player. What it does is show you, for example, the years where about as many Negro League HOMers were in action that year as there were AL or NL HOMers.
I did one showing all the black/Hispanic HOMers by year, which when plucked out that way shows some startling imbalance (some people knew NL had more in the 1960s, but I think one year had 13 in NL and ZERO in AL. That I didn't realize).
Another list shows how many HOMers played each position that year. So you see that 1890s-1900s C scarcity, and gluts at other positions in other years. 2B goes several decades with only controversial pick Nellie Fox iirc (or at least it did at the time I did the chart).

Some of these I could update in 5 minutes, others would take longer.

I have "team by team" versions for the Mets, Yankees and Red Sox. That could be done for each franchise.

The advantage of these lists would be fun stuff that any baseball fan might enjoy, not just SABR types.

As for "total votes points," I'm just a chimp posting up a version of other people's numbers in that case. I think Ron W and someone else actually have them.

Anyway, this is great news. I'd be happy to provide whatever modest help I can give (just don't ask for an Excel spreadsheet, a pdf, a pdq or any other stuff beyond just a bunch of words typed into an email or Website posting area, lol).
   90. Paul Wendt Posted: December 20, 2008 at 04:48 AM (#3034512)
Tango #32

Chris, the only requirement is that for players that are not in the BDB database, then we need to make sure they exist somewhere to link them. So, it would seem that would need to not only link to the BDB MASTER table, but also include other master tables, so that we have a "super duper" master table that the Players table will link to.

So, you should create your own master table with:
- player_id
- player_name
and everything else that you want

When it comes time for HOM to create its own master table, it will LINK to BDB and to yours, and then do something like:
select player_id, player_name
from bdb.master
union
select player_id, player_name
from chriscobb.master

And we'd have to add validation to remove any duplicates that the UNION did not catch.

--


In my desktop database the Master table is an extension of bbdb 2007 (thru the 2006 season).
At the moment the ID range is -1119 to 18702 where I have added everyone in [-1119, 0] and [1, 18702] covers everyone in the bbdb.

(Frank Osborn to Jim Creighton; Hank Aaron to Mark Kiger. You should look up all four if you don't know them.)

There are 344 records in the ID range [-1119, 0] and they should include everyone with a player page or a vote at the Hall of Merit. Is that everyone with a Chris Cobb mle? I doubt it but it's a start. To the point at hand, that table includes biographical data posted on the web (and found by quick web search 15-18 months ago); for "Negro Leagues" players published in Riley's encyclopedia.

In another table I have data for some more "Negro Leagues" players.
Kevin Johnson has digitized some data too.

I will try to confer with KJOK and Chris Cobb before Monday evening.
   91. Paul Wendt Posted: December 20, 2008 at 04:52 AM (#3034515)
I haven't checked the bbdb HallofFame data in a while, not since the 2006/2007 release that is in my desktop database. It's possible that the bbdb should import a little more data on Hall of Fame elections, which in turn may be useful on the Hall of Merit website.
   92. Howie Menckel Posted: December 20, 2008 at 04:58 AM (#3034516)
A simple example (subject to doublechecking) - the most fulltime HOM Ps on any team at once: is 3. Here they are:

1949 Cleveland AL - Bob Lemon (22-10) and Bob Feller (15-12) and Early Wynn (11-7)
1950 Cleveland AL - Bob Lemon (23-11) and Early Wynn (18-8) and Bob Feller (16-11)
1951 Cleveland AL - Bob Feller (22-8) and Early Wynn (20-13) and Bob Lemon (17-14)
1952 Cleveland AL - Bob Lemon (22-11) and Early Wynn (23-12) and Bob Feller (9-13)
1953 Cleveland AL - Bob Lemon (23-15) and Early Wynn (17-12) and Bob Feller (10-7)

1966 Los Angeles NL - Sandy Koufax (27-9) and Don Drysdale (13-16) and Don Sutton (12-12)

(1993-99 and 2001-02 Atlanta NL will be Glavine/Smoltz/Maddux)

This is with minimum 1 IP per team game or 35 G, so 1949 Satchel Paige 31 G wouldn't quite make it. We could discuss minimums, or maybe better have asterisks to indicate who got cups of coffee or came close to qualifying. Or just have two lists, one with a minimum and one without.

The lists show unexpected stuff, I think, like no NL P fulltime HOMer duos between 1917 (Alexander and Rixey, PHI) and 1959 (here come Koufax and Drysdale).

And what tandems had the most times doing this together?
8 - Wynn and Lemon
8 - Koufax and Drysdale
8 - Gaylord Perry and Juan Marichal
7 - Feller and Lemon
7 - Christy Mathewson and Joe McGinnity
6 - Eddie Plank and Rube Waddell
6 - Ted Lyons and Red Faber
5 - Feller and Wynn
5 - Bob Gibson and Steve Carlton
5 - Alexander and Rixey

But Glavine and Smoltz one day will blow them all away with 13 (!), though missing in 2008.

Until 1970, I only see two examples where there were no repeats (makes sense I guess; you have two great pitchers on your team, you want to keep them..

They are the first one:
1882 Providence NL - Charley Radbourn (33-20) and John Ward (19-12)

And the one you might never think of:
1962 Baltimore AL - Robin Roberts (10-9) and Hoyt Wilhelm (7-10/15 SV)

Ok, maybe this suggests a "story" to go with some lists.
Some want to delve into the high-end math.
Some want to chew on some lists, and notice surprising things for themselves.
And others just want to know, "Ok, if I spent the time to study this list, what's some of the coolest stuff I'd find?"
   93.  Hey Gurl Posted: December 20, 2008 at 05:13 AM (#3034521)
Howie,

As Tango has said, as long as the data is there, we can answer any question with the right query. It shouldn't be necessary to make these lists manually once the database is complete.

Ideally, if this REALLY gets off the ground, we can make a "PI" type feature that would allow people like yourself to have the system generate these kinds of reports without the knowledge of SQL. Then maybe you could maintain a blog on the site every week where you show us a cool list and talk about it for a bit. ;-)

Really, the possibilities are endless as long as we have data and man-power.
   94. KJOK Posted: December 20, 2008 at 06:43 AM (#3034535)
Chris, the only requirement is that for players that are not in the BDB database, then we need to make sure they exist somewhere to link them. So, it would seem that would need to not only link to the BDB MASTER table, but also include other master tables, so that we have a "super duper" master table that the Players table will link to.


I haven't seen Chris chime in, but I have heard from Paul. If Paul sends me what he has, I should be able to create an OTHER MASTER or OTHER PEOPLE table that would have all of the players Chris and Eric have MLE's for. I know a couple of others (Brent for one I think) also created MLE's, but I would guess they are the same players Chris and Eric have. If not, someone can let me know that, and I'll add them.
   95. Devin has a deep burning passion for fuzzy socks Posted: December 20, 2008 at 09:08 AM (#3034569)
Don't know if this makes a difference or not, but are there Lahman IDs for Negro Leaguers? (I know you can put it in that format easily enough, but are there already assigned ones?)
   96. KJOK Posted: December 20, 2008 at 09:27 AM (#3034571)
Don't know if this makes a difference or not, but are there Lahman IDs for Negro Leaguers? (I know you can put it in that format easily enough, but are there already assigned ones?)

There are for the ones in the Hall of Fame, but I would probably use a different ID system for the 'other' players.
   97. Paul Wendt Posted: December 20, 2008 at 04:56 PM (#3034648)
By phone with Kevin Johnson (KJOK) last night I sketched what vital data I have compiled for baseballists who are not in the baseball databank (bbdb). Two years ago, which matches my desktop before my own extensions, the bbdb covered major league players and managers 1871-2006 (=scope of the SABR biographical database) plus Hall of Fame members. I learned that I have compiled a lot in format similar to the bbdb Master table, and Kevin has not. On the other hand, Kevin has compiled some baseball playing data and some MLE data. So I do some cleanup and send biographical data to him, in Master table format (slightly extended).

So far I have private extensions to widely available data tables. If the HallofMerit is entering the business of maintaining one extension of the Master table, we should probably add a couple of fields. One new field should identify the source of the record(row); for this purpose it may be "Paul Wendt 2008-12-dd" rather than any account of his sources. Another field should specify whether the record matches what the bbdb or the source provided, or whether it has been edited during its HOM tenure.

It's possible that what I provide and what Kevin does with it, in conjunction with Chris and Brett and Eric, will be in some MLE database that is strictly separate from the HOM database that Tom Tango is creating :-)
Much of this will be part of any '"super duper" master table' (#32), so it must concern Tom, but maybe no one else, outside this paragraph.
(My own preference will be obvious to those who understand what I have done myself.
No "super duper master".
Instead extend the Master table by adding records.
Facilitate update of bbdb component by a device such as lahmanID<1 vs. lahmanID>0.
   98. bjhanke Posted: December 20, 2008 at 06:50 PM (#3034704)
Shock says, "the hard part will be the typoes and spelling mistakes."

Since I've been doing this project for what? a month now, I'm not a good choice to think up things like table columns and IDs, but I do write, and I have proofread for a living. When it comes to checking for typoes and spelling mistakes, I'm in. - Brock
   99. Chris Cobb Posted: December 20, 2008 at 09:43 PM (#3034760)
I haven't seen Chris chime in, but I have heard from Paul. If Paul sends me what he has, I should be able to create an OTHER MASTER or OTHER PEOPLE table that would have all of the players Chris and Eric have MLE's for. I know a couple of others (Brent for one I think) also created MLE's, but I would guess they are the same players Chris and Eric have. If not, someone can let me know that, and I'll add them.

Some of Brent's MLEs will be unique, I believe. He has been the main source for minor league MLEs for players who were not also NeL players: Gavvy Cravath, Buzz Arlett, and others. These players would already be in this master database, I guess, since they also played in the majors, but Brent's MLE data should certainly be included in our database.

As an aside, I have had little to say in this conversation because its technical aspect is way over my head. I truly know nothing whatsoever about databases, so I cannot even parse a good deal of what is being said here, and I have no idea what implications it has for the preparation of the data in my possession for inclusion in the database.

I hope that when work on the database progresses a little farther, someone can tell me--in straightforward descriptive terms that I can apply to the formatting of spreadsheets--what I will need to do.

Just as an example of the kind of instructions that are too technical for me to use, see Tango's comment to me, which Paul Wendt copies in post 90 above.
   100. bjhanke Posted: December 21, 2008 at 01:17 AM (#3034874)
Chris -

You've said this in posts on other threads. Since I pass myself off as someone who explains computer things for a living, it's my duty and privilege here to say,

1. If you can do spreadsheets, you can do databases. There aren't that many differences.

2. If you've ever summed up a column or anything like that in a spreadsheet, you've crossed the border and are doing the kind of things that databases do - manipulating the data.

3. The biggest difference is that databases have clever ways of linking up different spreadsheets. Think of a database "table" as one spreadsheet. Now think of a group of such "tables." Now think of trying to link the info in one table to the info in another. That's the basic idea of a a database: linking spreadsheets (tables) to each other.

4. Here's an example. Suppose you had one spreadsheet with every batting season by every player in history. One row in the sheet equals one season of play for one player. The spreadsheet columns would be things like Player Name, Year, At-Bats, Runs, Walks, etc. Now think of a DIFFERENT spreadsheet. This one contains ballplayer names just like the first table, but also dates of birth, debut dates, dates of death, things like that that don't change from season to season, but only happen once in a career or lifetime. Those are two different spreadsheets. The second one has only one row for each player. The first one has one row for each season that player played. They are linked by the player name. What a database allows you to do is to go into spreadsheet (table) 2, locate Honus Wagner's name code, and then go to table 1 and gather up all the seasons that have, say, "Honus_Wagner" as their player name. Then you put those rows from spreadsheet 1 into a list sorted by Year. You could then display both Honus' biographical stuff and then list all his seasons. That's what BB-Ref does, essentially. They have many more tables than two, and much more data than just batting seasons, but the "database" part of it is simply the linking of the various tables together. We call those individual spreadsheets "tables." We call the links between them "joins." We call the entire collection of tables and the joins between them a "database."

Everything else that you've read where we computer types start using terms that you've never seen is just jargon. We're just talking in database shorthand. The basic concept really is just what I have here.

Did that help? Personally, I think I've made it much more simple than you seem to need. I think you should develop some confidence here. What you've done with spreadsheets is pretty impressive. I have no doubt that you could pick up database theory pretty easily. You seem to have all the talent you'll need.

- Brock
Page 1 of 3 pages  1 2 3 > 

You must be Registered and Logged In to post comments.

 

 

<< Back to main

BBTF Partner

Support BBTF

donate

Thanks to
robneyer
for his generous support.

Bookmarks

You must be logged in to view your Bookmarks.

Syndicate

Page rendered in 1.0016 seconds
49 querie(s) executed