Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Saturday, March 01, 2014

Newman: MLBAM introduces new way to analyze every play

boom! (Eye Test) ~ BOOM! (Actual Measurement)

Baseball is a game of inches, and those inches will be measured in a brand new way.

Major League Baseball Advanced Media on Saturday introduced a revolutionary plan for in-ballpark infrastructure designed to provide the first complete and reliable measurement of every play on the field and answer previously unanswerable analytics questions.

The announcement was made by MLBAM CEO Bob Bowman at the eighth annual MIT Sloan Sports Analytics Conference at the Hynes Convention Center near Fenway Park. MLBAM gave an overview of how it continues to implement various fan experience technologies, including iBeacon and widespread connectivity, to ensure MLB ballparks are crucibles of technology.

The goal is to revolutionize the way people evaluate baseball, by presenting for the first time the tools that connect all actions that happen on a field to determine how they work together. This new datastream will enable the industry to understand the whole play on the field—batting, pitching, fielding and baserunning—and enable new metrics for evaluation by clubs, scouts, players and fans.

For instance, on a brilliant, game-saving diving catch by an outfielder, this new system will let us understand what created that outcome. Was it the quickness of his first step, his acceleration? Was it his initial positioning? What if the pitcher had thrown a different pitch? Everything will be connected for the first time, providing a tool for answers to questions like this and more.

...MLB.com analyst Jim Duquette, who spent 20 years in front offices, including four years as an MLB general manager, said this will remove much of the subjectivity from a club’s own player analysis.

“When you look at how scouting has been done in the past, there’s a lot of subjectivity to the evaluation,” he said. “Some guys I have found have varied, from scout to scout, in terms of their opinion of each player. There is a lot of quality defensive statistics out there, but they’re not completely accurate. A lot of them are dependent on somebody charting, whether it’s UZR or DIPS or Defensive Runs Saved, and they can only go so far. Some players . . . range to their left better, some range better to their right, some come in on ground balls better than others, some have better first-step quickness.

“The exciting thing about this new technology is, you can start to take the subjectivity that is given to you by the scout and blend it with raw data now, and come up with a truer picture of evaluating a player. So when you take that data and compare it to others in the game, you can really find out if that position player is the best at his position. You can measure potential free agents, you can measure current free agents.”

Repoz Posted: March 01, 2014 at 02:25 PM | 66 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Dan Posted: March 01, 2014 at 02:45 PM (#4664527)
This is unbelievably cool. Some of the data available, like reaction time and route efficiency for defenders, will revolutionize our ability to evaluate defensive play.
   2. odds are meatwad is drunk Posted: March 01, 2014 at 02:56 PM (#4664533)
this is all sorts of amazing, this data will be awesome to go through
   3. Robert in Manhattan Beach Posted: March 01, 2014 at 03:04 PM (#4664539)
For instance, on a brilliant, game-saving diving catch by an outfielder, this new system will let us understand what created that outcome. Was it the quickness of his first step, his acceleration? Was it his initial positioning?

Jim Edmonds got out just in time.
   4. bigglou115 Posted: March 01, 2014 at 03:32 PM (#4664551)
Talk that this could replace Pitch f/x is disturbing. It seems more likely to me that the general public will get less information going forward, not more.
   5. boteman is not here 'til October Posted: March 01, 2014 at 03:53 PM (#4664561)
BASEBALLGASM!!

Also it's about friggin time!
   6. Dan Posted: March 01, 2014 at 04:04 PM (#4664564)
.
   7. Sean Forman Posted: March 01, 2014 at 04:04 PM (#4664565)
this is all sorts of amazing, this data will be awesome to go through


That's a nice thought, but that's not how this is going to work.
   8. Dan Posted: March 01, 2014 at 04:07 PM (#4664566)
That's a nice thought, but that's not how this is going to work.


The data won't be available to use for new defensive stats and stuff?
   9. Sean Forman Posted: March 01, 2014 at 04:09 PM (#4664569)
The data won't be available to use for new defensive stats and stuff?


I'll be very pleasantly surprised if any of the raw data becomes public in any meaningful way.

1) I get the impression that MLB isn't very happy about how Pitch f/x got out to the public, and
2) this is terabytes of data, so scraping it is nontrivial in both bandwidth cost and expertise required.
   10. Dan Posted: March 01, 2014 at 04:10 PM (#4664571)
This new data stream will enable the industry to understand the whole play on the field—batting, pitching, fielding and base running—and enable new metrics for evaluation by clubs, scouts, players and fans.


The article certainly gives the impression that the data will be available to use in creating new metrics. Hopefully that's true, even if it's a special subscription service or something.
   11. Select Storage Device Posted: March 01, 2014 at 04:19 PM (#4664575)
MLBAM introduces new way to appreciate Ben Zobrist.
   12. Dan Posted: March 01, 2014 at 04:29 PM (#4664578)
Wasn't the big reason for Hit F/X and Field F/X being available only to the teams the fact that teams had to foot the bill themselves for installing all of the cameras and for setting up and maintaining all of the systems, so they didn't want the data available to their competitors? If MLBAM is undertaking this, one would hope that it is at least available with a subscription, since there's no competitive reason to keep it restricted to teams and broadcast partners. Plus, teams ended up hiring a lot of smart people because of Pitch F/X analysis posted in the public sphere, so I doubt they're anxious to destroy that route of discovering talented analysts.
   13. Maury Brown Posted: March 01, 2014 at 04:58 PM (#4664604)
If you haven't seen, this is pretty damn cool:

http://m.mlb.com/video/v31405521/heywards-catch-through-bams-new-tracking-technology
   14. Maury Brown Posted: March 01, 2014 at 05:07 PM (#4664607)
Minimum of three ballparks in '14 (Miller Park, Target Field and Citi Field).

All 30 in 2015

As Sean mentioned, each game will generate about 7TB, yes TB, of data. That's going to make it very difficult for the analytics community at large to do something with it. It's almost a case of play reconstruction, much like the video clip I posted of Heyward ranging. I imagine that you'll have to use timestamps on the data points, look at plays, and do analysis. After all, if we want to look at things like how a SS ranges, we now get into a host of factors, such as how fast the ball is coming off the bat, whether it was a grounder and where the ball strikes the ground, how quickly the SS takes his first step toward the hole, etc. It's very cool, but also very daunting data to collate.
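A minimal sketch of the timestamp approach Maury describes, assuming a hypothetical feed of time-sorted tracking samples (time in seconds, player, x/y position in feet) and play boundaries taken from some separate event log. None of these names or formats come from MLBAM; the point is only that sorted timestamps let you cut one play out of a game-long stream with two binary searches instead of a full scan.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple


class Sample(NamedTuple):
    """One hypothetical tracking sample."""
    t: float          # seconds since first pitch
    player_id: str
    x: float          # feet, in some hypothetical field coordinate system
    y: float


def index_times(samples: list[Sample]) -> list[float]:
    """Build the timestamp index once per game (samples must be sorted by t)."""
    return [s.t for s in samples]


def slice_play(samples: list[Sample], times: list[float],
               start: float, end: float) -> list[Sample]:
    """Return every sample with start <= t <= end, via two binary searches."""
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return samples[lo:hi]


# Usage: four shortstop samples, one play running from t=0.5 to t=1.0.
game = [Sample(0.0, "SS", 0.0, 120.0), Sample(0.5, "SS", 2.0, 121.0),
        Sample(1.0, "SS", 6.0, 123.0), Sample(1.5, "SS", 12.0, 125.0)]
times = index_times(game)
play = slice_play(game, times, 0.5, 1.0)
print([s.t for s in play])   # [0.5, 1.0]
```

In a real pipeline the play boundaries would come from whatever event data accompanies the tracking feed; here they are simply passed in.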
   15. Srul Itza Posted: March 01, 2014 at 05:43 PM (#4664624)
1) I get the impression that MLB isn't very happy about how Pitch f/x got out to the public,


They may hold on to it for now. But eventually the data will get out, because that is what happens with data.

2) this is terabytes of data, so scraping it is nontrivial in both bandwidth cost and expertise required.


For now. Give it five years, or maybe ten. It was not THAT long ago that computer power and storage were measured in MB. The technology and tools keep improving.


Sean is right that the "wow" reaction is premature. But circumstances change -- sometimes slower than you expect (where the fuck is my goddamn jet pack, anyway!?!?), sometimes far more quickly.
   16. Maury Brown Posted: March 01, 2014 at 05:52 PM (#4664629)
where the #### is my ####### jet pack, anyway!?!?
Blake Griffin is using it in that damn GameFly commercial.
   17. cardsfanboy Posted: March 01, 2014 at 05:53 PM (#4664630)
Sean is right that the "wow" reaction is premature. But circumstances change -- sometimes slower than you expect (where the #### is my ####### jet pack, anyway!?!?), sometimes far more quickly.


To be fair, the jet pack (and flying car and hoverboard) was always going to be a lot of energy expenditure for very little value; barring the discovery of an anti-grav solution, it was never going to happen in our lifetime.

I do think that Dan has a point about how teams were able to discover smart people through the release of the data. Yes, they could hire their own smart people and give them the data to work with, but that is a lot riskier, in terms of finding someone who has the intuition to figure this stuff out, than letting everyone out there work on it and grabbing the guy from the crowd who has shown the passion and the ability.
   18. Greg Pope thinks the Cubs are reeking havoc Posted: March 01, 2014 at 05:57 PM (#4664633)
It was not THAT long ago that computer power and storage were measured in MB.

Just this week I was in a meeting where we were discussing some sizes. A group had a database that was 5 GB and they needed it accessible to 1000 different sites (but all from a central location). We were throwing around ideas and somebody said we could duplicate the data, and someone else said "Yeah, but 5 GB times a thousand is just too big" and the discussion moved on. I sat there for a few seconds and then realized that it was 5 TB, which still sounds like a lot to me (and everyone else on the call), but then I realized that I had just bought a 3 TB drive for one of our backups and it was less than $100.

Sometimes it's hard to adjust.
   19. Maury Brown Posted: March 01, 2014 at 05:57 PM (#4664634)
From today's presentation:
"The goal over time, and hopefully certainly by this season, is to make these plays available in real time and start the debates. But we have to make sure baseball operations sees it and they agree that these are accurate renderings. But this year, fans will be able to see these data and these videos." – Bob Bowman.
   20. Maury Brown Posted: March 01, 2014 at 06:02 PM (#4664636)
On #18....

It's going to come down to how we bucketize the data. The size of the data is not an insurmountable obstacle. But as we have more data points, I get pickier about what a metric becomes. It's not good enough to know how fast a player ranges via his first step, because we can now see how ball speed and trajectory off the bat come into play. Since every play is different, the data for each play is different too, so a range factor will need some normalization. Range factor in the OF will differ from range factor in the IF, based on the distance the ball travels and whether it's in the air or on the ground. It seems daunting at the moment.
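One simple way to do the normalization Maury describes: bucket plays by batted-ball type and distance, compute the league-wide catch rate in each bucket, and credit each fielder with actual minus expected outs. This is a generic "plays above expected" sketch with made-up field names and a 20-foot bin width chosen purely for illustration; it is not a description of any metric MLBAM has announced.

```python
from collections import defaultdict
from typing import NamedTuple


class Play(NamedTuple):
    fielder: str
    batted_type: str     # hypothetical labels, e.g. "ground" or "air"
    distance_ft: float   # distance the fielder had to cover
    caught: bool


def bucket(play: Play, width_ft: float = 20.0) -> tuple[str, int]:
    """Group a play by batted-ball type and a distance bin `width_ft` feet wide."""
    return (play.batted_type, int(play.distance_ft // width_ft))


def bucket_rates(plays: list[Play]) -> dict:
    """League-wide catch rate for each bucket."""
    made, total = defaultdict(int), defaultdict(int)
    for p in plays:
        b = bucket(p)
        total[b] += 1
        made[b] += p.caught          # True counts as 1, False as 0
    return {b: made[b] / total[b] for b in total}


def plays_above_expected(plays: list[Play]) -> dict:
    """Sum of (actual - expected) outs per fielder, using bucket baselines."""
    rates = bucket_rates(plays)
    result = defaultdict(float)
    for p in plays:
        result[p.fielder] += p.caught - rates[bucket(p)]
    return dict(result)


plays = [
    Play("A", "air", 10.0, True),
    Play("B", "air", 10.0, False),
    Play("A", "ground", 30.0, True),
    Play("B", "ground", 30.0, True),
]
result = plays_above_expected(plays)
print(result)   # {'A': 0.5, 'B': -0.5}
```

A fuller version would also key the buckets on position (IF vs. OF) and on ball speed, as the post suggests; every extra dimension splits the data into thinner buckets, which is part of why it looks daunting.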
   21. CFBF Is A Golden Spider Duck Posted: March 01, 2014 at 06:09 PM (#4664640)
The most important takeaway from this news is that Jason Heyward is bad-ass.
   22. boteman is not here 'til October Posted: March 01, 2014 at 06:23 PM (#4664644)
Minimum of three ballparks in 2014 (Miller Park, Target Field and Citi Field).

So we will discover in excruciating detail how teams lose ballgames???
   23. Infinite Joost (Voxter) Posted: March 01, 2014 at 06:25 PM (#4664646)
My main desideratum now that this has happened is a way of making wireless/cellular technology easy to use at the ballpark. It would be pretty great to have use of MLBAM stuff while sitting down the first base line at Safeco, but because of the nature of the beast it hardly ever works.
   24. boteman is not here 'til October Posted: March 01, 2014 at 06:31 PM (#4664647)
While one game might generate 7 terabytes of raw data, we don't necessarily need all 7 TB for analysis. I'm drawing an analogy to the way JPEG compression discards fine detail that, based on what we know about human perception, adds little to how a photograph looks. It should be possible to reduce the data to its important paths and vectors for compactness and distribution, while MLBAM archives the 7 TB of raw data for further study in-house, perhaps leasing access to it to big analysts who want the raw data feed. This doesn't look insurmountable at all.
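As one concrete example of reducing a path to "its important paths and vectors", here is the Ramer-Douglas-Peucker line-simplification algorithm applied to a fielder's 2-D route. This is a swapped-in, standard technique (JPEG works quite differently), the tolerance value is an assumption, and nothing in the article says MLBAM compresses its data this way.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b (2-D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    # |cross product| / base length = height of the triangle (p, a, b)
    return abs(dx * (ay - py) - (ax - px) * dy) / length


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep only the points needed to stay within `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the straight chord first -> last.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [first, last]          # the chord is a good enough approximation
    # Otherwise split at the farthest point and simplify each half.
    left = simplify(points[: idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right          # drop the duplicated split point


# A route that runs straight, then bends: five samples reduce to three.
route = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)]
simplified = simplify(route, 0.1)
print(simplified)   # [(0, 0), (2, 0), (4, 2)]
```

The tolerance (in feet, if positions are in feet) sets the trade-off: larger values throw away more samples at the cost of a coarser route.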
   25. PreservedFish Posted: March 01, 2014 at 07:50 PM (#4664675)
4. PreservedFish Posted: July 19, 2010 at 03:33 AM (#3592898)
I don't like wading into this argument because I haven't paid attention to cutting edge defensive statistical stuff in many years. But, it seems to me obvious that one day in the future, every single batted ball will be rated by its speed, angle, and trajectory - that once we have enough of this data, we will be able to say, with reasonable certainty, that a batted ball hit at whatever speed/angle/trajectory is caught 78% of the time, and that once we have all that, we can move towards incorporating weather and positioning. This might all require stationary cameras with uniform angles in every stadium, or, maybe just an army of unpaid interns, I'm not sure. But when this comes, people will look on our current stats and laugh at them.

Am I right?


Nobody responded...
   26. boteman is not here 'til October Posted: March 01, 2014 at 08:43 PM (#4664694)
There's a soothsayer in our midst!
   27. Select Storage Device Posted: March 01, 2014 at 10:40 PM (#4664734)
Nobody responded...


Why would you laugh? The narrative here is that this will provide a kind of accuracy and perspective that wasn't available previously. You'd likely just shrug your shoulders and say "well, the tech hadn't been built/proven yet."

It might expose fundamental flaws in the way we understand things, but it's not like misreading data we've had since the beginning of the game.
   28. What did Billy Ripken have against ElRoy Face? Posted: March 02, 2014 at 12:46 AM (#4664751)
bucketize


Please, no.
   29. ellsbury my heart at wounded knee Posted: March 02, 2014 at 02:21 AM (#4664760)
Please, no.


Yeah - categorize, group, bin, classify - these words all work. Well, bin is probably a little nonstandard, but for some reason I can't handle bucketize.

It'll be interesting to see how this data might eventually be used. There's so much data available these days, but statistical methods for high-dimensional data I think are a ways behind the explosion of data that's becoming available. You see it in genetics, proteomics, metabolomics, the microbiome craziness - it's tough. I mean, I think people will eventually figure it out, but that #### is crazy complicated. At least in baseball you can actually see what's being measured with your eyes, I guess. Still, I wonder how well this stuff might actually be validated. We tend to trust the pitch f/x data, but there's a lot of stuff about pitch classification and some of the algorithms that get a little funky.
   30. Zach Posted: March 02, 2014 at 02:40 AM (#4664764)
You don't need 7 TB per game to do interesting things. Think about the stuff in the Jason Heyward video:

1) Batted ball trajectory -- launch speed, angle, landing point (you probably need some slicing/hooking parameter, too)
2) Heyward's first step, maximum speed, route efficiency (probably need this for the outfielder backing up the play in some way, too -- tricky)

You could easily envision a box score with this kind of information that would only be a few kilobytes per game. Not much worse than PitchFX, really.
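To put a rough number on "a few kilobytes per game", here is a hypothetical fixed-width record holding the fields listed above, packed with Python's struct module. The field set, the units, and the figure of 60 tracked batted balls per game are illustrative assumptions, not a real MLBAM format.

```python
import struct
from typing import NamedTuple

# Hypothetical layout: little-endian, no padding, one uint32 id plus seven float32 fields.
RECORD = struct.Struct("<I7f")


class BattedBall(NamedTuple):
    play_id: int
    launch_speed_mph: float
    launch_angle_deg: float
    landing_x_ft: float
    landing_y_ft: float
    first_step_s: float       # fielder's reaction time before the first step
    max_speed_ft_s: float     # fielder's top speed on the play
    route_efficiency: float   # straight-line distance / distance actually run


def pack(ball: BattedBall) -> bytes:
    """Serialize one batted ball into RECORD.size bytes."""
    return RECORD.pack(*ball)


def unpack(data: bytes) -> BattedBall:
    """Inverse of pack (values come back at float32 precision)."""
    return BattedBall(*RECORD.unpack(data))


ball = BattedBall(1, 100.0, 25.0, 150.0, 300.0, 0.5, 27.0, 0.75)
BATTED_BALLS_PER_GAME = 60    # illustrative assumption
bytes_per_game = BATTED_BALLS_PER_GAME * RECORD.size
print(RECORD.size, bytes_per_game)   # 32 1920
```

Adding a slicing/hooking parameter, as suggested above, would cost another 4 bytes per record.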
   31. Select Storage Device Posted: March 02, 2014 at 04:13 AM (#4664767)
You don't need 7 TB per game to do interesting things. Think about the stuff in the Jason Heyward video:

1) Batted ball trajectory -- launch speed, angle, landing point (you probably need some slicing/hooking parameter, too)
2) Heyward's first step, maximum speed, route efficiency (probably need this for the outfielder backing up the play in some way, too -- tricky)

You could easily envision a box score with this kind of information that would only be a few kilobytes per game. Not much worse than PitchFX, really.


I am not sure it would work that way. You still need to store everything else that, by whatever benchmark, doesn't make it into the "box score."

At the same time I don't believe it would take an army of interns to slog their way through "7TB" of data. Assuming that the visuals are the cherry on top of tables and tables of data in something as simple as a SQL structure, at that point you are at least Laura Dern sifting through triceratops poo -- you have a good idea of what you are looking for and an established method of judgement. Just have to find the toxic bits.
   32. RMc is a fine piece of cheese Posted: March 02, 2014 at 09:15 AM (#4664773)
To be fair, the jet pack (and flying car and hoverboard) was always going to be a lot of energy expenditure for very little value; barring the discovery of an anti-grav solution, it was never going to happen in our lifetime.


All anti-grav vehicles also have one obvious problem: liability. I've long said you're never gonna see flying cars in populated areas, because there's always a chance of a crash, and it's doubtful any insurance company would touch it.

There were 34,080* fatalities on US roads in 2012. Now multiply that by three dimensions.

(*Which is actually down quite a bit: in the late 70s, it was over 50K/year.)
   33. AROM Posted: March 02, 2014 at 10:30 AM (#4664785)
What would make sense is for MLB to store and process the 7 TB per game, break down a few key variables into an analysis package, and distribute those.

Right now I have Gameday data going back to 2003, majors and minors, Pitch f/x back to, I think, 2007, plus some cool data sets that I can't even talk about. Still have plenty of space on the hard drive.

If I were to attempt to store 7 TB of data and wiped my hard drive clean to start, I would run out of room sometime in the bottom of the first inning. There are 2,430 games per season, which means 17,000 TB. Or, learning a new word, 17 petabytes. Anybody involved with big data who knows how much that would cost?
   34. BDC Posted: March 02, 2014 at 10:47 AM (#4664791)
Wikipedia tells me that "The experiments in the Large Hadron Collider produce about 15 petabytes of data per year." I don't know what the Large Hadron Collider costs, but an industry that can pay AJ Burnett $16 million a year must have some money lying around that can be devoted to science :)
   35. cardsfanboy Posted: March 02, 2014 at 11:18 AM (#4664795)
If I were to attempt to store 7 TB of data and wiped my hard drive clean to start, I would run out of room sometime in the bottom of the first inning. There are 2,430 games per season, which means 17,000 TB. Or, learning a new word, 17 petabytes. Anybody involved with big data who knows how much that would cost?


Google search says roughly $81,000 for the raw drives. Although when trying to price it out, I'm finding assembled racks for it for $350,000... some company called Backblaze makes a petabyte rack for $117,000, but I don't know if that is the price they are charging or what it cost them to build it. (Roughly speaking, as a cloud solution they are charging $95,000 for 3 years of storage of 1 petabyte.)
   36. Der-K and the statistical werewolves. Posted: March 02, 2014 at 11:41 AM (#4664801)
IIRC, Netflix's inventory is (or was) a PB.
This is outside my everything level, but Teradata could conceivably handle it (v14 allows for up to 117PB of data).
   37. cardsfanboy Posted: March 02, 2014 at 01:39 PM (#4664871)
IIRC, Netflix's inventory is (or was) a PB.
This is outside my everything level, but Teradata could conceivably handle it (v14 allows for up to 117PB of data).


As of 5/9/2013, it was about 3.20 PB.
   38. Fancy Pants Handles lap changes with class Posted: March 02, 2014 at 04:19 PM (#4664946)
If I were to attempt to store 7 TB of data and wiped my hard drive clean to start, I would run out of room sometime in the bottom of the first inning. There are 2,430 games per season, which means 17,000 TB. Or, learning a new word, 17 petabytes. Anybody involved with big data who knows how much that would cost?

Google search says roughly $81,000 for the raw drives. Although when trying to price it out, I'm finding assembled racks for it for $350,000... some company called Backblaze makes a petabyte rack for $117,000, but I don't know if that is the price they are charging or what it cost them to build it. (Roughly speaking, as a cloud solution they are charging $95,000 for 3 years of storage of 1 petabyte.)


If you want to not just store it, but have it accessible, quite a lot. Hard drive costs are only a fraction of the hardware costs of a server. And hardware costs are only a fraction of the total costs to running a datacenter (power, cooling, bandwidth, maintenance staff, real estate...)

On top of that, you will likely want to build in at least triple redundancy, because when dealing with that much data, drive failures are just a question of when, not if. So more likely 51 PB instead of 17.

Numbers I saw 2-3 years ago suggested that Google was paying around $250 million per exabyte of storage for its data centers. So, just to ballpark it, if you pro-rate that to 50 PB, you would be talking around $12.5M per year. Of course, if you keep adding 50 PB of data each year...
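The chain of estimates in this exchange can be reproduced in a few lines. Every input is a figure quoted in the thread (the 7 TB per game, 2,430 games, triple redundancy, and the recalled $250 million per exabyte, which the post treats as an annual cost); none is independently verified, and, as noted above, power, cooling, bandwidth and staff are left out.

```python
TB_PER_GAME = 7              # the thread's 7 TB figure (Sean later calls it made up)
GAMES_PER_SEASON = 2430      # 30 teams x 162 games / 2
REPLICAS = 3                 # triple redundancy against drive failures
USD_PER_EB_PER_YEAR = 250e6  # recalled Google figure, treated as annual in the post

raw_pb = TB_PER_GAME * GAMES_PER_SEASON / 1000         # terabytes -> petabytes
stored_pb = raw_pb * REPLICAS
annual_cost = stored_pb / 1000 * USD_PER_EB_PER_YEAR   # petabytes -> exabytes

print(f"{raw_pb:.2f} PB raw, {stored_pb:.2f} PB stored, ~${annual_cost / 1e6:.1f}M/year")
```

With the unrounded 51 PB this comes to about $12.8M a year; the $12.5M above comes from rounding storage down to 50 PB.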
   39. Sean Forman Posted: March 02, 2014 at 08:02 PM (#4665039)
Please note that my 7 TB was a made-up number. It will not be that high, but the Field f/x data is probably around a ten-column row, say 100 bytes (I'm not working out the column types here, though it does matter), for each fifteenth of a second for every person on the field for the entire game (including umps, 3B coaches, etc.). Included in that is the time between innings, as they aren't shutting the cameras off.

200 minutes is 180,000 fifteenths of a second * 15 people = 2.7m rows of data per game * 100 bytes = 270 MB/game * 2400 = 648 GB of data. You can probably cut that down quite a lot, but it's a really big data set. So if you want to get a job in an MLB front office, knowing Hadoop would be a good idea.
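Sean's back-of-the-envelope estimate, worked step by step with his stated inputs and decimal units (1 GB = 10^9 bytes). The 100-byte row is his rough guess, so the result is only good to an order of magnitude.

```python
MINUTES_PER_GAME = 200
SAMPLES_PER_SECOND = 15    # one position fix every fifteenth of a second
PEOPLE_TRACKED = 15        # players, umpires, base coaches
BYTES_PER_ROW = 100        # rough size of a ten-column row
GAMES_PER_SEASON = 2400

rows_per_game = MINUTES_PER_GAME * 60 * SAMPLES_PER_SECOND * PEOPLE_TRACKED
bytes_per_game = rows_per_game * BYTES_PER_ROW
bytes_per_season = bytes_per_game * GAMES_PER_SEASON

print(f"{rows_per_game:,} rows/game")               # 2,700,000 rows/game
print(f"{bytes_per_game / 1e6:,.0f} MB/game")       # 270 MB/game
print(f"{bytes_per_season / 1e9:,.0f} GB/season")   # 648 GB/season
```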
   40. Zach Posted: March 02, 2014 at 08:12 PM (#4665048)
200 minutes is 180,000 fifteenths of a second * 15 people = 2.7m rows of data per game * 100 bytes = 270 MB/game * 2400 = 648 GB of data.

Almost all of that data is useless from an analytic perspective, though. You could get ~80% (obviously, this percentage is a wild guess) of the useful information just from noting fielders' positions at the moment when the ball is struck and when the ball is fielded/passes out of the infield.
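Zach's snapshot idea amounts to asking where each fielder was at two instants. Because tracking samples arrive at fixed intervals (every fifteenth of a second in Sean's estimate), the moment of contact usually falls between two samples, so this sketch linearly interpolates between them. The track format and the contact time are hypothetical inputs, and the half-second spacing in the usage lines is only for readability.

```python
from bisect import bisect_right


def position_at(times, positions, t):
    """Linearly interpolate a fielder's (x, y) at time t.

    `times` must be strictly increasing; `positions[i]` is the fix at `times[i]`.
    Times outside the tracked range clamp to the first or last fix.
    """
    if t <= times[0]:
        return positions[0]
    if t >= times[-1]:
        return positions[-1]
    i = bisect_right(times, t)          # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    (x0, y0), (x1, y1) = positions[i - 1], positions[i]
    w = (t - t0) / (t1 - t0)
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def snapshot(tracks, t):
    """Every fielder's interpolated position at one instant (e.g. bat-ball contact)."""
    return {who: position_at(ts, ps, t) for who, (ts, ps) in tracks.items()}


times = [0.0, 0.5, 1.0]
cf = [(0.0, 300.0), (10.0, 300.0), (10.0, 310.0)]
tracks = {"CF": (times, cf)}
print(snapshot(tracks, 0.75))   # {'CF': (10.0, 305.0)}
```

Calling snapshot twice per play, at contact and when the ball is fielded, yields the handful of numbers Zach describes instead of the full 15-per-second stream.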
   41. PreservedFish Posted: March 02, 2014 at 08:35 PM (#4665061)
What will be the most frivolous use of this data? Will we be able to know the average jogging speed of relief pitchers? The longest outfield warm-up throw of the season?
   42. Sean Forman Posted: March 02, 2014 at 09:42 PM (#4665106)

Almost all of that data is useless from an analytic perspective, though. You could get ~80% (obviously, this percentage is a wild guess) of the useful information just from noting fielders' positions at the moment when the ball is struck and when the ball is fielded/passes out of the infield.


Of course, but you are being a bit glib about what it will take to go from non-useful to useful data.
   43. Pat Rapper's Delight Posted: March 02, 2014 at 09:51 PM (#4665110)
Hopefully in the future this technology can be applied retroactively to game film involving Pete Incaviglia.
   44. Random Transaction Generator Posted: March 02, 2014 at 10:35 PM (#4665143)
What will be the most frivolous use of this data? Will we be able to know the average jogging speed of relief pitchers? The longest outfield warm-up throw of the season?


Highest foul pop-up caught.
Slowest home run trot.
Least distance traveled by an outfielder while watching a 400'+ HR go over his head.
   45. PreservedFish Posted: March 02, 2014 at 11:17 PM (#4665161)
Slowest home run trot.

They already do this one.
   46. AROM Posted: March 03, 2014 at 09:30 AM (#4665218)
I presume Dave Kingman holds the all time record for highest foul pop hit.
   47. Andy McGeady Posted: March 03, 2014 at 12:05 PM (#4665307)
@Sean_Forman where did the 7TB number come from? I ask because a baseball writer mentioned the same figure to me on Saturday at Sloan.
   48. DL from MN Posted: March 03, 2014 at 12:08 PM (#4665310)
you're never gonna see flying cars in populated areas, because there's always a chance of a crash


Lots of research being done currently on drones that are able to self-organize their flight plans.
   49. jacksone (AKA It's OK...) Posted: March 03, 2014 at 05:25 PM (#4665664)
for each fifteenth of a second for every person on the field for the entire game. (including umps, 3b coaches, etc). Included in that is time between innings as they aren't shutting the cameras off.

Almost all of that data is useless from an analytic perspective, though. You could get ~80% (obviously, this percentage is a wild guess) of the useful information just from noting fielders' positions at the moment when the ball is struck and when the ball is fielded/passes out of the infield.



Of course, but you are being a bit glib about what it will take to go from non-useful to useful data.


Any reason there is no pause function on the cameras? Why not program that in for stoppages of play?
   50. cardsfanboy Posted: March 03, 2014 at 07:14 PM (#4665735)
Any reason there is no pause function on the cameras? Why not program that in for stoppages of play?


My guess is: why would they? Who knows, maybe in the future we'll find other information that the data could help support. Just because we can't think of it now doesn't mean that someone else won't think of it in the future. Maybe we'll discover that players who move around during "stoppages" are better suited to handle a play than those who just stand still.
   51. dr. scott Posted: March 03, 2014 at 07:17 PM (#4665737)
Will it analyze audio too? Then we could have lookup tables on who is most upset because people think they are either belly itchers... or bums.
   52. madvillain Posted: March 03, 2014 at 07:32 PM (#4665743)
Of course, but you are being a bit glib about what it will take to go from non-useful to useful data.


The important thing, though, is access to the data. Even if you have to work in a cloud query builder or form, people will analyze it in surprising ways given time. Hopefully, given tech progress, the multiple terabytes of data produced per game can be handled, parsed, and downloaded more easily in a few years. Working with that many records is not easy at all, as I'm sure you know!

Lots of research being done currently on drones that are able to self-organize their flight plans.


The TED talk (not a huge fan of the concept, but whatever) on the team of quadcopters is pretty mind-blowing. The sensor and computing tech is already here; in 10 years, when it matures, it will be mainstream and amazing. Once the insurance/liability side of it gets established, there will be no going back to a pre-drone era. In theory, it's a very disruptive tech.
   53. TDF, situational idiot Posted: March 03, 2014 at 08:10 PM (#4665756)
First, this seems awesome, especially if MLB is collecting the data and then letting others use it for free.

Second, I wonder if the terabytes of data are much more than is needed daily. A ball is hit - we get info on (a) pitch type and location; (b) ball trajectory, direction, and speed after contact; (c) initial player positioning; and (d) whether a play is made or not. Do we need anything else?

Billy Hamilton catches a screaming line drive in the gap. Once we know a, b, and c above do we really care how the play was made? Do we care immediately (in evaluating him in-season) whether he caught the ball because of his speed or because of his route or because of his reaction, as long as the play was made?

In the off-season, it would make a difference - "defensive speed" may age worse than "route reading", for instance, and we could incorporate that into projections. But it's like the difference between FIP-based WAR and results-based WAR for pitchers - while FIP-based stats might be more predictive, results-based stats tell you much more about what actually happened.

   54. madvillain Posted: March 03, 2014 at 08:31 PM (#4665765)
Second, I wonder if the terabytes of data is much more than is needed daily.


I'm a little curious (tech person, but not on the programming side) about what exactly is generating those "terabytes of data". I've worked in SQL databases containing millions of rows; they were large, but they weren't over a terabyte. I'm very curious, from a code standpoint, how this stuff gets encoded into zeros and ones and how large that data is. Can the data be contained in a traditional table? If not, what is the container?

Also, other motion-tracking software like Kinect doesn't seem to generate large amounts of data, since it can be streamed in real time to your avatar playing digital ping pong or whatever.
   55. AROM Posted: March 03, 2014 at 10:49 PM (#4665810)
In his article at Prospectus, Ben Lindbergh mentions capturing player position 30 times per second. So at minimum you have 10 people on the field, plus base runners, and maybe they are getting umps and base coaches too: 10 x 30 x 60 x 60 x 3 hours. That's over 3 million rows for a game. A lot, but not terabytes.

He also mentions the radar tracking the ball sampling it 20,000 times per second. I don't know what radar data looks like when stored, but if there's a row for each of those samples, that would probably add up to terabytes. I would think you could get a pretty good handle on where the ball went by looking at, say, every thousandth radar observation. But I can't speak with any authority on this.
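Both the player-tracking arithmetic and the every-thousandth-observation idea can be made concrete. The 30 Hz and 20,000 Hz rates are the ones quoted from the article; the 10 tracked people, the 3-hour game and the 5-second ball flight are illustrative assumptions. Plain decimation is the crudest possible reduction (a real system would more likely fit a smooth trajectory to the radar points), but it shows the scale of the savings.

```python
PLAYER_HZ = 30            # position fixes per second, per the article
RADAR_HZ = 20_000         # radar samples per second, per the article
PEOPLE = 10               # minimum on-field count from the post
GAME_SECONDS = 3 * 60 * 60
FLIGHT_SECONDS = 5.0      # illustrative hang time of a deep fly ball


def decimate(samples: list, keep_every: int) -> list:
    """Keep every `keep_every`-th sample, plus the final one so the endpoint survives."""
    kept = samples[::keep_every]
    if samples and (len(samples) - 1) % keep_every != 0:
        kept.append(samples[-1])
    return kept


tracking_rows = PEOPLE * PLAYER_HZ * GAME_SECONDS      # 3,240,000 rows per game
radar_points = int(RADAR_HZ * FLIGHT_SECONDS)          # 100,000 for one fly ball
reduced = decimate(list(range(radar_points)), 1000)    # 101 points
print(tracking_rows, radar_points, len(reduced))
```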
   56. jacksone (AKA It's OK...) Posted: March 04, 2014 at 08:47 AM (#4665901)
Do we care immediately (in evaluating him in-season) whether he caught the ball because of his speed or because of his route or because of his reaction, as long as the play was made?


We may not need to know about Hamilton's route-to-ball ability at individual points in the season, but I am sure he and his coaches would love to know. Players watch video of every at bat they have ever taken, so why not watch a video of every ball hit toward them, overlaid with the pertinent info?
   57. TDF, situational idiot Posted: March 04, 2014 at 10:06 AM (#4665931)
We may not need to know about Hamilton's route-to-ball ability at individual points in the season, but I am sure he and his coaches would love to know. Players watch video of every at bat they have ever taken, so why not watch a video of every ball hit toward them, overlaid with the pertinent info?
I'm sure they do, too, but that's terabytes of data for 2 teams per game, not terabytes of data for thousands of media/fans per game. Further, the data probably wouldn't add much for a good organization - the film alone would show whether Hamilton gets a good jump, whether his routes are good, or whether he's just doing it with his speed.

Another thought - the vast majority of the data would be instantly worthless:
So at minimum you have 10 people on the field, plus base runners, and maybe they are getting umps and base coaches too: 10 x 30 x 60 x 60 x 3 hours. That's over 3 million rows for a game. A lot, but not terabytes.
But for most plays, only 1 fielder is actually making a play on the ball; why would you track the 1B for a line-drive down the LF line, for instance?

Another huge source of wasted data: all of the dead time. Sure, a game is 3 hours long, but the actual action doesn't take up that much time. The slowest play in baseball is a David Ortiz HR trot; he takes about 30 seconds to circle the bases. Last season, there were 53 balls in play per game (PA - K - BB - HR). If every play took as long as an Ortiz HR, that's still only 26 1/2 minutes of pertinent action per game, not 3 hours.
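The dead-time estimate works out as shown below. The sketch uses the post's figures: 53 balls in play per game, about 30 seconds per play, and a 3-hour game. The 30-per-second rate and the 10 tracked objects are carried over from the earlier estimate in the thread. Treating every play as 30 seconds long is the poster's deliberate worst case.

```python
BIP_PER_GAME = 53          # balls in play per game (PA - K - BB - HR), per the post
SECONDS_PER_PLAY = 30      # worst case: an Ortiz home-run trot
GAME_SECONDS = 3 * 60 * 60
HZ, OBJECTS = 30, 10       # carried over from the earlier tracking estimate

action_seconds = BIP_PER_GAME * SECONDS_PER_PLAY
print(action_seconds / 60)                      # 26.5 (minutes of action)
print(f"{action_seconds / GAME_SECONDS:.1%}")   # 14.7% of the game

rows_all_game = HZ * OBJECTS * GAME_SECONDS
rows_action_only = HZ * OBJECTS * action_seconds
print(f"{rows_all_game:,} -> {rows_action_only:,}")  # 3,240,000 -> 477,000
```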

I don't see why we would care about anything happening in the field except (1) when the ball is in play and (2) the particular fielders involved (which, on a double play or relay throw, would obviously add data). Because of that, it seems that the "terabytes of data" could be vastly more info than is needed.
   58. BDC Posted: March 04, 2014 at 10:38 AM (#4665959)
I don't see why we would care about anything happening in the field except (1) when the ball is in play and (2) the particular fielders involved

Ah, but what future generations of researchers wouldn't give for a chart of Lenny Dykstra's tobacco-spray pattern.
   59. AROM Posted: March 04, 2014 at 11:17 AM (#4665978)
But for most plays, only 1 fielder is actually making a play on the ball; why would you track the 1B for a line-drive down the LF line, for instance?


But you don't know it's a line drive down the left field line until the ball is hit. So you start with tracking everybody. Cutting out the noise from the useful data requires editing what is recorded, and that can only be done after the play happens. One of the things people want to know is how fast a fielder reacts to the ball being hit (not to mention anticipation: maybe, based on pitch location and the swing path, he actually reacts before the ball is struck). A cameraman who can react fast enough to focus on the relevant players in time to record this shouldn't be a cameraman at all. His reactions would be better used on the field.
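Measuring reaction time is one reason to record everyone from the pitch onward. This minimal sketch assumes 30-per-second position samples, as mentioned earlier in the thread. Reaction time is taken as the delay between contact and the first sample where the fielder's speed exceeds a threshold. The `first_move_time` function, the 2 ft/s threshold and the toy data are all hypothetical.

A negative result would be the anticipation case, where the fielder moves before contact. For that reason the window is assumed to start at pitch release rather than during pre-pitch shuffling.

```python
import math

HZ = 30  # position samples per second (rate mentioned earlier in the thread)

def first_move_time(positions, contact_index, speed_threshold=2.0, hz=HZ):
    """Seconds from contact until the fielder's speed first exceeds
    speed_threshold (ft/s); negative means he moved before contact.
    Returns None if he never exceeds it. positions: (x, y) in feet,
    assumed to start at pitch release (hypothetical data layout)."""
    dt = 1.0 / hz
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > speed_threshold:
            return (i - contact_index) * dt
    return None

# Toy data: still for 15 samples, then 0.5 ft per sample (15 ft/s).
positions = [(0.0, 0.0)] * 15 + [(0.5 * k, 0.0) for k in range(1, 31)]
print(round(first_move_time(positions, contact_index=10), 3))  # 0.167
```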
   61. McCoy Posted: March 04, 2014 at 11:28 AM (#4665996)
Secondly why wouldn't you want to track what all the players are doing on a play? I would think it would be valuable (in the sense that an extra win or a loss is worth millions) to know which players get themselves into position to do good while others don't. Whiteyball at the granular level.
   62. TDF, situational idiot Posted: March 04, 2014 at 11:47 AM (#4666026)
But you don't know it's a line drive down the left field line until the ball is hit. So you start with tracking everybody.
Good point. I wonder if there's an easy way to edit much of the noise out in real time, say, by deleting the lines of data for uninvolved fielders?
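One way to approximate that is sketched below under stated assumptions. It buffers a play's frames and, once the play ends, keeps only fielders who moved at least some minimum distance. Waiting for the play to end reflects AROM's point that involvement is only known afterward. The frame layout, the `trim_play` name and the 10-foot threshold are hypothetical. The trade-off is that anything trimmed is gone for later analysis.

```python
import math
from collections import defaultdict

def path_length(points):
    """Total distance travelled along a sequence of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def trim_play(frames, min_feet=10.0):
    """frames: list of dicts mapping fielder -> (x, y), one dict per sample
    for a single buffered play. Returns the full tracks of only those
    fielders whose path length during the play was at least min_feet."""
    tracks = defaultdict(list)
    for frame in frames:
        for fielder, pos in frame.items():
            tracks[fielder].append(pos)
    return {f: pts for f, pts in tracks.items() if path_length(pts) >= min_feet}

# Toy line drive down the LF line: LF covers 30 ft, the 1B drifts about 2 ft.
frames = [{"LF": (0.0, 3.0 * t), "1B": (0.0, 0.2 * t)} for t in range(11)]
print(sorted(trim_play(frames)))  # ['LF']
```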
Secondly why wouldn't you want to track what all the players are doing on a play? I would think it would be valuable (in the sense that an extra win or a loss is worth millions) to know which players get themselves into position to do good while others don't. Whiteyball at the granular level.
Because if they aren't involved in the play it doesn't matter. If the player is consistently in the right position, that'll show up when he is involved.
   63. McCoy Posted: March 04, 2014 at 12:09 PM (#4666043)
It didn't matter then, but it could matter later. When doing projections with millions of dollars on the line, why would you exclude information that would lead to a more accurate prediction?

If the player is consistently in the right position, that'll show up when he is involved.

Or it could have been a fluke. You don't know unless you track it.
   64. jacksone (AKA It's OK...) Posted: March 04, 2014 at 12:33 PM (#4666069)
Because if they aren't involved in the play it doesn't matter. If the player is consistently in the right position, that'll show up when he is involved.


Take a look at the bunt double thread. There are a few plays on there that would definitely benefit from tracking all of the fielders, especially those not involved in the initial plays.
   65. jacksone (AKA It's OK...) Posted: March 04, 2014 at 12:36 PM (#4666070)
Further, the data probably wouldn't add much for a good organization - the film alone would show if Hamilton gets a good jump, or if his routes are good, or if he's just doing it with his speed.


It may sound simple, but two lines on the video, actual route and optimal route, could help the fielder and coaches immensely. Try watching football without the 1st down line. It sucks.
   66. Textbook Editor Posted: March 04, 2014 at 01:59 PM (#4666151)
Just being able to track and quantify "route efficiency" for OF to fly balls seems amazingly, unbelievably cool. And that's just one thing.
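"Route efficiency" can be defined in several ways. One simple version, assumed here, divides the straight-line distance from the fielder's first step to the catch by the distance he actually covered. Under that definition a perfectly direct route scores 1.0 and the straight line plays the role of the "optimal route" overlay. The `route_efficiency` function and the sample routes are hypothetical.

```python
import math

def route_efficiency(route):
    """route: list of (x, y) positions in feet, from first step to catch.
    Returns direct distance / distance actually covered (1.0 = straight line)."""
    covered = sum(math.dist(a, b) for a, b in zip(route, route[1:]))
    if covered == 0:
        return 1.0  # never moved; treat as perfectly efficient
    return math.dist(route[0], route[-1]) / covered

straight = [(0.0, 0.0), (30.0, 40.0)]              # direct 50-ft route
l_route = [(0.0, 0.0), (0.0, 40.0), (30.0, 40.0)]  # L-shaped path to the same spot
print(route_efficiency(straight))           # 1.0
print(round(route_efficiency(l_route), 3))  # 0.714
```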

Alas, the technology comes in right after Jeter retires...
