Their tool employs a historical model that weighs the relative importance of eight statistical categories to determine which players have the best chance to take home MLB’s top awards
The nation’s ballparks are dark and empty now, with the Texas Rangers having clinched the franchise’s first World Series title a few weeks ago. Major League Baseball has shifted into offseason mode, which means the distribution of major awards before fans turn their thoughts to free agency, spring training, and the promise of a new season.
MLB’s Most Valuable Player Award, to be announced Thursday, is the biggest prize of all, given annually to one player each from the American and National leagues. Since 1931, this most prestigious of baseball’s accolades has been decided by members of the Baseball Writers’ Association of America who rank their top 10 players in each league on a weighted scale. Their votes are tabulated, and the MVPs are crowned.
But could a computer program, perhaps one employing artificial intelligence, predict the MVP winners? This fall a trio of Johns Hopkins undergrads endeavored to find out.
Sophomores Alex Shane and Jacob Harris and junior Jackson Roloff are members of the Hopkins baseball team and have a passion for the game. Their team’s crackerjack 48-8 season this past spring sent them to their own world series, the NCAA Division III championship series.
When seeking out an independent study project to augment their computer science studies, they naturally turned to Anton Dahbura, executive director of the Johns Hopkins Information Security Institute, who operates a sports analytics research group and has a longstanding relationship with the Baltimore Orioles. A Hopkins alum himself, Dahbura also played on the Hopkins baseball team. He became their faculty advisor and connected them with the Orioles’ front office.
"We met with a data scientist on the Orioles, and he kind of pitched us a couple ideas that the team was interested in," Shane says. "We decided that MVP prediction would be the most fun for us, and we thought we had the capability to do it."
The students developed two approaches. One was a traditional computer program focusing on eight statistical categories for each player, ranging from familiar stats such as home runs and RBIs to those that reflect the game’s relatively recent turn toward advanced statistical analysis, such as rOBA (reference-weighted On-Base Average). By analyzing data from the 2004 through 2022 seasons, the students determined how much weight to give each stat when attempting to predict an MVP.
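A weighted-stat ranking of this kind can be sketched roughly as follows. This is only an illustration, not the students' program: the article does not list all eight categories or their weights, so the three stat names, the weights, and the player stat lines below are made up. Each stat is standardized first so that categories on different scales (home run counts versus a rate like rOBA) can be combined.

```python
from statistics import mean, pstdev

# Hypothetical weights for three of the stats; the real model used eight
# categories whose weights are not given in the article.
WEIGHTS = {"home_runs": 0.4, "rbi": 0.3, "roba": 0.3}


def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def rank_players(players, weights=WEIGHTS):
    """Return player names sorted from highest to lowest weighted score."""
    names = list(players)
    totals = {name: 0.0 for name in names}
    for stat, weight in weights.items():
        standardized = z_scores([players[name][stat] for name in names])
        for name, z in zip(names, standardized):
            totals[name] += weight * z
    return sorted(names, key=lambda name: totals[name], reverse=True)


# Made-up stat lines, for illustration only.
players = {
    "Player A": {"home_runs": 41, "rbi": 106, "roba": 0.430},
    "Player B": {"home_runs": 33, "rbi": 96, "roba": 0.400},
    "Player C": {"home_runs": 20, "rbi": 70, "roba": 0.350},
}

print(rank_players(players))
```

The player at the top of the returned list would be the program's MVP pick for that league and season.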
A second approach used the same stats but employed AI to "train" on the numbers to ultimately determine how important each stat was in MVP selection. Because the MVP has almost exclusively been awarded to heavy-hitting position players in the modern baseball era, both programs excluded pitchers and pitching stats. (A separate honor, MLB’s Cy Young Award, awarded annually to one player each from the American and National leagues, recognizes excellent pitching.)
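The article does not say which AI technique the students used. As one simple stand-in, the sketch below fits a logistic regression by gradient descent, so the weight on each stat is learned from past seasons rather than set by hand. The two features, the toy training rows, and the function names are all hypothetical.

```python
import math


def train_weights(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic-regression weights by batch gradient descent.

    rows: standardized stat lines, one list of floats per player-season.
    labels: 1 if that player won the MVP that season, else 0.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = predict(x, weights, bias)
            err = p - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(x, weights, bias):
    """Estimated probability that a stat line belongs to an MVP winner."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Toy, made-up data: two standardized stats per player-season.
rows = [[1.5, 0.2], [1.0, -0.5], [-0.5, 0.4], [-1.0, -0.3], [-1.0, 0.2]]
labels = [1, 1, 0, 0, 0]

weights, bias = train_weights(rows, labels)
print("learned weights:", weights)
```

A larger learned weight suggests that a stat has mattered more in past MVP selections, which is the kind of information the second approach was after.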
The results? Both programs achieved 50% accuracy when picking MVP winners since 2004. And the students’ pick for MVP was in the top five more than 90% of the time. A primary obstacle to improving accuracy is simply the nature of MVP voting: the programs dealt with cold, hard numbers; the actual MVP is selected by living, breathing sports writers, who are sometimes swayed by nonstatistical factors.
"It’s a vote subject to a combination of subjectivity and also the dynamics of the sports writing industry," Dahbura says. "Sometimes sports writers have hometown favorites, and there are other biases that are baked into the process as well."
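To make the accuracy figures concrete, here is one way such numbers could be computed. The sketch assumes that the top-five figure means the actual winner appeared among a program's five highest-ranked players; the article does not spell out the exact calculation. The function name and the letter-coded seasons are invented for illustration.

```python
def top_k_accuracy(predicted_rankings, actual_winners, k=1):
    """Fraction of seasons in which the actual winner is in the top k picks."""
    hits = sum(
        1
        for ranking, winner in zip(predicted_rankings, actual_winners)
        if winner in ranking[:k]
    )
    return hits / len(actual_winners)


# Four made-up seasons; each list is a program's ranking, best first.
predicted = [
    ["A", "B", "C", "D", "E", "F"],
    ["G", "H", "I", "J", "K", "L"],
    ["M", "N", "O", "P", "Q", "R"],
    ["S", "T", "U", "V", "W", "X"],
]
actual = ["A", "H", "M", "W"]

print(top_k_accuracy(predicted, actual, k=1))
print(top_k_accuracy(predicted, actual, k=5))
```

With k=1 the function measures how often the top pick matched the winner, and with k=5 it measures how often the winner was at least near the top of the list.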
Maybe a player gets more votes because he’s viewed as a valuable leader in the clubhouse. Perhaps there’s a compelling narrative surrounding a player, such as in 2001, when Ichiro Suzuki of the Seattle Mariners won MVP. Suzuki’s on-field numbers were great, but there was also considerable excitement around his being the first Japanese-born player to make it big on these shores. "To make a subjective stat into an objective value is hard because you can’t measure the human mind," Harris says.
Still, the algorithms the students developed (and delivered to the Orioles) are not without value. Looking beyond the top spot, the programs can potentially discover up-and-comers lower on the lists. A player who shows up eighth or ninth in their predicted rankings could have MVP potential later in his career. "You could conclude that the guy is undervalued right now and maybe that’s a free agent signing to go get," Shane says.
As for this year’s MVPs, to be announced Thursday evening? Both programs predict the NL award will go to Atlanta Braves outfielder Ronald Acuña Jr., who batted .337, reached base in more than 41% of his plate appearances, hit 41 home runs, and stole 73 bases.
Meanwhile, in the American League, the Los Angeles Angels’ Shohei Ohtani appears to be an absolute lock to win MVP. The students join just about every baseball analyst in the country in coming to this conclusion. However, their programs say the winner will be another player, Texas Rangers shortstop Corey Seager. That’s because Ohtani is an extremely rare two-way player: he’s both an excellent hitter and an excellent pitcher, and the students’ programs don’t factor in his pitching prowess. Ohtani’s abilities are so uncommon, you must go back to the legendary Babe Ruth to find comparable talent both on the mound and in the batter’s box.
The students, among others, have a term for Ohtani: a unicorn.