Resources and ideas - baseballhackday/data-and-resources GitHub Wiki
Submit and share your resources and ideas here! This document is publicly editable. (Hit "Edit" at the top right.)
Baseball Hack Day 2023! MARCH 11
For Baseball Hack Day, we are crowdsourcing as much “raw material” as possible so that participants can find project ideas here. Our goal is to collect a large list of publicly accessible APIs and data feeds/sources for anything baseball-related, as well as existing tools/apps for inspiration, for many years to come.
To avoid re-inventing the wheel, we encourage you to just link to other resources -- e.g., if you’ve already compiled a great list, or written a blog post, or know of another great API list -- please just copy & paste that link right here into this wiki.
That’s it. Feel free to edit for clarity. If you find an old link, try to strike it through and add a comment.
APIs/Data source/Feed
Baseball Savant CSV downloads: https://baseballsavant.mlb.com/statcast_search
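Once a CSV has been downloaded from the Statcast search page, it can be read with Python's standard `csv` module. The following is a minimal sketch, not an official recipe: the function name is hypothetical, and the column names `pitch_type` and `release_speed` are assumptions about the export's header row, so check them against your own file.

```python
import csv
import io
from collections import defaultdict


def mean_speed_by_pitch_type(csv_text):
    """Average release speed per pitch type from Statcast-style CSV text.

    Assumes the CSV has 'pitch_type' and 'release_speed' columns
    (an assumption about the export format; verify against your download).
    """
    totals = defaultdict(lambda: [0.0, 0])  # pitch_type -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        speed = row.get("release_speed")
        if not speed:  # skip rows with a missing or empty speed
            continue
        entry = totals[row["pitch_type"]]
        entry[0] += float(speed)
        entry[1] += 1
    return {pitch: total / count for pitch, (total, count) in totals.items()}


if __name__ == "__main__":
    # Tiny hand-made sample standing in for a real download.
    sample = "pitch_type,release_speed\nFF,95.0\nFF,97.0\nSL,85.0\n"
    print(mean_speed_by_pitch_type(sample))
```

For a real file, read it with `open(path, newline="", encoding="utf-8-sig")` and pass the text in; the `utf-8-sig` codec is a precaution in case the export begins with a byte-order mark.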
MLBAM StatsAPI (there are a lot more public endpoints for e.g. stats but these are the most useful ones)
- Schedule: https://statsapi.mlb.com/api/v1/schedule?sportId=1&date=2023-03-11
- Sports lookup (i.e. MLB, AAA, AA etc.): https://statsapi.mlb.com/api/v1/sports
- Game Boxscore: https://statsapi.mlb.com/api/v1/game/715722/boxscore
- Game live data (boxscore, lineups, play-by-play): https://statsapi.mlb.com/api/v1.1/game/715722/feed/live
- Game play-by-play only: https://statsapi.mlb.com/api/v1/game/715722/playByPlay
- People (players): https://statsapi.mlb.com/api/v1/people/430832
- Teams: https://statsapi.mlb.com/api/v1/teams/141
- MLB Media Search Service: http://m.mlb.com/ws/search/MediaSearchService?query=most%20popular&start=0&hitsPerPage=18&type=json&sort=desc&sort_type=date
- MLB Properties: http://mlb.com/properties/mlb_properties.xml http://mlb.com/lookup/json/named.team_all.bam?sport_code=%27mlb%27&active_sw=%27Y%27&all_star_sw=%27N%27
- MLB Master Scoreboard: http://mlb.mlb.com/gdcross/components/game/mlb/year_2023/month_03/day_11/master_scoreboard.json
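The StatsAPI endpoints above follow a regular pattern, so a small helper can build them from a date or game ID. This is a rough sketch using only the Python standard library. The function names are hypothetical, and the `dates` / `games` / `gamePk` response shape assumed in `game_pks` is an assumption to verify against a real schedule response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://statsapi.mlb.com/api"


def schedule_url(date, sport_id=1):
    """Schedule URL for one date (YYYY-MM-DD); sportId=1 as in the example above."""
    query = urlencode({"sportId": sport_id, "date": date})
    return f"{BASE}/v1/schedule?{query}"


def game_url(game_pk, resource="boxscore"):
    """URL for a game resource: 'boxscore', 'playByPlay', or 'feed/live'."""
    # In the list above, the live feed sits under v1.1 and the others under v1.
    version = "v1.1" if resource == "feed/live" else "v1"
    return f"{BASE}/{version}/game/{game_pk}/{resource}"


def fetch_json(url, timeout=10):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def game_pks(schedule):
    """Collect game IDs from a schedule response.

    Assumed shape: {"dates": [{"games": [{"gamePk": ...}, ...]}, ...]}
    """
    return [
        game["gamePk"]
        for day in schedule.get("dates", [])
        for game in day.get("games", [])
    ]


if __name__ == "__main__":
    schedule = fetch_json(schedule_url("2023-03-11"))
    for pk in game_pks(schedule):
        print(pk, game_url(pk, "feed/live"))
```

Keeping URL construction separate from the network call makes the helpers easy to check offline, which is handy when venue Wi-Fi is unreliable.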
Other useful APIs
- Baseball Savant Statcast Search - https://baseballsavant.mlb.com/statcast_search after searching, click on the save icon (floppy disk) to download csv
- Baseball-Reference.com http://www.baseball-reference.com/about/sources.shtml
- Bill Petti's baseballr R package is possibly useful (have not used it myself): https://billpetti.github.io/baseballr/
- Statcast leaderboards http://m.mlb.com/lookup/json/named.psc_leader_hit_hr_dist.bam?season=2015&game_type=%27D%27&game_type=%27L%27&game_type%27W%27&game_type=%27F%27&min_hip_count=15
- StattleShip Baseball API: api.stattleship.com See the developer documentation and get an API token to see what's possible. SDKs: RubyGem https://github.com/stattleship/stattleship-ruby and R wrapper https://github.com/stattleship/stattleship-r
- SportRadar API: SportRadar, formerly known as Sports Data LLC, will provide participants of Boston Baseball Hack Day with expanded access to their RealTime game data API. https://developer.sportradar.com/docs/read/Home#getting-started Baseball Hack Day Portal: [BROKEN LINK] 2014, 2015, 2016, 2017, 2018 and 2019 SPONSOR THANK YOU!
- MLB Transaction Data in tabular format: January 2001 - 2021. Github repository here
- ESPN API [DEPRECATED] 2013 Sponsor, Thank you!
- Baseball Prospectus Hitter Tunnels data http://www.baseballprospectus.com/sortable/hitter_tunnels.php
- MLB-StatsAPI - Python 2/3 API Wrapper for MLB Stats API - retrieve data from newer MLB API, has some built-in functions for common queries, and supports advanced queries for data related to schedules, games, standings, stats, players, teams, etc. https://github.com/toddrob99/MLB-StatsAPI
- MLB PITCHf/x Data http://webusers.npl.illinois.edu/~a-nathan/pob/tracking.htm (Broken)
- Singlearity Batter vs. Pitcher predictor and game simulator. Python and R APIs for predictions based on Neural Networks. https://www.singlearity.com and https://github.com/singlearity-sports/singlearity-python
- Baseball databank http://www.baseball-databank.org/
- Retrosheet http://www.retrosheet.org - both a file format for game data, and a source of historical data
- Retrosheet data in MySQL format (downloadable) http://www.baseballheatmaps.com/retrosheet-database-download/ (Seems broken)
- Retrosheet Pitch Sequence Parser (for decomposing sequences into ball-strike count states) - https://github.com/mattdennewitz/retrosheet-pitch-sequencesss
- baseball-probable-pitchers - Retrieve a list of probable pitchers for a given day https://github.com/edwarddistel/baseball-probable-pitchers
- mlbweather - Retrieve current hourly forecast for all active mlb venues https://github.com/jaw187/mlbweather
- mlbplays - Retrieve all of the MLB Gameday plays for a given day https://github.com/jaw187/mlbplays
- mlbgames - Retrieve MLB games for a given day https://github.com/jaw187/mlbgames
- mlbboxscores - Retrieve MLB Boxscores for a given day https://github.com/jaw187/mlbboxscores
- mlbplayers - Retrieve MLB Gameday Player Data for a given day https://github.com/jaw187/mlbplayers
- yahoo-fantasy-baseball-reader - Retrieve your fantasy team data via the Yahoo Fantasy Baseball API https://github.com/edwarddistel/yahoo-fantasy-baseball-reader
- Wells Oliver http://baseball.wellsoliver.com/
- Spatially Processed (GIS Data) 2D and 3D MLB PITCHf/x data: https://arcg.is/11LLee
- ArcGIS for Developers: SDK downloads and Documentation https://developers.arcgis.com/
- ArcGIS API for Javascript: API documentation, Samples, Reference Documentation https://developers.arcgis.com/javascript/
- Sean Lahman's Baseball Database http://www.seanlahman.com/baseball-archive/statistics/
- Sean Lahman’s Open Source Sports http://www.seanlahman.com/open-source-sports/
- Lahman Baseball Database project http://lahman.r-forge.r-project.org/
- Erik Berg's xmlstats API https://erikberg.com/api
- MySportsFeeds API: https://www.mysportsfeeds.com
- Ruby Gameday API https://github.com/timothyf/gameday_api
- Python Gameday API https://github.com/zachpanz88/mlbgame
- Node MLB API https://github.com/erwstout/node-mlb-api
- From the creator of http://www.fisherbaseball.com, stats visualizations and pitchfx data & https://github.com/timothyf/gameday_api "If using the Gameday API for your event, I would be happy to provide support." Tim [email protected] Twitter: @tfisher @pitchfx
- Yahoo Fantasy Sports API - http://developer.yahoo.com/fantasysports/guide/
- David Keeney put the Baseball Databank into an online database, where you (anonymously even) can query the database with SQL. Look at: http://baseball.rdbhost.com
- Excellent source of updated baseball data http://www.baseballheatmaps.com/
- An exploration of salaries and payrolls across MLB teams and players https://github.com/chelm/mlb_viz
- Processes MLB GameDay XML to produce a boxscore object (Node.js) https://github.com/rockinghorse/gameday-boxscore
- Retrieves data from MLB's GameDay servers (Node.js) https://github.com/rockinghorse/gameday-fetch
- Modifies the times listed on the MLB.tv media center to have times in your time zone of choice rather than Eastern Time https://github.com/int3h/MLB.tv-Time-Zones
- A Ruby library for retrieving current Major League Baseball players, managers, teams, divisions, and leagues. https://github.com/sferik/mlb
- PHP Gameday API inspired by Timothy Fisher's Ruby API https://github.com/jasonrhodes/GamedayAPI_PHP
- Python abstraction layer for MLB.com information API https://github.com/wellsoliver/py-mlb
- Python scripts to pull down MLB Gameday files into a database https://github.com/wellsoliver/py-gameday
- Perl modules to pull down MLB Gameday files into a MongoDB database https://github.com/kruser/atbat-mongodb (as used by https://github.com/kruser/pitchfx-site)
- Complete forkable stats website https://github.com/kruser/pitchfx-site
- Historical Stats of Japanese Baseball: https://docs.google.com/document/d/1SIOq8Px0Rrj_Cu3rwyGgHh00w8o685PC6m8S5N0uvnk/view?pli=1
- "Retrolahman" baseball database. A Docker container of a CouchDB database combining both the historical Retrosheet data with the more stats-oriented data from Sean Lahman's baseball database. 1920 thru 2012. http://blog.narf.io/retrolahman
- PostgreSQL Retrosheet SQL cookbook - https://github.com/mattdennewitz/retrosheet-queries
- MARCEL projections in SQL - https://medium.com/@mattdennewitz/tinkering-with-marcel-pt-1-3cb1f9edb36d
- Introduction to pitchRx package http://cpsievert.github.io/pitchRx/ source
- openWAR https://baseballwithr.wordpress.com/2014/03/17/introduction-to-openwar/
- List of MLB players and their mlb.com ids listed under "MLBCODE": http://www.baseballprospectus.com/sortable/playerids/playerid_list.csv
- Baseball Stats from Stats Crew https://www.statscrew.com/baseball/
- TheSportsDB - Free Open Sports Metadata and Artwork API https://www.thesportsdb.com/league/4424
- Ballpark Data https://www.ballparksofbaseball.com/comparisons/current-ballparks/
- Cost of a Beer and a Hot Dog at Every MLB Ballpark https://blog.cheapism.com/mlb-hot-dog-beer-prices/
- Average per game regular season attendance in Major League Baseball from 2009 to 2022 (Need to create an account) https://www.statista.com/statistics/235634/average-attendance-per-game-in-the-mlb--regular-season/
- 2022 Ballpark Food Prices (USD) from ESPN: https://www.espn.com/mlb/story/_/id/34266746/how-inflation-affecting-your-wallet-ballpark-concessions-stand-season
- MLB payroll data https://www.spotrac.com/mlb/payroll/
Ticket Pricing
- SeatGeek has a great API for venues and schedules: http://seatgeek.com/build See events http://platform.seatgeek.com/#events (e.g., get Red Sox games: http://api.seatgeek.com/2/events?performers.slug=boston-red-sox ) and venues http://platform.seatgeek.com/#venues (e.g., Fenway info: http://api.seatgeek.com/2/venues/21 ). It is completely open without any restriction. Very easy to use. 2014 and 2015 SPONSOR THANK YOU!
- SeatGeek cost of Major League Baseball Tickets API: https://seatgeek.com/build
- Stubhub https://developer.stubhub.com/store/
- Ticketmaster (need access to their private api) http://stackoverflow.com/questions/15835558/ticketmaster-api-buy-tickets-within-website
- Ticketcity http://www.programmableweb.com/api/ticketcity
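The SeatGeek example URLs in the list above can be generated for any team or venue. Here is a small sketch, assuming the URL shapes shown in those examples still hold. The function names are hypothetical, and the optional `client_id` parameter is an assumption not in the original list: newer versions of the SeatGeek platform may require a registered client ID, so check their current documentation.

```python
from urllib.parse import urlencode

# Base path taken from the example URLs in the list above.
SEATGEEK_BASE = "http://api.seatgeek.com/2"


def events_url(performer_slug, client_id=None):
    """Events query for one performer, e.g. 'boston-red-sox'."""
    params = {"performers.slug": performer_slug}
    if client_id:
        # Assumption: the API may need a client_id obtained by registering an app.
        params["client_id"] = client_id
    return f"{SEATGEEK_BASE}/events?{urlencode(params)}"


def venue_url(venue_id):
    """Venue lookup by numeric ID (21 is the Fenway example above)."""
    return f"{SEATGEEK_BASE}/venues/{venue_id}"


if __name__ == "__main__":
    print(events_url("boston-red-sox"))
    print(venue_url(21))
```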
Design
Other useful stuff for building basic projects:
- Team logos: https://www.mlbstatic.com/team-logos/141.svg
- Player headshots (MLB): https://securea.mlb.com/mlb/images/players/head_shot/430832.jpg
- Player headshots (MILB): https://img.mlbstatic.com/mlb-photos/image/upload/w_180,g_auto,c_fill/v1/people/600524/headshot/milb/current
- Field diagrams: https://prod-gameday.mlbstatic.com/responsive-gameday-assets/1.2.0/images/fields/3.svg
- Player Head Shots: http://gdx.mlb.com/images/gameday/mugshots/mlb/[email protected]
- Player Action Shots: http://losangeles.angels.mlb.com/images/players/525x330/545361.jpg
- Team Style and Logo Properties - http://mlb.mlb.com/shared/properties/style/cle.json
- Team Colors - http://teamcolors.arc90.com/ - is a reference of HEX values for the brand colors of major league sporting teams. Includes SVG logos. Source on GitHub: https://github.com/arc90/teamcolors
- Team Logos as an icon font (Outdated) http://daigofuji.github.io/bbclub-font/
- Design patterns for UI - http://ui-patterns.com/patterns
- Free fonts - https://www.creativebloq.com/graphic-design-tips/best-free-fonts-for-designers-1233380
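The team logo and headshot links earlier in this list appear to be keyed by the same numeric IDs used in the StatsAPI examples (team 141, person 430832). That is an inference from the example URLs, not a documented guarantee. Under that assumption, a minimal sketch for generating image URLs (function names are hypothetical):

```python
def team_logo_url(team_id):
    """SVG logo URL; assumes the number in the example is a StatsAPI team ID."""
    return f"https://www.mlbstatic.com/team-logos/{team_id}.svg"


def headshot_url(person_id, level="mlb"):
    """Headshot URL; assumes the number in the example is a StatsAPI person ID."""
    if level == "mlb":
        return f"https://securea.mlb.com/mlb/images/players/head_shot/{person_id}.jpg"
    if level == "milb":
        return (
            "https://img.mlbstatic.com/mlb-photos/image/upload/"
            f"w_180,g_auto,c_fill/v1/people/{person_id}/headshot/milb/current"
        )
    raise ValueError(f"unknown level: {level!r}")


if __name__ == "__main__":
    print(team_logo_url(141))
    print(headshot_url(430832))
    print(headshot_url(600524, level="milb"))
```

Test a few IDs in a browser before relying on these patterns, since image hosts tend to change without notice.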
Non-sports
Other public data resources that you might want to consider for mash-ups/correlation (hint: sign-up for any API keys now so you're ready to start building):
- Intuit's (2019 sponsor) Open Source projects can be found here:
- https://opensource.intuit.com/
- https://github.com/intuit
- QuickBooks integrations can be found here:
- https://developer.intuit.com/app/developer/homepage
- https://developer.intuit.com/app/developer/qbo/docs/develop
- Weather - http://www.wunderground.com/weather/api/, https://developer.forecast.io/, https://openweathermap.org/api, https://www.ncdc.noaa.gov/cdo-web/webservices/v2
- US Census data - http://www.census.gov/developers/ https://data.census.gov/advanced https://censusreporter.org/
- US public health - http://www.healthdata.gov/
- NYC - https://data.cityofnewyork.us/
- Music data - https://developer.spotify.com/web-api/ or http://www.last.fm/api
- Movies data - http://www.deanclatworthy.com/imdb/ http://www.omdbapi.com/
- Programmable Web's API directory - http://www.programmableweb.com/apis
- Google's API directory - http://www.google.com/publicdata/directory
- Azure's API directory - https://datamarket.azure.com/browse/data?price=free
- Open Tok: http://www.tokbox.com/opentok/api/documentation/gettingstarted
- Boston https://data.cityofboston.gov/
- Parking data (free API) http://www.parkwhiz.com/developers/
- Bitcoin APIs: https://bitcoinaverage.com/api.htm https://www.kraken.com/help/api https://www.bitstamp.net/api/ https://coinbase.com/docs/api/overview https://www.bitfinex.com/pages/api https://coinmarketcap.com/currencies/bitcoin/historical-data/
Resources/How-to’s/Inspirations/Existing tools for ideas, etc.
- Databases for sabermetricians, Part One by Colin Wyers http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/
- Fisher Baseball. http://www.fisherbaseball.com
- How to build a retrosheet database http://www.hardballtimes.com/main/blog_article/building-a-retrosheet-database-the-short-form/
- Baseline Cherrypicker - Ben Schmidt's visualization of cumulative baseball statistics
- Interesting read from Pennant App creator Steve Varga http://www.creativeapplications.net/openframeworks/pennant-ipad-openframeworks/
- How to Create Pitch Charts with Python - Blog post on scraping pitch data from yahoo.com using Python.
- How to Make a Twitter Bot in Under an Hour
Also, see our inspiration blog at http://baseballhackday.tumblr.com/
Server/Hosting
- Dot Cloud https://www.dotcloud.com/
- Heroku https://devcenter.heroku.com/
- Google https://developers.google.com/cloud/
- Nearly Free Speech https://www.nearlyfreespeech.net/
- Digital Ocean https://www.digitalocean.com/
- Amazon AWS http://aws.amazon.com/
- Azure: full-featured public cloud with Web Sites and some other services handy in hackathons available free - http://aka.ms/iaas
- GitHub Pages: Websites for you and your projects. Hosted directly from your GitHub repository. https://pages.github.com/
Tools
- iOS/Android Getting Started apps, as well as Server side SDKs to generate sessions: https://github.com/opentok
- Firebase: helps you build better mobile apps and grow your business https://firebase.google.com/
- Loopy: make cool visualizations http://ncase.me/loopy/
- Flutter: Build beautiful native apps in record time https://flutter.io/
- Hugo: The world’s fastest framework for building websites https://gohugo.io/
- Splunk Developer Overview - http://dev.splunk.com/ Splunk on GitHub - https://github.com/splunk
- Cloudant: A highly scalable and performant JSON database service https://cloudant.com/
- Android Developer: Build Your First App https://developer.android.com/training/basics/firstapp/index.html
See past versions of this document: 2012 Google Doc 2013 Google Doc