Sports data are typically published online and in newspapers as box scores. Box scores contain a numerical view of a sporting event and are of interest to fans, handicappers, and fantasy sports players. While box scores contain a wealth of information, they are impractical for performing research.
The Sports Data Query Language (SDQL) makes box score data accessible to researchers. SDQL's simple and powerful syntax is designed to allow queries on any imaginable situation.
SDQL is written in the open-source scripting language Python and strives to maintain the elegant power of that language.
Key design ideas of SDQL include:
- queries are terse.
- queries are consistent across sports.
- parameter names are all lower case.
- shortcuts are all upper case (e.g.: H is shorthand for site = home).
- parameter names may contain spaces (e.g.: for the NBA, points in the paint is used, not points_in_the_paint or pointsinthepaint).
- abbreviations are avoided (e.g.: points is used rather than pts).
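The shortcut convention above can be illustrated with a small Python sketch. This is not SDQL's actual parser: the `SHORTCUTS` table and the `expand_shortcuts` function are hypothetical names, the use of `and` to join conditions in the sample query is an assumption, and only the H shortcut is taken from the text. The sketch shows how reserving upper case for shortcuts lets them be told apart from lower-case parameter names.

```python
import re

# Hypothetical shortcut table; only H -> "site = home" comes from the text above.
SHORTCUTS = {"H": "site = home"}


def expand_shortcuts(query: str) -> str:
    """Replace each standalone all-upper-case token with its long form, if known."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Unknown upper-case tokens are left untouched.
        return SHORTCUTS.get(token, token)

    # \b ensures only whole tokens match, so the "A" inside "tA" is not affected.
    return re.sub(r"\b[A-Z]+\b", replace, query)


expanded = expand_shortcuts("H and points > 100")
print(expanded)
```

Because parameter names are all lower case, a query such as tA(points) < 10 passes through this function unchanged.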
Key features of SDQL include:
- access to arbitrary past and future games (e.g.: a team whose points rose in each of its last three games is ppp:points < pp:points < p:points ).
- access to running averages and sums (e.g.: teams that average fewer than 10 points per game is tA(points) < 10 ).
- access to mathematical combinations of parameters (e.g.: points minus opponent's points is points - o:points ).
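To make the meaning of these three features concrete, the following Python sketch evaluates their plain-Python analogues on a small, made-up list of games for one team. It is not how SDQL is implemented: the `games` data and the helper names `prior`, `rising_over_last_three`, `average_before`, and `margin` are hypothetical, and the sketch assumes that the running average covers only games played before the current one.

```python
# Made-up box-score rows for one team, oldest game first.
games = [
    {"points": 7, "o:points": 14},
    {"points": 10, "o:points": 3},
    {"points": 13, "o:points": 20},
    {"points": 6, "o:points": 9},
]


def prior(games, i, n, name):
    """Value of `name` n games before game i, or None if no such game exists."""
    j = i - n
    return games[j][name] if j >= 0 else None


def rising_over_last_three(games, i):
    """Analogue of ppp:points < pp:points < p:points, evaluated at game i."""
    values = [prior(games, i, n, "points") for n in (3, 2, 1)]
    if None in values:
        return False  # fewer than three earlier games
    return values[0] < values[1] < values[2]


def average_before(games, i, name):
    """Analogue of tA(name): average over games before game i (an assumption)."""
    earlier = [game[name] for game in games[:i]]
    return sum(earlier) / len(earlier) if earlier else None


def margin(game):
    """Analogue of points - o:points for a single game."""
    return game["points"] - game["o:points"]


i = 3  # the most recent game in the list
print(rising_over_last_three(games, i))
print(average_before(games, i, "points"))
print(margin(games[i]))
```

Each helper takes several lines here, whereas the corresponding SDQL expression is a single short condition, which is the terseness the design aims for.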