Design and Implementation of NBA Playoff Prediction Method Based on ELO Algorithm and Graph Database

With the globalization of the NBA, the playoffs draw attention from around the world. Fans celebrate the victories of their favorite teams and keep trying to predict the results of playoff games. However, predicting the winning probability of teams in the NBA playoffs is challenging. To address this challenge, we propose a method that uses the ELO algorithm for prediction and the graph database Neo4j for implementation. Experimental results show that the prediction system works to some degree.


Introduction
Physical exercise has become a simple and effective way to keep fit in daily life. It not only allows people to relax and enjoy themselves in today's fast-paced life but also makes their bodies stronger. As a sport, basketball is very popular among teenagers. As the highest-level basketball league in the world, the NBA attracts billions of viewers every year during the playoffs, and the outcome of each game also generates considerable profit for gambling companies. Gambling companies set the winning odds of each team according to their own prediction algorithms. Pan et al. proposed an NBA playoff prediction method based on support vector machines, which predicts well [1]. Qiu et al. proposed a new method for calculating a team's comprehensive strength and established a Logistic model and a Bayes discriminant model [2]. The forecasting method we propose differs from these: we use a graph database to implement the ELO algorithm invented by Elo.
Journal of Computer and Communications
In this paper, our main contribution is the use of an improved ELO algorithm to predict the winning rate. The ELO rating system was established by Elo, an American physicist of Hungarian origin, to measure the level of players in all kinds of games, and it is an authoritative method for evaluating playing strength. We store all the data in the graph database Neo4j. Experimental results show that the prediction system works to some degree.
The rest of the paper is organized as follows. Section 2 introduces the preliminaries. Section 3 describes the architecture of the prediction system in detail, which consists of three parts: data preparation, data storage, and query. Section 4 gives the algorithm of the system. Section 5 discusses a case study. Section 6 reviews related work, and Section 7 draws conclusions.

Graph Database and Neo4j
A graph database is a database whose data model conforms to some form of graph (or network or link) structure. The graph data model usually consists of nodes (or vertices) and (directed) edges (or arcs or links), where the nodes represent concepts (or objects) and the edges represent relationships (or connections) between these concepts (objects) [3]. A graph database management system is an online database management system that provides methods for adding, deleting, changing, and searching data in the graph data model. A graph database uses the graph as its storage structure, a high-performance data structure for storing large amounts of data. It allows us to construct arbitrarily complex models by assembling simple, abstract nodes and connections into relational structures, and to map the problems we want to describe visually. Graph databases offer advantages in performance, flexibility, and agility, and Neo4j has become one of the most commonly used graph databases.
Neo4j is one of the most prominent open source graph databases available. It allows developers to persist data more naturally from domains such as social networking and recommendation engines, where representing data as a graph of interconnected nodes is a natural choice. Neo4j significantly outperforms relational databases when querying graph data, and it supports large data sets while preserving full transactional database attributes [4]. Neo4j is a NoSQL graph database management system. It stores data as graphs in the form of networks or trees, and it can describe the real world vividly and intuitively. Its query performance is stable and efficient and, unlike that of relational databases, does not degrade as the amount of data grows.
The main features of Neo4j are as follows. First, it consists of nodes, relationships, and attributes. Second, the attributes of a relationship or a node form a key-value data set. Third, every relationship has its own head node and tail node. Fourth, a relationship may have no attributes.
The details are shown in Figure 1. The entities are represented as the four colored nodes in the diagram: the red nodes represent teams and the pink nodes represent playoff rounds. The attributes shown in the figure are the entities' names: "San Antonio", "Golden State", "First Round" and "Conference Finals". The relationships WIN and RWIN represent wins in the playoffs and in the regular season, respectively.
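The four features listed above can be illustrated with a small in-memory sketch in Python. This is not Neo4j's API: the class names Node and Relationship, the labels "Team" and "Round", the "count" property, and the particular edges are assumptions made for illustration. Only the entity names and the relationship types WIN and RWIN come from the description of Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # assumed labels: "Team", "Round"
    properties: dict = field(default_factory=dict)   # key-value attributes


@dataclass
class Relationship:
    type: str                                        # "WIN" (playoffs) or "RWIN" (regular season)
    start: Node                                      # one endpoint of the directed relationship
    end: Node                                        # the other endpoint
    properties: dict = field(default_factory=dict)   # may be empty


san_antonio = Node("Team", {"name": "San Antonio"})
golden_state = Node("Team", {"name": "Golden State"})
first_round = Node("Round", {"name": "First Round"})
conference_finals = Node("Round", {"name": "Conference Finals"})

# Hypothetical edges and a placeholder count; the edges in Figure 1 may differ.
edges = [
    Relationship("RWIN", golden_state, san_antonio, {"count": 2}),
    Relationship("WIN", golden_state, conference_finals),  # no attributes
]


def relationships_from(node, rel_type):
    """Return all relationships of the given type that start at node."""
    return [e for e in edges if e.start is node and e.type == rel_type]


for rel in relationships_from(golden_state, "RWIN"):
    print(rel.start.properties["name"], rel.type, rel.end.properties["name"])
```

Every relationship carries exactly two endpoints and an optional property dictionary, which mirrors the third and fourth features.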

ELO Algorithm
With the development of the Internet and the improvement of living standards, many people take part in all kinds of online competitions. At present, the major competitive platforms lack a ranking system for judging the competitive level of their users. The international rating is also called the "FIDE rating" or "ELO score". It was designed by Arpad Elo (1903-1992), an American professor born in Hungary. It was drafted by the International Chess Federation Hierarchy Committee, adopted by the 1969 Plenary Session of the International Chess Federation, and formally implemented in 1970 [5].
The ELO rating algorithm is a widely used algorithm for ranking players in many competitive games. A player with a higher ELO rating has a higher probability of winning a game than a player with a lower ELO rating. The ELO rating system is a method for calculating the relative level of the two sides in a competition, and it is an official method for evaluating the level of competition between two players or groups. At present, it is mainly used in chess, football, basketball and electronic sports. The computing method is listed as follows:
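As a point of reference, the standard form of the Elo system can be sketched in Python as below. It may differ in detail from the variant used in this paper. The function names, the scale of 400, and the K-factor of 32 are conventional choices assumed here for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score (win probability) of A against B in standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_rating(rating, expected, actual, k=32.0):
    """New rating after one game; actual is 1 for a win and 0 for a loss."""
    return rating + k * (actual - expected)


# Two equally rated players: each is expected to score 0.5.
e_a = expected_score(1500, 1500)
new_a = update_rating(1500, e_a, actual=1)   # A wins
print(e_a, new_a)                            # 0.5 1516.0
```

The expected scores of the two sides always sum to 1, so a higher rating translates directly into a higher winning probability, and an upset moves the ratings more than an expected result.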

System Architecture
In this section, we mainly introduce the architecture of this prediction system, as shown in Figure 2. It consists of three parts: data preparation, data storage, and query.
Data preparation mainly consists of data selection. We select playoff and regular-season data according to our forecasting needs. Then, according to the results of the games between teams, the win-lose relationships between teams are determined.
The data storage part constructs a graph that stores each team's regular-season and playoff data, together with the relationships between teams, in the database. In the Neo4j graph database, we can then look up the head-to-head record between a team and any other team.
Preprocessing prepares the data for prediction. For each team, a vertex is created with the team's name, and the numbers of wins and losses between teams are created as winning relationships between the team vertices. If a team enters the playoffs, a relationship between the team and the corresponding playoff round is added on top of this.
The query part retrieves the data needed to calculate a team's winning rate. Each part of the data is queried with the Cypher language and then processed by the ELO algorithm, which finally yields the team's winning probability.
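The storage and query steps can be sketched with plain Python dictionaries. In the actual system these steps are performed with Neo4j and Cypher; the functions build_graph and head_to_head, the tuple layout, and the placeholder teams "A" and "B" are assumptions for illustration and do not use real game data.

```python
from collections import defaultdict


def build_graph(games):
    """Aggregate game results into counted winning relationships.

    games: iterable of (winner, loser, stage), stage is "regular" or "playoff".
    Returns a mapping {(winner, loser, rel_type): number_of_wins}.
    """
    edges = defaultdict(int)
    for winner, loser, stage in games:
        rel_type = "WIN" if stage == "playoff" else "RWIN"
        edges[(winner, loser, rel_type)] += 1
    return edges


def head_to_head(edges, team_a, team_b, rel_type):
    """Return (wins of A over B, wins of B over A) for one relationship type."""
    return (edges.get((team_a, team_b, rel_type), 0),
            edges.get((team_b, team_a, rel_type), 0))


# Placeholder results for illustration only.
games = [
    ("A", "B", "regular"),
    ("A", "B", "regular"),
    ("B", "A", "regular"),
    ("A", "B", "playoff"),
]
edges = build_graph(games)
print(head_to_head(edges, "A", "B", "RWIN"))   # (2, 1)
print(head_to_head(edges, "A", "B", "WIN"))    # (1, 0)
```

Keeping regular-season and playoff wins under separate relationship types lets each be queried independently before the results are combined by the rating algorithm.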

Modified ELO Algorithm
The ELO algorithm was originally used in chess to calculate and evaluate the ranks of two players, so it must be modified before it can be used to predict basketball games. In the modified ELO algorithm, t.name denotes the name of a team. The specific calculations are given in Algorithm 1 and Algorithm 2.

Experiment Environment
We run the experiments with the configuration shown in Table 1.

Initial Score
The number of regular season wins in the 2018-2019 season is used as the initial score for each team (data from https://china.nba.com/), as shown in Table 2.
To illustrate, we select the pairs of teams with the largest difference, the smallest difference, and the same number of wins in the 2018-2019 regular season. The details are as follows. The pair with the greatest difference in wins is MIL and NYK.
We treat MIL as team A and NYK as team B; MIL's initial score is 60.

Case Study
All data in this paper are taken from the 2015-2018 playoffs and the 2018-2019 regular season (data source: https://china.nba.com/). We choose two teams, GSW and HOU, as a simple example in this section. Two Cypher query statements are used: one for the postseason winning rate and one for the playoff matches between the two teams. The specific query data are shown in Table 3.
For convenience, let a denote GSW and b denote HOU.
From these data we compute, in order, the regular-season winning gap between GSW and HOU, the final winning rate, and the new score after GSW wins this round.
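The three quantities of this case study can be sketched with the standard Elo formulas, which are not necessarily the modified formulas of this paper. The function play_round, the scale of 400, and the K-factor of 32 are assumptions for illustration. The initial scores are the 2018-2019 regular-season win totals of GSW (57) and HOU (53) from the public standings.

```python
def expected_score(score_a, score_b, scale=400.0):
    """Standard Elo expected score of a against b."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def play_round(score_a, score_b, a_won, k=32.0, scale=400.0):
    """Return (gap, p_a, new_a, new_b) for one round between a and b."""
    gap = score_a - score_b                    # regular-season winning gap
    p_a = expected_score(score_a, score_b, scale)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - p_a)               # points transferred to a from b
    return gap, p_a, score_a + delta, score_b - delta


# a = GSW, b = HOU; initial scores are regular-season wins.
gap, p_a, new_a, new_b = play_round(57, 53, a_won=True)
print(f"gap={gap}, p_a={p_a:.3f}, new_a={new_a:.2f}, new_b={new_b:.2f}")
```

Because the update is zero-sum, the points gained by the winner equal the points lost by the loser, so the total score of the two teams is unchanged after the round.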

Related Work
With the further globalization of the NBA, a playoff team brings more and more economic benefits, so it is very meaningful to predict whether a team can enter the playoffs. In one study, the cross-misjudgment rate is reduced to 13.3%, and analysis of the misjudged cases shows that the Western Conference teams are stronger [6]. Predicting the outcome of a game between two sports teams poses a challenging problem of interest to statistical scientists as well as the general public. To be effective, such prediction must exploit special contextual features of the game [8]. Prediction methods exist not only for the NBA but also for many other sports. Stephanie Kovalchik, in "Searching for the GOAT of tennis win prediction", evaluated win prediction models for tennis. The models fall into three categories: regression-based, point-based, and paired-comparison models. The ELO algorithm was also evaluated, with an accuracy of 75% [9].

Conclusions
In this paper, we propose a method for predicting the NBA playoffs that uses a graph database for data storage and the ELO algorithm for prediction. Based on an analysis of the real situation, each team is considered as a whole, and the influence of individual players' and coaches' abilities on the team is not modeled. To make this simplification reasonable, we select the most recent data as far as possible, that is, the season data that reflect the team's latest personnel. In this way, the influence of players and coaches is already reflected in the recent results. The limitation of this experiment is that it considers only the recent strength of the team and pays no attention to the impact of changes in players and coaches. For example, in the 2018-2019 Finals between TOR and GSW, the predicted result was that GSW would win. In reality, however, GSW lost this series to TOR owing to the absence of some star players. In the future, we plan to take such situations into consideration.