LoL AI Model Part 2: Redesign MDP with Gold Diff

最新推荐文章于 2023-02-28 16:36:39 发布

XYYxyy55

最新推荐文章于 2023-02-28 16:36:39 发布

阅读量906

点赞数

AI in Video Games: Improving Decision Making in League of Legends using Real Match Statistics and Personal Preferences

Part 2: Redesigning Markov Decision Process with Gold Difference and Improving Model

Motivations and Objectives

League of Legends is a team oriented video game where on two team teams (with 5 players in each) compete for objectives and kills. Gaining an advantage enables the players to become stronger (obtain better items and level up faster) than their opponents and, as their advantage increases, the likelihood of winning the game also increases. We therefore have a sequence of events dependent on previous events that lead to one team destroying the other’s base and winning the game.

Sequences like this being modelled statistically is nothing new; for years now researchers have considered how this is applied in sports, such as basketball (https://arxiv.org/pdf/1507.01816.pdf), where a sequence of passing, dribbling and foul plays lead to a team obtaining or losing points. The aim of research such as this one mentioned is to provide more detailed insight beyond a simple box score (number of points or kill gained by player in basketball or video games respectively) and consider how teams perform when modelled as a sequence of events connected in time.

Modelling the events in this way is even more important in games such as League of Legends as taking objectives and kills lead towards both an item and level advantage. For example, a player obtaining the first kill of the game nets them gold that can be used to purchase more powerful items. With this item they are then strong enough to obtain more kills and so on until they can lead their team to a win. Facilitating a lead like this is often referred to as ‘snowballing’ as the players cumulatively gain advantages but often games are not this one sided and objects and team plays are more important.

The aim of this is project is simple; can we calculate the next best event given what has occurred previously in the game so that the likelihood of eventually leading to a win increases based on real match statistics?

However, there are many factors that lead to a player’s decision making in a game that cannot be easily measured. No how matter how much data collected, the amount of information a player can capture is beyond any that a computer can detect (at least for now!). For example, players may be over or underperforming in this game or may simply have a preference for the way they play (often defined by the types of characters they play). Some players will naturally be more aggressive and look for kills while others will play passively and push for objectives instead. Therefore, we further develop our model to allow the player to adjust the recommended play on their preferences.

Import Packages and Data

In [1]:

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image
import math
from scipy.stats import kendalltau

from IPython.display import clear_output
import timeit

import warnings
warnings.filterwarnings('ignore')

In [2]:

#kills = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\kills.csv')
#matchinfo = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\matchinfo.csv')
#monsters = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\monsters.csv')
#structures = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\structures.csv')

kills = pd.read_csv('../input/kills.csv')
matchinfo = pd.read_csv('../input/matchinfo.csv')
monsters = pd.read_csv('../input/monsters.csv')
structures = pd.read_csv('../input/structures.csv')

Introducing Gold Difference to Redesign the Markov Decision Process

In [3]:

#gold = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\gold.csv')
gold = pd.read_csv('../input/gold.csv')

In [4]:

gold = gold[gold['Type']=="golddiff"]
gold.head()

Out[4]:

	Address	Type	min_1	min_2	min_3	min_4	min_5	min_6	min_7	min_8	min_9	min_10	min_11	min_12	min_13	min_14	min_15	min_16	min_17	min_18	min_19	min_20	min_21	min_22	min_23	min_24	min_25	min_26	min_27	min_28	min_29	min_30	min_31	min_32	min_33	min_34	min_35	min_36	min_37	min_38	...	min_56	min_57	min_58	min_59	min_60	min_61	min_62	min_63	min_64	min_65	min_66	min_67	min_68	min_69	min_70	min_71	min_72	min_73	min_74	min_75	min_76	min_77	min_78	min_79	min_80	min_81	min_82	min_83	min_84	min_85	min_86	min_87	min_88	min_89	min_90	min_91	min_92	min_93	min_94	min_95
0	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	0	-14	-65	-268	-431	-488	-789	-494	-625	-1044	-313	-760	-697	-790	-611	240	845.0	797.0	1422.0	987.0	169.0	432.0	491.0	1205.0	1527.0	1647.0	1847.0	3750.0	4719.0	3561.0	3367.0	2886.0	2906.0	4411.0	4473.0	4639.0	4762.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	0	-26	-18	147	237	-152	18	88	-242	102	117	802	1420	1394	1301	1489	1563.0	1368.0	1105.0	205.0	192.0	587.0	377.0	667.0	415.0	1876.0	1244.0	2130.0	2431.0	680.0	1520.0	949.0	1894.0	2644.0	3394.0	3726.0	1165.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	0	10	-60	34	37	589	1064	1258	913	1233	1597	1575	3046	2922	3074	3626	3466.0	5634.0	5293.0	4597.0	4360.0	4616.0	4489.0	4880.0	5865.0	6993.0	7049.0	7029.0	7047.0	7160.0	7081.0	7582.0	9917.0	10337.0	9823.0	12307.0	13201.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	0	-15	25	228	-6	-243	175	-346	16	-258	-57	-190	-111	-335	-8	324	428.0	-124.0	768.0	2712.0	1813.0	198.0	1242.0	1245.0	1278.0	1240.0	-664.0	-1195.0	-1157.0	-2161.0	-2504.0	-3873.0	-3688.0	-3801.0	-3668.0	-3612.0	-5071.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	40	40	44	-36	113	158	-121	-191	23	205	156	272	-271	-896	-574	177	-425	-730.0	-318.0	478.0	926.0	761.0	-286.0	473.0	490.0	1265.0	2526.0	3890.0	4319.0	5121.0	5140.0	5141.0	6866.0	9517.0	11322.0	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

In [5]:

# Add ID column based on last 16 digits in match address for simpler matching

matchinfo['id'] = matchinfo['Address'].astype(str).str[-16:]
kills['id'] = kills['Address'].astype(str).str[-16:]
monsters['id'] = monsters['Address'].astype(str).str[-16:]
structures['id'] = structures['Address'].astype(str).str[-16:]
gold['id'] = gold['Address'].astype(str).str[-16:]

In [6]:

# Dragon became multiple types in patch v6.9 (http://leagueoflegends.wikia.com/wiki/V6.9) 
# so we remove and games before this change occured and only use games with the new dragon system

old_dragon_id = monsters[ monsters['Type']=="DRAGON"]['id'].unique()
old_dragon_id

monsters = monsters[ ~monsters['id'].isin(old_dragon_id)]
monsters = monsters.reset_index()

matchinfo = matchinfo[ ~matchinfo['id'].isin(old_dragon_id)]
matchinfo = matchinfo.reset_index()

kills = kills[ ~kills['id'].isin(old_dragon_id)]
kills = kills.reset_index()

structures = structures[ ~structures['id'].isin(old_dragon_id)]
structures = structures.reset_index()

gold = gold[ ~gold['id'].isin(old_dragon_id)]
gold = gold.reset_index()

gold.head(3)

Out[6]:

	index	Address	Type	min_2	min_3	min_4	min_5	min_6	min_7	min_8	min_9	min_10	min_11	min_12	min_13	min_14	min_15	min_16	min_17	min_18	min_19	min_20	min_21	min_22	min_23	min_24	min_25	min_26	min_27	min_28	min_29	min_30	min_31	min_32	min_33	min_34	min_35	min_36	min_37	...	min_57	min_58	min_59	min_60	min_61	min_62	min_63	min_64	min_65	min_66	min_67	min_68	min_69	min_70	min_71	min_72	min_73	min_74	min_75	min_76	min_77	min_78	min_79	min_80	min_81	min_82	min_83	min_84	min_85	min_86	min_87	min_88	min_89	min_90	min_91	min_92	min_93	min_94	min_95	id
0	335	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	-28	67	355	453	311	1250	1321	1470	2005	1735	2064	2985	2911	4018	4308	4344.0	4687.0	5719.0	6525.0	5506.0	6198.0	7494.0	7284.0	7191.0	8089.0	8006.0	7922.0	8548.0	9077.0	12737.0	12064.0	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	55109b5a7a91ae87
1	336	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	0	36	-133	-58	-393	-374	-115	-194	-497	-1131	-1204	-1656	-1222	-221	-1613	-824	-644.0	-851.0	-955.0	774.0	658.0	812.0	1134.0	153.0	363.0	322.0	243.0	74.0	-987.0	-2022.0	-254.0	-629.0	-672.0	-3757.0	-3858.0	-4105.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	e147296c928da5b4
2	337	http://matchhistory.na.leagueoflegends.com/en/...	golddiff	17	58	574	224	501	743	603	1018	1167	1592	1288	1140	2014	2698	3263	3822	4199.0	5032.0	4730.0	4859.0	5129.0	6546.0	6428.0	7093.0	7080.0	6786.0	7047.0	9240.0	10510.0	12205.0	15039.0	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	ea0611c44259f062

In [7]:

#Transpose Gold table, columns become matches and rows become minutes

gold_T = gold.iloc[:,3:-1].transpose()
gold_T.head(3)

Out[7]:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	...	4875	4876	4877	4878	4879	4880	4881	4882	4883	4884	4886	4887	4888	4889	4890	4891	4892	4893	4894	4895	4896	4897	4898	4899	4900	4901	4902	4903	4904	4905	4906	4907	4908	4909	4910	4911	4912	4913	4914
min_1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
min_2	0.0	0.0	17.0	8.0	-1.0	0.0	0.0	0.0	-3.0	0.0	-8.0	10.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	-510.0	0.0	0.0	0.0	-8.0	0.0	0.0	-8.0	0.0	0.0	-18.0	-8.0	0.0	0.0	8.0	8.0	-10.0	0.0	0.0	-8.0	-8.0	...	-11.0	0.0	0.0	0.0	0.0	0.0	3.0	0.0	-16.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	5.0	0.0	0.0	-5.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	13.0	0.0	0.0	0.0	0.0	-8.0	0.0	0.0	-8.0	0.0	0.0
min_3	-28.0	36.0	58.0	8.0	83.0	20.0	6.0	-17.0	-61.0	34.0	-33.0	50.0	50.0	-73.0	40.0	-71.0	-74.0	-35.0	-54.0	-465.0	-39.0	-27.0	-14.0	6.0	-18.0	-71.0	-48.0	20.0	51.0	465.0	23.0	-24.0	-74.0	-8.0	56.0	52.0	-4.0	-407.0	-64.0	6.0	...	-97.0	88.0	82.0	108.0	-47.0	-43.0	-117.0	2.0	-83.0	-75.0	-7.0	-1.0	204.0	-90.0	-74.0	161.0	14.0	-145.0	49.0	-2.0	-26.0	-22.0	47.0	114.0	-38.0	-32.0	29.0	150.0	45.0	-14.0	70.0	34.0	37.0	-187.0	-18.0	-86.0	-6.0	-97.0	-8.0

In [8]:

gold2 = pd.DataFrame()

start = timeit.default_timer()
for r in range(0,len(gold)):
    clear_output(wait=True)
    
    # Select each match column, drop any na rows and find the match id from original gold table
    gold_row = gold_T.iloc[:,r]
    gold_row = gold_row.dropna()
    gold_row_id = gold['id'][r]
    
    # Append into table so that each match and event is stacked on top of one another    
    gold2 = gold2.append(pd.DataFrame({'id':gold_row_id,'GoldDiff':gold_row}))
    
    
    stop = timeit.default_timer()
   
    if (r/len(gold)*100) < 5  :
        expected_time = "Calculating..."
        
    else:
        time_perc = timeit.default_timer()
        expected_time = np.round( ( (time_perc-start)/60 / (r/len(gold)) ),2)
        
  
        
    print("Current progress:",np.round(r/len(gold) *100, 2),"%")        
    print("Current run time:",np.round((stop - start)/60,2),"minutes")
    print("Expected Run Time:",expected_time,"minutes")

Current progress: 74.95 %

In [9]:

gold3 = gold2[['id','GoldDiff']]
gold3.head(3)

Out[9]:

	id	GoldDiff
min_1	55109b5a7a91ae87	0.0
min_2	55109b5a7a91ae87	0.0
min_3	55109b5a7a91ae87	-28.0

In [10]:

### Create minute column with index, convert from 'min_1' to just the number
gold3['Minute'] = gold3.index.to_series()
gold3['Minute'] = np.where(gold3['Minute'].str[-2]=="_", gold3['Minute'].str[-1],gold3['Minute'].str[-2:])
gold3['Minute'] = gold3['Minute'].astype(int)
gold3 = gold3.reset_index()
gold3 = gold3.sort_values(by=['id','Minute'])

gold3.head(3)

Out[10]:

	index	id	GoldDiff	Minute
22608	min_1	0001f4374a03c133	0.0	1
22609	min_2	0001f4374a03c133	0.0	2
22610	min_3	0001f4374a03c133	51.0	3

In [11]:

# Gold difference from data is relative to blue team's perspective,
# therefore we reverse this by simply multiplying amount by -1
gold3['GoldDiff'] = gold3['GoldDiff']*-1

gold3.head(10)

Out[11]:

	index	id	GoldDiff	Minute
22608	min_1	0001f4374a03c133	-0.0	1
22609	min_2	0001f4374a03c133	-0.0	2
22610	min_3	0001f4374a03c133	-51.0	3
22611	min_4	0001f4374a03c133	132.0	4
22612	min_5	0001f4374a03c133	35.0	5
22613	min_6	0001f4374a03c133	940.0	6
22614	min_7	0001f4374a03c133	589.0	7
22615	min_8	0001f4374a03c133	1391.0	8
22616	min_9	0001f4374a03c133	1151.0	9
22617	min_10	0001f4374a03c133	563.0	10

In [12]:

matchinfo.head(3)

Out[12]:

	index	League	Year	Season	Type	blueTeamTag	bResult	rResult	redTeamTag	gamelength	blueTop	blueTopChamp	blueJungle	blueJungleChamp	blueMiddle	blueMiddleChamp	blueADC	blueADCChamp	blueSupport	blueSupportChamp	redTop	redTopChamp	redJungle	redJungleChamp	redMiddle	redMiddleChamp	redADC	redADCChamp	redSupport	redSupportChamp	Address	id
0	335	NALCS	2016	Summer	Season	TSM	1	0	CLG	33	Hauntzer	Maokai	Svenskeren	Graves	Bjergsen	Zilean	Doublelift	Lucian	Biofrost	Karma	Darshan	Trundle	Xmithie	RekSai	Huhi	Viktor	Stixxay	Ezreal	aphromoo	Nami	http://matchhistory.na.leagueoflegends.com/en/...	55109b5a7a91ae87
1	336	NALCS	2016	Summer	Season	CLG	0	1	TSM	39	Darshan	Fiora	Xmithie	RekSai	Huhi	Vladimir	Stixxay	Ezreal	aphromoo	Bard	Hauntzer	Ekko	Svenskeren	Elise	Bjergsen	Viktor	Doublelift	Lucian	Biofrost	Braum	http://matchhistory.na.leagueoflegends.com/en/...	e147296c928da5b4
2	337	NALCS	2016	Summer	Season	NV	1	0	NRG	32	Seraph	Trundle	Procxin	Nidalee	Ninja	Lissandra	LOD	Ezreal	Hakuho	Karma	Quas	Maokai	Santorin	Kindred	GBM	Ziggs	ohq	Sivir	KiWiKiD	Alistar	http://matchhistory.na.leagueoflegends.com/en/...	ea0611c44259f062

In [13]:

gold4 = gold3

matchinfo2 = matchinfo[['id','rResult','gamelength']]
matchinfo2['gamlength'] = matchinfo2['gamelength'] + 1
matchinfo2['index'] = 'min_'+matchinfo2['gamelength'].astype(str)
matchinfo2['rResult2'] =  np.where(matchinfo2['rResult']==1,999999,-999999)
matchinfo2 = matchinfo2[['index','id','rResult2','gamelength']]
matchinfo2.columns = ['index','id','GoldDiff','Minute']


gold4 = gold4.append(matchinfo2)
gold4.tail()

Out[13]:

	index	id	GoldDiff	Minute
4910	min_34	377c00c79e80e193	999999.0	34
4911	min_39	22cad427dfd10959	999999.0	39
4912	min_24	671b2487ca72bfab	999999.0	24
4913	min_35	7cdb33f56fe49084	-999999.0	35
4914	min_42	13203adbaa0c1fa5	999999.0	42

In [14]:

kills = kills[ kills['Time']>0]

kills['Minute'] = kills['Time'].astype(int)

kills['Team'] = np.where( kills['Team']=="rKills","Red","Blue")
kills.head(3)

Out[14]:

	index	Address	Team	Time	Victim	Killer	Assist_1	Assist_2	Assist_3	Assist_4	x_pos	y_pos	id	Minute
0	4462	http://matchhistory.na.leagueoflegends.com/en/...	Blue	6.032	CLG Huhi	TSM Svenskeren	TSM Bjergsen	NaN	NaN	NaN	7825	8666	55109b5a7a91ae87	6
1	4463	http://matchhistory.na.leagueoflegends.com/en/...	Blue	9.428	CLG Huhi	TSM Biofrost	TSM Bjergsen	TSM Doublelift	NaN	NaN	8728	8751	55109b5a7a91ae87	9
2	4464	http://matchhistory.na.leagueoflegends.com/en/...	Blue	9.780	CLG Xmithie	TSM Bjergsen	TSM Hauntzer	TSM Svenskeren	NaN	NaN	8655	1172	55109b5a7a91ae87	9

In [15]:

# For the Kills table, we need decided to group by the minute in which the kills took place and averaged 
# the time of the kills which we use later for the order of events

f = {'Time':['mean','count']}

killsGrouped = kills.groupby( ['id','Team','Minute'] ).agg(f).reset_index()
killsGrouped.columns = ['id','Team','Minute','Time Avg','Count']
killsGrouped = killsGrouped.sort_values(by=['id','Minute'])
killsGrouped.head(3)

Out[15]:

	id	Team	Minute	Time Avg	Count
4	0001f4374a03c133	Red	4	4.635	1
5	0001f4374a03c133	Red	6	6.064	1
0	0001f4374a03c133	Blue	8	8.194	1

In [16]:

structures = structures[ structures['Time']>0]

structures['Minute'] = structures['Time'].astype(int)
structures['Team'] = np.where(structures['Team']=="bTowers","Blue",
                        np.where(structures['Team']=="binhibs","Blue","Red"))
structures2 = structures.sort_values(by=['id','Minute'])

structures2 = structures2[['id','Team','Time','Minute','Type']]
structures2.head(3)

Out[16]:

	id	Team	Time	Minute	Type
4247	0001f4374a03c133	Blue	11.182	11	OUTER_TURRET
37055	0001f4374a03c133	Red	11.006	11	OUTER_TURRET
4248	0001f4374a03c133	Blue	16.556	16	OUTER_TURRET

In [17]:

monsters['Type2'] = np.where( monsters['Type']=="FIRE_DRAGON", "DRAGON",
                    np.where( monsters['Type']=="EARTH_DRAGON","DRAGON",
                    np.where( monsters['Type']=="WATER_DRAGON","DRAGON",       
                    np.where( monsters['Type']=="AIR_DRAGON","DRAGON",   
                             monsters['Type']))))

monsters = monsters[ monsters['Time']>0]

monsters['Minute'] = monsters['Time'].astype(int)

monsters['Team'] = np.where( monsters['Team']=="bDragons","Blue",
                   np.where( monsters['Team']=="bHeralds","Blue",
                   np.where( monsters['Team']=="bBarons", "Blue", 
                           "Red")))

monsters = monsters[['id','Team','Time','Minute','Type2']]
monsters.columns = ['id','Team','Time','Minute','Type']
monsters.head(3)

Out[17]:

	id	Team	Time	Minute	Type
0	55109b5a7a91ae87	Blue	23.444	23	DRAGON
1	55109b5a7a91ae87	Blue	31.069	31	DRAGON
2	55109b5a7a91ae87	Blue	16.419	16	DRAGON

In [18]:

GoldstackedData = gold4.merge(killsGrouped, how='left',on=['id','Minute'])
 
monsters_structures_stacked = structures2.append(monsters[['id','Team','Minute','Time','Type']])

GoldstackedData2 = GoldstackedData.merge(monsters_structures_stacked, how='left',on=['id','Minute'])

GoldstackedData2 = GoldstackedData2.sort_values(by=['id','Minute'])
GoldstackedData2.head(30)

Out[18]:

	index	id	GoldDiff	Minute	Team_x	Time Avg	Count	Team_y	Time	Type
0	min_1	0001f4374a03c133	-0.0	1	NaN	NaN	NaN	NaN	NaN	NaN
1	min_2	0001f4374a03c133	-0.0	2	NaN	NaN	NaN	NaN	NaN	NaN
2	min_3	0001f4374a03c133	-51.0	3	NaN	NaN	NaN	NaN	NaN	NaN
3	min_4	0001f4374a03c133	132.0	4	Red	4.6350	1.0	NaN	NaN	NaN
4	min_5	0001f4374a03c133	35.0	5	NaN	NaN	NaN	NaN	NaN	NaN
5	min_6	0001f4374a03c133	940.0	6	Red	6.0640	1.0	NaN	NaN	NaN
6	min_7	0001f4374a03c133	589.0	7	NaN	NaN	NaN	NaN	NaN	NaN
7	min_8	0001f4374a03c133	1391.0	8	Blue	8.1940	1.0	NaN	NaN	NaN
8	min_9	0001f4374a03c133	1151.0	9	NaN	NaN	NaN	NaN	NaN	NaN
9	min_10	0001f4374a03c133	563.0	10	Red	10.4780	1.0	NaN	NaN	NaN
10	min_11	0001f4374a03c133	915.0	11	NaN	NaN	NaN	Blue	11.182	OUTER_TURRET
11	min_11	0001f4374a03c133	915.0	11	NaN	NaN	NaN	Red	11.006	OUTER_TURRET
12	min_11	0001f4374a03c133	915.0	11	NaN	NaN	NaN	Red	11.261	DRAGON
13	min_12	0001f4374a03c133	2444.0	12	NaN	NaN	NaN	NaN	NaN	NaN
14	min_13	0001f4374a03c133	1509.0	13	NaN	NaN	NaN	NaN	NaN	NaN
15	min_14	0001f4374a03c133	1859.0	14	NaN	NaN	NaN	NaN	NaN	NaN
16	min_15	0001f4374a03c133	2133.0	15	NaN	NaN	NaN	Red	15.777	RIFT_HERALD
17	min_16	0001f4374a03c133	2162.0	16	NaN	NaN	NaN	Blue	16.556	OUTER_TURRET
18	min_16	0001f4374a03c133	2162.0	16	NaN	NaN	NaN	Red	16.145	OUTER_TURRET
19	min_17	0001f4374a03c133	2243.0	17	Red	17.3870	2.0	Red	17.570	DRAGON
20	min_18	0001f4374a03c133	1793.0	18	Blue	18.0980	1.0	Red	18.378	OUTER_TURRET
21	min_18	0001f4374a03c133	1793.0	18	Red	18.0670	2.0	Red	18.378	OUTER_TURRET
22	min_19	0001f4374a03c133	3151.0	19	NaN	NaN	NaN	NaN	NaN	NaN
23	min_20	0001f4374a03c133	4194.0	20	NaN	NaN	NaN	NaN	NaN	NaN
24	min_21	0001f4374a03c133	4792.0	21	Red	21.7900	1.0	NaN	NaN	NaN
25	min_22	0001f4374a03c133	4494.0	22	Blue	22.5125	2.0	Red	22.915	BARON_NASHOR
26	min_22	0001f4374a03c133	4494.0	22	Red	22.4578	5.0	Red	22.915	BARON_NASHOR
27	min_23	0001f4374a03c133	4833.0	23	NaN	NaN	NaN	Red	23.885	DRAGON
28	min_24	0001f4374a03c133	7907.0	24	NaN	NaN	NaN	Red	24.943	INNER_TURRET
29	min_25	0001f4374a03c133	7681.0	25	NaN	NaN	NaN	Red	25.463	INNER_TURRET

In [19]:

GoldstackedData3 = GoldstackedData2
GoldstackedData3['Time2'] = GoldstackedData3['Time'].fillna(GoldstackedData3['Time Avg']).fillna(GoldstackedData3['Minute'])
GoldstackedData3['Team'] = GoldstackedData3['Team_x'].fillna(GoldstackedData3['Team_y'])
GoldstackedData3 = GoldstackedData3.sort_values(by=['id','Time2'])

GoldstackedData3['EventNum'] = GoldstackedData3.groupby('id').cumcount()+1

GoldstackedData3 = GoldstackedData3[['id','EventNum','Team','Minute','Time2','GoldDiff','Count','Type']]

GoldstackedData3.columns = ['id','EventNum','Team','Minute','Time','GoldDiff','KillCount','Struct/Monster']

GoldstackedData3.head(30)

Out[19]:

	id	EventNum	Team	Minute	Time	GoldDiff	KillCount	Struct/Monster
0	0001f4374a03c133	1	NaN	1	1.000	-0.0	NaN	NaN
1	0001f4374a03c133	2	NaN	2	2.000	-0.0	NaN	NaN
2	0001f4374a03c133	3	NaN	3	3.000	-51.0	NaN	NaN
3	0001f4374a03c133	4	Red	4	4.635	132.0	1.0	NaN
4	0001f4374a03c133	5	NaN	5	5.000	35.0	NaN	NaN
5	0001f4374a03c133	6	Red	6	6.064	940.0	1.0	NaN
6	0001f4374a03c133	7	NaN	7	7.000	589.0	NaN	NaN
7	0001f4374a03c133	8	Blue	8	8.194	1391.0	1.0	NaN
8	0001f4374a03c133	9	NaN	9	9.000	1151.0	NaN	NaN
9	0001f4374a03c133	10	Red	10	10.478	563.0	1.0	NaN
11	0001f4374a03c133	11	Red	11	11.006	915.0	NaN	OUTER_TURRET
10	0001f4374a03c133	12	Blue	11	11.182	915.0	NaN	OUTER_TURRET
12	0001f4374a03c133	13	Red	11	11.261	915.0	NaN	DRAGON
13	0001f4374a03c133	14	NaN	12	12.000	2444.0	NaN	NaN
14	0001f4374a03c133	15	NaN	13	13.000	1509.0	NaN	NaN
15	0001f4374a03c133	16	NaN	14	14.000	1859.0	NaN	NaN
16	0001f4374a03c133	17	Red	15	15.777	2133.0	NaN	RIFT_HERALD
18	0001f4374a03c133	18	Red	16	16.145	2162.0	NaN	OUTER_TURRET
17	0001f4374a03c133	19	Blue	16	16.556	2162.0	NaN	OUTER_TURRET
19	0001f4374a03c133	20	Red	17	17.570	2243.0	2.0	DRAGON
20	0001f4374a03c133	21	Blue	18	18.378	1793.0	1.0	OUTER_TURRET
21	0001f4374a03c133	22	Red	18	18.378	1793.0	2.0	OUTER_TURRET
22	0001f4374a03c133	23	NaN	19	19.000	3151.0	NaN	NaN
23	0001f4374a03c133	24	NaN	20	20.000	4194.0	NaN	NaN
24	0001f4374a03c133	25	Red	21	21.790	4792.0	1.0	NaN
25	0001f4374a03c133	26	Blue	22	22.915	4494.0	2.0	BARON_NASHOR
26	0001f4374a03c133	27	Red	22	22.915	4494.0	5.0	BARON_NASHOR
27	0001f4374a03c133	28	Red	23	23.885	4833.0	NaN	DRAGON
28	0001f4374a03c133	29	Red	24	24.943	7907.0	NaN	INNER_TURRET
29	0001f4374a03c133	30	Red	25	25.463	7681.0	NaN	INNER_TURRET

In [20]:

GoldstackedData3[GoldstackedData3['GoldDiff']==999999].head(3)

Out[20]:

	id	EventNum	Team	Minute	Time	GoldDiff	KillCount	Struct/Monster
238225	0001f4374a03c133	44	NaN	34	34.0	999999.0	NaN	NaN
240809	0016710a48fdd46d	55	NaN	40	40.0	999999.0	NaN	NaN
241084	0016c9df37278448	55	NaN	40	40.0	999999.0	NaN	NaN

In [21]:

# We then add an 'Event' column to merge the columns into one, where kills are now
# simple labelled as 'KILLS'

GoldstackedData3['Event'] = np.where(GoldstackedData3['KillCount']>0,"KILLS",None)
GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna(GoldstackedData3['Struct/Monster'])

GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna("NONE")

GoldstackedData3['GoldDiff2'] = np.where( GoldstackedData3['GoldDiff']== 999999,"WIN",
                                np.where( GoldstackedData3['GoldDiff']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData3['GoldDiff']<1000) & (GoldstackedData3['GoldDiff']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData3['GoldDiff']>=1000) & (GoldstackedData3['GoldDiff']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=2500) & (GoldstackedData3['GoldDiff']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData3['GoldDiff']<=-1000) & (GoldstackedData3['GoldDiff']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-2500) & (GoldstackedData3['GoldDiff']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))

GoldstackedData3.head(3)

Out[21]:

	id	EventNum	Team	Minute	Time	GoldDiff	KillCount	Struct/Monster	Event	GoldDiff2
0	0001f4374a03c133	1	NaN	1	1.0	-0.0	NaN	NaN	NONE	EVEN
1	0001f4374a03c133	2	NaN	2	2.0	-0.0	NaN	NaN	NONE	EVEN
2	0001f4374a03c133	3	NaN	3	3.0	-51.0	NaN	NaN	NONE	EVEN

In [22]:

GoldstackedData3[GoldstackedData3['GoldDiff2']=="ERROR"]

Out[22]:

In [23]:

GoldstackedData3[GoldstackedData3['Team']=='Blue'].head(10)

Out[23]:

	id	EventNum	Team	Minute	Time	GoldDiff	KillCount	Struct/Monster	Event	GoldDiff2
7	0001f4374a03c133	8	Blue	8	8.1940	1391.0	1.0	NaN	KILLS	SLIGHTLY_AHEAD
10	0001f4374a03c133	12	Blue	11	11.1820	915.0	NaN	OUTER_TURRET	OUTER_TURRET	EVEN
17	0001f4374a03c133	19	Blue	16	16.5560	2162.0	NaN	OUTER_TURRET	OUTER_TURRET	SLIGHTLY_AHEAD
20	0001f4374a03c133	21	Blue	18	18.3780	1793.0	1.0	OUTER_TURRET	KILLS	SLIGHTLY_AHEAD
25	0001f4374a03c133	26	Blue	22	22.9150	4494.0	2.0	BARON_NASHOR	KILLS	AHEAD
35	0001f4374a03c133	36	Blue	31	31.1685	12117.0	2.0	NaN	KILLS	VERY_AHEAD
47	0016710a48fdd46d	5	Blue	5	5.9710	-341.0	1.0	NaN	KILLS	EVEN
53	0016710a48fdd46d	12	Blue	11	11.9720	-283.0	1.0	NaN	KILLS	EVEN
55	0016710a48fdd46d	13	Blue	12	12.6030	-234.0	NaN	OUTER_TURRET	OUTER_TURRET	EVEN
58	0016710a48fdd46d	16	Blue	15	15.4360	-1190.0	NaN	DRAGON	DRAGON	SLIGHTLY_BEHIND

In [24]:

GoldstackedData3['Next_Min'] = GoldstackedData3['Minute']+1


GoldstackedData4 = GoldstackedData3.merge(gold4[['id','Minute','GoldDiff']],how='left',left_on=['id','Next_Min'],
                                         right_on=['id','Minute'])

GoldstackedData4.head(10)

Out[24]:

	id	EventNum	Team	Minute_x	Time	GoldDiff_x	KillCount	Struct/Monster	Event	GoldDiff2	Next_Min	Minute_y	GoldDiff_y
0	0001f4374a03c133	1	NaN	1	1.000	-0.0	NaN	NaN	NONE	EVEN	2	2.0	-0.0
1	0001f4374a03c133	2	NaN	2	2.000	-0.0	NaN	NaN	NONE	EVEN	3	3.0	-51.0
2	0001f4374a03c133	3	NaN	3	3.000	-51.0	NaN	NaN	NONE	EVEN	4	4.0	132.0
3	0001f4374a03c133	4	Red	4	4.635	132.0	1.0	NaN	KILLS	EVEN	5	5.0	35.0
4	0001f4374a03c133	5	NaN	5	5.000	35.0	NaN	NaN	NONE	EVEN	6	6.0	940.0
5	0001f4374a03c133	6	Red	6	6.064	940.0	1.0	NaN	KILLS	EVEN	7	7.0	589.0
6	0001f4374a03c133	7	NaN	7	7.000	589.0	NaN	NaN	NONE	EVEN	8	8.0	1391.0
7	0001f4374a03c133	8	Blue	8	8.194	1391.0	1.0	NaN	KILLS	SLIGHTLY_AHEAD	9	9.0	1151.0
8	0001f4374a03c133	9	NaN	9	9.000	1151.0	NaN	NaN	NONE	SLIGHTLY_AHEAD	10	10.0	563.0
9	0001f4374a03c133	10	Red	10	10.478	563.0	1.0	NaN	KILLS	EVEN	11	11.0	915.0

In [25]:

GoldstackedData4[ GoldstackedData4['GoldDiff_y']== -999999].head(3)

Out[25]:

	id	EventNum	Team	Minute_x	Time	GoldDiff_x	KillCount	Struct/Monster	Event	GoldDiff2	Next_Min	Minute_y	GoldDiff_y
409	0091705b03924485	30	NaN	26	26.000	-10450.0	NaN	NaN	NONE	VERY_BEHIND	27	27.0	-999999.0
491	00986b51908a63c3	79	NaN	68	68.000	-3283.0	NaN	NaN	NONE	BEHIND	69	69.0	-999999.0
538	00b13dbf1bd7aff0	44	Blue	35	35.129	-2880.0	1.0	INHIBITOR	KILLS	BEHIND	36	36.0	-999999.0

In [26]:

GoldstackedData4['GoldDiff2_Next'] =  np.where( GoldstackedData4['GoldDiff_y']== 999999,"WIN",
                                np.where( GoldstackedData4['GoldDiff_y']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData4['GoldDiff_y']<1000) & (GoldstackedData4['GoldDiff_y']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData4['GoldDiff_y']>=1000) & (GoldstackedData4['GoldDiff_y']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=2500) & (GoldstackedData4['GoldDiff_y']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData4['GoldDiff_y']<=-1000) & (GoldstackedData4['GoldDiff_y']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-2500) & (GoldstackedData4['GoldDiff_y']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))
GoldstackedData4 = GoldstackedData4[['id','EventNum','Team','Minute_x','Time','Event','GoldDiff2','GoldDiff2_Next']]
GoldstackedData4.columns = ['id','EventNum','Team','Minute','Time','Event','GoldDiff2','GoldDiff2_Next']

GoldstackedData4['Event'] = np.where( GoldstackedData4['Team']=="Red", "+"+GoldstackedData4['Event'],
                                np.where(GoldstackedData4['Team']=="Blue", "-"+GoldstackedData4['Event'], 
                                         GoldstackedData4['Event']))

#GoldstackedData4.head(10)

In [27]:

# Errors are caused due to game ending in minute and then there is no 'next_min' info for this game but our method expects there to be
GoldstackedData4 = GoldstackedData4[GoldstackedData4['GoldDiff2_Next']!="ERROR"]
GoldstackedData4[GoldstackedData4['GoldDiff2_Next']=="ERROR"]

Out[27]:

In [28]:

GoldstackedDataFINAL = GoldstackedData4
GoldstackedDataFINAL['Min_State_Action_End'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL['MSAE'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL.head()

Out[28]:

	id	EventNum	Team	Minute	Time	Event	GoldDiff2	GoldDiff2_Next	Min_State_Action_End	MSAE
0	0001f4374a03c133	1	NaN	1	1.000	NONE	EVEN	EVEN	1_EVEN_NONE_EVEN	1_EVEN_NONE_EVEN
1	0001f4374a03c133	2	NaN	2	2.000	NONE	EVEN	EVEN	2_EVEN_NONE_EVEN	2_EVEN_NONE_EVEN
2	0001f4374a03c133	3	NaN	3	3.000	NONE	EVEN	EVEN	3_EVEN_NONE_EVEN	3_EVEN_NONE_EVEN
3	0001f4374a03c133	4	Red	4	4.635	+KILLS	EVEN	EVEN	4_EVEN_+KILLS_EVEN	4_EVEN_+KILLS_EVEN
4	0001f4374a03c133	5	NaN	5	5.000	NONE	EVEN	EVEN	5_EVEN_NONE_EVEN	5_EVEN_NONE_EVEN

In [29]:

goldMDP = GoldstackedDataFINAL[['Minute','GoldDiff2','Event','GoldDiff2_Next']]
goldMDP.columns = ['Minute','State','Action','End']
goldMDP['Counter'] = 1
goldMDP.head()

Out[29]:

	Minute	State	Action	End	Counter
0	1	EVEN	NONE	EVEN	1
1	2	EVEN	NONE	EVEN	1
2	3	EVEN	NONE	EVEN	1
3	4	EVEN	+KILLS	EVEN	1
4	5	EVEN	NONE	EVEN	1

In [30]:

goldMDP[goldMDP['End']=='ERROR'].head(3)

Out[30]:

In [31]:

goldMDP2 = goldMDP.groupby(['Minute','State','Action','End']).count().reset_index()
goldMDP2['Prob'] = goldMDP2['Counter']/(goldMDP2['Counter'].sum())
goldMDP2.head()

Out[31]:

	Minute	State	Action	End	Counter	Prob
0	1	EVEN	+KILLS	EVEN	70	0.000285
1	1	EVEN	-KILLS	EVEN	80	0.000325
2	1	EVEN	NONE	EVEN	4770	0.019408
3	1	EVEN	NONE	SLIGHTLY_AHEAD	1	0.000004
4	2	EVEN	+KILLS	EVEN	228	0.000928

In [32]:

goldMDP3 = goldMDP.groupby(['Minute','State','Action']).count().reset_index()
goldMDP3['Prob'] = goldMDP3['Counter']/(goldMDP3['Counter'].sum())
goldMDP3.head()

Out[32]:

	Minute	State	Action	End	Counter	Prob
0	1	EVEN	+KILLS	70	70	0.000285
1	1	EVEN	-KILLS	80	80	0.000325
2	1	EVEN	NONE	4771	4771	0.019412
3	2	EVEN	+KILLS	228	228	0.000928
4	2	EVEN	-KILLS	269	269	0.001094

In [33]:

goldMDP4 = goldMDP2.merge(goldMDP3[['Minute','State','Action','Prob']], how='left',on=['Minute','State','Action'] )
goldMDP4.head(20)

Out[33]:

	Minute	State	Action	End	Counter	Prob_x	Prob_y
0	1	EVEN	+KILLS	EVEN	70	0.000285	0.000285
1	1	EVEN	-KILLS	EVEN	80	0.000325	0.000325
2	1	EVEN	NONE	EVEN	4770	0.019408	0.019412
3	1	EVEN	NONE	SLIGHTLY_AHEAD	1	0.000004	0.019412
4	2	EVEN	+KILLS	EVEN	228	0.000928	0.000928
5	2	EVEN	-KILLS	EVEN	269	0.001094	0.001094
6	2	EVEN	NONE	EVEN	4457	0.018134	0.018151
7	2	EVEN	NONE	SLIGHTLY_AHEAD	2	0.000008	0.018151
8	2	EVEN	NONE	SLIGHTLY_BEHIND	2	0.000008	0.018151
9	2	SLIGHTLY_AHEAD	NONE	SLIGHTLY_AHEAD	1	0.000004	0.000004
10	3	EVEN	+DRAGON	EVEN	5	0.000020	0.000020
11	3	EVEN	+KILLS	EVEN	592	0.002409	0.002421
12	3	EVEN	+KILLS	SLIGHTLY_BEHIND	3	0.000012	0.002421
13	3	EVEN	+OUTER_TURRET	EVEN	437	0.001778	0.001778
14	3	EVEN	-DRAGON	EVEN	6	0.000024	0.000024
15	3	EVEN	-KILLS	EVEN	632	0.002571	0.002592
16	3	EVEN	-KILLS	SLIGHTLY_BEHIND	5	0.000020	0.002592
17	3	EVEN	-OUTER_TURRET	EVEN	422	0.001717	0.001717
18	3	EVEN	NONE	EVEN	3389	0.013789	0.013895
19	3	EVEN	NONE	SLIGHTLY_AHEAD	12	0.000049	0.013895

In [34]:

goldMDP4['GivenProb'] = goldMDP4['Prob_x']/goldMDP4['Prob_y']
goldMDP4 = goldMDP4.sort_values('GivenProb',ascending=False)
goldMDP4['Next_Minute'] = goldMDP4['Minute']+1
goldMDP4[(goldMDP4['State']!=goldMDP4['End'])&(goldMDP4['Counter']>10)&(goldMDP4['State']!="WIN")&(goldMDP4['State']!="LOSS")].head(10)

Out[34]:

	Minute	State	Action	End	Counter	Prob_x	Prob_y	GivenProb	Next_Minute
5397	31	BEHIND	-BASE_TURRET	VERY_BEHIND	20	0.000081	0.000102	0.800000	32
5378	31	BEHIND	+INHIBITOR	VERY_BEHIND	17	0.000069	0.000094	0.739130	32
4483	28	BEHIND	-BASE_TURRET	VERY_BEHIND	14	0.000057	0.000081	0.700000	29
5642	32	AHEAD	+INHIBITOR	VERY_AHEAD	17	0.000069	0.000102	0.680000	33
6726	35	SLIGHTLY_AHEAD	+INNER_TURRET	AHEAD	12	0.000049	0.000073	0.666667	36
4117	27	AHEAD	+BASE_TURRET	VERY_AHEAD	13	0.000053	0.000081	0.650000	28
4124	27	AHEAD	+INHIBITOR	VERY_AHEAD	11	0.000045	0.000069	0.647059	28
4771	29	BEHIND	+INHIBITOR	VERY_BEHIND	18	0.000073	0.000114	0.642857	30
3553	25	AHEAD	+BASE_TURRET	VERY_AHEAD	12	0.000049	0.000077	0.631579	26
5800	32	SLIGHTLY_AHEAD	+INNER_TURRET	AHEAD	12	0.000049	0.000077	0.631579	33

Reinforcement Learning AI Model

Now that we have our data modelled as an MDP, we can apply Reinforcement Learning. In short, this applied a model that simulates thousands of games and learns how good or bad each decision is towards reaching a win given the team’s current position.

What makes this AI is its ability to learn from its own trial and error experience. It starts with zero knowledge about the game but, as it is rewarded for reaching a win and punished for reaching a loss, it begins to recognise and remember which decisions are better than others. Our first models start with no knowledge but I later demonstrate the impact initial information about decisions can be fed into the model to represent a person’s preferences.

So how is the model learning? In short, we use Monte Carlo learning whereby each episode is a simulation of a game based on our MDP probabilities and depending on the outcome for the team, our return will vary (+1 terminal reward for win and -1 terminal reward for loss). The value of each action taken in this episode is then updated accordingly based on whether the outcome was a win or loss.

In Monte Carlo learning, we have a parameter 'gamma' that discounts the rewards and will give a higher value to immediate steps than later one. In our model, this can be understood by the fact that as we reach later stages of the games, the decisions we make will have a much larger impact on the final outcome than those made in the first few minutes. For example, losing a team fight in minute 50 is much more likely to lead to a loss than losing a team fight in the first 5 minutes.

We can re-apply our model from part 1 with some minor adjustments for our new MDP.

In [35]:

def MCModelv4(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
    
    
    for e in range(0,num_episodes):
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    if (a>0) | (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    else:
                        current_action =  StartAction


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions


        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
        data_output['V'] = data_output['V'] + alpha*(data_output['Return']-data_output['V'])
        data_output = data_output.drop('Return', 1)
        
        V_outputs = pd.DataFrame({'Episode':[e],'V_total':[data_output['V'].sum()]})
        V_output = V_output.append(V_outputs)
        
                
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)

In [36]:

alpha = 0.1
gamma = 0.9
num_episodes = 100
epsilon = 0.1


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl4 = MCModelv4(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")

Time taken to run model: 0.64 mins
Avg Time taken per episode: 0.38 secs

In [37]:

Mdl4[1].head(10)

Out[37]:

index	Minute	State	Action	End	Counter	Prob_x	Prob_y	GivenProb	Next_Minute	CumProb	StepReward	Action_Num
1319	15	EVEN	+RIFT_HERALD	EVEN	26	0.000106	0.000130	0.812500	16	1.0000	-0.005000	0
1530	16	EVEN	-OUTER_TURRET	EVEN	119	0.000484	0.000679	0.712575	17	1.0000	-0.004500	1
1731	17	EVEN	-DRAGON	SLIGHTLY_AHEAD	4	0.000016	0.000220	0.074074	18	0.0926	-0.004050	2
1978	18	SLIGHTLY_AHEAD	+DRAGON	SLIGHTLY_AHEAD	35	0.000142	0.000236	0.603448	19	1.0000	-0.003645	3
2232	19	SLIGHTLY_AHEAD	NONE	SLIGHTLY_AHEAD	232	0.000944	0.001261	0.748387	20	1.0000	-0.003280	4
2420	20	SLIGHTLY_AHEAD	+BASE_TURRET	AHEAD	2	0.000008	0.000012	0.666667	21	1.0000	-0.002952	5
2555	21	AHEAD	-INNER_TURRET	SLIGHTLY_AHEAD	1	0.000004	0.000016	0.250000	22	0.2500	-0.002657	6
2901	22	SLIGHTLY_AHEAD	-BASE_TURRET	AHEAD	1	0.000004	0.000004	1.000000	23	1.0000	-0.002391	7
3038	23	AHEAD	-INNER_TURRET	AHEAD	5	0.000020	0.000020	1.000000	24	1.0000	-0.002152	8
3309	24	AHEAD	-KILLS	AHEAD	107	0.000435	0.000635	0.685897	25	1.0000	-0.001937	9

So we now have the values of each state action pair, we can use this to select the single next best step as being the action with the highest value given the current state.

In [38]:

final_output = Mdl4[2]

final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)].sort_values('V',ascending=False).head(10)
final_output2

Out[38]:

	Minute	State	Action	End	Counter	Prob_x	Prob_y	GivenProb	Next_Minute	V
1853	15	EVEN	-DRAGON	EVEN	51	0.000208	0.000252	0.822581	16	0.200944
2066	15	EVEN	-INNER_TURRET	EVEN	14	0.000057	0.000073	0.777778	16	0.145974
3034	15	EVEN	+DRAGON	EVEN	49	0.000199	0.000305	0.653333	16	0.103227
2569	15	EVEN	-OUTER_TURRET	EVEN	133	0.000541	0.000769	0.703704	16	0.085860
7494	15	EVEN	+OUTER_TURRET	SLIGHTLY_AHEAD	27	0.000110	0.000655	0.167702	16	0.056043
6950	15	EVEN	+INNER_TURRET	SLIGHTLY_AHEAD	3	0.000012	0.000061	0.200000	16	0.013789
7679	15	EVEN	-INNER_TURRET	SLIGHTLY_BEHIND	3	0.000012	0.000073	0.166667	16	0.011424
7669	15	EVEN	-RIFT_HERALD	SLIGHTLY_BEHIND	6	0.000024	0.000146	0.166667	16	0.003259
2109	15	EVEN	+KILLS	EVEN	382	0.001554	0.002018	0.770161	16	0.001017
8552	15	EVEN	+KILLS	SLIGHTLY_BEHIND	60	0.000244	0.002018	0.120968	16	0.000975

In theory, we are now ready to run the model for as many episodes as possible so that the results will eventually converge to an optimal suggestion. However, it seems that we will need a lot of epsiodes with this current method to get anything close to convergence.

In [39]:

sum_V_episodes = Mdl4[3]

plt.plot(sum_V_episodes['Episode'],sum_V_episodes['V_total'])
plt.xlabel("Epsiode")
plt.ylabel("Sum of V")
plt.title("Cumulative V by Episode")
plt.show()

Model Improvements

There are many ways we can improve this, one solution would be to greatly simplify the problem by breaking the game into segments. Often these segments are referred to as early, mid and late game and would be we would only need to consider fewer steps to reach an end goal. In this case it would not be to lead to a win but rather aiming to be at an advantage and the end of, say, 10 minute intervals.

Another solution would be to use a model that learns quicker instead of the Monte Carlo method used. This often includes Q-learning or SARSA.

We will not consider doing this here as they would require a lot of work to re-adjust the code. Instead, we will improve the rate the model learns by the way it selects its actions. Currently, we can either define the first action or it will choose randomly which means it is. However, after this first action all subsequent actions are also chosen randomly which is causing the number of episodes needed to exponentially increase for convergence to occur.

Therefore, we will introduce a basic action selection method for these known as greedy selection. This means we select the best action the majority of the time therefore continually testing the success rate of the actions it thinks are best. It will also randomly select actions some of the time so that it can still explore states and doesnt get caught in a local maximum and not the optimal sulution.

Also parameters play a key part in how quickly the output will converge, none moreso than our alpha parameter. Although a small alpha will be more accurate, a larger alpha value will learn and therefore subsequently converge faster.

We will adjust our code to utilise both greedy selection and a reasonably large alpha as well as running for as many episodes as possible. More episodes means the runtime will be longer but because, at least at this stage, we are not attemping to have this running in real time there is no issue with it running for a while to find an optimal suggestion.

It takes approximately 10 mins to run the model for 1,000 episodes, we have also included a tracking system for the run progress which shows the current percentage (based on which episode) the loop is in. This is particuarly useful if running for anything more than a few minutes.

It was at this stage, I also notice that my method for applying the update rule was overriding previous knowledge if the action wasnt used in the current episode and so all results were converging back to zero after each episode. I fixed this by making it so if the state + action wasnt used in the episode, then it simply remians the same and will show in our results as being the flat parts of the lines.

Lastly, we output the value of each of our available actions given our start state in each episode so we can track the learning process for these and how, over many episodes, our optimal action is decided.

In [40]:

from IPython.display import clear_output

In [41]:

def MCModelv5(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
        
    for e in range(0,num_episodes):
        clear_output(wait=True)
        
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    
                    # Greedy Selection:
                    # If this is our first action and start action is non, select greedily. 
                    # Else, if first actions is given in our input then we use this as our start action. 
                    # Else for other actions, if it is the first episode then we have no knowledge so randomly select actions
                    # Else for other actions, we randomly select actions a percentage of the time based on our epsilon and greedily (max V) for the rest 
                    
                    
                    if   (a==0) & (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    elif (a==0):
                        current_action =  StartAction
                    
                    elif (e==0) & (a>0):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    
                    elif (e>0) & (a>0):
                        epsilon = epsilon
                        greedy_rng = np.round(np.random.random(),2)
                        if (greedy_rng<=epsilon):
                            random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                            random_action = random_action.reset_index()
                            current_action = random_action['Action'][0]
                        else:
                            greedy_action = (
                            
                                data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)][
                                    
                                    data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V']==data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V'].max()
                                
                                ])
                                
                            greedy_action = greedy_action.reset_index()
                            current_action = greedy_action['Action'][0]
                            
                  
                    
                        
                   
                            
                        
                        
                        
                    


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions
                    
        print("Current progress:", np.round((e/num_episodes)*100,2),"%")

        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
            
        data_output['V'] = np.where(data_output['Return']==0,data_output['V'],data_output['V'] + alpha*(data_output['Return']-data_output['V']))
        
        data_output = data_output.drop('Return', 1)

        
        for actions in data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)]['Action'].unique():
            V_outputs = pd.DataFrame({'Index':[str(e)+'_'+str(actions)],'Episode':e,'StartMin':StartMin,'StartState':StartState,'Action':actions,
                                      'V':data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)&(data_output['Action']==actions)]['V'].sum()
                                     })
            V_output = V_output.append(V_outputs)
        
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)

In [42]:

alpha = 0.3
gamma = 0.9
num_episodes = 1000
epsilon = 0.2


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl5 = MCModelv5(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")

Current progress: 99.9 %
Time taken to run model: 11.53 mins
Avg Time taken per episode: 0.69 secs

In [43]:

Mdl5[3].sort_values(['Episode','Action']).head(10)

Out[43]:

Index	StartMin	StartState	Action	V
0_+BASE_TURRET	15	EVEN	+BASE_TURRET	0.000000
0_+DRAGON	15	EVEN	+DRAGON	0.000000
0_+INNER_TURRET	15	EVEN	+INNER_TURRET	0.000000
0_+KILLS	15	EVEN	+KILLS	0.000000
0_+OUTER_TURRET	15	EVEN	+OUTER_TURRET	0.000000
0_+RIFT_HERALD	15	EVEN	+RIFT_HERALD	0.000000
0_-DRAGON	15	EVEN	-DRAGON	0.000000
0_-INNER_TURRET	15	EVEN	-INNER_TURRET	0.000000
0_-KILLS	15	EVEN	-KILLS	0.000000
0_-OUTER_TURRET	15	EVEN	-OUTER_TURRET	-0.418229

We have also changed our sum of V output to only be for the possible actions from our start state as these are the only ones we are currently concerned with.

In [44]:

V_episodes = Mdl5[3]

plt.figure(figsize=(20,10))

for actions in V_episodes['Action'].unique():
    plot_data = V_episodes[V_episodes['Action']==actions]
    plt.plot(plot_data['Episode'],plot_data['V'])
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.title("V for each Action by Episode")
#plt.show()

Out[44]:

Text(0.5,1,'V for each Action by Episode')

In [45]:

final_output = Mdl5[2]


final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)]
final_output3 = final_output2.groupby(['Minute','State','Action']).sum().sort_values('V',ascending=False).reset_index()
final_output3[['Minute','State','Action','V']]

Out[45]:

	Minute	State	Action	V
0	15	EVEN	+DRAGON	2.166430
1	15	EVEN	-KILLS	1.403234
2	15	EVEN	-INNER_TURRET	1.101869
3	15	EVEN	-DRAGON	0.555246
4	15	EVEN	+RIFT_HERALD	0.529776
5	15	EVEN	+OUTER_TURRET	0.218571
6	15	EVEN	+BASE_TURRET	0.091745
7	15	EVEN	+INNER_TURRET	0.058387
8	15	EVEN	+KILLS	-0.118718
9	15	EVEN	-RIFT_HERALD	-0.294434
10	15	EVEN	NONE	-0.589646
11	15	EVEN	-OUTER_TURRET	-1.191101

In [46]:

single_action1 = final_output3['Action'][0]
single_action2 = final_output3['Action'][len(final_output3)-1]

plot_data1 = V_episodes[(V_episodes['Action']==single_action1)]
plot_data2 = V_episodes[(V_episodes['Action']==single_action2)]

plt.plot(plot_data1['Episode'],plot_data1['V'], label = single_action1)
plt.plot(plot_data2['Episode'],plot_data2['V'], label = single_action2)
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.legend()
plt.title("V by Episode for the Best/Worst Actions given the Current State")
plt.show()

Part 2 Conclusion

We have now fixed our issue highlighted in part one and have a MDP that takes into account cumulative success/failures in a match by defining our current/next states by a gold advtantage/disadvantage.

We have also made a number of improvements to our model but there are many aspects that could be further improved. These will be discussed further in our next part where we will introduce personal preferences to influence the model output.