LoL AI Model Part 2: Redesign MDP with Gold Diff

AI in Video Games: Improving Decision Making in League of Legends using Real Match Statistics and Personal Preferences

Part 2: Redesigning Markov Decision Process with Gold Difference and Improving Model

Motivations and Objectives

League of Legends is a team oriented video game where on two team teams (with 5 players in each) compete for objectives and kills. Gaining an advantage enables the players to become stronger (obtain better items and level up faster) than their opponents and, as their advantage increases, the likelihood of winning the game also increases. We therefore have a sequence of events dependent on previous events that lead to one team destroying the other’s base and winning the game.

Sequences like this being modelled statistically is nothing new; for years now researchers have considered how this is applied in sports, such as basketball (https://arxiv.org/pdf/1507.01816.pdf), where a sequence of passing, dribbling and foul plays lead to a team obtaining or losing points. The aim of research such as this one mentioned is to provide more detailed insight beyond a simple box score (number of points or kill gained by player in basketball or video games respectively) and consider how teams perform when modelled as a sequence of events connected in time.

Modelling the events in this way is even more important in games such as League of Legends as taking objectives and kills lead towards both an item and level advantage. For example, a player obtaining the first kill of the game nets them gold that can be used to purchase more powerful items. With this item they are then strong enough to obtain more kills and so on until they can lead their team to a win. Facilitating a lead like this is often referred to as ‘snowballing’ as the players cumulatively gain advantages but often games are not this one sided and objects and team plays are more important.

The aim of this is project is simple; can we calculate the next best event given what has occurred previously in the game so that the likelihood of eventually leading to a win increases based on real match statistics?

However, there are many factors that lead to a player’s decision making in a game that cannot be easily measured. No how matter how much data collected, the amount of information a player can capture is beyond any that a computer can detect (at least for now!). For example, players may be over or underperforming in this game or may simply have a preference for the way they play (often defined by the types of characters they play). Some players will naturally be more aggressive and look for kills while others will play passively and push for objectives instead. Therefore, we further develop our model to allow the player to adjust the recommended play on their preferences.

Import Packages and Data

In [1]:

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image
import math
from scipy.stats import kendalltau

from IPython.display import clear_output
import timeit

import warnings
warnings.filterwarnings('ignore')

In [2]:

#kills = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\kills.csv')
#matchinfo = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\matchinfo.csv')
#monsters = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\monsters.csv')
#structures = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\structures.csv')

kills = pd.read_csv('../input/kills.csv')
matchinfo = pd.read_csv('../input/matchinfo.csv')
monsters = pd.read_csv('../input/monsters.csv')
structures = pd.read_csv('../input/structures.csv')

Introducing Gold Difference to Redesign the Markov Decision Process

In [3]:

#gold = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\gold.csv')
gold = pd.read_csv('../input/gold.csv')

In [4]:

gold = gold[gold['Type']=="golddiff"]
gold.head()

Out[4]:

 AddressTypemin_1min_2min_3min_4min_5min_6min_7min_8min_9min_10min_11min_12min_13min_14min_15min_16min_17min_18min_19min_20min_21min_22min_23min_24min_25min_26min_27min_28min_29min_30min_31min_32min_33min_34min_35min_36min_37min_38...min_56min_57min_58min_59min_60min_61min_62min_63min_64min_65min_66min_67min_68min_69min_70min_71min_72min_73min_74min_75min_76min_77min_78min_79min_80min_81min_82min_83min_84min_85min_86min_87min_88min_89min_90min_91min_92min_93min_94min_95
0http://matchhistory.na.leagueoflegends.com/en/...golddiff00-14-65-268-431-488-789-494-625-1044-313-760-697-790-611240845.0797.01422.0987.0169.0432.0491.01205.01527.01647.01847.03750.04719.03561.03367.02886.02906.04411.04473.04639.04762.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1http://matchhistory.na.leagueoflegends.com/en/...golddiff00-26-18147237-1521888-24210211780214201394130114891563.01368.01105.0205.0192.0587.0377.0667.0415.01876.01244.02130.02431.0680.01520.0949.01894.02644.03394.03726.01165.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2http://matchhistory.na.leagueoflegends.com/en/...golddiff0010-6034375891064125891312331597157530462922307436263466.05634.05293.04597.04360.04616.04489.04880.05865.06993.07049.07029.07047.07160.07081.07582.09917.010337.09823.012307.013201.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
3http://matchhistory.na.leagueoflegends.com/en/...golddiff00-1525228-6-243175-34616-258-57-190-111-335-8324428.0-124.0768.02712.01813.0198.01242.01245.01278.01240.0-664.0-1195.0-1157.0-2161.0-2504.0-3873.0-3688.0-3801.0-3668.0-3612.0-5071.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4http://matchhistory.na.leagueoflegends.com/en/...golddiff404044-36113158-121-19123205156272-271-896-574177-425-730.0-318.0478.0926.0761.0-286.0473.0490.01265.02526.03890.04319.05121.05140.05141.06866.09517.011322.0NaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN

In [5]:

# Add ID column based on last 16 digits in match address for simpler matching

matchinfo['id'] = matchinfo['Address'].astype(str).str[-16:]
kills['id'] = kills['Address'].astype(str).str[-16:]
monsters['id'] = monsters['Address'].astype(str).str[-16:]
structures['id'] = structures['Address'].astype(str).str[-16:]
gold['id'] = gold['Address'].astype(str).str[-16:]

In [6]:

# Dragon became multiple types in patch v6.9 (http://leagueoflegends.wikia.com/wiki/V6.9) 
# so we remove and games before this change occured and only use games with the new dragon system

old_dragon_id = monsters[ monsters['Type']=="DRAGON"]['id'].unique()
old_dragon_id

monsters = monsters[ ~monsters['id'].isin(old_dragon_id)]
monsters = monsters.reset_index()

matchinfo = matchinfo[ ~matchinfo['id'].isin(old_dragon_id)]
matchinfo = matchinfo.reset_index()

kills = kills[ ~kills['id'].isin(old_dragon_id)]
kills = kills.reset_index()

structures = structures[ ~structures['id'].isin(old_dragon_id)]
structures = structures.reset_index()

gold = gold[ ~gold['id'].isin(old_dragon_id)]
gold = gold.reset_index()

gold.head(3)

Out[6]:

 indexAddressTypemin_1min_2min_3min_4min_5min_6min_7min_8min_9min_10min_11min_12min_13min_14min_15min_16min_17min_18min_19min_20min_21min_22min_23min_24min_25min_26min_27min_28min_29min_30min_31min_32min_33min_34min_35min_36min_37...min_57min_58min_59min_60min_61min_62min_63min_64min_65min_66min_67min_68min_69min_70min_71min_72min_73min_74min_75min_76min_77min_78min_79min_80min_81min_82min_83min_84min_85min_86min_87min_88min_89min_90min_91min_92min_93min_94min_95id
0335http://matchhistory.na.leagueoflegends.com/en/...golddiff00-286735545331112501321147020051735206429852911401843084344.04687.05719.06525.05506.06198.07494.07284.07191.08089.08006.07922.08548.09077.012737.012064.0NaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN55109b5a7a91ae87
1336http://matchhistory.na.leagueoflegends.com/en/...golddiff0036-133-58-393-374-115-194-497-1131-1204-1656-1222-221-1613-824-644.0-851.0-955.0774.0658.0812.01134.0153.0363.0322.0243.074.0-987.0-2022.0-254.0-629.0-672.0-3757.0-3858.0-4105.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNe147296c928da5b4
2337http://matchhistory.na.leagueoflegends.com/en/...golddiff017585742245017436031018116715921288114020142698326338224199.05032.04730.04859.05129.06546.06428.07093.07080.06786.07047.09240.010510.012205.015039.0NaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNea0611c44259f062

In [7]:

#Transpose Gold table, columns become matches and rows become minutes

gold_T = gold.iloc[:,3:-1].transpose()
gold_T.head(3)

Out[7]:

 0123456789101112131415161718192021222324252627282930313233343536373839...4875487648774878487948804881488248834884488548864887488848894890489148924893489448954896489748984899490049014902490349044905490649074908490949104911491249134914
min_10.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
min_20.00.017.08.0-1.00.00.00.0-3.00.0-8.010.00.00.00.00.00.00.00.0-510.00.00.00.0-8.00.00.0-8.00.00.0-18.0-8.00.00.08.08.0-10.00.00.0-8.0-8.0...-11.00.00.00.00.00.03.00.0-16.00.00.00.00.00.00.00.00.00.05.00.00.0-5.00.00.00.00.00.00.00.013.00.00.00.00.0-8.00.00.0-8.00.00.0
min_3-28.036.058.08.083.020.06.0-17.0-61.034.0-33.050.050.0-73.040.0-71.0-74.0-35.0-54.0-465.0-39.0-27.0-14.06.0-18.0-71.0-48.020.051.0465.023.0-24.0-74.0-8.056.052.0-4.0-407.0-64.06.0...-97.088.082.0108.0-47.0-43.0-117.02.0-83.0-75.00.0-7.0-1.0204.0-90.0-74.0161.014.0-145.049.0-2.0-26.0-22.047.0114.0-38.0-32.029.0150.045.0-14.070.034.037.0-187.0-18.0-86.0-6.0-97.0-8.0

In [8]:

gold2 = pd.DataFrame()

start = timeit.default_timer()
for r in range(0,len(gold)):
    clear_output(wait=True)
    
    # Select each match column, drop any na rows and find the match id from original gold table
    gold_row = gold_T.iloc[:,r]
    gold_row = gold_row.dropna()
    gold_row_id = gold['id'][r]
    
    # Append into table so that each match and event is stacked on top of one another    
    gold2 = gold2.append(pd.DataFrame({'id':gold_row_id,'GoldDiff':gold_row}))
    
    
    stop = timeit.default_timer()
   
    if (r/len(gold)*100) < 5  :
        expected_time = "Calculating..."
        
    else:
        time_perc = timeit.default_timer()
        expected_time = np.round( ( (time_perc-start)/60 / (r/len(gold)) ),2)
        
  
        
    print("Current progress:",np.round(r/len(gold) *100, 2),"%")        
    print("Current run time:",np.round((stop - start)/60,2),"minutes")
    print("Expected Run Time:",expected_time,"minutes")
    
Current progress: 74.95 %

In [9]:

gold3 = gold2[['id','GoldDiff']]
gold3.head(3)

Out[9]:

 idGoldDiff
min_155109b5a7a91ae870.0
min_255109b5a7a91ae870.0
min_355109b5a7a91ae87-28.0

In [10]:

### Create minute column with index, convert from 'min_1' to just the number
gold3['Minute'] = gold3.index.to_series()
gold3['Minute'] = np.where(gold3['Minute'].str[-2]=="_", gold3['Minute'].str[-1],gold3['Minute'].str[-2:])
gold3['Minute'] = gold3['Minute'].astype(int)
gold3 = gold3.reset_index()
gold3 = gold3.sort_values(by=['id','Minute'])

gold3.head(3)

Out[10]:

 indexidGoldDiffMinute
22608min_10001f4374a03c1330.01
22609min_20001f4374a03c1330.02
22610min_30001f4374a03c13351.03

In [11]:

# Gold difference from data is relative to blue team's perspective,
# therefore we reverse this by simply multiplying amount by -1
gold3['GoldDiff'] = gold3['GoldDiff']*-1

gold3.head(10)

Out[11]:

 indexidGoldDiffMinute
22608min_10001f4374a03c133-0.01
22609min_20001f4374a03c133-0.02
22610min_30001f4374a03c133-51.03
22611min_40001f4374a03c133132.04
22612min_50001f4374a03c13335.05
22613min_60001f4374a03c133940.06
22614min_70001f4374a03c133589.07
22615min_80001f4374a03c1331391.08
22616min_90001f4374a03c1331151.09
22617min_100001f4374a03c133563.010

In [12]:

matchinfo.head(3)

Out[12]:

 indexLeagueYearSeasonTypeblueTeamTagbResultrResultredTeamTaggamelengthblueTopblueTopChampblueJungleblueJungleChampblueMiddleblueMiddleChampblueADCblueADCChampblueSupportblueSupportChampredTopredTopChampredJungleredJungleChampredMiddleredMiddleChampredADCredADCChampredSupportredSupportChampAddressid
0335NALCS2016SummerSeasonTSM10CLG33HauntzerMaokaiSvenskerenGravesBjergsenZileanDoubleliftLucianBiofrostKarmaDarshanTrundleXmithieRekSaiHuhiViktorStixxayEzrealaphromooNamihttp://matchhistory.na.leagueoflegends.com/en/...55109b5a7a91ae87
1336NALCS2016SummerSeasonCLG01TSM39DarshanFioraXmithieRekSaiHuhiVladimirStixxayEzrealaphromooBardHauntzerEkkoSvenskerenEliseBjergsenViktorDoubleliftLucianBiofrostBraumhttp://matchhistory.na.leagueoflegends.com/en/...e147296c928da5b4
2337NALCS2016SummerSeasonNV10NRG32SeraphTrundleProcxinNidaleeNinjaLissandraLODEzrealHakuhoKarmaQuasMaokaiSantorinKindredGBMZiggsohqSivirKiWiKiDAlistarhttp://matchhistory.na.leagueoflegends.com/en/...ea0611c44259f062

In [13]:

gold4 = gold3

matchinfo2 = matchinfo[['id','rResult','gamelength']]
matchinfo2['gamlength'] = matchinfo2['gamelength'] + 1
matchinfo2['index'] = 'min_'+matchinfo2['gamelength'].astype(str)
matchinfo2['rResult2'] =  np.where(matchinfo2['rResult']==1,999999,-999999)
matchinfo2 = matchinfo2[['index','id','rResult2','gamelength']]
matchinfo2.columns = ['index','id','GoldDiff','Minute']


gold4 = gold4.append(matchinfo2)
gold4.tail()

Out[13]:

 indexidGoldDiffMinute
4910min_34377c00c79e80e193999999.034
4911min_3922cad427dfd10959999999.039
4912min_24671b2487ca72bfab999999.024
4913min_357cdb33f56fe49084-999999.035
4914min_4213203adbaa0c1fa5999999.042

In [14]:

kills = kills[ kills['Time']>0]

kills['Minute'] = kills['Time'].astype(int)

kills['Team'] = np.where( kills['Team']=="rKills","Red","Blue")
kills.head(3)

Out[14]:

 indexAddressTeamTimeVictimKillerAssist_1Assist_2Assist_3Assist_4x_posy_posidMinute
04462http://matchhistory.na.leagueoflegends.com/en/...Blue6.032CLG HuhiTSM SvenskerenTSM BjergsenNaNNaNNaN7825866655109b5a7a91ae876
14463http://matchhistory.na.leagueoflegends.com/en/...Blue9.428CLG HuhiTSM BiofrostTSM BjergsenTSM DoubleliftNaNNaN8728875155109b5a7a91ae879
24464http://matchhistory.na.leagueoflegends.com/en/...Blue9.780CLG XmithieTSM BjergsenTSM HauntzerTSM SvenskerenNaNNaN8655117255109b5a7a91ae879

In [15]:

# For the Kills table, we need decided to group by the minute in which the kills took place and averaged 
# the time of the kills which we use later for the order of events

f = {'Time':['mean','count']}

killsGrouped = kills.groupby( ['id','Team','Minute'] ).agg(f).reset_index()
killsGrouped.columns = ['id','Team','Minute','Time Avg','Count']
killsGrouped = killsGrouped.sort_values(by=['id','Minute'])
killsGrouped.head(3)

Out[15]:

 idTeamMinuteTime AvgCount
40001f4374a03c133Red44.6351
50001f4374a03c133Red66.0641
00001f4374a03c133Blue88.1941

In [16]:

structures = structures[ structures['Time']>0]

structures['Minute'] = structures['Time'].astype(int)
structures['Team'] = np.where(structures['Team']=="bTowers","Blue",
                        np.where(structures['Team']=="binhibs","Blue","Red"))
structures2 = structures.sort_values(by=['id','Minute'])

structures2 = structures2[['id','Team','Time','Minute','Type']]
structures2.head(3)

Out[16]:

 idTeamTimeMinuteType
42470001f4374a03c133Blue11.18211OUTER_TURRET
370550001f4374a03c133Red11.00611OUTER_TURRET
42480001f4374a03c133Blue16.55616OUTER_TURRET

In [17]:

monsters['Type2'] = np.where( monsters['Type']=="FIRE_DRAGON", "DRAGON",
                    np.where( monsters['Type']=="EARTH_DRAGON","DRAGON",
                    np.where( monsters['Type']=="WATER_DRAGON","DRAGON",       
                    np.where( monsters['Type']=="AIR_DRAGON","DRAGON",   
                             monsters['Type']))))

monsters = monsters[ monsters['Time']>0]

monsters['Minute'] = monsters['Time'].astype(int)

monsters['Team'] = np.where( monsters['Team']=="bDragons","Blue",
                   np.where( monsters['Team']=="bHeralds","Blue",
                   np.where( monsters['Team']=="bBarons", "Blue", 
                           "Red")))

monsters = monsters[['id','Team','Time','Minute','Type2']]
monsters.columns = ['id','Team','Time','Minute','Type']
monsters.head(3)

Out[17]:

 idTeamTimeMinuteType
055109b5a7a91ae87Blue23.44423DRAGON
155109b5a7a91ae87Blue31.06931DRAGON
255109b5a7a91ae87Blue16.41916DRAGON

In [18]:

GoldstackedData = gold4.merge(killsGrouped, how='left',on=['id','Minute'])
 
monsters_structures_stacked = structures2.append(monsters[['id','Team','Minute','Time','Type']])

GoldstackedData2 = GoldstackedData.merge(monsters_structures_stacked, how='left',on=['id','Minute'])

GoldstackedData2 = GoldstackedData2.sort_values(by=['id','Minute'])
GoldstackedData2.head(30)

Out[18]:

 indexidGoldDiffMinuteTeam_xTime AvgCountTeam_yTimeType
0min_10001f4374a03c133-0.01NaNNaNNaNNaNNaNNaN
1min_20001f4374a03c133-0.02NaNNaNNaNNaNNaNNaN
2min_30001f4374a03c133-51.03NaNNaNNaNNaNNaNNaN
3min_40001f4374a03c133132.04Red4.63501.0NaNNaNNaN
4min_50001f4374a03c13335.05NaNNaNNaNNaNNaNNaN
5min_60001f4374a03c133940.06Red6.06401.0NaNNaNNaN
6min_70001f4374a03c133589.07NaNNaNNaNNaNNaNNaN
7min_80001f4374a03c1331391.08Blue8.19401.0NaNNaNNaN
8min_90001f4374a03c1331151.09NaNNaNNaNNaNNaNNaN
9min_100001f4374a03c133563.010Red10.47801.0NaNNaNNaN
10min_110001f4374a03c133915.011NaNNaNNaNBlue11.182OUTER_TURRET
11min_110001f4374a03c133915.011NaNNaNNaNRed11.006OUTER_TURRET
12min_110001f4374a03c133915.011NaNNaNNaNRed11.261DRAGON
13min_120001f4374a03c1332444.012NaNNaNNaNNaNNaNNaN
14min_130001f4374a03c1331509.013NaNNaNNaNNaNNaNNaN
15min_140001f4374a03c1331859.014NaNNaNNaNNaNNaNNaN
16min_150001f4374a03c1332133.015NaNNaNNaNRed15.777RIFT_HERALD
17min_160001f4374a03c1332162.016NaNNaNNaNBlue16.556OUTER_TURRET
18min_160001f4374a03c1332162.016NaNNaNNaNRed16.145OUTER_TURRET
19min_170001f4374a03c1332243.017Red17.38702.0Red17.570DRAGON
20min_180001f4374a03c1331793.018Blue18.09801.0Red18.378OUTER_TURRET
21min_180001f4374a03c1331793.018Red18.06702.0Red18.378OUTER_TURRET
22min_190001f4374a03c1333151.019NaNNaNNaNNaNNaNNaN
23min_200001f4374a03c1334194.020NaNNaNNaNNaNNaNNaN
24min_210001f4374a03c1334792.021Red21.79001.0NaNNaNNaN
25min_220001f4374a03c1334494.022Blue22.51252.0Red22.915BARON_NASHOR
26min_220001f4374a03c1334494.022Red22.45785.0Red22.915BARON_NASHOR
27min_230001f4374a03c1334833.023NaNNaNNaNRed23.885DRAGON
28min_240001f4374a03c1337907.024NaNNaNNaNRed24.943INNER_TURRET
29min_250001f4374a03c1337681.025NaNNaNNaNRed25.463INNER_TURRET

In [19]:

GoldstackedData3 = GoldstackedData2
GoldstackedData3['Time2'] = GoldstackedData3['Time'].fillna(GoldstackedData3['Time Avg']).fillna(GoldstackedData3['Minute'])
GoldstackedData3['Team'] = GoldstackedData3['Team_x'].fillna(GoldstackedData3['Team_y'])
GoldstackedData3 = GoldstackedData3.sort_values(by=['id','Time2'])

GoldstackedData3['EventNum'] = GoldstackedData3.groupby('id').cumcount()+1

GoldstackedData3 = GoldstackedData3[['id','EventNum','Team','Minute','Time2','GoldDiff','Count','Type']]

GoldstackedData3.columns = ['id','EventNum','Team','Minute','Time','GoldDiff','KillCount','Struct/Monster']

GoldstackedData3.head(30)

Out[19]:

 idEventNumTeamMinuteTimeGoldDiffKillCountStruct/Monster
00001f4374a03c1331NaN11.000-0.0NaNNaN
10001f4374a03c1332NaN22.000-0.0NaNNaN
20001f4374a03c1333NaN33.000-51.0NaNNaN
30001f4374a03c1334Red44.635132.01.0NaN
40001f4374a03c1335NaN55.00035.0NaNNaN
50001f4374a03c1336Red66.064940.01.0NaN
60001f4374a03c1337NaN77.000589.0NaNNaN
70001f4374a03c1338Blue88.1941391.01.0NaN
80001f4374a03c1339NaN99.0001151.0NaNNaN
90001f4374a03c13310Red1010.478563.01.0NaN
110001f4374a03c13311Red1111.006915.0NaNOUTER_TURRET
100001f4374a03c13312Blue1111.182915.0NaNOUTER_TURRET
120001f4374a03c13313Red1111.261915.0NaNDRAGON
130001f4374a03c13314NaN1212.0002444.0NaNNaN
140001f4374a03c13315NaN1313.0001509.0NaNNaN
150001f4374a03c13316NaN1414.0001859.0NaNNaN
160001f4374a03c13317Red1515.7772133.0NaNRIFT_HERALD
180001f4374a03c13318Red1616.1452162.0NaNOUTER_TURRET
170001f4374a03c13319Blue1616.5562162.0NaNOUTER_TURRET
190001f4374a03c13320Red1717.5702243.02.0DRAGON
200001f4374a03c13321Blue1818.3781793.01.0OUTER_TURRET
210001f4374a03c13322Red1818.3781793.02.0OUTER_TURRET
220001f4374a03c13323NaN1919.0003151.0NaNNaN
230001f4374a03c13324NaN2020.0004194.0NaNNaN
240001f4374a03c13325Red2121.7904792.01.0NaN
250001f4374a03c13326Blue2222.9154494.02.0BARON_NASHOR
260001f4374a03c13327Red2222.9154494.05.0BARON_NASHOR
270001f4374a03c13328Red2323.8854833.0NaNDRAGON
280001f4374a03c13329Red2424.9437907.0NaNINNER_TURRET
290001f4374a03c13330Red2525.4637681.0NaNINNER_TURRET

In [20]:

GoldstackedData3[GoldstackedData3['GoldDiff']==999999].head(3)

Out[20]:

 idEventNumTeamMinuteTimeGoldDiffKillCountStruct/Monster
2382250001f4374a03c13344NaN3434.0999999.0NaNNaN
2408090016710a48fdd46d55NaN4040.0999999.0NaNNaN
2410840016c9df3727844855NaN4040.0999999.0NaNNaN

In [21]:

# We then add an 'Event' column to merge the columns into one, where kills are now
# simple labelled as 'KILLS'

GoldstackedData3['Event'] = np.where(GoldstackedData3['KillCount']>0,"KILLS",None)
GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna(GoldstackedData3['Struct/Monster'])

GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna("NONE")

GoldstackedData3['GoldDiff2'] = np.where( GoldstackedData3['GoldDiff']== 999999,"WIN",
                                np.where( GoldstackedData3['GoldDiff']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData3['GoldDiff']<1000) & (GoldstackedData3['GoldDiff']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData3['GoldDiff']>=1000) & (GoldstackedData3['GoldDiff']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=2500) & (GoldstackedData3['GoldDiff']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData3['GoldDiff']<=-1000) & (GoldstackedData3['GoldDiff']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-2500) & (GoldstackedData3['GoldDiff']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))

GoldstackedData3.head(3)

Out[21]:

 idEventNumTeamMinuteTimeGoldDiffKillCountStruct/MonsterEventGoldDiff2
00001f4374a03c1331NaN11.0-0.0NaNNaNNONEEVEN
10001f4374a03c1332NaN22.0-0.0NaNNaNNONEEVEN
20001f4374a03c1333NaN33.0-51.0NaNNaNNONEEVEN

In [22]:

GoldstackedData3[GoldstackedData3['GoldDiff2']=="ERROR"]

Out[22]:

In [23]:

GoldstackedData3[GoldstackedData3['Team']=='Blue'].head(10)

Out[23]:

 idEventNumTeamMinuteTimeGoldDiffKillCountStruct/MonsterEventGoldDiff2
70001f4374a03c1338Blue88.19401391.01.0NaNKILLSSLIGHTLY_AHEAD
100001f4374a03c13312Blue1111.1820915.0NaNOUTER_TURRETOUTER_TURRETEVEN
170001f4374a03c13319Blue1616.55602162.0NaNOUTER_TURRETOUTER_TURRETSLIGHTLY_AHEAD
200001f4374a03c13321Blue1818.37801793.01.0OUTER_TURRETKILLSSLIGHTLY_AHEAD
250001f4374a03c13326Blue2222.91504494.02.0BARON_NASHORKILLSAHEAD
350001f4374a03c13336Blue3131.168512117.02.0NaNKILLSVERY_AHEAD
470016710a48fdd46d5Blue55.9710-341.01.0NaNKILLSEVEN
530016710a48fdd46d12Blue1111.9720-283.01.0NaNKILLSEVEN
550016710a48fdd46d13Blue1212.6030-234.0NaNOUTER_TURRETOUTER_TURRETEVEN
580016710a48fdd46d16Blue1515.4360-1190.0NaNDRAGONDRAGONSLIGHTLY_BEHIND

In [24]:

GoldstackedData3['Next_Min'] = GoldstackedData3['Minute']+1


GoldstackedData4 = GoldstackedData3.merge(gold4[['id','Minute','GoldDiff']],how='left',left_on=['id','Next_Min'],
                                         right_on=['id','Minute'])

GoldstackedData4.head(10)

Out[24]:

 idEventNumTeamMinute_xTimeGoldDiff_xKillCountStruct/MonsterEventGoldDiff2Next_MinMinute_yGoldDiff_y
00001f4374a03c1331NaN11.000-0.0NaNNaNNONEEVEN22.0-0.0
10001f4374a03c1332NaN22.000-0.0NaNNaNNONEEVEN33.0-51.0
20001f4374a03c1333NaN33.000-51.0NaNNaNNONEEVEN44.0132.0
30001f4374a03c1334Red44.635132.01.0NaNKILLSEVEN55.035.0
40001f4374a03c1335NaN55.00035.0NaNNaNNONEEVEN66.0940.0
50001f4374a03c1336Red66.064940.01.0NaNKILLSEVEN77.0589.0
60001f4374a03c1337NaN77.000589.0NaNNaNNONEEVEN88.01391.0
70001f4374a03c1338Blue88.1941391.01.0NaNKILLSSLIGHTLY_AHEAD99.01151.0
80001f4374a03c1339NaN99.0001151.0NaNNaNNONESLIGHTLY_AHEAD1010.0563.0
90001f4374a03c13310Red1010.478563.01.0NaNKILLSEVEN1111.0915.0

In [25]:

GoldstackedData4[ GoldstackedData4['GoldDiff_y']== -999999].head(3)

Out[25]:

 idEventNumTeamMinute_xTimeGoldDiff_xKillCountStruct/MonsterEventGoldDiff2Next_MinMinute_yGoldDiff_y
4090091705b0392448530NaN2626.000-10450.0NaNNaNNONEVERY_BEHIND2727.0-999999.0
49100986b51908a63c379NaN6868.000-3283.0NaNNaNNONEBEHIND6969.0-999999.0
53800b13dbf1bd7aff044Blue3535.129-2880.01.0INHIBITORKILLSBEHIND3636.0-999999.0

In [26]:

GoldstackedData4['GoldDiff2_Next'] =  np.where( GoldstackedData4['GoldDiff_y']== 999999,"WIN",
                                np.where( GoldstackedData4['GoldDiff_y']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData4['GoldDiff_y']<1000) & (GoldstackedData4['GoldDiff_y']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData4['GoldDiff_y']>=1000) & (GoldstackedData4['GoldDiff_y']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=2500) & (GoldstackedData4['GoldDiff_y']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData4['GoldDiff_y']<=-1000) & (GoldstackedData4['GoldDiff_y']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-2500) & (GoldstackedData4['GoldDiff_y']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))
GoldstackedData4 = GoldstackedData4[['id','EventNum','Team','Minute_x','Time','Event','GoldDiff2','GoldDiff2_Next']]
GoldstackedData4.columns = ['id','EventNum','Team','Minute','Time','Event','GoldDiff2','GoldDiff2_Next']

GoldstackedData4['Event'] = np.where( GoldstackedData4['Team']=="Red", "+"+GoldstackedData4['Event'],
                                np.where(GoldstackedData4['Team']=="Blue", "-"+GoldstackedData4['Event'], 
                                         GoldstackedData4['Event']))

#GoldstackedData4.head(10)

In [27]:

# Errors are caused due to game ending in minute and then there is no 'next_min' info for this game but our method expects there to be
GoldstackedData4 = GoldstackedData4[GoldstackedData4['GoldDiff2_Next']!="ERROR"]
GoldstackedData4[GoldstackedData4['GoldDiff2_Next']=="ERROR"]

Out[27]:

In [28]:

GoldstackedDataFINAL = GoldstackedData4
GoldstackedDataFINAL['Min_State_Action_End'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL['MSAE'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL.head()

Out[28]:

 idEventNumTeamMinuteTimeEventGoldDiff2GoldDiff2_NextMin_State_Action_EndMSAE
00001f4374a03c1331NaN11.000NONEEVENEVEN1_EVEN_NONE_EVEN1_EVEN_NONE_EVEN
10001f4374a03c1332NaN22.000NONEEVENEVEN2_EVEN_NONE_EVEN2_EVEN_NONE_EVEN
20001f4374a03c1333NaN33.000NONEEVENEVEN3_EVEN_NONE_EVEN3_EVEN_NONE_EVEN
30001f4374a03c1334Red44.635+KILLSEVENEVEN4_EVEN_+KILLS_EVEN4_EVEN_+KILLS_EVEN
40001f4374a03c1335NaN55.000NONEEVENEVEN5_EVEN_NONE_EVEN5_EVEN_NONE_EVEN

In [29]:

goldMDP = GoldstackedDataFINAL[['Minute','GoldDiff2','Event','GoldDiff2_Next']]
goldMDP.columns = ['Minute','State','Action','End']
goldMDP['Counter'] = 1
goldMDP.head()

Out[29]:

 MinuteStateActionEndCounter
01EVENNONEEVEN1
12EVENNONEEVEN1
23EVENNONEEVEN1
34EVEN+KILLSEVEN1
45EVENNONEEVEN1

In [30]:

goldMDP[goldMDP['End']=='ERROR'].head(3)

Out[30]:

In [31]:

goldMDP2 = goldMDP.groupby(['Minute','State','Action','End']).count().reset_index()
goldMDP2['Prob'] = goldMDP2['Counter']/(goldMDP2['Counter'].sum())
goldMDP2.head()

Out[31]:

 MinuteStateActionEndCounterProb
01EVEN+KILLSEVEN700.000285
11EVEN-KILLSEVEN800.000325
21EVENNONEEVEN47700.019408
31EVENNONESLIGHTLY_AHEAD10.000004
42EVEN+KILLSEVEN2280.000928

In [32]:

goldMDP3 = goldMDP.groupby(['Minute','State','Action']).count().reset_index()
goldMDP3['Prob'] = goldMDP3['Counter']/(goldMDP3['Counter'].sum())
goldMDP3.head()

Out[32]:

 MinuteStateActionEndCounterProb
01EVEN+KILLS70700.000285
11EVEN-KILLS80800.000325
21EVENNONE477147710.019412
32EVEN+KILLS2282280.000928
42EVEN-KILLS2692690.001094

In [33]:

goldMDP4 = goldMDP2.merge(goldMDP3[['Minute','State','Action','Prob']], how='left',on=['Minute','State','Action'] )
goldMDP4.head(20)

Out[33]:

 MinuteStateActionEndCounterProb_xProb_y
01EVEN+KILLSEVEN700.0002850.000285
11EVEN-KILLSEVEN800.0003250.000325
21EVENNONEEVEN47700.0194080.019412
31EVENNONESLIGHTLY_AHEAD10.0000040.019412
42EVEN+KILLSEVEN2280.0009280.000928
52EVEN-KILLSEVEN2690.0010940.001094
62EVENNONEEVEN44570.0181340.018151
72EVENNONESLIGHTLY_AHEAD20.0000080.018151
82EVENNONESLIGHTLY_BEHIND20.0000080.018151
92SLIGHTLY_AHEADNONESLIGHTLY_AHEAD10.0000040.000004
103EVEN+DRAGONEVEN50.0000200.000020
113EVEN+KILLSEVEN5920.0024090.002421
123EVEN+KILLSSLIGHTLY_BEHIND30.0000120.002421
133EVEN+OUTER_TURRETEVEN4370.0017780.001778
143EVEN-DRAGONEVEN60.0000240.000024
153EVEN-KILLSEVEN6320.0025710.002592
163EVEN-KILLSSLIGHTLY_BEHIND50.0000200.002592
173EVEN-OUTER_TURRETEVEN4220.0017170.001717
183EVENNONEEVEN33890.0137890.013895
193EVENNONESLIGHTLY_AHEAD120.0000490.013895

In [34]:

goldMDP4['GivenProb'] = goldMDP4['Prob_x']/goldMDP4['Prob_y']
goldMDP4 = goldMDP4.sort_values('GivenProb',ascending=False)
goldMDP4['Next_Minute'] = goldMDP4['Minute']+1
goldMDP4[(goldMDP4['State']!=goldMDP4['End'])&(goldMDP4['Counter']>10)&(goldMDP4['State']!="WIN")&(goldMDP4['State']!="LOSS")].head(10)

Out[34]:

 MinuteStateActionEndCounterProb_xProb_yGivenProbNext_Minute
539731BEHIND-BASE_TURRETVERY_BEHIND200.0000810.0001020.80000032
537831BEHIND+INHIBITORVERY_BEHIND170.0000690.0000940.73913032
448328BEHIND-BASE_TURRETVERY_BEHIND140.0000570.0000810.70000029
564232AHEAD+INHIBITORVERY_AHEAD170.0000690.0001020.68000033
672635SLIGHTLY_AHEAD+INNER_TURRETAHEAD120.0000490.0000730.66666736
411727AHEAD+BASE_TURRETVERY_AHEAD130.0000530.0000810.65000028
412427AHEAD+INHIBITORVERY_AHEAD110.0000450.0000690.64705928
477129BEHIND+INHIBITORVERY_BEHIND180.0000730.0001140.64285730
355325AHEAD+BASE_TURRETVERY_AHEAD120.0000490.0000770.63157926
580032SLIGHTLY_AHEAD+INNER_TURRETAHEAD120.0000490.0000770.63157933

Reinforcement Learning AI Model

Now that we have our data modelled as an MDP, we can apply Reinforcement Learning. In short, this applied a model that simulates thousands of games and learns how good or bad each decision is towards reaching a win given the team’s current position.

What makes this AI is its ability to learn from its own trial and error experience. It starts with zero knowledge about the game but, as it is rewarded for reaching a win and punished for reaching a loss, it begins to recognise and remember which decisions are better than others. Our first models start with no knowledge but I later demonstrate the impact initial information about decisions can be fed into the model to represent a person’s preferences.

So how is the model learning? In short, we use Monte Carlo learning whereby each episode is a simulation of a game based on our MDP probabilities and depending on the outcome for the team, our return will vary (+1 terminal reward for win and -1 terminal reward for loss). The value of each action taken in this episode is then updated accordingly based on whether the outcome was a win or loss.

In Monte Carlo learning, we have a parameter 'gamma' that discounts the rewards and will give a higher value to immediate steps than later one. In our model, this can be understood by the fact that as we reach later stages of the games, the decisions we make will have a much larger impact on the final outcome than those made in the first few minutes. For example, losing a team fight in minute 50 is much more likely to lead to a loss than losing a team fight in the first 5 minutes.

We can re-apply our model from part 1 with some minor adjustments for our new MDP.

In [35]:

def MCModelv4(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
    
    
    for e in range(0,num_episodes):
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    if (a>0) | (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    else:
                        current_action =  StartAction


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions


        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
        data_output['V'] = data_output['V'] + alpha*(data_output['Return']-data_output['V'])
        data_output = data_output.drop('Return', 1)
        
        V_outputs = pd.DataFrame({'Episode':[e],'V_total':[data_output['V'].sum()]})
        V_output = V_output.append(V_outputs)
        
                
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)
    

In [36]:

alpha = 0.1
gamma = 0.9
num_episodes = 100
epsilon = 0.1


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl4 = MCModelv4(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")
Time taken to run model: 0.64 mins
Avg Time taken per episode: 0.38 secs

In [37]:

Mdl4[1].head(10)

Out[37]:

 indexMinuteStateActionEndCounterProb_xProb_yGivenProbNext_MinuteRewardVCumProbStepRewardEpisodeAction_Num
0131915EVEN+RIFT_HERALDEVEN260.0001060.0001300.81250016001.0000-0.00500000
0153016EVEN-OUTER_TURRETEVEN1190.0004840.0006790.71257517001.0000-0.00450001
0173117EVEN-DRAGONSLIGHTLY_AHEAD40.0000160.0002200.07407418000.0926-0.00405002
0197818SLIGHTLY_AHEAD+DRAGONSLIGHTLY_AHEAD350.0001420.0002360.60344819001.0000-0.00364503
0223219SLIGHTLY_AHEADNONESLIGHTLY_AHEAD2320.0009440.0012610.74838720001.0000-0.00328004
0242020SLIGHTLY_AHEAD+BASE_TURRETAHEAD20.0000080.0000120.66666721001.0000-0.00295205
0255521AHEAD-INNER_TURRETSLIGHTLY_AHEAD10.0000040.0000160.25000022000.2500-0.00265706
0290122SLIGHTLY_AHEAD-BASE_TURRETAHEAD10.0000040.0000041.00000023001.0000-0.00239107
0303823AHEAD-INNER_TURRETAHEAD50.0000200.0000201.00000024001.0000-0.00215208
0330924AHEAD-KILLSAHEAD1070.0004350.0006350.68589725001.0000-0.00193709

So we now have the values of each state action pair, we can use this to select the single next best step as being the action with the highest value given the current state.

In [38]:

final_output = Mdl4[2]

final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)].sort_values('V',ascending=False).head(10)
final_output2

Out[38]:

 MinuteStateActionEndCounterProb_xProb_yGivenProbNext_MinuteRewardV
185315EVEN-DRAGONEVEN510.0002080.0002520.8225811600.200944
206615EVEN-INNER_TURRETEVEN140.0000570.0000730.7777781600.145974
303415EVEN+DRAGONEVEN490.0001990.0003050.6533331600.103227
256915EVEN-OUTER_TURRETEVEN1330.0005410.0007690.7037041600.085860
749415EVEN+OUTER_TURRETSLIGHTLY_AHEAD270.0001100.0006550.1677021600.056043
695015EVEN+INNER_TURRETSLIGHTLY_AHEAD30.0000120.0000610.2000001600.013789
767915EVEN-INNER_TURRETSLIGHTLY_BEHIND30.0000120.0000730.1666671600.011424
766915EVEN-RIFT_HERALDSLIGHTLY_BEHIND60.0000240.0001460.1666671600.003259
210915EVEN+KILLSEVEN3820.0015540.0020180.7701611600.001017
855215EVEN+KILLSSLIGHTLY_BEHIND600.0002440.0020180.1209681600.000975

In theory, we are now ready to run the model for as many episodes as possible so that the results will eventually converge to an optimal suggestion. However, it seems that we will need a lot of epsiodes with this current method to get anything close to convergence.

In [39]:

sum_V_episodes = Mdl4[3]

plt.plot(sum_V_episodes['Episode'],sum_V_episodes['V_total'])
plt.xlabel("Epsiode")
plt.ylabel("Sum of V")
plt.title("Cumulative V by Episode")
plt.show()

Model Improvements

There are many ways we can improve this, one solution would be to greatly simplify the problem by breaking the game into segments. Often these segments are referred to as early, mid and late game and would be we would only need to consider fewer steps to reach an end goal. In this case it would not be to lead to a win but rather aiming to be at an advantage and the end of, say, 10 minute intervals.

Another solution would be to use a model that learns quicker instead of the Monte Carlo method used. This often includes Q-learning or SARSA.

We will not consider doing this here as they would require a lot of work to re-adjust the code. Instead, we will improve the rate the model learns by the way it selects its actions. Currently, we can either define the first action or it will choose randomly which means it is. However, after this first action all subsequent actions are also chosen randomly which is causing the number of episodes needed to exponentially increase for convergence to occur.

Therefore, we will introduce a basic action selection method for these known as greedy selection. This means we select the best action the majority of the time therefore continually testing the success rate of the actions it thinks are best. It will also randomly select actions some of the time so that it can still explore states and doesnt get caught in a local maximum and not the optimal sulution.

Also parameters play a key part in how quickly the output will converge, none moreso than our alpha parameter. Although a small alpha will be more accurate, a larger alpha value will learn and therefore subsequently converge faster.

We will adjust our code to utilise both greedy selection and a reasonably large alpha as well as running for as many episodes as possible. More episodes means the runtime will be longer but because, at least at this stage, we are not attemping to have this running in real time there is no issue with it running for a while to find an optimal suggestion.

It takes approximately 10 mins to run the model for 1,000 episodes, we have also included a tracking system for the run progress which shows the current percentage (based on which episode) the loop is in. This is particuarly useful if running for anything more than a few minutes.

It was at this stage, I also notice that my method for applying the update rule was overriding previous knowledge if the action wasnt used in the current episode and so all results were converging back to zero after each episode. I fixed this by making it so if the state + action wasnt used in the episode, then it simply remians the same and will show in our results as being the flat parts of the lines.

Lastly, we output the value of each of our available actions given our start state in each episode so we can track the learning process for these and how, over many episodes, our optimal action is decided.

In [40]:

from IPython.display import clear_output

In [41]:

def MCModelv5(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
        
    for e in range(0,num_episodes):
        clear_output(wait=True)
        
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    
                    # Greedy Selection:
                    # If this is our first action and start action is non, select greedily. 
                    # Else, if first actions is given in our input then we use this as our start action. 
                    # Else for other actions, if it is the first episode then we have no knowledge so randomly select actions
                    # Else for other actions, we randomly select actions a percentage of the time based on our epsilon and greedily (max V) for the rest 
                    
                    
                    if   (a==0) & (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    elif (a==0):
                        current_action =  StartAction
                    
                    elif (e==0) & (a>0):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    
                    elif (e>0) & (a>0):
                        epsilon = epsilon
                        greedy_rng = np.round(np.random.random(),2)
                        if (greedy_rng<=epsilon):
                            random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                            random_action = random_action.reset_index()
                            current_action = random_action['Action'][0]
                        else:
                            greedy_action = (
                            
                                data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)][
                                    
                                    data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V']==data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V'].max()
                                
                                ])
                                
                            greedy_action = greedy_action.reset_index()
                            current_action = greedy_action['Action'][0]
                            
                  
                    
                        
                   
                            
                        
                        
                        
                    


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions
                    
        print("Current progress:", np.round((e/num_episodes)*100,2),"%")

        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
            
        data_output['V'] = np.where(data_output['Return']==0,data_output['V'],data_output['V'] + alpha*(data_output['Return']-data_output['V']))
        
        data_output = data_output.drop('Return', 1)

        
        for actions in data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)]['Action'].unique():
            V_outputs = pd.DataFrame({'Index':[str(e)+'_'+str(actions)],'Episode':e,'StartMin':StartMin,'StartState':StartState,'Action':actions,
                                      'V':data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)&(data_output['Action']==actions)]['V'].sum()
                                     })
            V_output = V_output.append(V_outputs)
        
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)
    

In [42]:

alpha = 0.3
gamma = 0.9
num_episodes = 1000
epsilon = 0.2


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl5 = MCModelv5(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")
Current progress: 99.9 %
Time taken to run model: 11.53 mins
Avg Time taken per episode: 0.69 secs

In [43]:

Mdl5[3].sort_values(['Episode','Action']).head(10)

Out[43]:

 IndexEpisodeStartMinStartStateActionV
00_+BASE_TURRET015EVEN+BASE_TURRET0.000000
00_+DRAGON015EVEN+DRAGON0.000000
00_+INNER_TURRET015EVEN+INNER_TURRET0.000000
00_+KILLS015EVEN+KILLS0.000000
00_+OUTER_TURRET015EVEN+OUTER_TURRET0.000000
00_+RIFT_HERALD015EVEN+RIFT_HERALD0.000000
00_-DRAGON015EVEN-DRAGON0.000000
00_-INNER_TURRET015EVEN-INNER_TURRET0.000000
00_-KILLS015EVEN-KILLS0.000000
00_-OUTER_TURRET015EVEN-OUTER_TURRET-0.418229

We have also changed our sum of V output to only be for the possible actions from our start state as these are the only ones we are currently concerned with.

In [44]:

V_episodes = Mdl5[3]

plt.figure(figsize=(20,10))

for actions in V_episodes['Action'].unique():
    plot_data = V_episodes[V_episodes['Action']==actions]
    plt.plot(plot_data['Episode'],plot_data['V'])
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.title("V for each Action by Episode")
#plt.show()

Out[44]:

Text(0.5,1,'V for each Action by Episode')

In [45]:

final_output = Mdl5[2]


final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)]
final_output3 = final_output2.groupby(['Minute','State','Action']).sum().sort_values('V',ascending=False).reset_index()
final_output3[['Minute','State','Action','V']]

Out[45]:

 MinuteStateActionV
015EVEN+DRAGON2.166430
115EVEN-KILLS1.403234
215EVEN-INNER_TURRET1.101869
315EVEN-DRAGON0.555246
415EVEN+RIFT_HERALD0.529776
515EVEN+OUTER_TURRET0.218571
615EVEN+BASE_TURRET0.091745
715EVEN+INNER_TURRET0.058387
815EVEN+KILLS-0.118718
915EVEN-RIFT_HERALD-0.294434
1015EVENNONE-0.589646
1115EVEN-OUTER_TURRET-1.191101

In [46]:

single_action1 = final_output3['Action'][0]
single_action2 = final_output3['Action'][len(final_output3)-1]

plot_data1 = V_episodes[(V_episodes['Action']==single_action1)]
plot_data2 = V_episodes[(V_episodes['Action']==single_action2)]

plt.plot(plot_data1['Episode'],plot_data1['V'], label = single_action1)
plt.plot(plot_data2['Episode'],plot_data2['V'], label = single_action2)
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.legend()
plt.title("V by Episode for the Best/Worst Actions given the Current State")
plt.show()

Part 2 Conclusion

We have now fixed our issue highlighted in part one and have a MDP that takes into account cumulative success/failures in a match by defining our current/next states by a gold advtantage/disadvantage.

We have also made a number of improvements to our model but there are many aspects that could be further improved. These will be discussed further in our next part where we will introduce personal preferences to influence the model output.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值