Last time, we moved all of the constant values used for weighting different moves and goals into a single class called a DispositionProfile. Going a step further, I grouped these objects into a collection called a DispositionProfileSet, which selects a different DispositionProfile based on the state of the game, in essence giving the AI different behaviors depending on whether it is winning, losing, or trying to expand.
There was one glaring problem, however: I was picking the actual numbers myself, bringing all of my own biases and expectations about how things should work. Instead, I wanted these numbers to be based on how things actually worked. To do this, I turned to a Genetic Algorithm to compute the best numbers to use.
The general idea was pretty simple:
- Create a population of DispositionProfileSets initialized with random values
- Play a game with each DispositionProfileSet in the population
- Give each DispositionProfileSet a score based on how well it performed
- Create a new population by:
  - Randomly choosing 2 "parents" from the current generation, with the chance of selection based on their scores
  - Computing each value in the new "child" DispositionProfileSet as the average of its parents' values
  - Further randomizing 10% of the values by +/- 50% (aka Mutation)
- Repeat the above steps for several generations
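The steps above can be sketched in code. This is a minimal illustration, not my actual implementation: the profile is flattened to a plain list of floats, and names like PROFILE_SIZE and the initial value range are assumptions.

```python
import random

PROFILE_SIZE = 20          # assumed number of tunable weights in a profile set
POPULATION_SIZE = 30
MUTATION_RATE = 0.10       # mutate 10% of values...
MUTATION_SPREAD = 0.50     # ...by up to +/- 50%

def random_profile():
    # A DispositionProfileSet reduced to a flat list of random weights.
    return [random.uniform(-10.0, 10.0) for _ in range(PROFILE_SIZE)]

def select_parent(population, scores):
    # Fitness-proportionate (roulette-wheel) selection: a profile's chance
    # of being picked is its score divided by the total of all scores.
    pick = random.uniform(0, sum(scores))
    running = 0.0
    for profile, score in zip(population, scores):
        running += score
        if running >= pick:
            return profile
    return population[-1]

def breed(mother, father):
    # Each child value is the average of its parents' values...
    child = [(m + f) / 2.0 for m, f in zip(mother, father)]
    # ...then roughly 10% of the values get nudged by up to +/- 50%.
    for i in range(len(child)):
        if random.random() < MUTATION_RATE:
            child[i] *= 1.0 + random.uniform(-MUTATION_SPREAD, MUTATION_SPREAD)
    return child

def next_generation(population, scores):
    return [breed(select_parent(population, scores),
                  select_parent(population, scores))
            for _ in range(POPULATION_SIZE)]
```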
Given a large enough population, each generation slowly becomes better at the given task (as measured by the score each DispositionProfileSet generates). I had built my game engine so that visuals were unnecessary, allowing me to run batches of games; I had always intended to try "evolving" the AI through a Genetic Algorithm like this.
I settled on a population size of 30 and had each DispositionProfileSet play against last week's AI, George. I set the game to end in a loss after 40 turns. Each AI player gained 1 point for each turn it survived in a loss. For a win, the AI scored 50 points plus 1 point for each turn short of 40 it took to achieve victory. The idea was to generate an AI that would win as quickly as possible.
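That fitness function boils down to a few lines. This is a sketch of the scheme as described; the exact turn-counting details (e.g. whether the final turn counts) are assumptions.

```python
MAX_TURNS = 40  # games end in a loss after 40 turns

def score_game(won, turns):
    # Loss: 1 point per turn survived.
    # Win: 50 points, plus 1 point for each turn of headroom before the
    # 40-turn cutoff, so faster wins score higher.
    if won:
        return 50 + (MAX_TURNS - turns)
    return turns
```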
I then started up the simulator and waited for the magic to happen. Early AIs would lose in under 20 turns, but with each generation the number grew. Each game took 1-2 minutes to run in batch mode, so each generation took about 90 minutes to complete. I went to bed with the simulator running, hoping to see some wins the next day.
The next day showed that the average AI player was now losing at the maximum of 40 turns, but there was not a win in sight. To address this, I made several small tweaks.
- I set parent selection to use the squared score, giving high-scoring AI players an even greater chance to propagate into the next generation
- I added 5 new, completely random AI players to each generation, hoping the extra diversity would keep the population from stagnating
- I had each AI player play 2 games, one as player 1 and the other as player 2, to eliminate any advantage or bias from playing one side over the other
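The three tweaks can be sketched as a self-contained variation on the earlier loop. Again, this is illustrative only: the profile representation, value ranges, and the play_game callback are all assumptions, not my actual code.

```python
import random

PROFILE_SIZE = 20
POPULATION_SIZE = 30
FRESH_PER_GEN = 5      # brand-new random players injected each generation

def random_profile():
    return [random.uniform(-10.0, 10.0) for _ in range(PROFILE_SIZE)]

def select_parent_squared(population, scores):
    # Tweak 1: weight the roulette-wheel draw by score squared, so the
    # top performers dominate selection even more.
    weights = [s * s for s in scores]
    pick = random.uniform(0, sum(weights))
    running = 0.0
    for profile, w in zip(population, weights):
        running += w
        if running >= pick:
            return profile
    return population[-1]

def breed(a, b):
    return [(x + y) / 2.0 for x, y in zip(a, b)]

def next_generation_tweaked(population, scores):
    # Tweak 2: reserve slots for completely random newcomers to keep
    # some diversity in the gene pool.
    children = [breed(select_parent_squared(population, scores),
                      select_parent_squared(population, scores))
                for _ in range(POPULATION_SIZE - FRESH_PER_GEN)]
    children += [random_profile() for _ in range(FRESH_PER_GEN)]
    return children

def evaluate(profile, play_game):
    # Tweak 3: play once as each side and sum the scores, cancelling
    # out any advantage of playing one side over the other.
    return play_game(profile, side=1) + play_game(profile, side=2)
```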
Several long days later, there was no change. Regardless of the various tweaks, the population would stagnate and hover at the best possible score for losing a game: 41. After hundreds of generations, not a single win.
After each generation, I would save each DispositionProfileSet out to disk, so I loaded one of the most recently "evolved" players into the game viewer to take a look. What I found was pretty surprising: the AI player would build 3-4 units and then just sit there, passing each turn without action. This seemed to work because, through some strange quirk in his logic, George would never identify the AI as a threat and was content to simply build units and hang out on his side of the board.
Needless to say, I was pretty upset. I had left this simulator running day and night for the last week, completely neglecting Diablo 3, and had nothing to show for it. In brainstorming where I went wrong, I concluded that the fitness test I used to score each AI player was the issue. After all, the AI evolved to do exactly what I asked it to do, survive as long as possible, and it accomplished this by adopting a completely passive strategy.
I decided to change the way I scored each game. This time I would base it not on winning or losing but on maximizing resources, figuring that an AI that managed its resources best would also be good at winning. The updated scoring became:
+1 point awarded for each Gold gained
+3 points for each Mana gained (a ratio already partially established in the game)
+points equal to the cost of each enemy unit killed
-points equal to the cost of each friendly unit lost
I started each AI player with 200 points to represent the amount of gold each side began with, and then divided each player's score by the number of turns the game lasted. I ran George through the simulator against himself to get a baseline (George scored a respectable 51 points). Then I started the simulator up from the beginning once again.
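The revised scoring can be written as a small tally function. This is a sketch of the scheme as described above; the parameter names and the representation of unit costs as plain lists are assumptions for illustration.

```python
GOLD_POINTS = 1
MANA_POINTS = 3        # ratio partially established by the game itself
STARTING_SCORE = 200   # stands in for each side's starting gold

def resource_score(gold_gained, mana_gained,
                   enemy_costs_killed, friendly_costs_lost, turns):
    # Reward resources gained and enemy units killed (weighted by their
    # build cost), punish losing friendly units, then normalize by game
    # length so a fast, efficient game scores higher.
    score = STARTING_SCORE
    score += gold_gained * GOLD_POINTS
    score += mana_gained * MANA_POINTS
    score += sum(enemy_costs_killed)
    score -= sum(friendly_costs_lost)
    return score / turns
```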
Early scores were low (between 2 and 20) but ramped up quickly from there. After 30 generations I started seeing regular scores in the 80s and even a few in the 90s. I ran the top 5 scoring AI players against George in visual mode, and each one won handily.
Comparing the evolved numbers against those I derived myself for George shows a few interesting bits (you can see the numbers yourself here). For example, the AI player assigned negative values across the board for moving wounded units onto settlements to be healed, finding higher value in attacking with those units even when they are close to being eliminated. Values that go against my own common sense can have a few explanations, all of them interesting:
- At only 30 generations, I just did not run the simulation long enough for all numbers to converge
- The Units and their abilities are just not balanced properly
- Certain game systems (such as healing on settlements) do not have enough value to be worthwhile
- A certain situation or value did not get exercised enough during the simulation to converge towards a maximum
- My AI logic is flawed/bugged
- My AI contains unnecessary logic
This makes the tool a lot more interesting than a mere AI generator: it can be used to identify potentially unbalanced game systems and units, and to test the game itself. The fact that the simulation ran without a hitch for thousands of games gives me a good feeling about the stability of my code. I do wish I had run it through a profiler, as 1-2 minutes per game seems excessive and is likely something that could easily be remedied.
I went into this phase of the project with a lot of expectations. Initially they were crushed, but in the end I am really glad I built the game batching and the genetic algorithm, as they will give me many useful tools throughout the rest of this project. The AI I ended up with is unlikely to be the final version; as game systems change and get added, I will certainly run the simulation again.
I can also build out different AI behavior sets for different factions, leaders, and even certain problematic maps. Of all the steps I have taken during this project, none feels more right than separating all the weights and values used in the AI calculations into a single central class like the DispositionProfileSet. This is a concept I would very much recommend to others working on similar projects, and the ability to feed such a class into a Genetic Algorithm makes it all the more valuable.
Check out this week's evolved AI Henry take on George in this episode's video: