this post was submitted on 02 Jun 2024

52 points (93.3% liked)

Programming

17378 readers

237 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 1 year ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

MaungaHikoi@lemmy.nz

training a neural network to play a bullet hell game (programming.dev)

submitted 5 months ago by zolax@programming.dev to c/programming@programming.dev

14 comments fedilink hide all child comments

neural network is trained with deep Q-learning in its own training environment
controls the game with twinject

demonstration video of the neural network playing Touhou (Imperishable Night):

it actually makes progress up to the stage boss which is fairly impressive. it performs okay in its training environment but performs poorly in an existing bullet hell game and makes a lot of mistakes.

let me know your thoughts and any questions you have!

top 14 comments

sorted by: hot top controversial new old

[–] namingthingsiseasy@programming.dev 10 points 5 months ago (1 children)

This is quite cool. I always find it interesting to see how optimization algorithms play games and to see how their habits can change how we would approach the game.

I notice that the AI does some unnatural moves. Humans would usually try to find the safest area on the screen and leave generous amounts of space in their dodges, whereas the AI here seems happy to make minimal motions and cut dodges as closely as possible.

I also wonder if the AI has any concept of time or ability to predict the future. If not, I imagine it could get cornered easily if it dodges into an area where all of its escape routes are about to get closed off.

[–] zolax@programming.dev 7 points 5 months ago* (last edited 5 months ago)

I always find it interesting to see how optimization algorithms play games and to see how their habits can change how we would approach the game.

me too! there aren't many attempts at machine learning in this type of game so I wasn't really sure what to expect.

Humans would usually try to find the safest area on the screen and leave generous amounts of space in their dodges, whereas the AI here seems happy to make minimal motions and cut dodges as closely as possible.

yeah, the NN did this as well in the training environment. most likely it just doesn't understand these tactics as well as it could so it's less aware of (and therefore more comfortable) to make smaller, more riskier dodges.

I also wonder if the AI has any concept of time or ability to predict the future.

this was one of its main weaknesses. the timespan of the input and output data are both 0.1 seconds - meaning it sees 0.1 seconds into the past to perform moves for 0.1 seconds into the future - and that amount of time is only really suitable for quick, last-minute dodges, not complex sequences of moves to dodge several bullets at a time.

If not, I imagine it could get cornered easily if it dodges into an area where all of its escape routes are about to get closed off.

the method used to input data meant it couldn't see the bounds of the game window so it does frequently corner itself. I am working on a different method that prevents this issue, luckily.

[–] 100@fedia.io 8 points 5 months ago (1 children)

one problem ive seen with these game ai projects is that you have to constantly tweak it and reset training because it eventually ends up in a loop of bad habits and doesnt progress

so is it even possible to complete such a project with this kind of approach as it seems to take too much time to get anywhere without insane server farms?

[–] zolax@programming.dev 3 points 5 months ago (1 children)

one problem ive seen with these game ai projects is that you have to constantly tweak it and reset training because it eventually ends up in a loop of bad habits and doesnt progress

you're correct that this is a recurring problem with a lot of machine learning projects, but this is more a problem with some evolutionary algorithms (simulating evolution to create better-performing neural networks) where the randomness of evolution usually leads to unintended behaviour and an eventual lack of progression, while this project instead uses deep Q-learning.

the neural network is scored based on its total distance between every bullet. so while the neural network doesn't perform well in-game, it does actually score very good (better than me in most attempts).

so is it even possible to complete such a project with this kind of approach as it seems to take too much time to get anywhere without insane server farms?

the vast majority of these kind of projects - including mine - aren't created to solve a problem. they just investigate the potential of such an algorithm as a learning experience and for others to learn off of.

the only practical applications for this project would be to replace the "CPU" in 2 player bullet hell games and maybe to automatically gauge a game's difficulty and programs already exist to play bullet hell games automatically so the application is quite limited.

[–] 100@fedia.io 2 points 5 months ago (1 children)

i mean if you could in the future make an ai play long games from start to finish, it would be very useful to test games with thousands running at once

[–] zolax@programming.dev 1 points 5 months ago

definitely. usually algorithms are used to calculate the difficulty of a game (eg. in osu!, a rhythm game) so there's definitely a practical application there

[–] drspod@lemmy.ml 7 points 5 months ago (1 children)

So the training environment was not Touhou? So what does the training environment look like? I'd be interested to see that, and how it improved over time.

[–] zolax@programming.dev 8 points 5 months ago* (last edited 5 months ago) (1 children)

yeah, the training environment was a basic bullet hell "game" (really just bullets being fired at the player and at random directions) to teach the neural network basic bullet dodging skills

the white dot with 2 surrounding squares is the player and the red dots are bullets
the data input from the environment is at the top-left and the confidence levels for each key (green = pressed) are at the bottom-left
the scoring system is basically the total of all bullet distances

this was one of the training sessions
the fitness does improve but stops improving pretty quickly
the increase in validation error (while training error decreased) is indicated overfitting
- it's kinda hard to explain here but basically the neural network performs well with the training data it is trained with but doesn't perform well with training data it isn't (which it should also be good at)

[–] Cirk2@programming.dev 3 points 5 months ago (1 children)

That's an interesting approach. The Traditional way would be to go by game score like the AI Mario Projects. But I can see the value in prioritizing Bullet Avoidance over pure score.

Does your training Environment Model that shooting at enemies (eventually) makes them stop spitting out bullets? I also would assume that total survival time is a part of the score, otherwise the Boss would just be a loosing game score wise.

[–] zolax@programming.dev 1 points 5 months ago

the training environment is pretty basic right now so all bullets shoot from the top of the screen with no enemy to destroy.

additionally, the program I'm using to get player and bullet data (twinject) doesn't support enemy detection so the neural network wouldn't be able to see enemies in an existing bullet hell game. the character used has a wide bullet spread and honing bullets so the neural network inadvertently destroys the enemies on screen.

the time spent in each training session is constant rather than dependent on survival time because the scoring system is based on the total bullet distance only.

[–] taaz@biglemmowski.win 4 points 5 months ago* (last edited 5 months ago) (1 children)

A bit offtopic but either way, have you ever tried applying NN agents on games with incomplete information, card games with opponents and alike?

[–] zolax@programming.dev 3 points 5 months ago

I did create a music NN and started coding an UNO NN, but apart from that, no

[–] shasta@lemm.ee 2 points 5 months ago (1 children)

Wouldn't a regular bot be easier to make and perform better?

[–] zolax@programming.dev 3 points 5 months ago

currently, yes, but this is more an investigation into how well a neural network could play a bullet hell game

very few bullet hell AI programs rely on machine learning and virtually all of the popular ones use algorithms.

but it is interesting to see how it mimics human behaviour, skills and strategies and how different methods of machine learning perform and why

(plus I understand machine learning more than the theory behind those bullet hell bots.)