I'm a computer science student, and as a personal project, I'm interested in building software that can watch and produce useful information about Super Nintendo games being run on a local emulator. This might be things like current health, current score, etc., (anything legible on the screen). The emulator runs in windowed form (I'm using SNES9x) and so I wouldn't need to capture every pixel on the screen, and I'd only have to capture about 30fps.
I've looked into some libraries like FFMPEG and OpenCV, but so far what I've seen leads me to believe I have to have pre-recorded renderings of the game.
At some point, I'd like to explore the capacity for developing a somewhat heuristic AI that might be able to play Super Metroid, but to do so, it would need to be interpreting live gameplay. The algorithms and data structures needed for something like this are within my realms of study; video processing is not, and I'm something of a noob. Any pointers would be awesome (pardon the lame computer science pun).
For those who might point out that it would be simpler to scrape the game memory rather than screen grab data -- yes, it would be. My interest is in developing something that is only given the information a human player would have, i.e., the visuals on the screen, so this is the approach I'm interested in for the time being. Thanks!
Solution
A: Yes, Python can grab & process any scene via a USB-input device
The design issues in real-time image processing ( not stream processing ... ) are about the overall RT-loop performance, chiefly the image transformations and processing steps, not about the static image size or the acquisition method per se.
Anyway, your code has to be carefully designed and pre-measured in [usec, nsec] ( yes, there are Python tools available that let you benchmark your code's timing down to roughly 25-nsec resolution ) so as to keep the whole RT-loop feasible within your general image-processing architecture. On top of that, you will struggle with both resource management and error handling, both of which cause a lot of problems in RT-scheduling.
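Such per-stage timing is indeed possible at nanosecond granularity with the standard library alone. A minimal sketch, using `time.perf_counter_ns` and an arbitrary stand-in workload in place of a real image transformation:

```python
import time

def time_stage( fn, *args, n = 100 ):
    """Average per-call duration of fn( *args ) in nanoseconds."""
    t0 = time.perf_counter_ns()                # monotonic clock, ns resolution
    for _ in range( n ):
        fn( *args )
    t1 = time.perf_counter_ns()
    return ( t1 - t0 ) // n

# budget check: a 30 FPS loop leaves ~33.3 ms per frame for ALL stages combined
FRAME_BUDGET_ns = 1_000_000_000 // 30

avg_ns = time_stage( lambda: sum( range( 1000 ) ) )   # stand-in for a real transform
print( "stage avg [ns]:", avg_ns, "| fits the 30 FPS budget:", avg_ns < FRAME_BUDGET_ns )
```

Summing the per-stage averages against `FRAME_BUDGET_ns` tells you early whether the whole RT-loop is feasible at your target rate.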
How? Take this as an inspiration to start from
A sample, taken from a medical-imaging PoC Python prototype, just to give an initial image-capture idea:
def demoCLUT():
    cameraCapture = cv2.VideoCapture( 0 )            # needs: import cv2
    cv2.namedWindow(        'msLIB:ComputerVision.IN' )
    cv2.setMouseCallback(   'msLIB:ComputerVision.IN', onMouse )    # onMouse / clicked defined elsewhere in the prototype
    cv2.namedWindow(        'msLIB:ComputerVision.OUT-0' )
    cv2.namedWindow(        'msLIB:ComputerVision.OUT-1' )
    cv2.namedWindow(        'msLIB:ComputerVision.OUT-2' )
    success, frame = cameraCapture.read()
    if success:
        while success and cv2.waitKey( 10 ) == -1 and not clicked:  # [msec]
            aGrayFRAME = cv2.cvtColor( frame, cv2.COLOR_BGR2GRAY )
            cv2.imshow( 'msLIB:ComputerVision.IN',    frame )
            cv2.imshow( 'msLIB:ComputerVision.OUT-0', aGrayFRAME )
            cv2.imshow( 'msLIB:ComputerVision.OUT-1', reprocessIntoFalseCOLORs( aGrayFRAME, frame, aFalseCLUT   ) ) # -destructive
            cv2.imshow( 'msLIB:ComputerVision.OUT-2', reprocessIntoFalseCOLORs( aGrayFRAME, frame, aFalseCLUT_T ) ) # -destructive
            success, frame = cameraCapture.read()
    else:
        print( "OpenCV.CLUT.DEMO: cameraCapture.read() failed to serve a success/frame ..." )
    # ------------------------------------------------------------------
    cameraCapture.release()                          # RELEASE-a-Resource
    print( 30 * ">", "call clearWIN() to release & tidy up resources..." )
    # ------------------------------------------------------------------
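The sample above reads from a USB camera; for a windowed SNES9x, a screen-grab of the emulator's window rectangle does the same job and feeds the same per-frame pipeline. A minimal sketch using the third-party `mss` package ( `pip install mss` ); the window coordinates here are made up and would have to be located per machine:

```python
import numpy as np

def bgra_to_bgr( raw, width, height ):
    """Convert a raw BGRA byte buffer ( as mss returns ) into an HxWx3 BGR array."""
    bgra = np.frombuffer( raw, dtype = np.uint8 ).reshape( height, width, 4 )
    return bgra[ :, :, :3 ].copy()               # drop the alpha channel

def grab_emulator_loop():
    """Grab a fixed screen rectangle in a loop ( needs: pip install mss )."""
    import mss                                   # third-party; imported lazily
    region = { 'left': 100, 'top': 100, 'width': 512, 'height': 448 }  # made-up SNES9x window position
    with mss.mss() as sct:
        while True:
            shot  = sct.grab( region )           # BGRA screenshot of the region
            frame = bgra_to_bgr( shot.raw, shot.width, shot.height )
            # ... hand <frame> to the per-frame processing stage here ...
```

The resulting `frame` is an ordinary OpenCV-compatible BGR array, so the `cvtColor` / `imshow` machinery above works on it unchanged.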
Are pre-recorded sequences a must or a nice-to-have?
Given your stated motivation, the prototype will consume a lot of development time. Pre-recorded sequences can help you focus on the dev/test side, so that your concentration is not split between playing the game and writing the Python code; however, they are not a must-have.
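While working against recorded or captured frames, the kind of HUD reading the question asks about ( health, score ) can be prototyped as a fixed-region crop plus template comparison. A sketch with made-up coordinates and placeholder digit templates; a real build would crop the templates once from reference screenshots of the game's own font:

```python
import numpy as np

# hypothetical HUD layout: the health digit occupies a fixed pixel
# rectangle in every frame ( coordinates are invented for this sketch )
HEALTH_ROI = ( slice( 10, 18 ), slice( 200, 206 ) )   # ( rows, cols )

# tiny synthetic 8x6 binary "digit templates" standing in for real
# glyphs cropped from the game's HUD font
DIGIT_TEMPLATES = { d: ( np.arange( 48 ).reshape( 8, 6 ) % 10 == d )
                    for d in range( 10 ) }

def read_health_digit( frame_gray, threshold = 128 ):
    """Crop the HUD region, binarise it, return the best-matching digit."""
    glyph  = frame_gray[ HEALTH_ROI ] > threshold
    scores = { d: np.count_nonzero( glyph == t )      # count of agreeing pixels
               for d, t in DIGIT_TEMPLATES.items() }
    return max( scores, key = scores.get )            # most agreeing pixels wins
```

Exact pixel comparison works here because emulator output is pixel-stable; on noisier sources you would reach for `cv2.matchTemplate` instead.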
A remark on FPS: you build the AI against a Human-Player
Having said this, your initial AI-engine may start at anything as low as 10-15 FPS; there is no need to get yourself into an unsolvable RT-loop puzzle just because of an artificially high FPS target.
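A 10-15 FPS loop is easy to pace with the standard library alone. A minimal sketch of a deadline-based frame limiter ( the target rate and frame count are arbitrary ):

```python
import time

TARGET_FPS = 15
FRAME_dt   = 1.0 / TARGET_FPS                  # per-frame budget [sec]

def paced_loop( process_frame, n_frames = 30 ):
    """Call process_frame() at most TARGET_FPS times per second."""
    next_deadline = time.perf_counter()
    for _ in range( n_frames ):
        process_frame()
        next_deadline += FRAME_dt
        sleep_for = next_deadline - time.perf_counter()
        if sleep_for > 0:                      # only sleep if ahead of schedule
            time.sleep( sleep_for )
```

Pacing against an accumulating deadline ( rather than a flat `sleep( FRAME_dt )` ) keeps the average rate on target even when individual frames take uneven amounts of processing time.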
Our human eye / brain tandem gets an illusion of motion somewhere near the TV refresh rate ( meaning the analog-TV original, where about 25 interlaced half-frames per second were enough for people for many decades; not so for dogs, which is why marketing companies focused on influencing humans, measuring their advertising campaigns' impact with people-meters rather than dog-meters, as our best friends did not like at all to watch that strange flashing static on TV screens ).
So do not over-design the AI-engine to be developed; it shall aim at beating Human-Players, not dogs, shan't it?