Python Image Hashing and Statistic Parsing

Example output
The world to parse
Two types of tiles with similar structure but different tints
Multiple "grass" tiles appear to be the same, but have slight variations in pixels

Inspiration

This program is a tool I wrote to help with a future project. The future project is a 2d, top-down video game as a way to explore procedural generation. My hope is to generate Legend of Zelda, Link's Awakening style worlds completely procedurally.

So, where does this python script come in?

The Legend of Zelda, Link's Awakening map are made of distinct 16x16 pixel tiles. This script takes in screenshots of the existing world maps and calculates statistics for the individual 16x16 tiles, like how often a tile occurs, and probability of a specific tile type neighboring any other given tile type. For example, one can determine the odds a water tile is adjacent to a grass tile.

The Challenge of Hashing Images

Looping over an image to calculate stats isn't terribly hard. The challenge came from hashing images. Hashes are used to compare images very quickly. Two identical 16x16pixel tiles must hash to the same value. Two different tiles must hash to different values.

While two tiles 16x16pixel tiles of the same type may appear to be identical, there are often slight variations between corresponding pixels. In other words, a pixel may not have the exact same RGB value of the same-location pixel on the other tile. These variations may be almost indistinguishable to the human eye, however a computer can easily spot the difference. For this project I needed similar looking tiles to hash to the same value, even if they had small differences.

Thus, I implemented an image processing algorithm called 'difference hashing' to handle possible noise. (See https://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html for more info on difference hashing)

Unfortunately, difference hashing brought forth a new problem. Some tiles look similar in structure, however have different tints. Difference hashing algorithms flag these tiles as the same. To counter this, I offset the difference hash by a factor of the tile's average pixel value (with a slight tolerance).

Future work

My next step is to parallelize this program. This should be fairly trivial, as the world can be divided among each processor, without changing the algorithm. After each processor runs, the statistics can be merged. I plan on using the statistics from many old-school Legend of Zelda maps to define the rules for my own game's world generation. I will seed the world with a few random tile types, then fill in the rest based on the most probable neighbor type.

References

Input images belong to Nintendo

https://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html for an overview of image hashing algorithms

Built With

Updates

Adam Gaia started this project — Oct 06, 2019 02:00 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.