Inspiration

With AI-generated content becoming increasingly human-like, we were curious: can people still spot the difference? We wanted to use Devvit to create a fun and engaging way to explore this question while also collecting benchmark data on how human-like the content generated by different AI models is becoming. We're redefining how AI models are evaluated by adding a human-perception dimension to performance assessment.

What it does

DetectiveAI is an interactive game where users are shown real photographs and AI-generated images, and their goal is to identify which ones are AI-generated.

Every 6 hours we automatically upload a new post with images from a category not seen yet!

To play the game, simply find a post with a category you find interesting, use the previous/next buttons to flip through each image, select the ones you think are AI-generated, and press submit on the last image! You'll then see which images you correctly identified as AI, along with how many other players also flagged each image as AI.

Behind the scenes, the game records each player's accuracy and simultaneously aggregates guesses across all players to benchmark the realism of different AI models, showing how convincing each model really is.
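
The scoring and aggregation described above can be sketched roughly as follows (function and field names are illustrative, not our exact implementation):

```python
def score_submission(selected, ai_images):
    """Return which guesses were right and the player's accuracy."""
    selected, ai_images = set(selected), set(ai_images)
    correct = selected & ai_images   # AI images correctly flagged
    missed = ai_images - selected    # AI images the player missed
    wrong = selected - ai_images     # real photos flagged as AI
    accuracy = len(correct) / len(ai_images) if ai_images else 0.0
    return {"correct": correct, "missed": missed,
            "wrong": wrong, "accuracy": accuracy}

def aggregate(vote_counts, total_attempts):
    """Percent of players who flagged each image as AI."""
    return {img: 100 * n / total_attempts
            for img, n in vote_counts.items()}
```

For example, a player who selects two images when only one of them is actually AI-generated scores 50% on that post, and each image's vote count feeds the per-model realism benchmark.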

How we built it

We built the frontend using Reddit’s native Devvit components. We store key data about each post in Redis, including the image category, the total number of attempts made, and how many people selected each image as AI. All of the frontend logic, including dynamic image selection and displaying the results, was handled with JavaScript.
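
As a rough sketch of the per-post data layout (field names here are assumptions, with a plain dict standing in for Redis; in the Devvit app the same operations map to Redis hash writes and increments):

```python
store = {}  # stand-in for Redis

def record_post(post_id, category, image_ids):
    """Initialize a post's game data, keyed like a Redis hash."""
    store[f"post:{post_id}"] = {
        "category": category,
        "attempts": 0,
        **{f"votes:{img}": 0 for img in image_ids},
    }

def record_guess(post_id, selected_images):
    """Count one attempt and one AI vote per selected image."""
    data = store[f"post:{post_id}"]
    data["attempts"] += 1            # like HINCRBY post:<id> attempts 1
    for img in selected_images:
        data[f"votes:{img}"] += 1    # like HINCRBY post:<id> votes:<img> 1
```

Keeping everything under one key per post makes the results screen a single read: the attempt total and per-image vote counts come back together.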

We implemented a Job Scheduler to automatically create a new post every 6 hours, each featuring an image category that hasn't been posted yet.
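
The job's core decision is simple: each run (a 6-hour cadence, i.e. a cron expression like `0 */6 * * *`), pick a category that hasn't been posted. A minimal sketch, with placeholder category names:

```python
ALL_CATEGORIES = ["landscapes", "portraits", "food", "architecture"]

def next_category(posted):
    """Return the first category not yet posted, or None when exhausted."""
    for cat in ALL_CATEGORIES:
        if cat not in posted:
            return cat
    return None
```

The scheduled job creates a post for `next_category(...)` and then records that category as posted, so every run advances to fresh images.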

Our backend was built with modularity and scalability in mind using Python. We used several API sources to generate content from different models, including image models like Imagen3, Flux, and Recraft, as well as text models like DeepSeek-R1, Llama, and Qwen. It also supports a variety of storage methods, such as storing the generated files locally or uploading them to an object store like Cloudflare’s R2.
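
The modular shape can be sketched as a pair of small interfaces, one per model provider and one per storage backend (all class and method names below are illustrative, not our real API):

```python
from abc import ABC, abstractmethod

class ImageModel(ABC):
    """One adapter per image-generation API (e.g. a Flux or Recraft client)."""
    name: str

    @abstractmethod
    def generate(self, prompt: str) -> bytes:
        """Return raw image bytes for the prompt."""

class Storage(ABC):
    @abstractmethod
    def save(self, key: str, data: bytes) -> str:
        """Persist the bytes and return where they ended up."""

class LocalStorage(Storage):
    """Local-disk backend; an R2 backend would implement the same interface."""
    def __init__(self, root: str):
        self.root = root

    def save(self, key: str, data: bytes) -> str:
        path = f"{self.root}/{key}"
        # (actual file write omitted in this sketch)
        return path

def run_pipeline(model: ImageModel, storage: Storage, prompt: str) -> str:
    """Generate one image and store it under the model's namespace."""
    image = model.generate(prompt)
    return storage.save(f"{model.name}/{abs(hash(prompt))}.png", image)
```

Because each provider and each storage method implements the same interface, adding a new model or swapping local files for R2 is a one-class change.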

Challenges we ran into

We initially tried using React and Webview for our frontend, but switched to the native Devvit components because they let us display images directly without a preview screen, which we thought would help draw in more users.

We also ran into the problem of loading our images from the backend into the frontend. We initially tried loading the images from an R2 instance, but found that the Devvit image component couldn’t load from there. We solved this by uploading our images to the assets folder and having the backend generate an ImageData.json file listing every image category and its location in the assets folder, which the frontend reads to know which categories and images are available.
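
A minimal sketch of how such an ImageData.json can be built from asset paths (the exact schema here is an assumption):

```python
import json
from collections import defaultdict

def build_image_data(asset_paths):
    """Group asset paths like 'landscapes/ai_01.png' by category."""
    categories = defaultdict(list)
    for path in asset_paths:
        category, _, _filename = path.partition("/")
        categories[category].append(path)
    return dict(categories)

# The frontend reads this JSON to learn which categories and images exist.
paths = ["landscapes/ai_01.png", "landscapes/real_01.png", "food/ai_01.png"]
image_data_json = json.dumps(build_image_data(paths), indent=2)
```

Regenerating this file whenever the backend adds new images keeps the frontend's view of available categories in sync without any runtime calls to external storage.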

Accomplishments that we're proud of

- Organizing image-generation APIs and getting images from multiple models.
- Auto-scheduling posts: we automatically create a new post every 6 hours with images from a new category!
- An analytics feature that aggregates user guesses per image, letting us see what percent of people also thought an image was AI. It gives us a newfound perspective on model realism and what kinds of images look convincing.

What's next for DetectiveAI

- Broaden User Reach: Launch outreach campaigns across subreddits and other social platforms to increase user engagement and collect a more extensive benchmark dataset.
- Scale Up Backend Infrastructure: Upgrade the backend to support higher throughput.
- Enable Custom Model Evaluation: Allow users to upload or select AI models to generate content, then run benchmark tests to evaluate each model's realism.
