Inspiration
Feature flags are meant to be temporary, but in real-world codebases they often become permanent. Over time, teams forget why a flag exists, how risky it is, or whether it is still needed. Existing tools can list flags, but they do not explain which flags are dangerous today or why an engineer should care. I built FlipTheFlags to answer a single question: “Which feature flags should I worry about right now, and why?”
What it does
FlipTheFlags scans a repository to identify feature flags and analyzes how they are used across the codebase. For each flag, it classifies risk into one of three categories:
- Danger – Flags that control critical runtime behavior
- Needs Fixing – Flags with unclear or unresolved intent
- Obsolete / Remove – Flags that are hardcoded and no longer serve a purpose
Each flag includes a clear, concise explanation written like a senior engineer’s code review comment. The tool also surfaces how many lines of code depend on each flag, helping engineers understand the potential blast radius before making changes.
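As a rough sketch of the idea (the names, categories, and fields here are illustrative, not the tool's actual data model), each analyzed flag could be represented as a small record combining its risk category, a review-style explanation, and the dependent line count:

```python
from dataclasses import dataclass
from enum import Enum

class Risk(Enum):
    DANGER = "Danger"                  # controls critical runtime behavior
    NEEDS_FIXING = "Needs Fixing"      # unclear or unresolved intent
    OBSOLETE = "Obsolete / Remove"     # hardcoded, no longer serves a purpose

@dataclass
class FlagReport:
    name: str
    risk: Risk
    explanation: str        # written like a senior engineer's review comment
    dependent_lines: int    # approximate blast radius

# Hypothetical example of one report entry
report = FlagReport(
    name="enable_new_checkout",
    risk=Risk.DANGER,
    explanation="Guards the payment path; flipping it changes checkout behavior.",
    dependent_lines=42,
)
print(f"[{report.risk.value}] {report.name} ({report.dependent_lines} lines)")
```

The three `Risk` values mirror the categories above; everything else is a plausible shape under those assumptions, not the project's real schema.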
How we built it
The project is implemented as a lightweight Python analysis tool. It combines static scanning to extract flags and dependency information with Gemini-powered reasoning for classification and explanation. The architecture separates signal extraction from reasoning, making the system reliable, testable, and scalable to large repositories.
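The static-scanning half of that split might look something like the following minimal sketch. It assumes flags are read through a call like `is_enabled("flag_name")`, which is purely hypothetical; real flag SDKs and the tool's actual extractor will differ. The point is that extraction produces plain data (flag names and the lines that reference them) which a separate reasoning step can then consume:

```python
import re
from collections import defaultdict

# Hypothetical usage pattern: calls like is_enabled("flag_name").
FLAG_PATTERN = re.compile(r'is_enabled\(\s*["\'](\w+)["\']\s*\)')

def extract_flags(source: str) -> dict[str, list[int]]:
    """Map each flag name to the line numbers that reference it."""
    usages: dict[str, list[int]] = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in FLAG_PATTERN.finditer(line):
            usages[match.group(1)].append(lineno)
    return dict(usages)

sample = '''\
if is_enabled("new_checkout"):
    run_new_checkout()
else:
    run_legacy_checkout()
if is_enabled("dark_mode"):
    apply_dark_theme()
'''
print(extract_flags(sample))
```

Keeping this layer deterministic and side-effect free is what makes the overall system testable: the LLM only ever sees the extracted signals, never the raw repository.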
Challenges we ran into
The biggest challenge was avoiding false confidence. Feature flag analysis is inherently contextual, so the tool is designed to surface risk and reasoning without automatically modifying code. This reinforced an important lesson: developer trust comes from clarity, not automation.
Accomplishments that we're proud of
- Built a working end-to-end feature flag analysis tool within a short hackathon timeframe.
- Designed a system that reasons about feature flags instead of relying on static rules.
- Successfully integrated Gemini 3 to generate human-readable explanations similar to senior code review comments.
- Implemented dependency analysis to show how many lines of code are affected by each flag.
- Created a deterministic fallback mode to ensure reliable output even when API limits are reached.
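The fallback mode mentioned above could be as simple as a rule-based classifier that takes over when the API is unavailable. This sketch is an assumption about how such a fallback might work (the function name, parameters, and rules are hypothetical), but it illustrates the principle: a deterministic heuristic is always ready to produce one of the same three categories:

```python
def classify_fallback(flag_name: str, always_on: bool, guards_critical_path: bool) -> str:
    """Heuristic classification used when LLM-based reasoning is unavailable.

    All parameters are illustrative signals a static scanner might provide.
    """
    if always_on:
        # Hardcoded to a single value: the flag no longer toggles anything.
        return "Obsolete / Remove"
    if guards_critical_path:
        # Still controls critical runtime behavior.
        return "Danger"
    # Intent can't be resolved from static signals alone.
    return "Needs Fixing"
```

Because the fallback returns the same categories as the LLM path, downstream consumers (reports, CI checks) never need to know which mode produced a result.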
What we learned
- Feature flags accumulate technical debt silently and require contextual reasoning rather than simple heuristics.
- Large language models like Gemini are most effective when used for reasoning and explanation, not automation.
- Clear developer-facing explanations build more trust than aggressive refactoring or enforcement.
- Designing for graceful degradation (fallback reasoning) is critical when working with API-based systems.
What's next for FlipTheFlags
- Expand support for additional programming languages and configuration-based flags.
- Improve dependency analysis to better approximate execution paths and blast radius.
- Integrate with CI pipelines to surface risky flags during code review.
- Add historical analysis to track how flag risk evolves over time as the codebase changes.