Inspiration

Reverse engineering is needed in operational environments where data is damaged, partial, mixed, or intentionally hostile. In these settings, metadata is missing, file boundaries are unclear, and simply classifying a file as a PDF is not enough. We wanted to build a system that goes beyond surface-level classification to support triage, evidence handling, cracking handoff, and bounded recovery under adversarial conditions.

What it does

FractureKey is a blind encrypted-document recovery project. It takes an unordered pool of byte fragments and reconstructs the structure needed to recover encrypted PDFs. The system operates "blind": it is not told which fragments belong to the target, the correct fragment order, the password, or even whether the target is present at all. It recovers enough syntax to identify PDF object boundaries and encryption dictionaries, then chooses an attack path to produce a validated plaintext artifact within a fixed time and attempt budget.
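As a rough illustration of what "recovering enough syntax" can mean, the sketch below scans a single fragment for standard PDF markers: indirect object headers, `endobj`, an `/Encrypt` reference, and the `/Standard` security handler. The byte patterns are ordinary PDF syntax; the function name `scan_fragment` and the shape of its result are assumptions made for this example, not FractureKey's actual code.

```python
import re

# Standard PDF syntax markers (from the PDF format, not FractureKey-specific).
OBJ_START = re.compile(rb"\b(\d+)\s+(\d+)\s+obj\b")          # e.g. "12 0 obj"
OBJ_END = re.compile(rb"\bendobj\b")
ENCRYPT_REF = re.compile(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R")   # trailer reference
STD_FILTER = re.compile(rb"/Filter\s*/Standard\b")           # password-based handler


def scan_fragment(data: bytes) -> dict:
    """Collect syntax evidence from one fragment (hypothetical helper)."""
    return {
        # (object number, generation number) for each object header found
        "object_ids": [
            (int(m.group(1)), int(m.group(2))) for m in OBJ_START.finditer(data)
        ],
        "endobj_count": len(OBJ_END.findall(data)),
        # object the trailer points to as the encryption dictionary
        "encrypt_refs": [
            (int(m.group(1)), int(m.group(2))) for m in ENCRYPT_REF.finditer(data)
        ],
        "has_standard_handler": STD_FILTER.search(data) is not None,
    }
```

A fragment that carries an object header but no matching `endobj` (or the reverse) hints at a cut boundary, which is the kind of evidence an assembly step could use to order fragments.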

How we built it

We designed FractureKey as a six-stage pipeline rather than a single-model prediction task. The stages are ingest and provenance, fragment intelligence (using deterministic features and a small learned model), candidate assembly, format inference, crypto orchestration, and validation and reporting. Alongside this product pipeline, we built a rigorous, decryption-first benchmark using a fixed corpus of 100 public PDFs from arXiv and the U.S. Supreme Court. The benchmark generates 5,000 instances by mixing different fragmentation scenarios, interference types, encryption profiles, and bounded recovery budgets.
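The six-stage structure can be sketched as a fixed sequence of stages with an explicit handoff record between them. Only the stage names come from the description above; the `Handoff` class, the `run_pipeline` function, and the dict-based payload are hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names follow the six stages listed above; everything else is assumed.
STAGE_NAMES = [
    "ingest_and_provenance",
    "fragment_intelligence",
    "candidate_assembly",
    "format_inference",
    "crypto_orchestration",
    "validation_and_reporting",
]

Stage = Callable[[dict], dict]


@dataclass
class Handoff:
    """Hypothetical record passed from one stage to the next."""
    payload: dict
    trail: List[str] = field(default_factory=list)  # stages run so far, in order


def run_pipeline(stages: Dict[str, Stage], initial: dict) -> Handoff:
    """Run every stage in the fixed order, recording each handoff."""
    missing = [name for name in STAGE_NAMES if name not in stages]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    handoff = Handoff(payload=dict(initial))
    for name in STAGE_NAMES:
        handoff.payload = stages[name](handoff.payload)
        handoff.trail.append(name)
    return handoff
```

Keeping the trail alongside the payload means a failed run can be traced to the last stage that produced output, rather than to a single end-to-end score.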

Challenges we ran into

Accomplishments that we're proud of

We are proud of building a strict benchmark that resists gaming. Full success on our primary Track C leaderboard requires exact content alignment with source truth, strict structural PDF validation, zero distractor-token contamination, and successful decryption within an explicit budget. This means the system cannot fake success with partial plaintexts or plausible-looking document shells. We are also proud that we aligned the benchmark outputs directly with the product pipeline outputs, so that measured progress translates into real-world capability.
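The Track C criteria above amount to a conjunction of checks, which the following sketch expresses as a single predicate. The `TrackCResult` fields and the `is_full_success` function are hypothetical names for illustration; the real benchmark's schema and budget units may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackCResult:
    """Hypothetical per-instance outcome; field names are assumptions."""
    content_matches_source: bool   # exact content alignment with source truth
    pdf_structure_valid: bool      # strict structural PDF validation passed
    distractor_tokens_found: int   # tokens traced to distractor fragments
    decrypted: bool                # decryption succeeded
    attempts_used: int
    seconds_used: float


def is_full_success(r: TrackCResult, max_attempts: int, max_seconds: float) -> bool:
    """Return True only if every criterion holds; there is no partial credit."""
    within_budget = (
        r.attempts_used <= max_attempts and r.seconds_used <= max_seconds
    )
    return (
        r.content_matches_source
        and r.pdf_structure_valid
        and r.distractor_tokens_found == 0
        and r.decrypted
        and within_budget
    )
```

Because every condition must hold at once, a plausible-looking but contaminated or over-budget recovery scores the same as no recovery at all.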

What we learned

We learned that blind encrypted-document recovery is best treated as a pipeline problem with explicit handoffs, as attempting to collapse everything into one opaque score makes the system hard to debug. We discovered that interference (mixed provenance) is much harder to solve than mere fragmentation. We also found that exact-match validation is necessary to prevent benchmark gaming, as approximate text similarity is not strict enough. Finally, we learned that while a tiny learned model helps organize fragment evidence, the true credibility of the system comes from validation discipline and strict engineering contracts.

What's next for FractureKey

While we have built a credible environment to measure progress, the current system is not yet finished. Preliminary results on a four-case development mini-split revealed clear weaknesses in handoff materialization and attack-path availability that we need to address. The ultimate goal is to refine this controlled environment into a finished product capable of making honest, validated claims about blind document reconstruction.

Built With
