Inspiration

In distributed planetary exploration and modern medicine, moving raw patient database files is highly risky, slow, and breaches data privacy laws (like HIPAA or GDPR). We wanted to build a "weightless" federated analytics pipeline. By keeping data completely decentralized, the Federated Data Pipeline gathers global health telemetry from deep-space outposts (Mars & Titan) without moving a single byte of private patient data.

What it does

Simulates a distributed federated network. It connects remote clinic outposts to an Orbit Station Mothership.

  1. Isolated Outposts: Work as local query engines. They load their raw CSV databases containing private patient names and ID numbers, completely strip out individual identifiers to protect privacy, and aggregate metrics locally.
  2. Orbit Station (Mothership): Operates a central web dashboard. It queries the outposts' secure endpoints, retrieves only the anonymous statistical metadata, and mathematically combines it into global cohort metrics.
  3. Interactive Dashboard: Features a premium dark-mode, glassmorphic UI representing an orbital station deck, showing live telemetry logs, system alerts, and network health.

How we built it

The system is built entirely using standard Python 3 and minimal libraries for instant out-of-the-box execution: -Local Outposts (probe_worker.py): Built with Flask and powered by Pandas and NumPy for fast matrix aggregations. The worker immediately filters out the citizen_id and full_name columns from the CSV data, generating PII-free statistical summaries.

  • Mothership Coordinator (mothership_server.py): Built with Flask and standard Python urllib to make non-blocking outbound requests to nodes on ports 5001 and 5002.
  • Dashboard Interface (templates/index.html): Developed using modern HTML5, custom Vanilla CSS3 (glassmorphic styling, neon status glows, loading shimmer animations), and Vanilla JavaScript (ES6) to fetch data asynchronously without page reloads.

The Aggregation Mathematics (LaTeX)

To calculate exact system-wide statistics without raw records, the Mothership performs weightless combining formulas.

Challenges we faced

  • Mathematical Accuracy: Combining statistical bounds (like minimums, maximums, and averages) correctly without having direct access to raw rows.
  • Resilient Pipeline States: Designing the dashboard to support "degraded mode" (where only one outpost is online) or "offline mode" gracefully, updating state indicators dynamically without crashing.

Accomplishments we're proud of

  • True PII Protection: Auditing the transaction console logs on the dashboard confirms that absolutely no patient names or citizen IDs ever pass into the network.
  • Premium User Experience: Creating a high-fidelity space telemetry aesthetic with fully responsive designs, hover states, and smooth animations.

What we learned

  • How to structure decentralized federated query flows.
  • How to coordinate multi-port network servers using clean, minimal dependencies.

What's next

  • Differential Privacy: Injecting mathematical Laplace noise into local statistical outputs to prevent database reconstruction attacks.
  • Secure Multiparty Computation (SMPC): Incorporating cryptographic keys to secure outposts.

Built With

Share this project:

Updates