deleted deleted posted an update

Infrastructure & Deployment Stabilization

The initial Cloud Run deployment surfaced several critical issues that required immediate attention:

  • Celery Worker Memory Crashes: Workers were OOM-killed under Cloud Run's default 512 MiB memory limit. All services (API, workers, beat scheduler) were upgraded to 1 GiB of memory, and the number of Gunicorn workers was reduced to prevent memory contention inside the container.
  • Celery Task Discovery: Workers failed to find tasks due to an incorrect app name configuration in celery.py. Fixed the autodiscover path to match the Django project structure.
  • Cloud Run Job Argument Parsing: The --args flag used for Cloud Run Jobs treats commas as delimiters, which conflicted with Celery's comma-separated queue list (-Q). Switched to semicolon delimiters, split by the container entrypoint, so each argument reaches Celery intact.
  • Dependency Upgrades: Upgraded django-celery-beat (2.1.0 → 2.8.1) and django-timezone-field (4.2.3 → 7.2.1) for Python 3.12 compatibility. Removed all legacy AWS references and adapted the full stack to GCP-native services.
  • SSL & CORS Configuration: Resolved staging redirect loops caused by Cloudflare's Flexible SSL mode conflicting with Django's SECURE_SSL_REDIRECT. Configured Full (Strict) SSL mode and aligned ALLOWED_HOSTS, CORS_ALLOWED_ORIGINS, and CSRF_TRUSTED_ORIGINS across environments.
  • Cloud Run Jobs Pipeline: Built a reusable cloud_run_job.sh runner that creates/updates Cloud Run Jobs on-the-fly, enabling one-command scenario seeding to staging and production.
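
The semicolon workaround can be illustrated with a small entrypoint helper. This is a sketch, not the project's cloud_run_job.sh: it assumes the job passes a single argument such as worker;-A;config;-Q;default,events (the app name config and the queue names are hypothetical) and splits it only on semicolons, so Celery's comma-separated queue list stays one token.

```python
import sys


def split_job_args(raw: str, delimiter: str = ";") -> list[str]:
    """Split a semicolon-delimited argument string into argv tokens.

    Commas are left untouched, so a Celery queue list such as
    "default,events" remains a single token.
    """
    # Drop empty fragments produced by leading, trailing or doubled delimiters.
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def build_celery_argv(raw: str) -> list[str]:
    """Prefix the parsed tokens with the celery executable (hypothetical layout)."""
    return ["celery", *split_job_args(raw)]


if __name__ == "__main__":
    # Hypothetical: the job passes one argument, e.g. "worker;-A;config;-Q;default,events".
    argv = build_celery_argv(sys.argv[1] if len(sys.argv) > 1 else "")
    print(argv)
    # A real entrypoint would then replace itself with Celery, e.g. os.execvp(argv[0], argv).
```

Splitting in the entrypoint rather than in the deploy command keeps the gcloud invocation free of per-argument escaping.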

Staging Environment Protection

  • Cloudflare Access: Configured a Cloudflare Access application with email-based policies that restrict staging access to authorized developers only.
  • DNS Proxying: Ensured all staging subdomains are proxied through Cloudflare for DDoS protection and access control.

Frontend Real-Time Improvements

Events & Live Mode Overhaul

The real-time simulation experience received significant fixes to make the live telemetry streaming actually usable:

  • Event Reactivity Fix: The Events Timeline component used BehaviorSubject.getValue() inside Angular computed() signals, which cannot be reactively tracked. Migrated to toSignal() from @angular/core/rxjs-interop so hasActiveRun and isContinuousRun now update reactively.
  • Click-to-Seek: Events in the timeline are now clickable — selecting an event seeks the viewer to the exact simulation frame where the event occurred.
  • Reload Protection: Added proper cleanup and re-initialization when navigating between scenarios, preventing stale telemetry from previous sessions from leaking into new views.
  • Polling Guards: Guarded event polling and telemetry fetches behind authentication checks to prevent unnecessary API calls (and associated costs) for unauthenticated viewers.
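
Click-to-seek needs a mapping from an event's simulation time to a frame index. The viewer itself is Angular, so this Python sketch only illustrates the lookup; it assumes frames carry sorted simulation timestamps and snaps to the nearest frame (the function name frame_for_event is hypothetical).

```python
from bisect import bisect_left
from collections.abc import Sequence


def frame_for_event(frame_times: Sequence[float], event_time: float) -> int:
    """Return the index of the frame whose timestamp is closest to event_time.

    frame_times must be sorted ascending (simulation seconds since run start).
    Ties resolve to the earlier frame.
    """
    if not frame_times:
        raise ValueError("no frames to seek into")
    i = bisect_left(frame_times, event_time)
    if i == 0:
        return 0  # event precedes (or equals) the first frame
    if i == len(frame_times):
        return len(frame_times) - 1  # event is after the last frame
    before, after = frame_times[i - 1], frame_times[i]
    # Pick the neighbour nearer to the event; prefer the earlier frame on a tie.
    return i - 1 if event_time - before <= after - event_time else i
```

With frames every 60 s ([0, 60, 120, 180]), an event at 130 s seeks to frame 2 and one at 170 s seeks to frame 3. Binary search keeps each click O(log n) even for long runs.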

Authentication Guards

Several interactive features were exposed to unauthenticated users, causing redirect errors when the API returned 401/403:

  • Feedback Form: Wrapped in an auth check; unauthenticated users see a snackbar with a "Login" action instead of a broken form.
  • Resume Continuous Mode: The "Go Live" button now checks auth before attempting to resume real-time simulation.
  • Project Future Button: The trajectory projection feature (PROJECT FUTURE +15min) now requires authentication — unauthenticated users receive a descriptive snackbar prompt.
  • Pattern: All guards follow the same pattern: auth.isAuthenticated$.pipe(take(1)) → snackbar with Login action → router.navigate(['/login']).

API Contract Alignment

  • Events Interface Fix: Renamed min_separation_lt to min_separation_km_lt across the full stack (backend serializer, frontend API service, and component queries) to match the actual backend filter parameter.
  • Graceful 503 Fallback: Frontend now handles 503 Service Unavailable responses gracefully (e.g., when Celery workers are temporarily down) instead of showing raw error screens.
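
The renamed parameter pairs a field name with a comparison suffix. A minimal sketch of the server-side filtering, assuming events are plain dicts with a min_separation_km field (a hypothetical shape, not the project's serializer), shows why a mismatched name is easy to miss: in this sketch, unrecognized keys are simply ignored and the full, unfiltered list comes back.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def filter_events(events: Iterable[Mapping[str, Any]],
                  params: Mapping[str, str]) -> list[Mapping[str, Any]]:
    """Apply the min_separation_km_lt query parameter to event dicts.

    Any other key (including the old min_separation_lt name) is ignored.
    """
    raw = params.get("min_separation_km_lt")
    if raw is None:
        return list(events)
    threshold = float(raw)
    # Keep only events that report a separation strictly below the threshold.
    return [e for e in events
            if e.get("min_separation_km") is not None
            and e["min_separation_km"] < threshold]
```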

Simulation Engine Fixes

False Collision Events

The proximity detection system was generating false COLLISION_IMPACT and SURFACE_IMPACT events in several scenarios:

  • Ring Systems: Saturn's rings, classified as RING_SYSTEM, were triggering collision events with moons passing through them. Fixed by adding RING_SYSTEM to the is_visual_only() category filter in ProximityService.
  • NRHO Gateway Station: The Lunar Gateway in the cislunar scenario generated false impacts. Adjusted proximity thresholds for NRHO (Near Rectilinear Halo Orbit) entities.
  • Barycenter Collisions: In the Three-Body Choreography scenario, bodies passing through the coordinate-frame barycenter triggered false collisions. Fixed by introducing a skip_proximity_check flag in entity logical_properties and extending ProximityService to honor it. Container categories (GALAXY, STAR_SYSTEM, UNIVERSE) are now also excluded from proximity checks.
  • External ID Validation: Fixed an incorrect external_id in scenario_09_planetary_defense that caused entity lookup failures during catalog hydration.
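
The three exclusion rules above (visual-only categories, container categories, and the skip_proximity_check flag) can be sketched as a pre-filter in front of a pairwise overlap test. The Entity class, the sum-of-radii collision threshold, and the category sets are illustrative assumptions, not ProximityService's actual code.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations

VISUAL_ONLY = {"RING_SYSTEM"}  # assumption: any other visual-only categories omitted
CONTAINERS = {"GALAXY", "STAR_SYSTEM", "UNIVERSE"}


@dataclass
class Entity:
    name: str
    category: str
    position_km: tuple[float, float, float]
    radius_km: float
    logical_properties: dict = field(default_factory=dict)


def is_proximity_candidate(e: Entity) -> bool:
    """Exclude visual-only, container, and explicitly opted-out entities."""
    if e.category in VISUAL_ONLY or e.category in CONTAINERS:
        return False
    return not e.logical_properties.get("skip_proximity_check", False)


def collisions(entities: list[Entity]) -> list[tuple[str, str]]:
    """Return name pairs whose spheres overlap (distance < sum of radii)."""
    candidates = [e for e in entities if is_proximity_candidate(e)]
    hits = []
    for a, b in combinations(candidates, 2):
        if math.dist(a.position_km, b.position_km) < a.radius_km + b.radius_km:
            hits.append((a.name, b.name))
    return hits
```

Filtering before the pairwise loop also shrinks the O(n²) comparison set, which matters in crowded scenarios such as the debris field.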

Numerical Stability

  • Three-Body Choreography (Scenario 16): The original 1-day time step was far too coarse for the figure-8 choreographic solution, causing RK4 integrator divergence after ~4.3 simulated years. Reduced step size and tuned duration for stable propagation across multiple orbital periods.
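
The step-size sensitivity can be reproduced with a small RK4 sketch of the same figure-8 choreography. It uses normalized units (G = 1, three unit masses) and the published Chenciner and Montgomery initial conditions; these units and step sizes are illustrative assumptions, not the project's configuration. Relative energy drift over one period (T ≈ 6.3259) serves as the error measure.

```python
import math

# Normalised units (an assumption for illustration): G = 1, three unit masses.
G = 1.0
MASSES = (1.0, 1.0, 1.0)
PERIOD = 6.3259  # approximate period of the figure-8 orbit in these units


def figure_eight_initial():
    """State [x1, y1, x2, y2, x3, y3, vx1, vy1, vx2, vy2, vx3, vy3]."""
    x1, y1 = 0.97000436, -0.24308753
    vx3, vy3 = -0.93240737, -0.86473146
    return [x1, y1, -x1, -y1, 0.0, 0.0,
            -vx3 / 2, -vy3 / 2, -vx3 / 2, -vy3 / 2, vx3, vy3]


def derivative(state):
    """Return d(state)/dt: velocities followed by Newtonian accelerations."""
    acc = [0.0] * 6
    for i in range(3):
        for j in range(i + 1, 3):
            dx = state[2 * j] - state[2 * i]
            dy = state[2 * j + 1] - state[2 * i + 1]
            inv_r3 = (dx * dx + dy * dy) ** -1.5
            acc[2 * i] += G * MASSES[j] * dx * inv_r3
            acc[2 * i + 1] += G * MASSES[j] * dy * inv_r3
            acc[2 * j] -= G * MASSES[i] * dx * inv_r3
            acc[2 * j + 1] -= G * MASSES[i] * dy * inv_r3
    return state[6:] + acc


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivative(state)
    k2 = derivative([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = derivative([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = derivative([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def energy(state):
    """Total (kinetic + potential) energy of the system."""
    ke = 0.5 * sum(MASSES[i] * (state[6 + 2 * i] ** 2 + state[7 + 2 * i] ** 2)
                   for i in range(3))
    pe = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            r = math.hypot(state[2 * j] - state[2 * i], state[2 * j + 1] - state[2 * i + 1])
            pe -= G * MASSES[i] * MASSES[j] / r
    return ke + pe


def relative_energy_drift(dt, t_end=PERIOD):
    """Propagate for t_end with step dt and return |dE / E0|."""
    state = figure_eight_initial()
    e0 = energy(state)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
    return abs((energy(state) - e0) / e0)
```

Comparing a coarse step with a fine one shows the drift shrinking rapidly as the step is reduced, since RK4's global error scales roughly with the fourth power of the step.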

Scenario Step Timing Audit

Performed a comprehensive audit of the step timing in all 16 scenario templates to ensure smooth visualization. Scenarios with low step counts (fewer than 500 frames) produce "jumpy" animations.
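
The audit criterion reduces to simple arithmetic: frames = duration / step size, flagged when below 500. A minimal sketch, where the ScenarioTiming class and the step sizes passed to it are hypothetical inputs rather than the templates' real values:

```python
from dataclasses import dataclass

MIN_FRAMES = 500  # threshold used by the audit


@dataclass
class ScenarioTiming:
    name: str
    duration_s: float
    step_s: float

    @property
    def frames(self) -> int:
        # Number of propagated steps (the initial frame is not counted).
        return int(self.duration_s // self.step_s)

    @property
    def is_jumpy(self) -> bool:
        return self.frames < MIN_FRAMES


def max_step_for_smooth(duration_s: float, min_frames: int = MIN_FRAMES) -> float:
    """Largest step size that still yields at least min_frames steps."""
    return duration_s / min_frames
```

For example, a 6-hour scenario at a hypothetical 60 s step yields 360 frames and is flagged, while max_step_for_smooth(6 * 3600) gives 43.2 s as the largest step that still reaches 500 frames.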

Scenario Consolidation & Expansion

Consolidation: 29 → 16 Templates

The original codebase contained 29 scenario files, many of which were duplicates, incomplete prototypes, or had already been absorbed into other scenarios. A full audit consolidated these into 16 production-ready templates, each with validated physics configurations and proper documentation.

Current Scenario Catalog

#   Scenario                         Duration    Physics
01  Real-Time Simulation             Continuous  Kepler
02  LEO Operations (TLEs)            6 hours     J2 + Atmo
03  Earth-Moon Cislunar              30 days     Cowell N-Body
04  Inner Solar System               2 years     Kepler
05  JWST at Lagrange L2              1 year      Cowell N-Body
06  Outer Solar System               165 years   Kepler
07  Voyager: Furthest Spacecraft     50 years    Kepler
08  Solar Dynamics                   25 years    Kepler
09  Planetary Defense                1 year      J2
10  Space Hazards (Debris)           6 hours     J2 + Atmo + Mag
11  TRAPPIST-1 Exoplanet System      20 days     Cowell N-Body
12  Alpha Centauri System            80 years    Cowell N-Body
13  Galactic Center (Sgr A*)         20 years    Cowell N-Body
14  Local Group (Galaxy Collisions)  1.5 Byr     Cowell N-Body
15  Stellar Evolution (Sirius AB)    50 years    Cowell N-Body + Mag
16  Three-Body Choreography          6 years     Cowell N-Body

Backend Architecture Improvements

  • Django Admin Registration: Registered all simulator and core models in Django admin for operational visibility and manual data management.
  • Timezone Configuration: Set TIME_ZONE = 'UTC' globally and suppressed Astropy's ErfaWarning about "dubious year" dates, which is common in deep-time scenarios such as Local Group.
  • Email Configuration: Migrated from console email backend to a production-grade SMTP provider for staging/production transactional emails.
  • Internationalization: Generated Django locale translations (pending review) for future multi-language support.

Testing & Coverage

  • Full backend test suite maintained across all changes (unit + integration + e2e)
  • Dedicated proximity/collision test coverage validating the event detection pipeline: visual-only entity filtering, container exclusion, skip_proximity_check flag, threshold calculations, and event classification
