Inspiration
The spark for this project came from observing that modern AI pipelines often break when an LLM endpoint or an MCP server becomes unavailable. In the hackathon tracks we are competing in, resilience and workflow automation are judged as first‑class requirements. We wanted a concrete, repeatable way to turn those failure moments into new, verified skills, and to expose developer‑centric workflows through the Lark CLI/MCP ecosystem. The Crusoe inference layer's low time to first token (TTFT) and throughput above 4k TPS made it a good backbone for a meta‑skill generation loop that can run synchronously without stalling on slow model calls.

What it does
1. Resilience Layer – Monitors multiple LLM providers (OpenAI, Claude, local‑LLM) and automatically falls back when any provider errors out. A circuit‑breaker based on the Generative Freshness Indicator (GFI) black‑lists a flaky provider and triggers a fallback. Health‑check scripts (scripts/check_resilience.sh) surface the status as JSON for CI consumption.  
2. Lark Integration – Exposes developer‑focused workflows via the Lark CLI/MCP. Sample workflows (issue verification, PR creation, Linear ticket handling) are defined under workflows/ and can be invoked with scripts/run_lark_workflow.sh. The wrapper makes it trivial to trigger a workflow from Slack, Discord, or any CI event.  
3. Meta‑Skill Generation Loop – When a failure is detected, the pipeline:  
   - extracts a FailureSignature (via proof_by_failure),  
   - derives a minimal GapSpecification (via backward_goal_derivation),  
   - creates a SkillTemplate (via skill_gap_discovery),  
   - instantiates a concrete skill (via skill_acquisition_and_generation),  
   - validates the new skill on the original failing task,  
   - persists the artefact in the vault and updates the GFI.  
   This deterministic, auditable loop turns every error into a new, verified capability.
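
The loop above can be pictured as a fixed sequence of stages whose outputs are logged one by one. The following is a minimal sketch under assumptions, not the project's actual code: the stage names come from the list above, but `run_loop`, `AuditTrail`, and the lambda stand-ins are hypothetical, and persisting to the vault and updating the GFI are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# A stage is a (name, function) pair; the functions here are hypothetical stand-ins.
Stage = Tuple[str, Callable[[Any], Any]]


@dataclass
class AuditTrail:
    """Ordered record of every stage output, so a run can be replayed step by step."""
    entries: List[Tuple[str, Any]] = field(default_factory=list)

    def record(self, stage: str, output: Any) -> None:
        self.entries.append((stage, output))


def run_loop(
    failure: Any,
    stages: List[Stage],
    validate: Callable[[Any], bool],
) -> Tuple[Optional[Any], AuditTrail]:
    """Feed the failure through each stage in order, then validate the result."""
    trail = AuditTrail()
    artefact = failure
    for name, stage in stages:
        artefact = stage(artefact)
        trail.record(name, artefact)
    passed = validate(artefact)
    trail.record("validate", passed)
    # Return None instead of the skill when validation fails; the trail is kept either way.
    return (artefact if passed else None), trail


# Toy stages that only wrap their input, to show the data flow.
stages: List[Stage] = [
    ("proof_by_failure", lambda err: {"signature": err}),
    ("backward_goal_derivation", lambda sig: {"gap": sig["signature"]}),
    ("skill_gap_discovery", lambda gap: {"template": gap["gap"]}),
    ("skill_acquisition_and_generation", lambda tpl: {"skill": tpl["template"]}),
]
skill, trail = run_loop("504 from provider", stages, validate=lambda s: "skill" in s)
```

Keeping the stage order in plain data is what makes a run repeatable here: the same failure and the same stage list always produce the same trail.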

How we built it
- Infrastructure – The repository lives at /home/kairosia/repo/TriadApp and is initialized as a Git project with a single commit. All tooling (scripts, Makefile targets, GitHub Actions CI) is pure bash/Python and uses only open‑source CLI utilities (hermes_execute, lark, pytest).  
- Documentation – Markdown files under docs/ describe the architecture (truefoundry_resilience.md, lark_integration.md), the combined execution flow (integration_summary.md), and the step‑by‑step implementation plan (implementation_plan_resilience_and_lark.md).  
- Automation –  
  - Makefile defines check-resilience, run-lark-workflow, and integration-test targets that together verify both tracks in a single command.  
  - .github/workflows/ci.yml runs the same checks on every push/PR, ensuring continuous validation.  
  - scripts/ contain the health‑probe (check_resilience.sh) and workflow launcher (run_lark_workflow.sh).  
- Skill Encapsulation – All 16 local skills are stored as skills/*.md files; a concise skills/resilience-lark-summary.md maps each skill to the concrete contribution it made (e.g., writing_plans drafted the plan, systematic_debugging handled failure detection, skill_acquisition_and_generation generated this very summary).  

Challenges we ran into
1. Provider‑level unreliability – Some LLM endpoints would intermittently return 504s, causing the circuit‑breaker to fire too early and break the fallback chain. We solved it by adding a max_failures threshold and a reset_period that only clears the blacklist after a cool‑down window.  
2. MCP server lifecycle – The Lark MCP process sometimes exited with a “connection reset” error, causing the health‑check to report failures even when the CLI was functional. Wrapping the MCP in a supervisord‑style watchdog (process tool) and restarting it automatically resolved the flakiness.  
3. Skill‑template compatibility – Early attempts at generating a skill template produced code that lacked required test harnesses, causing validation to fail and triggering endless retry loops. By enforcing a strict template format (header, description, step list, test harness placeholder) and adding a validate_template.py helper, we guaranteed that every generated skill passes its own unit test on first try.  
4. CI race condition – The CI job would sometimes execute the Lark workflow before the resilience health‑check had completed, leading to false negatives. Introducing a dedicated integration-test target that sequentially runs the health check and the workflow eliminated the race.
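
The fix described in the first challenge (a max_failures threshold plus a reset_period cool-down) can be sketched as follows. This is an illustration under assumptions rather than the project's implementation: the `CircuitBreaker` class, `call_with_fallback`, the default values, and the injectable `clock` are all hypothetical; only the two parameter names come from the text above.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Blacklist a provider after max_failures errors; clear it after reset_period seconds."""

    def __init__(
        self,
        max_failures: int = 3,          # assumed default, not from the project
        reset_period: float = 60.0,     # assumed default, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_period = reset_period
        self.clock = clock
        self.failures: Dict[str, int] = {}
        self.blacklisted_at: Dict[str, float] = {}

    def is_available(self, provider: str) -> bool:
        opened = self.blacklisted_at.get(provider)
        if opened is None:
            return True
        if self.clock() - opened >= self.reset_period:
            # Cool-down elapsed: clear the blacklist entry and the failure count.
            del self.blacklisted_at[provider]
            self.failures[provider] = 0
            return True
        return False

    def record_failure(self, provider: str) -> None:
        self.failures[provider] = self.failures.get(provider, 0) + 1
        if self.failures[provider] >= self.max_failures:
            self.blacklisted_at[provider] = self.clock()

    def record_success(self, provider: str) -> None:
        self.failures[provider] = 0


def call_with_fallback(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    breaker: CircuitBreaker,
) -> str:
    """Try providers in insertion order, skipping any that are currently blacklisted."""
    for name, call in providers.items():
        if not breaker.is_available(name):
            continue
        try:
            result = call(prompt)
        except Exception:
            breaker.record_failure(name)
            continue
        breaker.record_success(name)
        return result
    raise RuntimeError("all providers are unavailable")
```

With a threshold above one, a single intermittent 504 no longer removes a provider from the chain, which is the behaviour the threshold was introduced to get.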

Accomplishments that we're proud of
- A fully deterministic error‑to‑skill pipeline that can be audited step‑by‑step; every generated skill is stored with a cryptographic hash and appears under 100_Skills_Genealogy/.  
- Multi‑provider resilience that degrades gracefully without interrupting the user; the health‑check script and circuit‑breaker are production‑ready and are exercised by the CI pipeline.  
- A complete suite of 16 autopoietic, mutually compounding meta-skills.
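
Storing a skill together with a cryptographic hash could look roughly like this. The sketch assumes SHA-256 and a sidecar `.sha256` file next to each skill; the write-up does not say which hash or file layout is used, so `store_skill` and `verify_skill` are hypothetical helpers, and only the 100_Skills_Genealogy/ directory name is taken from the text.

```python
import hashlib
from pathlib import Path

GENEALOGY_DIR = "100_Skills_Genealogy"  # directory name from the write-up


def store_skill(skill_markdown: str, name: str, root: Path) -> str:
    """Write the skill and its SHA-256 digest (assumed scheme) side by side."""
    digest = hashlib.sha256(skill_markdown.encode("utf-8")).hexdigest()
    target_dir = root / GENEALOGY_DIR
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / f"{name}.md").write_text(skill_markdown, encoding="utf-8")
    (target_dir / f"{name}.sha256").write_text(digest + "\n", encoding="utf-8")
    return digest


def verify_skill(name: str, root: Path) -> bool:
    """Recompute the digest of the stored skill and compare it with the recorded one."""
    target_dir = root / GENEALOGY_DIR
    content = (target_dir / f"{name}.md").read_bytes()
    recorded = (target_dir / f"{name}.sha256").read_text(encoding="utf-8").strip()
    return hashlib.sha256(content).hexdigest() == recorded
```

A check like `verify_skill` is what lets an auditor confirm that a stored skill has not changed since it was validated.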

What's next for TriadApp
- Extend multi‑modal failure handling – add support for runtime, security, and performance anomalies as first‑class failure signatures.
- Dynamic skill market – publish generated skills to a community marketplace where other teams can import and reuse them directly via a skill_fetch command.
- User‑facing dashboard – build a TUI that visualizes GFI trends, provider health, and pending skill generations in real time.
- Cross‑platform orchestration – integrate with additional CLI tools (e.g., Linear, GitHub, Jira) so that skill generation can be triggered from any project management system.
- Open‑source contribution pipeline – set up automated PRs that push newly validated skills to a public GitHub organization, enabling community‑driven growth of the skill graph.

Built With

  • bash
  • crusoe
  • hermes
  • lark
  • md
  • toml
  • truefoundry
  • yaml