POMDP formulation (state, observation, action, reward); belief updates from noisy behavioral signals; policy optimization under intervention costs; counterfactual evaluation using historical logs; and safety constraints (e.g., no repeated discounting). The agent must reason causally and act under uncertainty.
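The belief-update step above can be sketched as a discrete Bayes filter. Everything below is illustrative: the two latent states ("engaged" vs. "at_risk"), the actions ("wait" vs. "discount"), and all probabilities are assumed values, not taken from the project.

```python
# Hypothetical two-state retention POMDP; states, actions, and numbers
# are illustrative assumptions, not the project's actual model.
STATES = ["engaged", "at_risk"]

# T[a][s][s2] = P(s2 | s, a): "wait" lets users drift toward at-risk,
# while a costly "discount" intervention pulls them back (assumed values).
T = {
    "wait":     {"engaged": {"engaged": 0.90, "at_risk": 0.10},
                 "at_risk": {"engaged": 0.10, "at_risk": 0.90}},
    "discount": {"engaged": {"engaged": 0.95, "at_risk": 0.05},
                 "at_risk": {"engaged": 0.60, "at_risk": 0.40}},
}

# O[s2][o] = P(o | s2): observation model for noisy behavioral signals.
O = {
    "engaged": {"active": 0.8, "idle": 0.2},
    "at_risk": {"active": 0.3, "idle": 0.7},
}

def belief_update(belief, action, obs):
    """Bayes filter: b'(s2) proportional to P(o|s2) * sum_s P(s2|s,a) * b(s)."""
    unnorm = {
        s2: O[s2][obs] * sum(T[action][s][s2] * belief[s] for s in STATES)
        for s2 in STATES
    }
    z = sum(unnorm.values())  # normalizer P(o | b, a)
    return {s2: p / z for s2, p in unnorm.items()}

# Starting from a uniform belief, an "idle" signal after waiting
# shifts probability mass toward the at-risk state.
b = {"engaged": 0.5, "at_risk": 0.5}
b = belief_update(b, "wait", "idle")
print(round(b["at_risk"], 3))  # → 0.778
```

A policy would then map this posterior belief (plus intervention costs and the no-repeated-discounting constraint) to an action, rather than reacting to raw signals directly.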

Built With

  • amazon-web-services
  • cli
  • llm
  • python
  • react
  • vectordb