The problem is framed as a POMDP (state, observation, action, reward): beliefs are updated from noisy behavioral signals, the policy is optimized under intervention costs, counterfactual evaluation is performed against historical logs, and safety constraints are enforced (e.g. no repeated discounting). The agent must reason causally and act under uncertainty.
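A minimal sketch of the first two pieces, the POMDP belief update from a noisy signal and a cost-aware intervention rule. The two-state model, all probabilities, and the cost/benefit numbers below are illustrative assumptions, not values from the project:

```python
import numpy as np

# Hypothetical two-state model: the user is "engaged" (0) or "churning" (1).
# Actions: 0 = do nothing, 1 = intervene (e.g. offer a discount).
# All numbers are assumed for illustration.

# Transition model T[a, s, s'] = P(s' | s, a): intervening improves retention.
T = np.array([
    [[0.90, 0.10],    # engaged, no-op
     [0.30, 0.70]],   # churning, no-op
    [[0.95, 0.05],    # engaged, intervene
     [0.60, 0.40]],   # churning, intervene
])

# Observation model O[s', o] = P(o | s'): noisy behavioral signal,
# o = 0 ("active"), o = 1 ("inactive").
O = np.array([
    [0.80, 0.20],     # engaged mostly looks active
    [0.25, 0.75],     # churning mostly looks inactive
])

def belief_update(belief, action, obs):
    """Bayes filter: b'(s') ∝ P(o | s') * sum_s P(s' | s, a) * b(s)."""
    predicted = T[action].T @ belief    # predict next-state distribution
    updated = O[:, obs] * predicted     # weight by the noisy observation
    return updated / updated.sum()      # normalize back to a distribution

def should_intervene(belief, benefit=10.0, cost=3.0):
    """Intervene only when expected benefit of acting on the belief
    that the user is churning exceeds the intervention cost."""
    return belief[1] * benefit > cost

b = np.array([0.5, 0.5])                 # uninformative prior
b = belief_update(b, action=0, obs=1)    # observed "inactive" after a no-op
# Belief mass shifts toward "churning"; the rule then decides whether
# the expected gain justifies paying the intervention cost.
```

The safety constraint from the description (no repeated discounting) would sit on top of this rule as a hard filter on the action history, and counterfactual evaluation would replay logged trajectories under the candidate policy rather than the logged one.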
Built With
- amazon-web-services
- cli
- llm
- python
- react
- vectordb
