Partially Observable Monte-Carlo Graph Search

January 4, 2026 · View on GitHub

Partially Observable Monte-Carlo Graph Search

Overview

This repository contains the Julia implementation of the ICAPS 2025 paper:

Partially Observable Monte-Carlo Graph Search
Yang You, Vincent Thomas, Alex Schutz, Robert Skilton, Nick Hawes, Olivier Buffet

The repository implements POMCGS, an offline planning algorithm that performs Monte-Carlo Graph Search for POMDPs and stores policies as finite-state controllers (FSCs).

Code Structure

File	Description
`./POMCGraphSearch.jl`	Main POMCGS algorithm
`./ModelWrapper.jl`	Model wrapper for continuous domains
`./Planner.jl`	General POMCGS planner implementation
`./FSC.jl`	Defines the FSC data structure used during planning
`./Qlearning.jl`	Implements the MDP heuristic method
`./Utils.jl`	Utility functions

Getting Started

Recommended Setup

Important: Before using POMCGraphSearch, we recommend creating a new Julia environment to avoid dependency conflicts:

using Pkg
Pkg.activate("ExamplePOMCGSEnv")  # create a clean environment
Pkg.add("POMCGraphSearch")

Example — RockSample (with parameters)

using POMCGraphSearch
using POMDPs
using RockSample

pomdp = RockSamplePOMDP(7, 8)

pomcgs = SolverPOMCGS(pomdp;
    max_b_gap = 0.2,                # belief merging threshold
    max_search_depth = 30,          # maximum search depth
    num_sim_per_sa = 20 # simulations per action for a given state particle
)

fsc = solve(pomcgs, pomdp)          # Solve using POMDPs.jl API

run_standard_simulation(pomdp, fsc; verbose=true)  # Simulate the resulting FSC

Example — LightDark (Continuous POMDP)

using POMCGraphSearch 
using POMDPs
using POMDPModels

pomdp = LightDark1D()  # define the LightDark problem

pomcgs = SolverPOMCGS(pomdp;
    max_b_gap = 0.2, 
    state_grid = [1.0, 1.0], # the state grid for state discretization
    num_fixed_observations = 20, # the number of observation clusters
    max_search_depth = 30,
    num_sim_per_sa = 1000
)  # Initialize the POMCGS solver. It will automatically use the continuous planner for this problem.

fsc = solve(pomcgs, pomdp)          # Solve using POMDPs.jl API

run_standard_simulation(pomdp, fsc; verbose=true)  # Simulate the resulting FSC

Saving FSC Policies

SaveFSCPolicyJSON(pomcgs.fsc) # save the fsc policy to a JSON file

SaveFSCPolicyJLD2(pomcgs.fsc) # save the fsc policy to a JLD2 file

Please be aware that the saved FSC is not prunned, it contains edges derived from non-optimal actions.

Settings for Common Benchmarks

We provide detailed configuration examples for several common POMDP benchmarks including RockSample(7,8), RockSample(11,11), RockSample(15,15), LightDark, Bumper Roomba, and Lidar Roomba in the file /docs/example_common_benchmarks.md.

These examples are the benchmarks listed in the POMCGS paper and may serve as good starting points for tuning parameters in other domains.

Core POMCGS Planning Parameters

The POMCGS solver automatically detects the input POMDP type, i.e., whether the state, observation, or action spaces are continuous or discrete. Users can further tune the behavior of the solver through the following core parameters:

Parameter	Default	Description
`max_b_gap`	`0.1`	Belief merging threshold; controls granularity of the belief graph. (0.1 = tight, larger = faster planning but coarser graph)
`max_search_depth`	`50`	Maximum search depth.
`num_sim_per_sa`	`100`	Number of simulations per action.
`epsilon`	`0.1`	Convergence threshold: stop when upper–lower bound gap < `epsilon`.
`nb_particles`	`10000`	Number of particles sampled from the initial belief `b0`.
`C_star`	`100`	Node visit threshold controlling when a belief node is considered sufficiently explored. Larger values yield tighter bounds but slower convergence, while smaller values favor faster but more approximate updates.
`max_planning_secs`	`10000`	The planning time out (in seconds)
`state_grid`	`[]`	Grid for discretizing continuous states (used only for continuous-state POMDPs).
`num_fixed_observations`	`20`	Number of observation clusters (used only for continuous-observation POMDPs).
`k_a`	`2.0`	Action Progressive Widening constant (used only for POMDPs with large or continuous action spaces).
`alpha_a`	`0.2`	Action Progressive Widening exponent (used only for POMDPs with large or continuous action spaces).

Q-learning Initialization Parameters (for MDP upper bound)

During initialization, POMCGS uses Q-learning to compute an approximate MDP value for the upper bound.
The default values should work for most POMDPs tested in the paper.
If initialization is too slow, inaccurate, or gives unexpected results, you can adjust the following parameters:

Parameter	Default	Description
`nb_episode_size`	`30`	Number of steps per episode in the Q-learning initialization.
`VMDP_nb_max_episode`	`20`	Maximum number of episodes to run during initialization.
`nb_samples_VMDP`	`5000`	Number of samples used to estimate the MDP value function.
`nb_sim_VMDP`	`10`	Number of simulations per sample for value estimation.
`epsilon_VMDP`	`0.1`	Convergence threshold for Q-learning value computation.

Citation

If you found this work useful in your research, please cite:

@article{You_Thomas_Schutz_Skilton_Hawes_Buffet_2025,
  title={Partially Observable Monte-Carlo Graph Search},
  author={You, Yang and Thomas, Vincent and Schutz, Alex and Skilton, Robert and Hawes, Nick and Buffet, Olivier},
  journal={Proceedings of the International Conference on Automated Planning and Scheduling},
  volume={35},
  number={1},
  pages={279--287},
  year={2025},
  month={Sep.},
  url={https://ojs.aaai.org/index.php/ICAPS/article/view/36129},
  doi={10.1609/icaps.v35i1.36129}
}