Partially Observable Monte-Carlo Graph Search

January 4, 2026 · View on GitHub

Logo

Partially Observable Monte-Carlo Graph Search

Overview

This repository contains the Julia implementation of the ICAPS 2025 paper:

Partially Observable Monte-Carlo Graph Search
Yang You, Vincent Thomas, Alex Schutz, Robert Skilton, Nick Hawes, Olivier Buffet

The repository implements POMCGS, an offline planning algorithm that performs Monte-Carlo Graph Search for POMDPs and stores policies as finite-state controllers (FSCs).


Code Structure

FileDescription
./POMCGraphSearch.jlMain POMCGS algorithm
./ModelWrapper.jlModel wrapper for continuous domains
./Planner.jlGeneral POMCGS planner implementation
./FSC.jlDefines the FSC data structure used during planning
./Qlearning.jlImplements the MDP heuristic method
./Utils.jlUtility functions

Getting Started

Important: Before using POMCGraphSearch, we recommend creating a new Julia environment to avoid dependency conflicts:

using Pkg
Pkg.activate("ExamplePOMCGSEnv")  # create a clean environment
Pkg.add("POMCGraphSearch")

Example — RockSample (with parameters)

using POMCGraphSearch
using POMDPs
using RockSample

pomdp = RockSamplePOMDP(7, 8)

pomcgs = SolverPOMCGS(pomdp;
    max_b_gap = 0.2,                # belief merging threshold
    max_search_depth = 30,          # maximum search depth
    num_sim_per_sa = 20 # simulations per action for a given state particle
)

fsc = solve(pomcgs, pomdp)          # Solve using POMDPs.jl API

run_standard_simulation(pomdp, fsc; verbose=true)  # Simulate the resulting FSC

Example — LightDark (Continuous POMDP)

using POMCGraphSearch 
using POMDPs
using POMDPModels

pomdp = LightDark1D()  # define the LightDark problem

pomcgs = SolverPOMCGS(pomdp;
    max_b_gap = 0.2, 
    state_grid = [1.0, 1.0], # the state grid for state discretization
    num_fixed_observations = 20, # the number of observation clusters
    max_search_depth = 30,
    num_sim_per_sa = 1000
)  # Initialize the POMCGS solver. It will automatically use the continuous planner for this problem.

fsc = solve(pomcgs, pomdp)          # Solve using POMDPs.jl API

run_standard_simulation(pomdp, fsc; verbose=true)  # Simulate the resulting FSC

Saving FSC Policies

SaveFSCPolicyJSON(pomcgs.fsc) # save the fsc policy to a JSON file

or

SaveFSCPolicyJLD2(pomcgs.fsc) # save the fsc policy to a JLD2 file

Please be aware that the saved FSC is not prunned, it contains edges derived from non-optimal actions.


Settings for Common Benchmarks

We provide detailed configuration examples for several common POMDP benchmarks including RockSample(7,8), RockSample(11,11), RockSample(15,15), LightDark, Bumper Roomba, and Lidar Roomba in the file /docs/example_common_benchmarks.md.

These examples are the benchmarks listed in the POMCGS paper and may serve as good starting points for tuning parameters in other domains.


Core POMCGS Planning Parameters

The POMCGS solver automatically detects the input POMDP type, i.e., whether the state, observation, or action spaces are continuous or discrete. Users can further tune the behavior of the solver through the following core parameters:

ParameterDefaultDescription
max_b_gap0.1Belief merging threshold; controls granularity of the belief graph. (0.1 = tight, larger = faster planning but coarser graph)
max_search_depth50Maximum search depth.
num_sim_per_sa100Number of simulations per action.
epsilon0.1Convergence threshold: stop when upper–lower bound gap < epsilon.
nb_particles10000Number of particles sampled from the initial belief b0.
C_star100Node visit threshold controlling when a belief node is considered sufficiently explored. Larger values yield tighter bounds but slower convergence, while smaller values favor faster but more approximate updates.
max_planning_secs10000The planning time out (in seconds)
state_grid[]Grid for discretizing continuous states (used only for continuous-state POMDPs).
num_fixed_observations20Number of observation clusters (used only for continuous-observation POMDPs).
k_a2.0Action Progressive Widening constant (used only for POMDPs with large or continuous action spaces).
alpha_a0.2Action Progressive Widening exponent (used only for POMDPs with large or continuous action spaces).

Q-learning Initialization Parameters (for MDP upper bound)

During initialization, POMCGS uses Q-learning to compute an approximate MDP value for the upper bound.
The default values should work for most POMDPs tested in the paper.
If initialization is too slow, inaccurate, or gives unexpected results, you can adjust the following parameters:

ParameterDefaultDescription
nb_episode_size30Number of steps per episode in the Q-learning initialization.
VMDP_nb_max_episode20Maximum number of episodes to run during initialization.
nb_samples_VMDP5000Number of samples used to estimate the MDP value function.
nb_sim_VMDP10Number of simulations per sample for value estimation.
epsilon_VMDP0.1Convergence threshold for Q-learning value computation.

Citation

If you found this work useful in your research, please cite:

@article{You_Thomas_Schutz_Skilton_Hawes_Buffet_2025,
  title={Partially Observable Monte-Carlo Graph Search},
  author={You, Yang and Thomas, Vincent and Schutz, Alex and Skilton, Robert and Hawes, Nick and Buffet, Olivier},
  journal={Proceedings of the International Conference on Automated Planning and Scheduling},
  volume={35},
  number={1},
  pages={279--287},
  year={2025},
  month={Sep.},
  url={https://ojs.aaai.org/index.php/ICAPS/article/view/36129},
  doi={10.1609/icaps.v35i1.36129}
}