# PoseEstimator

June 28, 2026

Auto-generated documentation for musicalgestures._pose_estimator module.

Pose estimator interface and backends for MGT-python.

This module provides:

- `PoseEstimator`: an abstract base class (ABC) defining the common interface that all pose backends must implement.
- `MediaPipePoseEstimator`: a concrete backend powered by Google MediaPipe Pose (33 landmarks, CPU-friendly, no manual model download; the model file is fetched automatically on first use).
- `OpenPosePoseEstimator`: a thin wrapper around the legacy OpenPose / Caffe-model implementation already present in the `Pose` module.

The shared interface means that backends are interchangeable:

```python
from musicalgestures._pose_estimator import MediaPipePoseEstimator

est = MediaPipePoseEstimator()
result = est.predict_frame(frame)  # PoseEstimatorResult; result.keypoints has shape (33, 3)
```

#### Examples

```python
>>> import numpy as np
>>> frame = np.zeros((480, 640, 3), dtype=np.uint8)
>>> # Without mediapipe installed this raises MgDependencyError gracefully.
```

## MediaPipePoseEstimator

[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose_estimator.py#L195)

```python
class MediaPipePoseEstimator(PoseEstimator):
    def __init__(
        model_complexity: int = 1,
        min_detection_confidence: float = 0.5,
        min_tracking_confidence: float = 0.5,
        device: PoseDevice | str = PoseDevice.CPU,
    ) -> None:
```

Pose estimator backed by Google MediaPipe Pose (Tasks API).

Requires the optional `mediapipe>=0.10` package:

```bash
pip install musicalgestures[pose]
```

The first time you use a given complexity level the corresponding .task model file (~8–28 MB) is downloaded from Google's model storage and cached in musicalgestures/models/.
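Since `mediapipe` is an optional dependency, it can be useful to check for it before constructing the MediaPipe backend. The following is a minimal sketch using only the standard library; `mediapipe_available` is a hypothetical helper name and is not part of `musicalgestures`.

```python
import importlib.util


def mediapipe_available() -> bool:
    """Return True if the optional 'mediapipe' package can be located."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec("mediapipe") is not None


if mediapipe_available():
    print("mediapipe found: the MediaPipe backend should be usable")
else:
    print("mediapipe missing: install it with `pip install musicalgestures[pose]`")
```

This only checks that the package can be found; it does not verify the installed version or trigger the model download.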

#### Parameters

- `model_complexity`: MediaPipe model complexity (0 = lite, 1 = full, 2 = heavy). Higher values are more accurate but slower. Default: 1.
- `min_detection_confidence`: Minimum confidence for initial body detection. Default: 0.5.
- `min_tracking_confidence`: Minimum confidence for landmark tracking. Default: 0.5.

#### Examples

```python
>>> import numpy as np
>>> est = MediaPipePoseEstimator()  # doctest: +SKIP
>>> frame = np.zeros((480, 640, 3), dtype=np.uint8)
>>> result = est.predict_frame(frame)  # doctest: +SKIP
>>> result.keypoints.shape  # (33, 3)  # doctest: +SKIP
```

#### See also

- [PoseEstimator](#poseestimator)

### MediaPipePoseEstimator().close

[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose_estimator.py#L388)

```python
def close() -> None:
```

Release MediaPipe resources.
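One way to make sure `close()` is always called is `contextlib.closing` from the standard library, which works with any object that has a `close()` method. The sketch below uses a hypothetical stand-in class (`DummyEstimator`) rather than the real backend, so it runs without `mediapipe`; whether the real class also supports the `with` statement directly is not stated here.

```python
from contextlib import closing


class DummyEstimator:
    """Hypothetical stand-in exposing only the documented close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # The real backend would release MediaPipe resources here.
        self.closed = True


est = DummyEstimator()
with closing(est):
    # Real code would call est.predict_frame(...) inside this block.
    pass

print(est.closed)  # True
```

`closing` calls `close()` even if the body of the `with` block raises, which is the main reason to prefer it over a manual call.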

### MediaPipePoseEstimator().landmark_names

[find in source code]

```python
@property
def landmark_names() -> list[str]:
```

### MediaPipePoseEstimator().predict_frame

[find in source code]

```python
def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:
```

Run MediaPipe Pose on a single BGR frame.

#### Parameters

- `frame`: BGR frame, shape (H, W, 3).

#### Returns

- `PoseEstimatorResult`: 33 landmarks; confidence is the visibility score.
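Because the third column holds the visibility score, a common follow-up step is to keep only the landmarks that are likely visible. The sketch below works on a made-up array in the documented `(x, y, confidence)` layout; the landmark names and the `min_visibility` cut-off are illustrative assumptions, not values taken from the library.

```python
import numpy as np

# Made-up keypoints in the documented (x, y, confidence) layout.
keypoints = np.array(
    [
        [0.50, 0.20, 0.95],
        [0.48, 0.35, 0.40],
        [0.52, 0.60, 0.80],
    ]
)
names = ["nose", "left_shoulder", "right_shoulder"]  # illustrative names only

min_visibility = 0.5  # hypothetical cut-off
visible = keypoints[:, 2] >= min_visibility  # boolean mask, one entry per row

visible_names = [name for name, keep in zip(names, visible) if keep]
visible_xy = keypoints[visible, :2]  # drop the confidence column

print(visible_names)
print(visible_xy.shape)
```

With a real result, `keypoints` and `names` would come from `result.keypoints` and `result.landmark_names`.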

## OpenPosePoseEstimator

[find in source code]

```python
class OpenPosePoseEstimator(PoseEstimator):
    def __init__(
        model: PoseModel | str = PoseModel.BODY_25,
        device: PoseDevice | str = PoseDevice.GPU,
        threshold: float = 0.1,
    ) -> None:
```

Thin wrapper around the legacy OpenPose / Caffe-model backend.

This class delegates to `pose` and is provided so that the old OpenPose workflow can be used through the same `PoseEstimator` interface.

#### Parameters

- `model`: One of 'body_25', 'coco', or 'mpi'.
- `device`: 'cpu' or 'gpu'.
- `threshold`: Minimum confidence threshold. Default: 0.1.

### OpenPosePoseEstimator().landmark_names

[find in source code]

```python
@property
def landmark_names() -> list[str]:
```

### OpenPosePoseEstimator().predict_frame

[find in source code]

```python
def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:
```

Run OpenPose inference on a single BGR frame.

#### Notes

Full video-level processing is better handled by calling `MgVideo.pose` directly.

## PoseEstimator

[find in source code]

```python
class PoseEstimator(abc.ABC):
    def __init__(
        model: PoseModel | str = PoseModel.MEDIAPIPE,
        device: PoseDevice | str = PoseDevice.CPU,
    ) -> None:
```

Abstract base class for pose estimation backends.

All concrete subclasses must implement `predict_frame` and `landmark_names`.

#### Parameters

- `model`: Skeleton model variant.
- `device`: Compute backend ('cpu' or 'gpu').
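To illustrate what implementing the two abstract members looks like, here is a minimal, self-contained sketch. `SketchEstimator` is a stand-in that mirrors the documented interface and is not the library class; as a simplification, its `predict_frame` returns a raw `(n_keypoints, 3)` array instead of a `PoseEstimatorResult`, and `CentreEstimator` is a toy backend invented for this example.

```python
import abc

import numpy as np


class SketchEstimator(abc.ABC):
    """Stand-in for the documented interface; not the library class."""

    @property
    @abc.abstractmethod
    def landmark_names(self) -> list[str]:
        """Ordered list of keypoint names."""

    @abc.abstractmethod
    def predict_frame(self, frame: np.ndarray) -> np.ndarray:
        """Return an (n_keypoints, 3) array of (x, y, confidence)."""


class CentreEstimator(SketchEstimator):
    """Toy backend that places every keypoint at the frame centre."""

    @property
    def landmark_names(self) -> list[str]:
        return ["a", "b"]

    def predict_frame(self, frame: np.ndarray) -> np.ndarray:
        n = len(self.landmark_names)
        # Normalised centre (0.5, 0.5) with zero confidence for every keypoint.
        return np.tile([0.5, 0.5, 0.0], (n, 1))


frame = np.zeros((480, 640, 3), dtype=np.uint8)
est = CentreEstimator()
print(est.predict_frame(frame).shape)  # (2, 3)
```

Trying to instantiate `SketchEstimator` directly raises `TypeError`, which is how the ABC enforces that both members are implemented.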

### PoseEstimator().landmark_names

[find in source code]

```python
@property
@abc.abstractmethod
def landmark_names() -> list[str]:
```

Ordered list of keypoint names.

### PoseEstimator().predict_frame

[find in source code]

```python
@abc.abstractmethod
def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:
```

Run pose estimation on a single BGR frame.

#### Parameters

- `frame`: Input frame as a NumPy array of shape (H, W, 3) in BGR order.

#### Returns

- `PoseEstimatorResult`

### PoseEstimator().predict_video

[find in source code]

```python
def predict_video(
    filename: str | Path,
    start: float = 0.0,
    end: float | None = None,
    skip: int = 0,
) -> list[PoseEstimatorResult]:
```

Run pose estimation on every frame of a video file.

#### Parameters

- `filename`: Path to the video file.
- `start`: Start time in seconds.
- `end`: End time in seconds (None = full video).
- `skip`: Process every (1 + skip)-th frame.

#### Returns

- `list[PoseEstimatorResult]`
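The interaction of `start`, `end`, and `skip` can be sketched as a frame-index selection. The helper below (`selected_frame_indices`, a hypothetical name) is an illustration only: it assumes times are converted to frame indices by rounding `time * fps` and that `end` is exclusive, neither of which is confirmed for the library's implementation.

```python
from typing import Optional


def selected_frame_indices(
    n_frames: int,
    fps: float,
    start: float = 0.0,
    end: Optional[float] = None,
    skip: int = 0,
) -> list[int]:
    """Frame indices visited for a start/end window and a skip value."""
    first = int(round(start * fps))
    last = n_frames if end is None else min(n_frames, int(round(end * fps)))
    # A step of (1 + skip) keeps every (1 + skip)-th frame.
    return list(range(first, last, 1 + skip))


print(selected_frame_indices(n_frames=10, fps=5.0, skip=2))
# [0, 3, 6, 9]
print(selected_frame_indices(n_frames=10, fps=5.0, start=0.4, end=1.6, skip=1))
# [2, 4, 6]
```

With `skip=0` every frame in the window is processed, so the result list has one entry per frame.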

## PoseEstimatorResult

[find in source code]

```python
class PoseEstimatorResult():
    def __init__(
        keypoints: np.ndarray,
        landmark_names: list[str],
        frame_index: int = 0,
        timestamp: float = 0.0,
    ) -> None:
```

Container for the output of a single-frame pose estimation.

#### Parameters

- `keypoints`: 2-D array of shape (n_keypoints, 3) where columns are (x, y, confidence). Coordinates are normalised to [0, 1].
- `landmark_names`: List of keypoint names corresponding to each row.
- `frame_index`: Frame index this result belongs to.
- `timestamp`: Timestamp in seconds.
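Since coordinates are normalised to [0, 1], drawing or measuring in image space requires scaling by the frame size. The following is a minimal sketch with made-up values; `to_pixels` is a hypothetical helper, and scaling by the full width and height (rather than width - 1 and height - 1) is an assumed convention.

```python
import numpy as np


def to_pixels(keypoints: np.ndarray, width: int, height: int) -> np.ndarray:
    """Scale normalised (x, y) columns to pixel units; confidence is kept."""
    scaled = keypoints.astype(float)  # astype returns a copy by default
    scaled[:, 0] *= width
    scaled[:, 1] *= height
    return scaled


keypoints = np.array([[0.5, 0.25, 0.9], [1.0, 1.0, 0.1]])
print(to_pixels(keypoints, width=640, height=480))
```

The input array is left unchanged because the scaling is applied to a copy.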

### PoseEstimatorResult().n_keypoints

[find in source code]

```python
@property
def n_keypoints() -> int:
```

### PoseEstimatorResult().to_dict

[find in source code]

```python
def to_dict() -> dict[str, Any]:
```

Return a plain dict representation.

## get_pose_estimator

[find in source code]

```python
def get_pose_estimator(
    backend: str = 'mediapipe',
    **kwargs: Any,
) -> PoseEstimator:
```

Factory function: return a `PoseEstimator` for the requested backend.

```python
>>> est = get_pose_estimator("mediapipe", model_complexity=0)  # doctest: +SKIP
```

#### See also

- [PoseEstimator](#poseestimator)
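For readers unfamiliar with the factory pattern behind `get_pose_estimator`, one common way to build such a function is a name-to-constructor registry. The sketch below is not the library's implementation: the stub classes, the `_BACKENDS` registry, the `"openpose"` key, and `make_estimator` are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable


class MediaPipeStub:
    """Stand-in for a MediaPipe-style backend."""

    def __init__(self, model_complexity: int = 1) -> None:
        self.model_complexity = model_complexity


class OpenPoseStub:
    """Stand-in for an OpenPose-style backend."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold


# Hypothetical registry mapping backend names to constructors.
_BACKENDS: dict[str, Callable[..., Any]] = {
    "mediapipe": MediaPipeStub,
    "openpose": OpenPoseStub,
}


def make_estimator(backend: str = "mediapipe", **kwargs: Any) -> Any:
    """Look up the backend by name and forward keyword arguments to it."""
    try:
        factory = _BACKENDS[backend.lower()]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None
    return factory(**kwargs)


est = make_estimator("mediapipe", model_complexity=0)
print(type(est).__name__, est.model_complexity)  # MediaPipeStub 0
```

Forwarding `**kwargs` unchanged keeps the factory independent of each backend's constructor arguments, at the cost of surfacing a `TypeError` when an argument does not match the chosen backend.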