# PoseEstimator
June 28, 2026
Auto-generated documentation for musicalgestures._pose_estimator module.
Pose estimator interface and backends for MGT-python.
This module provides:

- class `PoseEstimator`: an abstract base class (ABC) defining the common interface that all pose backends must implement.
- class `MediaPipePoseEstimator`: a concrete backend powered by Google MediaPipe Pose (33 landmarks, CPU-friendly, zero model download).
- class `OpenPosePoseEstimator`: a thin wrapper around the legacy OpenPose / Caffe-model implementation already present in the `Pose` module.

The shared interface means that backends are interchangeable:

    from musicalgestures._pose_estimator import MediaPipePoseEstimator
    est = MediaPipePoseEstimator()
    result = est.predict_frame(frame)  # → PoseEstimatorResult; result.keypoints has shape (33, 3)
Examples
>>> import numpy as np
>>> frame = np.zeros((480, 640, 3), dtype=np.uint8)
>>> # Without mediapipe installed this raises MgDependencyError gracefully.
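The comment above says that a missing `mediapipe` install surfaces as `MgDependencyError`. How the library performs that check is not shown on this page; the following is a minimal sketch of one common optional-dependency guard. `DependencyError` and `require` are hypothetical stand-ins, not MGT-python names.

```python
import importlib


class DependencyError(ImportError):
    """Hypothetical stand-in for an error such as MgDependencyError."""


def require(module_name: str, extra: str):
    """Import an optional module, or raise an error naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise DependencyError(
            f"'{module_name}' is required for this feature; "
            f"try: pip install musicalgestures[{extra}]"
        ) from exc


# A module that exists is returned as-is.
json_module = require("json", extra="pose")

# A missing module produces the descriptive error instead of a bare ImportError.
try:
    require("surely_not_an_installed_module_xyz", extra="pose")
    missing_message = None
except DependencyError as err:
    missing_message = str(err)
```

Deriving the stand-in error from `ImportError` is a design choice made here so that callers who already catch `ImportError` keep working.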
## MediaPipePoseEstimator

[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose_estimator.py#L195)

```python
class MediaPipePoseEstimator(PoseEstimator):
    def __init__(
        model_complexity: int = 1,
        min_detection_confidence: float = 0.5,
        min_tracking_confidence: float = 0.5,
        device: PoseDevice | str = PoseDevice.CPU,
    ) -> None:
```

Pose estimator backed by Google MediaPipe Pose (Tasks API).

Requires the optional `mediapipe>=0.10` package:

    pip install musicalgestures[pose]

The first time you use a given complexity level, the corresponding `.task` model file (~8–28 MB) is downloaded from Google's model storage and cached in `musicalgestures/models/`.

Parameters

- `model_complexity`: MediaPipe model complexity (0 = lite, 1 = full, 2 = heavy). Higher values are more accurate but slower. Default: 1.
- `min_detection_confidence`: Minimum confidence for initial body detection. Default: 0.5.
- `min_tracking_confidence`: Minimum confidence for landmark tracking. Default: 0.5.

Examples

>>> import numpy as np
>>> est = MediaPipePoseEstimator() # doctest: +SKIP
>>> frame = np.zeros((480, 640, 3), dtype=np.uint8)
>>> result = est.predict_frame(frame) # doctest: +SKIP
>>> result.keypoints.shape # (33, 3) # doctest: +SKIP

#### See also

- [PoseEstimator](#poseestimator)

### MediaPipePoseEstimator().close

[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose_estimator.py#L388)

```python
def close() -> None:
```

Release MediaPipe resources.
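Because `close()` releases resources, it helps to guarantee that it runs even when processing fails. Below is a minimal sketch using the standard library's `contextlib.closing`. `DummyEstimator` is a hypothetical stand-in so the snippet runs without MediaPipe; whether `MediaPipePoseEstimator` itself supports the `with` statement directly is not stated on this page.

```python
from contextlib import closing


class DummyEstimator:
    """Hypothetical stand-in exposing the same close() method."""

    def __init__(self) -> None:
        self.closed = False

    def predict_frame(self, frame):
        if self.closed:
            raise RuntimeError("estimator already closed")
        return "result"

    def close(self) -> None:
        self.closed = True


est = DummyEstimator()
# closing() calls est.close() on exit, even if the body raises.
with closing(est) as active:
    output = active.predict_frame(frame=None)
```

`closing` only requires that the wrapped object has a `close()` method, so the same pattern should apply to any estimator that documents one.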
### MediaPipePoseEstimator().landmark_names

    @property
    def landmark_names() -> list[str]:

### MediaPipePoseEstimator().predict_frame

    def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:

Run MediaPipe Pose on a single BGR frame.

Parameters

- `frame`: BGR frame, shape (H, W, 3).

Returns

- `PoseEstimatorResult`: 33 landmarks; confidence is the visibility score.
## OpenPosePoseEstimator

    class OpenPosePoseEstimator(PoseEstimator):
        def __init__(
            model: PoseModel | str = PoseModel.BODY_25,
            device: PoseDevice | str = PoseDevice.GPU,
            threshold: float = 0.1,
        ) -> None:

Thin wrapper around the legacy OpenPose / Caffe-model backend.

This class delegates to the `pose` function and is provided so that the old OpenPose workflow can be used through the same `PoseEstimator` interface.

Parameters

- `model`: One of `'body_25'`, `'coco'`, or `'mpi'`.
- `device`: `'cpu'` or `'gpu'`.
- `threshold`: Minimum confidence threshold. Default: 0.1.
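How the threshold is applied inside the wrapper is not described here. To illustrate what a minimum-confidence cut typically means for an `(n_keypoints, 3)` array, the sketch below masks low-confidence rows with NumPy. `apply_threshold` is a hypothetical helper, and replacing rejected coordinates with NaN is an assumption rather than documented behaviour.

```python
import numpy as np


def apply_threshold(keypoints: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Return a copy in which x, y of rows with confidence below threshold are NaN."""
    out = keypoints.astype(float)  # astype returns a copy by default
    low = out[:, 2] < threshold    # boolean mask over rows
    out[low, :2] = np.nan          # keep the confidence column for inspection
    return out


kp = np.array([
    [0.50, 0.40, 0.90],  # confident detection, kept
    [0.10, 0.20, 0.05],  # below 0.1, coordinates masked
])
filtered = apply_threshold(kp, threshold=0.1)
```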
### OpenPosePoseEstimator().landmark_names

    @property
    def landmark_names() -> list[str]:

### OpenPosePoseEstimator().predict_frame

    def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:

Run OpenPose inference on a single BGR frame.

Notes

Full video-level processing is better handled by calling `MgVideo.pose` directly.
## PoseEstimator

    class PoseEstimator(abc.ABC):
        def __init__(
            model: PoseModel | str = PoseModel.MEDIAPIPE,
            device: PoseDevice | str = PoseDevice.CPU,
        ) -> None:

Abstract base class for pose estimation backends.

All concrete subclasses must implement `predict_frame` and `landmark_names`.

Parameters

- `model`: Skeleton model variant.
- `device`: Compute backend (`'cpu'` or `'gpu'`).
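The requirement that subclasses implement both members is enforced by Python's `abc` machinery at instantiation time. The sketch below shows this with a hypothetical stand-in base class (`BaseEstimator`) that mirrors the two abstract members named above; it does not import or reproduce the real `PoseEstimator`.

```python
import abc


class BaseEstimator(abc.ABC):
    """Hypothetical stand-in mirroring the two abstract members."""

    @property
    @abc.abstractmethod
    def landmark_names(self) -> list[str]:
        """Ordered list of keypoint names."""

    @abc.abstractmethod
    def predict_frame(self, frame):
        """Run pose estimation on a single frame."""


class IncompleteEstimator(BaseEstimator):
    # Implements only one of the two abstract members.
    def predict_frame(self, frame):
        return []


class TwoPointEstimator(BaseEstimator):
    @property
    def landmark_names(self) -> list[str]:
        return ["nose", "neck"]

    def predict_frame(self, frame):
        # One (x, y, confidence) triple per landmark.
        return [(0.5, 0.2, 1.0), (0.5, 0.4, 1.0)]


# Instantiating a subclass that misses an abstract member raises TypeError.
try:
    IncompleteEstimator()
    instantiation_error = None
except TypeError as err:
    instantiation_error = err

complete = TwoPointEstimator()
```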
### PoseEstimator().landmark_names

    @property
    @abc.abstractmethod
    def landmark_names() -> list[str]:

Ordered list of keypoint names.

### PoseEstimator().predict_frame

    @abc.abstractmethod
    def predict_frame(frame: np.ndarray) -> PoseEstimatorResult:

Run pose estimation on a single BGR frame.

Parameters

- `frame`: Input frame as a NumPy array of shape (H, W, 3) in BGR order.

Returns

- `PoseEstimatorResult`
### PoseEstimator().predict_video

    def predict_video(
        filename: str | Path,
        start: float = 0.0,
        end: float | None = None,
        skip: int = 0,
    ) -> list[PoseEstimatorResult]:

Run pose estimation on every frame of a video file.

Parameters

- `filename`: Path to the video file.
- `start`: Start time in seconds.
- `end`: End time in seconds (`None` = full video).
- `skip`: Process every (1 + skip)-th frame.

Returns

- `list[PoseEstimatorResult]`
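To make the `start`, `end`, and `skip` semantics concrete, here is a small sketch that computes which frame indices such a call would visit. `selected_frame_indices` is a hypothetical helper; rounding seconds to the nearest frame and treating `end` as exclusive are assumptions, since the page does not specify either.

```python
from typing import Optional


def selected_frame_indices(
    n_frames: int,
    fps: float,
    start: float = 0.0,
    end: Optional[float] = None,
    skip: int = 0,
) -> list[int]:
    """Indices of the frames visited when every (1 + skip)-th frame is processed."""
    first = int(round(start * fps))
    # end=None means "until the last frame"; otherwise clamp to the clip length.
    last = n_frames if end is None else min(n_frames, int(round(end * fps)))
    return list(range(first, last, 1 + skip))


# 10-frame clip at 5 fps, from 0.4 s to the end, keeping every 3rd frame.
indices = selected_frame_indices(n_frames=10, fps=5.0, start=0.4, skip=2)
# indices == [2, 5, 8]
```

With `skip=0` the step is 1, so every frame in the time window is visited, which matches the default described above.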
## PoseEstimatorResult

    class PoseEstimatorResult():
        def __init__(
            keypoints: np.ndarray,
            landmark_names: list[str],
            frame_index: int = 0,
            timestamp: float = 0.0,
        ) -> None:

Container for the output of a single-frame pose estimation.

Parameters

- `keypoints`: 2-D array of shape (n_keypoints, 3) where columns are (x, y, confidence). Coordinates are normalised to [0, 1].
- `landmark_names`: List of keypoint names corresponding to each row.
- `frame_index`: Frame index this result belongs to.
- `timestamp`: Timestamp in seconds.
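Since the coordinates are normalised to [0, 1], drawing them on a frame requires scaling by the frame size. A minimal NumPy sketch follows; `to_pixels` is a hypothetical helper and not part of the module.

```python
import numpy as np


def to_pixels(keypoints: np.ndarray, width: int, height: int) -> np.ndarray:
    """Scale normalised (x, y) columns to pixel units; confidence is left as-is."""
    scaled = keypoints.astype(float)  # copy, so the input array is not modified
    scaled[:, 0] *= width
    scaled[:, 1] *= height
    return scaled


kp = np.array([
    [0.5, 0.25, 0.9],
    [1.0, 1.0, 0.3],
])
pixels = to_pixels(kp, width=640, height=480)
```

A normalised value of exactly 1.0 maps to `width` or `height`, which is one past the last valid pixel index, so clip or round the result before indexing into an image.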
### PoseEstimatorResult().n_keypoints

    @property
    def n_keypoints() -> int:

### PoseEstimatorResult().to_dict

    def to_dict() -> dict[str, Any]:

Return a plain dict representation.
## get_pose_estimator

    def get_pose_estimator(
        backend: str = 'mediapipe',
        **kwargs: Any,
    ) -> PoseEstimator:

Factory function: return a `PoseEstimator` for the requested backend.

Parameters

- `backend`: `'mediapipe'` (default) or `'openpose'`.
- `**kwargs`: Additional keyword arguments forwarded to the estimator constructor.

Returns

- `PoseEstimator`
Examples
>>> est = get_pose_estimator("mediapipe", model_complexity=0) # doctest: +SKIP
#### See also
- [PoseEstimator](#poseestimator)
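The factory's lookup-and-forward behaviour can be sketched with a simple name-to-class registry. Everything in this snippet (`FakeMediaPipe`, `FakeOpenPose`, `make_estimator`, and the `ValueError` for unknown names) is a hypothetical stand-in; how `get_pose_estimator` handles an unrecognised backend is not documented on this page.

```python
from typing import Any


class FakeMediaPipe:
    """Hypothetical stand-in for a MediaPipe-style backend."""

    def __init__(self, model_complexity: int = 1) -> None:
        self.model_complexity = model_complexity


class FakeOpenPose:
    """Hypothetical stand-in for an OpenPose-style backend."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold


_BACKENDS = {"mediapipe": FakeMediaPipe, "openpose": FakeOpenPose}


def make_estimator(backend: str = "mediapipe", **kwargs: Any):
    """Look up the backend class by name and forward kwargs to its constructor."""
    try:
        cls = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
    return cls(**kwargs)


est = make_estimator("mediapipe", model_complexity=0)
```

Forwarding `**kwargs` unchanged means backend-specific options stay out of the factory signature, at the cost of a `TypeError` from the constructor if an option is passed to the wrong backend.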