Médéric Fourmy
I am a robotics research engineer at Mistral AI. Before that, I was a robotics researcher at CIIRC in the team of Josef Sivic, where my research revolved around perception and control for robotics, including object pose estimation, model predictive control, and robot state estimation. I am part of the BOP challenge team, where I manage the evaluation framework and actively contribute to the open-source toolkit. I obtained a PhD at LAAS-CNRS under the supervision of Nicolas Mansard and Joan Solà on factor graph state estimation for legged robots. I hold a Master's degree in Aerospace Engineering and Data Science from Supaero, completed by a Master's thesis at Heriot-Watt University on underwater robotics.
publications
-
AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
Anna Šárová Mikeštíková, Médéric Fourmy, Martin Cífka, and 2 more authors
CVPR, 2026
Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation methods have the potential to solve these issues, but existing works rely on precise single-view pose estimates or lack generalization to unseen objects. We address these challenges via the following three contributions. First, we introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple extrinsically calibrated RGB views and does not require any object-specific training or symmetry annotation. Second, the key component of this approach is a new multi-view feature-metric refinement specifically designed for object pose. It optimizes a single, consistent world-frame object pose minimizing the feature discrepancy between on-the-fly rendered object features and observed image features across all views simultaneously. Third, we report extensive experiments on four datasets (YCB-V, T-LESS, ITODD-MV, HouseCat6D) using the BOP benchmark evaluation and show that AlignPose outperforms other published methods, especially on challenging industrial datasets where multiple views are readily available in practice.
-
REALM: A real-to-sim validated benchmark for generalization in robotic manipulation
Martin Sedlacek, Pavlo Yefanov, Georgy Ponimatkin, and 7 more authors
IEEE Robotics and Automation Letters, 2026
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π0, π0-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs.
-
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
Stephen Tyree, Andrew Guo, Médéric Fourmy, and 8 more authors
CoRR, 2025
We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods need to onboard objects just from provided reference videos. Second, we defined a new, more practical 6D object detection task where identities of objects visible in a test image are not provided as input. Third, we introduced new BOP-H3 datasets recorded with high-resolution sensors and AR/VR headsets, closely resembling real-world scenarios. The BOP-H3 datasets include 3D models and onboarding videos to support both model-based and model-free tasks. Participants competed on seven challenge tracks. Notably, the best 2024 method for model-based 6D localization of unseen objects (FreeZeV2.1) achieves 22% higher accuracy on BOP-Classic-Core than the best 2023 method (GenFlow), and is only 4% behind the best 2023 method for seen objects (GPose2023), although it is significantly slower (24.9 s vs. 2.7 s per image). A more practical 2024 method for this task is Co-op, which takes only 0.8 s per image and is 13% more accurate than GenFlow. Methods have similar rankings on 6D detection as on 6D localization, but higher run times. On model-based 2D detection of unseen objects, the best 2024 method (MUSE) achieves a 21–29% relative improvement compared to the best 2023 method (CNOS). However, the 2D detection accuracy for unseen objects is still 35% behind the accuracy for seen objects (GDet2023), and the 2D detection stage is consequently the main bottleneck of existing pipelines for 6D localization/detection of unseen objects.
-
FreePose: 6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Georgy Ponimatkin, Martin Cífka, Tomáš Souček, and 4 more authors
ICLR, 2025
We seek to extract a temporally consistent 6D pose trajectory of a manipulated object from an Internet instructional video. This is a challenging set-up for current 6D pose estimation methods due to uncontrolled capturing conditions, fine-grained dynamic object motions, and the fact that the exact mesh of the manipulated object is not known. To address these challenges, we present the following contributions. First, we develop a new method that estimates the 6D pose of any object in the input image without prior knowledge of the object itself. The method proceeds by (i) retrieving a CAD model similar to the depicted object from a large-scale model database, (ii) 6D aligning the retrieved CAD model with the input image, and (iii) grounding the absolute scale of the object with respect to the scene. Second, we extract smooth 6D object trajectories from Internet videos by carefully tracking the detected objects across video frames. The extracted object trajectories are then retargeted via trajectory optimization into the configuration space of a robotic manipulator. Third, we thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets and demonstrate significant improvements over existing state-of-the-art RGB 6D pose estimation methods. Finally, we show that the 6D object motion estimated from Internet videos can be transferred to a 7-axis robotic manipulator both in a virtual simulator as well as in the real world. Additionally, we successfully apply our method to egocentric videos taken from the EPIC-KITCHENS dataset, demonstrating potential for Embodied AI applications.
-
Temporally Consistent Object 6D Pose Estimation for Robot Control
Kateryna Zorina, Vojtech Priban, Médéric Fourmy, and 2 more authors
IEEE Robotics and Automation Letters, 2025
Single-view RGB object pose estimators have reached a level of precision and efficiency that makes them good candidates for vision-based robot control. However, off-the-shelf methods lack the temporal consistency and robustness that are mandatory for stable feedback control. In this work, we develop a factor graph approach to enforce temporal consistency of the object pose estimates. In particular, the proposed approach: (i) incorporates object motion models, (ii) explicitly estimates the object pose measurement uncertainty, and (iii) integrates the above two components in an online optimization-based estimator. We demonstrate that with appropriate outlier rejection and smoothing using the proposed factor graph approach, we can significantly improve the results on standardized pose estimation benchmarks. We experimentally validate the stability of the proposed approach for a feedback-based robot control task in which the object is tracked by a camera attached to a torque-controlled manipulator.
-
Model Predictive Control Under Hard Collision Avoidance Constraints for a Robotic Arm
Arthur Haffemayer, Armand Jordana, Médéric Fourmy, and 5 more authors
In 2024 21st International Conference on Ubiquitous Robots (UR), 2024
We design a method to control the motion of a manipulator robot while strictly enforcing collision avoidance in a dynamic obstacle field. We rely on model predictive control while formulating collision avoidance as a hard constraint. We express the constraint as the requirement for a signed distance function to be positive between pairs of strictly convex objects. Among various formulations, we provide a suitable definition for this signed distance and the analytical derivatives that the numerical solver needs to enforce the constraint. The method is fully implemented on a Panda manipulator robot, and an efficient open-source implementation is provided along with the paper. We experimentally demonstrate the efficiency of our approach by performing dynamic tasks in an obstacle field while reacting to non-modeled perturbations.
-
Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking
Médéric Fourmy, Vojtech Priban, Jan Kristof Behrens, and 3 more authors
arXiv preprint arXiv:2311.05344, 2023
The objective of this work is to enable manipulation tasks with respect to the 6D pose of a dynamically moving object using a camera mounted on a robot. Examples include maintaining a constant relative 6D pose of the robot arm with respect to the object, grasping the dynamically moving object, or co-manipulating the object together with a human. Fast and accurate 6D pose estimation is crucial to achieve smooth and stable robot control in such situations. The contributions of this work are threefold. First, we propose a new visual perception module that asynchronously combines an accurate learning-based 6D object pose localizer and a high-rate model-based 6D pose tracker. The outcome is low-latency, accurate and temporally consistent 6D object pose estimation from the input video stream at up to 120 Hz. Second, we develop a visually guided robot arm controller that combines the new visual perception module with a torque-based model predictive control algorithm. Asynchronously combining the visual and robot proprioception signals at their respective frequencies results in stable and robust 6D object pose-guided robot arm control. Third, we experimentally validate the proposed approach on a challenging 6D pose estimation benchmark and demonstrate 6D object pose-guided control with dynamically moving objects on a real 7-DoF Franka Emika Panda robot.
-
WOLF: A modular estimation framework for robotics based on factor graphs
Joan Solà, Joan Vallve, Joaquim Casals, and 5 more authors
IEEE Robotics and Automation Letters, 2022
This paper introduces WOLF, a C++ estimation framework based on factor graphs and targeted at mobile robotics. WOLF extends the applications of factor graphs from the typical problems of SLAM and odometry to a general estimation framework able to handle self-calibration, model identification, or the observation of dynamic quantities other than localization. WOLF produces high-throughput estimates at sensor rates up to the kHz range, which can be used for feedback control of highly dynamic robots such as humanoids, quadrupeds or aerial manipulators. Building on the factor graph paradigm, the architecture of WOLF allows for a modular yet tightly-coupled estimator. Modularity is based on plugins that are loaded at runtime. Integration is then achieved simply through YAML files, allowing users to configure a wide range of applications without the need to write or compile code. Synchronization of incoming data and their processing into a unique factor graph is achieved through a decentralized strategy of frame creation and joining. Most algorithmic assets are coded as abstract algorithms in base classes with varying levels of specialization. Overall, these assets allow for coherent processing and favor code reusability and scalability. WOLF can be interfaced with different solvers, and we provide a wrapper to Google Ceres. Likewise, we offer ROS integration, providing a generic ROS node and specialized packages with subscribers and publishers. WOLF is made publicly available and open to collaboration.
-
Contact Forces Preintegration for Estimation in Legged Robotics using Factor Graphs
Médéric Fourmy, Thomas Flayols, Pierre-Alexandre Léziart, and 2 more authors
In IEEE International Conference on Robotics and Automation (ICRA), 2021
State estimation, in particular estimation of the base position, orientation and velocity, plays a major role in the efficiency of legged robot stabilization. The estimation of the base state is particularly important because of its strong correlation with the underactuated dynamics, i.e. the evolution of the center of mass and angular momentum. Yet this estimation is typically done in two phases, first estimating the base state, then reconstructing the center of mass from the robot model. The underactuated dynamics are therefore not properly observed, and any bias in the model would not be corrected by the sensors. While it has already been observed that force measurements make such a bias observable, these are often only used for a binary estimation of the contact state. In this paper, we propose to jointly estimate the base and the underactuation state by using all measurements simultaneously. To this end, we propose several contributions to implement a complete state estimator using factor graphs. Contact forces altering the underactuated dynamics are pre-integrated using a novel adaptation of the IMU pre-integration method, which constitutes the principal contribution. IMU pre-integration is also used to measure the positional motion of the base. Encoder measurements then contribute to the estimation in two ways: by providing leg odometry displacements, contributing to the observability of IMU biases; and by relating the positional and centroidal states, thus connecting the whole graph and producing a tightly-coupled whole-body estimator. The validity of the approach is demonstrated on real data captured by the Solo12 quadruped robot.
-
Absolute humanoid localization and mapping based on IMU Lie group and fiducial markers
Médéric Fourmy, Dinesh Atchuthan, Nicolas Mansard, and 2 more authors
In IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019
Current locomotion algorithms in structured (indoor) 3D environments require accurate localization. The many diverse sensors typically embedded on legged robots (IMU, encoders, vision and/or LiDARs) should make this possible if properly fused. Yet this is a difficult task due to the heterogeneity of these sensors and the real-time requirements of the control. While previous works used staggered approaches (high-frequency odometry, sparsely corrected by vision and LiDAR localization), recent progress in optimal estimation, in particular in visual-inertial localization, is paving the way to a holistic fusion. This paper is a contribution in this direction. We propose to quantify how accurately a visual-inertial navigation system can localize a humanoid robot in a 3D indoor environment tagged with fiducial markers. We introduce a theoretical contribution strengthening the formulation of Forster's IMU pre-integration, a practical contribution to avoid the possible ambiguity raised by pose estimation of fiducial markers, and an experimental contribution on a humanoid dataset with ground truth. Our system is able to localize the robot with errors below 2 cm once the environment is properly mapped. This would naturally extend to additional measurements corresponding to leg odometry (kinematic factors) thanks to the genericity of the proposed pre-integration algebra.