Denis Barthou

Full professor | AI Systems · Parallel Computing · Performance Engineering



Background

I am a full professor of Computer Science at Bordeaux INP/ENSEIRB-MATMECA. I graduated from ENS Lyon and earned a PhD from the University of Versailles–Saint-Quentin in 1998, where I worked on dependency analysis in the polyhedral model. I then became an assistant professor at the same university. In 2009, I joined Bordeaux INP as a full professor and became part of Inria’s Runtime team. In 2017, I created and led Inria’s Storm team while also overseeing the Computer Science program at the ENSEIRB-MATMECA engineering school of Bordeaux INP. From 2023 to mid-2025, I was on leave to direct Huawei Paris’s Distributed and Parallel Research Lab. After returning from this leave, I joined the Topal team at Inria and the LaBRI computer science department of the University of Bordeaux.




Research

Recently, I’ve increasingly focused on how to scale AI systems, especially open-source large language models. I’ve explored parallelization techniques for both training and inference, and how these models behave when deployed on large cloud infrastructures built around modern AI accelerators.
 My research has always revolved around making complex applications run faster and more efficiently. I’m interested in optimization, parallel and distributed methods, and the design of compilation and runtime systems for AI and high-performance computing. Over the years, I’ve gained substantial hands-on experience with a variety of heterogeneous hardware platforms—from the familiar NVIDIA GPU ecosystem to architectures like Huawei’s Ascend NPUs, as well as multicore ARM and Intel processors. These themes have accompanied me throughout my career, even as the computing landscape continues to evolve.
The following is a word cloud of the most frequent keywords and topics associated with my publications:




Industrial & Community Impact

I have worked with industry through a range of projects (European projects, ANR grants, direct industrial projects). The 2022 projects, for instance, are listed in the Storm Inria scientific annual report.
Patents and innovation:
I am co-inventor on two international patent applications with Huawei, both addressing core challenges in the efficient execution of parallel and AI workloads. They are currently active and pending under the PCT framework.
  • WO2024/255997 — Optimizing Parallel Execution of Machine Learning Models. This patent covers methods and systems for improving the parallel execution of machine learning models through optimized scheduling and execution strategies on heterogeneous hardware platforms.
  • WO2025/194432 — Processing Computational Graphs for Task Scheduling. This patent introduces techniques for analyzing and transforming computational graphs to improve runtime scheduling and execution efficiency for complex task-based and parallel applications.
Publications cited in patents
My research has been cited in over 30 active international patents, including filings from major technology companies such as Google, IBM, Intel, Qualcomm, Microsoft, Oracle, Reservoir Labs and Xilinx. The number of citations has grown significantly in recent years, reflecting its industrial relevance in AI systems, compiler technology, heterogeneous computing, and performance engineering.
Software cited in patents
Beyond publications, the open-source software tools I contributed to have also been cited directly in patent filings:
  • MAQAO — cited in patents from IBM and Xilinx, in the context of performance analysis, compilation, and hardware-aware optimization.
  • AFF3CT — cited in a patent from MIT, U. of Columbia, and U. Ireland Maynooth, related to forward error-correction and simulation frameworks.
Logos are shown for identification purposes only.
Large-scale AI systems (selected experience)
I have contributed to AI system projects spanning the full lifecycle, from research and prototyping to deployment on large-scale production infrastructures, including customer environments involving tens of thousands of accelerators. This work covered performance modeling, parallelization strategies, and system-level optimization for real-world constraints.



Software & Tools

I have directly contributed to the MAQAO and AFF3CT software, both of which now have active user communities in industry and academia.

MAQAO License: LGPL v3
Modular Assembly Quality Analyzer and Optimizer is a parallel performance analysis and optimization framework that I initiated in 2004. Designed for developers and performance experts, it provides advanced capabilities for analyzing and optimizing low-level code. MAQAO has stood the test of time: it is actively maintained and extended at the University of Versailles, supported by a long-lasting community of users and contributors. It has become part of the VI-HPS Institute, is listed among the EU Innovation Radar technologies, and has been featured in an AWS blog highlighting its potential. GitLab Web page
MAQAO makes it possible to explore scalability at the loop level and to quickly identify which parts of the code are the limiting factors when scaling. By computing the difference between the theoretical maximum speedup and the measured speedup, MAQAO provides an efficiency ratio indicating the loop efficiency for each configuration. Figure from AWS blog.
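As a rough illustration of the efficiency ratio described above, the sketch below assumes that the theoretical maximum speedup equals the ratio of thread counts and that efficiency is the measured speedup divided by that ideal value. Both are assumptions made for illustration. The function name and the timings are hypothetical, and this is not MAQAO's actual formula or implementation.

```python
def loop_efficiency(t_base, t_scaled, n_base, n_scaled):
    """Return (measured_speedup, ideal_speedup, efficiency) for one loop.

    Assumption: the ideal speedup is linear in the number of threads.
    """
    ideal = n_scaled / n_base        # theoretical maximum speedup (assumed linear)
    measured = t_base / t_scaled     # speedup actually observed
    return measured, ideal, measured / ideal


# Hypothetical per-loop timings in seconds, at 1 thread and at 8 threads.
loops = {"loop_A": (8.0, 1.0), "loop_B": (8.0, 4.0)}

for name, (t1, t8) in loops.items():
    measured, ideal, eff = loop_efficiency(t1, t8, n_base=1, n_scaled=8)
    print(f"{name}: speedup {measured:.1f}x of {ideal:.0f}x ideal, efficiency {eff:.2f}")
```

Under these assumptions, a loop whose efficiency stays close to 1 scales well, while a loop with a low ratio is a candidate limiting factor for that configuration.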

BER/FER display using AFF3CT simulation, here on polar codes.
AFF3CT License: MIT
A Fast Forward Error Correction Toolbox is a high-performance simulation framework dedicated to Forward Error Correction (FEC, or channel coding). It was initiated during A. Cassagne’s PhD in 2017 and supports a wide range of codes, from well-established Turbo codes to modern Polar and LDPC codes. AFF3CT has grown into a vibrant, active community spanning academia and industry, and is maintained and developed at Sorbonne University, IMS Lab/Bordeaux University and Inria. Users and developers gather annually during the AFF3CT Day, illustrating the maturity and engagement around the project. Thanks to its performance and robustness, AFF3CT is now used in both academic research and industrial settings. GitHub Web page

I have also contributed to other software tools through PhD supervision and research projects (e.g., MIPP, PARCOACH), focusing on SIMD abstraction and MPI correctness checking. These projects are actively maintained by their respective communities.




People, Mentoring & Leadership

I regularly supervise PhD students in collaboration with academic and industrial partners. Please get in touch if you are interested.

Current PhD students
► L. Sauleau: PhD directed by Pr. C. Ancourt on Enhancing Data Transfer and Performance on Heterogeneous Architectures.
► B. Priour: PhD directed by Pr. A. Tchana on Topology-adaptive optimization for distributing computing in LLM using xOS's principles.
PhD Alumni
► V. Alba. Resource dimensioning for heterogeneous architectures, 2025, U. Bordeaux PhD thesis.
► D. Orhan. Modeling and dynamic optimization of software radio chains on heterogeneous architectures, 2025, U. Bordeaux PhD thesis, co-directed with C. Jego.
► B. Coye (with Ubisoft). Dynamic Task Graph Scheduling by Composition, 2023, U. Bordeaux PhD thesis, co-directed with Pr. R. Namyst. Now research engineer at Ubisoft.
► V.-M. Nguyen. Compile-time Validation and Optimization of MPI Nonblocking Communications, 2022, U. Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now research engineer at Eviden.
► C. T. Ait Kaci. Static and dynamic analysis for memory access concurrency error detection in MPI-RMA applications, 2022, U. Bordeaux PhD thesis. Now scientific project manager at Capgemini.
► A. Cassagne. Optimization and parallelization methods for software-defined radio, 2020, U. Bordeaux PhD thesis, co-directed with Pr. C. Jego. Now Ass. Professor at Paris Sorbonne University.
► P. Huchant. Static Analysis and Dynamic Adaptation for Parallelism, 2019, U. Bordeaux PhD thesis. Now Senior Software Engineer at Synopsys, Bordeaux.
► H. Brunie. Optimization of data allocation for HPC applications on heterogeneous memory architectures, 2019, U. Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now Postdoc at Inria Grenoble.
► C. Haine. Kernel Optimization by Layout Restructuring, 2017, U. Bordeaux PhD thesis. Now research engineer at HPE, Switzerland.
► G. Vaumourin. Hybrid Memory Hierarchy and Dynamic Data Handling in Embedded Parallel Architectures, 2016, U. Bordeaux PhD thesis. Now research engineer at ATOS, Grenoble.
► E. Saillard. Static/dynamic/iterative analyses for validation and improvement of multi-models HPC applications, 2015, U. Bordeaux PhD thesis. Now Inria Researcher.
► B. Putigny. Benchmark-driven Approaches to Performance Modeling of Multi-core architectures, 2014, U. Bordeaux PhD thesis. Now HPC engineer at Eviden, Bordeaux.
► S. Henry. Programming Models and Runtime Systems for Heterogeneous Architectures, 2013, U. Bordeaux PhD thesis. Now engineer at IOHK.
► L. Duchateau. Automatic Algorithm Derivation and Exploration in Linear Algebra for Parallelism and Locality, 2013, UIUC PhD thesis, co-directed with Pr. D. Padua. Now senior software engineer at Pure Storage, Bellevue, USA.
► A. Mazouz. Une Etude Empirique des Performances des Applications OpenMP sur les Plateformes Multi-coeurs, 2012, UVSQ PhD thesis, co-directed with Pr. S.-A. Touati. Now senior software engineer at Intel, Paris.
► A. Charif-Rubial. On code performance analysis and optimization for multicore architectures, 2012, UVSQ PhD thesis, co-directed with Pr. W. Jalby. In memoriam.
► J. Jaeger. Source-to-source transformations for irregular and multithreaded code optimization, 2012, UVSQ PhD thesis. Now Research engineer at CEA.
► P. De Oliveira Castro Herrero. Expression and optimization of data reorganizations on data flow parallelism, 2010, UVSQ PhD thesis. Now Professor, HDR, at Paris-Saclay University, Versailles St Quentin en Yvelines.
► S. Donadio. Iterative optimization of performance libraries by hierarchical division of codes, 2007, UVSQ PhD thesis, directed by and co-advised with Pr. W. Jalby. Now Architect/Product manager at Bloomberg, New York, USA and adjunct professor at Columbia Engineering.
► C. Alias. Program Optimization by Template Recognition and Replacement, 2005, UVSQ PhD thesis, directed by and co-advised with P. Feautrier. Now Inria Researcher, HDR and chief scientific advisor of XtremLogic.
Postdoc
► Lilia Ziane Khodja (2014), Modeling of parallel HPC applications running on platforms composed by modern multicore nodes interconnected with high performance networks, with B. Goglin. Now Consultant at ANEO.
Engineers
► P. Virouleau, working with ATOS on Parcoach project (2022-2024). Now permanent research engineer at Inria.
► M. Makni, working on H2020 Microcard project (2022-2023). Now research engineer at Lytid.
► C. Sakka, working on ANR Exacard project (2021-2022). Now engineer at ANEO.
► K. He, working on AFF3CT (2018-2019), with A. Cassagne and O. Aumage. Now engineer at IHU Liryc.
► A. Cassagne, working on optimizing Error Correcting Codes (2015-2016), with B. Le Gal (IMS), C. Leroux (IMS) and O. Aumage. Now Ass. Professor at Paris Sorbonne University.
► J. Tombi A Mba, working on MAQAO for Arm (2014-2015), with O. Aumage. Now senior software engineer at BePatient.
► T. Meunier, working on performance analysis for vectorization and data restructuring (2013), with O. Aumage.
Leadership through Coaching
In addition to technical mentoring, I have experience training HR, engineering and research managers through leadership development programs, specifically leadership through coaching. This includes training sessions focused on technical leadership and team dynamics in high-technology environments.



Selected Publications
My complete list of publications can be found on ORCID.