Experienced software architect
- Designed large code bases (up to 400 kloc) for applications running primarily on Unix/Linux.
- Adopted and promoted modern software development practices such as agile programming in order to deliver robust software.
- Leveraged advanced technologies such as boolean solvers or deep neural networks to solve real-life problems.
Skilled software engineer
- Proficient in C/C++, comfortable with Java and Python.
- Combined multithreading, GPUs, and distributed computing techniques to solve computationally intensive problems.
Proven leadership capabilities
- Over 20 years of experience as a hands-on technical and team leader.
- Mentored several junior engineers and helped them grow in their position.
- Managed engineering teams distributed across the US, India, and Eastern and Western Europe.
Sep 18 - Present
Facebook (Online social services)
Senior Staff Research Engineer, FAIR (Menlo Park, CA)
Dec 11 - Aug 18
Google (Internet related services and products)
Staff Software Engineer, Google Brain (Mountain View, CA)
- Led the development of Grappler, a toolbox for TensorFlow graph optimizations combining abstract interpretation of models with heuristics-based graph rewrites and machine-learning-based combinatorial optimizations to improve performance by 12 to 50%.
- Envisioned, designed, and led the implementation of the high-performance framework for numerical computing upon which TensorFlow was written. Several other teams also adopted the framework for applications such as image processing, recommendation engines, and text understanding.
- More than doubled the speed and scalability of the DistBelief machine learning engine, the precursor to TensorFlow. This resulted in annual savings exceeding $100 million.
Software Engineer, StreetView (Mountain View, CA)
- Trained a neural network capable of detecting license plates in the StreetView imagery with more than 99.9% accuracy.
- Redesigned and reimplemented the pipeline used to blur license plates and faces in StreetView imagery.
Sep 10 - Oct 11
Coverity (Source code analysis and verification)
Architect, New Product Initiative (San Francisco, CA)
- Spearheaded the design and implementation of a new product aiming to prove the correctness of safety-critical applications written in C or C++.
- Developed the first prototypes and used them as vehicles for demonstrations. Leveraged off-the-shelf solutions such as Mahout for quick experiments.
- Wrote the functional specification for the final product and participated in its implementation.
- Managed the relationship with the project's external partner in Japan.
Aug 07 - Sep 10
Tabula (Programmable logic design)
Technical Lead and Senior Software Manager, Infrastructure (Santa Clara, CA)
- Led and grew to 6 people a team developing the infrastructure software used to program the company's semiconductor chips (code generators, device and design databases, bit stream generator, timing analysis engines, clock router, and so on).
- Tackled cross-functional projects such as reducing the power consumption of the chip, optimizing the memory footprint of the software, or leveraging distributed computing to speed up the generation of chip models.
- Personally designed and implemented an innovative approach to compute the maximum frequency at which the chip can operate.
- Pioneered the use of contractors located in Eastern Europe as a flexible and cost-efficient way to increase the QA bandwidth when needed.
Mar 04 - Aug 07
Senior Manager, Formal Verification (San Jose, CA)
- Led the development of a tool that automatically detects several classes of bugs in a semiconductor design.
- Started the project from the ground up. Assembled, trained and coached a small but highly technical team. Worked with the marketing department and potential customers to define the initial requirements and roadmap.
- Acted as architect, technical lead and individual contributor. Conceived the overall architecture including the custom high-performance database. Designed and implemented the property checking engine that formed the heart of the tool.
Senior Manager, Timing Analysis (Santa Clara, CA)
- Managed an international team of more than 10 people responsible for the incremental timing analysis engine used to predict the frequency at which semiconductor
- Led major new developments: rewrote the tool to leverage multiple CPUs for speed, made use of statistical models to capture the electrical variability of the transistors, etc.
- Incrementally increased the tool's accuracy and expanded its reporting capabilities. Held brainstorming sessions to identify technical solutions, scheduled their implementation, drove (and often personally tackled) the development, and ensured final customer adoption by integrating the feedback received during early beta testing.
- Continuously improved the overall quality of the tool: decreased the runtime by up to 4 orders of magnitude on some very large designs, dramatically reduced the bug filing rate, fixed several design flaws, enforced more stringent code reviews, etc.
Sep 00 - Mar 04
Technical Lead, R&D (Mountain View, CA)
- Began in September 2000 as an intern; hired as a software developer in July 2001. Became the technical lead of the timing analysis team in 2003.
- Increased the scalability of the timing analysis tool by reducing the computational complexity of several algorithms and by leveraging up to 32 threads to perform the computations. Designed and developed capabilities required to take into account the impact of manufacturing process variations on the performance of
- Trained team members and helped them develop solutions to existing problems and dramatically reduce the memory footprint of the tool.
Ensim (Internet infrastructure solutions for hosting providers)
Intern, Internet Application Network division (Sunnyvale, CA)
- Designed, developed and debugged software for managing licensing and billing.
- Adapted HP's OpenMail mail server to Ensim's "hosting ready" platform.
Intern, software development division (London, UK)
- Developed a powerful HTML generator and a web-based software interface.
- Improved the speed and robustness of the database connectivity engine.
Peregrine Systems (Corporate asset and infrastructure management)
Intern, product development department (Bourg La Reine, France)
- Developed an SNMP agent used for remote resource management.
2013 - Present
Eigen (Open source linear algebra library)
- Sped up the matrix multiplication code by optimizing memory accesses, adding support for the AVX and FMA instruction sets, and leveraging techniques such as loop peeling, thus making Eigen one of the fastest BLAS libraries.
- Added support for tensors, multithreading, and GPU to the library.
INRIA (French institute for research in computer science)
- Implemented a filter able to encapsulate any MPEG file into a transport stream.
- Implemented an MPEG stream demultiplexer.
1997 - 1999
Videolan (Open source project developing networked multimedia solutions)
- As project lead, coordinated a team of 10 people and was responsible for a budget of $75,000.
- Designed and implemented a distributed video streaming server.
- Drove the implementation of the VLC program, which to date has been downloaded more than 1 billion times.
1997 - 1998
Stone Age (Nonprofit organization promoting musical artists)
- Helped organize several concerts in Paris featuring artists such as Mogwai or Jay Jay Johanson.
- Worked with tour managers, agents, venues, and artists to put shows together.
2000 - 2001
Master's degree in computer science and control theory
- ISIA was a master's program offered by the Ecole des Mines de Paris (one of the top 4 French graduate schools) in partnership with INRIA.
- Strong emphasis on practice with a 14-month internship.
1997 - 2000
Master's degree in computer science
- Ecole Centrale Paris is one of the 4 best engineering schools in France.
- Stress on fundamental sciences (mathematics and physics).
- Focus on management and social sciences.
- Value Function Based Performance Optimization of Deep Learning Workloads, Benoit Steiner, Chris Cummins, Horace He, and Hugh Leather. NeurIPS Workshop on ML For Systems 2020
- PyTorch: An Imperative Style, High-Performance Deep Learning Library, Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala. NeurIPS 2019
- Learning to Optimize Halide with Tree Search and Random Programs, Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley. SIGGRAPH 2019
- A Hierarchical Model for Device Placement, Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc Le, Jeff Dean. SysML 2018
- Hierarchical Planning for Device Placement, Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc Le, Jeff Dean. ICLR 2018
- Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Mohammad Norouzi, Samy Bengio, Jeff Dean. ICML 2017
- TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Yuan Yu, and Xiaoqiang Zheng. Proceedings of the 12th Usenix Symposium on Operating Systems Design and Implementation (OSDI '16)
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
- Device Placement Optimization with Reinforcement Learning, Samy Bengio, Mohammad Edward Norouzi, Benoit Steiner, Jeffrey Adgate Dean, Hieu Hy Pham, Azalia Mirhoseini, Quoc V Le, Naveen Kumar, Yuefeng Zhou, Rasmus Munk Larsen (Pending)
- Hierarchical Device Placement with Reinforcement Learning, Benoit Steiner, Anna Darling Goldie, Jeffrey Adgate Dean, Hieu Hy Pham, Azalia Mirhoseini, Quoc V. Le. US Patent 10,438,113, issued in 2019
- Optimized Matrix Multiplication using Vector Multiplication of Interleaved Matrix Values, Nishant Patil, Matthew Sarett, Rama Krishna Govindaraju, Benoit Steiner, Vincent Vanhoucke. US Patents 9,830,303 and 9,645,974, issued in 2017