Skills
Design
Experienced software architect
- Designed large code bases (up to 400 kloc) for applications running primarily on Unix/Linux.
- Adopted and promoted modern software development practices, such as agile development, to deliver robust software.
- Leveraged advanced technologies, such as Boolean solvers and deep neural networks, to solve real-world problems.
Implementation
Skilled software engineer
- Proficient in C/C++, comfortable with Java and Python.
- Combined multithreading, GPUs, and distributed computing techniques to solve computationally intensive problems.
Leadership
Proven project and team lead
- Over 20 years of experience as a hands-on technical and team leader.
- Mentored numerous engineers and helped them grow in their roles.
- Managed engineering teams distributed across the US, India, and Eastern and Western Europe.
Work Experience
Sep 18 - Present
Meta (Online social services)
Senior Staff Research Engineer, FAIR (Menlo Park, CA)
- Demonstrated how to reduce the memory needed to train neural networks by 30 to 90%, and worked with several teams to deploy the solution in production.
- Prototyped a neural network compiler driven by reinforcement learning that delivered an average performance gain of 5x over PyTorch.
- As area lead, defined the overall strategy and priorities for the systems research group.
Dec 11 - Aug 18
Google (Internet related services and products)
Staff Software Engineer, Google Brain (Mountain View, CA)
- Led the development of Grappler, a toolbox of graph optimizations that combines abstract interpretation of models with heuristic-based graph rewrites and machine-learning-based combinatorial optimizations to improve the performance of TensorFlow by 12% to 50%.
- Envisioned, designed, and led the implementation of the high-performance numerical computing framework on which TensorFlow was built. Several other teams also adopted the framework for applications such as image processing, recommendation engines, and text understanding.
- More than doubled the speed and scalability of the DistBelief machine learning engine, the precursor to TensorFlow. This resulted in annual savings exceeding $100 million.
Software Engineer, StreetView (Mountain View, CA)
- Trained a neural network capable of detecting license plates in StreetView imagery with more than 99.9% accuracy.
- Redesigned and reimplemented the pipeline used to blur license plates and faces in StreetView imagery.
Sep 10 - Oct 11
Coverity (Source code analysis and verification)
Architect, New Product Initiative (San Francisco, CA)
- Spearheaded the design and implementation of a new product intended to prove the correctness of safety-critical applications written in C or C++.
- Developed the first prototypes and used them as vehicles for demonstrations. Leveraged off-the-shelf solutions such as Mahout for quick experiments.
- Wrote the functional specification for the final product and participated in its implementation.
- Managed the relationship with the project's external partner in Japan.
Aug 07 - Sep 10
Tabula (Programmable logic design)
Technical Lead and Senior Software Manager, Infrastructure (Santa Clara, CA)
- Led and grew a team developing the infrastructure software used to program the company's semiconductor chips (code generators, device and design databases, bit stream generator, timing analysis engines, clock router, and so on).
- Pioneered the use of contractors located in Eastern Europe as a flexible and cost-efficient way to increase engineering bandwidth when needed.
- Tackled cross-functional projects such as reducing the power consumption of the chip, optimizing the memory footprint of the software, and leveraging distributed computing to speed up the generation of chip models.
- Personally designed and implemented an innovative approach to computing the maximum frequency at which the chip can operate.
Mar 04 - Aug 07
Senior Manager, Formal Verification (San Jose, CA)
- Led the development of a tool that automatically detects several classes of bugs in a semiconductor design.
- Started the project from the ground up. Assembled, trained and coached a small but highly technical team. Worked with the marketing department and potential customers to define the initial requirements and roadmap.
- Acted as architect, technical lead, and individual contributor. Conceived the overall architecture, including the custom high-performance database. Designed and implemented the property checking engine that formed the heart of the tool.
Senior Manager, Timing Analysis (Santa Clara, CA)
- Managed an international team of more than 15 people responsible for the incremental timing analysis engine used to predict the frequency at which semiconductor chips operate.
- Led major new developments, such as rewriting the tool to leverage multiple CPUs for speed and using statistical models to capture the electrical variability of the transistors.
- Incrementally increased the tool's accuracy and expanded its reporting capabilities. Held brainstorming sessions to identify technical solutions, scheduled their implementation, drove (and often personally tackled) the development, and ensured final customer adoption by integrating the feedback received during early beta testing engagements.
- Continuously improved the overall quality of the tool: decreased the runtime by up to four orders of magnitude on some very large designs, dramatically reduced the bug filing rate, fixed several design flaws, enforced more stringent code reviews, etc.
Sep 00 - Mar 04
Technical Lead, R&D (Mountain View, CA)
- Began in September 2000 as an intern; hired as a software developer in July 2001. Became the technical lead of the timing analysis team in 2003.
- Increased the scalability of the timing analysis tool by reducing the computational complexity of several algorithms and by leveraging up to 32 threads to perform the computations. Designed and developed capabilities required to take into account the impact of manufacturing process variations on the performance of the chip.
- Trained team members, helped them develop solutions to existing problems, and helped them dramatically reduce the memory footprint of the tool.
Summer 2000
Ensim (Internet infrastructure solutions for hosting providers)
Intern, Internet Application Network division (Sunnyvale, CA)
- Designed, developed and debugged software for managing licensing and billing.
- Adapted HP's OpenMail mail server to Ensim's "hosting ready" platform.
Summer 1999
Intern, software development division (London, UK)
- Developed a powerful HTML generator and a web-based software interface.
- Improved the speed and robustness of the database connectivity engine.
Summer 1998
Peregrine Systems (Corporate asset and infrastructure management)
Intern, product development department (Bourg La Reine, France)
- Developed an SNMP agent used for remote resource management.
Related Experience
2013 - Present
Eigen (Open source linear algebra library)
- Sped up the matrix multiplication code by optimizing memory accesses, adding support for the AVX and FMA instruction sets, and leveraging techniques such as loop peeling, thus making Eigen one of the fastest BLAS libraries.
- Added support for tensors, multithreading, and GPUs to the library.
1999
INRIA (French institute for research in computer science)
- Implemented a filter able to encapsulate any MPEG file into a transport stream.
- Implemented an MPEG stream demultiplexer.
1997 - 1999
Videolan (Open source project developing networked multimedia solutions)
- As project lead, coordinated a team of 10 people and was responsible for a budget of $75,000.
- Designed and implemented a distributed video streaming server.
- Drove the implementation of the VLC program, which to date has been downloaded more than 1 billion times.
1997 - 1998
Stone Age (Nonprofit organization promoting musical artists)
- Helped organize several concerts in Paris featuring artists such as Mogwai and Jay Jay Johanson.
- Worked with tour managers, agents, venues, and artists to put shows together.
Education
Master's degree in computer science and control theory
- ISIA was a master's program offered by the Ecole des Mines de Paris (one of the top 4 French graduate schools) in partnership with INRIA.
Master's degree in computer science
- Ecole Centrale Paris is one of the top 4 engineering schools in France and emphasizes fundamental sciences (mathematics and physics).
Publications
Conferences
- Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning, Taoan Huang, Aaron Ferber, Yuandong Tian, Bistra Dilkina, Benoit Steiner. CoRR
- OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks, Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
- SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems, Aaron Ferber, Taoan Huang, Daochen Zha, Martin Schubert, Benoit Steiner, Bistra Dilkina, Yuandong Tian.
- Flashlight: Enabling Innovation in Tools for Machine Learning, Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert. ICML 2022
- CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research, Chris Cummins, Bram Wasti, Jiadong Guo, Brandon Cui, Jason Ansel, Sahir Gomez, Somya Jain, Jia Liu, Olivier Teytaud, Benoit Steiner, Yuandong Tian, Hugh Leather. CGO 2022
- Learning Space Partitions for Path Planning, Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph E Gonzalez, Dan Klein, Yuandong Tian. NeurIPS 2021
- Value Learning For Throughput Optimization Of Deep Neural Networks, Benoit Steiner, Chris Cummins, Horace He, and Hugh Leather. MLSys 2021
- Value Function Based Performance Optimization of Deep Learning Workloads, Benoit Steiner, Chris Cummins, Horace He, and Hugh Leather. NeurIPS Workshop on ML For Systems 2020
- PyTorch: An Imperative Style, High-Performance Deep Learning Library, Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala. NeurIPS 2019
- Learning to Optimize Halide with Tree Search and Random Programs, Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley. SIGGRAPH 2019
- A Hierarchical Model for Device Placement, Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc Le, Jeff Dean. SysML 2018
- Hierarchical Planning for Device Placement, Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc Le, Jeff Dean. ICLR 2018
- Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Mohammad Norouzi, Samy Bengio, Jeff Dean. ICML 2017
- TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Yuan Yu, and Xiaoqiang Zheng. Proceedings of the 12th Usenix Symposium on Operating Systems Design and Implementation (OSDI '16)
White Paper
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
Patents
- Device Placement Optimization with Reinforcement Learning, Samy Bengio, Mohammad Edward Norouzi, Benoit Steiner, Jeffrey Adgate Dean, Hieu Hy Pham, Azalia Mirhoseini, Quoc V Le, Naveen Kumar, Yuefeng Zhou, Rasmus Munk Larsen. US Patent 10,692,003, issued in June 2020
- Hierarchical Device Placement with Reinforcement Learning, Benoit Steiner, Anna Darling Goldie, Jeffrey Adgate Dean, Hieu Hy Pham, Azalia Mirhoseini, Quoc V. Le. US Patent 10,438,113, issued in 2019
- Optimized Matrix Multiplication using Vector Multiplication of Interleaved Matrix Values, Nishant Patil, Matthew Sarett, Rama Krishna Govindaraju, Benoit Steiner, Vincent Vanhoucke. US Patents 9,830,303 and 9,645,974, issued in 2017