Technical Program Manager, Machine Learning Capacity Management
Company: Google Inc.
Posted on: June 6, 2021
- Bachelor's degree in computer science, technology, engineering, or a
related field, or equivalent practical experience.
- 7 years of experience in technical program management on
- 7 years of experience in software program management or
- Experience managing highly complex, technical cross-functional
- Experience using agile methodologies and tooling (Planr, Jira,
- Demonstrated success in identifying and driving solutions that
reduce toil, increase efficiency, and improve the customer
- Familiarity with ML infrastructure.
- Excellent communication and presentation skills, demonstrated
ability to work cross-functionally with multiple teams and
About the job
A problem isn't truly solved until it's solved for all. That's
why Googlers build products that help create opportunities for
everyone, whether down the street or across the globe. As a
Technical Program Manager at Google, you'll use your technical
expertise to lead complex, multi-disciplinary projects from start
to finish. You'll work with stakeholders to plan requirements,
identify risks, manage project schedules, and communicate clearly
with cross-functional partners across the company. You're equally
comfortable explaining your team's analyses and recommendations to
executives as you are discussing the technical tradeoffs in product
development with engineers.
Our vision is to deliver a common infrastructure as a service
for machine learning that is easy to use, efficient, and robust.
Google's Machine Learning (ML) Tensor Processing Unit (TPU)
infrastructure is one of Google's fastest growing infrastructure
investments. ML Fleet is the team in Technical Infrastructure that
provides this ML infrastructure as a service to our customers, such
that it can be effectively used for training and serving ML models.
The Product Area Resource Management (PARM) team coordinates and
enables the scalable, reliable, and efficient deployment and
consumption of ML compute resources across Google.
You'll lead the ML PARM Capacity Management team and work
directly with Google's capacity management systems to ensure we are
equipped, enabled, and optimized to support our customers during a
period of rapid scale. On a daily basis, you'll address capacity
needs for Google's internal customers, lead escalations to ensure no
interruption of service, enable the team to defend service level
objectives, and play a key role in customer satisfaction.
Behind everything our users see online is the architecture built
by the Technical Infrastructure team to keep it running. From
developing and maintaining our data centers to building the next
generation of Google platforms, we make Google's product portfolio
possible. We're proud to be our engineers' engineers and love
voiding warranties by taking things apart so we can rebuild them.
We keep our networks up and running, ensuring our users have the
best and fastest experience possible.
Responsibilities
- Identify and capture resource efficiency opportunities.
- Perform analytics to understand the larger system and deploy
automation to improve system efficiency.
- Surmount technical roadblocks and constraints that arise in the
course of capacity delivery execution, unblocking partner work and
ensuring continued progress.
- Work closely with site reliability engineers to communicate
status across ML Fleet and to our customers.
- Use a data-driven, analytical approach to measure team
operations, and develop and drive team OKRs.
Keywords: Google Inc., Sunnyvale, Technical Program Manager, Machine Learning Capacity Management, Other, Sunnyvale, California