Lead Site Reliability Engineer - Infrastructure Lifecycle
Company: Klaviyo Inc.
Location: Boston
Posted on: November 6, 2024
Job Description:
Site Reliability Engineering (SRE) is what you get when you treat
system operations as a software engineering problem. The mission of
the Site Reliability Engineering team is to ensure uninterrupted
service for Klaviyo customers and act as a force multiplier for
Klaviyo product teams to deliver better software faster. The SRE
team builds foundational backend services as well as tooling and
automation to allow product teams to release and scale their
software reliably and predictably. Lead SREs are team players who
embed themselves within product teams as needed to advance the
architecture and performance of software systems and train their
peers in topics such as debugging distributed systems, building
self-healing applications and eking out every drop of performance
possible. As a Lead Site Reliability Engineer, you will own
foundational Klaviyo services and make a big impact on the
productivity of our product engineering teams. Klaviyo is growing
fast and we have openings for all skill levels across all of our
teams. Learn more about our engineering culture at .
How You'll Make a Difference
- Ship foundational services to enable Klaviyo engineering to
move faster with confidence.
- Design and develop systems and processes that enable highly
available & scalable systems.
- Achieve breakthroughs in systems throughput by identifying and
eliminating bottlenecks.
- Leverage technology such as Python, AWS, Django, Kubernetes,
Bash, Terraform, MySQL, Redis, Cassandra, PostgreSQL to advance
Klaviyo's platform.
- Champion best practices by actively collaborating with other
teams in a culture that values whiteboarding and technical design
review.
- Contribute to the company in multiple areas, constantly pushing
yourself to be a better engineer and to level up all of your peers
within your team and within Klaviyo.
- Design, write and deliver software to dramatically improve the
availability, scalability, latency, and efficiency of Klaviyo's
services.
- Participate in periodic on-call duties with a focus on solving
issues when they are discovered, preventing recurrences and
minimizing alert fatigue.
- Implement architectural improvements to achieve breakthrough
results in Klaviyo systems' operational scalability and
reliability.
- Work hand-in-hand with product-facing engineers and other SREs
to ship impactful code.
- Perform quantitative analysis to understand and scale Klaviyo
systems.
- Uncover and advocate for preventative, upstream solutions with
internal stakeholders.
- Evangelize Site Reliability best practices across the
engineering organization.
Who You Are
- Solid 10+ years of experience in the SRE/DevOps field.
- BA or BS Degree in Computer Science, related field, or
equivalent experience.
- Ability to handle yourself in outage situations and to drive
failures to root cause analysis and prevention of future
issues.
- Understanding of Linux (we run Ubuntu) and all layers of the
networking stack.
- Experience working on an engineering team building
software.
- Experience writing code using best practices in a language such
as Python, Ruby, or Go.
The pay range for this role is listed
below. Sales roles are also eligible for variable compensation and
hourly non-exempt roles are eligible for overtime in accordance
with applicable law. This role is eligible for benefits, including:
medical, dental and vision coverage, health savings accounts,
flexible spending accounts, 401(k), flexible paid time off and
company-paid holidays and a culture of learning that includes a
learning allowance and access to a professional coaching service
for all employees.
Base Pay Range For US Locations: $192,000 - $288,000 USD
Get to Know Klaviyo
We're Klaviyo (pronounced
clay-vee-oh). We empower creators to own their destiny by making
first-party data accessible and actionable like never before. We
see limitless potential for the technology we're developing to
nurture personalized experiences in e-commerce and beyond. To reach
our goals, we need our own crew of remarkable creators, ambitious
and collaborative teammates who stay focused on our north star:
delighting our customers. If you're ready to do the best work of
your career, where you'll be welcomed as your whole self from day
one and supported with generous benefits, we hope you'll join
us.
Klaviyo is committed to a policy of equal opportunity and
non-discrimination. We do not discriminate on the basis of race,
ethnicity, citizenship, national origin, color, religion or
religious creed, age, sex (including pregnancy), gender identity,
sexual orientation, physical or mental disability, veteran or
active military status, marital status, criminal record, genetics,
retaliation, sexual harassment or any other characteristic
protected by applicable law.
IMPORTANT NOTICE: Our company takes the
security and privacy of job applicants very seriously. We will
never ask for payment, bank details, or personal financial
information as part of the application process. All our legitimate
job postings can be found on our official career site. Please be
cautious of job offers that come from email addresses outside the
company domain (@klaviyo.com), instant messaging platforms, or
unsolicited calls.
You can find our Job Applicant Privacy Notice .