Senior Staff Site Reliability Engineer
Rocket.Chat
Job Title: Site Reliability Engineer
Level: Senior Staff
Working Hours: Full Time (40h/Week)
Contract: Employee
Location: Remote (US)
Your Team 👥
You will report to our Engineering Manager and join the R&D team. On TheOrg you can view the complete structure of our organisation, including information about every team member, hiring managers and the size of each department.
Who We Are Looking For ✏️
As a Senior Staff Site Reliability Engineer, you will be responsible for the overall reliability, scalability, and operational excellence of Rocket.Chat’s infrastructure and deployment systems. We are looking for a high-level technical leader and force multiplier. You will lead the infrastructure strategy, guide the platform roadmap, and ensure that all systems run reliably and efficiently across our global deployments. This includes overseeing cloud infrastructure, Kubernetes platforms, deployment automation, monitoring systems, and operational processes. You will work closely with Engineering, Security, Product, and Leadership teams to ensure that infrastructure capabilities support product growth, customer demands, and operational resilience. Your leadership will ensure that reliability engineering, automation, and operational best practices are embedded into the development lifecycle across the company.
Mandatory Hard Skills 🎯
- Strong background in software engineering and infrastructure architecture with experience designing and operating large-scale distributed systems.
- Expert understanding of microservices, event-driven architectures, stateful vs. stateless scaling constraints, and data consistency models.
- Advanced coding proficiency (Go preferred, Python acceptable), with the ability to build complex core frameworks and contribute to the core Rocket.Chat codebase when necessary.
- Deep expertise with Kubernetes and cloud infrastructure platforms (e.g., AWS, GCP, Azure, OVH) in production environments.
- Extensive experience with Infrastructure as Code (IaC) tools such as Terraform, Pulumi, or Ansible.
- Strong experience designing and managing CI/CD and GitOps deployment systems using tools like ArgoCD.
- Hands-on experience with observability platforms including monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki).
- Strong understanding of networking fundamentals (TCP/IP, DNS, routing), security best practices, and cloud architecture principles.
- Experience leading infrastructure, SRE, or platform engineering teams responsible for production systems.
- Strong knowledge of containerized systems and deployment architectures supporting high availability and scalability.
- Familiarity with database technologies such as MongoDB or Redis and their operational considerations at scale.
- Experience supporting SaaS platforms with large-scale customer deployments.
- Experience managing multi-cluster Kubernetes environments and multi-region architectures.
- Ability to define and execute a long-term infrastructure vision aligned with company growth.
Desirable Hard Skills 💕
- Experience with open source software.
- Active U.S. Security Clearance (or eligibility to obtain one) is a strong plus.
Soft Skills ✨
- Passion: Genuine enthusiasm for what you do and how it contributes to our company's mission;
- Dream: Proactively seek out opportunities and challenges to achieve extraordinary results. If you're someone who takes initiative and is always striving to improve, you'll fit right in;
- Own: Take ownership of your work, set high standards for yourself, and be accountable for outcomes, demonstrating a strong sense of responsibility and commitment;
- Trust: Recognize the importance of trust and support, and actively work towards a collaborative and inclusive workplace;
- Share: Communicate openly and transparently, ensuring clarity and honesty in your interactions.
What You'll Do 🖥️
- Influence core product architecture (Core, Fleetcommand, Omnichannel, etc.) before code is written to ensure reliability, scalability, and operability are baked in by design.
- Lead the engineering of systemic solutions that eliminate entire classes of failures, moving the organization from reactive firefighting to proactive prevention.
- Act as the technical visionary for our deployment products (LaunchControl, Airlock, Launchpad), defining the long-term technical roadmap and architectural standards alongside the Head of Infrastructure.
- Design, prototype, and build foundational tooling, core libraries, and frameworks (in Go, Python, etc.) that make it easier for both SREs and Product Engineers to deploy safely, monitor accurately, and operate efficiently.
- Champion and evolve the Infrastructure as Code (IaC) paradigms (Pulumi, Terraform) to ensure they meet the needs of increasingly complex, multi-region, and air-gapped enterprise deployments.
- Serve as the highest level of technical escalation for catastrophic, multi-domain Sev-1 incidents that standard operational protocols cannot resolve.
- Drive the strategic direction of incident management, ensuring that post-mortems result in structural, org-wide improvements rather than localized band-aids.
- Evolve the company's disaster recovery (DR) and chaos engineering programs to simulate and defend against complex cascading failures.
- Define, document, and enforce global standards for observability (SLIs, SLOs, error budgets), alerting, and production readiness across all engineering squads.
- Author foundational Architectural Decision Records (ADRs) and Requests for Discussion (RFDs) that guide the technical direction of the company.
- Act as a role model and technical mentor for Senior and Mid-Level SREs, as well as Senior Product Engineers, elevating the overall technical culture of Rocket.Chat.
- Facilitate org-wide technical enablement sessions, knowledge sharing, and blameless culture advocacy.
- Partner with Engineering leadership, Product, Security, and Customer Success to align infrastructure strategy with business and customer needs.
- Represent Rocket.Chat’s technical vision through technical writing, conference talks, and community engagement within the infrastructure and open-source ecosystem.
- Foster a culture of ownership, operational excellence, and continuous improvement across the infrastructure organization.
Benefits ✨
- Fully Remote & Flexible Working Hours
- Flexible Paid Time Off, Holidays and Vacation
- Company Laptop
- Remote Benefit
- iTalki, Courses and Books
- Stock Options
- Multicultural Environment
- Vibrant Company Culture
Check out our handbook to dive into each of our awesome benefits!
At Rocket.Chat, we have tailored base pay ranges according to work locations. This approach ensures that we can competitively and consistently compensate our employees across different geographic markets.
- While we define an initial seniority level and budget for each role, this can be adjusted during the hiring process. The selection process itself — including interviews and assessments — helps us better understand where the candidate fits within our career framework and which grade they should be positioned in.
- To ensure fairness and consistency, all applications are accepted exclusively via our Careers site. Submissions through other channels will not be taken into consideration.
About Rocket.Chat 🚀
Rocket.Chat is the world's largest open-source communications platform. Built for organizations needing more control over their communications, Rocket.Chat Secure CommsOS™ unifies messaging, voice, video, AI, and mission-critical applications, ensuring uncompromising security, compliance, and operational efficiency for governments, defense, and critical infrastructure organizations operating in highly regulated environments.
Tens of millions of users in over 150 countries and organizations such as Deutsche Bahn, the U.S. Navy and Credit Suisse trust Rocket.Chat every day to keep their communications completely private and secure. At Rocket.Chat, we believe in reconnecting the world, one conversation at a time!
See yourself in this role? Apply now! Check out our handbook for more information about our rocket.
If you're interested in keeping up with new roles at Rocket.Chat, you can now set up custom job alerts. Just click the link, pick the types of roles you want to hear about, and get notified whenever there’s a match.