Senior Site Reliability Engineer
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Brazil.
In this role, you will be responsible for ensuring the reliability, scalability, and performance of complex cloud-based systems in a fully remote and collaborative environment. You will work closely with engineering teams to automate infrastructure, enhance monitoring, and streamline deployment processes. Acting as a key contributor to system resilience, you will help design and maintain robust platforms that support high user demand. This position offers the opportunity to work with modern technologies across AWS and GCP while driving operational excellence. You will also play a critical role in incident response and continuous improvement of system reliability. It's an ideal opportunity for engineers passionate about automation, cloud infrastructure, and building highly available systems at scale.
Accountabilities:
- Participate in on-call rotations to respond to incidents and ensure platform availability and performance
- Design, build, and maintain scalable cloud infrastructure across AWS and GCP environments
- Automate infrastructure provisioning and deployments using Terraform and CI/CD pipelines (e.g., GitHub Actions)
- Improve monitoring and observability using tools such as Datadog, Sentry, and CloudWatch
- Identify and automate repetitive manual tasks to reduce operational overhead and improve efficiency
- Enhance deployment, release, and migration processes with a focus on reliability and fault tolerance
- Troubleshoot and resolve production issues across different layers of the technology stack
- Collaborate closely with development teams to provide architectural guidance and infrastructure support
- Contribute to capacity planning and infrastructure growth strategies
Requirements:
- Strong experience with infrastructure as code, particularly Terraform and CI/CD automation tools
- Proficiency with cloud platforms such as AWS and GCP, including networking, storage, and compute services
- Experience with containerization technologies (e.g., Docker, ECS) and distributed systems
- Solid background in system administration, including operating systems, networking (VPCs, proxies, CDNs), and databases (MySQL, PostgreSQL, Neo4j, Redis)
- Hands-on experience with monitoring and observability tools like Datadog, Sentry, and log management systems
- Programming/scripting skills in languages such as Python, Shell, and SQL
- Strong understanding of reliability engineering principles, including scalability, availability, and disaster recovery
- Familiarity with agile methodologies and experience working in asynchronous, distributed teams
- Excellent problem-solving, communication, and collaboration skills
- Ability to lead initiatives, mentor peers, and contribute to continuous improvement practices
Benefits:
- Fully remote work environment with flexible working arrangements
- Monthly stipend to support home office setup and productivity
- Flexible paid time off, including holidays and seasonal breaks
- Access to learning and development resources, including training and conferences
- Wellness initiatives and lifestyle support programs
- Inclusive and supportive culture with opportunities to make a meaningful impact
- Exposure to cutting-edge technologies and large-scale systems
How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.