Senior MLOps Engineer
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior MLOps Engineer in Brazil.
This is an exciting opportunity for an experienced MLOps professional to support and optimize production-grade GenAI and AI-agent solutions in a highly collaborative and innovation-driven environment. In this role, you will ensure the reliability, scalability, and operational excellence of advanced machine learning systems by monitoring infrastructure, troubleshooting incidents, and improving operational processes. You will work closely with Engineering and Data Science teams to maintain high-performing AI services and continuously enhance observability, automation, and deployment standards. The position offers full remote flexibility, exposure to cutting-edge AI technologies, and the chance to make a direct impact on mission-critical AI operations at scale. This role is ideal for someone who thrives in dynamic environments, enjoys solving complex technical challenges, and is passionate about operational excellence in AI systems.
Accountabilities:
- Execute and maintain operational procedures for GenAI and AI-agent solutions running in production environments.
- Monitor platform health, inference pipelines, model performance, latency, throughput, and service reliability across multiple systems.
- Investigate, troubleshoot, and resolve production incidents through logs, metrics, tracing tools, and root cause analysis.
- Support observability initiatives by improving monitoring, alerting systems, dashboards, and workload instrumentation.
- Ensure AI services meet reliability, scalability, and operational best practice standards across environments.
- Collaborate with Engineering and Data Science teams to improve platform stability, operational efficiency, and system resilience.
- Contribute to continuous improvement initiatives, including automation of operational workflows and enhancement of runbooks and operational frameworks.
- Support runtime operations for LLM-based applications and AI-agent workflows while ensuring adherence to SLAs, SLOs, and escalation procedures.
Requirements:
- Proven experience working with MLOps, machine learning systems, or AI platform operations in production environments.
- Strong troubleshooting and analytical skills using observability tools, logs, traces, and metrics.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Solid understanding of ML pipelines, APIs, distributed systems, and production infrastructure.
- Hands-on experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or Azure Monitor.
- Familiarity with incident management processes, escalation flows, and reliability engineering practices.
- Advanced English communication skills, both written and verbal.
- Ability to work collaboratively in cross-functional and fast-paced technical environments.
- Nice to have: experience with GenAI/LLM applications, orchestration tools like Airflow, and model lifecycle management practices.
Benefits:
- Fully remote work model across Brazil.
- Opportunity to work with cutting-edge GenAI and AI-agent technologies.
- Exposure to large-scale AI and cloud-native production environments.
- Collaborative and innovation-focused culture with cross-functional teams.
- Career growth opportunities in AI operations and platform engineering.
- Participation in highly impactful and technically challenging projects.
- Flexible and dynamic remote work environment.
How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.