- Posting Location: Buenos Aires, Ciudad Autónoma de Buenos Aires
Description and Requirements
With more than 150 years of history and a presence in more than 40 countries, MetLife is leading the global transformation of the insurance industry. Bound by purpose and diverse perspectives, we are a collaborative community of more than 40,000 employees around the world, committed to building a more secure future for all of our key audiences—employees, customers, shareholders and the communities we serve.
We are looking for a Lead Observability Engineer to join our LatAm ITSM team and play a critical role in ensuring the reliability, performance, and scalability of our systems. This role is responsible for designing, implementing, and maintaining a regional observability command center that provides deep insights into our infrastructure and applications. It requires a deep understanding of observability principles, strong technical skills, and the ability to communicate effectively with both technical and non-technical stakeholders. The ideal candidate will foster an end-to-end view of the products we deliver and help drive operational excellence across Latin America.
What you will do in this role…
- Define and implement observability solutions: Develop and deploy tools and frameworks for monitoring, logging, and tracing to ensure deep visibility into system performance and health.
- Manage the observability command center: Operate the regional hub for monitoring and incident response, coordinating with application teams to detect and resolve issues efficiently.
- Lead incident response efforts: Provide insights and solutions to resolve issues quickly and prevent future occurrences.
- Develop dashboards, alerts, and reports: Align with country CIOs to provide visibility into system performance, health, and business execution.
- Conduct root cause analysis: Trigger corrective actions with relevant teams to prevent recurrence.
To help you succeed, you need to have…
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Strong experience in system monitoring and problem management roles, with proven background in observability (APM, synthetic monitoring, infrastructure monitoring) in complex organizational environments.
- Experience in Site Reliability Engineering.
- Proficiency with observability tools such as Elastic, Dynatrace, AppDynamics, Prometheus, Grafana, or similar.
- Strong understanding of cloud platforms (Azure, AWS, GCP) and container orchestration (Kubernetes, Docker).
- Experience with scripting and automation.
- Fluent in Spanish and English. Portuguese is a plus.
The benefits we offer…
- Hybrid work mode
- Annual Bonus
- Learning and development programs
- Health insurance for the family group
- Child Care Reimbursement
- Connectivity Reimbursement
- Day off for birthdays & Cultural Heritage Day
- Healthy breakfast with seasonal fruits
- In-company gym and agreement with SportClub for employees and their direct relatives
- Agreement with "Club de Beneficios" for purchases of grocery products
Join MetLife and let’s find out what we can build together!
Our benefits are designed to support your holistic well-being, with programs for physical and mental health, financial wellness, and family support. We offer health care coverage for you and your family, discounts on home, auto, and pet insurance, and parental leave. We also offer a premium gym membership for you and your immediate family members, volunteer time off, discounts at select universities, flexible Fridays, connectivity reimbursement, and much more.
