Overview and Requirements
■Job Summary
With digital transformation (DX) accelerating and business models and IT systems changing rapidly, we are focusing on the importance of data and promoting its use to deliver more value to our customers with speed.
We are offering an exciting opportunity to contribute to our digital and AI transformation journey. We are seeking an experienced Site Reliability Engineer dedicated to the data domain to drive the transformation that will enable business results.
The Site Reliability Engineer (SRE) for Data will be responsible for ensuring the availability, scalability, and performance of our systems and services.
The team is multinational and includes members at offshore sites. We strive to create an environment that embraces diversity and values each individual's differences.
We offer flexible work hours and a hybrid structure that combines working from home and from the office. We look forward to hearing from you!
Why Us?
We don't fit into a box; we create our own boxes. Our Global Technology team is helping to transform a customer-first Fortune 50 company - by offering the high-tech digital solutions customers have come to expect, while delivering high-touch customer care during the moments that matter most.
Our tech teams enable the business, helping fuel the company's purpose: always with you, building a more confident future. It's where you can grow your career, driving digital transformation in an agile, open, and inclusive environment, where every voice carries weight, and no idea is left off the table.
Here, innovation is everybody’s job. It’s not done within one team or in a lab. Whether you’re driving continuous improvement with your DevOps team, integrating data science and AI into our decision making, or developing best-in-class digital and cloud solutions that protect customer data and personalize their experience, what you build matters as part of a team where together, we can do more.
■Responsibilities
- System Reliability and Performance: Ensure the reliability, scalability, and performance of our data platform and services, including monitoring, troubleshooting, and resolving issues.
- Service Design and Implementation: Collaborate with engineering teams to design, implement, and operate large-scale systems, including developing software that automates and streamlines our operations.
- Automation and Scripting: Develop and maintain automation scripts and tools to streamline operations, improve efficiency, and reduce manual errors.
- Monitoring and Alerting: Design and implement monitoring and alerting systems to ensure timely detection and resolution of issues.
- Collaboration and Communication: Work closely with engineering teams, product managers, and other stakeholders to ensure that systems and services meet business requirements and are aligned with company goals.
- Incident Response and Management: Participate in incident response and management, including root cause analysis, post-mortems, and implementation of corrective actions.
- Documentation and Knowledge Sharing: Maintain accurate and up-to-date documentation of systems, services, and processes, and share knowledge with team members to promote collaboration and improvement.
- Other day-to-day operations for the data platform, in line with internal and global governance
- Vendor management
- As one of the data engineering leads, enhance team capability and maturity and support the career development of each team member.
- Deliver with speed and automation in an agile way
■Requirements
Candidate Qualifications:
- Bachelor's or advanced degree in Computer Science, Engineering, or a related field.
- Minimum 3 years of experience as a Site Reliability Engineer supporting a data platform or other applications on hybrid-cloud platforms with a mix of on-premises and Azure.
- Strong scripting and programming skills in languages and frameworks such as Python, Spark, Bash, or PowerShell.
- Hands-on experience with the ELK stack and observability tools such as Grafana, Kibana, and Splunk.
- Experience with Azure public cloud services.
- Ability to analyze and tune application performance and to ensure high availability and stability of the platform.
- Good hands-on experience with SQL and experience with NoSQL.
- Essential knowledge of core infrastructure technologies (network, DNS, firewalls, LB, Active Directory, RDBMS, Windows/RHEL, infrastructure security, etc.)
- Knowledge of containerization and container orchestration platforms (Docker, Kubernetes) and of infrastructure-as-code tools such as Terraform.
- Excellent communication skills.
- Strong analytical and problem-solving skills to identify and resolve issues in Production.
Skills and Competencies:
Competencies:
- Communication: Ability to communicate effectively to ensure results are achieved
- Collaboration: Proven track record collaborating and working effectively in a global and multi-cultural environment (e.g. Japanese)
- Diverse environment: Can-do attitude and ability to work in a fast-paced environment
Tech Stack:
- Python, Spark, Bash, PowerShell
- Azure Data Lake Gen2, Data Factory, Synapse Analytics (Data Warehouse, Spark, Pipeline), SQL Database / MI, Cosmos DB, Fabric
- Azure Application Insights, Azure Log Analytics, Splunk, Grafana, AppDynamics, ELK, Azure Monitor
- Azure DevOps, Azure Repos, Azure Container Registry
- Docker, Kubernetes, AKS
- ServiceNow
- GitHub / Azure Copilot, LLMs
■Preferred
- Japanese: reading and writing ability
- English: fluent or advanced
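At MetLife Japan, we provide benefits that promote work-life balance and the physical and mental well-being of our employees. Employees can use a flextime system and leave programs such as annual paid leave, special consecutive leave, and refreshment leave. In addition to social insurance and group insurance, we offer discounts that can be used for travel and English conversation classes. To support flexible working, we have introduced a work style that combines working from the office and from home, a reduced working hours program, and a flexible dress code.
MetLife is a financial services company operating globally. It provides insurance, annuities, employee benefits, and asset management to help individual and corporate customers build a more confident future. Founded in 1868, MetLife operates in more than 40 markets worldwide and holds leading positions in the United States, Asia, Latin America, Europe, and the Middle East.
MetLife Japan began operations in Japan in February 1973 as the country's first foreign-affiliated life insurance company. Our Purpose, "Always with you, building a more confident future," expresses our belief in drawing on MetLife's global network and best practices from around the world to meet customers' needs and build relationships of trust with local communities.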