Manage, maintain, and support a Linux server estate of 3000+ servers, applying core Linux OS expertise to OS-level troubleshooting and to core services such as Puppet, Ansible, Jenkins, NFS, AutoFS, Samba, DNS, LDAP, SMTP, FTP, and NTP.
-
Provide effective support for Kubernetes environments (Rancher K8s platform) to ensure a stable and resilient platform.
-
Manage tickets efficiently for quicker turnaround and improved end-user experience.
-
Lead effective incident response and resolution for Linux-related issues to minimize downtime and ensure platform resilience.
-
Execute structured change management for Linux and Kubernetes infrastructure modifications.
-
Drive infrastructure automation initiatives across the organization to address operational challenges.
-
Lead projects aimed at improving processes, procedures, and operational efficiency; remediate technical debt.
-
Own and deliver infrastructure projects, systematic improvement programs, and risk reduction roadmaps.
-
Collaborate across teams, ensuring effective stakeholder communication, transparency, and reporting.
-
Leverage AI-driven observability, diagnostics, and automation tools to enhance reliability, performance, and operational efficiency.
-
Champion AI-assisted workflows that augment human engineering judgment and accelerate troubleshooting, change execution, and decision-making.
Your Key Responsibilities
-
Administer and maintain Linux servers, services, and automation frameworks.
-
Exercise technical expertise with Kubernetes and Linux to drive cloud and on-prem solutions
-
Drive technical Kubernetes discussions across infrastructure, application development, and support teams.
-
Design and implement Kubernetes clusters on GKE, AKS, and on-premises.
-
Troubleshoot and resolve issues related to Kubernetes clusters
-
Automate the deployment, scaling, and monitoring of Kubernetes clusters
-
Assist with developing and maintaining cloud standards, policies, processes and procedures to support and facilitate integration projects, cloud migration and onboarding
-
Lead projects to introduce and implement new technologies, ensuring security compliance and application performance to meet business needs
-
Experience with container orchestration and containerization technologies
-
Interface with technology providers to remediate integration-related technical challenges.
-
Keep pace with emerging tools, techniques, and cloud platforms.
-
Provide advanced support for Kubernetes clusters and associated container ecosystems, specifically Rancher-platform Kubernetes ecosystems.
-
Optimize ticket handling through tooling, automation, and AI-assisted triage to improve throughput and resolution quality.
-
Lead proactive troubleshooting using a combination of traditional engineering skills and AI-driven insights, pattern detection, and predictive diagnostics.
-
Experience with DevOps CI/CD tools such as Terraform, Azure DevOps, GitHub, and Ansible.
-
Experience with monitoring and logging tools such as Prometheus and Azure Log Analytics is a plus.
-
Plan and implement Linux/Kubernetes infrastructure changes using established change processes and AI-supported impact analysis.
-
Drive automation initiatives, including AI-guided code generation, policy automation, and self-service workflows.
-
Deliver projects focused on operational excellence, scalability, and modernization.
-
Implement systematic improvement plans and risk-reduction initiatives, leveraging AI analytics to prioritize operational hotspots.
-
Foster strong cross-team collaboration and ensure clear, data-driven communication.
-
Actively experiment with, evaluate, and adopt AI solutions to improve efficiency, reduce toil, and enhance service quality.
-
Design and refine human-in-the-loop operational models where AI handles repetitive work while engineers review, validate, and improve outcomes.
Your skills and experience that will help you excel
-
Bachelor's degree in a technical discipline and 10+ years of hands-on experience with Red Hat Linux, Oracle Linux, and Oracle/Sun Solaris.
-
Deep understanding of Linux OS internals, performance tuning, and troubleshooting.
-
Strong experience installing, configuring, and optimizing RHEL/Oracle Linux on Intel platforms.
-
Proficiency with core Linux services (Puppet, Jenkins, Ansible, NFS, AutoFS, Samba, DNS, LDAP, NTP, etc.).
-
Ability to research, validate, and apply system/security patches per corporate security standards.
-
Hands-on experience with ticketing tools (e.g., ServiceNow) and service workflows.
-
Working knowledge of Linux security methodologies, best practices, and hardening.
-
Experience generating system monitoring, performance, and operational reports.
-
Proficiency with configuration management tools such as Puppet/Hiera and Ansible.
-
Ability to compile applications/tools from source.
-
Operational knowledge of VMware and Cisco UCS.
-
Understanding of SAN/NAS technologies.
-
Working knowledge of Docker, Kubernetes, and container orchestration, specifically on the Rancher platform.
-
Knowledge of Azure/AWS cloud operations; familiarity with Terraform/ARM desired.
-
Knowledge of networking principles (TCP/IP) is a strong plus.
-
Experience using or integrating AI tools for system diagnostics, log analytics, incident triage, and RCA generation.
-
Ability to design AI-assisted workflows to reduce engineering toil, accelerate deployments, and enhance service reliability.
-
Hands-on experience with generative AI or LLM-powered tools for scripting (e.g., shell, Python), configuration automation, and code quality checks.
-
Understanding of human-in-the-loop models for validating AI-generated remediation steps, configuration changes, and infrastructure recommendations.
-
Ability to evaluate AI outputs critically, refine prompts, and use model insights to improve operational decision-making.
-
Demonstrated capability to innovate using AI for monitoring, anomaly detection, predictive scaling, self-healing automation, and change risk evaluation.
-
Familiarity with AI agents or orchestration frameworks in Infrastructure Ops is a plus.
About MSCI
What we offer you
- Transparent compensation schemes and comprehensive employee benefits, tailored to your location, ensuring your financial security, health, and overall wellbeing.
- Flexible working arrangements, advanced technology, and collaborative workspaces.
- A culture of high performance and innovation where we experiment with new ideas and take responsibility for achieving results.
- A global network of talented colleagues, who inspire, support, and share their expertise to innovate and deliver for ongoing skills development.
- Multi-directional career paths that offer professional growth and development through new challenges, internal mobility and expanded roles.
- We actively nurture an environment that builds a sense of inclusion, belonging, and connection, including eight Employee Resource Groups: All Abilities, Asian Support Network, Black Leadership Network, Climate Action Network, Hola! MSCI, Pride & Allies, Women in Tech, and Women's Leadership Forum.