Our mission is to embed AI-driven automation, telemetry, and observability into MSCI's production environments, enabling the Quality Center of Excellence team to deliver on its objectives of operational excellence, risk reduction, and quality at scale. We serve as the engineering backbone of production governance, ensuring that systems are reliable, efficient, and continuously improving through data-driven insights.
Your Key Responsibilities
- AI Tooling & Framework Development
  - Build AI-driven tools for anomaly detection, incident triage, and root-cause analysis in production systems.
  - Develop and deploy automation frameworks to support Level 1 / Level 2 support teams and streamline repetitive operational tasks.
  - Create self-healing and predictive monitoring capabilities using ML models.
- Telemetry & Observability
  - Implement telemetry pipelines in GCP (e.g., Stackdriver/Cloud Monitoring, BigQuery, Pub/Sub) and Azure (e.g., Application Insights, Log Analytics, Azure Monitor).
  - Build dashboards, automated alerting, and intelligent log/metric analysis frameworks across hybrid cloud environments.
  - Leverage distributed tracing and logging frameworks to ensure end-to-end visibility of systems.
- Incident Management Automation
  - Design AI-assisted runbooks to support incident triage and resolution.
  - Automate classification and escalation of incidents using ML and rule-based systems.
  - Integrate AI-powered insights with existing incident management platforms (e.g., ServiceNow, PagerDuty, Opsgenie).
- Collaboration
  - Work closely with production teams, SREs, system test engineers, and incident managers to integrate AI solutions into live environments.
  - Partner with cloud engineering teams to ensure solutions are scalable, secure, and compliant.
  - Provide technical knowledge transfer and training on AI-enabled tools to support engineers.
Your skills and experience that will help you excel
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 11+ years of hands-on engineering experience in production support, SRE, or AI/ML platform development.
- Strong programming skills in Python (preferred) and experience with AI/ML frameworks (PyTorch, TensorFlow, Scikit-learn).
- Hands-on expertise with GCP (BigQuery, Pub/Sub, Cloud Monitoring, Vertex AI) and Azure (Application Insights, Log Analytics, Azure ML, Azure Monitor).
- Experience with observability tools (Prometheus, Grafana, ELK, Splunk, Datadog).
- Proficiency with cloud-native infrastructure (Kubernetes, Docker, Terraform, CI/CD pipelines).
- Strong understanding of incident management and ITIL practices.
- Experience implementing AIOps solutions in hybrid cloud environments.
- Knowledge of MLOps best practices (model deployment, monitoring, retraining).
About MSCI
What we offer you
- Transparent compensation schemes and comprehensive employee benefits, tailored to your location, ensuring your financial security, health, and overall wellbeing.
- Flexible working arrangements, advanced technology, and collaborative workspaces.
- A culture of high performance and innovation where we experiment with new ideas and take responsibility for achieving results.
- A global network of talented colleagues, who inspire, support, and share their expertise to innovate and deliver for our clients.
- Global Orientation program to kickstart your journey, followed by access to our Learning@MSCI platform, LinkedIn Learning Pro and tailored learning opportunities for ongoing skills development.
- Multi-directional career paths that offer professional growth and development through new challenges, internal mobility and expanded roles.
- We actively nurture an environment that builds a sense of inclusion, belonging, and connection, including eight Employee Resource Groups: All Abilities, Asian Support Network, Black Leadership Network, Climate Action Network, Hola! MSCI, Pride & Allies, Women in Tech, and Women's Leadership Forum.