You will manage a talented team of data scientists and AI engineers, driving the adoption of intelligent automation, predictive analytics, and proactive problem resolution across our complex IT landscape. This position requires a leader with a deep understanding of both data science principles and the intricacies of enterprise IT operations.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Engineering, or a related quantitative field.
- 10+ years of progressive experience in data science, machine learning, and/or AI engineering.
- 5+ years of experience in a leadership or management role, leading technical teams focused on data science or AI.
- Proven experience in designing, developing, and deploying AI/ML models for real-world applications, particularly within IT operations or related domains (e.g., observability, security, infrastructure management).
- Strong understanding of IT operations concepts, including monitoring, alerting, incident management, change management, and IT service management (ITSM).
- Proficiency in programming languages commonly used in data science and AI (e.g., Python, Scala, Java).
- Hands-on experience with big data technologies (e.g., Spark, Hadoop, Kafka) and cloud platforms (AWS, Azure, GCP).
- Solid grasp of machine learning algorithms (e.g., supervised, unsupervised, deep learning) and statistical modeling.
- Excellent communication, interpersonal, and leadership skills with the ability to articulate complex technical concepts to non-technical stakeholders.
- Demonstrated ability to drive strategic initiatives, manage complex projects, and deliver results in a fast-paced environment.
Preferred Qualifications:
- Experience with specific AIOps platforms or tools (e.g., Splunk, Dynatrace, Moogsoft, PagerDuty, ServiceNow, Datadog, ELK stack).
- Familiarity with IT service management frameworks (e.g., ITIL).
- Experience with containerization (Docker, Kubernetes) and microservices architectures.
- Knowledge of MLOps best practices and tools for automating and managing the ML lifecycle.
- Experience in a large-scale enterprise environment with diverse and complex IT infrastructure.
Key Responsibilities:
- Strategic Leadership: Define and execute the AIOps strategy and roadmap, aligning it with overall IT and business objectives. Identify opportunities to leverage AI/ML for enhanced IT observability, incident management, performance optimization, and automation.
- Team Management & Development: Lead, mentor, and grow a high-performing team of data scientists and AI engineers. Foster a culture of innovation, continuous learning, and technical excellence.
- Solution Design & Development: Oversee the end-to-end design, development, and deployment of AIOps solutions, including anomaly detection, predictive failure analysis, root cause analysis, intelligent alerting, and automated remediation.
- Cross-Functional Collaboration: Partner closely with IT Operations, Site Reliability Engineering (SRE), Network Engineering, Application Development, and other stakeholders to understand operational challenges and deliver impactful AI-driven solutions.
- Data & Platform Management: Ensure the availability, quality, and governance of operational data necessary for AI/ML model training and inference. Drive the selection, integration, and optimization of AIOps platforms and tools.
- Model Lifecycle Management: Establish robust MLOps practices for model development, testing, deployment, monitoring, and retraining to ensure the continuous effectiveness and reliability of AI models in production.
- Innovation & Research: Stay abreast of the latest advancements in AI/ML, AIOps, and IT operations. Drive research and experimentation to explore new techniques and technologies that can further enhance our operational intelligence.
- Performance & Metrics: Define key performance indicators (KPIs) for AIOps initiatives and regularly report on the impact and value delivered to the organization.
- Budget & Resource Management: Manage project budgets, resources, and timelines effectively to ensure successful delivery of AIOps programs.