Job Summary 
We are seeking a Senior MLOps / AIOps Platform Engineer with deep DevSecOps expertise and hands-on experience managing enterprise-grade AI/ML platforms. This critical role focuses on building, configuring, and operationalizing secure, scalable, and reusable infrastructure and pipelines that support AI and ML initiatives across the enterprise. The ideal candidate will have a strong background in Infrastructure as Code (IaC), pipeline automation, and platform engineering, with specific experience configuring and maintaining IBM watsonx and Google Cloud Vertex AI environments. 
Key Responsibilities 
Platform Engineering & Operations 
- Lead the provisioning, configuration, and ongoing support of IBM watsonx and Google Cloud Vertex AI platforms (see the sketch after this list).
- Ensure platforms are production-ready, secure, cost-efficient, and performant across training, inference, and orchestration workflows. 
- Manage lifecycle tasks such as patching, upgrades, integrations, and service reliability. 
- Partner with security, compliance, and product teams to align platforms with enterprise and regulatory standards. 
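
By way of illustration only, a minimal bootstrap of the kind this role would own, using the Vertex AI Python SDK (the project ID, region, and bucket below are placeholders, not details of our environment):

```python
# Minimal Vertex AI bootstrap sketch. The project ID, region, and
# staging bucket below are placeholders, not details of our environment.
from google.cloud import aiplatform

aiplatform.init(
    project="example-ml-project",             # placeholder GCP project
    location="us-central1",                   # placeholder region
    staging_bucket="gs://example-ml-staging", # placeholder artifact bucket
)

# Smoke test: list registered models to confirm the platform is reachable.
for model in aiplatform.Model.list():
    print(model.display_name, model.resource_name)
```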
Enterprise MLOps / AIOps Enablement 
- Define and implement standardized MLOps/AIOps practices across business units for consistency and scalability. 
- Build and maintain reusable workflows for model development, deployment, retraining, and monitoring (see the sketch after this list).
- Provide onboarding, enablement, and support to AI/ML teams adopting enterprise platforms and tools. 
- Support the development and deployment of GenAI applications and maintain them at enterprise scale.
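
A sketch of what one such reusable workflow could look like, using MLflow's tracking API (the experiment name and metrics are illustrative; an enterprise setup would point at a shared tracking server rather than the local file store used here for self-containment):

```python
# Sketch of a reusable, tracked training run built on MLflow.
# Uses a local file store for self-containment; an enterprise setup
# would point set_tracking_uri at a shared tracking server instead.
import mlflow

mlflow.set_tracking_uri("file:./mlruns")
mlflow.set_experiment("example-experiment")  # placeholder experiment name

def tracked_run(params: dict, train_fn) -> dict:
    """Run train_fn under MLflow tracking, logging its params and metrics."""
    with mlflow.start_run():
        mlflow.log_params(params)
        metrics = train_fn(params)   # expected to return {"metric": value}
        mlflow.log_metrics(metrics)
        return metrics

if __name__ == "__main__":
    # Toy stand-in for a real training function.
    tracked_run({"learning_rate": 0.1}, lambda p: {"auc": 0.91})
```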
DevSecOps Integration 
- Embed security and compliance guardrails across the ML lifecycle, including CI/CD pipelines and IaC templates. 
- Implement policy-as-code, access controls, vulnerability scanning, and automated compliance checks (an illustrative check follows this list).
- Ensure all deployments meet enterprise and regulatory requirements (HIPAA, SOX, FedRAMP, etc.). 
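
Policy-as-code is typically authored in a dedicated language such as OPA's Rego; purely to illustrate the idea, the plain-Python sketch below gates a deployment manifest against two assumed rules (the manifest schema and both rules are examples, not our actual policy set):

```python
# Illustrative compliance gate for a CI/CD pipeline. The manifest
# schema and both rules are example assumptions, not our policy set.
def check_deployment(manifest: dict) -> list[str]:
    violations = []
    if not manifest.get("encryption_at_rest", False):
        violations.append("encryption_at_rest must be enabled")
    for port in manifest.get("open_ports", []):
        if port != 443:
            violations.append(f"port {port} is not on the allowlist")
    return violations

if __name__ == "__main__":
    manifest = {"encryption_at_rest": True, "open_ports": [443, 8080]}
    problems = check_deployment(manifest)
    if problems:
        raise SystemExit("policy violations: " + "; ".join(problems))
```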
Infrastructure as Code & Automation 
- Design and maintain IaC templates (Terraform, Pulumi, Ansible, CloudFormation) for reproducible ML infrastructure (see the Pulumi sketch after this list).
- Build and optimize CI/CD pipelines for AI/ML assets including data pipelines, training workflows, deployment artifacts, and monitoring systems. 
- Enforce best practices around automation, reusability, and observability of infrastructure and workflows. 
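
As a small example of reproducible ML infrastructure, a minimal Pulumi (Python) sketch of a versioned artifact bucket; the resource name, region, and settings are placeholders chosen for illustration:

```python
# Minimal Pulumi (Python) sketch: a versioned bucket for ML artifacts.
# Resource name, region, and settings are placeholders for illustration.
import pulumi
from pulumi_gcp import storage

artifacts = storage.Bucket(
    "ml-artifacts",
    location="US",
    versioning=storage.BucketVersioningArgs(enabled=True),  # keep model history
    uniform_bucket_level_access=True,  # IAM-only access control
)

pulumi.export("artifacts_bucket_url", artifacts.url)
```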
Monitoring, Logging & Observability 
- Implement comprehensive observability for AI/ML workloads using Prometheus, Grafana, Google Cloud Monitoring (formerly Stackdriver), or Datadog (see the exporter sketch after this list).
- Monitor infrastructure health and cost (CPU, memory, spend) alongside ML-specific metrics (model drift, data integrity, anomaly detection).
- Define KPIs and usage metrics to measure platform performance, adoption, and operational health.
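
To make the ML-specific side concrete, a minimal exporter sketch using the prometheus_client library; the metric name and the drift calculation are illustrative stand-ins:

```python
# Sketch of an ML-metrics exporter built on prometheus_client.
# The metric name and compute_drift stub are illustrative stand-ins.
import random
import time

from prometheus_client import Gauge, start_http_server

DRIFT = Gauge("model_drift_score", "Drift score per deployed model", ["model"])

def compute_drift(model: str) -> float:
    return random.random()  # stand-in for a real drift calculation

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        DRIFT.labels(model="example-model").set(compute_drift("example-model"))
        time.sleep(60)
```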
Qualifications 
Education 
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
Experience 
- 5+ years in MLOps, DevOps, Platform Engineering, or Infrastructure Engineering.
- 2+ years applying DevSecOps practices (secure CI/CD, vulnerability management, policy enforcement). 
- Hands-on experience configuring and managing enterprise AI/ML platforms (IBM watsonx, Google Cloud Vertex AI).
- Demonstrated success in building and scaling ML infrastructure, automation pipelines, and platform support models.
Technical Skills 
- Proficiency with IaC tools (Terraform, Pulumi, Ansible, CloudFormation). 
- Strong scripting skills in Python and Bash.
- Deep understanding of containerization and orchestration (Docker, Kubernetes). 
- Experience with model lifecycle tools (MLflow, TFX, Vertex Pipelines, or equivalents). 
- Familiarity with secrets management, policy-as-code, access control, and monitoring tools.
- Working knowledge of data engineering concepts and their integration into ML pipelines. 
Preferred 
- Cloud certifications (e.g., GCP Professional ML Engineer, AWS DevOps Engineer, IBM Cloud AI Engineer). 
- Experience supporting platforms in regulated industries (HIPAA, FedRAMP, SOX, PCI-DSS). 
- Contributions to open-source projects in MLOps, automation, or DevSecOps. 
- Familiarity with responsible AI practices including governance, fairness, interpretability, and explainability. 
- Hands-on experience with enterprise feature stores, model monitoring frameworks, and fairness toolkits.