As part of our mission to leverage AI for business innovation, we have established an AI Center of Excellence (AI COE) to develop Generative AI (GenAI) and Agentic AI solutions that enhance decision-making, automation, and user experiences.
Job Overview
We are seeking dynamic and talented individuals to join our AI COE to design, architect, and drive scalable AI systems. This team will focus on developing advanced AI solutions, integrating them into our cloud-based platform, and delivering results that improve efficiency, innovation, and customer value.
Role Overview
We are seeking a visionary Principal AI Solution Architect to lead the design and implementation of our enterprise AI solution. You will be the primary technical authority responsible for translating our 4-layer architecture (Foundational Infrastructure, Data Ingestion, AI Feature Development, and API Consumption) into a scalable, secure, and high-performance reality.
As a cloud-agnostic leader, you will leverage AWS/Azure native services alongside open-source tools to build a platform capable of supporting diverse AI features, including Generative AI, NLP, and Pattern Recognition.
Key Responsibilities
- Foundational & Infrastructure Layer
  - Orchestration: Design a containerized environment using Kubernetes (EKS/AKS) to host modular AI services.
  - Observability: Implement end-to-end monitoring, logging, and distributed tracing (e.g., Prometheus, Grafana, OpenTelemetry) to track model performance and system health.
  - Security & IAM: Architect a zero-trust security model using OIDC/OAuth2 and RBAC to protect sensitive data and model endpoints.
- Data Ingestion & Storage Layer
  - Hybrid Pipelines: Build scalable ETL/ELT pipelines to ingest structured and unstructured data from sources like Salesforce, Zendesk, and live voice/email streams.
  - Storage Strategy: Design a unified data strategy incorporating Data Lakes for raw storage and Vector Databases for high-dimensional AI retrieval (RAG).
- AI Feature & MLOps Layer
  - Lifecycle Management: Establish a robust MLOps practice, including Model Registries, automated training pipelines, and model versioning (e.g., MLflow, Kubeflow).
  - Feature Engineering: Implement a Feature Store to ensure consistent data is provided to models during both training and real-time inference.
  - AI Specialization: Provide architectural guardrails for specialized modules: OCR/NLP, Anomaly Detection, Scoring & Recommendations, and Generative AI (LLMs).
- Consumption & API Layer
  - Gateway Design: Architect a high-throughput API Gateway layer to serve internal and external users, ensuring low-latency response times and robust rate limiting.
  - Integration: Ensure seamless connectivity between the AI Workbench and external enterprise platforms.
Technical Requirements
- Cloud Platforms: Expert-level proficiency in AWS (SageMaker, Bedrock, EKS, Lambda) or Azure (Azure ML, OpenAI Service, AKS, Functions).
- AI/ML Frameworks: Deep experience with Python, PyTorch/TensorFlow, Scikit-learn, and GenAI frameworks like LangChain or LlamaIndex.
- Infrastructure as Code (IaC): Proficiency in Terraform or Pulumi for repeatable environment deployment.
- Data Technologies: Experience with Kafka/Spark for streaming and Pinecone/Milvus/Weaviate for vector storage.
- Architecture Patterns: Strong grasp of Microservices, Event-Driven Architecture, and REST/gRPC API design.
Soft Skills & Leadership
- Strategic Thinking: Ability to balance "cutting-edge" AI with "production-ready" stability and cost-efficiency.
- Communication: Capable of explaining complex AI trade-offs (e.g., Latency vs. Accuracy) to non-technical stakeholders.
- Mentorship: Experience leading and scaling cross-functional teams of Data Engineers, ML Engineers, and DevOps Engineers.
Preferred Qualifications
- 12+ years of overall experience, including 8+ years in Software/Systems Architecture.
- 3+ years specifically leading AI/ML production initiatives.
- Relevant Cloud Certifications (AWS Certified Solutions Architect – Professional or Azure Solutions Architect Expert).
Skills: AWS, LlamaIndex, Terraform, LangChain, IAM, Azure, Infrastructure as Code (IaC), Orchestration, Python, API, Kubernetes, EKS