Key Responsibilities

System Architecture & Event-Driven Design
• Design and implement event-driven architectures using Apache Kafka to orchestrate distributed microservices and streaming pipelines.
• Define scalable message schemas (e.g., JSON/Avro), data contracts, and versioning strategies to support AI-powered services.
• Architect hybrid event-driven and request-response systems that balance real-time streaming with synchronous business logic.

Backend & AI/ML Integration
• Develop Python-based microservices using FastAPI that serve both standard business logic and AI/ML model inference endpoints.
• Collaborate with AI/ML teams to operationalize ML models (e.g., classification, recommendation, anomaly detection) via REST APIs, batch processors, or event consumers.
• Integrate model-serving platforms such as SageMaker, MLflow, or custom Flask/ONNX-based services.

Cloud-Native & Serverless Deployment (AWS)
• Design and deploy cloud-native applications using AWS Lambda, API Gateway, S3, CloudWatch, and optionally SageMaker or Fargate.
• Build AI/ML-aware pipelines that trigger retraining, inference, or model selection automatically in response to data events.
• Implement autoscaling, monitoring, and alerting for high-throughput AI services in production.

Data Engineering & Database Integration
• Ingest and manage high-volume structured and unstructured data across MySQL, PostgreSQL, and MongoDB.
• Enable AI/ML feedback loops by capturing usage signals, predictions, and outcomes via event streaming.
• Support data versioning, feature store integration, and caching strategies for efficient handling of ML model inputs.

Testing, Monitoring & Documentation
• Write unit, integration, and end-to-end tests for both standard services and AI/ML pipelines.
• Implement tracing and observability for AI/ML inference latency, success/failure rates, and data drift.
• Document ML integration patterns, input/output schemas, service contracts, and fallback logic for AI systems.
Preferred Qualifications
• 6+ years of backend software development experience, with 2+ years in AI/ML integration or MLOps.
• Strong experience in productionizing ML models for classification, regression, or NLP use cases.
• Experience with streaming data pipelines and real-time decision systems.
• AWS Certifications (Developer Associate, Machine Learning Specialty) are a plus.
• Exposure to data versioning tools (e.g., DVC), feature stores, or vector databases is advantageous.
About the Role
We are seeking an experienced AWS Cloud Architect with deep expertise in event-driven solutions and multi-account cloud strategy. The ideal candidate will design and implement scalable, secure, and cost-optimized AWS architectures that support real-time workloads, multiple projects, and organizational governance. This role requires both hands-on technical expertise and the strategic vision to align cloud adoption with business priorities.

Key Responsibilities

Event-Driven Architecture
• Architect and deliver event-driven systems leveraging Amazon EventBridge, SNS, SQS, Kinesis, Lambda, and Step Functions.
• Apply event sourcing, CQRS, and pub/sub patterns to build scalable, decoupled, and resilient systems.
• Develop real-time data pipelines for IoT, AI/ML, analytics, and transactional applications.

Cloud Strategy & Governance
• Define and manage the AWS multi-account strategy using AWS Organizations, Control Tower, and Service Control Policies (SCPs).
• Establish account structures for dev/test, production, and shared services.
• Provide guidance on managing multiple projects within a single AWS account through IAM, VPC segmentation, and tagging policies.
• Ensure cloud adoption aligns with organizational governance and compliance frameworks.

Cost Optimization
• Implement cost management practices using AWS Cost Explorer, AWS Budgets, and AWS Trusted Advisor.
• Create tagging standards and cost allocation models across projects and departments.
• Optimize resources via autoscaling, right-sizing, Spot Instances, and Savings Plans.
• Establish chargeback/showback models for financial transparency and accountability.

Security & Compliance
• Enforce least-privilege IAM access, SCPs, and automated guardrails.
• Centralize logging, monitoring, and threat detection (CloudTrail, GuardDuty, Security Hub).
• Ensure compliance with industry standards and regulations (PCI DSS, HIPAA, SOC 2, GDPR).
• Design secure event flows with encryption, key rotation, and monitoring.
Collaboration & Leadership
• Partner with engineering, product, and operations teams to drive cloud-first, event-driven adoption.
• Lead POCs, reference architectures, and innovation initiatives for new event-driven technologies.
• Train and mentor teams on event-driven principles, multi-account best practices, and FinOps awareness.
• Act as a cloud evangelist, aligning stakeholders around a long-term AWS strategy.

Qualifications

Required Skills
• Strong expertise in AWS event-driven services (EventBridge, SNS, SQS, Kinesis, Lambda, Step Functions).
• Proven experience with AWS multi-account management (Organizations, Control Tower, SCPs, IAM).
• Solid knowledge of cost optimization strategies (tagging, chargeback/showback, Reserved/Spot Instances).
• Proficiency in Infrastructure as Code (Terraform, AWS CloudFormation, AWS CDK).
• Deep understanding of security, networking, and compliance in AWS environments.
• Strong communication and leadership skills, with the ability to engage both technical and executive stakeholders.

Preferred Skills
• Experience with Apache Kafka/Amazon MSK or other event-streaming platforms.
• Familiarity with FinOps practices and cloud economics.
• Background in enterprise-scale migrations to event-driven architectures.

Certifications (Preferred)
• AWS Certified Solutions Architect – Professional
• AWS Certified DevOps Engineer – Professional
• AWS Certified Advanced Networking – Specialty
• FinOps Certified Practitioner (bonus)

Experience
• 7+ years in IT, with 4+ years in AWS cloud architecture.
• Proven experience delivering enterprise-scale event-driven solutions.
• Hands-on background in multi-account strategy, governance, and cost optimization.