We're hiring an ML Engineer to work alongside Data Scientist(s) and support a leading client in the ad tech domain. You will own the infrastructure, low-latency APIs, data pipelines, deployment, and reliability of recommendation and ranking models in production. You'll be the bridge between data science and engineering: taking prototypes from the Data Scientist and turning them into robust, low-latency, high-availability services that operate at ad-tech scale. You should be comfortable with asynchronous communication (written updates, docs, Slack-style collaboration) with both the client and our internal team across time zones.
The Candidate Will Have Responsibilities Across The Following Functions
Model Productionization And Serving
- Design, build, and maintain low-latency APIs for serving recommendation and ranking models.
- Take Data Scientist-built models (in Python) and productionize them for real-time or near-real-time serving.
- Implement and maintain model serving endpoints (e.g., using SageMaker, Vertex AI, custom Docker/Kubernetes-based services, or similar).
- Optimise for low latency and high throughput, suitable for ad-serving workloads.
Feature Pipelines And Data Engineering
- Design and build feature pipelines for training and inference:
- Batch pipelines using tools like Airflow, dbt, Beam, or Spark.
- Streaming / real-time features using Kafka, Pub/Sub, etc.
- Design, integrate with, or operate an online feature store to serve low-latency features for real-time scoring.
- Ensure training-serving skew is minimised; maintain clear contracts for feature definitions and data schemas.
Infrastructure And MLOps
- Implement CI/CD for ML models and pipelines (e.g., GitHub Actions, GitLab CI, Cloud Build, etc.).
- Manage containerization and deployment using Docker and Kubernetes (or managed equivalents).
- Set up and maintain model versioning, configuration management, and rollback strategies.
Monitoring, Observability And Reliability
- Work with the Data Scientist to define metrics and implement monitoring for:
- Model performance (prediction distribution, drift, business KPIs).
- System performance (latency, error rates, resource utilisation).
- Data quality (schema checks, nulls, outliers, volume anomalies).
- Build alerting and logging using the client's stack (e.g., Prometheus, Grafana, Cloud Monitoring, CloudWatch, etc.).
- Investigate and resolve production issues, from infrastructure to data to model-related problems.
Experimentation Platform Support
- Integrate models with the client's A/B testing/experimentation framework.
- Implement traffic splits, routing logic, and variant toggles (feature flags).
- Ensure metrics and logs needed for experiment analysis are correctly captured and accessible.
Collaboration And Client Interaction
- Work closely with the Data Scientist to understand modelling assumptions and requirements.
- Collaborate with the client Product and Engineering teams to align on SLAs, integration points, and architectural choices.
- Participate in technical discussions with client partners; communicate trade-offs and propose pragmatic solutions.
- Provide clear async updates (tickets, comments, design docs, status summaries) so both the client and internal teams stay aligned without needing constant meetings.
Requirements
- Experience: 2-5 years as an ML Engineer / Data Engineer / Software Engineer working on ML-heavy systems.
- Programming: Strong skills in Python.
- Cloud: Hands-on experience with GCP or AWS.
- Data Engineering: Experience building and operating data pipelines (batch and/or streaming) using tools like Airflow, dbt, Beam, Spark, or similar.
- MLOps / Infra: Experience with:
- Containerization (Docker) and orchestration (Kubernetes or managed alternatives).
- CI/CD for services or ML workflows.
- Monitoring/logging tools (Prometheus, Grafana, CloudWatch, Stackdriver, etc.).
- Collaboration and Communication:
- Comfortable working in a remote, async-first environment: writing good design docs, giving structured written updates, and collaborating over Slack/email/tickets with distributed teams.
Nice-to-Have
- Experience with real-time / low-latency systems, especially in ad tech, recommendation, ranking, or search.
- Familiarity with feature stores and online feature serving.
- Familiarity with online experimentation frameworks and traffic routing for A/B tests.
- Familiarity with model registries and ML platforms (e.g., MLflow, SageMaker, Vertex AI Pipelines).
- Comfort reading Data Scientist code/notebooks and refactoring them into clean, production-ready modules.
This job was posted by Akshay Singh from Yugen.ai.