Posted: 2 months ago
Remote
Full Time
Senior ML Ops Engineer
About the company
It is an Artificial Intelligence company bringing the speed and insight of Applied AI to
visual assessment. Trained on millions of data points, our AI-powered solutions connect
everyone involved in insurance, repairs, and sales of homes and cars – helping people work
faster and smarter, while reducing friction and waste.
Founded in 2014, it is now the AI tool of choice for world-leading insurance and
automotive companies. Our solutions unlock the potential of Applied AI to transform the
whole recovery ecosystem, from assessing damage and accelerating claims and repairs to
recycling parts. They help make response to recovery up to ten times faster – even after
full-scale disasters like floods and hurricanes.
We're a diverse team, uniting individuals of over 40 different nationalities and from varied
backgrounds, with machine learning researchers and motor engineers collaborating
on a daily basis. We empower each team member to have tangible impact and grow their
own scope by intentionally building a culture centred around collaboration, transparency,
autonomy and continuous learning.
What you will do
The ML Foundations team builds tools and services for our internal customers: research,
product, engineering and operations specialists.
We have three teams that tackle different aspects of this space: ML Applications, Data
Operations and ML Infrastructure. You'll collaborate with peer teams to enhance, build and
maintain the ML infrastructure stack.
We are looking for a Senior ML Ops Engineer to build and support systems that
enable the core mission of the company - to make applied AI possible - by optimising the
end-to-end Machine Learning life cycle. The vision of the ML Infrastructure team is to enable
researchers to spend 80%+ of their time solving tricky ML problems rather than dealing with
engineering/infra/ops challenges.
You will help mature our ML and data platform to a world-class state. You will influence the
scope and technical direction as well as champion best practices within the team. You have
a relentless focus on user experience (researchers, data scientists and product engineers)
and you care deeply about what your team is building to make sure it will have the biggest
impact on your users. You will be a strong mentor, nurturing an encouraging and supportive
environment to enable the team to do their best work.
The role:
You'll play a key role in developing our ML & data platform from the ground up, as part of a
small but high-performing team. You will continuously pursue clean code practices
and contribute towards overall platform architecture, collaborating with our other Engineering
and Product teams.
You will:
● Work with engineers, researchers and data scientists to build the next generation of
Tractable’s ML & data platform
● Help identify and realise capabilities in our ML & data platform that massively speed
up getting research to production across dataset & model management, model
training, model serving, labelling, data & ML pipeline orchestration and more
● Support Research and Product Engineers with tools and processes to enable a
seamless data flywheel
● Deploy and continuously develop robust infrastructure, using best practices for
managing infrastructure-as-code
● Solve cost and performance scalability challenges in both model training and model
serving
● Run, monitor and maintain business-critical, production systems
● Adopt open-source technologies to best leverage our in-house resources
● Promote engineering best practices throughout the team
● Suggest, collect and synthesise requirements to create an effective feature roadmap
Tech Stack:
We rely heavily on the following tools and technologies, but we are likely to explore new
technologies / frameworks as we are building the platform from the ground up. You don't need to
have prior experience in all of them, and we actively encourage diverse views on what the
best tools for the job are. We’re just keen to know that you're willing to break things, fix
things, learn fast and help build a great team that is capable of building a platform that
delights our customers.
● Main Infrastructure: AWS (EC2, S3, MSK, Lambda, Step Functions, Glue, IAM,
Cognito, Systems Manager, CloudWatch, SQS, Route 53, SageMaker), Apache
Kafka (AWS MSK), Kubernetes, Datadog (Metrics, Logs, Synthetics), PagerDuty,
Loki, Elasticsearch
● Main CI/CD: Terraform, Docker, Harness
● Main Databases: Postgres / RDS, Redis, DynamoDB
● Main Languages: Python, Node + TypeScript, SQL (Postgres)
● Main Data stack: AWS MSK, AWS Lambda, AWS Redshift, dbt, Airflow, Airbyte, AWS
Glue
● Main ML stack: Triton, TF Serving, KServe, AWS SageMaker, AWS Lambda, AWS
MSK, sync/async APIs, Weights & Biases, TensorFlow, PyTorch, DVC, Dagster/Flyte,
Streamlit
What you need to be successful:
You are a strong ML Engineer who is passionate about building platforms that massively reduce
the lead time for bringing Machine Learning research to production. You have a solid background in
core software engineering principles and a good understanding of the difficulties faced by
data scientists. A few things we are particularly interested in seeing from you:
● Experience building and managing end-to-end machine learning pipelines, from
model training to deployment
● Great communication skills and a collaborative mindset
● An ability to catalyse both process and technical change in a complex, highly
cross-functional environment
● 2+ years of experience in building scalable Machine Learning systems
● Experience building and/or managing scalable data infrastructure (data
ingestion, data lake, data warehouse, data orchestration)
● Strong programming experience, from self-contained algorithms to complex object
modelling design
● Worked with Python in a professional environment for 2+ years
● Experience working with and scaling model training across GPU clusters
● Experience in building data pipelines and managing data infrastructure
● Able to design scalable, robust, fault-tolerant system architecture and compare
trade-offs (distributed systems experience a plus)
● Experience building robust, intuitive tooling to support internal users (e.g. common
ML libraries, CLIs etc.)
● Experience deploying and managing infrastructure-as-code, preferably via AWS CDK
● Numerical computing experience
● Cares about team practices and pairing; an advocate of CI/CD
● Basic ML knowledge, with experience in training computer vision models at scale
highly desirable
What’s in it for you
● Competitive salary
● Salary reviews every 6 months
● Equity
● Pension scheme
● Bupa private healthcare (full coverage)
● Flexible hours & WFH/hybrid setups
● Learning and Development budget
● Competitive maternity + paternity leave
● Daily office snacks & soft drinks
● Regular company office events such as Games Nights, Movie Nights, Lunch &
Learns, Monthly Brunch and more
Saplings HR