Cloud Operations & Monitoring Professional

5 - 10 years

7 - 12 Lacs

Posted: 2 months ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Cloud Ops & Monitoring Engineer
Location: Bangalore
Department: Technology
Reporting To: Cloud Infra Director

Position Overview

Tookitaki is seeking a Cloud Ops & Monitoring Engineer to ensure the stability, performance, and security of our cloud-based infrastructure across all product offerings. This role is crucial in maintaining high availability, optimizing cloud operations, and proactively monitoring our cloud environments. The ideal candidate will bring deep expertise in cloud platforms, automation, and observability tools, and will apply it to drive incident response, cost optimization, and operational efficiency.

Position Purpose

The Cloud Ops & Monitoring Engineer is responsible for monitoring, optimizing, and maintaining Tookitaki's cloud infrastructure. This role ensures high system reliability, proactive incident management, and efficient resource utilization. By leveraging automation and advanced monitoring tools, the engineer will drive operational excellence, minimize downtime, and enhance cloud security.

Key Responsibilities

Cloud Operations Management

  • Monitor and manage cloud infrastructure (AWS, GCP, Azure) for performance, availability, and security.
  • Ensure 99.99% uptime of mission-critical systems through proactive maintenance and incident resolution.
  • Implement best practices for cloud governance, cost optimization, and capacity planning.

Monitoring & Incident Response

  • Set up and maintain observability tools (Prometheus, Grafana, ELK stack, Datadog, New Relic).
  • Develop real-time monitoring and alerting mechanisms to detect anomalies before they impact operations.
  • Act as the first responder for production incidents, ensuring swift issue resolution and root cause analysis.

Automation & Infrastructure Optimization

  • Develop and maintain Infrastructure as Code (IaC) scripts (Terraform, CloudFormation) for cloud automation.
  • Automate cloud scaling, log management, and incident resolution workflows.
  • Optimize cloud environments for performance, security, and cost efficiency.

Security & Compliance Enforcement

  • Implement security best practices, including IAM policies, encryption, and vulnerability management.
  • Work closely with security teams to detect and mitigate threats in cloud environments.
  • Ensure compliance with the regulatory and industry standards that apply to global financial services (GDPR, PCI-DSS, SOC 2).

Cross-Team Collaboration & Reporting

  • Collaborate with DevOps, Security, and Development teams to enhance cloud performance.
  • Provide operational insights and reports on cloud system health, trends, and optimization opportunities.
  • Document incident reports, troubleshooting steps, and operational playbooks for continuous learning.

Key OKRs

  • Maintain 99.99% system uptime by proactively monitoring and resolving cloud incidents.
  • Reduce cloud operational costs by 20% through optimization and automation.
  • Automate 80% of cloud monitoring and alerting processes within six months.
  • Ensure 100% compliance with cloud security policies and regulatory standards.
  • Reduce MTTR (Mean Time to Resolution) for critical incidents by 30%.

Qualifications and Skills

Education

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
  • Certifications in AWS, Azure, Google Cloud, or Kubernetes (preferred).

Experience

  • 5+ years of experience in cloud operations, monitoring, or DevOps roles.
  • Proven experience in managing highly available, production-grade cloud environments.

Technical Expertise

  • Proficiency in AWS, GCP, or Azure cloud services.
  • Strong hands-on experience with monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
  • Expertise in Infrastructure as Code (IaC) tools (Terraform, CloudFormation).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Knowledge of cloud security, IAM policies, encryption, and threat detection.
  • Familiarity with CI/CD pipelines, scripting (Python, Bash), and cloud automation.

Soft Skills

  • Analytical mindset with strong troubleshooting and problem-solving abilities.
  • Excellent communication skills to work cross-functionally with multiple teams.
  • Proactive and detail-oriented, with a focus on continuous improvement.
  • Ability to work in a fast-paced, dynamic environment with tight deadlines.

Key Competencies

  • Cloud Monitoring & Performance Optimization: Ensures system health and efficiency through real-time observability.
  • Incident Management & Troubleshooting: Rapidly diagnoses and resolves production issues with minimal downtime.
  • Automation & Infrastructure Management: Implements self-healing and scalable cloud solutions.
  • Security & Compliance Awareness: Ensures adherence to regulatory standards and cloud security best practices.
  • Cross-Functional Collaboration: Works closely with engineering, security, and DevOps teams to enhance cloud operations.

Success Metrics

  • Maintain 99.99% system uptime, ensuring minimal service disruption.
  • Reduce MTTR (Mean Time to Resolution) for critical incidents by 30%.
  • Automate 80% of cloud monitoring and incident response workflows.
  • Optimize cloud resource utilization, achieving a 20% cost reduction.
  • Implement a fully operational cloud observability framework within six months.

Benefits

  • Competitive Salary: Aligned with industry standards and experience.
  • Professional Development: Access to training in big data, cloud computing, and data integration tools.
  • Comprehensive Benefits: Health insurance and flexible working options.
  • Growth Opportunities: Career progression within Tookitaki's rapidly expanding Services Delivery team.
Tookitaki

Software Development

Singapore Singapore
