Middleware - SRE MQ

0 years

0 Lacs

Posted: 1 month ago | Platform: Indeed

Work Mode

On-site

Job Description

Who are we:
Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.


Responsibilities:

  • Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
  • Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health with automated alerts.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and detailed postmortems.
  • Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recovery.
  • Work with a global team spread across tech hubs in multiple geographies and time zones.
  • Share knowledge and mentor junior team members.
  • Primary skills: messaging (Kafka, MQ, NATS, Flink), configuration management tools (Chef Infra, Habitat, Ansible), CI/CD (Bitbucket, Jenkins, XLR), scripting (Shell, Python), and programming language basics (Java).
    Secondary skills: event management tools (Splunk, Dynatrace, Prometheus) and cloud (AWS preferred).


Qualifications:

  • BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
  • Experience with algorithms, data structures, scripting, pipeline management, and software design.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to help debug, optimize code, and automate routine tasks.
  • We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
  • Experience in one or more of the following is preferred: Python, Go, Bash Scripting.
  • Interest in designing, analysing, and troubleshooting large-scale distributed systems.
  • We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
  • For work on our ops team, we need engineers with experience in industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Chef. Experience designing and implementing an effective, efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating.
  • Practice sustainable incident response and blameless post-mortems
  • For team members supporting the DevOps pipeline:
  • Design, implement, and enhance our deployment automation based on Chef. We need proven experience writing Chef recipes/cookbooks as well as designing and implementing an overall Chef-based release and deployment process.
  • Use Jenkins to orchestrate builds as well as link to Sonar, Chef, Maven, Artifactory, etc. to build out the CI/CD pipeline.
  • Support deployments of code into multiple lower environments. Support current processes as needed, with an emphasis on automating everything as soon as possible.
  • Design and implement a Git-based code management strategy that will support multiple environment deployments in parallel. Experience with automation for branch management, code promotions, and version management is a plus.


Requirements

  • Proficiency in languages like Python, Go, Java, or Bash for automation scripts, tools, and integrations. Involves writing clean, maintainable code, debugging, API interactions, version control (e.g., Git), and unit testing.
  • In-depth knowledge of Linux/Unix. Includes managing processes, file systems, permissions, kernel tuning, shell scripting, server configuration, updates, and security.
  • Expertise in cloud platforms (AWS, GCP, Azure) and tools like Terraform or CloudFormation for infrastructure as code (IaC). Includes managing virtual machines, serverless architectures, and container orchestration (e.g., Kubernetes, ECS) for scalability and high availability.
  • Understanding of TCP/IP, HTTP, DNS, load balancing, VPNs, and firewalls. Includes configuring network services and troubleshooting with tools like Wireshark or traceroute.
  • Proficiency in tools like Splunk, Dynatrace, Prometheus, Grafana, Datadog, Jaeger/Zipkin for logs, metrics, and tracing. Involves defining Service Level Indicators (SLIs), setting Service Level Objectives (SLOs), and creating dashboards for system health insights.
  • Experience with CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions. Includes automating build, test, and deployment processes, as well as rollback mechanisms for reliability.
  • Skills in diagnosing and resolving production issues using logs, metrics, and debugging tools. Includes incident management (e.g., PagerDuty), root cause analysis (RCA), and blameless postmortems.
  • Expertise in managing and operating Apache Kafka, NATS and MQ. Includes configuring topics (Kafka) or subjects (NATS), ensuring high availability, scaling clusters, monitoring performance metrics (e.g., consumer lag, throughput), and troubleshooting issues like message loss or latency. Involves understanding partitioning (Kafka) and pub/sub patterns (NATS) for event streaming and messaging.


Benefits

    Job Opening ID: RRF_5492
    Job Type: Permanent
    Industry: IT Services
    Date Opened: 10/07/2025
    City: Pune
    Province: Maharashtra
    Country: India
    Postal Code: 411057
