8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location: PAN India
Duration: 12 Months (Extendable)
Shift: Rotational shifts, including night shifts and weekend availability
Years of Experience: 8+ Years

Job Summary
We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment.

Key Responsibilities

Site Reliability Engineering
Design, build, and maintain scalable, reliable infrastructure.
Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet.
Develop automation tools/scripts in Python, Go, Java, or Bash.
Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers).
Deploy and manage infrastructure on AWS or Kubernetes platforms.
Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD).
Monitor production systems with tools such as Prometheus, Grafana, Nagios, and Datadog.
Conduct postmortems and define SLAs/SLOs to ensure high system reliability.
Plan and implement capacity management, failover systems, and auto-scaling mechanisms.

Observability Engineering
Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc.
Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb).
Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools.
Build actionable alerts and dashboards to reduce alert fatigue and increase insight.
Advocate for observability best practices with developers and define performance KPIs.

Required Skills & Qualifications
Proven experience as an SRE or Observability Engineer in production environments.
Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes).
Proficient in scripting and automation (Python, Go, Bash, Java).
Expertise in observability, monitoring, and alerting systems.
Experience in Infrastructure as Code (IaC) and modern CI/CD practices.
Strong troubleshooting skills and ability to respond to live production issues.
Comfortable with rotational shifts, including nights and weekends.

Mandatory Technical Skills
Ansible
AWS Automation Services
AWS CloudFormation
AWS CodePipeline
AWS CodeDeploy
AWS DevOps Services
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location : PAN India Duration : 12 Months (Extendable) Shift : Rotational shifts including night shifts and weekend availability Years of Experience : 8+ Years π§ Job Summary We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment. π― Key Responsibilities Site Reliability Engineering Design, build, and maintain scalable, reliable infrastructure. Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet. Develop automation tools/scripts in Python, Go, Java, or Bash. Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers). Deploy and manage infrastructure on AWS or Kubernetes platforms. Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD). Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog. Conduct postmortems and define SLAs/SLOs to ensure high system reliability. Plan and implement capacity management, failover systems, and auto-scaling mechanisms. Observability Engineering Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc. Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb). Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools. Build actionable alerts and dashboards to reduce alert fatigue and increase insight. Advocate for observability best practices with developers and define performance KPIs. β Required Skills & Qualifications Proven experience as an SRE or Observability Engineer in production environments. Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes). Proficient in scripting and automation (Python, Go, Bash, Java). 
Expertise in observability, monitoring, and alerting systems. Experience in Infrastructure as Code (IaC) and modern CI/CD practices. Strong troubleshooting skills and ability to respond to live production issues. Comfortable with rotational shifts, including nights and weekends. π Mandatory Technical Skills Ansible AWS Automation Services AWS CloudFormation AWS CodePipeline AWS CodeDeploy AWS DevOps Services Show more Show less
Posted 11 hours ago
8.0 years
0 Lacs
Greater Kolkata Area
On-site
Work Location: PAN India
Duration: 12 Months (Extendable)
Shift: Rotational shifts including night shifts and weekend availability
Years of Experience: 8+ Years

Job Summary
We are seeking an experienced and versatile Site Reliability Engineer (SRE) / Observability Engineer to join our project delivery team. The ideal candidate will bring a deep understanding of modern cloud infrastructure, monitoring tools, and automation practices to ensure system uptime, scalability, and performance across a distributed environment.

Key Responsibilities

Site Reliability Engineering
- Design, build, and maintain scalable, reliable infrastructure.
- Automate provisioning/configuration using tools like Terraform, Ansible, Chef, or Puppet.
- Develop automation tools/scripts in Python, Go, Java, or Bash.
- Administer and optimize Linux/Unix systems and network components (TCP/IP, DNS, load balancers).
- Deploy and manage infrastructure on AWS or Kubernetes platforms.
- Build and maintain CI/CD pipelines (e.g., Jenkins, ArgoCD).
- Monitor production systems with tools such as Prometheus, Grafana, Nagios, Datadog.
- Conduct postmortems and define SLAs/SLOs to ensure high system reliability.
- Plan and implement capacity management, failover systems, and auto-scaling mechanisms.

Observability Engineering
- Instrument services for metrics/logs/traces using OpenTelemetry, Prometheus, Jaeger, etc.
- Manage observability stacks (e.g., Grafana, ELK Stack, Splunk, Datadog, Honeycomb).
- Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation tools.
- Build actionable alerts and dashboards to reduce alert fatigue and increase insight.
- Advocate for observability best practices with developers and define performance KPIs.

Required Skills & Qualifications
- Proven experience as an SRE or Observability Engineer in production environments.
- Strong Linux/Unix and cloud infrastructure skills (especially AWS, Kubernetes).
- Proficient in scripting and automation (Python, Go, Bash, Java).
- Expertise in observability, monitoring, and alerting systems.
- Experience in Infrastructure as Code (IaC) and modern CI/CD practices.
- Strong troubleshooting skills and ability to respond to live production issues.
- Comfortable with rotational shifts, including nights and weekends.

Mandatory Technical Skills
Ansible, AWS Automation Services, AWS CloudFormation, AWS CodePipeline, AWS CodeDeploy, AWS DevOps Services
Posted 11 hours ago
The job market for reliability professionals in India is growing rapidly as companies across various industries recognize the importance of ensuring their systems and products are dependable and consistent. Reliability jobs in India offer competitive salaries, career progression opportunities, and the chance to work on cutting-edge technologies.
The average salary range for reliability professionals in India varies based on experience level:
- Entry-level: ₹4-6 lakhs per annum
- Mid-level: ₹8-12 lakhs per annum
- Experienced: ₹15-20 lakhs per annum
In the field of reliability, a typical career path may include roles such as:
1. Junior Reliability Engineer
2. Reliability Engineer
3. Senior Reliability Engineer
4. Reliability Manager
5. Director of Reliability
In addition to expertise in reliability, professionals in this field are often expected to have or develop skills in:
- Data analysis
- Programming languages like Python or R
- Statistical modeling
- Root cause analysis
As you prepare for interviews in the field of reliability, remember to showcase your problem-solving skills, technical knowledge, and experience in improving system dependability. Stay confident, stay curious, and best of luck in your job search journey!