Posted: 3 months ago
Work from Office
Full Time
- System design, configuration, integration, deployment, and operations of Observability systems and tools. These systems collect metrics, logs, and events from many backend services deployed across multiple AWS accounts and regions, and are consumed by multiple teams
- Work with engineering teams to enable them to support their services from development to production
- Ensure our Observability platform exceeds goals for availability, capacity, efficiency, scalability, and performance, and meets our internal SLOs
- Build the next generation of observability, integrating with Istio
- Write libraries and APIs that provide a simple, unified interface to other developers when they use our monitoring, logging, and event processing systems
- Enhance the existing alerting capabilities with Slack, Jira, and PagerDuty
- Help build a continuous deployment system guided by metrics and data
- Bring anomaly detection into the observability stack
- Participate in a 24x7 on-call rotation after at least 6 months of employment

What you know
- Minimum of five years of experience
- Strong with Python or Go
- Cloud of choice, with a preference for AWS (Lambda, CloudWatch, IAM, EC2, ECS, S3)
- Solid understanding of Kubernetes
- Prometheus, PromQL, Thanos, Alertmanager, Grafana, etc.
- Strong knowledge of standard monitoring protocols and frameworks (Prometheus/Influx line format, SNMP, JMX, etc.)
- Elastic stack, syslog, CloudWatch Logs
- Comfortable working with git, GitHub, and common CI/CD approaches
- IaC tooling like CloudFormation or Terraform

How you do things
- Excited to use your expertise and be prescriptive about the right way forward
- Able to work well alongside SRE, platform, and development teams
- Able to work independently and know when to reach out for support
- Passionate about automation: we do everything-as-code

Other interesting things we'd cheer about
- Distributed tracing tools (e.g., Jaeger, Sentry, Zipkin, Grafana Tempo)
- Java
- Some familiarity with open observability initiatives (e.g., OpenTracing, OpenCensus, OpenMetrics)
- Knowledge of Kafka
- Familiarity with monitoring/observability in GCP and Azure
- AWS Certifications
- Comfortable with SQL

Arctic Wolf Networks
Computer and Network Security
1001-5000 Employees