Posted: 19 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

We are looking for a Principal Infrastructure Engineer who can command, design, and operate a globally distributed production environment across the US, UK, and AU regions. This role is for an architect-level engineer who still loves to get their hands dirty: the kind of person who can both lead an SRE/DevOps team and SSH into a production node to trace issues.



About the Role

This role involves end-to-end control of our multi-region infrastructure (AWS + on-prem hybrid) spanning compute, VoIP, databases, monitoring, and automation.


Responsibilities

  • Reliability and uptime for mission-critical communication systems handling millions of SIP sessions and HTTP requests daily.
  • Management and mentorship of DevOps/SRE and Support Engineers, ensuring process maturity and on-call readiness.
  • Responsibility for performance, scalability, and security across all environments.
  • Decision authority on infrastructure architecture, cost optimization, and modernization initiatives.



Qualifications

  • 10+ years in Infrastructure / DevOps / SRE roles
  • Proven expertise running production-grade, multi-region environments in AWS.
  • Deep understanding of networking, SIP signaling, NAT traversal, RTP, and media relay.
  • Proficiency in Debian/Linux internals, kernel tuning, and packet tracing.
  • Hands-on experience with WAF, ModSecurity, fail2ban, iptables, and system hardening.
  • Solid background in Ansible, AWX, Python (2.7/3.x), and Bash scripting.
  • Experience managing Grafana Loki, Prometheus, and alerting frameworks.
  • Familiarity with Docker / Docker Compose / Terraform for repeatable infra.
  • Prior experience leading cross-functional infrastructure teams in 24x7 environments.



Required Skills

  • Architect, operate, and continuously improve AWS environments (EC2, EKS, RDS, Route 53, VPCs, IAM, S3, Lambda, CloudFront).
  • Maintain multi-layer high-availability VoIP servers.
  • Design and enforce multi-region DR and failover policies with automated recovery.
  • Manage clusters for MariaDB, MongoDB, Redis, and RabbitMQ.
  • Own and optimize Kamailio, Asterisk, RTPProxy, and RTPEngine stacks — ensuring consistent SIP routing, NAT traversal, and call resilience across regions.
  • Build SIP-level observability: tracing dialogs, RTP flows, and registrations through Grafana, Loki, and sngrep pipelines.
  • Lead continuous performance and load testing with SIPp and Playwright-based automation frameworks.
  • Enforce multi-layer security using ModSecurity, fail2ban, iptables, and real-time log-based intrusion prevention.
  • Build proactive defenses against SIP scanning, brute-force attacks, and zero-day threats.
  • Maintain end-to-end TLS, cert rotation, and infrastructure audit trails.
  • Drive compliance with regional (US/UK/AU) data protection requirements.
  • Manage a large-scale Grafana + Loki + Prometheus monitoring stack.
  • Correlate VoIP, network, and application metrics for unified visibility.
  • Build actionable alerts and dashboards for call traffic, system health, and anomaly detection.
  • Implement distributed tracing and incident replay for debugging call failures.
  • Maintain full configuration automation via Ansible + AWX with disaster recovery playbooks.
  • Build self-healing routines for proxy or service recovery via scripts and cron jobs.
  • Design CI/CD pipelines for infrastructure code, ensuring rapid, controlled deployments.
  • Implement chaos testing and resilience benchmarking to proactively harden infrastructure.
  • Lead a DevOps and Support Engineering team, setting standards for incident response, documentation, and reliability.
  • Conduct performance reviews and mentor team members on debugging, SIP analysis, and cloud automation.
  • Establish a culture of observability and ownership — no “restart and pray” mentality.
  • Work directly with development teams (Laravel, Node.js, React) to ensure application-level readiness for scale.



Preferred Skills

  • Experience with cloud-native technologies and microservices architecture.
  • Familiarity with Agile methodologies and DevOps practices.
