(India & North Macedonia pods · United States & India internship programs · Java, Spring Boot, microservices, Angular, Flutter, Amazon Web Services, Google Cloud Platform, OVHcloud, MongoDB, MySQL, PostgreSQL · Jira with sprint-based delivery · Sub-millisecond / microsecond-class latency)
About the role
Lead two senior engineering pods in India and North Macedonia, plus coordinated internship programs in the United States and India. This player-coach role owns strategy and hands-on excellence for a platform that targets sub-millisecond and microsecond-class latency on critical paths. You will shape architecture, raise the code quality bar, and run a clear, Jira-driven sprint model that delivers predictable outcomes at extreme performance levels.
What you’ll do
- Institutionalize latency as a first-class goal: Define service-level objectives in microseconds/milliseconds (p50, p95, p99, p99.9), set per-service latency budgets, and enforce them in pull requests, load tests, and release gates.
- Architect for ultra-low latency: Evolve an API-first Spring Boot microservices platform on Kubernetes (Amazon Elastic Kubernetes Service, Google Kubernetes Engine, or OVHcloud Managed Kubernetes) with:
  - lightweight binary protocols and efficient serialization (for example, Protocol Buffers where appropriate),
  - connection pooling and keep-alive tuning,
  - zero-copy and off-heap patterns where beneficial,
  - lock-free or low-contention designs (for example, ring buffers / disruptor patterns),
  - asynchronous and reactive pipelines for back-pressure control,
  - network and operating system tuning (receive side scaling, interrupt moderation, jumbo frames where safe, non-uniform memory access awareness, thread pinning).
- Engineer the Java Virtual Machine for speed: Standardize low-pause garbage collectors (Z Garbage Collector or Shenandoah), heap sizing, just-in-time compiler warm-up, class data sharing, and profiling (Java Flight Recorder, async-profiler) with performance baselines checked into the repository.
- Data paths built for microseconds: Drive designs in MySQL, PostgreSQL, and MongoDB with partitioning/sharding, change-data-capture, prepared statements, read/write separation, hot caches (Redis), and page-cache warming. Pair these with point-in-time recovery and disaster-recovery plans that do not add latency on the happy path.
- Quality, reliability, and safety at speed: Implement contract tests, end-to-end smoke tests, progressive delivery (canary and blue-green releases), and observability with high-resolution histograms for latency. Use OpenTelemetry traces, metrics, and logs to visualize tail latency and eliminate jitter.
- Security that respects performance: Apply Transport Layer Security termination with sensible cipher choices and hardware acceleration where available; run a secure software development life cycle with static, dynamic, and software-composition security testing, plus software bills of materials and artifact signing.
- Operate the sprint system: Make Jira the source of truth, with well-formed epics, stories with acceptance criteria, two-week sprints, and ceremonies (refinement, planning, daily stand-ups, reviews, retrospectives). Publish live dashboards for velocity, burndown/burnup, cycle/lead time, throughput, and work-in-progress.
- Build and mentor player-coaches: Hire and grow Developers, Senior Developers, and a hands-on Engineering Manager at each site. Lead by example with design spikes, reference implementations, and deep code reviews.
- Run internship programs (United States and India): Create 10–12 week curricula, sandboxed backlogs, pair-programming, weekly demos, and conversion paths to full-time roles.
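To make the latency-budget responsibility above concrete, here is a minimal sketch of the kind of release gate it implies: compute percentiles from load-test samples and fail the step when any of them exceeds its budget. The class name, method names, and budget values are hypothetical and chosen only for illustration; actual budgets would be set per service.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative latency-budget gate; all names and budget values are hypothetical. */
final class LatencyBudgetGate {

    /**
     * Nearest-rank percentile over an ascending-sorted array of latencies (nanoseconds).
     * The percentile is given in basis points (9_900 = p99, 9_990 = p99.9) so the
     * rank is computed with integer math and no floating-point rounding.
     */
    static long percentile(long[] sortedNanos, int basisPoints) {
        if (sortedNanos.length == 0) {
            throw new IllegalArgumentException("no latency samples");
        }
        if (basisPoints <= 0 || basisPoints > 10_000) {
            throw new IllegalArgumentException("basisPoints must be in 1..10000");
        }
        // rank = ceil(n * basisPoints / 10000), always within 1..n
        long rank = ((long) sortedNanos.length * basisPoints + 9_999) / 10_000;
        return sortedNanos[(int) rank - 1];
    }

    /** Returns true only if every budgeted percentile is at or under its limit. */
    static boolean withinBudget(long[] samplesNanos, Map<Integer, Long> budgetNanosByBasisPoints) {
        long[] sorted = samplesNanos.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        for (Map.Entry<Integer, Long> entry : budgetNanosByBasisPoints.entrySet()) {
            if (percentile(sorted, entry.getKey()) > entry.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Synthetic samples: 1 µs, 2 µs, ..., 1000 µs. Real input would come from a load test.
        long[] samples = new long[1_000];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (i + 1) * 1_000L;
        }

        // Assumed budgets, for illustration only.
        Map<Integer, Long> budget = new LinkedHashMap<>();
        budget.put(5_000, 600_000L);   // p50   <= 600 µs
        budget.put(9_900, 1_000_000L); // p99   <= 1 ms
        budget.put(9_990, 1_000_000L); // p99.9 <= 1 ms

        boolean ok = withinBudget(samples, budget);
        System.out.println(ok ? "latency budget met" : "latency budget exceeded");
        if (!ok) {
            System.exit(1); // a non-zero exit code fails the pipeline step
        }
    }
}
```

Sorting raw samples is adequate for an offline gate over a bounded load-test run. On a production hot path, a fixed-memory high-resolution histogram would likely be a better fit than keeping every sample.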
What success looks like (6–12 months)
- Latency targets met, for example: critical in-cluster request p99 ≤ 1 millisecond; in-process hot path p99 ≤ 150–300 microseconds; end-to-end user journey p95 ≤ 50 milliseconds (exact numbers, including where in the 150–300 microsecond range each hot-path limit falls, will be finalized per service).
- Predictable delivery: ≥ 85% sprint predictability (planned versus completed) with reduced cycle time and mean time to recovery trending down quarter-over-quarter.
- Production confidence: Progressive delivery in place, service-level objectives consistently met, and zero critical vulnerabilities outstanding.
- Cost-aware performance: Measurable reduction in cost per customer or cost per transaction while maintaining latency goals.
- Talent engine: Two self-sufficient pods with strong engagement; internship programs meeting satisfaction and conversion targets.
Qualifications
- Experience: 10+ years in software engineering; 5+ years leading multi-team organizations; proven leadership of distributed pods and early-career programs.
- Low-latency depth (Java focus): Spring Boot 3.x, asynchronous/reactive design, Netty-class networking, disruptor or ring-buffer patterns, off-heap strategies, garbage-collector tuning (Z Garbage Collector or Shenandoah), and Linux performance tuning (thread pinning, non-uniform memory access awareness, kernel parameters).
- Platform: Kubernetes, Helm, Argo CD, GitHub Actions or GitLab CI/CD, and infrastructure as code with Terraform across Amazon Web Services, Google Cloud Platform, and OVHcloud.
- Data: MySQL, PostgreSQL, MongoDB, and Redis; schema design, indexing, partitioning, performance tuning, and change-data-capture.
- Observability and resilience: OpenTelemetry traces/metrics/logs; Prometheus and Grafana; Elasticsearch/Logstash/Kibana or OpenSearch; incident management with blameless postmortems.
- Security: OAuth 2.0, OpenID Connect, and JSON Web Tokens; secrets management; static/dynamic/software-composition testing; supply-chain hardening.
- Leadership: A true player-coach who can set crisp strategy, mentor managers and senior engineers, and translate microsecond-level engineering choices into business outcomes.
Our stack (you will influence and improve)
- Backend: Java 17+, Spring Boot 3.x, Spring Cloud, RESTful and GraphQL APIs
- Web/Mobile: Angular, Flutter
- Infrastructure and Cloud: Kubernetes, Helm, Argo CD, Terraform, GitHub Actions or GitLab CI/CD; Amazon Web Services; Google Cloud Platform; OVHcloud
- Data: MySQL, PostgreSQL, MongoDB; Redis for hot-path caching
- Observability and Security: OpenTelemetry; Prometheus and Grafana; Elasticsearch/Logstash/Kibana or OpenSearch; OAuth 2.0, OpenID Connect, JSON Web Tokens; Vault/Secrets Manager
- Process: Jira with Scrum/Kanban; Confluence for specifications and runbooks
Job Types: Full-time, Permanent
Pay: ₹2,913,711.81 - ₹3,581,863.33 per year
Benefits:
- Health insurance
- Life insurance
- Paid sick time
- Paid time off
- Provident Fund