Staff Site Reliability Engineer (8+ years)

5 - 7 years

Posted: 3 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose to uplift everyone, everywhere by being the best way to pay and be paid.

Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

  • Expert-level proficiency operating large-scale, distributed, mission-critical systems: designing for high availability, multi-region resiliency, low latency, and predictable performance under extreme load.

  • SRE fundamentals at Staff level: defines and drives SLOs/SLIs, error budgets, availability targets, and capacity guardrails; codifies reliability requirements into design reviews and change-management gates.

  • Deep hands-on with Kubernetes and container platforms: multi-cluster operations, workload placement, HPA/VPA, pod disruption budgets, network policies, admission control, service mesh (Istio/Linkerd), and progressive delivery (blue/green, canary, feature flags).

  • Infra as Code and GitOps: Terraform (and/or Pulumi), Helm/Kustomize, Argo CD/Flux; builds reusable modules, policy-as-code (OPA/Conftest), environment drift detection, and automated remediation.

  • Observability at scale: OpenTelemetry instrumentation/tracing, metrics (Prometheus), logging (ELK/OpenSearch), distributed tracing (Jaeger/Tempo/Zipkin), dashboards and SLO burn-rate alerts (Grafana); designs actionable alerts with runbook automation.

  • Proven incident leadership: serves as Incident Commander for P0/P1 events, coordinates cross-functional response, stabilizes systems, restores service quickly, and drives blameless postmortems with measurable follow-through.

  • Performance engineering and capacity planning: load and resilience testing, GC/heap and thread tuning (for JVM services), profiling (CPU, memory, IO), caching strategies, queue backpressure, and cost-aware capacity models.

  • Strong systems and networking: Linux internals, filesystems, TCP/UDP, TLS/mTLS, HTTP/2 and HTTP/3, DNS, BGP/Anycast concepts, L4-L7 load balancing (Envoy/HAProxy/NGINX), CDN/edge (Cloudflare/Fastly/Akamai), WAF, and DDoS mitigation.
  • Data store reliability: operational experience with relational (PostgreSQL/MySQL/Oracle) and NoSQL (Cassandra/DynamoDB/MongoDB) databases, streaming platforms (Kafka/Pulsar/Kinesis), and distributed caches (Redis/Hazelcast); backup/restore, consistency models, compaction/retention tuning, and multi-AZ/region failover.
  • Cloud and platform engineering: AWS/Azure/GCP core services, VPC design, IAM/RBAC, KMS, secrets management (Vault), service catalog, golden images/base containers, and paved-road platforms for developers.
  • Release engineering and CI/CD: Jenkins/GitHub Actions/GitLab CI, artifact management, signing, and SBOM generation, canary analysis, automated rollbacks, deployment safety checks, and improvements to change failure rate and MTTR.
  • Reliability-by-design partnership: participates in and leads architecture/design reviews, threat modeling, and resilience patterns (bulkheads, circuit breakers, idempotency, retry/backoff, dead-letter handling).
  • Disaster recovery and business continuity: RTO/RPO objectives, runbooks, game days/chaos experiments (Litmus/Gremlin), regional evacuation, and active-active/active-passive strategies.
  • Security in depth for production systems: least privilege, workload identity, image and dependency scanning, supply-chain hardening (SLSA), SBOM, network segmentation/zero trust, and PCI-DSS-aligned operational controls.
  • Strong programming and automation: production-grade Go and/or Python (plus Bash); contributes SRE tooling, controllers/operators, and APIs; practices code reviews, testing, and docs-as-code.
  • Effective communicator and influencer: aligns reliability strategy with business outcomes, mentors engineers, challenges assumptions with data, and proposes pragmatic, incremental improvements.
  • Experience leveraging GenAI/LLMs as copilots: accelerating runbook authoring, alert triage, knowledge retrieval, and post-incident synthesis with appropriate guardrails and data security.
  • Nice to have: JVM and Node.js runtime tuning experience; traffic engineering at Internet scale; mobile edge/network reliability considerations.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications

Basic Qualifications
5+ years of relevant work experience with a Bachelor's Degree or at least 2 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD) or 0 years of work experience with a PhD, OR 8+ years of relevant work experience.

Preferred Qualifications
5+ years of relevant work experience with a Bachelor's Degree or at least 2 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD) or 0 years of work experience with a PhD, OR 8+ years of relevant work experience.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Visa

IT Services and IT Consulting

Foster City, California