Lead Site Reliability Engineer - DevOps

3 - 7 years

0 Lacs

Posted: 3 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

You will work as a Site Reliability Engineer on Qualys Cloud Platform and Middleware technologies, combining software development and systems engineering skills to build and run scalable, distributed, and fault-tolerant systems.

The primary responsibilities of this role are as follows:

- Co-develop and participate in the full development lifecycle of cloud platform services, from inception and design to deployment, operation, and improvement, by applying scientific principles.
- Increase the effectiveness, reliability, and performance of cloud platform technologies by identifying and measuring key indicators, making automated changes to production systems, and evaluating the results.
- Support the cloud platform team before technologies are released to production, through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing and verification process.
- Ensure that cloud platform technologies are properly maintained by measuring and monitoring availability, latency, performance, and system health.
- Advise the cloud platform team on improving the reliability of production systems and scaling them based on need.
- Participate in the development process by supporting new features, services, and releases, and maintain an ownership mindset for the cloud platform technologies.
- Develop tools and automate processes for large-scale provisioning and deployment of cloud platform technologies.
- Participate in the on-call rotation for cloud platform technologies. During incidents, lead the incident response and contribute to detailed postmortem reports that are brutally honest and blameless.
- Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting, and root cause analysis.

As a qualified candidate, you should meet the following requirements:

- 3 years of relevant experience running distributed systems at scale in production.
- Expertise in one of the following programming languages: Java, Python, or Go.
- Proficiency in writing bash scripts.
- Good understanding of SQL and NoSQL systems.
- Good understanding of systems programming (network stack, file system, OS services).
- Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, and VLANs.
- Skill in identifying performance bottlenecks and anomalous system behavior, and in determining the root cause of incidents.
- Knowledge of JVM concepts such as garbage collection, heap, stack, profiling, and class loading.
- Knowledge of best practices related to security, performance, high availability, and disaster recovery.
- A proven record of handling production issues, planning escalation procedures, and conducting postmortems, impact analyses, risk assessments, and related procedures.
- Ability to drive results and set priorities independently.
- BS/MS degree in Computer Science, Applied Math, or a related field.

Bonus points if you have experience with:

- Managing large-scale deployments of search engines such as Elasticsearch.
- Managing large-scale deployments of message-oriented middleware such as Kafka.
- Managing large-scale deployments of RDBMS systems such as Oracle.
- Managing large-scale deployments of NoSQL databases such as Cassandra.
- Managing large-scale deployments of in-memory caches such as Redis and Memcached.
- Container and orchestration technologies such as Docker and Kubernetes.
- Monitoring tools such as Graphite, Grafana, and Prometheus.
- HashiCorp technologies such as Consul, Vault, Terraform, and Vagrant.
- Configuration management tools such as Chef, Puppet, or Ansible.
- Continuous integration and continuous deployment pipelines (in-depth experience).
- Maven, Ant, or Gradle for builds.

Qualys

Computer and Network Security

Foster City CA
