Site Reliability Engineer

8 years

Work Mode: On-site

Job Type: Full Time

Job Description

Role: Site Reliability Engineer (SRE)

Experience: 8+ years

Location:

Notice Period:


About Times Internet

At Times Internet, we create premium digital products that simplify and enhance the lives of millions. As India’s largest digital products company, we have a significant presence across a wide range of categories, including News, Sports, Fintech, and Enterprise solutions.

Our portfolio features market-leading and iconic brands such as TOI, ET, NBT, Cricbuzz, Times Prime, Times Card, Indiatimes, Whatshot, Abound, Willow TV, Techgig, and Times Mobile, among many more. Each of these products is crafted to enrich your experiences and bring you closer to your interests and aspirations.

As an equal opportunity employer, Times Internet strongly promotes inclusivity and diversity. We are proud to have achieved overall gender pay parity in 2018, verified by an independent audit conducted by Aon Hewitt.

We are driven by the excitement of new possibilities and are committed to bringing innovative products, ideas, and technologies to help people make the most of every day. Join us and take us to the next level!


We are looking for a Site Reliability Engineer (SRE) to join our News Team. The SRE will be responsible for maintaining the reliability, scalability, and performance of our critical infrastructure, ensuring high availability for our services.


Job Role:

As a Site Reliability Engineer (SRE) in the News Team, you will be responsible for ensuring the stability, performance, and scalability of our systems. You will play a key role in various migration activities, including Kubernetes cluster upgrades and application re-platforming. A significant part of your role will involve migrating applications into Kubernetes while ensuring seamless deployment, high availability, and minimal downtime.

Additionally, you will be responsible for configuring and maintaining Elasticsearch and Kafka clusters, ensuring optimal performance, availability, and reliability. You will work on tuning Elasticsearch for efficient search and indexing, managing Kafka for real-time data streaming, and troubleshooting any issues related to these services.

You will automate operational tasks, optimize infrastructure, and proactively resolve issues to maintain system reliability. You will also collaborate with development, DevOps, and infrastructure teams to implement best practices for security, observability, and scalability. Your expertise will be crucial in improving deployment pipelines, incident response, and overall system performance.


Job Responsibilities:

● Ensure IT services and infrastructure uptime.

● Implement monitoring, alerting, and incident response processes.

● Automate repetitive ops tasks (deployments, scaling, failover).

● Respond to outages and production incidents (on-call duties).

● Perform root cause analysis (RCA) and drive postmortems.

● Measure and optimize system performance (latency, throughput, resource usage).

● Support reliable and safe code releases.

● Ensure systems are patched, hardened, and compliant with standards.

● Collaborate with technology teams to gather new requirements and deliver on them.


Technical Skills Required:

● 8+ years of experience in Site Reliability Engineering or a related role.

● Proficiency in Kubernetes, Docker, and container orchestration.

● Experience with CI/CD tools.

● Strong knowledge of Linux systems and scripting (Bash, Python).

● Familiarity with configuration management and packaging tools such as Ansible and Helm.

● Experience with monitoring and logging tools (ELK Stack or New Relic).

● Strong troubleshooting skills and incident management experience.

● Experience with Elasticsearch and Kafka.

● Knowledge of networking concepts, load balancers, and DNS.

● Experience in performance tuning and optimization.


Additional Skills Required:

● Systems & OS Knowledge

● Linux/Unix administration (process management, system tuning, networking)

● Understanding of filesystems, memory, CPU, and kernel basics (CentOS/Ubuntu)

● Scripting for automation: Bash, Python

● Knowledge of cloud platforms: AWS, GCP, Azure

● Networking and Protocols

● TCP/IP, DNS, HTTP/HTTPS, CDN concepts

● Debugging latency, connectivity, and routing issues

● CI/CD and DevOps Practices

● Jenkins, GitHub Actions, GitLab CI, Bitbucket, Git

● Working knowledge of Apache, Tomcat, and Nginx

● Knowledge of DNS, load balancers, WAFs, and firewalls

● Working knowledge of monitoring tools and the ELK Stack

● Knowledge of hypervisors such as VMware

● Strong knowledge of virtualization and container technologies (Docker, Kubernetes)

● Knowledge of database concepts


Qualifications - Education & Experience:

● Bachelor’s degree in Electronics and Telecommunications, Computer Science, Information Technology, or a related field.

● 8+ years of experience in Site Reliability Engineering.

Times Internet (Digital Media, Technology), Mumbai