Job Title: Windows System Engineer (Disaster Recovery)
Location: Bangalore, Chennai, Pune, Mumbai, Hyderabad
Experience Level: 5–8 years
Job Summary
We are seeking an experienced Windows System Engineer (DR) to manage, support, and optimize Windows-based enterprise infrastructure with a focus on Disaster Recovery (DR) and Business Continuity Planning (BCP). The ideal candidate will have strong expertise in failover recovery, networking, database health checks, load balancing, and monitoring/log analytics using Splunk. The role requires a proactive engineer who ensures systems are resilient, secure, and compliant with defined RTO/RPO objectives.
Key Responsibilities
- Disaster Recovery (DR) Management
  - Plan, implement, and maintain DR environments for Windows-based systems.
  - Conduct failover and failback exercises to validate DR readiness and minimize downtime.
  - Define and manage RTO (Recovery Time Objective) and RPO (Recovery Point Objective) metrics for critical systems.
  - Perform DR drills and document lessons learned for continuous improvement.
- System and Network Administration
  - Manage Windows Server environments (2016/2019/2022) including patching, performance tuning, and troubleshooting.
  - Configure and monitor network components (DNS, DHCP, IP routing, firewall rules, VLANs) to ensure connectivity during DR operations.
  - Collaborate with network teams to validate load balancing and failover mechanisms.
- Database and Application Checks
  - Perform DB sanity checks and ensure data consistency post-DR switchovers.
  - Coordinate with DBAs and application owners to validate system availability post-recovery.
- Load Balancer and Failover Operations
  - Configure, test, and monitor load balancers (F5, Citrix, HAProxy, or similar) to ensure high availability.
  - Validate load balancing rules, session persistence, and failover logic during DR scenarios.
- Monitoring and Incident Management
  - Use Splunk to monitor system health, event logs, and performance metrics.
  - Develop dashboards and alerts for proactive issue detection and resolution.
  - Perform root cause analysis (RCA) for system outages and DR failures.
- Documentation and Compliance
  - Maintain DR documentation, runbooks, and standard operating procedures (SOPs).
  - Support audit and compliance activities by providing recovery metrics and validation reports.
  - Collaborate with IT Security and Compliance teams to ensure DR adherence to organizational standards.
Required Skills & Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5–8 years of hands-on experience in Windows System Administration and Disaster Recovery Planning.
- Strong knowledge of Windows Server (2016/2019/2022) and Active Directory Services.
- Hands-on experience with failover clustering, replication, and backup technologies (e.g., Veeam, Commvault, Azure Backup).
- Solid understanding of RTO/RPO concepts and disaster recovery frameworks.
- Experience with networking fundamentals: TCP/IP, DNS, DHCP, VLANs, routing, and firewalls.
- Practical knowledge of load balancer configuration and failover testing.
- Experience using Splunk for log management, monitoring, and alerting.
- Familiarity with PowerShell scripting for automation and system checks.
- Excellent troubleshooting, analytical, and documentation skills.