Responsibilities
Database Administration & Operations
- Configure, maintain, and administer SQL Server instances (SQL Server 2016/2019) for high availability, performance, and security.
- Monitor system health, capacity, and performance using industry-standard tools and build proactive automation where applicable.
- Manage database deployments, schema changes, and release activities in collaboration with development and DevOps teams.
High Availability, Clustering & Replication
- Design, configure, and manage Always On Availability Groups (AGs), including multi-subnet clusters, automatic failover, read-only routing, and AG listener management.
- Implement and support Failover Cluster Instances (FCI) and Windows Server Failover Clustering (WSFC).
- Manage SQL Server transactional, snapshot, and peer-to-peer replication, including monitoring, latency troubleshooting, reinitialization, and conflict resolution.
- Build DR strategies using AGs, replication, and log shipping based on RPO/RTO requirements.
- Participate in HA/DR testing, root cause analysis, and post-incident reviews.
Performance Tuning & Optimization
- Analyze and optimize SQL queries, stored procedures, execution plans, indexing strategies, and server configuration parameters.
- Identify performance bottlenecks related to CPU, I/O, memory, blocking, deadlocks, and inefficient query patterns.
- Implement index maintenance routines and statistics management, and optimize storage and I/O footprints.
Backup, Recovery & Security
- Manage enterprise backup and recovery strategies using native SQL Server tools or third-party backup solutions.
- Perform recovery testing and point-in-time recovery (PITR) validation, and maintain DR runbooks.
- Configure and manage encryption mechanisms including TDE, Always Encrypted, SSL/TLS certificates, and column-level encryption.
- Ensure database security, access control, and audit compliance.
Production Support & On-call Responsibilities
- Respond to critical production alerts and participate in a rotational on-call schedule.
- Troubleshoot outages, replication failures, cluster failovers, blocking/deadlock events, and other production issues.
- Write runbooks, SOPs, and RCA documents.
Requirements
- 5-10 years of hands-on SQL Server experience.
- Strong experience with Always On AGs, WSFC, listener configurations, and HA/DR implementations.
- Expertise in SQL Server replication (transactional, snapshot, P2P), including monitoring and troubleshooting.
- Strong T-SQL skills, including writing complex queries, stored procedures, and optimizing SQL code.
- Experience with SQL Server on Azure (IaaS/PaaS) including Managed Instance or SQL Database.
- Experience working in high-availability, large-scale OLTP environments.
- Expertise in index design, statistics, execution plan analysis, and server-level performance tuning.
- Experience with SQL Server security, encryption (TDE, Always Encrypted), certificates, and key management.
- Familiarity with monitoring tools such as SQL Monitor, SolarWinds DPA, Idera, Zabbix, SCOM, etc.
- Strong problem-solving and root cause analysis skills.
- Excellent written and verbal communication skills.
What sets you apart
- Exposure to PostgreSQL or other open-source databases.
- Experience with PowerShell or other scripting languages for automation.
- Knowledge of CI/CD processes for database deployments.
- Experience with cloud-native HA/DR (Azure Site Recovery, geo-replication).