Site Reliability Engineer - L3

5 - 9 years

9 - 13 Lacs

Posted: 2 months ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Founded in 2021, Tessell is a hyper-growth Database-as-a-Service (DBaaS) company offering a revolutionary data infrastructure and management platform for both cloud-born and cloud-defining enterprises. Headquartered in the San Francisco Bay Area, with a large hub in Bangalore, Tessell provides a fully managed service for both open-source and commercial data technologies such as Oracle, PostgreSQL, MySQL, and SQL Server on all major clouds, including Amazon Web Services (AWS) and Microsoft Azure. Tessell offers a unified data management platform that focuses on delivering higher performance at lower cost. We're building pathbreaking technology that revolutionises the way data is managed in the cloud.

As part of Global Support, you will support customers, customer support personnel, and field support staff, with a focus on diagnosing, troubleshooting, repairing, and debugging the Tessell service. You will respond to situations where first-line product support has failed to isolate or fix problems in software products, and you will ensure delivery of optimal results. You must be a take-charge professional and subject matter expert with demonstrated technical problem-solving skills and a strong customer service orientation and experience.

Responsibilities:

1. Database Administration (DBA) Skills
  • Relational Databases: MySQL, PostgreSQL, Oracle, MS SQL Server.
  • Database Backup & Recovery: Tools and strategies for database backups and disaster recovery.
  • Performance Tuning: Query optimization, indexing strategies, and database performance troubleshooting.
  • Database Security: User management, roles, access control, and auditing.

2. Infrastructure as a Service Knowledge
  • Infrastructure as Code (IaC): Terraform, CloudFormation, Kubernetes.
  • Kubernetes & Containers: Good knowledge and understanding of Kubernetes and the use of containers.
  • Observability Tools: ELK stack (Elasticsearch, Logstash, Kibana).
  • Database Migration: Migrating databases across different platforms or cloud environments.
  • Infrastructure Scaling: Vertical and horizontal scaling techniques in cloud environments.

3. SRE Principles and Knowledge (Site Reliability Engineering)
  • Strong hands-on experience with AWS and Azure, and a fair understanding of Google Cloud.
  • Experience handling APIs, troubleshooting API calls, and ensuring seamless integration and performance.
  • Incident Management: Handling database outages, incident response, and on-call rotations.
  • Monitoring and Alerting: Tools such as Prometheus, Grafana, Datadog, and CloudWatch; ability to recommend proactive monitoring for the application stack.
  • Understanding of core SRE principles: SLAs, SLIs, SLOs, error budgets, etc.
  • Disaster Recovery Planning: Ensuring high availability (HA) and disaster recovery (DR) solutions.
  • Performance Optimisation: Tracking latency, slow performance, and high-utilisation issues, and recommending optimisations as required.

4. Scripting and Automation
  • Scripting Languages: Python, Shell scripting, Bash, PowerShell.
  • Automation Tools: Ansible, Puppet, Chef.
  • Infrastructure Automation: Automating database deployment, patching, and scaling.

5. Networking and Infrastructure
  • Networking Basics: TCP/IP, DNS, firewalls, load balancers.
  • Database Connectivity: Connection pooling, failover strategies, and multi-region deployment.
  • Storage and Disk Management: Understanding IOPS, latency, and throughput.
  • Infrastructure: Familiarity with AWS services such as EC2, S3, VPC, Security Groups, private and public subnets, IAM, CloudWatch, and CloudTrail, and Azure services such as Virtual Machines, Azure Functions, Virtual Network, and Resource Manager.

6. OS Skills
  • Expertise in Linux (RHEL, Ubuntu, CentOS).
  • Understanding of file systems (ext4, XFS, etc.), permissions, and ownership.
  • Knowledge of process monitoring, management, and troubleshooting.
  • Proficiency with tools such as top, htop, vmstat, iostat, sar, and dstat to monitor CPU, memory, disk I/O, and network usage.
  • Ability to analyze system logs (/var/log/, journalctl, dmesg) for troubleshooting.
  • Understanding of resource limits (CPU, memory, disk, network) and how they impact database performance.
  • Knowledge of partitioning tools (fdisk, parted) and file system management (mkfs, mount, umount).
  • Understanding of RAID configurations and Logical Volume Management (LVM) for storage scalability.

7. Troubleshooting and Debugging
  • Log Analysis: Reading and analysing database and system logs.
  • Root Cause Analysis (RCA): Performing in-depth analysis after major incidents and sharing RCAs with customers.
  • Query Performance: Analysing slow queries, deadlocks, and resource contention.

8. Soft Skills
  • Communication Skills: Clear written and verbal communication with internal and external stakeholders.
  • Problem-Solving: Ability to prioritise and troubleshoot critical issues and bring them to closure.
  • Collaboration: Working closely with DevOps, Infrastructure, and Engineering teams.

Tessell

Software Development

San Francisco

50-200 Employees

17 Jobs

    Key People

  • Jane Doe

    CEO
  • John Smith

    CTO
