Job Description
About The Role
Project Role: Custom Software Engineer
Project Role Description: Develop custom software solutions to design, code, and enhance components across systems or applications. Use modern frameworks and agile practices to deliver scalable, high-performing solutions tailored to specific business needs.
Must have skills: AWS Architecture
Good to have skills: Python (Programming Language), DevOps Architecture, Ansible on Microsoft Azure
Minimum 7.5 year(s) of experience is required
Educational Qualification: 15 years of full-time education
Scope of the role
AWS operations: EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC/TGW/SGs, patching, and hygiene.
Application support: release readiness, runbooks, post-deploy smoke checks, performance baselines, and clean rollback paths.
Visibility: dashboards, logs, metrics, traces, synthetics, error budgets, and alert health.
Backup & DR: policies, schedules, retention, cross-region copies, restore testing, and DR runbooks (RPO/RTO owned and measured).
Incident leadership: run Sev-1/2 bridges, keep comms clear, and land post-mortems with actions that actually close.
Cost hygiene: tagging, right-sizing, SP/RI coverage, and lifecycle cleanups (EBS/EIP/AMIs).
Team enablement: guardrails, golden runbooks, and small automations that remove toil.
Day-to-day (what this looks like)
Triage overnight alerts and hot issues, set priorities, and make sure owners are clear.
Keep dashboards honest; fix flapping or missing alerts before they wake people up.
Check backups and recent restore points; open tickets for any gaps and track to done.
Unblock releases; verify smoke checks; keep environments tidy and predictable.
Lead or delegate break/fix; no lingering “mystery” incidents.
Write down what we learned in the runbook so the next person can fix it faster.
Weekly rhythm
Ops review: incidents, alerts, deploys, costs, capacity, and backup status in one short readout.
Observability tune-up: delete noise, add the missing signal, and test a synthetic from the edge.
Backup/DR: run a small restore test and record RPO/RTO evidence.
Patch and change review: what shipped, what rolled back, and why.
Monthly outcomes
Share availability/SLOs, MTTR, change failure rate, observability coverage, backup compliance, and costs in plain English.
Close the top recurring issues (noisy alerts, flaky deploys).
Refresh the most-used runbooks; validate DR for one critical workload (tabletop or live restore).
Core responsibilities
Own production readiness and stability for assigned AWS accounts and apps.
Lead incidents and land post-mortems; make the fixes stick.
Keep monitoring/logging/tracing standards real; enforce SLOs and error budgets.
Own backup strategy end-to-end, including monthly restore tests and DR docs.
Keep access least-privileged and auditable; rotate secrets and certs on time.
Drive cost posture and mentor the team; make on-call humane.
What “good” looks like
Visibility: one clear dashboard per service, clean alert routing, low false positives.
Backups: 100% of jobs green (or retried), documented RPO/RTO, and monthly restore tests that pass.
Reliability: MTTR trending down; most issues solved by the first responder with a runbook.
Change: predictable releases with smoke checks and rollback; fewer failed changes month over month.
Cost: flat or down against growth; tagging at or above 95%.
Experience we’re looking for
8–10+ years in cloud/app operations with strong hands-on AWS experience. Comfortable leading incidents, shaping dashboards and alerts, and automating the boring bits (Terraform, Ansible, Python). Experience running backups/DR in AWS and proving it with real restore tests. Cloud network experience.