Job Description
Project Role :Technology Support Engineer
Project Role Description :Resolve incidents and problems across multiple business system components and ensure operational stability. Create and implement Requests for Change (RFC) and update knowledge base articles to support effective troubleshooting. Collaborate with vendors and help service management teams with issue analysis and resolution.
Must have skills :Prometheus Event Monitoring System
Good to have skills :Kubernetes, Grafana
Minimum 7.5 year(s) of experience is required
Educational Qualification :15 years full time education
Summary:
Role Description
We are seeking a highly skilled Monitoring and Observability Specialist with expertise in Prometheus, AWS CloudWatch, Azure Monitor, and container monitoring. The candidate should have experience in setting up observability frameworks, integrating new monitoring solutions, executing migrations, and delivering large-scale projects. This role is suited for professionals with 7+ years of experience in enterprise monitoring and observability across hybrid and cloud-native environments.

Must Have Skills
- Strong hands-on experience with Prometheus, AWS CloudWatch, and Azure Monitor
- Experience in container monitoring using tools like Prometheus with Kubernetes, Grafana, and cAdvisor
- Expertise in setting up observability frameworks across distributed systems
- Proficiency in scripting (Python, Shell) for automation and integration
- Experience in setting up and managing monitoring for distributed systems and microservices
- Familiarity with OpenTelemetry and telemetry data pipelines
- Experience in integrating monitoring tools with cloud-native and hybrid environments
- Experience with alerting, dashboarding, and metrics, and with creating an Observability Center

Good to Have Skills
- Exposure to cloud platforms such as AWS, Azure, and GCP
- Experience with microservices and container orchestration platforms like Kubernetes and Docker Swarm
- Knowledge of the ITIL framework and incident management processes
- Familiarity with log analytics tools like ELK Stack and Splunk
- Understanding of REST APIs and third-party tool integrations

Job Requirements
The candidate must have a strong background in monitoring and observability, with the ability to design, implement, and manage monitoring solutions for large-scale enterprise environments. They should be capable of working independently and collaboratively, with excellent problem-solving and communication skills.

Key Responsibilities
- Design and implement monitoring and observability strategies across infrastructure and applications
- Set up and manage Prometheus, CloudWatch, and Azure Monitor for performance and availability tracking
- Develop automation scripts and workflows to enhance monitoring capabilities
- Lead integration and migration projects for cloud and legacy monitoring platforms
- Collaborate with cross-functional teams to ensure observability and performance insights
- Create and maintain dashboards, alerts, and reports for operational visibility
- Provide L3 support and troubleshooting for monitoring tools
- Document configurations, procedures, and best practices

Technical Experience
Minimum of 8 years of experience in enterprise monitoring and observability. Proven expertise in Prometheus, CloudWatch, Azure Monitor, and container monitoring. Experience in setting up observability frameworks and delivering large-scale monitoring solutions.

Professional Attributes
Excellent verbal and written communication skills. Strong analytical and problem-solving abilities. Ability to work independently and in a team environment. Commitment to continuous learning and improvement.

Educational Qualification and Certification
Bachelor's degree in Computer Science, Information Technology, or a related field. Relevant certifications in monitoring tools, cloud platforms, or observability frameworks are a plus.

Qualification :15 years full time education