Job Description
About The Role
Project Role: Technology Support Engineer
Project Role Description: Resolve incidents and problems across multiple business system components and ensure operational stability. Create and implement Requests for Change (RFC) and update knowledge base articles to support effective troubleshooting. Collaborate with vendors and help service management teams with issue analysis and resolution.
Must have skills: Prometheus Event Monitoring System
Good to have skills: Kubernetes, Grafana
Minimum 7.5 year(s) of experience is required
Educational Qualification: 15 years full time education
Summary:

Role Description
We are seeking a highly skilled Monitoring and Observability Specialist with expertise in Prometheus, AWS CloudWatch, Azure Monitor, and container monitoring. The candidate should have experience in setting up observability frameworks, integrating new monitoring solutions, executing migrations, and delivering large-scale projects. This role is suited for professionals with 7+ years of experience in enterprise monitoring and observability across hybrid and cloud-native environments.

Must Have Skills
- Strong hands-on experience with Prometheus, AWS CloudWatch, and Azure Monitor
- Experience in container monitoring using tools like Prometheus with Kubernetes, Grafana, and cAdvisor
- Expertise in setting up observability frameworks across distributed systems
- Proficiency in scripting (Python, Shell) for automation and integration
- Experience in setting up and managing monitoring for distributed systems and microservices
- Familiarity with OpenTelemetry and telemetry data pipelines
- Experience in integrating monitoring tools with cloud-native and hybrid environments
- Experience with alerting, dashboarding, and metrics, and with creating an Observability Center

Good to Have Skills
- Exposure to cloud platforms such as AWS, Azure, and GCP
- Experience with microservices and container orchestration platforms like Kubernetes and Docker Swarm
- Knowledge of the ITIL framework and incident management processes
- Familiarity with log analytics tools like ELK Stack and Splunk
- Understanding of REST APIs and third-party tool integrations

Job Requirements
The candidate must have a strong background in monitoring and observability, with the ability to design, implement, and manage monitoring solutions for large-scale enterprise environments. They should be capable of working independently and collaboratively, with excellent problem-solving and communication skills.

Key Responsibilities
- Design and implement monitoring and observability strategies across infrastructure and applications
- Set up and manage Prometheus, CloudWatch, and Azure Monitor for performance and availability tracking
- Develop automation scripts and workflows to enhance monitoring capabilities
- Lead integration and migration projects for cloud and legacy monitoring platforms
- Collaborate with cross-functional teams to ensure observability and performance insights
- Create and maintain dashboards, alerts, and reports for operational visibility
- Provide L3 support and troubleshooting for monitoring tools
- Document configurations, procedures, and best practices

Technical Experience
Minimum of 8 years of experience in enterprise monitoring and observability. Proven expertise in Prometheus, CloudWatch, Azure Monitor, and container monitoring. Experience in setting up observability frameworks and delivering large-scale monitoring solutions.

Professional Attributes
Excellent verbal and written communication skills. Strong analytical and problem-solving abilities. Ability to work independently and in a team environment. Commitment to continuous learning and improvement.

Educational Qualification and Certification
Bachelor's Degree in Computer Science, Information Technology, or related field. Relevant certifications in monitoring tools, cloud platforms, or observability frameworks are a plus.
Qualification: 15 years full time education