541 Grafana Jobs - Page 12

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

6.0 - 8.0 years

6 - 10 Lacs

Pune

Work from Office

Job Title: Production Specialist, Associate
Location: Pune, India

Role Description
Our organization within Deutsche Bank is AFC Production Services. We are responsible for providing technical L2 application support for business applications. The AFC (Anti-Financial Crime) line of business has a current portfolio of 25+ applications. The organization is in the process of transforming itself using Google Cloud and many new technology offerings. Your role will include hands-on production support and active involvement in resolving technical issues across multiple applications.

Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Provide technical support by handling and consulting on BAU, incidents/emails/alerts for the respective applications.
- Perform post-mortem and root cause analysis using ITIL standards of Incident Management, Service Request Fulfillment, Change Management, Knowledge Management, and Problem Management.
- Analyze errors arising from batch processing and from interfaces of related systems.
- Determine and implement resolutions or workarounds.
- Support the resolution of high-impact incidents on our applications, including attendance at incident bridge calls.
- Escalate incident tickets in a timely manner and communicate effectively with business users, development teams, and stakeholders.
- Provide resolution for open problems or ensure that the appropriate parties have been tasked with doing so.
- Support the handover of new projects/applications into Production Services with Service Transition before the go-live phase.
- Assist in the process to approve application code releases as well as tasks assigned to support to perform.
- Keep key stakeholders informed using communication templates.
- Automate routine tasks and enhance operational efficiencies through scripts and tools.
- Support the transition of applications to Google Cloud and new technology offerings.
- Proactively identify performance bottlenecks and suggest optimization strategies.
- Support audit, compliance, and regulatory requirements related to AFC applications.
- Work in shifts as part of a rota covering APAC and EMEA hours between 07:00 IST and 09:00 PM IST (2 shifts). In the event of major outages or issues we may ask for flexibility to help provide appropriate cover.
- Support on-call activities.

Your skills and experience
- 4-8 years of experience in providing hands-on IT application support.
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience/diploma/certification).
- Preferred: ITIL v3 foundation certification or higher.
- Clear and concise documentation in general, and especially proper documentation of the status of incidents, problems, and service requests in the Service Management tool.
- Monitoring tools: knowledge of Elasticsearch, Control-M, Grafana, Geneos, OpenShift, Prometheus, Google Cloud Monitoring, Airflow, Splunk.
- Red Hat Enterprise Linux (RHEL): professional skill in searching logs, process commands, starting/stopping processes, and use of OS commands to aid in tasks needed to resolve or investigate issues.
- Shell scripting knowledge a plus.
- Understanding of database concepts and exposure to working with Oracle, MS SQL, BigQuery etc. databases.
- Ability to work across countries, regions, and time zones with a broad range of cultures and technical capability.

Skills That Will Help You Excel
- Strong written and oral communication skills, including the ability to communicate technical information to a non-technical audience, and good analytical and problem-solving skills.
- Analytical and problem-solving skills, with a structured approach to troubleshooting, issue resolution and its documentation.
- Able to train, coach, and mentor and know where each technique is best applied.
- Experience with GCP or another public cloud provider to build applications.
- Experience in an investment bank, financial institution or large corporation using enterprise hardware and software.
- Knowledge of Actimize, Mantas, and case management software is good to have.
- Working knowledge of Big Data Hadoop/Secure Data Lake is a plus.
- Prior experience in automation projects is great to have.
- Exposure to Python, shell, Ansible or other scripting languages for automation and process improvement.
- Strong stakeholder management skills ensuring seamless coordination between business, development, and infrastructure teams.

How we'll support you
- Training and development to help you excel in your career.
- Coaching and support from experts in your team.
- A culture of continuous learning to aid progression.
- A range of flexible benefits that you can tailor to suit your needs.

Posted 2 weeks ago

6.0 - 8.0 years

12 - 16 Lacs

Bengaluru

Work from Office

Job Title: Site Reliability Engineer
Location: Bangalore, India
Corporate Title: Associate

Role Description
You will work closely with application teams to ensure stable, well monitored applications that are resilient to faults. You will agree and review Service Level Objectives (SLOs) to achieve high availability for applications based on their criticality. You will maintain Error Budgets for the application teams and prevent releases in the event of production instability and reduced availability. You will focus on reducing manual toil, improving operational reliability and driving automation-first practices. This is a hands-on role with a strong focus on implementing SRE practices and reducing toil for Developer Tools.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Drive stability, performance and reliability improvements for TDI Engineering applications.
- Build monitoring and alerting solutions that alert in the event of failures/performance issues across TDI Engineering applications, helping us provide the optimum service level to users.
- Provide feedback loops to continually improve application resilience across multiple application teams.
- Collaborate with product owners and the engineering team to prioritize reliability and stability of these applications.
- Define, measure and maintain SLOs and Error Budgets to ensure availability for end users and to achieve appropriate levels of application stability.
- Identify opportunities for automation and self-service capabilities and implement them to eliminate toil for both the application teams and the SRE team to optimise effectiveness.
- Manage outage resolution and agree actions to reduce the likelihood of failures happening in future by owning RCA and conducting blameless postmortems.

Your skills and experience
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience or diploma).
- 4+ years of experience in IT in large corporate environments, specifically in controlled production environments.
- Demonstrable Site Reliability Engineering experience of at least 2 years.
- Excellent analytical and problem-solving skills.
- Experience in implementing observability solutions using any industry-standard tools.
- Scripting skills (Groovy, shell, Bash, Cron or any equivalent).
- Experience in mid-range technologies and platforms, i.e. UNIX/Linux, Oracle database and Nginx.

Good to have
- Understanding and experience in Developer Tools (Jira, Confluence, Bitbucket, TeamCity, Artifactory, uDeploy) as an enterprise-level administrator experienced in managing applications with a large user base.
- Knowledge and experience of observability tools like Grafana, Prometheus.

How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs

Posted 2 weeks ago

6.0 - 8.0 years

10 - 15 Lacs

Bengaluru

Work from Office

Job Title: Site Reliability Engineer
Location: Bangalore, India
Corporate Title: Analyst

Role Description
You will work closely with application teams to ensure stable, well monitored applications that are resilient to faults. You will agree and review Service Level Objectives (SLOs) to achieve high availability for applications based on their criticality. You will maintain Error Budgets for the application teams and prevent releases in the event of production instability and reduced availability. You will focus on reducing manual toil, improving operational reliability and driving automation-first practices. This is a hands-on role with a strong focus on implementing SRE practices and reducing toil for Developer Tools.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Drive stability, performance and reliability improvements for TDI Engineering applications.
- Build monitoring and alerting solutions that alert in the event of failures/performance issues across TDI Engineering applications, helping us provide the optimum service level to users.
- Provide feedback loops to continually improve application resilience across multiple application teams.
- Collaborate with product owners and the engineering team to prioritize reliability and stability of these applications.
- Define, measure and maintain SLOs and Error Budgets to ensure availability for end users and to achieve appropriate levels of application stability.
- Identify opportunities for automation and self-service capabilities and implement them to eliminate toil for both the application teams and the SRE team to optimise effectiveness.
- Manage outage resolution and agree actions to reduce the likelihood of failures happening in future by owning RCA and conducting blameless postmortems.

Your skills and experience
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience or diploma).
- 2+ years of experience in IT in large corporate environments, specifically in controlled production environments.
- Demonstrable Site Reliability Engineering experience of at least 1 year.
- Excellent analytical and problem-solving skills.
- Experience in implementing observability solutions using any industry-standard tools.
- Scripting skills (Groovy, shell, Bash, Cron or any equivalent).
- Experience in mid-range technologies and platforms, i.e. UNIX/Linux, Oracle database and Nginx.

Good to have
- Understanding and experience in Developer Tools (Jira, Confluence, Bitbucket, TeamCity, Artifactory, uDeploy) as an enterprise-level administrator experienced in managing applications with a large user base.
- Knowledge and experience of observability tools like Grafana, Prometheus.

How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs

Posted 2 weeks ago

6.0 - 8.0 years

37 - 40 Lacs

Pune

Work from Office

Job Title: Production Specialist, AVP
Location: Pune, India

Role Description
Our organization within Deutsche Bank is AFC Production Services. We are responsible for providing technical L2 application support for business applications. The AFC (Anti-Financial Crime) line of business has a current portfolio of 25+ applications. The organization is in the process of transforming itself using Google Cloud and many new technology offerings. As an Assistant Vice President, your role will include hands-on production support and active involvement in resolving technical issues across multiple applications. You will also work as application lead and be responsible for technical and operational processes for all applications you support.

Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Provide technical support by handling and consulting on BAU, incidents/emails/alerts for the respective applications.
- Perform post-mortem and root cause analysis using ITIL standards of Incident Management, Service Request Fulfillment, Change Management, Knowledge Management, and Problem Management.
- Manage the regional L2 team and vendor teams supporting the application. Ensure the team is up to speed and picks up the support duties.
- Build up technical subject matter expertise on the applications being supported, including business flows, application architecture, and hardware configuration.
- Define and track KPIs, SLAs and operational metrics to measure and improve application stability and performance.
- Conduct real-time monitoring, using an array of monitoring tools, to ensure application SLAs are achieved and application availability (uptime) is maximized.
- Build and maintain effective and productive relationships with stakeholders in business, development, infrastructure, and third-party systems / data providers and vendors.
- Assist in the process to approve application code releases as well as tasks assigned to support to perform.
- Keep key stakeholders informed using communication templates.
- Approach support with a proactive attitude, a desire to seek root cause through in-depth analysis, and a drive to reduce inefficiencies and manual effort.
- Mentor and guide junior team members, fostering technical upskilling and knowledge sharing.
- Provide strategic input into disaster recovery planning, failover strategies and business continuity procedures.
- Collaborate and deliver on initiatives and install these initiatives to drive stability in the environment.
- Perform reviews of all open production items with the development team and push for updates and resolutions to outstanding tasks and recurring issues.
- Drive service resilience by implementing SRE (site reliability engineering) principles, ensuring proactive monitoring, automation and operational efficiency.
- Ensure regulatory and compliance adherence, managing audits, access reviews, and security controls in line with organizational policies.
- Work in shifts as part of a rota covering APAC and EMEA hours between 07:00 IST and 09:00 PM IST (2 shifts). In the event of major outages or issues we may ask for flexibility to help provide appropriate cover.
- Provide weekend on-call coverage on a rotational/need basis.

Your skills and experience
- 9-15 years of experience in providing hands-on IT application support.
- Experience in managing vendor teams providing 24x7 support.
- Preferred: team lead role experience; experience in an investment bank or financial institution.
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience/diploma/certification).
- Preferred: ITIL v3 foundation certification or higher.
- Knowledgeable in cloud products like Google Cloud Platform (GCP) and hybrid applications.
- Strong understanding of ITIL/SRE/DevOps best practices for supporting a production environment.
- Understanding of KPIs, SLOs, SLAs and SLIs.
- Monitoring tools: knowledge of Elasticsearch, Control-M, Grafana, Geneos, OpenShift, Prometheus, Google Cloud Monitoring, Airflow, Splunk.
- Working knowledge of creating dashboards and reports for senior management.
- Red Hat Enterprise Linux (RHEL): professional skill in searching logs, process commands, starting/stopping processes, and use of OS commands to aid in tasks needed to resolve or investigate issues.
- Shell scripting knowledge a plus.
- Understanding of database concepts and exposure to working with Oracle, MS SQL, BigQuery etc. databases.
- Ability to work across countries, regions, and time zones with a broad range of cultures and technical capability.

Skills That Will Help You Excel
- Strong written and oral communication skills, including the ability to communicate technical information to a non-technical audience, and good analytical and problem-solving skills.
- Proven experience in leading L2 support teams, including managing vendor teams and offshore resources.
- Able to train, coach, and mentor and know where each technique is best applied.
- Experience with GCP or another public cloud provider to build applications.
- Experience in an investment bank, financial institution or large corporation using enterprise hardware and software.
- Knowledge of Actimize, Mantas, and case management software is good to have.
- Working knowledge of Big Data Hadoop/Secure Data Lake is a plus.
- Prior experience in automation projects is great to have.
- Exposure to Python, shell, Ansible or other scripting languages for automation and process improvement.
- Strong stakeholder management skills ensuring seamless coordination between business, development, and infrastructure teams.
- Ability to manage high-pressure issues, coordinating across teams to drive swift resolution.
- Strong negotiation skills with interface teams to drive process improvements and efficiency gains.

How we'll support you
- Training and development to help you excel in your career.
- Coaching and support from experts in your team.
- A culture of continuous learning to aid progression.
- A range of flexible benefits that you can tailor to suit your needs.

About us and our teams
Please visit our company website for further information: https://www.db.com/company/company.htm
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair and inclusive work environment.

Posted 2 weeks ago

6.0 - 8.0 years

6 - 9 Lacs

Pune

Work from Office

Job Title: Production Support
Location: Pune, India

Role Description
You will be operating within Corporate Bank Production as a Production Support Engineer in the Client Access & Services subdivision. Client Access & Services serves the critical client-facing applications categorized under Payment Initiation, Account Services and Client Centric Technology.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Act as a Production Support Analyst for the CB production team, providing second-level support for the applications under the tribe and working with key stakeholders and team members across the globe in a 365-day, 24/7 working model.
- Serve as an individual contributor and prime liaison for the application suite into the incident, problem, change, release, capacity, and continuity functions.
- Escalate, manage, and communicate major production incidents.
- Liaise with development teams on new application handover and 3rd-line escalation of issues.
- Carry out application rollout activities (may include some weekend activities).
- Develop a Continuous Service Improvement approach to resolve IT failings, drive efficiencies and remove repetition to streamline support activities, reduce risk, and improve system availability by understanding emerging trends and proactively addressing them.
- Carry out technical analysis of the Production platform to identify and remediate performance and resiliency issues.
- Update the Run Book and KEDB as and when required.

Your skills and experience
- Good experience in Production Application Support and ITIL practices.
- Very good hands-on knowledge of databases (Oracle/MSSQL etc.), including working experience of writing SQL scripts and queries.
- Very good hands-on experience with UNIX/Linux, Solaris, Java J2EE, Python, PowerShell scripts, Bash, Ansible, and tools for automation (RPA, Workload, Batch).
- Experience in application performance monitoring tools (Geneos, Splunk, Grafana and New Relic) and scheduling tools (Control-M).
- Excellent team player; people management experience is an advantage.
- Bachelor's degree (Economics, Business Administration, Finance preferred). Master's degree a plus.
- Previous relevant experience in the banking domain.
- 5+ years of experience in IT in large corporate environments, specifically in production support.
- Operating systems (e.g. UNIX, Windows).
- Understanding of environments: middleware (e.g. MQ, WebLogic, Tomcat, JBoss, Apache, Kafka and similar) and database environments (e.g. Oracle, MS-SQL, Sybase, NoSQL).
- Experience in APM tools like Splunk and Geneos; Control-M/Autosys; AppDynamics.

Nice to have
- Cloud services: GCP
- Experience with automation solutions (Ansible, Jenkins/Groovy, Python, Java)

How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs

Posted 2 weeks ago

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

Key Responsibilities
- Technical support: incident management, change management, problem management
- Monitoring: Zabbix, Prometheus, ELK, Grafana
- Troubleshooting: customer issues, application issues
- Linux file manipulation
- Perform system health checks
- Investigate customer issues
- Investigate system alarms
- Lead/participate in incident management
- Perform change reviews
- Problem management: lead/participate in postmortems
- Problem management: drive resolution of customer-impacting issues
- Improve detection of issues (alarm tuning)
- Fulfill daily requests
- On-call duties

Required Qualifications To Be Successful In This Role
- Monitoring: Zabbix, Prometheus, ELK, Grafana, Dynatrace, Nagios
- Incident management
- Linux

Additional Information
- Job Type: Full Time
- Work Profile: Hybrid (Work from Office/Remote)
- Years of Experience: 3-7 Years
- Location: Bangalore

What We Offer
- Competitive salaries and comprehensive health benefits
- Flexible work hours and remote work options
- Professional development and training opportunities
- A supportive and inclusive work environment

Posted 2 weeks ago

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

Job Responsibilities

Application monitoring and support:
- Manage traffic diversion during deployments
- Validate code deployment success
- Post-deployment health monitoring and reporting
- Production patching and monitoring activities for in-scope applications (Liveliness Probe, DataGrid, SOSS, POD restarts)
- Monitor and action alerts using Bell monitoring tools (Dynatrace, BAM, Grafana)
- Monitor the DB server through a daily sanity check
- Verify table space / disk space status and warn if it's reaching capacity
- Verify memory and processor usage and warn if it's reaching capacity

Production Monitoring:
- Diagnosing and tracking incidents and problems with severity Critical (P1) and High (P2) through to resolution
- Providing the required production logs, or access to production logs, to analyze the incidents
- Provide the Root Cause Analysis for all Critical incidents
- Repairing data and associated work caused by invalid data where validation code does not exist or where a -documented Incident caused by a transaction results in failures
- Providing workarounds for Critical and High incidents
- Updating relevant system, configuration or process documentation
- Document and promptly notify Bell of any emergency changes required
- Participate in AMS Operations Governance meetings (assumed to be bi-weekly)
- Responding to application-related questions, performing data extraction as required
- Handling ad-hoc requests from end users for information, queries, or reports
- Providing holiday support coverage
- Performing peak period monitoring and reporting for specific critical applications
- Perform daily health checks for critical applications

Required Skills
- Docker/Kubernetes
- DB administration
- CI/CD pipeline tools
- Scripting
- Log analysis, monitoring tools (Grafana, Dynatrace)

What We Offer
- Competitive salaries and comprehensive health benefits
- Flexible work hours and remote work options
- Professional development and training opportunities
- A supportive and inclusive work environment

Posted 2 weeks ago

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

At BCE Global Tech, immerse yourself in exciting projects that are shaping the future of both consumer and enterprise telecommunications. This involves building innovative mobile apps to enhance user experiences and enable seamless connectivity on-the-go. Thrive in diverse roles like Full Stack Developer, Backend Developer, UI/UX Designer, DevOps Engineer, Cloud Engineer, Data Science Engineer, and Scrum Master, at a workplace that encourages you to freely share your bold and different ideas. If you are passionate about technology and eager to make a difference, we want to hear from you! Apply now to join our dynamic team in Bengaluru.

We're seeking a dedicated Site Reliability Engineer to join our team. In this role, you will be responsible for maintaining the reliability, scalability, and performance of our systems. You'll implement best practices for monitoring, incident response, and automation to ensure seamless operations. Your expertise will help us build resilient infrastructure, reduce downtime, and enhance the overall user experience.

Key Responsibilities
- Experience working with various monitoring tools (e.g. ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus)
- Ensure monitoring and self-healing strategies are implemented and maintained to proactively prevent production incidents
- Perform root cause analysis of production issues
- Design and manage on-call and escalation processes (nice to have)
- Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
- Design and implement ELK (Elasticsearch, Logstash and Kibana) stack, Prometheus and Grafana solutions for monitoring and alerting
- Debug production issues across services and levels of the stack
- Establish KPIs to demonstrate maturity, efficiency, and value to our business partners
- Work as an integral part of the DevOps team with complementary skills and common goals
- L3 support experience is an asset
- Work to create a release management process and help with out-of-business-hours deployments and support (rotation with team members)
- Familiar and comfortable with agile development techniques

Technology Skills (Mandatory)
ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus

Required Qualifications To Be Successful In This Role
- Bachelor's degree in computer science, engineering, or a related field
- 8-10 years of experience as an SRE
- Proven experience as an SRE, DevOps engineer, or similar role
- Strong programming skills in languages such as Python, Go, Java, or Ruby
- Strong problem-solving skills and ability to work under pressure
- Excellent communication and collaboration skills
- Flexible to work in the EST time zone (9-5 EST)

What We Offer
- Competitive salaries and comprehensive health benefits
- Flexible work hours and remote work options
- Professional development and training opportunities
- A supportive and inclusive work environment

Posted 2 weeks ago

Apply

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

Naukri logo

Back Job Responsibilities Application monitoring and support: Manage traffic diversion during deployments Validation of code deployment success Post deployment health monitoring and reporting Production patching and monitoring activities for in scope applications (Liveliness Probe, DataGrid, SOSS, POD restarts) Monitor and action the alert using Bell Monitoring Tools (Dynatrace, BAM, Grafana) Monitor of DB server to verify through daily sanity check Verify Table Space / Disk Space status and warn if its reaching capacity Verify Memory and Processor usage and warn if its reaching capacity Production Monitoring Diagnosing and tracking Incidents and problems with Severity Critical (P1) and High (P2) through to Resolution Providing the required Production Logs or access to Production Logs to analyze the incidents Provide the Root Cause Analysis for all Critical Incidents Repairing data and associated work caused by invalid data where validation code does not exist or where a -documented Incident caused by a transaction results in failures Providing workarounds for Critical and High Incidents Updating relevant system, configuration or process documentation Document and promptly notify Bell of any emergency changes required Participate in AMS Operations Governance meetings (assumed to be bi-weekly) Responding to Application-related questions, performing data extraction as required Handling ad-hoc requests from end users for information, queries, or reports Providing holiday support coverage Performing peak period monitoring and reporting for specific critical applications Perform daily health checks for Critical applications Required Skills Docker/Kubernetes DB Administration CI/CD pipelines tools Scripting Log analysis, Monitoring tools (Grafana, Dynatrace) What We Offer Competitive salaries and comprehensive health benefits Flexible work hours and remote work options Professional development and training opportunities A supportive and inclusive work environment

Posted 2 weeks ago


4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office


Job Responsibilities

Application monitoring and support: Manage traffic diversion during deployments; validate code deployment success; perform post-deployment health monitoring and reporting; carry out production patching and monitoring activities for in-scope applications (Liveliness Probe, DataGrid, SOSS, POD restarts); monitor and action alerts using Bell monitoring tools (Dynatrace, BAM, Grafana); monitor the DB server through a daily sanity check; verify Table Space / Disk Space status and warn if it is reaching capacity; verify Memory and Processor usage and warn if it is reaching capacity.

Production monitoring: Diagnose and track Incidents and problems with Severity Critical (P1) and High (P2) through to resolution; provide the required Production Logs, or access to them, to analyze incidents; provide the Root Cause Analysis for all Critical Incidents; repair data and associated work caused by invalid data where validation code does not exist or where a -documented Incident caused by a transaction results in failures; provide workarounds for Critical and High Incidents; update relevant system, configuration or process documentation; document and promptly notify Bell of any emergency changes required; participate in AMS Operations Governance meetings (assumed to be bi-weekly); respond to Application-related questions, performing data extraction as required; handle ad-hoc requests from end users for information, queries, or reports; provide holiday support coverage; perform peak period monitoring and reporting for specific critical applications; perform daily health checks for Critical applications.

Required Skills: Docker/Kubernetes; DB Administration; CI/CD pipeline tools; Scripting; Log analysis; Monitoring tools (Grafana, Dynatrace).

What We Offer: Competitive salaries and comprehensive health benefits; flexible work hours and remote work options; professional development and training opportunities; a supportive and inclusive work environment.

Posted 2 weeks ago


4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office


As a Platform Support Engineer (APIGEE), you will have a solid understanding of API management platforms, such as Apigee, and cloud infrastructure (GCP), and possess a deep knowledge of networking, authentication, and monitoring. This role involves troubleshooting and providing support for API integrations, working with both internal teams and external clients to resolve issues efficiently and maintain a smooth operational flow.

Key Responsibilities

API Management Support: Troubleshoot, diagnose, and resolve issues related to API proxies and flows within the Apigee environment, including both Apigee X and Apigee Hybrid.
API Transaction Debugging: Use debugging tools to analyze API transactions and identify where problems may exist in the flow between the API Gateway and backend services.
Backend Integration Troubleshooting: Support tenant teams using Apigee, providing guidance on identifying and resolving issues with their APIs after backend upgrades or changes.
Platform Monitoring: Monitor and interpret data from platforms such as ELK, Dynatrace, Datadog, New Relic, Grafana/Prometheus, and other monitoring tools to proactively detect and troubleshoot API issues.
Cloud Infrastructure Management (GCP): Utilize GCP services, including Compute Engine, Load Balancers, IAM/Roles permissions, Stack Driver/Cloud Logging, and Kubernetes clusters (GKE), to manage and troubleshoot platform issues.
Networking Troubleshooting: Assist in troubleshooting network-related issues, such as DNS, load balancers, and firewalls, and investigate HTTPS protocol and certificate management issues.
Authentication Support: Address API authentication issues, including LDAP, JWT, API Key, OIDC, and OAuth2 authentication flows.
Support Incident Management: Coordinate and troubleshoot complex support scenarios, including debugging pipeline errors, analyzing logs, and providing solutions to client-facing issues.
Terraform & CI/CD Pipeline Management: Use Terraform for infrastructure as code and GitLab CI/CD pipelines to deploy and maintain infrastructure changes.
Incident Recovery: Be able to identify and recover from issues such as network appliance crashes or deleted GSLB entries, and assist in the recovery of southbound network appliances via the GCP console.

Support Engagement Expectations

API Access Issues: Resolve issues when a tenant team cannot access their APIs after a backend upgrade, including analyzing transaction flows and identifying whether the issue lies with Apigee, the backend, or the client.
GSLB Issues: Investigate and restore GSLB configurations when necessary, using Terraform pipelines to repair configurations.
System Crashes: Analyze logs and troubleshoot error states in network appliance clusters, using GCP console tools for recovery.
Pipeline Errors: Investigate and resolve errors in GitLab CI/CD pipelines, identifying issues with governance rules or pipeline status.

Requirements

API Management Knowledge: Strong understanding of API protocols (REST, SOAP, GraphQL, gRPC) and the role of API Gateways and proxies in API management.
Apigee Expertise: 2+ years of experience with Apigee X or Apigee Hybrid, including troubleshooting of API proxy flows, policies, and transactions.
Cloud Infrastructure (GCP): Basic understanding of GCP services, such as Compute Engine, Load Balancers, IAM/Roles permissions, Stack Driver, and Kubernetes (GKE).
Networking & Security: Familiarity with firewall management, DNS, Load Balancers (Global/Regional), HTTPS protocol, and certificate management.
Authentication Systems: Knowledge of LDAP, JWT, API Key-based authentication, OIDC, and OAuth2 authentication flows.
Monitoring Tools: Experience using data analytics and monitoring platforms like ELK, Dynatrace, Datadog, New Relic, Grafana/Prometheus, and interpreting the results.
Linux & Automation: Experience working with Linux CLI, Terraform for infrastructure as code, and Python/bash scripting for automation tasks.
CI/CD Pipelines: Familiarity with GitLab CI/CD-based pipelines for code deployment and troubleshooting pipeline issues.
Troubleshooting: Strong troubleshooting and diagnostic skills to handle complex API system integrations and identify the root cause of issues.

What We Offer: Competitive salaries and comprehensive health benefits; flexible work hours and remote work options; professional development and training opportunities; a supportive and inclusive work environment.

Posted 2 weeks ago


4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office


Key Responsibilities

Application monitoring and support: Manage traffic diversion during deployments; validate code deployment success; perform post-deployment health monitoring and reporting; carry out production patching and monitoring activities for in-scope applications (Liveliness Probe, DataGrid, SOSS, POD restarts); monitor and action alerts using Bell monitoring tools (Dynatrace, BAM, Grafana); monitor the DB server through a daily sanity check; verify Table Space / Disk Space status and warn if it is reaching capacity; verify Memory and Processor usage and warn if it is reaching capacity.

Production monitoring: Diagnose and track Incidents and problems with Severity Critical (P1) and High (P2) through to resolution; provide the required Production Logs, or access to them, to analyze incidents; provide the Root Cause Analysis for all Critical Incidents; repair data and associated work caused by invalid data where validation code does not exist or where a -documented Incident caused by a transaction results in failures; provide workarounds for Critical and High Incidents; update relevant system, configuration or process documentation; document and promptly notify Bell of any emergency changes required; participate in AMS Operations Governance meetings (assumed to be bi-weekly); respond to Application-related questions, performing data extraction as required; handle ad-hoc requests from end users for information, queries, or reports; provide holiday support coverage; perform peak period monitoring and reporting for specific critical applications; perform daily health checks for Critical applications.

Preferred Qualifications: Bachelor's degree in Computer Science.

What We Offer: Competitive salaries and comprehensive health benefits; flexible work hours and remote work options; professional development and training opportunities; a supportive and inclusive work environment.

Posted 2 weeks ago


5.0 - 10.0 years

6 - 12 Lacs

Kolkata

Work from Office


At Gintaa, we're redefining how India orders food. With our focus on affordability, exclusive restaurant partnerships, and hyperlocal logistics, we aim to scale across India's Tier 1 and Tier 2 cities. We're backed by a mission-driven team and expanding rapidly, so now's the time to join the core tech leadership and build something impactful from the ground up.

Job Summary: We are looking for an experienced and motivated DevOps Engineer with 5–7 years of hands-on experience designing, implementing, and managing cloud infrastructure, particularly on Google Cloud Platform (GCP) and Amazon Web Services (AWS). The ideal candidate will have deep expertise in infrastructure as code (IaC), CI/CD pipelines, container orchestration, and cloud-native technologies. This role requires strong analytical skills, attention to detail, and a passion for optimizing cloud infrastructure performance and cost across multi-cloud environments.

Key Responsibilities: Multi-Cloud Infrastructure: Design, implement, and maintain scalable, reliable, and secure cloud infrastructure using GCP services (Compute Engine, GKE, Cloud Functions, Pub/Sub, BigQuery, Cloud Storage) and AWS services (EC2, ECS/EKS, Lambda, S3, RDS, CloudFront). CI/CD & GitOps: Build and manage CI/CD pipelines using GitHub/GitLab Actions and artifact repositories, and enforce GitOps practices across both GCP and AWS environments. Containerization & Serverless: Leverage Docker, Kubernetes (GKE/EKS), and serverless architectures (Cloud Functions, AWS Lambda) to support microservices and modern application deployments. Infrastructure as Code: Develop and manage IaC using Terraform (or CloudFormation for AWS) to automate provisioning and drift detection across clouds. Observability & Monitoring: Implement observability tools like Prometheus, Grafana, Google Cloud Monitoring, and AWS CloudWatch for real-time system insights.
Security & Compliance: Ensure best practices in cloud security, including IAM policies (GCP IAM + AWS IAM), encryption standards (KMS), network security (VPCs, Security Groups, Firewalls), and compliance frameworks. Service Mesh: Integrate and manage service mesh architectures such as Istio or Linkerd for secure and observable microservices communication. Troubleshooting & DR: Troubleshoot and resolve infrastructure issues, ensure high availability, disaster recovery (GCP Backup + AWS Backup/AWS DR strategies), and performance optimization. Cost Management: Drive initiatives for cloud cost management; use tools like GCP Cost Management and AWS Cost Explorer to suggest optimization strategies. Documentation & Knowledge Transfer: Document technical architectures, processes, and procedures; ensure smooth knowledge transfer and operational readiness. Cross-Functional Collaboration: Collaborate with Development, QA, Security, and Architecture teams to streamline deployment workflows. Required Skills & Qualifications 5–7 years of DevOps/Cloud Engineering experience, with at least 3 years on GCP and 3 years on AWS. Proficiency in Terraform (plus familiarity with CloudFormation), Docker, Kubernetes (GKE/EKS), and other DevOps toolchains. Strong experience with CI/CD tools (GitHub/GitLab Actions) and artifact repositories. Deep understanding of cloud networking, VPCs, load balancing, security groups, firewalls, and VPNs in both GCP and AWS. Expertise in monitoring/logging frameworks such as Prometheus, Grafana, Stackdriver (Cloud Monitoring), and AWS CloudWatch/CloudTrail. Strong scripting skills in Python, Bash, or Go for automation tasks. Knowledge of data backup, high-availability systems, and disaster recovery strategies across multi-cloud. Familiarity with service mesh technologies and microservices-based architecture. Excellent analytical, troubleshooting, and documentation skills. Effective communication and ability to work in a fast-paced, collaborative environment. 
Preferred Qualifications (Good to Have) Google Professional Cloud Architect Certification and/or AWS Certified Solutions Architect – Professional. Experience with multi-cloud or hybrid cloud setups, including VPN/Direct Connect and Interconnect configurations. Exposure to agile software development, DevSecOps, and compliance-driven environments (e.g., BFSI, Healthcare). Understanding of cost modeling and cloud billing analysis tools. Why Join Gintaa? Be a part of a purpose-driven startup revolutionizing food and local commerce in India. Build impactful, large-scale mobile applications from scratch. Work with a visionary leadership team and dynamic, entrepreneurial culture. Competitive salary and leadership visibility.

Posted 2 weeks ago


4.0 - 8.0 years

6 - 10 Lacs

Hyderabad

Work from Office


AI Opportunities with Soul AI's Expert Community! Are you an MLOps Engineer ready to take your expertise to the next level? Soul AI (by Deccan AI) is building an elite network of AI professionals, connecting top-tier talent with cutting-edge projects.

Why Join: Above market-standard compensation. Contract-based or freelance opportunities (2–12 months). Work with industry leaders solving real AI challenges. Flexible work locations: Remote | Onsite | Hyderabad/Bangalore.

Your Role: Architect and optimize ML infrastructure with Kubeflow, MLflow, and SageMaker Pipelines. Build CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD). Automate ML workflows (feature engineering, retraining, deployment). Scale ML models with Docker, Kubernetes, and Airflow. Ensure model observability, security, and cost optimization in the cloud (AWS/GCP/Azure).

Must-Have Skills: Proficiency in Python, TensorFlow, PyTorch, and CI/CD pipelines. Hands-on experience with cloud ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML). Expertise in monitoring tools (MLflow, Prometheus, Grafana). Knowledge of distributed data processing (Spark, Kafka). (Bonus: Experience in A/B testing, canary deployments, serverless ML.)

Next Steps: Register on Soul AI's website. Get shortlisted and complete screening rounds. Join our Expert Community and get matched with top AI projects. Don't just find a job. Build your future in AI with Soul AI!

Posted 2 weeks ago


4.0 - 12.0 years

8 - 11 Lacs

Pune, Bengaluru

Work from Office


Proven experience as a DevOps engineer or in a similar role with a focus on monitoring and observability. Expert-level knowledge of Splunk (advanced configuration, data indexing, search optimization, and alerting). Advanced experience with Grafana for creating real-time, interactive dashboards and visualizations. Strong proficiency in Linux/Unix systems administration and scripting (Bash, Python, etc.). Solid understanding of cloud platforms like AWS, Azure, or GCP and how to integrate monitoring solutions into these environments. Experience with containerization (Docker, Kubernetes) and orchestration tools. Familiarity with Infrastructure as Code tools (Terraform, Ansible, etc.). Experience with automation tools (Jenkins, GitLab CI, etc.) for deploying and managing infrastructure. Strong problem-solving skills with the ability to troubleshoot and resolve complex technical issues in a fast-paced environment. Experience with distributed systems and knowledge of performance tuning, scaling, and high-availability setups.

Preferred Skills: Experience in managing large-scale Splunk and Grafana environments. Knowledge of log aggregation technologies (Fluentd, Logstash, etc.). Familiarity with alerting and incident management tools (PagerDuty, Opsgenie, etc.). Certifications in cloud platforms (AWS Certified DevOps Engineer, Azure DevOps Engineer, etc.). Familiarity with Agile methodologies (Scrum/Kanban) and DevOps practices. Understanding of security principles and practices as they relate to logging and monitoring.

Posted 2 weeks ago


5.0 - 10.0 years

4 - 8 Lacs

Pune

Work from Office


We are looking for a Java Engineer to extend our existing applications and to create new applications. In this role, you will develop technology solutions that are scalable, relevant, and critical to our company's success. We believe in building the right product, in using best practices, and in everybody's input. You will help drive us to a continuously delivered microservice environment.

Roles and Responsibilities: Write well-designed, clean, efficient code backed by unit tests. Develop scalable, lasting technology solutions. Abide by coding standards and guidelines. Build for security and performance. Work well in an Agile/Scrum environment (done the right way). Maintain a high standard of work quality and encourage others to do the same. Be an energetic individual with enthusiasm to learn new tools, technologies, and processes.

Must Have: 5+ years' experience in Java and willingness to learn. Mandatory skills: Core Java, J2EE, Spring, REST API, SQL (Oracle, MySQL, PostgreSQL, MSSQL). Well versed in OOP, data structures, algorithms, and multithreading. Ability to write high-quality, bug-free, clean Java code. Strong experience building RESTful APIs and microservices. Distributed systems: knowledge of scalable, high-performance backend architectures. Problem-solving: strong debugging and troubleshooting skills. Bachelor's degree in computer science or a related field required; a master's degree is a plus.

Good to Have: Experience with NoSQL databases. Monitoring: familiarity with Grafana or similar. Orchestration: familiarity with Docker. Version control: familiarity with Git and CI/CD tuning. Analytics. Startup experience.

Posted 2 weeks ago


8.0 - 13.0 years

10 - 15 Lacs

Bengaluru

Work from Office


About The Team: The Cloud Platform Engineering (CPE) group is responsible for developing and managing platforms that allow Myntra's tech products to be deployed and run at scale. The CPE team builds and maintains centralized and high-scale platforms for sophisticated application security frameworks, log collection, monitoring systems, access management, secret management, database access, change management systems, build, release and deployment. You will be part of the SRE team under the CPE division.

Position: Technical Lead - Site Reliability Engineering (SRE)
Location: Bengaluru
Employment Type: Full-time

Role Overview: As a Technical Lead in Site Reliability Engineering (SRE), you will be responsible for leading a team of talented engineers and overseeing the design, implementation, and maintenance of our ecommerce platform's infrastructure. You will collaborate closely with cross-functional teams, including software development, operations, and program management, to ensure the reliability, availability, and performance of our systems. Your expertise will be essential in proactively identifying and resolving operational issues, improving system performance, and driving automation initiatives.

Responsibilities: Hosting infrastructure and setting up the core platform forms the backbone of any system. As part of this team, you will be responsible for the following:
1. Lead and mentor a team of Site Reliability Engineers, providing technical guidance and support, and fostering a culture of continuous learning and development.
2. Collaborate with software development teams to ensure the seamless integration of new features and enhancements into the existing infrastructure.
3. Oversee the design, implementation, and maintenance of highly available and scalable systems, ensuring optimal performance and reliability.
4. Develop and implement monitoring and alerting systems to proactively identify and resolve operational issues, ensuring maximum uptime.
5. Conduct regular performance analysis and capacity planning to identify potential bottlenecks, optimize system performance, and plan for future growth.
6. Define and enforce best practices for incident management, change management, and problem resolution, ensuring adherence to SLAs.
7. Drive automation initiatives to streamline operational tasks, increase efficiency, and reduce manual intervention.
8. Collaborate with cross-functional teams to identify opportunities for system improvements, scalability enhancements, and cost optimizations.
9. Stay up to date with industry trends, emerging technologies, and best practices in Site Reliability Engineering, and look for opportunities to apply them in our infrastructure and operations.
10. Foster a culture of innovation, continuous improvement, and operational excellence within the team.

Requirements:
1. Bachelor's or master's degree in Computer Science, Engineering
2. Experience (8+ years) in a similar role as a Technical Lead or Senior Site Reliability Engineer.
3. Strong knowledge of infrastructure design, cloud-based platforms (Azure, GCP, AWS), and containerization technologies (Docker, Kubernetes).
4. Expertise in designing and implementing highly available, scalable, and fault-tolerant systems.
5. Solid understanding of networking, distributed systems, and database technologies.
6. Proficiency in scripting (Python, Bash) and automation tools (Ansible, Terraform).
7. Experience with monitoring and logging tools (Prometheus, Grafana, and the ELK/EFK logging stack).
8. Strong problem-solving and troubleshooting skills, with the ability to diagnose and resolve complex system issues.
9. Excellent leadership and communication skills, with the ability to effectively collaborate with cross-functional teams.
10. Strong organizational and project management skills, with the ability to prioritize and manage multiple initiatives simultaneously.

Posted 2 weeks ago


3.0 - 6.0 years

5 - 8 Lacs

Bengaluru

Work from Office


About The Team: The Cloud Platform Engineering (CPE) group is responsible for developing and managing platforms that allow Myntra's tech products to be deployed and run at scale. The CPE team builds and maintains centralized and high-scale platforms for sophisticated application security frameworks, log collection, monitoring systems, access management, secret management, database access, change management systems, build, release and deployment. You will be part of the SRE team under the CPE division.

Position: M2 - Site Reliability Engineering (SRE)
Location: Bengaluru
Employment Type: Full-time

Role Overview: As an SRE at M2 level, you will play an important role in the team's work on the availability, reliability, scalability and performance of Myntra's production site. As part of the role, you will work on the cloud platform, container platform and observability stack. This will also include developing automation tools, mainly in Bash and Python and occasionally in Golang.

Responsibilities: Hosting infrastructure and setting up the core platform forms the backbone of any system. As part of this team, you will be responsible for the following:
1. Collaborate with the lead and architect in the team to design, test and implement scalable and highly available solutions.
2. Collaborate with software development teams to ensure the adoption of the platforms and platform components for high visibility.
3. Participate in incident response as part of the team's on-call duties and provide solutions (short term and long term) along with RCAs for incidents.
4. Work closely within the team to proactively identify and rectify system issues and help prevent outages and incidents.
5. Develop and implement monitoring and alerting systems to proactively identify and resolve operational issues, ensuring maximum uptime.
6. Define and enforce best practices for incident management, change management, and problem resolution, ensuring adherence to SLAs.
7. Drive automation initiatives to streamline operational tasks, increase efficiency, and reduce manual intervention.
8. Collaborate with cross-functional teams to identify opportunities for system improvements, scalability enhancements, and cost optimizations.
9. Contribute to the creation and maintenance of documentation related to system architecture, configurations, and operational procedures, and actively participate in knowledge-sharing initiatives within the team.
10. Foster a culture of innovation, continuous improvement, and operational excellence within the team.

Requirements:
1. Bachelor's in Computer Science, Engineering or equivalent.
2. Experience (3-6 years) in a similar role as a Technical Lead or Senior Site Reliability Engineer.
3. Strong knowledge of infrastructure design, cloud-based platforms (Azure, GCP, AWS), and containerization technologies (Docker, Kubernetes).
4. Solid understanding of networking, distributed systems, and database technologies.
5. Proficiency in scripting (Python, Bash) and infra automation tools (Ansible, Terraform).
6. Good knowledge of security and its best practices, and experience implementing security controls in a production environment.
7. Experience with monitoring and logging tools (Prometheus, Grafana, and the ELK/EFK logging stack).
8. Strong problem-solving and troubleshooting skills, with the ability to diagnose and resolve complex system issues.
9. Excellent collaboration and communication skills.
10. Experience in handling large-scale distributed systems such as Elasticsearch.

Posted 2 weeks ago


5.0 - 8.0 years

7 - 11 Lacs

Chennai

Work from Office


Overview

DevOps Engineer – OpenShift (OCP) Specialist

Job Summary: FSS is seeking a highly skilled DevOps Engineer with hands-on experience in Red Hat OpenShift Container Platform (OCP) and associated tools like Argo CD, Jenkins, and Data Grid. The ideal candidate will drive automation, manage containerized environments, and ensure smooth CI/CD pipelines across hybrid infrastructure to support our financial technology solutions.

Required Skills & Qualifications

Technical Skills: Strong hands-on experience with OpenShift (v4.x) administration and operations. Proficiency in CI/CD tools: Jenkins, Argo CD, GitHub Actions, GitLab CI/CD. Deep understanding of Kubernetes, Docker, and container orchestration. Experience with Red Hat Data Grid or other in-memory data grids. Skilled in IaC tools: Terraform, Ansible, CloudFormation. Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK, Splunk). Proficient in scripting languages: Bash, Python, or Shell.

Soft Skills: Excellent problem-solving and analytical skills. Strong communication and collaboration abilities across cross-functional teams. Candidates should be able to work independently. Candidates should be able to provide solutions based on customer requirements and work with the customer's DevOps team during project implementation.

Responsibilities

OpenShift Platform Engineering: Deploy, manage, and maintain applications on OpenShift Container Platform. Configure and manage Operators, Helm charts, and OpenShift GitOps (Argo CD). Manage Red Hat Data Grid deployments and integrations. Support OCP cluster upgrades, patching, and troubleshooting.
CI/CD Implementation & Automation: Design, implement, and manage CI/CD pipelines using Jenkins and Argo CD. Ensure seamless code integration, testing, and deployment processes with development teams.
Infrastructure as Code (IaC): Automate infrastructure provisioning with tools like Terraform and Ansible. Manage hybrid infrastructure across on-prem and public clouds (AWS, Azure, or GCP).
Monitoring & Performance Optimization: Implement and manage observability stacks (Prometheus, Grafana, ELK, etc.) for OCP and underlying services. Proactively identify and resolve system performance bottlenecks.
Security & Compliance: Enforce security best practices in containerized and cloud environments. Conduct vulnerability assessments and ensure compliance with industry standards.
Collaboration & Support: Collaborate with developers, QA, and IT teams to optimize DevOps workflows. Provide ongoing support and incident response for production and non-production environments.

Qualifications: BE, B.Tech, MCA or equivalent degree.
Payment gateway, Bank reconciliation, Card

Essential skills and Desired skills: as listed under Required Skills & Qualifications above.

Posted 2 weeks ago


5.0 - 10.0 years

15 - 30 Lacs

Noida

Hybrid


Company Overview: BOLD is an established and fast-growing product company that transforms work lives. Since 2005, we've helped more than 10,000,000 folks from all over America (and beyond!) reach higher and do better. A career at BOLD promises great challenges, opportunities, culture and environment. With our headquarters in Puerto Rico and offices in San Francisco and India, we're a global organization on a path to change the career industry.

Position Overview: As a DevOps SME, you will design and implement a variety of requirements to support infrastructure/platform as a service on the Azure cloud platform. Your role will also be to ensure that the design adequately represents and supports the needs of product teams while following industry practices like fault tolerance, availability, and observability.

Role & responsibilities: Uphold the "Automate Everything" philosophy within the team. Engage in the development and maintenance of our cloud infrastructure, which hosts all cloud applications in both development and production environments. Collaborate with colleagues across products and platforms to ensure the reliability of our cloud services, supporting customers with uninterrupted access to critical applications 24/7. Effectively manage and troubleshoot server and platform issues. Perform analysis and monitoring of the performance of cloud-based infrastructure, installed applications, and shared resources.

Preferred candidate profile: Proficient in utilising DevOps tools within the Azure Cloud environment. Experience in cloud automation, employing Terraform, Helm, and Argo CD. Skilled in application containerization and in Kubernetes cluster administration and management. Knowledge of cloud monitoring and alerting tools such as New Relic, Prometheus, Azure Monitor and Grafana. Strong experience in writing PowerShell and Bash scripts. Familiarity with EFK log management solutions. Proficient in working with the agile methodology. Demonstrates outstanding interpersonal and communication skills, coupled with a willingness to mentor new team members. Possesses strong analytical abilities and excels in problem-solving.

Posted 2 weeks ago

5.0 - 8.0 years

12 - 18 Lacs

Gandhinagar, Pune, Ahmedabad

Work from Office

  • 5+ years of experience in Azure DevOps.
  • Good experience with cloud computing platforms and scripting languages.
  • Strong knowledge of automation tools and technologies, such as Azure DevOps, Docker, and Terraform.
  • Knowledge of containerization and container orchestration, such as Kubernetes.
  • Strong understanding of agile software development methodologies.
  • Experience with version control systems and Git workflows.
  • Proficiency in the Microsoft Azure cloud platform.
  • Build and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.
  • Automate deployment, monitoring, management, and incident response.
  • Monitor site stability and performance, and troubleshoot site issues.
  • Scale infrastructure to meet rapidly increasing demand.
  • Manage cross-functional requirements, working with Engineering, Product, Services, and other departments.
  • Experience with monitoring and logging tools like Grafana and Prometheus.
  • Solid understanding of continuous integration and continuous delivery (CI/CD) pipelines.
  • Working knowledge of databases and SQL (Postgres DB).
  • Basic understanding of networking concepts.

Thanks
Email ID: Shivani.rathore@ics-global.in

Posted 2 weeks ago

5.0 - 10.0 years

15 - 30 Lacs

Pune, Chennai, Bengaluru

Work from Office

GEN AI Consultant

  • 4+ years (hands-on experience in Gen AI + ML): up to 20 Lacs; Bengaluru, Pune
  • 7+ years (hands-on experience in Gen AI + ML): up to 35 Lacs; Bengaluru, Pune, Chennai, Jaipur

Skills: Gen AI, AI models, AI frameworks, Python, any cloud

mansikohliimaginator@gmail.com

Required Candidate profile

  • Cloud platforms: IBM Cloud, Azure, Google Cloud, and AWS
  • Frameworks: Flask, Django, Nginx + Gunicorn, Docker, Kubernetes, SCM (Git), DevOps (CI/CD)
  • Tools such as Prometheus, MLflow, Grafana

Posted 2 weeks ago

4.0 - 5.0 years

6 - 7 Lacs

Hyderabad

Work from Office

Ensure the reliability, availability, and performance of services through automation, monitoring, and proactive incident management. Implement best practices for system reliability and scalability.

Posted 2 weeks ago

6.0 - 8.0 years

13 - 17 Lacs

Noida, Hyderabad, Chennai

Hybrid

Role & responsibilities:

  • Design, deploy, and maintain AWS infrastructure as code (IaC) using tools such as Terraform and CloudFormation.
  • Build and deploy applications in a repeatable and automated way.
  • Design and implement serverless architecture using AWS services such as Lambda, API Gateway, DynamoDB, S3, and others.
  • Monitor, troubleshoot, and optimize the performance of cloud-based applications using monitoring and analytics tools such as New Relic, Grafana, and Prometheus.
  • Collaborate with development teams to ensure the reliability, scalability, and security of our systems.
  • Automate processes using CI/CD tools such as Azure DevOps, TeamCity, or Jenkins.
  • Implement security best practices and ensure compliance with regulatory requirements.
  • Continuously improve our infrastructure and processes to meet evolving business needs and technology trends.

Mandatory Skills:

  • 6+ years of experience in a DevOps role, with a focus on AWS services and infrastructure as code.
  • Experience with Terraform or other IaC tools such as CloudFormation or CDK.
  • Strong understanding of serverless architectures, microservices, and containerization using Kubernetes or other container orchestration tools.
  • Experience with monitoring and analytics tools such as Grafana, Prometheus, and New Relic.
  • Familiarity with CI/CD tools such as Azure DevOps, Jenkins, GitLab, or CircleCI.
  • Proficiency in at least one scripting language (Bash, Python, JavaScript).
  • Proficiency with Linux administration/engineering.
  • Deep understanding of cloud-scale and micro/macro-services architectures, with experience operating high-performance, highly scalable, and fault-tolerant multi-tenant SaaS-based applications.
  • Strong problem-solving skills and the ability to troubleshoot issues in a complex environment.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • A passion for continuous learning and keeping up with the latest technology trends in the DevOps and cloud computing space.

Preferred candidate profile:

  • Looking for immediate joiners, minimum 15 days.
  • PF history mandatory for all companies.

Posted 2 weeks ago

4.0 - 8.0 years

5 - 15 Lacs

Bengaluru

Work from Office

  • Azure Monitor, Application Insights, Log Analytics
  • Prometheus / Datadog / Dynatrace
  • Grafana, Power BI
  • Python, REST API

Required Skills

  • Network Watcher, Databricks Logs, System tables, REST API
  • Bash, PowerShell

Posted 2 weeks ago

Exploring Grafana Jobs in India

Grafana is a popular tool used for monitoring and visualizing metrics, logs, and other data. In India, the demand for Grafana professionals is on the rise as more companies are adopting this tool for their monitoring and analytics needs.

Top Hiring Locations in India

  1. Bangalore
  2. Hyderabad
  3. Pune
  4. Mumbai
  5. Delhi

Average Salary Range

The average salary range for Grafana professionals in India varies based on experience level:

  • Entry-level: ₹4-6 lakhs per annum
  • Mid-level: ₹8-12 lakhs per annum
  • Experienced: ₹15-20 lakhs per annum

Career Path

A typical career path in Grafana may include roles such as:

  1. Junior Grafana Developer
  2. Grafana Developer
  3. Senior Grafana Developer
  4. Grafana Tech Lead

Related Skills

In addition to Grafana expertise, professionals in this field often benefit from having knowledge or experience in:

  • Monitoring tools such as Prometheus
  • Data visualization tools like Tableau
  • Scripting languages (e.g., Python, Bash)
  • Understanding of databases (e.g., SQL, NoSQL)

Interview Questions

  • What is Grafana and how is it used? (basic)
  • Explain the difference between Grafana and Kibana. (basic)
  • How do you create a dashboard in Grafana? (medium)
  • What are plugins in Grafana and how can they be used? (medium)
  • How can you integrate Grafana with Prometheus for monitoring? (advanced)
  • Explain how alerting works in Grafana. (advanced)
  • How do you optimize queries in Grafana for better performance? (advanced)

Closing Remark

As the demand for Grafana professionals continues to grow in India, it is essential to stay updated with the latest trends and technologies in this field. Prepare thoroughly for interviews and showcase your skills confidently to land your dream job in Grafana. Good luck!

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies