71 Loki Jobs - Page 2

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

4.0 - 8.0 years

0 Lacs

noida, uttar pradesh

On-site

You will be responsible for implementing ELK-based monitoring, Loki logging, and alert generation with ElastAlert and Prometheus Alertmanager. Additionally, you will develop service dashboards using Kibana and Grafana. Your role will involve developing custom scripts for monitoring purposes. To be successful in this role, you should have at least 4 years of experience as a software engineer with a strong working knowledge of the ELK stack, Prometheus, Loki, and Grafana. Familiarity with statistical functions used for real-time monitoring, such as averages and rate of change, is required. Experience with Nagios and SolarWinds is a plus. Strong programming skills in Python are necessary, as well as a solid understanding of web services, databases, networking, and related infrastructure/architectures as they pertain to monitoring and alerting. Experience with Google Cloud Platform is also desired. For this specific grade, the focus will be on Industrial Operations Engineering. You will be expected to develop competency in your area of expertise, provide guidance to others, interpret clients' needs, and work independently or with minimum supervision. Your problem-solving skills will be crucial in identifying and resolving issues, and you will contribute to teamwork and interact with customers effectively.

Posted 3 weeks ago

4.0 - 6.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Profile Description: We're seeking someone to join our team as an AI Platform Engineering Specialist with strong hands-on experience building software platforms on any combination of the following: Kubernetes, Cloud (AWS, Azure, and/or Google), API-based development, REST frameworks, data engineering, and large-scale API Gateway environments. Knowledge of AI/ML and hands-on experience implementing solutions using Generative AI are also preferable. The candidate will have great communication skills, a team-based mentality, and a strong passion for using AI to increase productivity as well as to help generate new ideas for product and technical improvements. Enterprise Technology: Enterprise Technology & Services (ETS) delivers shared technology services for Morgan Stanley, supporting all business applications and end users. ETS provides capabilities for all stages of Morgan Stanley's software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications. ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet/internet) in integrated configurations that boost the personal productivity of employees. Application and end-user functions are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database functions. Architecture & Modernization: Architecture & Modernization drives development of the global firm strategy to define modern architectures and guardrails to reduce legacy debt, while partnering with app dev to accelerate the adoption of modern capabilities. Software Engineering: This is a position that develops and maintains software solutions that support business needs.
Morgan Stanley is an industry leader in financial services, known for mobilizing capital to help governments, corporations, institutions, and individuals around the world achieve their financial goals. At Morgan Stanley India, we support the Firm's global businesses, with critical presence across Institutional Securities, Wealth Management, and Investment Management, as well as in the Firm's infrastructure functions of Technology, Operations, Finance, Risk Management, Legal and Corporate & Enterprise Services. Morgan Stanley has been rooted in India since 1993, with campuses in both Mumbai and Bengaluru. We empower our multi-faceted and talented teams to advance their careers and make a global impact on the business. For those who show passion and grit in their work, there's ample opportunity to move across the businesses. Interested in joining a team that's eager to create, innovate and make an impact on the world? Read on. What You'll Do In The Role: Develop tooling and self-service capabilities for deploying AI solutions for the firm leveraging Kubernetes/OpenShift, Python, authentication solutions, APIs, REST framework, etc. Develop Terraform modules and Cloud architecture to enable secure AI cloud service deployment and consumption at scale. Have a platform mindset and build common, reusable solutions to scale Generative AI use cases using pre-trained models as well as fine-tuned models. Leverage Kubernetes/OpenShift to develop modern containerized workloads. Integrate with capabilities such as large-scale vector stores for embeddings. Author best practices on the Generative AI ecosystem, when to use which tools, available models such as GPT, Llama, Hugging Face etc. and libraries such as LangChain. Analyze, investigate, and implement GenAI solutions focusing on Agentic Orchestration and Agent Builder frameworks.
Author and publish architecture decision records to capture major design decisions and product selection for building Generative AI solutions, including app authentication, service communication, state externalization, container layering strategy, and immutability. Ensure the AI platform is reliable, scalable, and operational (e.g., blueprints for upgrade/release strategies such as Blue/Green; logging/monitoring/metrics; automation of system management tasks). Participate in all team Agile/Scrum ceremonies. Participate in the team's on-call rotation in a build/run team model. What You'll Bring To The Role: At least 4 years' relevant experience would generally be expected to find the skills required for this role. Bachelor's or Master's degree in Computer Science or related field, or equivalent job experience. 4 years of experience in software engineering, design and development. Strong hands-on Application Development background in at least one prominent programming language, preferably Python (Flask or FastAPI). Broad understanding of data engineering (SQL, NoSQL, Big Data, Kafka, Redis), data governance, data privacy and security. Experience in development, management, and deployment of Kubernetes workloads, preferably on OpenShift. Experience with designing, developing, and managing RESTful services for large-scale enterprise solutions. Experience deploying applications on Azure, AWS, and/or GCP using IaC (Terraform). Hands-on experience with multiprocessing, multithreading, asynchronous I/O, and performance profiling in at least one prominent programming language, preferably Python. Ability to articulate technical concepts effectively to diverse audiences. Excellent communication skills. Demonstrated ability to work effectively and collaboratively in a global organization, across time zones, and across organizations. Demonstrated experience in DevOps, understanding of CI/CD (Jenkins) and GitOps. Knowledge of DevOps and Agile practices.
Nice to have: Practitioner of unit testing, performance testing and BDD/acceptance testing. Understanding of the OAuth 2.0 protocol for secure authorization. Proficiency with OpenTelemetry tools including Grafana, Loki, Prometheus, and Cortex. Good knowledge of Microservice-based architecture and industry standards, for both public and private cloud. Good understanding of modern Application configuration techniques. Hands-on experience with Cloud Application Deployment patterns like Blue/Green. Good understanding of State sharing between scalable cloud components (Kafka, dynamic distributed caching). Good knowledge of various DB engines (SQL, Redis, Kafka, etc.) for cloud app storage. Experience building AI applications, preferably Generative AI and LLM-based apps. Deep understanding of AI agents, Agentic Orchestration, Multi-Agent Workflow Automation, along with hands-on experience in Agent Builder frameworks such as LangChain and LangGraph. Experience working with Generative AI development, embeddings, and fine-tuning of Generative AI models. Understanding of ModelOps/MLOps/LLMOps. Understanding of SRE techniques. What You Can Expect From Morgan Stanley: We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values (putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back) aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences.
We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work. To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser. Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Posted 3 weeks ago

7.0 - 12.0 years

10 - 15 Lacs

pune

Work from Office

Sarvaha would like to welcome a skilled Observability Engineer with a minimum of 7 years of experience to contribute to designing, deploying, and scaling our monitoring and logging infrastructure on Kubernetes. In this role, you will play a key part in enabling end-to-end visibility across cloud environments by processing petabyte-scale data, helping teams enhance reliability, detect anomalies early, and drive operational excellence. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. What You'll Do: - Configure and manage observability agents across AWS, Azure & GCP. - Use IaC techniques and tools such as Terraform, Helm & GitOps to automate deployment of the Observability stack. - Work with different language stacks such as Java, Ruby, Python and Go. - Instrument services using OpenTelemetry and integrate telemetry pipelines. - Optimize telemetry metrics storage using time-series databases such as Mimir & NoSQL DBs. - Create dashboards, set up alerts, and track SLIs/SLOs. - Enable RCA and incident response using observability data. - Secure the observability pipeline. You Bring: - BE/BTech/MTech (CS/IT or MCA), with an emphasis in Software Engineering. - Strong skills in reading and interpreting logs, metrics, and traces. - Proficiency with LGTM (Loki, Grafana, Tempo, Mimir) or a similar stack, Jaeger, Datadog, Zipkin, InfluxDB etc. - Familiarity with log frameworks such as log4j, lograge, Zerolog, loguru etc. - Knowledge of OpenTelemetry, IaC, and security best practices. - Clear documentation of observability processes, logging standards & instrumentation guidelines. - Ability to proactively identify, debug, and resolve issues using observability data. - Focus on maintaining data quality and integrity across the observability pipeline.

Posted 3 weeks ago

2.0 - 4.0 years

4 - 9 Lacs

bengaluru

Work from Office

Skills Required: Technical areas (hands-on experience in academic projects/internships). Experience with Kubernetes, Jenkins, GitLab, GitHub, CI/CD, Terraform, Linux, Bash, Python, AWS, GCP, GKE, and EKS. Understanding of Public/Private/Hybrid Cloud Solutions. Own the responsibility for platform management, supporting services, and all related tooling and automation. Proficient in cloud-native technologies, automation, and containerization. Experience in setting up and managing cloud infrastructure and services for a wide range of applications. Some experience in ReactJS/NodeJS, PHP, Python, and UNIX shell, so a background in system-oriented languages is important. Managing and deploying cloud-native applications on Kubernetes clusters, setting up CI/CD pipelines (Jenkins, GitLab, GitHub), database migration (MySQL, PostgreSQL, Cassandra), and setting up monitoring (Grafana, Loki, Prometheus, Mimir, ELK Stack). Certified in Kubernetes and Jenkins. Experienced in using Terraform to automate infrastructure provisioning. We are looking for bright, passionate, and dedicated people with clearly demonstrated initiative and a history of success in their past positions to join our growing team.

Posted 3 weeks ago

2.0 - 6.0 years

0 Lacs

haryana

On-site

Join Tufin and enjoy a people-centric culture, an open atmosphere, and opportunities for career growth. Benefit from great mentors and a company culture that encourages knowledge sharing with leading tech experts. Embrace the opportunity to make a difference and bring your passion and inspiration to the table. With Tufin celebrating 20 years of business stability in 2025 and boasting over 2,000 worldwide customers, you will be a valuable part of the dynamic Cyber Security industry. As an Escalations TL at Tufin, you will leverage your troubleshooting skills to address customer issues escalated to R&D by the Technical Support/Services organization. Lead a team of escalation engineers, providing guidance, mentorship, and fostering growth. Assist team members in analyzing, diagnosing, debugging, and resolving complex issues that customers encounter while using our products. Take charge of escalations and efficiently drive resolutions through offline or online customer sessions. Develop and implement diagnostic tools, patches, and fixes as needed, based on identified root causes, or escalate to R&D teams when necessary. Serve as the technical focal point for coordinating with R&D teams to resolve customer issues and identify and implement solutions based on patterns observed in escalated customer issues. Collaborate with R&D to provide feedback on common escalated issues and work towards devising permanent solutions, such as product hotfixes or design changes. Additionally, train Technical Support engineers to enhance their efficiency in resolving support cases. Requirements: - Minimum of 2 years of proven experience in software team leadership roles. - At least 4 years of solid programming experience in Java server-side development within a Linux environment. - Proficiency in troubleshooting and problem-solving. - Capability to multitask, organize, and prioritize work effectively. 
- Strong verbal and written communication skills, including technical writing, in a multicultural work environment. - Previous experience in R&D escalations, Tier-3, or customer-facing positions. - Ability to communicate with customers in an effective, responsible, and respectful manner. Nice to have knowledge or experience in (but not mandatory): - Spring and Hibernate frameworks, REST APIs. - Monitoring distributed systems and virtualization. - ELK, Grafana, Prometheus, Loki, monitoring tools. - NGINX and NGINX ingress controller configuration. - Familiarity with Kubernetes, networking, Helm, Golang.

Posted 4 weeks ago

3.0 - 7.0 years

0 Lacs

hyderabad, telangana

On-site

We are always on the lookout for talented professionals who are passionate, curious, creative, and solution-driven team players. If you are interested, there are exciting job openings available for you to apply. If you do not find a suitable position from the list below, you can simply drop your resume at careers@resolvetech.com. The team will get back to you when a suitable opportunity arises that aligns with your profile. ServiceNow Sr. Associate SRE & Monitoring Operations | Hyderabad. Key Responsibilities: Dashboard Development: Design, develop, and maintain Grafana dashboards to visualize metrics, logs, and traces from various sources including Prometheus, Loki, and Tempo. Integration and Configuration: Configure and integrate Grafana with Prometheus, Loki, Tempo, and other data sources to collect, store, and visualize monitoring data effectively. Log and Metrics Analysis: Utilize the LogQL and PromQL query languages to analyze logs and metrics data, identify trends, and troubleshoot issues proactively. Synthetic Monitoring: Implement and manage synthetic monitoring solutions to simulate user interactions and monitor the performance and availability of critical endpoints and workflows. API Management (APIM) Integration: Collaborate with APIM teams to integrate monitoring solutions into APIM platforms, leveraging LogQL and PromQL for analyzing API logs and metrics. Tempo Tracing Setup: Set up and configure Tempo for distributed tracing, enabling end-to-end visibility into application performance and latency across microservices architectures. Alerting and Notification: Configure alerting rules and notifications within Grafana to notify relevant teams of potential issues or anomalies in real time. Performance Optimization: Identify and implement optimizations to improve the performance, scalability, and efficiency of monitoring systems. Automation and Scripting: Develop automation scripts and templates for deploying, managing, and scaling.
Contact No: O: 8919338521 & 9944499791. Email: rudra.kumar@resolvetech.com or jasmine.lourduraj@resolvetech.com. Why RTS? Grow with us: Resolve Tech Solutions is on a steady growth curve, rapidly expanding our expertise and at the same time helping our clients meet their expectations through traditional and emerging technologies. Join us on a growth journey as we create a niche for ourselves as a leading technology solutions provider. Innovative Spaces: While we continue to build on our leadership in SAP, we are also expanding our capabilities to include emerging technologies that would benefit our clients. Our focus areas are advanced data analytics, smart technology solutions, and solution-specific accelerators. Collaborative Culture: Our company fabric is built on collaboration and teamwork. We recognize that our biggest strength lies with our people, and therefore we support our teams with mentorship and guidance that encourages personal and professional growth.

Posted 1 month ago

8.0 - 12.0 years

0 Lacs

navi mumbai, maharashtra

On-site

As a candidate for this position, you should hold a Bachelor's degree in Computer Engineering, Computer Science, Information Systems, or a related field. In addition, you should have a minimum of 8 years of experience in a DevOps-related role, particularly for senior-level candidates. Strong English communication skills, both written and verbal, are essential for effective collaboration within the team. Proficiency in Linux system engineering is a must, along with hands-on experience in setting up CI/CD pipelines, preferably using GitHub Actions. Your expertise should extend to working with Infrastructure as Code (IaC) tools, with a particular focus on Terraform. Familiarity with configuration management tools like Ansible or Salt will be advantageous in this role. The ideal candidate will also have experience with logging and monitoring tools such as Grafana, Promtail, and Loki. Proficiency in Kubernetes, Helm, and GitOps tools like ArgoCD is highly preferred. Kubernetes administration experience will be a strong advantage, coupled with a solid understanding of Microservices Architecture and Service Mesh concepts. Your background should include experience in building and managing Windows-based infrastructure and familiarity with artifact management tools such as JFrog Artifactory. A strong knowledge of AWS services, including but not limited to EC2, ECS, RDS, S3, CloudFront, WAF, API Gateway, Lambda, ElastiCache, Elasticsearch, SQS, SNS, and EKS, is essential for this role. Moreover, proficiency in at least one programming language, preferably Python, is required. Experience with systems like Kafka, Keycloak, Airflow, NiFi, Pentaho, Redis, and PostgreSQL will be beneficial in fulfilling the responsibilities of this position.

Posted 1 month ago

3.0 - 7.0 years

0 Lacs

pune, maharashtra

On-site

Sarvaha is seeking a skilled Observability Engineer with at least 3 years of experience to assist in the design, deployment, and scaling of monitoring and logging infrastructure on Kubernetes. As part of this role, you will be instrumental in establishing end-to-end visibility in cloud environments by managing petabyte-scale data, aiding teams in improving reliability, detecting anomalies early, and promoting operational excellence. You will be responsible for configuring and overseeing observability agents on AWS, Azure & GCP, utilizing Infrastructure as Code (IaC) techniques like Terraform, Helm & GitOps for automating the deployment of the Observability stack. Additionally, you should have experience working with various language stacks such as Java, Ruby, Python, and Go, instrumenting services using OpenTelemetry, integrating telemetry pipelines, optimizing telemetry metrics storage with time-series databases like Mimir & NoSQL DBs, creating dashboards, setting up alerts, and tracking SLIs/SLOs. Your role will also involve enabling Root Cause Analysis (RCA) and incident response using observability data, as well as securing the observability pipeline. The ideal candidate will possess a BE/BTech/MTech (CS/IT or MCA) degree with a focus on Software Engineering, strong skills in interpreting logs, metrics, and traces, proficiency in tools like LGTM (Loki, Grafana, Tempo, Mimir), Jaeger, Datadog, Zipkin, and InfluxDB, familiarity with log frameworks such as log4j, lograge, Zerolog, and loguru, knowledge of OpenTelemetry, IaC, and security best practices, the ability to document observability processes, logging standards & instrumentation guidelines, proactive issue identification and resolution using observability data, and a commitment to maintaining data quality and integrity throughout the observability pipeline.
At Sarvaha, you can expect top-notch remuneration, excellent growth prospects, a supportive work environment with talented individuals, challenging software implementation and deployment tasks, and the flexibility of a hybrid work mode, with complete work-from-home options offered even prior to the pandemic.

Posted 1 month ago

5.0 - 9.0 years

0 Lacs

karnataka

On-site

Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide. Living our values every day results in our team-first culture and enables us to innovate, grow, and thrive while enjoying the journey together. We celebrate diversity and foster an inclusive environment, empowering our employees to be their authentic selves. We are seeking a highly experienced Cloud Infrastructure Engineer to implement and support our community-driven OpenStack-based private cloud infrastructure. You will be directly responsible for ensuring the scalability, resilience, automation, and performance optimization of our cloud environment by leveraging Infrastructure-as-Code (IaC) tools like Ansible and Terraform. This is a deeply technical role requiring expert-level understanding of Ubuntu KVM internals, OpenStack internals, and Kubernetes. You will also collaborate with platform and SRE teams to maintain secure, performant, and multi-tenant-isolated services that serve high-throughput, mission-critical applications. This position is based out of one of the offices of our affiliate Acqueon Technologies in India and will adopt the hybrid work arrangements of that location. You will be a member of the Acqueon team with responsibilities supporting Five9 products, collaborating with global teammates based primarily in the United States. Key Responsibilities: - Implement and support multi-tenant OpenStack infrastructure with Ubuntu KVM supporting multi-region virtual infrastructure deployments. - Implement and support multi-tenant Kubernetes clusters leveraging bare-metal servers and software-defined storage protocols. - Automate the provisioning, lifecycle management, and configuration of Ubuntu KVM, Cisco UCS servers, Pure Storage for block and file, Ceph for software-defined storage, and supporting components using Ansible, Terraform, or Pulumi.
- Implement continuous delivery pipelines for infrastructure updates, including patch management, service upgrade testing, and rollback procedures. - Develop automated monitoring, alerting, and healing mechanisms using GitOps principles and observability stacks (e.g., Prometheus, Loki, Grafana). - Harden services for high availability, disaster recovery, and scale-out operations. - Perform deep-dive troubleshooting and performance analysis of infrastructure services across hypervisors, backend storage protocols, and networking layers. - Participate in on-call rotation, incident response, and root cause analysis for platform reliability issues. Minimum Qualifications: - 5+ years of experience operating and automating large-scale OpenStack cloud environments with Ubuntu KVM hypervisors and Kubernetes, preferably in community-driven or upstream-contributing teams. - Proficiency in Infrastructure-as-Code with Ansible, Terraform, or Pulumi. - Strong hands-on knowledge of Cisco UCS Blade Servers (or similar server infrastructure), block/file storage (software-defined storage preferred). - Strong Linux (RHEL/CentOS/Ubuntu) systems engineering background with advanced scripting in Python, Bash, or Go. - Fluency with Git, CI/CD pipelines, and automated test frameworks. - Strong understanding of hypervisor technologies (KVM, vSphere), storage protocols (iSCSI, NFS, CEPH), and L2/L3 networking. - Demonstrated success building or maintaining multi-region or high-availability OpenStack clusters. - Ability to write technical documentation and contribute to community wikis or knowledge bases. Preferred Qualifications: - Contributions to upstream Ubuntu codebases or participation in SIGs/WGs. - Understanding of security best practices for Ubuntu OS patching and security compliance (e.g., CIS, NIST). - Background in telco, edge cloud, or large enterprise infrastructure environments. 
- Experience building and maintaining automated test environments for Ubuntu upgrades and validation. - Bachelor's degree in Computer Science, IT, Engineering, or a related field preferred; equivalent experience and relevant industry certifications will also be considered. What You'll Get: - A collaborative team that's deeply invested in open source, community contribution, and infrastructure excellence. - Complex technical challenges that require creative, scalable solutions. - The opportunity to shape a next-generation private cloud platform built on true open infrastructure principles. - Access to the latest tools, frameworks, and upstream project developments. Skills And Attributes: - Analytical Thinking & Problem Solving: Demonstrated ability to translate complex, cross-domain requirements into scalable and resilient cloud infrastructure and automation solutions. - Collaboration & Teamwork: Strong interpersonal and communication skills with a proven track record of effective collaboration across multidisciplinary teams, including developers, operations, security, and product stakeholders. - Mentorship & Leadership: Passionate about knowledge-sharing and mentorship, with experience guiding junior engineers and fostering a team culture of continuous learning, innovation, and technical excellence in cloud engineering and DevOps practices. Five9 embraces diversity and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better we are. Five9 is an equal opportunity employer.

Posted 1 month ago

4.0 - 8.0 years

0 Lacs

noida, uttar pradesh

On-site

As a software engineer with 4+ years of experience, you will be responsible for implementing ELK-based monitoring, Loki logging, and alert generation using ElastAlert and Prometheus Alertmanager. Your role will involve developing service dashboards in Kibana and Grafana, as well as creating custom scripts for monitoring purposes. You should have a strong working knowledge of the ELK stack, Prometheus, Loki, and Grafana, along with familiarity with statistical functions used for real-time monitoring, such as averages and rate of change. Experience with Nagios and SolarWinds would be a plus. Proficiency in Python programming is essential, and you should possess a solid understanding of web services, databases, networking, and related infrastructure/architectures in the context of monitoring and alerting. Experience with Google Cloud Platform is also desired. In this role, you will focus on Industrial Operations Engineering, developing competency in your area of expertise. You will be expected to share your expertise with others, interpret client needs, and provide guidance and support. Working independently or with minimal supervision, you will identify and solve problems in straightforward situations, contributing to teamwork and customer interactions.

Posted 1 month ago

4.0 - 8.0 years

0 Lacs

noida, uttar pradesh

On-site

Implement ELK-based monitoring, Loki logging, and alert generation with ElastAlert and Prometheus Alertmanager. Develop service dashboards using Kibana/Grafana and custom scripts for monitoring. Utilize your 4+ years of experience as a software engineer to work with the ELK stack, Prometheus, Loki, and Grafana. Apply your knowledge of statistical functions for real-time monitoring, such as averages and rate of change. Experience with Nagios and SolarWinds is a plus. Strong programming skills in Python are required. You should have a solid understanding of, and experience in, web services, databases, networking, and related infrastructure/architectures as they pertain to monitoring and alerting. Additionally, experience with Google Cloud Platform is desired. In this role, you will focus on Industrial Operations Engineering, developing competency in your area of expertise. You will share your expertise, provide guidance and support to others, and interpret clients' needs. You should be able to work independently or with minimum supervision, identifying and solving problems in straightforward situations. Collaboration within a team and interaction with customers are essential parts of this role.

Posted 1 month ago

2.0 - 4.0 years

0 Lacs

Pune, Maharashtra, India

On-site

About AlphaSense: The world's most sophisticated companies rely on AlphaSense to remove uncertainty from decision-making. With market intelligence and search built on proven AI, AlphaSense delivers insights that matter from content you can trust. Our universe of public and private content includes equity research, company filings, event transcripts, expert calls, news, trade journals, and clients' own research content. The acquisition of Tegus by AlphaSense in 2024 advances our shared mission to empower professionals to make smarter decisions through AI-driven market intelligence. Together, AlphaSense and Tegus will accelerate growth, innovation, and content expansion, with complementary product and content capabilities that enable users to unearth even more comprehensive insights from thousands of content sets. Our platform is trusted by over 6,000 enterprise customers, including a majority of the S&P 500. Founded in 2011, AlphaSense is headquartered in New York City with more than 2,000 employees across the globe and offices in the U.S., U.K., Finland, India, Singapore, Canada, and Ireland. Come join us! About The Role: We seek a highly skilled Software Development Engineer in Test (SDET) to join our dynamic engineering Content Mission. The ideal candidate is a versatile engineer who is well-versed in both software development and quality assurance practices. You will play a crucial role in ensuring the quality of our product by embedding quality ownership within the team. Your expertise in automation, cloud services, Kubernetes, and modern programming languages will be instrumental in driving our testing strategy and delivering high-quality software. You will collaborate closely with developers, product managers, and other stakeholders to identify test requirements and contribute to the overall quality strategy and product development.
Requirements Must-Have: Minimum 2 years of experience in Software Development with proficiency in any of the following languages: Java, Kotlin, Python. Good understanding of data structures, algorithms, and computer science fundamentals. Enthusiasm for uncovering code vulnerabilities and finding flaws. Strong experience in Whitebox testing and API automation using RestAssured or similar tools. Hands-on experience in building and maintaining scalable Automation Frameworks. Basic experience in developing applications in any of the following frameworks: SpringBoot, Django, FastAPI. Basic understanding of data structures, algorithms, and computer science fundamentals. Excellent problem-solving skills and ability to work independently as well as collaboratively in a team environment. Strong communication and interpersonal skills, with the ability to effectively collaborate with team members and stakeholders. Nice to have: Advanced familiarity with CI/CD tools and practices, including integrating test automation into pipelines using Jenkins, ArgoCD, and TestKube. Hands-on experience in frontend testing and automation using relevant tools. Knowledge of performance testing tools (e.g., K6, JMeter). Familiarity with observability tools (e.g., Prometheus, Grafana, Loki). AlphaSense is an equal-opportunity employer. We are committed to a work environment that supports, inspires, and respects all individuals. All employees share in the responsibility for fulfilling AlphaSense's commitment to equal employment opportunity. AlphaSense does not discriminate against any employee or applicant on the basis of race, color, sex (including pregnancy), national origin, age, religion, marital status, sexual orientation, gender identity, gender expression, military or veteran status, disability, or any other non-merit factor. This policy applies to every aspect of employment at AlphaSense, including recruitment, hiring, training, advancement, and termination. 
In addition, it is the policy of AlphaSense to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations, and ordinances where a particular employee works. Recruiting Scams and Fraud: We at AlphaSense have been made aware of fraudulent job postings and individuals impersonating AlphaSense recruiters. These scams may involve fake job offers, requests for sensitive personal information, or demands for payment. Please note: AlphaSense never asks candidates to pay for job applications, equipment, or training. All official communications will come from an @alpha-sense.com email address. If you're unsure about a job posting or recruiter, verify it on our Careers page. If you believe you've been targeted by a scam or have any doubts regarding the authenticity of any job listing purportedly from or on behalf of AlphaSense, please contact us. Your security and trust matter to us.

Posted 1 month ago

Apply

8.0 - 12.0 years

0 Lacs

chennai, tamil nadu

On-site

We are seeking a skilled Observability & Site Reliability Engineer to join the team in supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have extensive experience with observability tools such as Grafana, Loki, Mimir, and Kubernetes metrics/logs, and a strong passion for performance, scalability, and system uptime. It is essential that candidates have 8 to 12 years of experience and can join immediately or within a 30-day notice period. Key Must-Have Skills: - 5+ years of experience in Observability Engineering. - Expertise in Grafana, Loki, Mimir, and Alloy agent. - Strong understanding of infrastructure metrics like GPU, CPU, and Kubernetes. - Proficiency in scripting languages such as Python, Go, and Bash. - Prior exposure to tools like Prometheus, ELK, Docker, and Terraform. - Flexibility to collaborate with Korean stakeholders and work within the Korean time zone. Role Highlights: - Design and manage the observability stack across large-scale data center infrastructure. - Build scalable telemetry systems, dashboards, alerts, and reports. - Apply Site Reliability Engineering (SRE) best practices to ensure system reliability and performance. - Troubleshoot real-time issues and contribute to ongoing system optimization. Good To Have: - Previous experience working with Korean stakeholders. - Familiarity with cloud platforms like AWS, GCP, or Azure.

Posted 1 month ago

Apply

4.0 - 8.0 years

0 Lacs

navi mumbai, maharashtra

On-site

As a DevOps Engineer at Allerin, you will be responsible for owning the cloud infrastructure by designing, provisioning, and maintaining multi-region AWS and GCP environments using Terraform and Helm. Your role will involve optimizing cost, performance, and scalability for high-traffic AI/IoT platforms. You will also be required to evolve GitLab pipelines to achieve true trunk-based, push-to-prod workflows with automated testing, security scans, and blue-green deployments. It will be essential to champion immutable builds and container security best practices in this capacity. In addition, your responsibilities will include implementing end-to-end monitoring with Prometheus, Grafana, Loki, and alerting via PagerDuty. You will lead post-incident reviews, driving Mean Time To Recovery (MTTR) down through runbook automation and chaos testing. Furthermore, you will embed shift-left security practices (SAST, DAST, IaC scanning) into pipelines and collaborate with the SOC2 team to maintain least-privilege IAM, secret rotation, and audit trails. Collaboration and a strong team culture are integral parts of this role. You will work closely with developers, QA, and product teams to ensure reliability is incorporated into every user story. Additionally, mentoring junior engineers in infrastructure-as-code, Kubernetes, and site reliability principles will be part of your responsibilities. To be successful in this role, you should have at least 4 years of experience running production workloads on AWS, GCP, or Azure. Strong knowledge of IaC (Terraform or Pulumi) and Kubernetes (EKS/GKE preferred), proven experience with GitLab or GitHub Actions CI/CD automation, a deep understanding of Linux internals, networking, and container runtime security, scripting skills in Bash, Python, or Go for automation and tooling, and hands-on experience with observability stacks (Prometheus/Grafana/ELK/Loki) are required qualifications for this position.,

Posted 1 month ago

Apply

5.0 - 9.0 years

0 Lacs

karnataka

On-site

As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability and uptime of critical services for our client's team. Your primary responsibilities will revolve around Kubernetes administration, CentOS server management, Java application support, incident handling, and change management. The ideal candidate for this role should have a solid background in ArgoCD for Kubernetes management, Linux proficiency, basic scripting skills, and familiarity with modern monitoring, alerting, and automation tools. We are seeking a self-motivated individual with strong communication skills, both verbal and written, who can work effectively both independently and collaboratively. Your daily tasks will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will be responsible for conducting routine system and application maintenance tasks following standard operating procedures to prevent and resolve issues promptly. Additionally, you will be in charge of responding to and managing incidents, facilitating post-mortem meetings, conducting root cause analysis, and ensuring timely issue resolution. Furthermore, you will monitor production systems, applications, and overall performance, utilizing tools to detect abnormal behaviors in software and collect relevant information for developers to understand and address the underlying causes. Security checks, policy and procedure documentation, script/code writing for tool and service development, post-mortem learning, and administration work on tools like JIRA and New Relic are also part of your responsibilities. In terms of technical skills, you should have at least 5 years of experience in a SaaS and Cloud environment. 
Proficiency in Kubernetes cluster administration, Linux scripting, database systems (MySQL, DB2), Linux (CentOS / RHEL) administration, change management procedures, on-call responsibilities, deployment management using Jenkins, monitoring tools (e.g., New Relic, Splunk, Nagios), log aggregation tools (e.g., Splunk, Loki, Grafana), and scripting knowledge in at least one language is essential. Experience with API programming and integrating tools such as Jira, Slack, xMatters/PagerDuty will be advantageous for this role.

Posted 1 month ago

Apply

2.0 - 6.0 years

0 Lacs

ghaziabad, uttar pradesh

On-site

At RightCrowd, we are revolutionizing physical access control with SmartAccess, a next-generation platform that redefines how people interact with security systems. We are transforming an outdated industry into a seamless, futuristic experience. Imagine doors opening effortlessly, just like in Star Trek! Our innovative platform powers cutting-edge solutions that enhance the daily experiences of employees, visitors, and users. Trusted by some of the world's largest organizations, including top tech companies, our products are making a global impact. We are not looking for the perfect candidate with a flawless resume. Instead, we value curiosity, a willingness to learn, and a commitment to making a difference. If you are excited about tackling challenges, growing your skills, and contributing to innovative solutions, we'd love to hear from you, even if you do not meet every single requirement. To enhance existing features and develop new, groundbreaking solutions, we are looking for a passionate Full Stack Software Engineer to join our remote team. Our team has its roots in a Belgian startup, and we still carry the startup spirit within us. We strive to maintain a small team size and minimize corporate overhead. In essence, we offer a high-responsibility, high-expectation environment with cutting-edge technology, free from unnecessary rules and constraints. **Key Responsibilities:** - Develop and maintain our web interfaces. - Review and give feedback on use cases, UI and UX design. - Contribute to the development of our backend services. - Support and evolve our cloud-native platform and infrastructure. - Perform development testing to ensure high-quality deliverables. - Assist in requirements gathering & architectural decision-making and provide feedback to shape the product roadmap. - Create and maintain documentation while continuously sharing knowledge with the team and the broader company. 
- Assist in third-line support and handle customer support requests when needed. - Eager learner. We don't expect anyone to already know everything. **Requirements:** - Fluency in English, clear communicator - A commitment to lifelong learning - Proven 2-4 years of experience in software development within complex environments - Strong knowledge and experience in: - NodeJS and related frameworks - TypeScript - React - Unix systems and networking - Containerization, Docker - Excellent debugging and problem-solving skills - Analytical, intelligent and well-organized - Flexible, hands-on and comfortable in a fast-paced environment *Bonus points if you have experience with any of the following:* - Terraform - Containerization, Docker - Unix systems and networking - Good understanding of and experience with Kubernetes & GitOps - Observability (metrics, logs, and tracing) **Why Join Us?** - Be part of a company that is a leader in the safety, security, and compliance solution space. - Opportunity to work on innovative products that have a real impact on safety and security. - Collaborative and supportive work environment with opportunities for professional growth and development. - Competitive salary and benefits package. Ready to make an impact? Apply now to join our team!

Posted 1 month ago

Apply

5.0 - 9.0 years

0 Lacs

karnataka

On-site

You will be joining our client's team as a Site Reliability Engineer, where your main responsibility will be ensuring the reliability and uptime of critical services. This will involve a strong focus on Kubernetes administration, CentOS servers, Java application support, incident management, and change management. The ideal candidate for this role will have strong experience with ArgoCD for Kubernetes management, Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for someone who is self-motivated, possesses excellent communication skills (both oral and written), and can work both independently and collaboratively. Your main tasks will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will also be responsible for conducting routine tasks for system and application maintenance, following SOPs to correct and prevent issues. In addition, you will respond to and manage running incidents, conduct post-mortem meetings, perform root cause analysis, and ensure timely resolution. Furthermore, you will be monitoring production systems, applications, and overall performance, using tools to detect abnormal behaviors in the software and collect information to help developers understand the root causes of problems. Security checks, running meetings with business partners, writing and maintaining policy and procedure documents, writing scripts or code as necessary to develop tools and services, and learning from post-mortems to prevent new incidents are also part of your responsibilities. 
Technical skills required for this role include 5+ years of experience working in a SaaS and Cloud environment, administration of Kubernetes clusters with ArgoCD, Linux scripting for automation, experience with database systems like MySQL and DB2, Linux administration skills, understanding of change management procedures, on-call responsibilities, experience with managing deployments using Jenkins, and familiarity with monitoring tools like New Relic, Splunk, and Nagios. Additionally, experience with log aggregation tools like Splunk, Loki, or Grafana, strong scripting knowledge in at least one language, and experience with API programming and integrating tools such as Jira, Slack, and xMatters/PagerDuty are preferred. This is an exciting opportunity for a motivated individual with the right skill set to make a significant impact on our client's team.

Posted 1 month ago

Apply

2.0 - 4.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Position: Engineering Support Analyst As an Engineering Support Analyst, you will play a critical role in ensuring the stability and performance of essential business systems. Acting as a software detective, you will identify, investigate, and resolve issues across various system components. Your responsibilities will include triaging bugs, escalating tickets with detailed context, responding to system alerts, and initiating On-Call procedures when necessary, all while maintaining clear and effective communication. Key Responsibilities Provide technical support for critical business systems, ensuring timely issue identification and resolution. Collaborate with Traders, Developers, DevOps, and SRE teams to maintain seamless system operations. Conduct root cause analysis and implement preventive measures to mitigate recurring issues. Monitor system alerts and proactively address incidents to minimize downtime. Escalate issues with comprehensive documentation to ensure swift resolution. Offer coverage for global teams, including those in Australia (AEDT) and Europe (CET). Continuously drive improvements in system reliability and support processes. Key Accountabilities Deliver high-quality support to global stakeholders. Resolve incidents efficiently and effectively. Leverage monitoring tools to detect and respond to issues proactively. Contribute to continuous improvement initiatives and innovation in support practices. Preferred Experience & Skills 2-3 years of experience in technical support for critical business systems. Strong analytical and problem-solving abilities. Excellent verbal and written communication skills for effective collaboration with global teams. Solid understanding of incident and problem management principles. Experience with server stack and website support is a plus. Proficiency in debugging, issue analysis, and resolution. 
Technical Knowledge Familiarity with monitoring and observability tools such as: Grafana, Prometheus, Loki, Tempo; Kubernetes, Docker; Linux, Windows; Kafka, Postgres. Experience in building Grafana dashboards that integrate metrics, logs, and traces for proactive error detection. Testing experience is an added advantage. Education & Certifications A tertiary qualification in Information Technology or a related field is highly desirable.

Posted 1 month ago

Apply

7.0 - 9.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers. Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems. We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth. As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference. You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions. With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers. 
If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do. Your Future at Kyndryl Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential, offering a wide range of professional and personal growth opportunities that you won't find anywhere else. Who You Are You're good at what you do and possess the required experience to prove it. However, equally as important, you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused, someone who prioritizes customer success in their work. And finally, you're open and borderless, naturally inclusive in how you work with others. Required Technical And Professional Experience MS SQL with 7+ years of experience in operational management, including incident management and escalations. Oversee maintenance and optimization of various databases, ensuring reliability, performance, and availability. Service Recovery Management System to recover customer IT service(s) in response to severity incidents. Engage/provide subject matter expertise and create & lead recovery plan. Handle customer communication (if required) & strong troubleshooting and problem solving approach, performance tuning and strong architectural knowledge. Conduct performance tuning activities, analyze database metrics, and make recommendations for improvement. Lead troubleshooting and resolution of database-related issues, conducting root cause analysis and implementing preventive measures. 
Review RCA documents for quality check & learnings & Mentor and provide guidance to team members, fostering their professional development and effectiveness. Hypercare support to troubled accounts to ensure stability of IT operations. Conduct Technical Health Assessment (THA) to support service availability, service reliability and service stability. Preferred Technical And Professional Experience Degree in Computer Science, Engineering, or other highly technical, scientific discipline. Expertise with Ansible, Terraform, and Python. Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes. Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki. Being You Diversity is a whole lot more than what we look like or where we come from, it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you and everyone next to you the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way. What You Can Expect With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter, wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. 
Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed. Get Referred! If you know someone that works at Kyndryl, when asked "How Did You Hear About Us" during the application process, select "Employee Referral" and enter your contact's Kyndryl email address.

Posted 1 month ago

Apply

2.0 - 6.0 years

0 Lacs

chennai, tamil nadu

On-site

Job Description: Explore your next opportunity at a Fortune Global 500 organization and envision innovative possibilities as you experience a rewarding culture and work with talented teams that help you become better every day. If you have the unique combination of skill and passion to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level at UPS. Job Summary: As a member of the UPS team, you will provide input, support, and perform full systems life cycle management activities, including analyses, technical requirements, design, coding, testing, and implementation of systems and applications software. You will participate in component and data architecture design, technology planning, and testing for Applications Development (AD) initiatives to meet business requirements. Collaboration with teams and support for emerging technologies to ensure effective communication and achievement of objectives will be a key aspect of this role. Your expertise will be utilized to provide knowledge and support for applications development, integration, and maintenance, with input to department and project teams on decisions supporting projects. 
Responsibilities: - Experience developing with various technologies including front end, APIs/services/backend, database, MQ/Messaging, HTML/JavaScript, .NET, .NET Core, OpenShift, Azure DevOps Server/TFS, GIT, Jenkins - CI/CD, SonarQube, Netsparker, Dynatrace, Grafana/Loki - Security compliance - Experience with Restful services and CI/CD pipelines - Proficiency in Object-Oriented Analysis & Design - Familiarity with Agile and Scrum concepts - Excellent written and verbal communication skills - Strong problem-solving and debugging skills Qualifications: - 2-4 years of development experience using .Net, Angular, and frontend technologies - Bachelor's Degree or International equivalent in Computer Science, Information Systems, Mathematics, Statistics, or related field (Preferred) Employee Type: Permanent UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

karnataka

On-site

As a DevOps Engineer at NTT DATA Business Solutions, your role involves implementing and maintaining cloud infrastructure to ensure the smooth operation of the environment. You will be responsible for evaluating new technologies in infrastructure automation and cloud computing, looking for opportunities to enhance performance, reliability, and automation. Additionally, you will provide DevOps capability to team members and customers, perform code deployments, and manage release activities. Your responsibilities will also include resolving incidents and change requests, documenting solutions, and communicating them to users. You will work on optimizing existing solutions, diagnosing, troubleshooting, and resolving issues to ensure the smooth operation of services. Demonstrating a proactive attitude and aptitude for taking ownership of your work and collaborating with team members will be crucial. To excel in this role, you are required to have a Bachelor's degree in IT, computer science, computer engineering, or a related field, along with a minimum of 6 years of overall experience with at least 3 years as a DevOps Engineer. Advanced experience with Cloud Infrastructure and Cloud Services, particularly on Microsoft Azure, is essential. You should also have expertise in container orchestration (Kubernetes, Docker, Helm), Linux scripting (Bash, Python), log and metrics management (ELK Stack), monitoring tools (Prometheus, Loki, Grafana, Dynatrace), and infrastructure as code (Terraform). Furthermore, you must be proficient in continuous integration/continuous delivery tools (Gitlab CI, Jenkins, Nexus), infrastructure security principles, Helm, CI/CD pipelines configuration, and DevOps tools like Jenkins, SonarQube, Nexus, etc. Exposure to SDLC and Agile processes, SSO integrations, and AI tools is desirable. In addition to technical skills, you should possess strong attitude, soft, and communication skills. 
Experience in handling technically critical situations, driving expert teams, and providing innovative solutions is essential. Critical thinking, a DevOps mindset, and customer-centric thinking are key attributes for this role. Proficiency in English (written and spoken) is mandatory, while knowledge of other languages such as German or French is a plus. If you are looking to join a dynamic team at NTT DATA Business Solutions and transform SAP solutions into value, this opportunity is for you. Get empowered by our innovative and collaborative work environment. For further inquiries regarding this position, please contact the Recruiter, Pragya Kalra, at Pragya.Kalra@nttdata.com. Join us in our mission to deliver cutting-edge IT solutions and become a part of our global success story!

Posted 1 month ago

Apply

3.0 - 7.0 years

3 - 8 Lacs

Noida

Work from Office

We are seeking a skilled and proactive Observability Engineer to join our team. In this role, you will be responsible for configuring and implementing observability solutions, setting up performance monitoring systems and creating actionable insights to enhance the reliability, capacity, and scalability of our infrastructure. Technical Skills: Extensive knowledge and experience of performance monitoring/observability tools using Prometheus and Grafana. Experience in observability tool configuration, implementation, alert setup and integrations. Knowledge of SRE and of KPIs/SLOs/metrics for monitoring the health of application and infrastructure components. Hands-on knowledge of alerting, incident creation and dashboard creation. Hands-on experience in creating a single-pane view for IT and business visualization. Work with dev and platform engineering to finalize business KPI logic for the observability pilot, recommendations and solutions, and establish success criteria. Execute the business observability pilot for any identified critical user journeys. Derive an implementation roadmap with milestones and continuous improvement opportunities. Role & responsibilities: Gather performance monitoring requirements. Conduct system performance engineering to ensure system reliability, capacity and scalability. Generate monitoring reports for IT stakeholders' review. Analyze root causes of performance issues and provide corrective actions. Suggest techniques to improve monitoring efficiency. Preferred Tools: Grafana, Prometheus, Loki, New Relic, Datadog, AppDynamics, Tempo, Mimir etc. Why Join Us? Be part of a forward-thinking company that values reliability, efficiency, and user experience. Work in a collaborative environment that encourages continuous learning and professional growth. Competitive salary and benefits package, with opportunities for career advancement.

Posted 1 month ago

Apply

2.0 - 4.0 years

6 - 10 Lacs

Chennai, Bengaluru

Work from Office

Location: Bangalore, India Experience: 2 to 4 Years Employment Type: Full-Time Job Description: We are looking for a skilled DevOps Engineer with hands-on experience in GitLab to join our team in Bangalore. The ideal candidate should have a strong understanding of CI/CD pipelines, infrastructure automation, and cloud technologies. If you are passionate about DevOps and want to work in a dynamic and fast-paced environment, we would love to hear from you! Key Responsibilities: Customer Engagement & Implementation: Work directly with enterprise customers to understand their DevOps landscape and GitLab implementation needs. Lead the design, installation, and configuration of GitLab Self-Managed (OnPrem) environments across cloud and on-premise infrastructure. Translate customer requirements into scalable GitLab deployment architectures. CI/CD Pipeline Enablement: Architect and set up secure and scalable GitLab CI/CD pipelines aligned with customer release workflows. Integrate GitLab with third-party tools such as Kubernetes, Docker, Terraform, Jenkins, and Prometheus. Automation & Infrastructure as Code (IaC): Leverage Ansible, Terraform, and Helm charts for environment provisioning and GitLab automation. Manage GitLab runners and their configuration across distributed infrastructures. Monitoring & Optimization: Implement observability using tools like Prometheus, Loki, Grafana, and GitLab metrics dashboards. Optimize performance, ensure high availability (HA), backup, disaster recovery (DR), and auto-scaling. Knowledge Transfer & Documentation: Deliver technical documentation, operational runbooks, and knowledge transfer sessions for client upskilling. Assist clients in building internal GitLab usage guidelines, governance models, and compliance checks. Collaboration & Support: Coordinate closely with DevOps, Development, Support, and Infrastructure teams to ensure smooth rollouts and version upgrades. 
Troubleshoot GitLab issues including user management, access controls, LDAP/SAML integration, and runner performance. Required Skills & Experience: 2 to 5 years of hands-on experience in DevOps engineering, preferably in customer-facing roles. Proven expertise in GitLab Self-Managed (OnPrem) setup, configuration, upgrade, and maintenance. Strong experience with CI/CD tools, Docker, Kubernetes, and cloud platforms (Azure, AWS, GCP). Proficiency in Infrastructure-as-Code using Terraform, Ansible, and Helm. Experience in monitoring stacks: Prometheus, Loki, Grafana, and OpenTelemetry. Working knowledge of scripting (e.g., Python, Bash) and Linux system administration. Experience implementing GitLab RBAC, GitOps principles, and GitLab security scans is a plus. Preferred Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. GitLab Certified Associate or GitLab CI/CD Specialist certification is a plus. Exposure to Agile/Scrum practices and experience leading technical deliverables. Experience in customer environments requiring high uptime and regulatory compliance. Why Join Us? • Opportunity to work on cutting-edge DevOps technologies. • Collaborative and innovative work environment. • Competitive salary and benefits. • Career growth and learning opportunities. If you are an experienced DevOps Engineer with GitLab expertise and are ready to join immediately, apply now!

Posted 1 month ago

Apply

5.0 - 9.0 years

0 Lacs

karnataka

On-site

You will be joining our client's team as a Site Reliability Engineer, where your main responsibility will be to ensure the reliability and uptime of critical services. Your focus will include Kubernetes administration, CentOS servers, Java application support, incident management, and change management. The ideal candidate for this role will have strong experience with ArgoCD for Kubernetes management, Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for a self-motivated individual with excellent communication skills, both oral and written, who can work effectively both independently and collaboratively. Your responsibilities will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will be conducting routine tasks for system and application maintenance and following SOPs to correct or prevent issues. Responding to and managing running incidents, including post-mortem meetings, root cause analysis, and timely resolution will also be part of your responsibilities. Additionally, you will be monitoring production systems, applications, and overall performance, using tools to detect abnormal behaviors in the software and collecting information to help developers understand the issues. Security checks, running meetings with business partners, writing and maintaining policy and procedure documents, writing scripts or code as necessary, and learning from post-mortems to prevent new incidents are also key aspects of the role. 
Technical skills required for this position include: - 5+ years of experience in a SaaS and Cloud environment - Administration of Kubernetes clusters, including management of applications using ArgoCD - Linux scripting to automate routine tasks and improve operational efficiency - Experience with database systems like MySQL and DB2 - Experience as a Linux (CentOS / RHEL) administrator - Understanding of change management procedures and enforcement of safe and compliant changes to production environments - Knowledge of on-call responsibilities and maintaining on-call management tools - Experience with managing deployments using Jenkins - Prior experience with monitoring tools like New Relic, Splunk, and Nagios - Experience with log aggregation tools such as Splunk, Loki, or Grafana - Strong scripting knowledge in one of Python, Ruby, Bash, Java, or GoLang - Experience with API programming and integrating tools like Jira, Slack, xMatters, or PagerDuty If you are a dedicated professional who thrives in a high-pressure environment and enjoys working on critical services, this opportunity could be a great fit for you.

Posted 1 month ago

Apply

3.0 - 8.0 years

6 - 12 Lacs

Gurugram

Work from Office

Location: NCR Team Type: Platform Operations Shift Model: 24x7 Rotational Coverage / On-call Support (L2/L3) Team Overview The OpenShift Container Platform (OCP) Operations Team is responsible for the continuous availability, health, and performance of OpenShift clusters that support mission-critical workloads. The team operates under a tiered structure (L2, L3) to manage day-to-day operations, incident management, automation, and lifecycle management of the container platform. This team is central to supporting stakeholders by ensuring the container orchestration layer is secure, resilient, scalable, and optimized. L2 – OCP Support & Platform Engineering (Platform Analyst) Role Focus: Advanced Troubleshooting, Change Management, Automation Experience: 3–6 years Resources: 5 Key Responsibilities: Analyze and resolve platform issues related to workloads, PVCs, ingress, services, and image registries. Implement configuration changes via YAML/Helm/Kustomize. Maintain Operators, upgrade OpenShift clusters, and validate post-patching health. Work with CI/CD pipelines and DevOps teams for build & deploy troubleshooting. Manage and automate namespace provisioning, RBAC, NetworkPolicies. Maintain logs, monitoring, and alerting tools (Prometheus, EFK, Grafana). Participate in CR and patch planning cycles. L3 – OCP Platform Architect & Automation Lead (Platform SME) Role Focus: Architecture, Lifecycle Management, Platform Governance Experience: 6+ years Resources: 2 Key Responsibilities: Own lifecycle management: upgrades, patching, cluster DR, backup strategy. Automate platform operations via GitOps, Ansible, Terraform. Lead SEV1 issue resolution, post-mortems, and RCA reviews. Define compliance standards: RBAC, SCCs, Network Segmentation, CIS hardening. Integrate OCP with IDPs (ArgoCD, Vault, Harbor, GitLab). Drive platform observability and performance tuning initiatives. Mentor L1/L2 team members and lead operational best practices. 
Core Tools & Technology Stack
Container Platform: OpenShift, Kubernetes
CLI Tools: oc, kubectl, Helm, Kustomize
Monitoring: Prometheus, Grafana, Thanos
Logging: Fluentd, EFK Stack, Loki
CI/CD: Jenkins, GitLab CI, ArgoCD, Tekton
Automation: Ansible, Terraform
Security: Vault, SCCs, RBAC, NetworkPolicies

Posted 1 month ago

Apply

Featured Companies