
71 Loki Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

0.0 years

0 Lacs

india

Remote

Site Reliability Engineer
Hyderabad-based | Multiple timezones available | Hybrid | Work from Home and the Office
At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the reliability and performance of mission-critical databases. We quickly earned a reputation for solving tough data challenges. We were there when the industry moved from on-premises to cloud environments, and as enterprises sought more from their data, we expanded our competencies to include advanced analytics. Today, we empower organizations to embrace transformation and leverage advanced technologies, including AI, to stay competitive. We deliver innovative solutions that meet each client's data goals and have built strong partnerships with Google Cloud, AWS, Microsoft, Oracle, SAP, and Snowflake. The powerful combination of our extensive expertise in data and cloud and our ability to keep on top of the latest bleeding-edge technologies makes us the perfect partner to help mid- and large-sized businesses transform and stay ahead in today's rapidly changing digital economy.
Why You: Pythian is building a next-generation Site Reliability Engineering team, and we're looking for talented, motivated engineers who thrive in fast-paced, problem-solving environments. As an SRE, you'll design, deploy, and operate large-scale distributed systems across compute, storage, networking, and AI/ML environments. You'll lead projects from architecture to automation to intelligent monitoring, collaborating with both clients and teammates to build resilient, high-performing infrastructure. If this is you, and you wonder what it would be like to work at Pythian, reach out to us and find out! Intrigued to see what life is like at Pythian? Check out #pythianlife on !
What you will be doing:
- Operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems.
- Automate workflows using Go, Python, and Shell scripting.
- Build monitoring and observability solutions with Prometheus, Grafana, and Loki.
- Troubleshoot complex networking, storage, and system performance issues.
- Partner with AI/ML teams to ensure infrastructure readiness for model training and data pipelines.
- Participate in on-call rotations and postmortem reviews to improve system resilience.
What you bring:
- Experience with Google Cloud, plus IaC tools (Terraform).
- Strong knowledge of microservices, containers (Kubernetes, Docker), and networking.
- Hands-on experience with PKI, service mesh, and Linux systems administration.
- An SRE mindset with a focus on automation, scalability, and reliability.
What you get in return:
- Love your career: Competitive total rewards package. during work hours. Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
- Love your work/life balance: Flexibly work remotely from your home; there's no daily travel requirement to an office! All you need is a stable internet connection.
- Love your coworkers: Collaborate with some of the best and brightest in the industry!
- Love your workspace: We give you all the equipment you need to work from home, including a laptop with your choice of OS, and an annual budget to personalize your work environment!
- Love yourself: Pythian cares about the health and well-being of our team. You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness, and more). Additionally, you will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity.
Hiring Disclaimer: To be considered for this position, applicants will be required to complete a technical test through a third-party platform. The successful applicant will need to meet the requirements of a background check. Accommodations are available upon request for candidates taking part in any aspect of the selection process.

Posted 4 days ago


3.0 - 7.0 years

0 Lacs

ahmedabad, gujarat

On-site

We are seeking a skilled Cloud Database & DevOps Architect to play a pivotal role in overseeing the design, automation, and scalability of crucial systems within our organization. Combining deep expertise in databases, particularly PostgreSQL and MongoDB, with DevOps proficiency, you will be instrumental in ensuring the efficiency, performance, and resilience of our systems across containerized and cloud environments. Your responsibilities will span creating and implementing best practices for high availability, replication, backups, disaster recovery, monitoring, and CI/CD pipelines, all while actively engaging with both development and operations teams. Your core responsibilities will include architecting and managing PostgreSQL and MongoDB clusters with an emphasis on high availability, replication, and failover mechanisms. Additionally, you will devise and execute backup and disaster recovery strategies, containerize and orchestrate databases and services using Docker and Kubernetes, set up and oversee CI/CD pipelines for seamless application and database deployments, and apply infrastructure-as-code principles for consistent configurations. Monitoring and optimizing database and infrastructure performance, safeguarding security, ensuring compliance and data governance, and collaborating with development and quality assurance teams on streamlined delivery pipelines will also be integral parts of your role. To excel in this position, you must possess expert-level knowledge of PostgreSQL and MongoDB, substantial experience with containers (specifically Docker and Kubernetes), a proven track record in high availability, clustering, replication, and disaster recovery planning, and hands-on familiarity with DevOps tooling such as CI/CD pipelines, GitLab CI/Jenkins, and ArgoCD.
Proficiency in Infrastructure as Code tools such as Terraform and Ansible, knowledge of cloud platforms such as AWS, Azure, and GCP, and a solid background in monitoring and observability tools such as Prometheus, Grafana, ELK, and Loki are essential. An aptitude for troubleshooting and automation is crucial for success in this role. Desirable skills include familiarity with Redis, Kafka, and RabbitMQ, knowledge of service meshes and API gateways in Kubernetes environments, and experience with security compliance standards such as GDPR, HIPAA, and SOC 2. This is a full-time role, and the ideal candidate will have a minimum of 5 years of experience with PostgreSQL, 3 years with MongoDB, and 3 years in database administration and Docker. Availability for overnight shifts is preferred.

Posted 6 days ago


5.0 - 7.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus, and routing environments. Arista is a well-established and profitable company with over $8 billion in revenue. Arista's award-winning platforms, with Ethernet speeds of up to 800 Gbps, redefine scalability, agility, and resilience. Arista is a founding member of the Ultra Ethernet consortium. We have shipped over 20 million cloud networking ports worldwide with CloudVision and EOS, an advanced network operating system. Arista is committed to open standards, and its products are available worldwide directly and through partners.
At Arista, we value the diversity of thought and perspectives each employee brings. We believe fostering an inclusive environment where individuals from various backgrounds and experiences feel welcome is essential for driving creativity and innovation. Our commitment to excellence has earned us several prestigious awards, such as the Great Place to Work Survey for Best Engineering Team and Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest quality and performance standards in everything we do.
Job Description
Who You'll Work With
SREs at Arista combine strong software and systems engineering with a passion for operating production systems at scale. As an SRE, you'll be part of the team responsible for our global service fleet.
What You'll Do
CloudVision is deployed on Kubernetes across global regions, using Spinnaker for our CI/CD pipeline. Our tech stack runs on GKE, using HBase/Hadoop as the main distributed database and storage layer, Elasticsearch for powering search, ClickHouse for fast real-time queries of flow data, our own Kafka-based distributed real-time stream processing layer for analytics, and TensorFlow for ML analysis. Our monitoring system is built on top of Prometheus, Grafana, Loki, and other OSS tools.
As a Senior SRE, you'll be responsible for our global CloudVision service fleet. This includes:
- Build, deploy (safely and incrementally), and operate critical production systems with a focus on scalability, reliability, observability, performance, and security.
- Monitor, support, and enhance the product deployment experience across services.
- Build automation to remove toil and operate production systems efficiently.
- Proactively monitor, respond to, and enhance alerts, and set up automated alert handling.
- Create and maintain incident response runbooks.
- Build and deploy new systems with scalability, reliability, and observability as primary requirements.
- Triage platform/infrastructure issues and help Arista software engineers with their triage.
- Engage with third-party vendor support.
- Deploy new systems in a staged manner.
- Write postmortem documents and build solutions to prevent incidents from recurring.
- Plan and communicate maintenance windows on production systems.
- Work with Arista's product development teams to identify infrastructure issues that cause bottlenecks and limitations in their workflows, and design and implement solutions to resolve them.
- Survey and adopt best practices around infrastructure/platform to maintain secure, scalable, and fault-tolerant systems.
- Implement solutions to scale the systems.
- Implement fault-tolerance and performance improvements to increase system availability.
- Study the design and enough of the implementation details of OSS systems to improve triage and fix resolution.
Qualifications
- At least a Bachelor's in Computer Science or Engineering + 5 years' experience, an MS in Computer Science or Engineering + 5 years' experience, or equivalent work experience.
- Knowledge of one or more of Go, Python, or bash shell scripting, sufficient to implement medium-complexity automation workflows.
- Knowledge of Linux (or UNIX) from an administration and debugging perspective.
- Hands-on experience operating software systems (infrastructure, complex applications, etc.) at scale.
- Experience in server provisioning (especially from a storage and networking perspective).
- Strong problem-solving and software troubleshooting skills.
- Experience with infrastructure-as-code.
Desirable to have one or more of the following skills:
- Experience managing databases, e.g., PostgreSQL or an equivalent RDBMS.
- Experience with Docker and virtualization technologies.
- Experience managing a monitoring stack (Prometheus, Grafana, etc.).
- Experience managing Artifactory, a Docker registry, etc.
- Experience managing CI/CD systems such as GitLab tools, Spinnaker, etc.
- Experience with infrastructure-as-code frameworks such as Terraform.
- Experience with container orchestration via Kubernetes.
Additional Information
Arista stands out as an engineering-centric company. Our leadership, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right. We hire globally into our diverse team. At Arista, engineers have complete ownership of their projects. Our management structure is flat and streamlined, and software engineering is led by those who understand it best. We prioritize the development and utilization of test automation tools. Our engineers have access to every part of the company, providing opportunities to work across various domains. Arista is headquartered in Santa Clara, California, with development offices in Australia, Canada, India, Ireland, and the US. We consider all our R&D centers equal in stature. Join us to shape the future of networking and be part of a culture that values invention, quality, respect, and fun.

Posted 6 days ago


6.0 - 9.0 years

0 Lacs

bengaluru, karnataka, india

On-site

About Groww
We are a passionate group of people focused on making financial services accessible to every Indian through a multi-product platform. Each day, we help millions of customers take charge of their financial journey. Customer obsession is in our DNA. Every product, every design, every algorithm down to the tiniest detail is executed with the customer's needs and convenience in mind. Our people are our greatest strength. Everyone at Groww is driven by ownership, customer-centricity, integrity, and the passion to constantly challenge the status quo. Are you as passionate about defying conventions and creating something extraordinary as we are? Let's chat.
Our Vision
Every individual deserves the knowledge, tools, and confidence to make informed financial decisions. At Groww, we are making sure every Indian feels empowered to do so through a cutting-edge multi-product platform offering a variety of financial services. Our long-term vision is to become the trusted financial partner for millions of Indians.
Our Values
Our culture enables us to be what we are: India's fastest-growing financial services company. It fosters an environment where collaboration, transparency, and open communication take center stage and hierarchies fade away. There is space for every individual to be themselves, feel motivated to bring their best to the table, and craft a promising career. The values that form our foundation are:
- Radical customer centricity
- Ownership-driven culture
- Keeping everything simple
- Long-term thinking
- Complete transparency
Expertise and Qualifications
We are seeking a highly motivated and experienced Senior Site Reliability Engineer to join our engineering team. As an SRE, you will be responsible for ensuring the reliability, availability, scalability, and performance of our applications and infrastructure. You will collaborate closely with software developers, platform engineers, and other team members to design, provision, build, and maintain systems that are scalable, secure, and highly available.
What will make you a great fit for the role:
- 6-9 years of experience in SRE, DevOps, or system architecture roles with large-scale production systems.
- Extensive experience managing and scaling high-traffic, low-latency fintech systems, ensuring reliability, compliance, and secure transaction processing.
- Proven expertise in the networking stack, with hands-on experience in BGP, OSPF, DNS, HTTP(S), TCP/IP, MPLS, and VPN protocols.
- Advanced knowledge of GCP networking (VPC design, Shared VPC, Private Service Connect, Global Load Balancers, Cloud DNS, Cloud NAT, Network Intelligence Center, and Service Mesh).
- Strong background in managing complex multi-cloud environments (AWS, GCP, Azure) with a focus on secure and compliant architectures in regulated industries.
- Hands-on expertise in Terraform and Infrastructure-as-Code (IaC) for repeatable, automated deployments.
- Expertise in Kubernetes, container orchestration, and microservices, with production experience in regulated fintech environments.
- Advanced programming and scripting skills in Python, Go, or Java, applied to automation, risk reduction, and financial system resilience.
- Proficiency with monitoring and logging tools (Prometheus, Mimir, Grafana, Loki) to ensure real-time visibility into trading, payments, and transaction flows.
- Strong understanding of networking, load balancing, and DNS management across multi-cloud and hybrid infrastructures.
- Experience implementing end-to-end observability solutions (metrics, logs, and traces) to monitor and optimize transaction throughput while adhering to latency SLAs.
- Leadership skills, with experience mentoring teams, fostering a culture of reliability, and partnering with cross-functional stakeholders in product teams.
- Strong communication, critical thinking, and incident management abilities, especially in high-stakes production incidents involving customer transactions.
- Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.
What you'll do:
- Architect and lead the design of scalable, reliable infrastructure solutions.
- Implement strategies for high availability, scalability, and low-latency performance.
- Define service-level objectives (SLOs) and service-level indicators (SLIs) to track performance and reliability.
- Drive incident management by identifying root causes and providing long-term solutions.
- Mentor junior engineers and foster a collaborative, learning-focused environment.
- Design advanced monitoring and alerting systems for proactive system management.
- Architect and optimize network topologies (hybrid cloud, multi-cloud, and on-prem) to support ultra-low-latency trading and compliance-driven workloads.
- Configure and manage cloud and on-prem networking components (VPCs, Shared VPCs, Private Service Connect, Cloud NAT, and Global Load Balancers) for secure and compliant transaction flows.
- Implement secure connectivity solutions (VPNs, Interconnect, Direct Connect, and service meshes) to meet fintech regulatory requirements and standards.
- Develop and maintain DNS, load-balancing, and traffic-routing strategies to ensure millisecond-level latency for real-time transactions.
- Evolve Infrastructure as Code (IaC) practices and principles to automate infrastructure provisioning.
- Collaborate on reliability roadmaps, performance benchmarks, and disaster recovery plans tailored for low-latency and high-throughput workloads.
- Manage Kubernetes clusters at scale, integrating service meshes such as Istio or Linkerd.
- Implement chaos engineering principles to strengthen system resilience.
- Influence technical direction, reliability culture, and organizational strategies.

Posted 6 days ago


2.0 - 4.0 years

0 Lacs

chennai, tamil nadu, india

On-site

Before you apply to a job, select your language preference from the options available at the top right of this page. Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow: people with a unique combination of skill + passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
Job Description
Job Summary
This position provides input and support for, and performs, full systems life cycle management activities (e.g., analyses, technical requirements, design, coding, testing, implementation of systems and applications software, etc.). He/She participates in component and data architecture design, technology planning, and testing for Applications Development (AD) initiatives to meet business requirements. This position provides input to applications development project plans and integrations. He/She collaborates with teams and supports emerging technologies to ensure effective communication and achievement of objectives. This position provides knowledge and support for applications development, integration, and maintenance. He/She provides input to department and project teams on decisions supporting projects.
Responsibilities
* Experience developing with most technologies (examples: front end, APIs/services/backend, database, MQ/messaging, HTML/JavaScript, .NET, .NET Core, OpenShift, Azure DevOps Server/TFS, Git, Jenkins (CI/CD), SonarQube, Netsparker, Dynatrace, Grafana/Loki).
* Security compliance
* Experience with RESTful services and CI/CD pipelines
* Experience with Object-Oriented Analysis & Design
* Experience with Agile and Scrum concepts
* Excellent written and verbal communication skills
* Excellent problem-solving skills
* Excellent debugging skills
Qualifications
* 2-4 years of development experience using .NET, Angular, and frontend technologies
* Bachelor's Degree or International equivalent
* Bachelor's Degree or International equivalent in Computer Science, Information Systems, Mathematics, Statistics, or related field - Preferred
Employee Type: Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.

Posted 6 days ago


2.0 - 6.0 years

0 Lacs

karnataka

On-site

As an Engineering Support Analyst, you will be responsible for ensuring the stability and performance of critical business systems. You will act as a software detective, identifying and resolving issues across various system components. Your duties will include triaging bugs, escalating tickets with detailed context, responding to system alerts, and initiating On-Call procedures as needed, all while maintaining clear and effective communication. You will provide technical support for essential business systems, collaborating with Traders, Developers, DevOps, and SRE teams to ensure seamless system operations. Conducting root cause analysis, implementing preventive measures, and monitoring system alerts to proactively address incidents are key aspects of your role. You will escalate issues with comprehensive documentation and provide coverage for global teams, including those in Australia (AEDT) and Europe (CET). Additionally, driving continuous improvements in system reliability and support processes will be part of your responsibilities.
Key Accountabilities:
- Deliver high-quality support to global stakeholders.
- Resolve incidents efficiently and effectively.
- Use monitoring tools to detect and respond to issues proactively.
- Contribute to continuous improvement initiatives and innovation in support practices.
Preferred Experience & Skills:
- 2-3 years of experience in technical support for critical business systems.
- Strong analytical and problem-solving abilities.
- Excellent verbal and written communication skills for effective collaboration with global teams.
- Solid understanding of incident and problem management principles.
- Experience with server stack and website support is advantageous.
- Proficiency in debugging, issue analysis, and resolution.
Technical Knowledge:
- Familiarity with monitoring and observability tools such as Grafana, Prometheus, Loki, and Tempo.
- Knowledge of Kubernetes, Docker, Linux, Windows, Kafka, and Postgres.
- Experience building Grafana dashboards that integrate metrics, logs, and traces for proactive error detection.
- Testing experience is a plus.
Education & Certifications:
- A tertiary qualification in Information Technology or a related field is highly desirable.

Posted 1 week ago


2.0 - 6.0 years

0 Lacs

ghaziabad, uttar pradesh

On-site

As a Full Stack Software Engineer at RightCrowd, you will be part of a team revolutionizing physical access control through our SmartAccess platform. Our cutting-edge solutions are trusted by top organizations worldwide, enhancing the daily experiences of employees, visitors, and users. Our cloud-native backend powers SmartAccess, integrating wearables, mobile apps, and web applications to provide unmatched convenience and security. Join us in building the future of access control and contributing to innovative solutions that make a global impact. We are not seeking the perfect candidate with a flawless resume; instead, we value curiosity, a commitment to learning, and a desire to make a difference. If you are enthusiastic about overcoming challenges, expanding your skills, and driving groundbreaking solutions, we encourage you to apply, even if you do not meet all requirements. As a passionate Full Stack Software Engineer, you will collaborate with our remote team to enhance existing features and develop new solutions. Our team, rooted in a Belgian startup, embodies a startup spirit, emphasizing a high-responsibility, high-expectation environment with cutting-edge technology and minimal corporate overhead. Your responsibilities will include developing and maintaining web interfaces, contributing to backend services, evolving our cloud-native platform and infrastructure, ensuring high-quality deliverables through development testing, participating in requirements gathering and architectural decision-making, and providing third-line support when necessary. Your role will also involve reviewing use cases, UI and UX design, sharing knowledge through documentation, assisting in customer support requests, and being an eager learner open to adapting to new technologies. Key technologies in our stack include TypeScript, React, NodeJS, Mongo, Postgres, Redis, Azure, Terraform, Docker, Kubernetes, Grafana, Prometheus, Git, and more. 
To qualify for this role, you should have fluency in English, a commitment to lifelong learning, 2-4 years of software development experience in complex environments, expertise in NodeJS, TypeScript, React, Unix systems, and Docker, excellent debugging and problem-solving skills, and the ability to work in a fast-paced environment. Join us at RightCrowd for the opportunity to be part of a leading company in safety, security, and compliance solutions, work on impactful products, grow professionally in a collaborative environment, and receive a competitive salary and benefits package. Apply now to make a difference with us!

Posted 1 week ago


0.0 - 4.0 years

0 Lacs

haryana

On-site

As a Contract Logistics Specialist at Kuehne+Nagel, you will be responsible for managing end-to-end warehousing operations for customers. Your precise management will not only enhance team success but also contribute significantly to the day-to-day operations and overall success of warehouse and distribution centers. For instance, you will oversee tasks such as storing and delivering delicate flowers and fresh ingredients to local stores, catering to everyday lunches and family celebrations. Your role at Kuehne+Nagel plays a crucial part in more areas than you might initially imagine.
We are currently looking for a motivated and inquisitive Junior DevOps Engineer to join our dynamic team. This position is well suited to individuals with a strong service mindset, a solid technical background, and a keen interest in continuous learning and automation. You will collaborate closely with senior engineers to support infrastructure maintenance and manage deployment pipelines effectively.
Key Responsibilities:
- Assist in deploying and maintaining WMS virtual machines using tools such as Jenkins and GitLab.
- Work alongside development and operations teams to ensure seamless application delivery and optimal performance.
- Monitor systems using tools such as Grafana and Oracle EM.
- Contribute to troubleshooting and resolving infrastructure and application issues.
- Document processes and actively participate in knowledge sharing within the team.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Strong service orientation with a dedication to supporting internal teams and enhancing user experience.
- Technical proficiency in areas such as Linux, Oracle, Java, virtualization, cloud platforms (e.g., AWS), Git, Jira, GitLab, Jenkins, ArgoCD, Terraform, Ansible, Docker, Kubernetes, OpenShift, Grafana, Loki, Mimir, and Tempo.
- Fundamental understanding of operations and system administration, monitoring and alerting practices, automation of repetitive tasks, troubleshooting, and root cause analysis.
- Previous internship or project exposure in DevOps or related domains.
- Familiarity with Agile methodologies.
What's in it for you:
Joining Kuehne+Nagel means becoming part of a global logistics leader focused on creating tangible impact on business, customers, and careers. Here's what you can expect:
- Global Exposure: Explore a world of international opportunities with a presence in over 100 countries.
- People-Centric Culture: Join a team where your opinion is valued and individuals genuinely care for each other.
- Learning & Development: Evolve personally and professionally through top-notch training and career growth opportunities.
- Innovation & Sustainability: Contribute to a forward-thinking organization driving real change in logistics and environmental sustainability.
- Rewards & Recognition: Be acknowledged for your dedication, performance, and potential.
- Stability with Agility: Experience the best of both worlds, combining the reliability of a trusted global brand with the entrepreneurial spirit of a startup.
About Us:
Logistics plays a vital role in everyday life, from the goods we use to the healthcare we rely on. At Kuehne+Nagel, your work transcends logistics; it influences both ordinary and extraordinary moments in people's lives worldwide. As a global leader with a rich history and a vision to propel the world forward, we provide a secure and supportive environment where your career can truly make a meaningful impact. Whether we are facilitating the delivery of life-saving medications, developing sustainable transportation solutions, or supporting local communities, your career journey with us will contribute to a greater purpose than you can envision.

Posted 1 week ago


3.0 - 5.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Job Description: Work Schedule: Standard (Mon-Fri). Environmental Conditions: Office. As part of the Thermo Fisher Scientific team, you will discover meaningful work that makes a positive impact on a global scale. Join our colleagues in bringing our Mission to life every day to enable our customers to make the world healthier, cleaner, and safer. We provide our global teams with the resources needed to achieve individual career goals while helping to take science a step beyond by developing solutions for the world's toughest challenges, like protecting the environment, ensuring our food is safe, or helping find cures for cancer. About the Role: We are seeking a highly skilled and driven Developer to join our dynamic team. In this role, you will play a pivotal part in designing, building, and deploying innovative solutions that enhance the capabilities of our platform and deliver tangible business value. You will collaborate closely with cross-functional teams to understand requirements, identify opportunities, and develop scalable, high-quality software that aligns with our strategic goals. Our technology stack spans modern full-stack development, cloud-native infrastructure, and emerging areas such as AI/ML, offering you the opportunity to work on cutting-edge projects that drive real impact. This is a unique opportunity for a seasoned professional who is passionate about technology and eager to contribute to meaningful digital transformation initiatives. Key Responsibilities: Design, develop, and implement advanced solutions to enhance our platform's capabilities. Customize applications and integrate them seamlessly with other enterprise systems. Write, test, and debug complex scripts to support automation and operational efficiency. Develop and maintain API integrations with external systems and services. Collaborate with cross-functional teams, including business analysts and technical leads, to gather requirements and design scalable solutions. 
Perform regular system upgrades, health checks, and performance tuning to ensure optimal system health. Ensure high availability and performance of critical services across all environments. Troubleshoot and resolve technical issues to minimize disruption to business operations. Identify opportunities for improvement and recommend enhancements to system functionality and user experience. Participate actively in Agile/Scrum ceremonies, contributing to continuous delivery and improvement. Maintain detailed and up-to-date technical documentation. Participate in global on-call rotations, supporting operations in a follow-the-sun model. What We're Looking For: Bachelor's degree in Computer Science, Engineering, or a related field. 3-5 years of experience building enterprise-scale web applications across the full Agile project lifecycle. Proven track record in developing scalable, high-availability distributed software systems. Strong development experience with: Backend: Golang, Python, and shell scripting; Frontend: JavaScript, jQuery, Bootstrap, Angular or React, HTML5, CSS, and JSON; Node.js for server-side applications. Proficient in working with relational and NoSQL databases such as PostgreSQL, MySQL, Oracle, and MongoDB. Hands-on experience with cloud platforms like AWS and/or Azure. Familiarity with container orchestration and infrastructure tools: Kubernetes (EKS), Terraform, Akamai (CDN), and related services. Solid understanding of microservices architecture, CI/CD pipelines, and DevOps practices. Experience working in Linux environments with knowledge of OS, networking, and storage fundamentals. Exposure to observability tools such as Grafana, Prometheus, Loki, Tempo, and Mimir. Experience working within Agile or SAFe frameworks. Familiarity with Major Incident Management (MIM) processes and effective incident participation. Proficient in API integration techniques and cross-platform data exchange. 
Strong analytical thinking, problem-solving ability, and excellent communication skills.

Posted 1 week ago

7.0 - 11.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Allegion India is seeking a highly motivated Senior Site Reliability Engineer to play a critical role in ensuring the reliability, scalability, and performance of our organization's systems and infrastructure. You will work with a team of cross-functional product development engineers to design, implement, and maintain highly available and resilient systems, and your expertise in automation, monitoring, and incident response will contribute to the overall stability and efficiency of our technology stack throughout the Allegion product portfolio. Job Description: Design, implement, and maintain highly available and scalable infrastructure systems, ensuring maximum uptime and performance. Collaborate with software engineering teams to build and deploy applications using best practices in reliability, scalability, and security. Develop and implement automation tools and frameworks to streamline operational processes, reduce manual intervention, and improve efficiency. Monitor and analyse system performance, identifying bottlenecks and implementing solutions to optimize performance and scalability. Implement and maintain effective monitoring, alerting, and logging systems to proactively identify and resolve issues before they impact users. Hands-on experience building automated CI/CD pipelines using GitHub Actions, Jenkins, GitLab, or an equivalent platform. Excellent at automating workflows or solutions using Python, Go, or Shell. Lead incident response and root cause analysis efforts, driving continuous improvement and preventing future incidents. Collaborate with cross-functional teams to define and enforce best practices, standards, and guidelines for system reliability and performance. Participate in on-call rotations and respond to incidents, ensuring timely resolution and minimal impact to users, thereby meeting SLAs. Plan and devise Disaster Recovery (DR) strategies and implement DR plans. 
Mentor and provide guidance to junior team members, fostering a culture of learning and growth. Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions. Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement. Provide primary operational support and engineering for multiple large-scale distributed software applications. Required Knowledge, Skills and Abilities: Proven experience as a Site Reliability Engineer or in a similar role, with a focus on designing and maintaining highly available and scalable systems. Strong programming and scripting skills (Python, Bash, etc.) to automate operational tasks and develop tooling. Experience with cloud platforms (AWS) and containerization technologies (Docker, EKS). Proficient in configuration management tools like Ansible and infrastructure-as-code frameworks such as Terraform and CloudFormation. Experience with monitoring and logging tools (Prometheus, Grafana, Loki, Sentry.io, CloudWatch, etc.) for proactive system monitoring and troubleshooting. Ability to program (structured and OOP) using one or more high-level languages, such as Java and JavaScript. Solid understanding of networking principles, protocols, and security best practices. Strong problem-solving skills and the ability to work effectively in a fast-paced, dynamic environment. Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams. Experience with distributed storage technologies such as NFS and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN). Proactive approach to identifying problems, performance bottlenecks, and areas for improvement. 
Experience in Agile methodologies. Strong skills in software design and design patterns. Experience with different architecture patterns such as client-server and serverless computing. Effective written, verbal and presentation skills with the ability to clearly articulate ideas and concepts. Self-directed and able to direct others. Desired Skills & Abilities: Experience with setting up performance/load test environments. Familiarity with SOC2 audit processes. Required Education and/or Experience: BE/B Tech/M Tech/MCA/MSc in Computer Science Engineering. 7 to 11 years of experience in Software Application Development/CloudOps/SRE. Allegion is a diverse and inclusive environment. We are an equal opportunity employer and are dedicated to hiring qualified protected veterans and individuals with disabilities. If for any reason you cannot apply through the job center, please contact HR, Allegion India for special accommodation. We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

Posted 1 week ago

8.0 - 12.0 years

0 Lacs

chennai, tamil nadu

On-site

As a Multi-Cloud Senior DevOps and Team Manager with experience in CI/CD, you will be responsible for leading a team and ensuring efficient deployment and management of cloud infrastructure and services across various platforms. With 8-10 years of experience, including at least 2 years in a leadership or Architect role, you will have a strong background in technologies such as Kubernetes, GCP, AWS, DevOps practices, and more. Your role will require expertise in areas such as Public Cloud (AWS, GCP, Azure), Microservices (Kubernetes, Docker, OpenShift), Infrastructure as Code (Terraform, Pulumi), and a variety of monitoring tools and networking technologies. Proficiency in CI/CD tools like Jenkins and GitLab CI, and knowledge of scripting languages like Bash and Python, will be essential for optimizing deployment processes. In addition to technical skills, you will be expected to demonstrate experience in Release Management, Security practices, and Agile methodologies. As a Team Manager, you will oversee the performance and development of your team members, ensuring successful project delivery and alignment with organizational goals. This is a full-time position based in Chennai, Tamil Nadu, offering a hybrid remote work environment. Please note that candidates with fewer than 8 years of experience need not apply, and the role is not suitable for Junior or Mid-level professionals. If you meet the specified qualifications and have a proven track record in cloud infrastructure management and team leadership, we encourage you to apply by providing details about your current employer, location, notice period, current and expected salary, and confirming your Lead or Architect level status.

Posted 1 week ago

4.0 - 8.0 years

0 Lacs

pune, maharashtra

On-site

We are seeking a highly skilled Grafana Stack Consultant with expertise in managing and scaling self-hosted observability platforms. This is a full-time contract opportunity that allows you to work remotely from anywhere. Our existing observability infrastructure includes Grafana, Loki, Tempo, and Mimir, running in a Docker Swarm environment across multiple Linux VMs. The system processes around 500GB of logs daily and uses Wasabi S3 as the object storage backend. Your responsibilities include conducting a detailed performance analysis of the Grafana observability stack, focusing particularly on Loki, Tempo, and Mimir. You will identify and address performance bottlenecks in log ingestion, indexing, and query execution, and optimize system configurations for improved throughput, latency, and fault tolerance. Troubleshooting and resolving instability and downtime in the observability pipeline are key tasks. You are expected to recommend and implement best practices for operating the Grafana stack in Docker Swarm environments, and to collaborate with internal engineering teams to guide architecture improvements and ensure that the observability stack meets reliability goals. To qualify for this role, you should have at least 5 years of hands-on experience with self-hosted Grafana, Loki, Tempo, and Mimir. Essential requirements include a solid understanding of observability architectures, experience operating these components at scale, proficiency in container orchestration (especially Docker Swarm), familiarity with Linux VM-based deployments, and experience with object storage backends such as Wasabi S3. You must be able to diagnose complex issues across distributed systems and propose effective solutions. Strong problem-solving skills and the capacity to work independently in a remote setting are required. 
Desirable qualifications include experience migrating from Docker Swarm to Kubernetes, familiarity with CI/CD practices and infrastructure automation, and previous consulting experience in infrastructure or observability-focused roles.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

hyderabad, telangana

On-site

As a Senior Observability Engineer, you will play a crucial role in leading the design, development, and maintenance of observability solutions across our infrastructure, applications, and services. Your primary responsibility will be to implement cutting-edge monitoring, logging, and tracing solutions to ensure the reliability, performance, and availability of our complex, distributed systems. Collaboration with cross-functional teams, including Development, Infrastructure Engineers, DevOps, and SREs, will be essential to optimize system observability and enhance our incident response capabilities.

Key Responsibilities:
- Lead the design and implementation of observability solutions for cloud and on-premises environments, encompassing monitoring, logging, and tracing.
- Drive the development and maintenance of advanced monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics.
- Implement distributed tracing frameworks like OpenTelemetry, Jaeger, or Zipkin to enhance application performance diagnostics and troubleshooting.
- Optimize log management and analysis strategies using tools like Elasticsearch, Splunk, Loki, and Fluentd for efficient log processing and insights.
- Develop advanced alerting and anomaly detection strategies to proactively identify system issues and improve Mean Time to Recovery (MTTR).
- Collaborate with Development and SRE teams to enhance observability in CI/CD pipelines, microservices architectures, and various platform environments.
- Automate observability tasks by leveraging scripting languages such as Python, Bash, or Golang to increase efficiency and scale observability operations.
- Ensure the scalability and efficiency of monitoring solutions to manage large-scale distributed systems and meet evolving business requirements.
- Lead incident response by providing actionable insights through observability data for effective troubleshooting and root cause analysis.
- Stay abreast of industry trends in observability, Site Reliability Engineering (SRE), and monitoring practices to continuously improve processes.

Required Qualifications:
- 5+ years of hands-on experience in observability, SRE, DevOps, or related fields, with a proven track record in managing complex, large-scale distributed systems.
- Expert-level proficiency in observability tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics, and the ability to design and implement these solutions at scale.
- Advanced experience with log management platforms like Elasticsearch, Splunk, Loki, and Fluentd, optimizing log aggregation and analysis for performance insights.
- Deep expertise in distributed tracing tools like OpenTelemetry, Jaeger, or Zipkin, focusing on performance optimization and root cause analysis.
- Extensive experience with cloud environments (Azure, AWS, GCP) and Kubernetes for deploying and managing observability solutions in cloud-native infrastructures.
- Advanced proficiency in scripting languages like Python, Bash, or Golang, and experience with Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Strong understanding of system architecture, performance tuning, and troubleshooting production environments with scalability and high availability in mind.
- Proven leadership experience and the ability to mentor teams, provide technical direction, and drive best practices for observability and monitoring.
- Excellent problem-solving skills, emphasizing actionable insights and data-driven decision-making.
- Ability to lead high-impact projects, communicate effectively with stakeholders, and influence cross-functional teams.
- Strong communication and collaboration skills, working closely with engineering teams, leadership, and external partners to achieve observability and system reliability goals.

Preferred Qualifications:
- Experience with AI-driven observability tools and anomaly detection techniques.
- Familiarity with microservices, serverless architectures, and event-driven systems.
- Proven track record of handling on-call rotations and incident management workflows in high-availability environments.
- Relevant certifications in observability tools, cloud platforms, or SRE best practices are advantageous.

Posted 2 weeks ago

3.0 - 7.0 years

0 Lacs

haryana

On-site

As a Cloud & DevOps Engineer at Omniful, you will play a crucial role in designing, implementing, and maintaining scalable cloud infrastructure on platforms like AWS, GCP, and Azure. Your responsibilities will include automating infrastructure provisioning using Terraform, building and managing CI/CD pipelines with tools like Jenkins, GitHub Actions, and GitLab, and deploying and managing containerized applications using Docker, Kubernetes (EKS, GKE), and AWS ECS. Your expertise will ensure system reliability, security, and performance through modern observability tools such as Prometheus and Grafana. Collaboration with engineering teams will be essential to support fast and reliable operations. We are looking for candidates with a minimum of 3 years of hands-on experience in Cloud & DevOps Engineering, with strong expertise in technologies like AWS, GCP, Azure, and Terraform. Proficiency in CI/CD tools like Jenkins, GitHub Actions, GitLab, and ArgoCD, along with experience in Docker, Kubernetes, and managing AWS ECS environments, is required. Scripting knowledge in Bash and Python, along with YAML, is essential, as is familiarity with observability stacks and a strong understanding of Git and version control systems. Additionally, experience in incident response and system troubleshooting, and an understanding of cloud security best practices, will be advantageous in this role at Omniful. If you are passionate about building and managing scalable cloud infrastructure, automating systems, optimizing CI/CD pipelines, and implementing infrastructure as code, this position offers an exciting opportunity to contribute to our fast-growing B2B SaaS startup and make a significant impact in the field of operations and supply chain management.

Posted 2 weeks ago

15.0 - 19.0 years

0 Lacs

karnataka

On-site

As a leader at Allegion, you will be at the forefront of creating peace of mind by pioneering safety and security for the people you know and love. With a global presence spanning more than 30 brands, 12,000+ employees, and products distributed in 130 countries, we specialize in security solutions centered around doorways and beyond. We take pride in being recognized with the Gallup Exceptional Workplace Award in 2024, a testament to our highly engaged workplace culture. Your primary responsibilities will include leading a team in developing and maintaining reliable infrastructure and CI/CD pipelines, as well as identifying and recommending continuous improvements in DevOps services. You will standardize DevOps processes across product teams and train their members, create and manage an Observability Platform to monitor application health, and stay abreast of industry trends and emerging technologies in the DevOps domain. Moreover, you will contribute to DevOps workflows through scripting and system administration, optimize system performance, apply enterprise security policies, and lead the technology strategy for Infrastructure as Code and tooling. It will be essential to enhance cloud infrastructure, identify cost-saving opportunities, and maintain documentation of processes and technical guidelines. Building strong relationships with stakeholders at various levels, you will effectively communicate the value and impact of DevOps initiatives. Ideal candidates for this role will possess 15 to 18 years of experience in Software Application Development, DevOps, or SRE, with at least 8 years of DevOps experience. Proficiency in virtualization, container technologies, and cloud solutions is highly desirable, along with knowledge of software development languages and experience with tools like Git, Azure DevOps, and Artifactory. Skills in configuration management, monitoring, and logging tools are also key requirements. 
Education-wise, a BE/B Tech/M Tech/MCA/MSc in Computer Science Engineering is preferred. If you are a passionate leader with a keen interest in DevOps, DevSecOps, Agile, and Security, possess strong communication skills, and thrive in a dynamic, matrix organization, this role presents an exciting opportunity for professional growth and development at Allegion. Allegion is dedicated to fostering a diverse and inclusive workplace where differences are celebrated, and all employees have the opportunity to thrive and grow. If you are seeking a rewarding career that allows you to make a meaningful impact, values work-life balance, and prioritizes professional development, Allegion could be the perfect place for you to grow your career.

Posted 2 weeks ago

2.0 - 6.0 years

0 Lacs

noida, uttar pradesh

On-site

We are seeking a skilled Backend Engineer proficient in Golang (or Rust) to contribute to the development of cutting-edge decentralized communication systems. In this role, you will focus on tasks such as enhancing peer-to-peer networks, implementing secure messaging solutions, integrating cryptographic protocols, and incorporating blockchain technologies in a privacy-centric setting. This is a full-time position that requires your presence on-site. If you are enthusiastic about security, networking, and decentralization, this opportunity is tailored for you. Your primary responsibilities will include designing and executing backend systems dedicated to decentralized communication and data transmission. You will also be tasked with refining P2P protocols, establishing secure messaging layers, constructing secure relay nodes, and supporting offline messaging functionalities. Furthermore, you will play a crucial role in integrating advanced cryptography and facilitating self-hosted deployments using Docker and Kubernetes. Collaboration with AI engineers, engaging in code reviews, and contributing to security audits are also key aspects of this role. The technologies you will be working with include Golang (proficiency required), Rust (mandatory), and optionally C and C++. In terms of networking, you will engage with libp2p, QUIC, Onion Routing (Tor/I2P concepts), TLS 1.3, Noise Protocol, WebSocket, and gRPC. Your role will also involve working with various cryptographic tools such as libsodium, liboqs, secp256k1, ed25519, ECDSA, SHA-256/Keccak, Zero-Knowledge Proofs, HashiCorp Vault, HKDF, and Noise handshake protocols. Blockchain-related tasks may encompass Decentralized Identity (DID) integrations, smart contracts (optional), and MPC wallet signing for BTC, ETH, and TRON. DevOps responsibilities will involve Docker, Kubernetes, Terraform, HashiCorp Vault, AWS KMS, and Sealed Secrets, while monitoring tasks will utilize Prometheus, Grafana, Loki, and Tempo. 
To be successful in this role, you should possess strong backend development skills with a focus on Golang (knowledge of Rust is advantageous), a solid understanding of P2P networking, distributed systems, and encryption, practical experience with Docker, Kubernetes, and self-hosted environments, a genuine interest in decentralized technologies, network security, and privacy, and the willingness to work full-time on-site. Additional advantageous qualifications include experience with Mixnet overlays or anonymous routing systems, exposure to blockchain integration for decentralized identity or staking systems, and contributions to open-source privacy/security projects. This position requires a B.Tech/BCA/MCA qualification, follows a morning shift schedule, and is based in Noida Sector 62. It is a full-time role requiring 2 to 5 years of experience. Please note that this is an in-person position with a day shift schedule.

Posted 2 weeks ago

8.0 - 10.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Teamwork makes the stream work. Roku is changing how the world watches TV. Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers. From your first day at Roku, you'll make a valuable, and valued, contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines. About the role: Do you want to help build Roku's next-generation unified cloud-agnostic hosting platform? Are you experienced with Terraform, Kubernetes, and Istio? Can you write applications and automation in Golang, Python, or Shell? Are you interested in being part of a multinational team to design and create the platform? If so, this role is for you! About the team: The central Infrastructure Engineering team is looking for highly skilled infrastructure and software engineers to help develop and drive Roku's service mesh hosting architecture. Our team is responsible for building and scaling the platform (Kubernetes, Istio, Envoy, operators, and more) to effect Roku's transition towards a single, unified, cloud-agnostic system where all teams speak the same infrastructure language. We are engaging with Roku's engineering teams to migrate hundreds of workloads to our common platform, including helping augment and automate CI/CD flows. We are looking for engineers who love working collaboratively across teams to achieve results that impact the entire company. 
What you'll be doing: Help architect, design, build, and deploy Roku's next-generation service mesh and cloud infrastructure. Contribute to evolving our deployments by building solutions using Docker, Kubernetes, Istio/Envoy, and Terraform. Join in efforts to investigate new technology and tools to be adopted by Roku. Help build and integrate security as part of the infrastructure. Collaborate on internal customer engagements as we migrate workloads to Kubernetes + Istio + open-source observability tools and technologies. Work closely with the Observability team to integrate and scale existing and new observability tools as part of a holistic solution. Work closely with the SRE team to maintain availability of our services and improve onboarding workflows. Mentor other team members to define and adopt new or improve existing processes and procedures. We're excited if you have: Strong hands-on experience in cloud technologies. AWS, ECS, and Kubernetes (EKS, GKE, AKS or other) preferred. Knowledge of another cloud platform like GCP or Azure is a plus but not required. Demonstrated understanding of overall infrastructure design and developing tools to enable and automate the infrastructure. Experience with a high-level scripting language (such as Python) and a system programming language (such as Go). Strong experience with Kubernetes. Production experience in testing and deploying applications via modern CI/CD tools and concepts. Familiarity with observability tools like Prometheus, Thanos, Loki, Grafana, etc. The drive and self-motivation to understand the intricate details of a complex infrastructure environment. Ability to work independently. Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders. Experience with integrating AI tools for improving processes and reducing toil is a plus. 
Master's degree or equivalent experience (8+ years). You have either tried Gen AI in your previous work or outside of work, or are curious about Gen AI and have explored it. Benefits: Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Our employees can take time off work for vacation and other personal reasons to balance their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter. The Roku Culture: Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a smaller number of very talented folks can do more for less cost than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV. We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002. 
To learn more about Roku, our global footprint, and how we've grown, visit https://www.weareroku.com/factsheet. By providing your information, you acknowledge that you have read our Applicant Privacy Notice and authorize Roku to process your data subject to those terms.

Posted 2 weeks ago

7.0 - 9.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Profile Description: We're seeking someone to join our team as a Site Reliability Engineer experienced in developing and/or supporting Enterprise Applications, with a willingness to embrace Agile and DevOps/SRE concepts and solid skills in analysis, problem determination, and resolution recovery processes. WM_Technology: Wealth Management Technology is responsible for the design, development, delivery, and support of the technical solutions behind the products and services used by the Morgan Stanley Wealth Management Business. Practice areas include: Analytics, Intelligence, & Data Technology (AIDT), Client Platforms, Core Technology Services (CTS), Financial Advisor Platforms, Global Banking Technology (GBT), Investment Solutions Technology (IST), Institutional Wealth and Corporate Solutions Technology (IWCST), Technology Delivery Management (TDM), User Experience (UX), and the CAO team. Core Platform Services: Core Platform Services is responsible for driving Resiliency, Automation, Performance, Stability, and Efficiency across Wealth Management Technology. Software Production Management & Reliability Engineering: This is a Director position that oversees the production environment, ensuring the operational reliability of deployed software, and implements strategies to optimize performance and minimize downtime. Morgan Stanley is an industry leader in financial services, known for mobilizing capital to help governments, corporations, institutions, and individuals around the world achieve their financial goals. At Morgan Stanley India, we support the Firm's global businesses, with critical presence across Institutional Securities, Wealth Management, and Investment Management, as well as in the Firm's infrastructure functions of Technology, Operations, Finance, Risk Management, Legal and Corporate & Enterprise Services. Morgan Stanley has been rooted in India since 1993, with campuses in both Mumbai and Bengaluru. 
We empower our multi-faceted and talented teams to advance their careers and make a global impact on the business. For those who show passion and grit in their work, there's ample opportunity to move across the businesses. Interested in joining a team that's eager to create, innovate and make an impact on the world? Read on. What You'll Do in the Role: Proactively detecting, troubleshooting, and resolving all issues affecting production applications. This involves coordination with and escalation to development and external teams where necessary. This team owns all issues escalated to it until they are resolved or a workaround is provided so that end users can continue working. Responsible for maintaining clear, concise, and timely communications with affected parties during the investigation and resolution of any individual or system-wide outage. Responsible for the stability of the Production environment. Develop and continually revise (in partnership with other teams where necessary) suitable policies and procedures to ensure appropriate application development standards are available to guide development for systems deployed to Production. As the gatekeepers of the Production environment, responsible for ensuring the Change Implementation Management guidelines/policies are adhered to for all systems deployed to Production. Responsible for servicing all requests for data or other activities that require access to Production systems. Work with development teams at the appropriate stages in application development to ensure any new systems or projects meet the Production standard. Responsible for maintaining and growing a body of knowledge that is accessible to all team members. Ensure information regarding any support-related activities or issues is available and easily accessible. 
The goal is to improve self-reliance and reduce dependency on the availability of development or external team resources for the initial troubleshooting and resolution of problems. As a team member with expertise in deep analytical triage, you will provide subject matter expertise in debugging, issue analysis and troubleshooting, working with business and technical colleagues to provide reviews and recommendations that prevent future application issues. Produce guidance documentation, standards and procedures, product assessments, and training material, including working with the various application and infrastructure support teams to ensure they document every troubleshooting step in the Morgan Stanley knowledge base system so that issues are resolved faster. You will serve as a fully seasoned, proficient technical resource, providing technical knowledge in outage management and proactive solutions to improve the user experience. What You'll Bring to the Role: At least 4 years of relevant experience would generally be expected to find the skills required for this role. Minimum 7 years of extensive experience in Mainframe (Batch & Backend processing), SQL, MSSQL and Teradata technologies. 5+ years of experience handling mainframe and Autosys job abends. Good understanding of COBOL, JCL, Mainframe & Distributed DB2 technologies. SQL/Databases (MSSQL/SYBASE/DB2): Understanding of tables, views, indexes, and stored procedures, and the ability to understand them by reading their definitions. Familiarity with SQL constructs. Understanding of transactions, query plan analysis and database troubleshooting. Unix / Linux: Experience supporting Unix-based applications, including troubleshooting in a Unix environment. Shell Scripting: Ability to write a shell script from scratch and to understand existing scripts by reading them. Autosys: Ability to create and debug Autosys jobs and dependencies.
Ability to analyze a complex job stream, correct any inconsistencies, errors or omissions, and point out potential problems. Experience with languages such as Java, COBOL, PowerShell, PHP, Python, Perl, and Ruby. Experience with observability tools such as Prometheus, Grafana, Loki, Kibana, Splunk, etc. 5+ years of experience in a production environment with a solid software development background and an understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail. Administrative competence in at least one major programming language or platform (for example: COBOL, JCL, Perl, PowerShell, Python, Java or C#). Experience/knowledge with distributed web hosting services, databases and MQ processing, e.g. Tomcat, WebSphere, Microsoft IIS, Db2, MSSQL. Experience developing and/or supporting Distributed, Mainframe batch and ETL technologies. Experience working with job schedulers such as TWS, comet and Autosys. Good working knowledge of Cloud Engineering. Understanding of private cloud principles and exposure to public cloud offerings such as AWS, Azure or similar technology is preferred. Willingness to embrace Agile and DevOps/SRE concepts. Windows: Basic understanding of the Windows environment. Experience with on-call incident response and the ability to respond to emergencies on a 24/7 basis. Proven ability to understand and troubleshoot complex problems under pressure. Hands-on experience administering large-scale, high-availability systems and the tools to monitor performance and availability. Solid analytical skills, problem determination, and resolution recovery processes. Ability to interface with, and cultivate excellent working relationships with, technology teams, business analysts, and vendors. Should be a fast learner of technologies in a fast-paced environment.
Have strong organizational skills and the ability to manage multiple tasks and high-pressure situations for outage handling, management, or resolution. Are driven to learn about new technologies, techniques and what it takes to be an integral member of this team. Experience creating technical architecture documentation. Excellent communication and writing skills specific to technical discussions across the management layers. BS/MS or equivalent, preferably in a quantitative discipline (Computer Science, Computer Engineering, EE, Math, Physics). What You Can Expect From Morgan Stanley: We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 85 years. At our foundation are five core values (putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back) that guide our more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find trusted colleagues, committed mentors and a culture that values diverse perspectives, individual intellect and cross-collaboration. Our Firm is differentiated by the caliber of our diverse team, while our company culture and commitment to inclusion define our legacy and shape our future, helping to strengthen our business and bring value to clients around the world. Learn more about how we put this commitment to action: morganstanley.com/diversity. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry.
Our values (putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back) aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. There's also ample opportunity to move about the business for those who show passion and grit in their work. To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser. Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is composed of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Posted 2 weeks ago

8.0 - 12.0 years

0 Lacs

chennai, tamil nadu

On-site

You should have 8 to 11 years of experience in the field. The job is based in Chennai or Bangalore. Your technical skills should include:
- Expertise in Prometheus setup, scaling, and federation, with knowledge of Thanos, Cortex, or VictoriaMetrics for long-term storage. Hands-on experience writing complex PromQL queries is required.
- Proficiency in Grafana for creating dashboards and integrating multiple data sources.
- In-depth experience with ELK, Splunk, Loki, or similar logging tools, including their query languages and dashboarding.
- Hands-on experience managing observability infrastructure on Kubernetes, Docker, or other container technologies.
- Proficiency in scripting and automation using Python, Bash, or similar scripting languages. Experience with Infrastructure as Code tools such as Terraform or Ansible is preferred.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

ahmedabad, gujarat

On-site

As a Cloud Infrastructure & DevOps Engineer in the Space Technology industry, your primary responsibility will be to lead the development of cloud-agnostic backend and infrastructure systems that can be deployed seamlessly across environments such as AWS, GCP, Azure, and on-premise setups. You will design resilient deployment pipelines, optimize compute and storage resources, ensure secure multi-tenant SaaS practices, and implement observability for the mission-critical satellite intelligence platform. This role demands a deep understanding of DevOps principles, a robust systems mindset, and the ability to build solutions that are both portable and resilient. Your key responsibilities will include:
Infrastructure Ownership & Cost Optimization:
- Architecting a cloud-agnostic infrastructure using open-source and platform-independent tools.
- Implementing autoscaling, load balancing, and spot/preemptible compute optimization.
- Containerizing services for consistent deployments across providers and environments.
Security & Deployment Policies:
- Implementing secure networking, RBAC, and identity controls abstracted from cloud providers.
- Defining and enforcing platform-agnostic rollout/rollback strategies for high-availability deployments.
- Ensuring compliance with SaaS security standards across all cloud platforms.
Observability & Monitoring:
- Setting up unified observability stacks such as Prometheus, Grafana, and Loki to monitor infrastructure health.
- Defining and tracking SLAs and SLOs, and setting up real-time incident alerting.
- Building centralized, environment-agnostic logging and metrics pipelines.
Database & Data Management:
- Managing cloud-agnostic data backends such as PostgreSQL/PostGIS, MongoDB, and object storage compatible with S3 APIs.
- Implementing multi-region backups, replication, and disaster recovery plans.
CI/CD & Infrastructure as Code (IaC):
- Building reusable, cloud-agnostic CI/CD pipelines using tools such as GitHub Actions, GitLab CI/CD, or ArgoCD.
- Using Terraform and Ansible to provision infrastructure on any cloud provider or bare-metal server.
- Packaging workloads with Docker and deploying them to Kubernetes clusters across cloud and on-prem environments.
Required Skills & Experience:
- Demonstrated expertise in multi-cloud or hybrid cloud deployments (AWS, GCP, Azure, on-prem).
- Proficiency in Kubernetes, Docker, Terraform, Ansible, and container orchestration best practices.
- Deep understanding of IAM, RBAC, VPNs, secrets management, and secure API design.
- Experience with cloud-neutral observability stacks such as Prometheus, Grafana, Loki, and ELK.
- Operational knowledge of PostgreSQL/PostGIS, MongoDB, Redis, and S3-compatible storage.
- Cloud-agnostic CI/CD experience with GitHub Actions, GitLab, or ArgoCD.
Preferred Qualifications:
- Experience with on-premise Kubernetes clusters (e.g., k3s, RKE2, OpenShift).
- Familiarity with cloud abstraction platforms.
- Knowledge of event-driven or edge computing architectures.
In this role, you will have the opportunity to work with a passionate team of engineers dedicated to building resilient, portable, and cloud-neutral systems that scale across various environments. You will receive mentorship from experienced professionals in the field and be part of a growth-minded team that values learning and innovation. Join us for a challenging yet rewarding experience with plenty of opportunities for fun and personal development.

Posted 2 weeks ago

8.0 - 13.0 years

8 - 12 Lacs

pune, maharashtra, india

On-site

Minimum 8 years of software development experience with strong expertise in the Spring Boot and React frameworks. Proven hands-on experience with AWS services and infrastructure automation tools such as Terraform and CloudFormation. Practical knowledge of monitoring and logging tools, including Prometheus, Loki, and Grafana. Experience working with Postgres or similar relational databases. Bachelor's degree in Computer Science, IT, or a related technical field, complemented by strong problem-solving and communication skills. Your Benefits:
- GLOBAL DIVERSITY: Diversity means many things to us: different brands, cultures, nationalities, genders, generations, even variety in our roles. You make us unique!
- ENTERPRISING SPIRIT: Every role adds value. We're committed to helping you develop and grow to realize your potential.
- POSITIVE IMPACT: Make it personal and help us feed the world.
- INNOVATIVE TECHNOLOGIES: You can combine your love for technology with manufacturing excellence and work alongside teams of people worldwide who share your enthusiasm.
- MAKE THE MOST OF YOU: Benefits include health care and wellness plans and flexible and virtual work options.

Posted 2 weeks ago

15.0 - 17.0 years

0 Lacs

bengaluru, karnataka, india

On-site

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for over 30 years. It's an outstanding legacy of innovation that's fueled by phenomenal technology and exceptional people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and exceptional talent. As an NVIDIAN, you'll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world. What You Will Be Doing: Architect, lead, and scale globally distributed production systems supporting AI/ML, HPC, and critical engineering platforms across hybrid and multi-cloud environments. Design and lead implementation of automation frameworks that reduce manual tasks, promote resilience, and uphold standard methodologies for system health, change safety, and release velocity. Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for sophisticated distributed systems. Lead cross-organizational efforts to assess operational maturity, address systemic risks, and establish long-term reliability strategies in collaboration with engineering, infrastructure, and product teams. Pioneer initiatives that influence NVIDIA's AI platform roadmap, participating in co-development efforts with internal partners and external vendors, and staying ahead of academic and industry advances. 
Publish technical insights (papers, patents, whitepapers) and drive innovation in production engineering and system design. Lead and mentor global teams in a technical capacity, participating in recruitment and design reviews, and developing standard methodologies in incident response, observability, and system architecture. What We Need to See: 15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure, with a strong track record of leading platform-scale efforts and high-impact programs. Deep expertise in Linux/Unix systems engineering and public/private cloud platforms (AWS, GCP, Azure, OCI). Expert-level programming in Python and one or more languages such as C++, Go or Rust. Demonstrated experience with Kubernetes at scale, CPU/GPU scheduling, microservice orchestration, and container lifecycle management in production. Hands-on expertise in observability frameworks (Prometheus, Grafana, ELK, Loki, etc.) and Infrastructure as Code (Terraform, CDK, Pulumi). Proficiency in Site Reliability Engineering concepts such as error budgets, SLOs, distributed tracing, and architectural fault tolerance. Ability to influence multi-functional collaborators and drive technical decisions through effective written and verbal communication. Proven track record of delivering long-term, forward-looking platform strategies. Degree in Computer Science or related field, or equivalent experience. Ways to Stand Out from the Crowd: Hands-on experience building platforms for large-scale AI training, inferencing, and data movement pipelines. Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and orchestration frameworks (e.g., Ray, Kubeflow). Expertise in hardware fleet observability, predictive failure analysis, and power/resource-aware scheduling. Experience leading operational readiness efforts and reliability engineering in GPU-heavy environments.
Track record of driving cultural improvements in incident management, root cause analysis, and postmortem processes across large teams. Join us and build the infrastructure that powers the world's most advanced AI. Apply now and make your mark at NVIDIA! Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.

Posted 2 weeks ago

12.0 - 14.0 years

0 Lacs

bengaluru, karnataka, india

On-site

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for over 30 years. It's a unique legacy of innovation that's fueled by phenomenal technology and outstanding people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAn, you'll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the AI Infrastructure Production engineering team and see how you can make a lasting impact on the world. What You Will Be Doing: Develop and maintain large-scale systems supporting critical use cases for AI Infrastructure, driving reliability, operability, and scalability across global public and private clouds. Implement SRE fundamentals, including incident management, monitoring, and performance optimization, while designing automation tools to reduce manual processes and operational overhead. Build tools and frameworks to improve observability, define actionable reliability metrics, and enable fast issue resolution, driving continuous improvement in system performance. Establish frameworks for operational maturity, lead sustainable incident response protocols, and conduct blameless postmortems to improve team efficiency and system resilience. Work with engineering teams to deliver innovative solutions, mentor peers, uphold high standards for code and infrastructure, and contribute to hiring for a diverse, high-performing team. 
What We Need to See: Degree in Computer Science or related field, or equivalent experience with 12+ years in Software Development, SRE, or Production Engineering. Proficiency in Python and at least one other language (C/C++, Go, Perl, Ruby). Expertise in systems engineering within Linux or Windows environments and cloud platforms (AWS, OCI, Azure, GCP). Strong understanding of SRE principles, including error budgets, SLOs, SLAs, and Infrastructure as Code tools (e.g., Terraform CDK). Hands-on experience with observability platforms (e.g., ELK, Prometheus, Loki) and CI/CD systems (e.g., GitLab). Strong communication skills with the ability to convey technical concepts effectively to diverse audiences. Commitment to fostering a culture of diversity, curiosity, and continuous improvement. Ways to stand out from the crowd: Experience in AI training, inferencing, and data infrastructure services. Proficiency in deep learning frameworks like PyTorch, TensorFlow, JAX, and Ray. A strong background in hardware health monitoring and system reliability. Hands-on expertise in operating and scaling distributed systems with stringent SLAs, ensuring high availability and performance. Proven experience in incident, change, and problem management processes, fostering continuous improvement in sophisticated environments.

Posted 2 weeks ago

Apply

4.0 - 8.0 years

0 Lacs

haryana

On-site

As a DevOps Engineer specializing in Azure Cloud, CI/CD automation, Kubernetes, and SRE, you will be responsible for managing cloud resources, automating infrastructure, and ensuring system reliability. Your primary responsibilities will include managing Azure Cloud infrastructure using Terraform, designing and maintaining CI/CD pipelines with Jenkins, automating SRE tasks for high availability and performance, deploying and managing Kubernetes applications using Helm, setting up monitoring and logging with Loki, Grafana, and ELK, optimizing system performance, troubleshooting issues, collaborating with teams, and documenting best practices. You must have expertise in Azure Cloud; additional knowledge of AWS/GCP is a plus. Proficiency in Terraform for IaC, Jenkins for CI/CD, Kubernetes and Helm for container orchestration, and observability tools such as Loki, Grafana, and ELK is essential. Experience in automation using Ansible or similar tools is required. Additionally, exposure to AWS/GCP and Ansible, and knowledge of DevOps best practices, would be advantageous. Certifications in Azure and Kubernetes (CKA) are preferred. As part of Binmile, you will have the opportunity to work closely with the Leadership team, enjoy Health Insurance benefits, and have a flexible working structure. Binmile is an Equal Employment Opportunity Employer, promoting diversity and aiming to build a team that encompasses various backgrounds, perspectives, and skills.

Posted 2 weeks ago

0.0 years

0 Lacs

pune, maharashtra, india

On-site

Job description Some careers shine brighter than others. If you're looking for a career that will help you stand out, join HSBC and fulfil your potential. Whether you want a career that could take you to the top, or simply take you in an exciting new direction, HSBC offers opportunities, support and rewards that will take you further. HSBC is one of the largest banking and financial services organisations in the world, with operations in 64 countries and territories. We aim to be where the growth is, enabling businesses to thrive and economies to prosper, and, ultimately, helping people to fulfil their hopes and realise their ambitions. We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist. In this role, you will: Work as a hands-on full-stack Java developer, with primary responsibility for re-platforming the master data management and workflow application user interface and Java microservices, and for driving the User Interface strategy for the team. Work on backend Java REST applications if and when needed. The UI developer is accountable for the technical design and implementation of all IT solutions in response to scheduled stakeholder requirements and unscheduled changes, such as service operations problem workarounds, permanent fixes, and defects encountered during the testing of change delivery. Solution design ensures compliance with architectural and IT principles and standards of change delivery and, ultimately, the stability and availability of the production environment. The UI developer gathers data and information to support decision-making on UI design implementation, presenting the solution options to the Pod Lead, Product Owner and Stakeholder, who then decide whether to commence technical delivery. In Data Operations IT (DataOps IT), the Future State Architecture (aka FSA) programme is being executed. This requirement is for an FSA application within DataOps IT.
FSA is an IT re-platforming initiative. Under this programme, a proposed set of technology patterns will be implemented for multiple applications, which will also help reduce or eliminate dependencies on vendor platforms. The FSA team is working on multiple new and industry-pioneering initiatives, giving the available talent good exposure and various new learning opportunities, including Micro Frontend, React JS, Micro Services, Cloud, CI/CD etc. Be responsible for end-to-end Micro Frontend delivery for the master data management and workflow application User Interface. Create and execute functional/non-functional test decks to quality-check that the implementation adheres to the business-signed-off UI solution design. Be part of an Agile pod, providing production support for the application on a rota basis. Build and maintain positive and objective relationships with IT stakeholders, business operations teams, stakeholder programmes of change, MMO Change, Product Owners, pod members and the wider FMO IT community. Requirements To be successful in this role, you should meet the following requirements: Primary skills: Java, Spring Boot, React JS, CSS, JavaScript, Micro Front End, REST Services, JSON, User Interface performance tuning, AG Grid component. Familiar with: REST API, Cloud technologies (GCP, AWS, Azure), Application Security, Deployment CI/CD: Jenkins, Ansible, PostgreSQL. Experience of project delivery using Agile or Iterative methodologies and frameworks. DevOps. Good understanding of the Financial Services industry and building security technology. Good to have skills: Selenium and Serenity, Activiti (workflow engine), Splunk, Observability Tools (Grafana, Loki etc). You'll achieve more when you join HSBC.
www.hsbc.com/careers HSBC is committed to building a culture where all employees are valued, respected and opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working and opportunities to grow within an inclusive and diverse environment. Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website. Issued by - HSBC Software Development India

Posted 2 weeks ago

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies