
479 OpenTelemetry Jobs - Page 2

JobPe aggregates listings for easy access; applications are submitted directly on the original job portal.

5.0 years

0 Lacs

Delhi, India

Remote

Position Title: Infrastructure Solution Architect
Position Type: Regular - Full-Time
Position Location: New Delhi
Requisition ID: 32004

Job Purpose
As a Cloud Infrastructure Solution Architect, you'll drive the success of our IT Architecture program through your design expertise and consultative approach. You'll collaborate with stakeholders to understand their technical requirements, designing and documenting tailored solutions. Your blend of architecture and operations experience will enable you to accurately size work efforts and determine the necessary skills and resources for projects. Strong communication, time management, and process skills are essential for success in this role. You should have deep experience in defining infrastructure solutions: design, architecture and solution building blocks.

Role Overview
The cloud infrastructure architect role helps teams (such as product teams, platform teams and application teams) successfully adopt cloud infrastructure and platform services. It is heavily involved in design and implementation activities that result in new or improved cloud-related capabilities, and it brings skills and expertise to areas such as cloud technical architecture (for a workload's use of infrastructure as a service [IaaS] and platform as a service [PaaS] components); automating cloud management tasks, provisioning and configuration management; and other aspects involved in preparing and optimizing cloud solutions. Successful outcomes are likely to embrace infrastructure-as-code (IaC), DevOps and Agile ways of working and associated automation approaches, all underpinned by the cloud infrastructure architect's solid understanding of networking and security in the cloud. The nature of the work means that the cloud infrastructure architect will directly engage with customer teams, but will also work on cloud infrastructure platform capabilities that span multiple teams. The cloud infrastructure architect collaborates closely with other architects, product/platform teams, software developers, cloud engineers, site reliability engineers (SREs), security and network specialists, and other roles, particularly those in infrastructure and operations. Being an approachable team player is therefore crucial for success, and willingness to lead initiatives is important too. The cloud infrastructure architect also supports colleagues with complex (escalated) operational concerns in areas such as deployment activities, event management, incident and problem management, availability, capacity and service-level management, as well as service continuity. The cloud infrastructure architect is expected to demonstrate strong attention to detail and a customer-centric mindset. Inquisitiveness, determination, creativity, and communication and collaboration skills are important qualities too.

Key Responsibilities
Provide expert knowledge on cloud infrastructure and platform solutions architecture to ensure our organization achieves its goals for cloud adoption. This involves translating cloud strategy and architecture into efficient, resilient, and secure technical implementations.
Define cloud infrastructure landing zones, regional subscriptions and Availability Zones to ensure high availability, resiliency and reliability of infrastructure and applications.
Offer cloud-engineering thought leadership in defining specific cloud use cases, cloud service providers, and/or strategic tools and technologies.
Support cloud strategy by working on new cloud solutions, including analyzing requirements, supporting technical architecture activities, prototyping, design and development of infrastructure artifacts, testing, implementation, and preparation for ongoing support.
Work on cloud migration projects, including analyzing requirements and backlogs, identifying migration techniques, developing migration artifacts, executing processes, and ensuring preparations for ongoing support.
Design, build, deliver, maintain and improve infrastructure solutions. This includes automation strategies such as IaC, configuration-as-code, policy-as-code, release orchestration and continuous integration/continuous delivery (CI/CD) pipelines, and collaborative ways of working (e.g., DevOps).
Participate in change and release management processes, carrying out complex provisioning and configuration tasks manually where needed.
Research and prototype new tools and technologies to enhance cloud platform capabilities. Proactively identify innovative ways to reduce toil, and teach, coach or mentor others to improve cloud outcomes using automation.
Improve reliability, scalability and efficiency by working with product engineers and site reliability engineers to ensure well-architected and thoughtfully operationalized cloud infrastructures. This includes assisting with nonfunctional requirements such as data protection, high availability, disaster recovery, monitoring requirements and efficiency considerations in different environments.
Provide subject matter expertise for all approved IaaS and PaaS services, respond promptly to escalated incidents and requests, and build reusable artifacts ready for deployment to cloud environments.
Exert influence that lifts cloud engineering competency by participating in (and, where applicable, leading) organizational learning practices such as communities of practice, dojos, hackathons and centers of excellence (COEs). Actively participate in mentoring.
Practice continuous improvement and knowledge sharing (e.g., providing KB articles, training and white papers).
Participate in planning and optimization activities, including capacity, reliability, cost management and performance engineering. Establish FinOps practices: cloud cost management, scaling up/down, and environment creation/deletion based on consumption.
Work closely with security specialists to design, implement and test security controls, and ensure engineering activities align to security configuration guidance.
Establish logging, monitoring and observability solutions, including identification of requirements, design, implementation and operationalization.
Optimize infrastructure integration in all scenarios: single cloud, multicloud and hybrid.
Convey the pros and cons of cloud services and other cloud engineering topics to others at differing levels of cloud maturity and experience, and in different roles (e.g., developers and business technologists). Be forthcoming and open when the cloud is not the best solution.
Work closely with third-party suppliers, both as an individual contributor and as a project lead, when required. Engage with vendor technical support as the customer lead when appropriate.
Participate in or lead problem management activities, including post-mortem incident analysis, providing technical insight, documented findings, outcomes and recommendations as part of a root cause analysis.
Support resilience activities, e.g., disaster recovery (DR) testing, performance testing and tabletop planning exercises.
The role holder is also expected to:
Ensure that activities are tracked and auditable by leveraging service enablement systems, logging activity in the relevant systems of record, and following change and release processes.
Collaborate with peers from other teams, such as security, compliance, enterprise architecture, service governance, and IT finance, to implement technical controls to support governance as necessary.
Work in accordance with the organization's published standards and ensure that services are delivered in compliance with policy.
Promptly respond to requests for engineering assistance from technical customers as needed. Provide engineering support, present ideas and create best-practice guidance materials. Strive to meet service-level expectations.
Foster ongoing, closer and repeatable engagement with customers to achieve better, scalable outcomes.
Take ownership of personal development, working with line management to identify development opportunities.
Work with limited guidance, independently and/or as part of a team, on complex problems, potentially requiring close collaboration with remotely based employees and third-party providers.
Follow standard operating procedures, propose improvements and develop new standard operating procedures to further industrialize our approach. Advocate for simplification and workflow optimization, and follow documentation standards.

Skills and Experience
Skills and experience in the following activities/working styles are essential:
Collaboration with developers (and other roles, such as SREs and DevSecOps engineers) to plan, design, implement, operationalize and problem-solve workloads that leverage cloud infrastructure and platform services.
Working in an infrastructure or application support team.
Cloud migration project experience (data center to cloud IaaS, cloud native, hybrid cloud).
Securing cloud platforms and cloud workloads in collaboration with security teams.
Familiarity or experience with DevOps/DevSecOps.
Agile practices (such as Scrum/sprints, customer journey mapping, Kanban).
Proposing new standards, addressing peer feedback and advocating for improvement.
Understanding of software engineering principles (source control, versioning, code reviews, etc.).
Working in environments subject to compliance requirements, such as health and manufacturing.
Event-based architectures and associated infrastructure patterns.
Experience working with specific technical teams (R&D teams, data and analytics teams, etc.).
Experience where immutable infrastructure approaches have been used.
Implementing highly available systems using multi-AZ and multi-region approaches.

Skills and Experience in the Following Technology Areas
Experience with Azure, GCP, AWS and SAP cloud provider services (Azure and SAP preferred). Experience with these cloud provider services is preferred: Infra, Data, App, API and Integration Services.
DevOps tooling such as CI/CD (e.g., Jenkins, Jira, Confluence, Azure DevOps/ADO, TeamCity, GitHub, GitLab).
Infrastructure-as-code approaches, role-specific automation tools and associated programming languages (e.g., Ansible, ARM, Chef, CloudFormation, Pulumi, Puppet, Terraform, Salt, AWS CDK, Azure SDK).
Orchestration tools (e.g., Morpheus Data, env0, Cloudify, Pliant, Quali, RackN, vRA, Crossplane, Argo CD).
Knowledge of software development frameworks/languages (e.g., Spring, Java, Golang, PHP, Python).
Container management (e.g., Docker, Rancher, Kubernetes, AKS, EKS, GKE, RHOS, VMware Tanzu).
Virtualization platforms (e.g., VMware, Hyper-V).
Operating systems (e.g., Windows and Linux, including scripting experience).
Database technologies and caching (e.g., Postgres, MSSQL, NoSQL, Redis, CDN).
Identity and access management (e.g., Active Directory/Azure AD, Group Policy, SSO, cloud RBAC, hierarchy and federation).
Monitoring tools (e.g., AWS CloudWatch, Elastic Stack (Elasticsearch/Logstash/Kibana), Datadog, LogicMonitor, Splunk).
Cloud networking (e.g., subnetting, route tables, security groups, VPC, VPC peering, NACLs, VPN, transit gateways, optimizing for egress costs).
Cloud security (e.g., key management services, encryption, and other core security services/controls the organization uses).
Landing zone automation solutions (e.g., AWS Control Tower).
Policy guardrails (e.g., policy-as-code approaches, cloud provider native policy tools, HashiCorp Sentinel, Open Policy Agent).
Scalable architectures, including APIs, microservices and PaaS.
Analyzing cloud spending and optimizing resources (e.g., Apptio Cloudability, Flexera One, IBM Turbonomic, NetApp Spot, VMware CloudHealth).
Implementing resilience (e.g., multi-AZ, multi-region, backup and recovery tools).
Cloud provider frameworks (e.g., Well-Architected).
Working with architecture tools and associated artifacts.

General skills, behaviors, competencies and experience required include:
Strong communication skills (both written and verbal), including the ability to adapt style to a nontechnical audience.
Ability to stay calm and focused under pressure.
Collaborative working.
Proactive and detail-oriented, strong analytical skills, and the ability to leverage a data-driven approach.
Willing to share expertise and best practices, including mentoring and coaching others.
Continuous learning mindset, keen to learn and explore new areas, and not afraid of starting from a novice level.
Ability to present solutions, defend criticism of ideas, and provide constructive peer reviews.
Ability to build consensus, make decisions based on many variables and gain support for initiatives.
Business acumen, preferably industry and domain-specific knowledge relevant to the enterprise and its business units.
Deep understanding of current and emerging I&O and, in particular, cloud technologies and practices.
Achieve compliance requirements by applying technical capabilities, processes and procedures as required.

Job Requirements: Education and Qualifications
Essential: Bachelor's or master's degree in computer science, information systems, a related field, or equivalent work experience. Ten or more years of related experience in similar roles. Must have worked on implementing cloud at enterprise scale.
Desirable: Cloud provider/hyperscaler certifications preferred.

Must-Have Skills and Experience
Strong problem-solving and analytical skills.
Strong interpersonal and written and verbal communication skills.
Highly adaptable to changing circumstances.
Interest in continuously learning new skills and technologies.
Experience with programming and scripting languages (e.g., Java, C#, C++, Python, Bash, PowerShell).
Experience with incident and response management.
Experience with Agile and DevOps development methodologies.
Experience with container technologies and supporting tools (e.g., Docker Swarm, Podman, Kubernetes, Mesos).
Experience working in cloud ecosystems (Microsoft Azure, AWS, Google Cloud Platform).
Experience with monitoring and observability tools (e.g., Splunk, CloudWatch, AppDynamics, New Relic, ELK, Prometheus, OpenTelemetry).
Experience with configuration management systems (e.g., Puppet, Ansible, Chef, Salt, Terraform).
Experience working with continuous integration/continuous deployment tools (e.g., Git, TeamCity, Jenkins, Artifactory).
Experience in GitOps-based automation is a plus.

Qualifications
Bachelor's degree (or equivalent years of experience). 5+ years of relevant work experience. SRE experience preferred. Background in manufacturing or platform/tech companies is preferred. Must have public cloud provider certifications (Azure, GCP or AWS); a CNCF certification is a plus.

McCain Foods is an equal opportunity employer. We see value in ensuring we have a diverse, antiracist, inclusive, merit-based, and equitable workplace.
As a global family-owned company, we are proud to reflect the diverse communities around the world in which we live and work. We recognize that diversity drives our creativity, resilience, and success and makes our business stronger. McCain is an accessible employer. If you require an accommodation throughout the recruitment process (including alternate formats of materials or accessible meeting rooms), please let us know and we will work with you to meet your needs. Your privacy is important to us. By submitting personal data or information to us, you agree this will be handled in accordance with the Global Employee Privacy Policy.
Job Family: Information Technology
Division: Global Digital Technology
Department: Infrastructure Architecture
Location(s): IN - India : Haryana : Gurgaon
Company: McCain Foods (India) P Ltd
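The McCain posting above asks for policy guardrails and policy-as-code approaches (e.g., Open Policy Agent or HashiCorp Sentinel). Purely as an illustration of the concept, and not in OPA or Sentinel syntax, a minimal Python sketch of a tagging guardrail might look like the following; the resource structure and required tag names are invented for the example.

```python
# Minimal policy-as-code style guardrail sketch (illustrative only).
# The resource dictionaries and required tag names are hypothetical.

REQUIRED_TAGS = {"owner", "cost-center", "environment"}

def violations(resources):
    """Yield (resource_id, missing_tags) for resources missing mandatory tags."""
    for res in resources:
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            yield res["id"], sorted(missing)

if __name__ == "__main__":
    planned = [
        {"id": "vm-001", "tags": {"owner": "platform", "environment": "prod"}},
        {"id": "vm-002", "tags": {"owner": "app-team", "cost-center": "1234", "environment": "dev"}},
    ]
    for rid, missing in violations(planned):
        print(f"{rid}: missing tags {missing}")
```

In practice a check like this would run against a Terraform plan or a cloud inventory export and gate the deployment pipeline on violations.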

Posted 4 days ago

Apply

0 years

4 - 6 Lacs

Hyderābād

On-site

Job Summary
We are looking for a highly skilled and adaptable Site Reliability Engineer to become a key member of our Cloud Engineering team. In this crucial role, you will be instrumental in designing and refining our cloud infrastructure with a strong focus on reliability, security, and scalability. As an SRE, you'll apply software engineering principles to solve operational challenges, ensuring the overall operational resilience and continuous stability of our systems. This position requires a blend of managing live production environments and contributing to engineering efforts such as automation and system improvements.

Key Responsibilities:
Cloud Infrastructure Architecture and Management: Design, build, and maintain resilient cloud infrastructure solutions to support the development and deployment of scalable and reliable applications. This includes managing and optimizing cloud platforms for high availability, performance, and cost efficiency.
Enhancing Service Reliability: Lead reliability best practices by establishing and managing monitoring and alerting systems to proactively detect and respond to anomalies and performance issues. Utilize SLI, SLO, and SLA concepts to measure and improve reliability. Identify and resolve potential bottlenecks and areas for enhancement.
Driving Automation and Efficiency: Contribute to the automation, provisioning, and standardization of infrastructure resources and system configurations. Identify and implement automation for repetitive tasks to significantly reduce operational overhead. Develop Standard Operating Procedures (SOPs) and automate workflows using tools like Rundeck or Jenkins.
Incident Response and Resolution: Participate in and help resolve major incidents, conduct thorough root cause analyses, and implement permanent solutions. Effectively manage incidents within the production environment using a systematic problem-solving approach.
Collaboration and Innovation: Work closely with diverse stakeholders and cross-functional teams, including software engineers, to integrate cloud solutions, gather requirements, and execute Proof of Concepts (POCs). Foster strong collaboration and communication. Guide designs and processes with a focus on resilience and minimizing manual effort. Promote the adoption of common tooling and components, and implement software and tools to enhance resilience and automate operations. Be open to adopting new tools and approaches as needed.

Required Skills and Experience:
Cloud Platforms: Demonstrated expertise in at least one major cloud platform (AWS, Azure, or GCP).
Infrastructure Management: Proven proficiency in on-premises hosting and virtualization platforms (VMware, Hyper-V, or KVM). Solid understanding of storage internals (NAS, SAN, EFS, NFS) and protocols (FTP, SFTP, SMTP, NTP, DNS, DHCP). Experience with networking and firewall technologies. Strong hands-on experience with Linux internals and operating systems (RHEL, CentOS, Rocky Linux). Experience with Windows operating systems to support varied environments. Extensive experience with containerization (Docker) and orchestration (Kubernetes) technologies.
Automation & IaC: Proficiency in scripting languages (shell and Python). Experience with configuration management tools (Ansible or Puppet). Must have exposure to Infrastructure as Code (IaC) tools (Terraform or CloudFormation).
Monitoring & Observability: Experience setting up and configuring monitoring tools (Prometheus, Grafana, or the ELK stack).
Hands-on experience implementing OpenTelemetry for observability. Familiarity with monitoring and logging tools for cloud-based applications. Service Reliability Concepts: A strong understanding of SLI, SLO, SLA, and error budgeting. Soft Skills & Mindset: Excellent communication and interpersonal skills for effective teamwork. We value proactive individuals who are eager to learn and adapt in a dynamic environment. Must possess a pragmatic and adaptable mindset, with a willingness to step outside comfort zones and acquire new skills. Ability to consider the broader system impact of your work. Must be a change advocate for reliability initiatives. Desired/Bonus Skills: Experience with DevOps toolchain elements like Git, Jenkins, Rundeck, ArgoCD, or Crossplane. Experience with database management, particularly MySQL and Hadoop. Knowledge of cloud cost management and optimization strategies. Understanding of cloud security best practices, including data encryption, access controls, and identity management. Experience implementing disaster recovery and business continuity plans. Familiarity with ITIL (Information Technology Infrastructure Library) processes
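Since the posting above leans heavily on SLI/SLO and error-budget concepts, here is a minimal sketch of the underlying arithmetic; the event counts and the 99.9% target are made-up examples.

```python
# Error-budget arithmetic sketch; all figures are illustrative.

def availability_sli(good_events: int, total_events: int) -> float:
    return good_events / total_events if total_events else 1.0

def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative when the SLO is breached)."""
    allowed_failure = 1.0 - slo          # e.g. 0.001 for a 99.9% SLO
    actual_failure = 1.0 - sli
    return 1.0 - (actual_failure / allowed_failure) if allowed_failure else 0.0

if __name__ == "__main__":
    sli = availability_sli(good_events=999_640, total_events=1_000_000)   # 99.964%
    print(f"SLI: {sli:.4%}")
    print(f"Error budget remaining vs a 99.9% SLO: {error_budget_remaining(sli, 0.999):.1%}")
```

With these numbers the service has consumed roughly a third of its monthly error budget, which is the kind of signal that drives the alerting and release decisions described in the posting.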

Posted 5 days ago

Apply

2.0 years

4 - 8 Lacs

Bengaluru

On-site

Company Description
Visa is a world leader in payments and technology, with over 259 billion payment transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose: to uplift everyone, everywhere by being the best way to pay and be paid. Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

Job Description
We are seeking a motivated Site Reliability Engineer (SRE) to join our Observability team. In this role, you will support the team in maintaining and improving the reliability, security, and performance of our systems. You will learn from experienced engineers while gaining hands-on experience with modern monitoring, logging, and automation tools. As an SRE I, you will assist in day-to-day operational tasks, help monitor system health, and participate in basic troubleshooting. You will also contribute to the maintenance of documentation and develop your technical skills through training and on-the-job experience. This is a hybrid position, requiring 2–3 days per week in the office, as determined by leadership.

Responsibilities
Assist in maintaining system security by applying hotfixes and operating system patches under guidance to protect against cybersecurity threats. Support the deployment and configuration of monitoring and logging tools. Help automate routine operational tasks to improve efficiency and support system integration. Assist with the maintenance and basic management of observability tools such as Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, and CloudWatch. Work with team members to help implement and maintain monitoring solutions in development, staging, and production environments. Learn and apply DevOps and SRE best practices as directed by senior engineers. Contribute to the setup and maintenance of CI/CD pipelines to support automated build, test, and deployment processes. Provide support in managing cloud infrastructure (AWS, GCP) to help ensure availability and security. Learn to use infrastructure-as-code tools such as Terraform, Ansible, or CloudFormation to support environment configuration. Monitor system performance and assist in identifying and escalating issues for resolution. Support the implementation and management of containerization technologies like Docker and Kubernetes. Participate in basic troubleshooting and assist with root cause analysis for production incidents. Help create and update documentation for infrastructure, processes, and operational procedures. Provide first-level support for routine infrastructure and deployment issues, escalating complex problems as needed. Look for opportunities to automate repetitive tasks and suggest improvements to workflows.

Justification
Visa's Observability ecosystem includes over 2,000 platform nodes, utilizing approximately 15 different tools for logging, monitoring, and tracing, alongside 80,000 client agents. The system handles daily log ingestion exceeding 100TB and oversees hundreds of critical applications, supporting vital alerts, dashboards, and reports. To maintain this high level of performance and reliability, we need a Site Reliability Engineer (SRE) with comprehensive knowledge and practical experience.
This position requires an I4-level engineer who can operate independently with minimal supervision.

About Visa's PRE Observability Team
Visa's Product Reliability Engineering (PRE) Observability team partners with Product Development as well as Operations & Infrastructure teams to build and manage innovative, reliable, scalable, secure, and cost-effective observability platform solutions. We are looking for talented Senior Site Reliability Engineers to join our driven team, with a focus on maximizing system availability, performance, security, and reliability. This dynamic role requires technical leadership, strong problem-solving skills, and expertise in coding, testing, and debugging. This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications
Basic Qualifications: Bachelor's degree with at least 2 years of relevant work experience, OR an advanced degree (e.g., Master's, MBA, JD, MD) with no required work experience, OR 5+ years of relevant professional experience.
Preferred Qualifications: Academic, internship, or hands-on experience with at least one observability tool (e.g., Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, or CloudWatch). Familiarity with setting up or configuring exporters (such as Node Exporter or Cert Exporter) for collecting metrics. Exposure to containerization technologies such as Docker or Kubernetes, either through coursework, projects, or internships. Basic understanding or experience with CI/CD tools and pipelines (e.g., GitHub Actions, Jenkins, or Ansible). Introductory knowledge of Infrastructure as Code concepts and tools like Terraform or Ansible. Awareness of query languages such as PromQL, SQL, or Splunk SPL. Experience using Linux or Unix environments and basic scripting skills in Python and/or Shell. Interest in cloud platforms such as AWS or GCP; cloud certifications are a plus. Strong problem-solving and analytical skills, with a willingness to learn and grow in a collaborative environment. Effective verbal and written communication skills. Ability to work well in a team and take initiative in learning new technologies and practices.

Additional Information
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
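The Visa role centers on observability tooling, including OpenTelemetry. As a small, hedged example of what hands-on OpenTelemetry instrumentation can look like in Python (assuming the opentelemetry-sdk package is installed; the service and span names are invented, and a console exporter stands in for the OTLP backend a real deployment would use):

```python
# Minimal OpenTelemetry tracing setup sketch.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Name the service and wire spans to an exporter (console here, OTLP in production).
provider = TracerProvider(resource=Resource.create({"service.name": "payments-demo"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("payments-demo.instrumentation")

with tracer.start_as_current_span("charge-card") as span:
    span.set_attribute("payment.amount_usd", 42.50)   # attribute name is illustrative
    span.add_event("card charged")                    # span prints when the processor flushes
```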

Posted 5 days ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Job Summary
We are looking for a highly skilled and adaptable Site Reliability Engineer to become a key member of our Cloud Engineering team. In this crucial role, you will be instrumental in designing and refining our cloud infrastructure with a strong focus on reliability, security, and scalability. As an SRE, you'll apply software engineering principles to solve operational challenges, ensuring the overall operational resilience and continuous stability of our systems. This position requires a blend of managing live production environments and contributing to engineering efforts such as automation and system improvements.

Key Responsibilities:
Cloud Infrastructure Architecture and Management: Design, build, and maintain resilient cloud infrastructure solutions to support the development and deployment of scalable and reliable applications. This includes managing and optimizing cloud platforms for high availability, performance, and cost efficiency.
Enhancing Service Reliability: Lead reliability best practices by establishing and managing monitoring and alerting systems to proactively detect and respond to anomalies and performance issues. Utilize SLI, SLO, and SLA concepts to measure and improve reliability. Identify and resolve potential bottlenecks and areas for enhancement.
Driving Automation and Efficiency: Contribute to the automation, provisioning, and standardization of infrastructure resources and system configurations. Identify and implement automation for repetitive tasks to significantly reduce operational overhead. Develop Standard Operating Procedures (SOPs) and automate workflows using tools like Rundeck or Jenkins.
Incident Response and Resolution: Participate in and help resolve major incidents, conduct thorough root cause analyses, and implement permanent solutions. Effectively manage incidents within the production environment using a systematic problem-solving approach.
Collaboration and Innovation: Work closely with diverse stakeholders and cross-functional teams, including software engineers, to integrate cloud solutions, gather requirements, and execute Proof of Concepts (POCs). Foster strong collaboration and communication. Guide designs and processes with a focus on resilience and minimizing manual effort. Promote the adoption of common tooling and components, and implement software and tools to enhance resilience and automate operations. Be open to adopting new tools and approaches as needed.

Required Skills and Experience:
Cloud Platforms: Demonstrated expertise in at least one major cloud platform (AWS, Azure, or GCP).
Infrastructure Management: Proven proficiency in on-premises hosting and virtualization platforms (VMware, Hyper-V, or KVM). Solid understanding of storage internals (NAS, SAN, EFS, NFS) and protocols (FTP, SFTP, SMTP, NTP, DNS, DHCP). Experience with networking and firewall technologies. Strong hands-on experience with Linux internals and operating systems (RHEL, CentOS, Rocky Linux). Experience with Windows operating systems to support varied environments. Extensive experience with containerization (Docker) and orchestration (Kubernetes) technologies.
Automation & IaC: Proficiency in scripting languages (shell and Python). Experience with configuration management tools (Ansible or Puppet). Must have exposure to Infrastructure as Code (IaC) tools (Terraform or CloudFormation).
Monitoring & Observability: Experience setting up and configuring monitoring tools (Prometheus, Grafana, or the ELK stack).
Hands-on experience implementing OpenTelemetry for observability. Familiarity with monitoring and logging tools for cloud-based applications. Service Reliability Concepts: A strong understanding of SLI, SLO, SLA, and error budgeting. Soft Skills & Mindset: Excellent communication and interpersonal skills for effective teamwork. We value proactive individuals who are eager to learn and adapt in a dynamic environment. Must possess a pragmatic and adaptable mindset, with a willingness to step outside comfort zones and acquire new skills. Ability to consider the broader system impact of your work. Must be a change advocate for reliability initiatives. Desired/Bonus Skills: Experience with DevOps toolchain elements like Git, Jenkins, Rundeck, ArgoCD, or Crossplane. Experience with database management, particularly MySQL and Hadoop. Knowledge of cloud cost management and optimization strategies. Understanding of cloud security best practices, including data encryption, access controls, and identity management. Experience implementing disaster recovery and business continuity plans. Familiarity with ITIL (Information Technology Infrastructure Library) processes
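Several of the requirements above mention Prometheus and Grafana. As a rough sketch only, the following shows how a Python service might expose request metrics for Prometheus to scrape using the prometheus_client library; the metric names and port are arbitrary choices for the example.

```python
# Minimal Prometheus exposition sketch (assumes `pip install prometheus-client`).
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
LATENCY = Histogram("app_request_duration_seconds", "Request latency in seconds")

@LATENCY.time()
def handle_request() -> None:
    time.sleep(random.uniform(0.01, 0.1))   # stand-in for real work
    REQUESTS.labels(status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)                 # metrics served on :8000/metrics
    while True:
        handle_request()
```

A Grafana dashboard would then query the scraped series (for example, request rate and latency quantiles) rather than the application directly.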

Posted 5 days ago

Apply

5.0 years

0 Lacs

India

Remote

Job Title: Business Intelligence
Location: Remote
Note: Candidates should be comfortable working US/night shifts.
Interview Mode: Virtual (two rounds: a 60-minute technical interview and a 30-minute technical and cultural discussion)
Client: Turing
Experience: 5+ years
Job Type: Contract to hire
Notice Period: Immediate joiners

Roles and Responsibilities:
We are seeking a highly skilled Business Intelligence (BI) Analyst to join our team. This individual will play a critical role in supporting our Network Operations Center (NOC) by identifying and analyzing use cases, monitoring dashboards, and closing gaps in efficiency and usage. The ideal candidate will bring a strong analytical mindset, proficiency in SQL, and a background in Management Information Systems (MIS) or a related field.

Key Responsibilities:
Design, build, and optimize intuitive and actionable dashboards using New Relic One (NRQL, NerdGraph, custom visualizations). Integrate New Relic with microservices, APIs, databases, message queues, and infrastructure components across cloud environments (AWS, Azure, or GCP). Create robust and scalable monitoring solutions by defining service-level indicators (SLIs), service-level objectives (SLOs), and setting up intelligent alerting policies. Collaborate with SREs, DevOps, and application teams to identify telemetry gaps and ensure comprehensive observability coverage (APM, Infra, Logs, Browser, Synthetics). Develop custom NRQL queries and leverage New Relic's programmable platform for tailored observability use cases. Create documentation and knowledge articles for dashboards, alerts, and instrumentation procedures.

Required Skills & Experience:
4–5 years of experience in monitoring and observability using New Relic. Proficient with New Relic Query Language (NRQL), dashboard widgets, custom events, and metric visualizations. Experience integrating New Relic with Java/.NET/Python/Node.js based applications. Familiarity with cloud platforms (AWS, Azure, or GCP) and related telemetry ingestion setups. Strong understanding of telemetry concepts (logs, metrics, traces) and observability best practices. Hands-on experience with alerting strategies (incident routing, thresholds, dynamic baselines). Basic scripting or automation skills (e.g., Python, Shell, or Terraform for observability-as-code).

Nice to Have:
Experience with OpenTelemetry or custom instrumentation libraries. Exposure to other observability platforms (e.g., Datadog, Grafana, Prometheus). CI/CD pipeline integration for observability tooling. Performance benchmarking and capacity planning exposure.
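The New Relic role above calls for alerting strategies with thresholds and dynamic baselines. Independent of any particular vendor, a minimal Python sketch of a dynamic-baseline check (rolling mean plus k standard deviations) could look like this; the window size, multiplier, and sample data are arbitrary.

```python
# Dynamic-baseline alerting sketch: flag points that drift beyond k standard
# deviations of a rolling window. Pure illustration; thresholds are arbitrary.
from statistics import mean, stdev

def baseline_alerts(series, window=12, k=3.0):
    alerts = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if sigma and abs(series[i] - mu) > k * sigma:
            alerts.append((i, series[i]))
    return alerts

if __name__ == "__main__":
    error_rate = [0.4, 0.5, 0.45, 0.5, 0.48, 0.52, 0.47, 0.5, 0.49, 0.51, 0.5, 0.46, 2.8]
    print(baseline_alerts(error_rate))   # the final spike should be flagged
```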

Posted 5 days ago

Apply

0 years

1 - 5 Lacs

Hyderābād

On-site

Job description
Some careers shine brighter than others. If you're looking for a career that will help you stand out, join HSBC and fulfil your potential. Whether you want a career that could take you to the top, or simply take you in an exciting new direction, HSBC offers opportunities, support and rewards that will take you further. HSBC is one of the largest banking and financial services organisations in the world, with operations in 64 countries and territories. We aim to be where the growth is, enabling businesses to thrive and economies to prosper, and, ultimately, helping people to fulfil their hopes and realise their ambitions. We are currently seeking an experienced professional to join our team in the role of Storage & Data Protection Services – Architect.

Key Responsibilities:
You will be a technical lead aligned to our Storage & Data Protection Services infrastructure team. You will be accountable for working with the portfolio and platform architects, technical leads and engineers to define detailed solution designs (incl. story decomposition) for stories in the backlog. You will be part of a highly skilled, self-organising team, building forward-thinking solutions and creating new capabilities to support multiple, cross-functional teams. We are continuously looking to further improve our technology stack, data quality and reliability, and your vision and ambition will contribute to shaping our solutions toward data-driven decisions across the business. The ideal candidate is self-directed, comfortable with challenging and leading on best practice, and able to adapt to regularly shifting business requirements and occasional ambiguity. This is a fast-paced, hands-on role and would be well-suited to someone who loves clean design, clean architecture and using the latest tools and technology to tackle constantly evolving business and tech challenges.

This role will carry out some or all of the following responsibilities:
Accountable for ensuring the products and services are supported by the right architectures and that solutions meet customer needs.
Accountable for ensuring the design of product solutions is cost effective and maintained through the agile development lifecycle, managing the flow of the backlog of design activities.
Working with ops engineers to ensure operational issues (performance, operator intervention, alerting, design-defect related issues, etc.) are resolved and that any design-related issues are addressed in a timely manner.
Convert requirements into actionable product/service requirements that feed technology solutions development and influence service direction.
Responsible for ensuring solutions are aligned with the platform architecture and roadmap, group standards and policies, and the overall enterprise architecture for their function.
We are looking for a candidate with experience in leading and developing solution designs and architectural blueprints for Storage & Data Protection infrastructure.

Requirements
To be successful in this role, you should meet the following requirements:
Proven track record within the disciplines of solutions design and architecture.
A deep technical understanding of Block, File, Object and Data Protection technologies.
Designs and implements File (NAS) storage solutions for customers, focusing on performance, scalability, and cost-effectiveness.
Develops and maintains reference architectures and best practices.
Leads technical workshops and proof-of-concepts.
Prioritize customer needs and deliver solutions that meet their requirements.
Understanding of the following Block technologies is desirable: Dell, Pure, HPE.
Understanding of the following File technologies is desirable: NetApp FAS/AFF, VAST.
Understanding of the following Object technologies is desirable: Dell, NetApp, VAST.
Understanding of the following Data Protection technologies is desirable: Veritas, Commvault.
Knowledge and/or experience developing observability solutions.
Knowledge and/or experience with OpenTelemetry, Kafka and/or other streaming technologies.
Knowledge and/or experience with observability tooling (e.g., Splunk and Grafana) is beneficial.
Knowledge of Scrum, Kanban or other agile frameworks. Work with Agile methodology, representing the pod and area lead in standups and problem-solving meetings.
Experience working in a relevant market/context, i.e. IT in finance, is desirable.
Able to collaborate and effectively pair with other engineers/architects.
Strong analytical and problem-solving skills.
Self-awareness, with the confidence to work independently and take responsibility for own development.
Excellent written and spoken communication skills; an ability to communicate with impact, ensuring complex information is articulated in a meaningful way to wide and varied audiences.
Willingness to undertake the training/study required in this role for new products and services.

You'll achieve more when you join HSBC. www.hsbc.com/careers
HSBC is committed to building a culture where all employees are valued, respected and opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working and opportunities to grow within an inclusive and diverse environment. Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website.
Issued by: HSBC Software Development India
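The HSBC posting mentions OpenTelemetry, Kafka and other streaming technologies for observability. As an illustrative sketch only (assuming the kafka-python package; the broker address, topic name and payload fields are placeholders), streaming storage telemetry into Kafka from Python might look like this:

```python
# Sketch of streaming storage telemetry into Kafka with kafka-python.
# Broker, topic, and payload fields are hypothetical.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_capacity_sample(array_id: str, used_pct: float) -> None:
    event = {"array_id": array_id, "used_pct": used_pct, "ts": time.time()}
    producer.send("storage.capacity", value=event)   # hypothetical topic name

if __name__ == "__main__":
    publish_capacity_sample("nas-01", 81.4)
    producer.flush()   # make sure the sample is delivered before exit
```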

Posted 1 week ago

Apply

8.0 years

0 Lacs

Pune, Maharashtra, India

Remote

Senior Software Engineer – Backend and Inferencing – Technology (Maersk)
This position will be based in India – Bangalore/Pune.

A.P. Moller – Maersk
A.P. Moller – Maersk is the global leader in container shipping services. The business operates in 130 countries and employs 80,000 staff. An integrated container logistics company, Maersk aims to connect and simplify its customers' supply chains. Today, we have more than 180 nationalities represented in our workforce across 131 countries, which means we have an elevated level of responsibility to continue to build an inclusive workforce that is truly representative of our customers, their customers, and our vendor partners too.

The Brief
We are seeking a Senior Software Engineer with deep backend expertise to lead the development of scalable infrastructure for LLM inferencing, Model Context Protocol (MCP) integration, Agent-to-Agent (A2A) communication, prompt engineering, and robust API platforms. This role sits at the core of our AI systems stack, enabling structured, contextual, and intelligent communication between models, agents, and services. You'll design modular backend services that interface seamlessly with inferencing engines, orchestrate model contexts, and expose capabilities via APIs for downstream products and agents.

What I'll be doing – your accountabilities
Architect and implement backend services that support dynamic model context management via MCP for LLM-based systems. Build scalable and token-efficient inference pipelines with support for streaming, context merging, memory, and retrieval. Enable Agent-to-Agent (A2A) messaging and task coordination through contextual protocols, message contracts, and execution chains. Design and maintain developer-friendly, secure, and versioned APIs for agents, tools, memory, context providers, and prompt libraries. Lead efforts in prompt engineering workflows including templating, contextual overrides, and programmatic prompt generation. Collaborate across engineering, ML, and product teams to define and implement context-aware agent systems and inter-agent communication standards to enable closed-loop enterprise AI services ready for consumption by the enterprise. Own end-to-end delivery of infrastructure, inferencing, backend, API and communication management in a multi-agent system. Ensure models are modular, extensible, and easily integrated with external services/platforms (e.g., dashboards, analytics, AI agents).

Foundational / Must-Have Skills
Bachelor's, Master's or PhD in Computer Science, Engineering, or a related technical field. 8+ years of experience in backend systems design and development, ideally in AI/ML or data infrastructure domains. Strong proficiency in Python (FastAPI preferred); additional experience with Node.js, Go, or Rust is a plus. Experience with LLM inferencing pipelines, context windowing, and chaining prompts with memory/state persistence. Familiarity with or active experience implementing Model Context Protocol (MCP) or similar abstraction layers for context-driven model orchestration. Strong understanding of REST/GraphQL API design, OAuth2/JWT-based auth, and event-driven backend architectures. Practical knowledge of Redis, PostgreSQL, and one or more vector databases (e.g., Weaviate, Qdrant). Comfortable working with containerized applications, CI/CD pipelines, and cloud-native deployments (AWS/GCP/Azure).

Preferred To Have
Experience building or contributing to agent frameworks (e.g., LangGraph, CrewAI, AutoGen, Agno, etc.).
Background in multi-agent systems, dialogue orchestration, or synthetic workflows. Familiarity with OpenAI, Anthropic, HuggingFace, or open-weight model APIs and tool-calling protocols. Strong grasp of software security, observability (OpenTelemetry, Prometheus), and system performance optimization. Experience designing abstraction layers for LLM orchestration across different provider APIs (OpenAI, Claude, local inference). What You Can Expect Opportunity to lead backend architecture for cutting-edge, LLM-native systems. High-impact role in shaping the future of context-aware AI agent communication. Autonomy to drive backend standards, protocols, and platform capabilities across the org. Collaborative, remote-friendly culture with deep technical peers. As a performance-oriented company, we strive to always recruit the best person for the job – regardless of gender, age, nationality, sexual orientation or religious beliefs. We are proud of our diversity and see it as a genuine source of strength for building high-performing teams. Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements. We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or to perform a job, please contact us by emailing accommodationrequests@maersk.com.
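The Maersk role describes context merging and inference APIs built with FastAPI. The following is a rough, hypothetical sketch of that shape, not the MCP specification itself; the request fields, merge rule and run_model stub are invented for illustration.

```python
# Rough sketch of a context-aware inference endpoint in FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    session_context: dict = {}
    tool_context: dict = {}

def run_model(prompt: str, context: dict) -> str:
    # Stand-in for a call to an LLM inference backend.
    return f"echo: {prompt} (context keys: {sorted(context)})"

@app.post("/v1/infer")
def infer(req: InferenceRequest) -> dict:
    merged = {**req.session_context, **req.tool_context}   # later sources win
    return {"completion": run_model(req.prompt, merged)}
```

Run with an ASGI server (e.g., uvicorn) and POST a JSON body with prompt and context fields; a production service would add auth, streaming, and token budgeting on top of this skeleton.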

Posted 1 week ago

Apply

5.0 years

0 Lacs

Ahmedabad, Gujarat, India

Remote

Employment Type: Full-Time
Location: Remote
Experience Required: 5+ Years

About Techiebutler
Techiebutler partners with startup founders and CTOs to deliver high-quality products quickly. We're a focused team dedicated to execution, innovation, and solving real-world challenges with minimal bureaucracy.

Role Overview
We're seeking a Senior Golang Backend Engineer to lead the design and development of scalable, high-performance backend systems. You'll play a pivotal role in shaping our solutions and tech stack, driving technical excellence, and mentoring the team to deliver robust solutions.

Key Responsibilities
Design and develop scalable, high-performance backend services using Go. Optimize systems for reliability, efficiency, and maintainability. Establish technical standards for development and testing. Mentor team members and conduct code reviews to enhance code quality. Monitor and troubleshoot systems using tools like Datadog and Prometheus. Collaborate with cross-functional teams on API design, integration, and architecture.

What We're Looking For
Experience: 5+ years in backend development, with 3+ years in Go. Cloud & Serverless: Proficient in AWS (Lambda, DynamoDB, SQS). Containerization: Hands-on experience with Docker and Kubernetes. Microservices: Expertise in designing and maintaining microservices and distributed systems. Concurrency: Strong understanding of concurrent programming and performance optimization. Domain-Driven Design: Practical experience applying DDD principles. Testing: Proficient in automated testing, TDD, and BDD. CI/CD & DevOps: Familiarity with GitLab CI, GitHub Actions, or Jenkins. Observability: Experience with the ELK Stack, OpenTelemetry, or similar tools. Collaboration: Excellent communication and teamwork skills in Agile/Scrum environments.

Why Join Us?
Work with cutting-edge technologies to shape our platform's future. Thrive in a collaborative, inclusive environment that values innovation. Competitive salary and career growth opportunities. Contribute to impactful projects in a fast-paced tech company.

Apply Now
If you're passionate about building scalable systems and solving complex challenges, join our high-performing team! Apply today to be part of Techiebutler's journey.

Posted 1 week ago

Apply

5.0 years

0 Lacs

Sahibzada Ajit Singh Nagar, Punjab, India

On-site

Everything we do is for our customers! Featured on Deloitte's Technology Fast 500 list and G2's leaderboard, Maropost offers a unified commerce experience that our customers need, transforming ecommerce, retail, marketing automation, merchandising, helpdesk and AI operations with one platform designed to scale for fast-growing businesses. With a relentless focus on our customers' success, we are motivated by customer obsession, extreme urgency, excellence and resourcefulness to power 5,000+ global brands while we head to 100,000+. Driven by the same customer-centric mentality, we empower businesses to achieve their goals and grow alongside us. If you're a driver and not a passenger, and are ready to make a significant impact and be part of our transformative journey, Maropost is the place for you.

The Opportunity
Thrive on change and grow beyond limits! We are looking for a bold thinker who sees a chance to learn and define what's possible with every challenge! Ready to make an impact? Welcome to Maropost, where you can turn ideas into action!

What You'll Be Responsible For
Building and managing the REST API stack for Maropost web apps, preferably using the ASP.NET framework. Must have a good debugging skillset. Good knowledge of the .NET, .NET Standard and .NET Core stacks. Must have experience in at least one production-level application using the .NET stack. Good understanding of JavaScript internals. Must have experience in at least one production-level application using any JavaScript UI framework. Must have good SQL experience using SQL Server. Must have effective communication skills. Drive innovation within the engineering team, identifying opportunities to improve processes, tools, and technologies. Evaluate and improve the tools and frameworks used in software development. Review the architecture and code written by other developers.

What You'll Bring To Maropost
B.E/B.Tech. 5+ years of hands-on experience building enterprise-grade applications. Enthusiasm to learn building and managing API endpoints for multimodal clients. Enthusiasm to learn and contribute to a challenging and fun-filled startup. A knack for problem-solving and following efficient coding practices. Extraordinarily strong interpersonal communication and collaboration skills. Hands-on experience with tech stacks: .NET / .NET Core / VB.NET. Hands-on experience with any JavaScript UI framework (preferably Angular). Hands-on experience with any database (preferably MS SQL Server). Hands-on experience with a code versioning platform like GitHub/Bitbucket and a CI/CD platform like Jenkins/Azure DevOps. Frontend: HTML, CSS, JavaScript.
Familiarity with any of the following will be an added advantage:
Databases and caching: Redis, Cosmos DB, DynamoDB. Cloud services: managing infrastructure with basic services from GCP/AWS/Azure, such as VMs, API Gateway, and load balancers. Monitoring and observability tools: Prometheus, Grafana, Loki, OpenTelemetry. Network protocols and libraries: HTTP, WebSocket, Socket.IO. Version control and CI/CD: Jenkins, Argo CD, Spinnaker, Terraform.
You exemplify Maropost's values: Customer Obsessed, Extreme Urgency, Excellence, Resourceful.

Message from the Founders:
Maropost is looking for builders - people who want to drive our business forward at all costs in order to achieve the goals we have, both short and long term, for the results and outcomes that will bring to us all. If that isn't for you, that's ok; for those of you it is, please get in touch with us!

Posted 1 week ago

Apply

10.0 years

3 - 10 Lacs

Gurgaon

On-site

You Lead the Way. We've Got Your Back.
With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other. Here, you'll learn and grow as we help you create a career journey that's unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally. At American Express, you'll be recognized for your contributions, leadership, and impact; every colleague has the opportunity to share in the company's success. Together, we'll win as a team, striving to uphold our company values and powerful backing promise to provide the world's best customer experience every day. And we'll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong. Join Team Amex and let's lead the way together.

About Enterprise Architecture:
Enterprise Architecture is an organization within the Chief Technology Office at American Express and is a key enabler of the company's technology strategy. The four pillars of Enterprise Architecture are:
1. Architecture as Code: this pillar owns and operates foundational technologies that are leveraged by engineering teams across the enterprise.
2. Architecture as Design: this pillar includes the solution and technical design for transformation programs and business-critical projects which need architectural guidance and support.
3. Governance: this pillar is responsible for defining technical standards and developing innovative tools that automate controls to ensure compliance.
4. Colleague Enablement: this pillar is focused on colleague development, recognition, training, and enterprise outreach.

What you will be working on:
We are looking for a Senior Engineer to join our Enterprise Architecture team. In this role you will be designing and implementing highly scalable real-time systems following best practices and using cutting-edge technology. This role is best suited for experienced engineers with a broad skillset who are open, curious and willing to learn.

Qualifications: What you will bring:
Bachelor's degree in computer science, computer engineering or a related field, or equivalent experience. 10+ years of progressive experience demonstrating strong architecture, programming and engineering skills. Firm grasp of data structures and algorithms, with fluency in programming languages like Java, Kotlin and Go. Demonstrated ability to lead, partner, and collaborate cross-functionally across many engineering organizations. Experience building real-time, large-scale, high-volume, distributed data pipelines on top of data buses (Kafka). Hands-on experience with large-scale distributed NoSQL databases like Elasticsearch. Knowledge and/or experience with containerized environments, Kubernetes and Docker. Knowledge and/or experience with any of the public cloud platforms like AWS and GCP. Experience implementing and maintaining highly scalable microservices with REST and gRPC. Experience working with infrastructure layers such as service meshes (Istio, Envoy). Appetite for trying new things and building rapid POCs.

Preferred Qualifications:
Knowledge of observability concepts like tracing, metrics, monitoring and logging. Knowledge of Prometheus. Knowledge of OpenTelemetry / OpenTracing. Knowledge of observability tools like Jaeger, Kibana, Grafana etc.
Open-source community involvement We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally: Competitive base salaries Bonus incentives Support for financial-well-being and retirement Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location) Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need Generous paid parental leave policies (depending on your location) Free access to global on-site wellness centers staffed with nurses and doctors (depending on location) Free and confidential counseling support through our Healthy Minds program Career development and training opportunities American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.
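The American Express role combines Kafka-based pipelines with Elasticsearch. Although the posting's stack is Java/Kotlin/Go, the pipeline shape can be sketched briefly in Python, assuming the kafka-python and elasticsearch client packages; the topic, index and addresses are placeholders, and the document= keyword assumes the 8.x elasticsearch-py client.

```python
# Shape of a Kafka -> Elasticsearch indexing loop, sketched for illustration only.
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    "payment.events",                                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:                               # blocks, consuming indefinitely
    es.index(index="payment-events", document=message.value)
```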

Posted 1 week ago

Apply

4.0 years

0 Lacs

Vellore, Tamil Nadu, India

On-site

Wissen Technology is Hiring for Python Automation Engineer About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset—ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations—we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact—the first time, every time. Job Summary: We are seeking highly motivated Python Automation Engineers with experience in Telecom/Networking domains to develop and maintain automation solutions for legacy systems, network services, and cloud-based deployments. The roles involve building reusable Python modules, integrating with APIs and tools, and designing end-to-end automation workflows to enhance network operations and observability. Experience: 4-9 Years Location: Chennai Mode of Work: Full Time Key Responsibilities: Develop automation for legacy Telecom systems (e.g., NMS/EMS). Create robust, multi-threaded Python applications. Build reusable Python libraries for automation and RPA use cases. Design self-healing and alerting mechanisms using Python. Implement AI/ML-based enhancements for predictive fault detection or remediation. Develop event-driven automation based on real-time network telemetry. Build observability frameworks using Prometheus, Grafana, ELK, OpenTelemetry. Integrate with Telecom APIs across BSS, OSS, NMS, using protocols like REST, SNMP, Netconf. Design and deploy CNFs/VNFs using Docker and Kubernetes. Build automation solutions for cloud-based networking (AWS, Azure, GCP). Requirements: Strong Python programming skills. Experience with automation tools and frameworks (e.g., Selenium, PyAutoGUI). Familiarity with networking concepts and telecom operations. Hands-on with tools like Prometheus, ELK, Grafana, Jenkins, Git, Docker. Knowledge of protocols: REST, SNMP, Netconf. Experience with cloud platforms and Kubernetes (for senior role) Good To Have Skills: Bachelor’s/Master’s degree in Computer Science, Electronics, Telecom, or related field. AI/ML exposure for network automation use cases (optional but desirable). Telecom certifications or exposure to BSS/OSS systems. 
Wissen Sites: Website: www.wissen.com LinkedIn: https://www.linkedin.com/company/wissen-technology Wissen Leadership: https://www.wissen.com/company/leadership-team/ Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All Wissen Thought Leadership: https://www.wissen.com/articles/
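For context on the "self-healing and alerting mechanisms using Python" responsibility in the listing above, here is a minimal, hedged Python sketch that polls the Prometheus HTTP API and triggers a remediation step when availability drops. The Prometheus URL, metric/job names, and the restart command are hypothetical placeholders, not part of the posting.

```python
import subprocess

import requests  # assumes the 'requests' package is installed

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # hypothetical endpoint
QUERY = 'avg_over_time(up{job="nms-poller"}[5m])'           # hypothetical metric/job


def check_and_heal() -> None:
    """Query Prometheus and restart a collector service if availability drops."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10
    )
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    availability = float(results[0]["value"][1]) if results else 0.0

    if availability < 0.9:  # example threshold
        # Hypothetical remediation step; a real workflow would alert and record
        # the action for audit before restarting anything.
        subprocess.run(["systemctl", "restart", "nms-poller"], check=False)


if __name__ == "__main__":
    check_and_heal()
```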

Posted 1 week ago

Apply

6.0 years

0 Lacs

Pune, Maharashtra, India

On-site

About The Role We are looking for a highly skilled Software Engineer with strong AI/ML expertise and a foundational understanding of SRE principles to help transform reliability engineering through intelligent, automation-driven solutions. This role is not just about applying AI; it’s about applying an engineering mindset and AI capabilities to reliability problems. You should be comfortable writing clean, maintainable code and have an understanding of SRE principles such as observability, incident response, and automation. By combining software skills with practical knowledge of operational challenges, you'll help eliminate toil, drive proactive reliability improvements, and embed intelligence into day-to-day engineering workflows. Your efforts will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will play a vital role in supporting the organization's broader strategy—enabling faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes. Key Responsibilities Support initiatives to enhance SRE capabilities using AI/ML, ensuring strong foundations in reliability engineering and operational excellence. Leverage AI and machine learning technologies to architect and implement solutions that advance the overall SRE agenda—improving reliability, automation, observability, and operational efficiency across complex systems. Contribute to incident management, change management, and release processes—bringing structure, automation, and intelligent insights to drive stability, safety, and velocity. Participate in and drive key SRE practices and routines—including initiation and facilitation of the SRE Community of Practice (CoP), aligning SLAs/SLOs, launching error budget governance, and enabling data-driven process improvements across reliability areas. Partner effectively with SREs, platform engineers, and data teams to develop production-grade, measurable, and reliable models and tools. Develop and maintain internal frameworks and tooling to accelerate AI/ML adoption across reliability use cases. Partner with platform teams to understand and assist in driving Zero-Touch Operations by enabling platforms to detect, analyze, and resolve issues autonomously. Utilize metrics, logs, and historical incident data to build actionable insights and reliability dashboards. Actively participate in on-call rotations, improving incident response processes and escalation management. Integrate security best practices into workflows and collaborate with security teams to ensure platform stability. Contribute significantly to shaping the AI-in-SRE strategy and mentor junior team members. Required Skills & Qualifications 3–6 years of experience as a software engineer or platform engineer, with a strong focus on building production-grade systems, developer tooling, or intelligent automation. LLM-Native Development Approach: proficiency in leveraging LLM-powered tools (e.g., for research, code generation, or automation). Demonstrated experience building AI-assisted workflows or custom automations that enhance engineering efficiency, reduce manual effort, or accelerate operational tasks.
Proficient in Python, Go, or equivalent, with strong software engineering fundamentals—testing, version control, CI/CD, and clean code practices. Understanding of core SRE principles (SLIs/SLOs, incident response, error budgets), with the ability to partner with SREs to productionize reliability tooling. Hands-on experience with cloud platforms (AWS, GCP, Azure), containers/orchestration (Docker, Kubernetes), and infrastructure-as-code patterns. Familiarity with observability and telemetry systems—building or integrating with tools like Prometheus, OpenTelemetry, or Elastic stack. Comfortable working with Linux-based systems, debugging performance issues, and understanding systems-level behavior. Ability to translate operational pain points into intelligent, automated solutions using software, AI, and data-driven techniques. Preferred Qualifications. Advanced SRE Practice Exposure: Familiarity with operating in mature SRE environments—such as participating in production readiness reviews, chaos engineering exercises, Capacity planning, Error budget governance and operational health reviews etc. Exposure to building AI-assisted tools using LLMs, vector databases, or prompt engineering techniques to streamline engineering or operational workflows would be a big plus. Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements. We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or to perform a job, please contact us by emailing accommodationrequests@maersk.com.
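As a concrete illustration of the error-budget governance mentioned in the posting above, the short sketch below computes the remaining error budget from an availability SLO and raw request counts. The SLO target and request counts are made-up example values, not figures from the listing.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent for a given window.

    slo_target: availability objective, e.g. 0.999 for "three nines".
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    spent = failed_requests / allowed_failures
    return max(0.0, 1.0 - spent)


# Example with made-up numbers: 99.9% SLO, 1,000,000 requests, 400 failures.
# 1,000 failures are allowed and 400 are spent, so 60% of the budget remains.
print(error_budget_remaining(0.999, 1_000_000, 400))  # -> 0.6
```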

Posted 1 week ago

Apply

5.0 years

0 Lacs

Pune, Maharashtra, India

On-site

About The Role We are looking for a highly skilled Software Engineer with strong AI/ML expertise and a foundational understanding of SRE principles to help transform reliability engineering through intelligent, automation-driven solutions. This role is not just about applying AI; it’s about applying an engineering mindset and AI capabilities to reliability problems. You should be comfortable writing clean, maintainable code and have an understanding of SRE principles such as observability, incident response, and automation. By combining software skills with practical knowledge of operational challenges, you'll help eliminate toil, drive proactive reliability improvements, and embed intelligence into day-to-day engineering workflows. Your efforts will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will play a vital role in supporting the organization's broader strategy—enabling faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes. Key Responsibilities Support initiatives to enhance SRE capabilities using AI/ML, ensuring strong foundations in reliability engineering and operational excellence. Leverage AI and machine learning technologies to architect and implement solutions that advance the overall SRE agenda—improving reliability, automation, observability, and operational efficiency across complex systems. Contribute to incident management, change management, and release processes—bringing structure, automation, and intelligent insights to drive stability, safety, and velocity. Participate in and drive key SRE practices and routines—including initiation and facilitation of the SRE Community of Practice (CoP), aligning SLAs/SLOs, launching error budget governance, and enabling data-driven process improvements across reliability areas. Partner effectively with SREs, platform engineers, and data teams to develop production-grade, measurable, and reliable models and tools. Develop and maintain internal frameworks and tooling to accelerate AI/ML adoption across reliability use cases. Partner with platform teams to understand and assist in driving Zero-Touch Operations by enabling platforms to detect, analyze, and resolve issues autonomously. Utilize metrics, logs, and historical incident data to build actionable insights and reliability dashboards. Actively participate in on-call rotations, improving incident response processes and escalation management. Integrate security best practices into workflows and collaborate with security teams to ensure platform stability. Contribute significantly to shaping the AI-in-SRE strategy and mentor junior team members. Required Skills & Qualifications 3–5 years of experience as a software engineer or platform engineer, with a strong focus on building production-grade systems, developer tooling, or intelligent automation. LLM-Native Development Approach: proficiency in leveraging LLM-powered tools (e.g., for research, code generation, or automation). Demonstrated experience building AI-assisted workflows or custom automations that enhance engineering efficiency, reduce manual effort, or accelerate operational tasks.
Proficient in Python, Go, or equivalent, with strong software engineering fundamentals—testing, version control, CI/CD, and clean code practices. Understanding of core SRE principles (SLIs/SLOs, incident response, error budgets), with the ability to partner with SREs to productionize reliability tooling. Hands-on experience with cloud platforms (AWS, GCP, Azure), containers/orchestration (Docker, Kubernetes), and infrastructure-as-code patterns. Familiarity with observability and telemetry systems—building or integrating with tools like Prometheus, OpenTelemetry, or Elastic stack. Comfortable working with Linux-based systems, debugging performance issues, and understanding systems-level behavior. Ability to translate operational pain points into intelligent, automated solutions using software, AI, and data-driven techniques. Preferred Qualifications. Advanced SRE Practice Exposure: Familiarity with operating in mature SRE environments—such as participating in production readiness reviews, chaos engineering exercises, Capacity planning, Error budget governance and operational health reviews etc. Exposure to building AI-assisted tools using LLMs, vector databases, or prompt engineering techniques to streamline engineering or operational workflows would be a big plus. Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements. We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or to perform a job, please contact us by emailing accommodationrequests@maersk.com.
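To make the AI-assisted reliability theme in this posting more concrete, here is a minimal rolling z-score anomaly detector over a latency series. It is a sketch only, using synthetic data and an illustrative threshold, not a production model.

```python
import numpy as np


def rolling_zscore_anomalies(values: np.ndarray, window: int = 30, threshold: float = 3.0) -> np.ndarray:
    """Return indices where a point deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = history.mean(), history.std()
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return np.array(anomalies, dtype=int)


# Synthetic latency series (ms) with an injected spike at index 80.
rng = np.random.default_rng(42)
latencies = rng.normal(loc=120, scale=5, size=100)
latencies[80] = 300
print(rolling_zscore_anomalies(latencies))  # expected to include index 80
```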

Posted 1 week ago

Apply

5.0 - 9.0 years

0 Lacs

noida, uttar pradesh

On-site

As a Senior Software Engineer-I at Sumo Logic, based in Bengaluru or Noida, you will be part of the Open Source/Open Telemetry Collector team. Your primary focus will be on designing and implementing features for a robust and efficient OpenTelemetry collection engine. This engine simplifies and enhances the performance and behavior monitoring of intricate distributed systems, enabling our customers to derive meaningful insights from their data effortlessly. Your responsibilities will include writing high-quality code with a strong emphasis on unit and integration testing. You will contribute to the upstream OpenTelemetry project, analyzing and enhancing the efficiency, scalability, and reliability of our backend systems. Additionally, you will have the opportunity to work collaboratively with team members to address business needs effectively and efficiently. To excel in this role, you should ideally hold a B.Tech, M.Tech, or Ph.D. in Computer Science or a related discipline, coupled with 5-8 years of industry experience demonstrating ownership and accountability. Proficiency in GoLang or other statically typed languages like Java, Scala, or C++ is preferred, with a willingness to learn GoLang if not already experienced. Strong communication skills, the ability to work well in a team-oriented environment, and a knack for quickly learning and adapting to new technologies are crucial for success. It would be advantageous if you have experience contributing to open-source projects, particularly in the telemetry collection domain. Familiarity with monitoring/observability tools, GitHub Actions or other CI pipelines, multi-threaded programming, and distributed systems is highly desirable. Moreover, comfort with Unix-type operating systems such as Linux and exposure to Docker, Kubernetes, Helm, and Terraform will be beneficial. Agile software development experience, including test-driven development and iterative practices, will also be valued. Join Sumo Logic, Inc., a company dedicated to empowering modern, digital businesses by providing a reliable and secure cloud-native application platform. As a Senior Software Engineer-I, you will play a pivotal role in delivering real-time analytics and insights across observability and security solutions, ensuring the success of cloud-native applications worldwide. For more information about Sumo Logic, visit www.sumologic.com. As an employee, you will be expected to adhere to federal privacy laws, regulations, and organizational data protection policies.,
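The collector work described above is Go-centric, but the shape of the telemetry it receives is easy to show from the application side. Below is a hedged Python sketch that emits a single trace span with the OpenTelemetry SDK to a console exporter; in practice an OTLP exporter pointed at a collector would replace it, and the service and span names here are illustrative only.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider; a real setup would use an OTLP exporter that sends
# spans to an OpenTelemetry Collector instead of printing them to the console.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-demo"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)  # illustrative attribute
```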

Posted 1 week ago

Apply

10.0 years

0 Lacs

Gurgaon, Haryana, India

On-site

You Lead the Way. We’ve Got Your Back. With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other. Here, you’ll learn and grow as we help you create a career journey that’s unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally. At American Express, you’ll be recognized for your contributions, leadership, and impact—every colleague has the opportunity to share in the company’s success. Together, we’ll win as a team, striving to uphold our company values and powerful backing promise to provide the world’s best customer experience every day. And we’ll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong. Join Team Amex and let's lead the way together. About Enterprise Architecture: Enterprise Architecture is an organization within the Chief Technology Office at American Express and it is a key enabler of the company’s technology strategy. The four pillars of Enterprise Architecture include: 1. Architecture as Code : this pillar owns and operates foundational technologies that are leveraged by engineering teams across the enterprise. 2. Architecture as Design : this pillar includes the solution and technical design for transformation programs and business critical projects which need architectural guidance and support. 3. Governance : this pillar is responsible for defining technical standards, and developing innovative tools that automate controls to ensure compliance. 4. Colleague Enablement: this pillar is focused on colleague development, recognition, training, and enterprise outreach. What you will be working on: We are looking for a Senior Engineer to join our Enterprise Architecture team. In this role you will be designing and implementing highly scalable real-time systems following the best practices and using the cutting-edge technology. This role is best suited for experienced engineers with broad skillset who are open, curious and willing to learn. Qualifications : What you will Bring: Bachelor's degree in computer science, computer engineering or a related field, or equivalent experience 10+ years of progressive experience demonstrating strong architecture, programming and engineering skills. Firm grasp of data structures, algorithms with fluency in programming languages like Java, Kotlin, Go Demonstrated ability to lead, partner, and collaborate cross functionally across many engineering organizations Experience in building real-time large scale, high volume, distributed data pipelines on top of data buses (Kafka). Hands on experience with large scale distributed NoSQL databases like Elasticsearch Knowledge and/or experience with containerized environments, Kubernetes, docker. Knowledge and/or experience with any of the public cloud platforms like AWS, GCP. Experience in implementing and maintained highly scalable micro services in Rest, GRPC Experience in working infrastructure layers like service mesh, istio , envoy. Appetite for trying new things and building rapid POCs Preferred Qualifications: Knowledge of Observability concepts like Tracing, Metrics, Monitoring, Logging Knowledge of Prometheus Knowledge of OpenTelemetry / OpenTracing Knowledge of observability tools like Jaeger, Kibana, Grafana etc. 
Open-source community involvement We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally: Competitive base salaries Bonus incentives Support for financial-well-being and retirement Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location) Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need Generous paid parental leave policies (depending on your location) Free access to global on-site wellness centers staffed with nurses and doctors (depending on location) Free and confidential counseling support through our Healthy Minds program Career development and training opportunities American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.

Posted 1 week ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

We are seeking a skilled Observability Engineer to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You should be comfortable working with metrics, logs, and traces, and be able to correlate telemetry data to proactively detect, diagnose, and resolve performance issues. Key Responsibilities: Design and maintain observability pipelines using OpenTelemetry, Prometheus, and Grafana. Build dashboards and alerts to monitor system health, application performance, and business KPIs. Integrate observability solutions with Google Cloud Platform services and SolarWinds. Correlate logs, metrics, and traces to troubleshoot incidents and reduce MTTR. Collaborate with SREs, DevOps, and development teams to improve end-to-end system observability. Implement best practices for telemetry data collection, enrichment, storage, and visualization. Requirements: Strong experience with Prometheus and Grafana for monitoring and alerting. Proficiency in OpenTelemetry for instrumenting distributed systems. Working knowledge of observability tools in Google Cloud (e.g., Cloud Monitoring, Logging, Trace). Exposure to SolarWinds for network and infrastructure monitoring. Solid understanding of telemetry data types: metrics, logs, and traces. Ability to correlate and analyze multi-source observability data. Scripting skills (Python, Bash) and familiarity with Infrastructure-as-Code are a plus. Preferred Qualifications: Experience in Site Reliability Engineering or Platform Engineering roles. Knowledge of SLIs/SLOs and performance benchmarking. Experience with APM tools (e.g., Datadog, New Relic) is a plus.
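For context on the Prometheus side of this role, a minimal sketch of exposing custom application metrics with the prometheus_client library follows. The metric names and port are illustrative; a real service would follow whatever naming conventions the dashboards and alerts expect.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; not taken from the posting.
REQUESTS = Counter("demo_requests_total", "Total requests handled", ["status"])
LATENCY = Histogram("demo_request_seconds", "Request latency in seconds")


def handle_request() -> None:
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.labels(status="200").inc()


if __name__ == "__main__":
    start_http_server(8000)  # metrics scraped from http://localhost:8000/metrics
    while True:
        handle_request()
```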

Posted 1 week ago

Apply

5.0 years

15 - 18 Lacs

Hyderābād

On-site

Site Reliability Engineer We are building scalable, reliable, and high-performance cloud-native applications on Microsoft Azure. We are seeking a talented and passionate Site Reliability Engineer (SRE) to join our team, focusing on establishing robust observability with OpenTelemetry and driving operational excellence across our Azure infrastructure. Role Overview: As an SRE with OpenTelemetry and Azure expertise, you will play a critical role in ensuring the availability, performance, and scalability of our production systems. You will be responsible for designing, implementing, and maintaining our observability stack using OpenTelemetry standards, integrating it seamlessly with Azure services, and applying SRE principles to build resilient and efficient systems. You will work closely with development teams to embed reliability from the ground up, automate operational tasks, and respond to incidents with speed and precision. Requirements Key Responsibilities: OTEL Monitoring Setup & Observability: Design, implement, and manage a comprehensive observability platform using OpenTelemetry for distributed tracing, metrics, and logs across our microservices and applications. Ensure full instrumentation of applications (e.g., Java, Python, Node.js) to capture end-to-end telemetry data. Configure and optimize OpenTelemetry Collectors to receive, process, and export telemetry data to various backends (e.g., Prometheus, Grafana, Application Insights, Jaeger, Loki, Tempo and Azure Monitor). Develop custom instrumentation and semantic conventions to enhance monitoring capabilities and provide deeper insights into application behavior. Establish robust alerting and anomaly detection based on OpenTelemetry signals, utilizing tools like Azure Monitor, Prometheus Alert manager, or similar. Create informative and actionable dashboards (e.g., Grafana, Azure Dashboards) for real-time system insights, performance monitoring, and incident response. Continuously evaluate and integrate new OpenTelemetry features and best practices to improve our observability posture. Azure SRE Capabilities: Reliability & Performance Engineering: Monitor system performance, reliability, and availability metrics across Azure services. Identify bottlenecks, anticipate scaling needs, and implement strategies to reduce downtime and improve performance. Incident Management & Response: Participate in on-call rotations, lead incident response efforts, conduct thorough root cause analysis (RCA), and implement preventative measures to minimize recurrence. Develop and maintain runbooks and playbooks for effective incident resolution. Automation & Infrastructure as Code (IaC): Automate repetitive operational tasks, deployments, and infrastructure provisioning using Azure DevOps, Terraform, Azure Bicep, PowerShell, or Bash scripting. CI/CD Integration: Integrate observability checks and validation steps into CI/CD pipelines to ensure the reliability and performance of new releases. Capacity Planning & Cost Optimization: Conduct capacity planning, analyze usage patterns, and optimize Azure resources for cost efficiency, performance, and scalability. Security & Compliance: Implement and enforce security best practices within Azure environments, collaborate with security teams, and ensure adherence to relevant compliance standards. Collaboration & Mentorship: Work closely with development teams to foster a culture of reliability, provide guidance on observability best practices, and share knowledge across the organization. 
Required Skills and Experience: 5+ years of experience in a Site Reliability Engineering (SRE), DevOps, or a similar infrastructure-focused role. Deep practical experience with OpenTelemetry (OTEL) for instrumenting, collecting, processing, and exporting traces, metrics, and logs. Strong proficiency in Azure cloud services and their monitoring capabilities (Azure Monitor, Log Analytics, Application Insights). Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, Azure Bicep, or ARM templates. Solid scripting and automation skills (e.g., Python, PowerShell, Bash). Experience with containerization technologies (Docker) and orchestration platforms (Kubernetes/AKS). Expertise with various observability backends like Grafana, Alloy, Loki, Tempo, Prometheus, Jaeger. Strong understanding of distributed systems, microservices architectures, and cloud-native principles. Excellent problem-solving, analytical, and troubleshooting skills. Strong communication and collaboration abilities. Preferred Qualifications: Azure certifications (e.g., AZ-104 Azure Administrator, AZ-400 Azure DevOps Engineer Expert). Experience with chaos engineering practices. Understanding of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets. Familiarity with database monitoring (e.g., PostgreSQL, Azure SQL). Experience in a high-availability, regulated, or customer-facing environment. Education: Bachelor's degree in Computer Science, Information Technology, or a related technical field, or equivalent practical experience. Job Type: Full-time Pay: ₹130,000.00 - ₹150,000.00 per month Experience: Site Reliability Engineering: 7 years (Required) DevOps: 6 years (Required) OpenTelemetry: 5 years (Required) Azure cloud services : 6 years (Required) orchestration platforms (Kubernetes/AKS): 5 years (Required) Work Location: In person
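To complement the tracing-oriented duties described in this SRE posting, here is a hedged sketch of recording OpenTelemetry metrics in Python. It uses a console exporter where a production setup would export to Azure Monitor or an OpenTelemetry Collector, and the meter, counter, and attribute names are illustrative assumptions.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export metrics to the console every 10 seconds; a real deployment would use
# an OTLP exporter pointed at a collector or Azure Monitor instead.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("payments-demo")  # illustrative meter name
orders_counter = meter.create_counter(
    "orders_processed", unit="1", description="Number of orders processed"
)

orders_counter.add(1, attributes={"region": "westeurope"})  # illustrative attribute
```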

Posted 1 week ago

Apply

10.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Company Qualcomm India Private Limited Job Area Engineering Group, Engineering Group > Software Engineering General Summary Job Summary: Qualcomm is seeking a seasoned Staff Engineer, DevOps to join our central software engineering team. In this role, you will lead the design, development, and deployment of scalable cloud-native and hybrid infrastructure solutions, modernize legacy systems, and drive DevOps best practices across products. This is a hands-on architectural role ideal for someone who thrives in a fast-paced, innovation-driven environment and is passionate about building resilient, secure, and efficient platforms. Key Responsibilities Architect and implement enterprise-grade AWS cloud solutions for Qualcomm’s software platforms. Design and implement CI/CD pipelines using Jenkins, GitHub Actions, and Terraform to enable rapid and reliable software delivery. Develop reusable Terraform modules and automation scripts to support scalable infrastructure provisioning. Drive observability initiatives using Prometheus, Grafana, Fluentd, OpenTelemetry, and Splunk to ensure system reliability and performance. Collaborate with software development teams to embed DevOps practices into the SDLC and ensure seamless deployment and operations. Provide mentorship and technical leadership to junior engineers and cross-functional teams. Manage hybrid environments, including on-prem infrastructure and Kubernetes workloads supporting both Linux and Windows. Lead incident response, root cause analysis, and continuous improvement of SLIs for mission-critical systems. Drive toil reduction and automation using scripting or programming languages such as PowerShell, Bash, Python, or Go. Independently drive and implement DevOps/cloud initiatives in collaboration with key stakeholders. Understand software development designs and compilation/deployment flows for .NET, Angular, and Java-based applications to align infrastructure and CI/CD strategies with application architecture. Required Qualifications 10+ years of experience in IT or software development, with at least 5 years in cloud architecture and DevOps roles. Strong foundational knowledge of infrastructure components such as networking, servers, operating systems, DNS, Active Directory, and LDAP. Deep expertise in AWS services including EKS, RDS, MSK, CloudFront, S3, and OpenSearch. Hands-on experience with Kubernetes, Docker, containerd, and microservices orchestration. Proficiency in Infrastructure as Code using Terraform and configuration management tools like Ansible and Chef. Experience with observability tools and telemetry pipelines (Grafana, Prometheus, Fluentd, OpenTelemetry, Splunk). Experience with agent-based monitoring tools such as SCOM and Datadog. Solid scripting skills in Python, Bash, and PowerShell. Familiarity with enterprise-grade web services (IIS, Apache, Nginx) and load balancing solutions. Excellent communication and leadership skills with experience mentoring and collaborating across teams. Preferred Qualifications Experience with api gateway solutions for API security and management. Knowledge on RDBMS, preferably MSSQL/Postgresql is good to have. Proficiency in SRE principles including SLIs, SLOs, SLAs, error budgets, chaos engineering, and toil reduction. Experience in core software development (e.g., Java, .NET). Exposure to Azure cloud and hybrid cloud strategies. 
Bachelor’s degree in Computer Science or a related field Minimum Qualifications Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience. OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience. OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience. 2+ years of work experience with Programming Language such as C, C++, Java, Python, etc. Applicants : Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries). Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law. To all Staffing and Recruiting Agencies : Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications. If you would like more information about this role, please contact Qualcomm Careers. 3076889

Posted 1 week ago

Apply

6.0 years

0 Lacs

Kanpur, Uttar Pradesh, India

On-site

Description We are seeking a highly technical Lead Software Engineer to drive the architecture, scalability, and engineering excellence of our core platforms. This role is ideal for someone who thrives on solving complex engineering problems and scaling distributed systems in production. You will design, build, and optimize full-stack systems with a strong focus on microservices, event-driven architecture, and cloud-native DevOps. You’ll work across the stack, from backend services and frontend performance to CI/CD, observability, and security. Responsibilities 1. Architecture & Scalability • Architect and implement highly scalable microservices-based systems using Python (Django) or Node.js. • Design event-driven architectures using Kafka, RabbitMQ, or AWS SQS/SNS. • Build low-latency, high-throughput APIs, utilizing Redis/Memcached and CDNs. • Apply distributed systems patterns (e.g., CQRS, Saga, Circuit Breaker) for resilience and consistency. • Use container orchestration (e.g., Kubernetes) and serverless platforms (AWS Lambda, Azure Functions) for scalable, cloud-native deployments. 2. Full-Stack Engineering • Build RESTful or gRPC APIs with Python (Django), or Node.js (Express/NestJS). • Develop high-performance frontend applications with React.js (Next.js), TypeScript, and state management (Redux, Zustand). • Design optimized database schemas across PostgreSQL, MySQL, MongoDB, or Cassandra, with attention to indexing, replication, and sharding. • Implement real-time features using WebSockets (Socket.io) or GraphQL subscriptions. 3. DevOps & Cloud Infrastructure • Build and automate CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins with IaC tools like Terraform or Pulumi. • Manage Kubernetes clusters (EKS, GKE, AKS) using Helm and service meshes (Istio, Linkerd). • Set up robust monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK). • Deploy security-first infrastructure in AWS, GCP, or Azure, following DevSecOps best practices. 4. Code Quality & Security • Enforce engineering standards via linters (ESLint, Pylint), static analysis (SonarQube), and automated testing (Jest, Pytest). • Conduct security audits and integrate SAST/DAST tools (Snyk, OWASP ZAP, Trivy) into CI/CD. • Implement zero-trust architectures using OAuth 2.0, JWT, and RBAC for access control. • Ensure compliance with OWASP Top 10 and other secure development standards. Eligibility ✅ 6+ years of hands-on experience building scalable, distributed software systems. ✅ Deep backend experience in Python (Django) or Node.js (Express/NestJS). ✅ Strong frontend experience with React.js, TypeScript, and Next.js. ✅ Proven experience in microservices, event-driven architectures, and message brokers like Kafka or RabbitMQ. ✅ Hands-on expertise in both SQL (PostgreSQL, MySQL) and NoSQL (MongoDB, Redis, Cassandra). ✅ Solid DevOps skills including Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure). ✅ Strong knowledge of secure coding practices, API security, and web application hardening. Preferred (Nice to Have): 🔹 Knowledge of blockchain technologies (Ethereum, Hyperledger, Solidity). 🔹 Experience with Web3 libraries (Web3.js, Ethers.js). 🔹 Contributions to open-source, technical blogs, or whitepapers. Educational Qualifications: 🎓 B.Tech / M.Tech in Computer Science or related field (Mandatory). Travel As and when required, across the country for project execution and monitoring as well as for coordination with geographically distributed teams.
Communication Submit a cover letter summarising your experience in relevant technologies and software along with a resume and the latest passport-size photograph.
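Since the responsibilities in this listing call out resilience patterns such as Circuit Breaker, here is a minimal, framework-free Python sketch of one. The thresholds and the wrapped call are illustrative, and a production system would typically use a hardened library rather than this hand-rolled version.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Tiny circuit breaker: open after `max_failures` consecutive errors,
    then allow a trial call once `reset_timeout` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open; request short-circuited")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


# Illustrative use: wrap a flaky downstream call (endpoint is hypothetical).
breaker = CircuitBreaker()
# breaker.call(requests.get, "https://downstream.example/api")
```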

Posted 1 week ago

Apply

2.0 - 6.0 years

0 Lacs

karnataka

On-site

NTT DATA is looking for a Sr Information Security Engineer to be a part of their team based in Bangalore, Karnataka (IN-KA), India. As a Sr Information Security Engineer, you will bring your expertise and experience in IT Technology, AI / ML, data engineering, Python coding, and enterprise-wide integration programs to contribute to our innovative and forward-thinking organization. **Required Skills:** - You should have at least 5 years of experience in IT Technology. - With a minimum of 2 years of experience in AI / ML, you should possess a strong working knowledge of neural networks. - Having 2+ years of data engineering experience, preferably using AWS Glue, Cribl, SignalFx, OpenTelemetry, or AWS Lambda, will be an added advantage. - Your proficiency should include 2+ years of Python coding using NumPy, vectorization, and TensorFlow. - You must have 2+ years of experience in leading complex enterprise-wide integration programs and efforts as an individual contributor. **Preferred Skills:** - A degree in Mathematics or Physics would be preferred. - Technical knowledge in cloud technologies such as AWS, Azure, and GCP for at least 2 years is desirable. - Excellent verbal, written, and interpersonal communication skills are a must. - Your ability to provide strong customer service will be highly valued. If you are someone who is passionate about innovation and growth, and can contribute effectively to our diverse and global team, we encourage you to apply now and be a part of NTT DATA's commitment to helping clients innovate, optimize, and transform for long-term success. About NTT DATA: NTT DATA is a $30 billion trusted global innovator of business and technology services. With a strong presence in more than 50 countries, we serve 75% of the Fortune Global 100 and have a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation, and management of applications, infrastructure, and connectivity. Being one of the leading providers of digital and AI infrastructure globally, we are dedicated to helping organizations and society move confidently and sustainably into the digital future. Join us in this journey towards innovation and success! #LI-INPAS

Posted 1 week ago

Apply

5.0 years

0 Lacs

Noida

On-site

Senior Software Engineer-I (Open Source Collection) Location: Bengaluru/Noida Our team: At Sumo Logic, we ingest petabytes of data every day and empower our customers by providing them with extremely reliable and fast tools to derive meaningful insights from their data. The Open Source/Open Telemetry Collector team provides the next-generation data collector built on Open Telemetry to simplify and streamline the performance and behavior monitoring of complex distributed systems. Responsibilities: Design and implement features for an extremely robust and lean OpenTelemetry collection engine. Good hands-on understanding of Kubernetes and the Ability to quickly diagnose and resolve complex issues in a production environment. Write robust code with unit and integration tests. Contribute to the upstream OpenTelemetry project. Analyze and improve the efficiency, scalability, and reliability of our backend systems. Work as a team member, helping the team respond quickly and effectively to business needs Experience with CI and on-call production support Requirements B.Tech, M.Tech, or Ph.D. in Computer Science or related discipline 5-8 years of industry experience with a proven track record of ownership Experience with GoLang or other statically typed language (Java, Scala, C++). Willingness to learn GoLang if you don't have the experience. Strong communication skills and the ability to work in a team environment Understand the performance characteristics of commonly used data structures (maps, lists, trees, etc) Demonstrated ability to learn quickly, solve problems and adapt to new technologies Nice to have Contributing to an open-source project and preferably open-source telemetry collection Familiarity with the monitoring/observability space. Working experience with GitHub Actions or other CI pipelines A GitHub account with recent activity and contributions to open source projects Experience in multi-threaded programming and distributed systems is highly desirable. Comfortable working with Unix-type operating systems (Linux, OS X) Familiarity with Docker, Kubernetes, Helm, Terraform, etc. Agile software development experience (test-driven development, iterative and incremental development) About Us Sumo Logic, Inc. empowers the people who power modern, digital business. Sumo Logic enables customers to deliver reliable and secure cloud-native applications through its Sumo Logic SaaS Analytics Log Platform, which helps practitioners and developers ensure application reliability, secure and protect against modern security threats, and gain insights into their cloud infrastructures. Customers worldwide rely on Sumo Logic to get powerful real-time analytics and insights across observability and security solutions for their cloud-native applications. For more information, visit www.sumologic.com. Sumo Logic Privacy Policy. Employees will be responsible for complying with applicable federal privacy laws and regulations, as well as organizational policies related to data protection.

Posted 1 week ago

Apply

0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

We are looking for a highly skilled and proactive Senior DevOps Specialist to join our Infrastructure Management Team. In this role, you will lead initiatives to streamline and automate infrastructure provisioning, CI/CD, observability, and compliance processes using GitLab, containerized environments, and modern DevSecOps tooling. You will work closely with application, data, and ML engineering teams to support MLOps workflows (e.g., model versioning, reproducibility, pipeline orchestration) and implement AIOps practices for intelligent monitoring, anomaly detection, and automated root cause analysis. Your goal will be to deliver secure, scalable, and observable infrastructure across environments. Key Responsibilities Architect and maintain GitLab CI/CD pipelines to support deployment automation, environment provisioning, and rollback readiness. Implement standardized, reusable CI/CD templates for application, ML, and data services. Collaborate with system engineers to ensure secure, consistent infrastructure-as-code deployments using Terraform, Ansible, and Docker. Integrate security tools such as Vault, Trivy, tfsec, and InSpec into CI/CD pipelines. Govern infrastructure compliance by enforcing policies around secret management, image scanning, and drift detection. Lead internal infrastructure and security audits and maintain compliance records where required. Define and implement observability standards using OpenTelemetry, Grafana, and Graylog. Collaborate with developers to integrate structured logging, tracing, and health checks into services. Enable root cause detection workflows and performance monitoring for infrastructure and deployments. Work closely with application, data, and ML teams to support provisioning, deployment, and infra readiness. Ensure reproducibility and auditability in data/ML pipelines via tools like DVC and MLflow. Participate in release planning, deployment checks, and incident analysis from an infrastructure perspective. Mentor junior DevOps engineers and foster a culture of automation, accountability, and continuous improvement. Lead daily standups, retrospectives, and backlog grooming sessions for infrastructure-related deliverables. Drive internal documentation, runbooks, and reusable DevOps assets. Must Have Strong experience with GitLab CI/CD, Docker, and SonarQube for pipeline automation and code quality enforcement Proficiency in scripting languages such as Bash, Python, or Shell for automation and orchestration tasks Solid understanding of Linux and Windows systems, including command-line tools, process management, and system troubleshooting Familiarity with SQL for validating database changes, debugging issues, and running schema checks Experience managing Docker-based environments, including container orchestration using Docker Compose, container lifecycle management, and secure image handling Hands-on experience supporting MLOps pipelines, including model versioning, experiment tracking (e.g., DVC, MLflow), orchestration (e.g., Airflow), and reproducible deployments for ML workloads. 
Hands-on knowledge of test frameworks such as PyTest, Robot Framework, REST-assured, and Selenium Experience with infrastructure testing tools like tfsec, InSpec, or custom Terraform test setups Strong exposure to API testing, load/performance testing, and reliability validation Familiarity with AIOps concepts, including structured logging, anomaly detection, and root cause analysis using observability platforms (e.g., OpenTelemetry, Prometheus, Graylog) Exposure to monitoring/logging tools like Grafana, Graylog, OpenTelemetry. Experience managing containerized environments for testing and deployment, aligned with security-first DevOps practices Ability to define CI/CD governance policies, pipeline quality checks, and operational readiness gates Excellent communication skills and proven ability to lead DevOps initiatives and interface with cross-functional stakeholders
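As a small illustration of the structured-logging expectation in this listing, the sketch below emits single-line JSON log records with Python's standard logging module so they can be shipped to Graylog or an OpenTelemetry pipeline. The logger name and the extra field are illustrative assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON for log shippers to parse."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Illustrative extra field carried via the `extra=` kwarg below.
        if hasattr(record, "deploy_id"):
            payload["deploy_id"] = record.deploy_id
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("deploy-pipeline")  # illustrative logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("rollout finished", extra={"deploy_id": "demo-1234"})  # illustrative
```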

Posted 1 week ago

Apply

0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Join us as we pursue our ground-breaking vision to make machine data accessible, usable, and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we are committed to our work, customers, having fun, and most significantly to each other’s success. The Splunk Observability Cloud provides full-fidelity monitoring and troubleshooting across infrastructure, applications, and user interfaces, in real time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud-native systems engineers who use infrastructure-as-code, microservices, automation, and efficient design to build, operate, and scale our products. About The O11y SWAT Team The o11y SWAT Team is pivotal in diagnosing and resolving complex issues in large-scale, multi-tiered, and diverse infrastructures, including Kubernetes clusters, cloud environments, and applications developed using multiple technologies. We collaborate with leading companies worldwide, including top corporations in communications, supply chain, transportation, and financial sectors. Our expertise greatly enhances the productivity and efficiency of these distinguished customers globally. We are significant contributors to the development of OpenTelemetry (OTEL), which presents an outstanding opportunity for talented candidates to engage in impactful work on a global scale. What You'll Get To Do Participate in troubleshooting and resolving issues related to backend services for a scalable platform. Focus heavily on managing and diagnosing issues in large-scale Kubernetes deployments and containerized applications. Collaborate with cross-functional teams to identify and address performance and reliability issues. Assist in integrating monitoring solutions to enhance system diagnostics and troubleshooting processes. Contribute to improving operational practices and maintaining high system availability. Learn and apply best practices for identifying and resolving issues in microservices architecture and cloud-native environments. Stay informed about industry trends and emerging technologies related to backend services and Kubernetes. Must-Have Qualifications 5+ years of strong experience in software engineering principles, including debugging and problem-solving techniques. Strong familiarity with Kubernetes, including deployment, scaling, and management of containerized applications. Experience with backend technologies using languages such as Java, Python, or Go. Basic knowledge of cloud platforms and microservices architecture. Strong analytical skills and attention to detail. Bachelor's degree in Computer Science or a related field, with relevant experience. Nice-to-Have Qualifications Experience with cloud-native tools and services (e.g., AWS, Azure, or Google Cloud Platform). Familiarity with monitoring and observability tools, such as Prometheus or OpenTelemetry. Familiarity with web applications and frameworks like React. Understanding of distributed systems and data management. Exposure to CI/CD pipelines and automation tools. We value diversity, equity, and inclusion at Splunk and are an equal employment opportunity employer.
Qualified applicants receive consideration for employment without regard to race, religion, color, national origin, ancestry, sex, gender, gender identity, gender expression, sexual orientation, marital status, age, physical or mental disability or medical condition, genetic information, veteran status, or any other consideration made unlawful by federal, state, or local laws. We consider qualified applicants with criminal histories, consistent with legal requirements.
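To ground the Kubernetes troubleshooting emphasis in this listing, here is a hedged, read-only sketch using the official Python Kubernetes client to list pods that are not in a Running or Succeeded phase. It assumes a reachable kubeconfig; no cluster details from the posting are implied.

```python
from kubernetes import client, config  # assumes the 'kubernetes' package is installed


def list_unhealthy_pods() -> None:
    """Print pods that are not Running or Succeeded, a common first triage step."""
    config.load_kube_config()  # in-cluster code would use config.load_incluster_config()
    v1 = client.CoreV1Api()
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        phase = pod.status.phase
        if phase not in ("Running", "Succeeded"):
            print(f"{pod.metadata.namespace}/{pod.metadata.name}: {phase}")


if __name__ == "__main__":
    list_unhealthy_pods()
```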

Posted 1 week ago

Apply

5.0 - 7.0 years

0 - 4 Lacs

Hyderabad, Telangana, India

On-site

Job description Minimum 5 years of relevant work experience with Datadog or alternative products like Dynatrace set up in critical production environments Has experience working with AWS-hosted applications and services Experience using observability dashboards, particularly centered around APM Experience with Ansible automation on production and non-production environments Working knowledge of coding .NET applications and log frameworks Core Capabilities: Expert-level knowledge of Datadog integration with agents as well as APM and RUM Ability to convert existing ElasticSearch Grok patterns and filters to Datadog and set up a new forwarder Proficient in AWS, particularly CloudWatch, CloudTrail and EC2 Ability to deploy Datadog agents across on-prem and cloud-hosted instances Understand how to build observability into monolithic applications to expose telemetry via the OpenTelemetry SDK or Zipkin traces Understand the customer service requirement to define and create service level objectives (SLOs) as well as corresponding dashboards Strong knowledge of Ansible and PowerShell to automate deployment of Datadog agents Experienced in running R&D labs as well as incubating new solutions Strong grasp of the 4 Golden Signals, MELT and RED approaches to observability Creating actionable dashboards and associated alerts based on thresholds using Datadog Ability to write IaC using Terraform or Packer
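As a small illustration of the custom telemetry work this posting describes, the sketch below submits metrics through DogStatsD with the official datadog Python package. The metric names and tags are illustrative, and it assumes a Datadog Agent with DogStatsD listening on the default local port.

```python
from datadog import initialize, statsd  # assumes the 'datadog' package is installed

# Assumes a locally running Datadog Agent exposing DogStatsD on its default port.
initialize(statsd_host="127.0.0.1", statsd_port=8125)

# Illustrative metric names and tags; real names should follow the team's
# dashboard and SLO conventions.
statsd.gauge("checkout.queue_depth", 42, tags=["env:staging", "service:checkout"])
statsd.increment("checkout.requests", tags=["status:200"])
```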

Posted 1 week ago

Apply


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot


Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies