310 Elk Jobs - Page 5

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

3.0 - 7.0 years

0 Lacs

Haryana

On-site

As a Kafka Administrator for our merchant e-commerce platform in Noida Sector 62, you will manage, maintain, and optimize our distributed, multi-cluster Kafka infrastructure in an on-premises environment. The role requires a deep understanding of Kafka internals, ZooKeeper administration, performance tuning, and operational excellence in high-throughput, low-latency production systems. Experience with API gateway operations (specifically Kong) and observability tooling is an advantage.

Your key responsibilities will include managing multiple Kafka clusters with high-availability ZooKeeper setups, providing end-to-end operational support, capacity planning, implementing backup and disaster recovery processes, enforcing security configurations, optimizing Kafka producer and consumer performance, planning and executing upgrades and patches, integrating with monitoring platforms, defining log retention and archival policies, monitoring Kafka metrics and logs, collaborating on security and compliance measures, and supporting regular vulnerability assessments.

You should have at least 3 years of hands-on Kafka administration experience in production environments, a strong understanding of Kafka internals and ZooKeeper management, experience with performance tuning and troubleshooting, familiarity with security mechanisms such as TLS/mTLS, ACLs, and SASL, proficiency with monitoring and logging tools, and scripting skills for operational automation. Experience with API gateways, Kubernetes-based environments, compliance standards, security hardening practices, and IaC tools is a plus.

In return, we offer a mission-critical role managing large-scale real-time data infrastructure, a flexible work environment, opportunities for growth, a supportive team, and access to modern observability and automation tools.

Posted 2 weeks ago

2.0 - 6.0 years

0 Lacs

Pune, Maharashtra

On-site

You will join as an SDE1 - DevOps Engineer, with the opportunity to help build a top-notch DevOps infrastructure that can scale to the next 100M users. The ideal candidate tackles a variety of challenges with enthusiasm and takes full ownership of their work.

Your main responsibilities will include running a highly available cloud-based software product on AWS, designing and implementing new systems in close collaboration with the software development team, setting up and maintaining CI/CD systems, and automating software deployment. You will also continuously improve the security posture and operational efficiency of the Amber platform and optimize its operational costs.

To excel in this role, you should have 2-3 years of experience (minimum 2 years) in a DevOps/SRE role. You must have hands-on experience with AWS services such as ECS, EKS, RDS, ElastiCache, and CloudFront, as well as familiarity with Google Cloud Platform. Proficiency in Infrastructure as Code tools such as Terraform, CI/CD tools such as Jenkins and GitHub Actions, and scripting languages such as Python and Bash is essential. You should also have a strong grasp of source control management on GitHub, networking concepts, and experience with observability and monitoring tools such as Grafana, Loki, Prometheus, and ELK. Prior experience with on-call rotations and mentoring junior DevOps engineers would be advantageous. Knowledge of Node.js and Ruby, including their platforms and workflows, is not mandatory but would be a plus.

Posted 2 weeks ago

10.0 - 14.0 years

0 Lacs

Andhra Pradesh

On-site

We are seeking a highly skilled Technical Architect with expertise in Java Spring Boot, React.js, IoT system architecture, and a strong foundation in DevOps practices. You will play a pivotal role in designing scalable, secure, high-performance IoT solutions, leading full-stack teams, and collaborating across product, infrastructure, and data teams.

Your key responsibilities will include designing and implementing a scalable and secure IoT platform architecture, defining data flow and event processing pipelines, architecting microservices-based solutions, and integrating them with React-based front ends. You will also define CI/CD pipelines, manage containerization and orchestration, drive infrastructure automation, ensure platform monitoring and observability, and enable auto-scaling and zero-downtime deployments. In addition, you will collaborate with product managers and business stakeholders to translate requirements into technical specifications, mentor and lead a team of developers and engineers, conduct code and architecture reviews, set goals and targets, and provide coaching and professional development to team members. The role also involves unit testing, identifying risks, applying coding standards and best practices to ensure quality, and maintaining a long-term outlook on the product roadmap and its enabling technologies.

To be successful in this role, you must have hands-on IoT project experience, experience designing and deploying multi-tenant SaaS platforms, strong knowledge of security best practices in IoT and cloud, and excellent problem-solving, communication, and team leadership skills. Experience with edge computing frameworks, AI/ML model integration into IoT pipelines, industrial protocols, and digital twin concepts, as well as certifications in relevant technologies, would be beneficial.

Ideally, you should have a Bachelor's or Master's degree in Computer Science, Engineering, or a related field. By joining us, you will have the opportunity to lead architecture for cutting-edge industrial IoT platforms, work with a passionate team in a fast-paced and innovative environment, and gain exposure to cross-disciplinary challenges in IoT, AI, and cloud-native technologies.

Posted 2 weeks ago

6.0 - 10.0 years

0 Lacs

Ahmedabad, Gujarat

On-site

You will lead a team of DevOps engineers in Ahmedabad. Your main duties will include managing and mentoring the team and overseeing the deployment and maintenance of applications such as Odoo, Magento, and Node.js. You will also design and manage CI/CD pipelines using tools such as Jenkins and GitLab CI, handle environment-specific configurations, and containerize applications using Docker. In addition, you will implement and maintain Infrastructure as Code using tools such as Terraform and Ansible, monitor application health and infrastructure, and ensure that systems are secure, resilient, and compliant with industry standards. Collaboration with development, QA, and IT support teams is essential for seamless delivery, and you will troubleshoot performance, deployment, and scaling issues across tech stacks.

To be successful in this role, you should have at least 6 years of experience in DevOps, cloud, or system engineering roles, including at least 2 years managing or leading DevOps teams. Hands-on experience with Odoo, Magento, Node.js, and AWS/Azure/GCP infrastructure is required. Strong scripting skills in Bash, Python, PHP, or Node CLI, as well as a deep understanding of Linux system administration and networking fundamentals, are essential. Experience with Git, SSH, reverse proxies, and load balancers is also necessary, along with good communication skills and client management exposure.

Highly valued certifications include AWS Certified DevOps Engineer - Professional, Azure DevOps Engineer Expert, and Google Cloud Professional Cloud DevOps Engineer. Nice-to-have skills include experience with multi-region failover, HA clusters, MySQL/PostgreSQL optimization, GitOps, Argo CD, Helm, VAPT 2.0, WCAG compliance, and infrastructure security best practices.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

Punjab

On-site

You will be responsible for the design, implementation, support, and resolution of complex problems related to network, security, systems, storage, and wireless solutions. The role also includes a consultancy and policy component regarding network technologies. As a mid-level engineer, you will provide comprehensive design, scoping, and technical support to the sales team, and assist and mentor mid- to junior-level engineers on technical matters and overall capability.

Your responsibilities will include analyzing, developing, interpreting, and evaluating complex system design and architecture specifications; researching, analyzing, evaluating, and monitoring network infrastructure to ensure optimal performance; and recommending improvements to network operations and integrated hardware, software, communications, and operating systems. You will provide specialized skills in supporting and troubleshooting network problems and emergencies, and you will install, configure, test, maintain, and administer new and upgraded networks, software, database applications, servers, and workstations. You will also handle network programming to support specific business needs and requirements. You will prepare and maintain procedures and documentation for network inventory, and record the diagnosis and resolution of network faults, enhancements, modifications, and maintenance instructions. Monitoring network traffic, activity, capacity, and usage to ensure continued integrity and optimal performance is essential. Other responsibilities include taking ownership of assigned tasks, communicating regularly with customers' personnel as required, creating dashboards and automating workloads in Splunk and Control-M, and testing jobs and integrations with other enterprise products.

You should have a strong understanding of network infrastructure and hardware, along with the ability to think through problems and visualize solutions. Key skills include implementing, administering, and troubleshooting network infrastructure devices and creating accurate network diagrams and documentation. You must be able to learn new technology and products quickly, work with staff at all levels, and demonstrate good analytical and problem-solving skills. You should be a self-starter who can work both independently and collaboratively in a team environment. Dependability, flexibility when necessary, in-depth knowledge of networking, Splunk Enterprise, and workload automation solutions, and specific skills in network management technologies, networks, security, and related tools and products are essential. Special skills required include expertise in network management technologies, Nagios XI, Control-M, the Dynatrace APM tool, ITSM products, ELK (Elasticsearch, Logstash, Kibana), software-defined networking procedures, and creating custom solutions using different network technologies. Valuable professional qualifications and training include Dynatrace Certified Associate, Accredited Associate BMC Control-M Administrator, Accredited Specialist ServiceNow ITSM Specialization, and ITIL V3 Found Ext Certification.

Posted 2 weeks ago

10.0 - 15.0 years

10 - 20 Lacs

Bengaluru

Work from Office

Description: Boomi India Lab (11013020)

Requirements:
- AWS (VPC/ECS/EC2/CloudFormation/RDS)
- Artifactory
- Some knowledge of CircleCI/SaltStack is preferred but not required

Responsibilities:
- Manage containerized applications using Kubernetes, Docker, etc.
- Automate builds/deployments (CI/CD) and other repetitive tasks using shell/Python scripts or tools such as Ansible and Jenkins
- Coordinate with development teams to fix issues and release new code
- Set up configuration management using tools such as Ansible
- Implement highly available, auto-scaling, fault-tolerant, secure setups
- Implement automated jobs such as backups, cleanup, start-stop, and reports
- Configure monitoring alerts/alarms and act on any outages/incidents
- Ensure that the infrastructure is secured and can be accessed only from limited IPs and ports
- Understand client requirements, propose solutions, and ensure delivery
- Innovate and actively look for improvements in the overall infrastructure

Must have:
- Bachelor's degree with at least 7 years of experience in DevOps
- Experience with DevOps tools such as GitLab, Jenkins, SonarQube, Nexus, and Ansible
- Experience with AWS services such as EC2, S3, RDS, CloudFront, CloudWatch, CloudTrail, Route 53, ECS, and ASG
- Well-versed in shell/Python scripting and Linux
- Well-versed in web servers (Apache, Tomcat, etc.)
- Well-versed in containerized applications (Docker, Docker Compose, Docker Swarm, Kubernetes)
- Experience with configuration management tools such as Puppet and Ansible
- Experience in CI/CD implementation (Jenkins, Bamboo, etc.)
- Self-starter with the ability to deliver under tight timelines

Good to have:
- Exposure to tools such as New Relic, ELK, Jira, and Confluence
- Prior experience managing infrastructure for public-facing web applications
- Prior experience handling client communications
- Basic networking knowledge (VLAN, subnet, VPC, etc.)
- Knowledge of databases (PostgreSQL)

Key skills (must have): Jenkins, Docker, Python, Groovy, shell scripting, Artifactory, GitLab, Terraform, VMware, PostgreSQL, AWS, Kafka

What We Offer:
- Exciting Projects: We focus on industries such as high-tech, communication, media, healthcare, retail, and telecom. Our customer list is full of fantastic global brands and leaders who love what we build for them.
- Collaborative Environment: You can expand your skills by collaborating with a diverse team of highly talented people in an open, laid-back environment, or even abroad in one of our global centers or client facilities!
- Work-Life Balance: GlobalLogic prioritizes work-life balance, which is why we offer flexible work schedules, opportunities to work from home, and paid time off and holidays.
- Professional Development: Our dedicated Learning & Development team regularly organizes communication skills training (GL Vantage, Toastmasters), stress management programs, professional certifications, and technical and soft-skill training.
- Excellent Benefits: We provide our employees with competitive salaries, family medical insurance, Group Term Life Insurance, Group Personal Accident Insurance, NPS (National Pension Scheme), periodic health awareness programs, extended maternity leave, annual performance bonuses, and referral bonuses.
- Fun Perks: We want you to love where you work, which is why we host sports events, cultural activities, and corporate parties, and offer food at subsidized rates. Our vibrant offices also include dedicated GL Zones, rooftop decks, and a GL Club where you can have coffee or tea with colleagues over table games, and we offer discounts at popular stores and restaurants!

Posted 2 weeks ago

2.0 - 5.0 years

15 - 20 Lacs

Pune

Hybrid

Responsibilities:
- Monitor production systems and services using observability tools (logs, metrics, traces, dashboards)
- Respond to incidents
- Design, implement, and maintain observability solutions (e.g., Prometheus, Grafana, ELK)
- Technical operations and continuous improvement

Required candidate profile (must have):
- Experience with Azure services along with AWS
- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform
- Scripting skills in Python/Bash/PowerShell
- Familiarity with GitLab CI/CD
- Notice period: 1 month or less

Posted 2 weeks ago

3.0 - 6.0 years

4 - 6 Lacs

Kochi

Work from Office

Job brief
The Security Operations Centre (SOC) Information Security Analyst is the first line of responsibility for protecting digital assets from unauthorized access, identifying security incidents, and reporting them to customers, for both online and on-premises environments. The position monitors and responds to security events from managed customer security systems as part of a team on a rotating 24x7x365 basis. Analysts must be alert and aggressive in filtering out suspicious activity and mitigating risks before any incident occurs. Your background should include exposure to security technologies including firewalls, IPS/IDS, logging, monitoring, and vulnerability management. You should understand network security practices. Excellent customer service while solving problems should be a top priority for you.

Main Responsibilities
- Tier 2 SOC analysts are incident responders: they remediate serious attacks escalated from Tier 1, assess the scope of the attack and the affected systems, and collect data for further analysis.
- Work proactively to seek out weaknesses and stealthy attackers, and review vulnerability assessments (CVEs) on monitored assets.
- Focus on deep dives into datasets to understand what happens during and after attacks.
- Monitor security events from the various SOC entry channels (SIEM, tickets, email, and phone); based on event severity and suspicious activity, escalate to managed service support teams, Tier 3 information security specialists, and/or the customer as appropriate for further investigation and resolution.
- Work as a team lead for the SOC analysts, helping ensure that corporate data and technology platform components are safeguarded from known threats.
- Analyze events and incidents and identify the root cause.
- Assist in keeping the SIEM platform up to date and contribute to security strategies as and when new threats emerge.
- Stay up to date with emerging security threats, including applicable regulatory security requirements.
- Bring enhancements to SOC security processes, procedures, and policies.
- Document and maintain customer build documents, security procedures, and processes.
- Document incidents to contribute to incident response and disaster recovery plans.
- Review critical incident reports and scheduled weekly and monthly reports, and make sure they are technically and grammatically accurate.
- Keep up to date with new threats and vulnerabilities, and create or contribute to use cases, threat hunting, etc.
- Other responsibilities and additional duties as assigned by the security management team or service delivery manager.

Requirements:
- Minimum 3 years of experience as a SOC Analyst (experience with the ELK and Wazuh SIEM tools preferred)
- Process and procedure adherence
- General network knowledge and TCP/IP troubleshooting
- Ability to trace an endpoint on the network based on ticket information
- Familiarity with system log information and what it means
- Understanding of common network services (web, mail, DNS, authentication)
- Knowledge of host-based firewalls, anti-malware, and HIDS
- Understanding of common network device functions (firewall, IPS/IDS, NAC)
- General desktop OS and server OS knowledge
- TCP/IP, internet routing, UNIX/Linux, and Windows
- Excellent written and verbal communication skills

Skills:
- Excellent event and log analysis skills
- Proven experience in IT security monitoring or a similar role
- Exceptional organizational and time-management skills
- Very good communication abilities
- SIEM management skills (ELK, Wazuh, Splunk, ArcSight)
- Reporting

Posted 2 weeks ago

14.0 - 20.0 years

15 - 20 Lacs

Pune

Hybrid

So, what's the role all about?
We are looking for a highly skilled and motivated Site Reliability Engineering (SRE) Manager to lead a team of SREs in designing, building, and maintaining scalable, reliable, and secure infrastructure and services. You will work closely with engineering, product, and security teams to improve system performance, availability, and developer productivity through automation and best practices.

How will you make an impact?
- Build server-side software using Java.
- Lead and mentor a team of SREs; support their career growth and ensure strong team performance.
- Drive initiatives to improve the availability, reliability, observability, and performance of applications and infrastructure.
- Establish SLOs/SLAs and implement monitoring systems, dashboards, and alerting to measure and uphold system health.
- Develop strategies for incident management, root cause analysis, and postmortem reporting.
- Build scalable automation solutions for infrastructure provisioning, deployments, and system maintenance.
- Collaborate with cross-functional teams to design fault-tolerant and cost-effective architectures.
- Promote a culture of continuous improvement and reliability-first engineering.
- Participate in capacity planning and infrastructure scaling.
- Manage on-call rotations and ensure incident response processes are effective and well-documented.
- Work in a fast-paced, fluid landscape while managing and prioritizing multiple responsibilities.

Have you got what it takes?
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 10+ years of overall experience in SRE/DevOps roles, with at least 2 years managing technical teams.
- Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
- Deep understanding of cloud computing platforms (e.g., AWS), including the workings and reliability constraints of prominent services (e.g., EC2, ECS, Lambda, DynamoDB).
- Experience with infrastructure as code tools such as CloudFormation and Terraform.
- Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK).
- Working experience with the Grafana observability suite (Loki, Mimir, Tempo).
- Experience implementing OpenTelemetry in a microservices environment.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Experience with incident management and blameless postmortems, including driving incident response during outages and other critical incidents, resolution, and communication in a cross-functional team setup.

Good to have skills:
- Hands-on experience working with large Kubernetes clusters.
- Administration and/or development experience with standard monitoring and automation tools such as Splunk, Datadog, PagerDuty, and Rundeck.
- Familiarity with configuration management tools such as Ansible, Puppet, or Chef.
- Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent are an added plus.

Posted 2 weeks ago

8.0 - 10.0 years

12 - 17 Lacs

Bengaluru

Work from Office

Job Overview
We are looking for a visionary Lead DevOps Engineer with a strong background in architecting scalable and secure cloud-native solutions on AWS. This leadership role will drive DevOps strategy, design cloud architectures, and mentor a team of engineers while ensuring operational excellence and reliability across infrastructure and deployments.

The ideal candidate will:
- Architect and implement scalable, highly available, and secure infrastructure on AWS.
- Define and enforce DevOps best practices across CI/CD, IaC, observability, and container orchestration.
- Lead the adoption and optimization of Kubernetes for scalable microservices infrastructure.
- Develop standardized Infrastructure as Code (IaC) frameworks using Terraform or CloudFormation.
- Champion automation at every layer of infrastructure and application delivery pipelines.
- Collaborate with cross-functional teams (Engineering, SRE, Security) to drive cloud-native transformation.
- Provide technical mentorship to junior DevOps engineers, influencing design and implementation decisions.

Primary Skills
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 7+ years of DevOps or Cloud Engineering experience with strong expertise in AWS.
- Proven experience designing and implementing production-grade cloud architectures.
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Proficiency in building CI/CD workflows using Jenkins and/or GitHub Actions.
- Deep understanding of Infrastructure as Code using Terraform (preferred) or CloudFormation.
- Strong scripting/automation expertise in Python or Go.
- Familiarity with service mesh, secrets management, and policy as code (e.g., Istio, Vault, OPA).
- Strong problem-solving and architectural thinking skills.
- Excellent verbal and written communication skills with a track record of technical leadership.
- AWS Certified Solutions Architect (Professional/Associate), CKA/CKAD, or Terraform Associate is a plus.

Good to Have Skills
- Exposure to AI and ML.
- Exposure to cloud cost optimization and FinOps practices.

Roles and Responsibilities
- Lead the architecture and implementation of scalable, secure, and cost-efficient infrastructure solutions on AWS.
- Define Kubernetes cluster architecture, implement GitOps/Argo CD-based deployment models, and manage multi-tenant environments.
- Establish and maintain standardized CI/CD pipelines with embedded security and quality gates.
- Build and maintain reusable Terraform modules to enable infrastructure provisioning at scale across multiple teams.
- Drive observability strategy across all services, including metric collection, alerting, and logging with tools such as Prometheus, Datadog, CloudWatch, and ELK.
- Automate complex operational workflows and disaster recovery processes using Python/Go scripts and native AWS services.
- Review and approve high-level design documents and support platform roadmap planning.
- Mentor junior team members and foster a culture of innovation, ownership, and continuous improvement.
- Stay abreast of emerging DevOps and AWS trends, and drive adoption of relevant tools and practices.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

Vadodara, Gujarat

On-site

As a member of our team at Automation Anywhere, you will work with cutting-edge AI-powered process automation solutions that drive productivity and innovation across organizations. Our Automation Success Platform, which incorporates specialized AI and generative AI, offers a wide range of services including process discovery, RPA, process orchestration, document processing, and analytics. We prioritize security and governance in all our offerings so that our clients have a robust and reliable solution.

Your primary responsibilities will include designing, developing, and implementing hybrid cloud environments. You will deploy and automate infrastructure and platform services in public clouds such as AWS, GCP, and Azure using tools such as Terraform and Ansible. You will also design and manage continuous deployment using Kubernetes and Jenkins, and implement backup, recovery, and business continuity processes. Ensuring industry-standard security processes and compliance is a crucial part of the role, using public cloud services such as AWS GuardDuty, Web Application Firewall, and CloudTrail. You will monitor environments for security vulnerabilities, application performance, and service incidents, and perform troubleshooting and root cause analysis using tools such as Jira Service Desk, Datadog, Elasticsearch, and Opsgenie. You will also set up intelligent application performance alerts, collaborate with software engineering teams, and keep abreast of new technologies to enhance our systems.

Your skills should include a solid understanding of cloud-based web applications and experience with public cloud deployments, Docker, Kubernetes, automation tools such as Terraform and Ansible, networking, security technologies, continuous deployment tools, and logging and monitoring tools.
Good communication skills, the ability to work independently, and a degree in Computer Science are essential for this role. If you are passionate about driving innovation and unleashing human potential through AI-powered automation, we encourage you to apply for this exciting opportunity at Automation Anywhere.

Posted 3 weeks ago

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of cloud and big data platforms. Your role will involve representing the NADP SRE team, working in a dynamic environment, and providing technical leadership in defining and executing the team's technical roadmap. Collaborating with cross-functional teams, including software development, product management, customers, and security teams, is essential. Your contributions will directly impact the success of machine learning (ML) and AI initiatives by ensuring a robust and efficient platform infrastructure aligned with operational excellence.

In this role, you will design, build, and optimize cloud and data infrastructure to ensure the high availability, reliability, and scalability of big data and ML/AI systems. You will collaborate with cross-functional teams to create secure, scalable solutions that support ML/AI workloads and improve operational efficiency through automation. Key responsibilities include troubleshooting complex technical problems, conducting root cause analyses, and contributing to continuous improvement efforts. You will lead the architectural vision, shape the team's technical strategy and roadmap, and act as a mentor and technical leader to foster a culture of engineering and operational excellence. You will also engage with customers and stakeholders to understand use cases and feedback, translate them into actionable insights, and influence stakeholders at all levels. Strong programming skills are a crucial requirement: you will integrate software and systems engineering to build core data platform capabilities and automation that meet enterprise customer needs.
You will also develop strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at enterprise scale while enforcing engineering best practices.

Qualifications for this position include 8-12 years of relevant experience and a bachelor's degree in computer science engineering or equivalent. Candidates should be able to design and implement scalable solutions with a focus on streamlining operations. Strong hands-on cloud experience, preferably with AWS, is required, along with Infrastructure as Code skills (ideally Terraform) and experience with EKS or Kubernetes. Proficiency in observability tools such as Prometheus, Grafana, Thanos, CloudWatch, OpenTelemetry, and the ELK stack is necessary. You should be able to write high-quality code in Python, Go, or an equivalent programming language, and have a good understanding of Unix/Linux systems, system libraries, file systems, and client-server protocols. Experience building cloud, big data, and/or ML/AI infrastructure, architecting software and infrastructure at scale, and certifications in cloud and security domains are beneficial.

Cisco emphasizes diversity and encourages candidates to apply even if they do not meet every single qualification. Diverse perspectives and skills are valued, and Cisco believes that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

Posted 3 weeks ago

5.0 - 9.0 years

0 Lacs

noida, uttar pradesh

On-site

Join our Team. About This Opportunity: As a Senior Technical Engineer specializing in Zabbix, you will play a crucial role in managing Zabbix Enterprise monitoring tools and Grafana dashboards, in addition to scripting tasks. Your responsibilities will include: - Developing and maintaining cloud-based infrastructure provisioning for both customer and internal use. - Analyzing requirements and devising Zabbix monitoring solutions for new architectures, enhancements to existing setups, and integration processes. - Implementing configuration management according to the product owner's specifications and customer feedback. - Enhancing and maintaining continuous integration and deployment solutions. - Monitoring customer and internal environments while collaborating with the security team to ensure compliance with security standards. - Identifying and resolving infrastructure and application issues by implementing appropriate remediation measures. - Exploring opportunities to optimize cost-efficiency and performance. - Creating, modifying, and managing custom scripts to enhance monitoring capabilities. - Upgrading Zabbix to the latest version to stay current with industry standards. - Conducting routine health checks and audits to maintain the accuracy and effectiveness of the monitoring environment. - Documenting the architecture and collaborating with various teams such as development, integration, security, and QA. Qualifications and Skills required: - Total IT experience of 7-8+ years, with at least 3-4 years of relevant experience in Zabbix/APM tools, ELK, Jenkins, Grafana, and various applications such as web servers, mail servers, and database servers. - Proficiency in server technologies including DNS servers, DB servers, and NFS servers, with expertise in NGINX, Apache, SMTP, MySQL, MariaDB, PostgreSQL, BIND, MSSQL, Oracle, and scripting languages such as Bash, Python, or Perl.
- Hands-on experience with database engines such as MariaDB, MongoDB, MySQL, and/or PostgreSQL, along with knowledge of REST APIs. - Zabbix certification (Zabbix Certified Professional) would be considered a strong advantage. Join us in this challenging role, where you will have the opportunity to leverage your expertise in Zabbix and related technologies to drive efficient monitoring solutions and ensure the smooth operation of our infrastructure.

Posted 3 weeks ago

3.0 - 12.0 years

0 Lacs

punjab

On-site

You will be responsible for creating and implementing new threat detection content, rules, and use cases for deployment in the SIEM platform across different data sets such as Proxy, VPN, Firewall, DLP, etc. In addition, you will assist with process development and process improvement for Security Operations by creating/modifying SOPs, playbooks, and work instructions. Your role will also involve developing custom content based on threat intelligence and threat hunting results, as well as identifying gaps in the existing security controls and proposing new security controls. Your expertise in SIEM engineering and knowledge of integrating various log sources with any SIEM platform will be crucial. Furthermore, you will be expected to perform custom parsing of logs ingested into the SIEM platform. To succeed in this role, you should have at least 3 years of experience in content development and experience delivering and/or building content on SIEM tools such as Splunk, ArcSight, QRadar, Nitro ESM, etc. A deep understanding of the MITRE ATT&CK framework is essential. Experience in SOC incident analysis with exposure to information security technologies such as firewalls, VPNs, intrusion detection tools, malware tools, authentication tools, endpoint technologies, EDR, and cloud security tools is required. You should also have a good understanding of networking concepts and experience interpreting, searching, and manipulating data within enterprise logging solutions. In this role, you will be expected to have in-depth knowledge of security data logs and the ability to create new content on advanced security threats based on threat intelligence. You should be able to identify gaps in the existing security controls and have experience writing queries/rules/use cases for security analytics on platforms like ELK, Splunk, or any other SIEM platform. Familiarity with EDR tools like CrowdStrike and an understanding of TTPs like Process Injection are desirable.
Excellent communication, listening, and facilitation skills, an investigative mindset, and problem-solving abilities are essential for this role. Preferred qualifications include an understanding of the MITRE ATT&CK framework, demonstrable experience in use case/rule creation on any SIEM platform, and familiarity with Chronicle Backstory, YARA, or CrowdStrike rules.

Posted 3 weeks ago

5.0 - 9.0 years

0 Lacs

chennai, tamil nadu

On-site

As an ELK Engineer/Developer at our company in Chennai, you will be an integral part of the Cyber Defence Group. Your primary responsibility will be to understand customer requirements and then design, implement, and operate the ELK platform. Mandatory skills include a solid understanding of ELK with a focus on cybersecurity. Your daily tasks will span two areas: Design & Implementation, and ELK Operations. Under Design & Implementation, you will gather customer requirements, architect scalable ELK solutions, develop High-Level Design (HLD) and Low-Level Design (LLD) documentation, and install and configure ELK components according to best practices. Under ELK Operations, you will lead log onboarding activities, configure components such as Logstash, Filebeat, Metricbeat, and Elastic Agent for efficient data collection and processing, optimize Elasticsearch for data storage and availability, configure Kibana visualizations, handle user management, troubleshoot platform issues, collaborate with OEMs on problem resolution, document troubleshooting activities, and ensure health monitoring. Preferred qualifications include at least 5 years of experience deploying and managing large-scale ELK solutions for enterprise customers, prior experience in SOC analysis or incident response teams, a strong grasp of cybersecurity technologies, protocols, and applications, ELK certifications, and knowledge of Python scripting, Docker, Kubernetes, and Ansible for runbook automation. If you are looking to join a dynamic team and apply your ELK expertise in a cybersecurity-focused environment, we encourage you to apply for this exciting opportunity.

Posted 3 weeks ago

2.0 - 6.0 years

0 Lacs

pune, maharashtra

On-site

The ideal candidate should have 2-3 years of experience in DevOps, including a mandatory 1 year of experience in DevSecOps. The role requires working onsite in Pune on a second shift from 2 PM to 10 PM IST. Key skills for this position include proficiency in cloud technology (Azure), automation tools (Azure Kubernetes & Terraform), CI/CD pipelines (Jenkins and Azure DevOps), scripting languages (Python), monitoring tools (Prometheus / Grafana / Splunk / ELK), and security tools (Azure Active Directory). Additionally, experience in AI and GenAI would be considered a strong advantage. The selected candidate should be able to join immediately or within 2 weeks at most.

Posted 3 weeks ago

7.0 - 12.0 years

15 - 17 Lacs

Bengaluru

Work from Office

Infrastructure Management, CI/CD Implementation, Automation, Monitoring & Logging, Security Integration, Collaboration, Troubleshooting.

Posted 3 weeks ago

1.0 - 6.0 years

7 - 17 Lacs

Noida

Work from Office

Job Summary: Site Reliability Engineers (SREs) work at the intersection of software engineering and systems administration. In other words, they can both write code and manage the infrastructure on which that code runs. This is a very wide skill set, but the end goal of an SRE is always the same: to ensure that all SLAs are met, but not exceeded, so as to balance performance and reliability against operational costs. As a Site Reliability Engineer II, you will learn our systems, improve your craft as an engineer, and take on tasks that improve the overall reliability of the VP platform. Key Responsibilities: Design, implement, and maintain robust monitoring and alerting systems. Lead observability initiatives by improving metrics, logging, and tracing across services and infrastructure. Collaborate with development and infrastructure teams to instrument applications and ensure visibility into system health and performance. Write Python scripts and tools for automation, infrastructure management, and incident response. Participate in and improve the incident management and on-call process, driving down Mean Time to Resolution (MTTR). Conduct root cause analyses and postmortems following incidents, and champion efforts to prevent recurrence. Optimize systems for scalability, performance, and cost-efficiency in cloud and containerized environments. Advocate for and implement SRE best practices, including SLOs/SLIs, capacity planning, and reliability reviews. Required Skills & Qualifications: 1+ years of experience in a Site Reliability Engineer or similar role. Excellent communication skills in English. Proficiency in Python for automation and tooling. Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, New Relic, OpenTelemetry, etc. Experience with log aggregation and analysis tools such as the ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd.
Good understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes). Familiarity with infrastructure-as-code (Terraform, Ansible, or similar). Strong debugging and incident response skills. Knowledge of CI/CD pipelines and release engineering practices.

Posted 3 weeks ago

2.0 - 5.0 years

3 - 6 Lacs

Bengaluru

Work from Office

Position Purpose: ODIN is a Data Warehouse application that is changing its overall architecture and migrating to a new, more efficient data model; a few technical upgrades are also due this year. We are looking for a mid-level team member who can learn BAU quickly, bring fresh ideas to the team, and contribute to the design and development of statistical models. The team is in charge of maintaining and developing new features for 3 reporting applications as well as a Data Warehouse. These applications are designed to provide KPIs and reports for back-office, valuation, and risk control teams. Responsibilities Direct Responsibilities o Understand the needs of Business/Operations teams and propose indicators to meet their expectations. o Work on the DWH environment. o Propose, when relevant, the architecture & flow of new applications that would help them integrate with existing ones and with the overall data architecture being developed. o Develop, when relevant, different components of this architecture/application flow. o Automate recurring actions and implement tools to facilitate the support and evolution of applications. o Contribute to the evolution of the application architecture to improve the quality of service provided to users (Operations, Business). Contributing Responsibilities o Support and maintain the operational condition of various applications focused on the collection and analysis of technical and functional data. o The technical environment consists of operating systems such as Windows and Linux Red Hat / Ubuntu, various technologies (Kafka, MQ, API, ELK), and a DevOps part using Ansible, Kubernetes, Git, Jenkins, and all of the Atlassian tools.
o Create processes to onboard new projects (Infrastructure and Business). o Study and advise on technology choices for evolutions and new projects. o Implement and monitor action plans to avoid the recurrence of problems. o Technical and functional data analysis (Data Intelligence, Machine Learning, etc.). Technical & Behavioral Competencies Mandatory Technical expertise required: - Oracle Database SQL Development (+++) - Advanced PL/SQL, including partitions, performance tuning, local & global indexing, dynamic SQL, exception handling, bulk data handling, etc. - ETL concepts - Shell scripting Nice to have Technical skills: - Data Warehousing - Oracle APEX - Python - Statistical/Machine Learning Modelling. Behavioral Skills - Curiosity, analytical skills, and willingness to learn. - Good communication skills. - Quality-focused with a good eye for detail. - Capacity to work in a high-pressure environment. - Must be able to work closely with a distributed team, users, and business analysts. - Willing to share knowledge and skills with other developers within the team. - While able to work independently, should be a true team player. - Creativity and a problem-solving attitude.

Posted 3 weeks ago

7.0 - 9.0 years

20 - 22 Lacs

Chennai

Work from Office

Key skills: Java, Spring, ELK, Java multithreading. 7 to 10 years of experience in Java / Spring Boot development. Solid understanding of Java multithreading. Good exposure to ELK usage and ELK APIs. Exposure to CI/CD infrastructure, preferably Concourse. Ability to lead teams and liaise directly with customers.

Posted 3 weeks ago

3.0 - 8.0 years

20 - 25 Lacs

Mumbai

Work from Office

ODIN is a Data Warehouse application that is changing its overall architecture and migrating to a new, more efficient data model; a few technical upgrades are also due this year. We are looking for a mid-level team member who can learn BAU quickly, bring fresh ideas to the team, and contribute to the design and development of statistical models. The team is in charge of maintaining and developing new features for 3 reporting applications as well as a Data Warehouse. These applications are designed to provide KPIs and reports for back-office, valuation, and risk control teams. Responsibilities Direct Responsibilities o Understand the needs of Business/Operations teams and propose indicators to meet their expectations. o Work on the Big Data environment. o Propose, when relevant, the architecture & flow of new applications that would help them integrate with existing ones and with the overall data architecture being developed. o Develop, when relevant, different components of this architecture/application flow. o Collect data from multiple sources into the ELK stack to calculate the new indicators. The indicators could be results from time series prediction, clustering, or other supervised or unsupervised machine learning algorithms. o Automate recurring actions and implement tools to facilitate the support and evolution of applications. o Contribute to the evolution of the application architecture to improve the quality of service provided to users (Operations, Business). Contributing Responsibilities o Support and maintain the operational condition of various applications focused on the collection and analysis of technical and functional data. The technical environment consists of operating systems such as Windows and Linux Red Hat / Ubuntu, various technologies (Kafka, PostgreSQL, Hadoop, ELK, Qlik, API), and a DevOps part using Docker, Ansible, Kubernetes, Git, Jenkins, and all of the Atlassian tools.
Creation of processes to onboard new projects (Infrastructure and Business). Study of and advice on technology choices for evolutions and new projects. Implementation and monitoring of action plans to avoid the recurrence of problems. Technical and functional data analysis (Data Intelligence, Machine Learning, etc.). Technical & Behavioral Competencies Mandatory Technical expertise required: Oracle (+++) / T-SQL & PL/SQL (+++) (preferably), including optimization of stored procedures, queries, partitioning, local and global indexes, etc. Nice to have Technical skills: DWH, Python, reporting tools and ETL, ELK Stack, Kibana, ML libraries, Git and all of the Atlassian tools, Statistical/Machine Learning Modelling. Behavioral Skills: Curiosity, analytical skills, and willingness to learn. Good communication skills. Quality-focused with a good eye for detail. Capacity to work in a high-pressure environment. Must be able to work closely with a distributed team, users, and business analysts. Willing to share knowledge and skills with other developers within the team. While able to work independently, should be a true team player. Creativity and a problem-solving attitude. Specific Qualifications (if required) Skills Referential Behavioural Skills: Decision Making; Organizational skills; Critical thinking; Communication skills - oral & written. Transversal Skills: Ability to understand, explain and support change; Ability to develop and adapt a process; Ability to develop others & improve their skills; Analytical Ability; Ability to inspire others & generate people's commitment. Education Level: Bachelor Degree or equivalent. Experience Level: At least 3 years.

Posted 3 weeks ago

3.0 - 8.0 years

4 - 8 Lacs

Bengaluru

Work from Office

BNP Paribas is looking for dynamic and highly motivated individuals for the role of Application Production Support. The person will be responsible for providing level 1 and level 2 functional and technical support. The role will be challenging and will require a high level of commitment and proactiveness to maintain 24x7 availability of the applications while leveraging other sites in a follow-the-sun model. Job profile at a glance: the role will mainly focus on the functional and technical support of in-scope applications, including but not limited to monitoring, troubleshooting, resolving, and communicating production events. Events can vary in type and nature, ranging from a simple application error to an infrastructure issue such as a server crash. The main responsibility will be to resolve user queries and provide functional support while leveraging knowledge of the systems' technical architecture. Provide technical support by ensuring the smooth execution of batches and delivering production changes; manage incidents and problems within the in-scope applications. The role also needs to ensure high availability of the applications. Ensure that all relevant events and issues are recorded and resolved in a timely manner. Responsibilities Direct Responsibilities Act as the first line of contact for end users to register their queries and concerns, and provide Level 1 and Level 2 functional and technical resolutions. Coordinate with Level 3, Infra, or ADM teams as applicable for quicker resolution of issues/incidents while ensuring timely communication to end users. Monitor batches with the help of the provided tools and act proactively on failed batches while ensuring that preventive actions are implemented to avoid future failures. Proactively monitor, manage, and improve the availability and performance of the production environments, from presentation and application layers to middleware and databases, with the help of the provided tools.
Create and maintain documentation about issue resolutions and process guidelines for easy resolution of future issues. Ensure that all issues raised by users or intercepted by monitoring tools are logged in the Bank's ticket management system, ServiceNow. Monitor recurrent incidents and perform problem management to ensure a permanent fix with the help of IT partners. Coordinate with Infrastructure teams during server patching and upgrades to ensure the applications are stable and running after the infra work. Actively participate in Disaster Recovery exercises to validate the resilience of the production ecosystem for the business. Explore opportunities for innovation and automation in line with all policies. Ensure the timely delivery of changes in the staging and production ecosystems as per the agreed schedule. Handle access management for end users. Perform capacity management of in-scope applications by proactively monitoring application behavior. Alert Application Production and Development teams of any potential future risks. Provide feedback and propose solutions to management on performance improvement and capacity. Customize production tools (monitoring, batch scheduling, backups, deployment tools, automation). Adopt CIB standard tools, industrialize monitoring, and industrialize release management while seeking to reduce dependency on manual interventions by support staff, e.g., by leveraging enterprise batch scheduling and enterprise monitoring tools. Collaborate efficiently with the Dev and release management teams to automate release delivery, respecting DevOps best practices. Contributing Responsibilities Increase the productivity of the team and the company as a whole by striving for excellence. Motivate self and the team to take on new tasks. Challenge existing processes for continuous improvement.
Technical & Behavioral Competencies Technical Skills Must have: Strong experience with Windows- or Linux-based infrastructure platforms. Experience with Windows or Linux server administration. Ability to troubleshoot and solve main Linux issues, like identifying the process blocking a specific port, searching for a specific value in files, ... Strong experience with databases such as Oracle DB, Microsoft SQL Server, or any other database. Ability to create detailed technical documents / videos for sharing knowledge. Good to have (at least 3 from the list below): Experience with IIS / WebLogic / WebSphere administration. Ability to troubleshoot main IIS / .NET issues. Ability to create / maintain PowerShell scripts. Ability to create / maintain Bash scripts. RHEL experience is an asset. Ability to troubleshoot performance issues. Ability to monitor database performance using monitoring tools or SQL queries. Experience with Java application servers like IBM WebSphere or Apache Tomcat. Ability to configure a Java application server, notably: JVM settings, HTTP connector configuration, database connection pool, MQ connection pool. Experience with Java / .NET applications. Ability to troubleshoot memory issues. Perform memory and thread dumps with Java monitoring tools (Dynatrace, Geneos). Ability to troubleshoot performance issues. Investigate application slowness with proper monitoring tools (Dynatrace, Autosys, Geneos). Experience with scripting languages such as Python, Groovy, Bash, PowerShell.
Ability to write a small sample application to validate configuration / connectivity. Ability to write / maintain deployment scripts written in Ansible (Python) and Jenkins Pipeline (Groovy). Ability to write / maintain scripts in Bash or PowerShell to execute directly on Linux / Windows hosts. Experience with the ELK stack: ability to collect data and send it to Elasticsearch, and ability to create reports and dashboards in Kibana. Behavioral Skills Must have: Ability to provide step-by-step functional and technical help, both written and verbal. Ability to diagnose and troubleshoot technical and functional issues. Strong coordination and assignment skills, with the ability to challenge the status quo and be proactive. Ability to collaborate / work as a team in a multi-location distributed environment. Problem-solving and communication skills. Ability to train / mentor juniors in the team. Transformational mindset with strong motivation for automation. Enthusiastic to work in a challenging environment. Decision making. Client focused. Transversal Skills: Ability to understand, explain and support change; Analytical Ability; Ability to develop others & improve their skills; Ability to develop and leverage networks. Specific Qualifications (if required) Skills Referential Behavioural Skills: Ability to collaborate / Teamwork; Decision Making; Ability to deliver / Results driven; Ability to share / pass on knowledge. Transversal Skills: Ability to understand, explain and support change; Ability to develop and adapt a process; Analytical Ability; Ability to set up relevant performance indicators; Ability to develop others & improve their skills. Education Level: Bachelor Degree or equivalent. Experience Level: At least 3 years.

Posted 3 weeks ago

15.0 - 20.0 years

9 - 14 Lacs

Mumbai

Work from Office

This position is for a Site Reliability Engineer within the Client Engagement and Protection (CEP) APS team. The primary purpose is to be accountable for all core engineering / transformation activities of ISPL Transversal CEP APS. Responsibilities Direct Responsibilities: Automate away toil using a combination of scripting, tooling, and process improvements. Drive transformation strategies involving infrastructure hygiene / end of life. Implement new technologies or processes to improve efficiency and reduce costs, e.g., CI/CD implementation. Monitor system performance and capacity levels to ensure high availability of applications with minimal downtime. Investigate any service disruptions or other service issues to identify their causes. Perform regular audits of computer systems to check for signs of degradation or malfunction. Develop and implement new methods of measuring service quality and customer satisfaction. Conduct capacity planning to ensure that new technologies can be accommodated without impacting existing users. Conduct post-mortem examinations of failed systems to identify and address the root cause. Drive various Automation, Monitoring & Tooling common-purpose initiatives across CEP APS and other teams within CIB APS. Accountable for the generation, reporting, and improvement of various production KPIs, SLs, and dashboards for APS teams. Accountable for service improvements and for presentations to all governance bodies and steering committees. Accountable for the maintenance and improvement of IT continuity plans (ICP). Contributing Responsibilities Technical & Behavioral Competencies: Strong knowledge of DevOps methodology and toolsets. Strong knowledge of cloud-based applications/services. Strong knowledge of APM tools, e.g., Dynatrace / AppDynamics. Strong distributed computing and database technologies skill set. Strong knowledge of Jenkins, Ansible, Python, scripting, etc. Good understanding of log aggregators, e.g., Splunk / ELK. Good understanding of observability tools, e.g., Grafana / Prometheus.
Ability to work with various APS, Development, and Operations stakeholders, locally and globally. Dynamic, proactive, and teamwork oriented. Independent, self-starter, and fast learner. Good communication and interpersonal skills. Practical knowledge of change, incident & problem management tools. Innovative and transformational mindset. Flexible attitude. Ability to perform under pressure. Strong analytical skills. Preferred to have: ITIL, Docker/Kubernetes. Prior background in Site Reliability Engineering / DevOps / Application Production Support / Development. Specific Qualifications (if required): Graduate in any discipline or Bachelor in Information Technology; 15 years of IT experience. Skills Referential Behavioural Skills: Ability to collaborate / Teamwork; Creativity & Innovation / Problem solving; Ability to deliver / Results driven; Communication skills - oral & written. Transversal Skills: Ability to manage a project; Ability to set up relevant performance indicators; Ability to anticipate business / strategic evolution; Ability to develop and adapt a process; Analytical Ability. Education Level: Bachelor Degree or equivalent. Experience Level: At least 15 years.

Posted 3 weeks ago

15.0 - 20.0 years

7 - 11 Lacs

Bengaluru

Work from Office

Responsibilities Direct Responsibilities Essential: Strong skills in SAN storage technologies (VMAX/PMAX, Pure Storage, Brocade). Strong skills in NAS storage technologies (NetApp, VNX). Strong skills in S3 (Cleversafe and ECS). Overall storage experience of 15+ years. Team-oriented and collaborative mindset to develop and deliver appropriate solutions. Experience delivering strategic priorities within strict timelines. Customer service-oriented skills. Capacity to learn new technologies autonomously. Contributing Responsibilities DevOps oriented: strong development skills, scripting in Python / Bash / PowerShell, code versioning tools, and CI/CD. Desirable: Skills in EMC Unity. Big data knowledge: ELK and distributed workloads. Stateful containers and Kubernetes knowledge. Technical & Behavioral Competencies 1. Personal Attributes: Eager to learn. Analytical mindset. Ability to work well under pressure and to work both autonomously and in a team environment. Good interpersonal and communication skills. Curious and open-minded. Specific Qualifications (if required): A bachelor's in computer science is preferable but not mandatory. An ITIL Foundation certificate is desirable. LANGUAGES: Fluent spoken and written English (B2/C) is mandatory.

Posted 3 weeks ago

8.0 - 13.0 years

11 - 15 Lacs

Mumbai

Work from Office

We are seeking an experienced Application Production Support engineer with platform support experience to join our team focused on production infrastructure management. The ideal candidate will have strong skills in automation, monitoring, troubleshooting, and incident management across a variety of tools and platforms. This position is ideal for an engineer who enjoys working in a dynamic, fast-paced environment, has a passion for production support, infrastructure optimization, and automation, and is inclined towards problem solving. Responsibilities Direct Responsibilities Automation & Scripting: Drive automation initiatives using Ansible, shell scripting, and/or Python to optimize operational workflows, deployments, and system configuration. Middleware & Application Debugging: Monitor, debug, and maintain platforms such as IIS / Apache to ensure stable application operation and uptime. Support production migrations to cloud / virtual machines to enhance system performance and reliability. Production Support & Incident Management: Manage and resolve incidents on IIS / Apache and database components, focusing on areas such as long-running queries and server performance. Perform root cause analysis (RCA) and implement corrective actions to prevent future occurrences. Handle MQ troubleshooting and performance issues/failures. Lead deployments and manage certificate renewals, ensuring seamless deployments and minimizing downtime. Monitoring & Alerting: Set up and manage Dynatrace alerts, analyze performance during incidents, and create and maintain dashboards that provide real-time insights. Exposure to building custom dashboards in Grafana for production system visibility. Capacity Planning: Analyze current and projected application capacity to ensure adequate resources are provisioned. Plan for capacity upgrades and scaling strategies to meet future demand.
Log Management & Analysis: Exposure to ELK/OpenSearch for log analysis, dashboarding, and troubleshooting, enabling faster root cause identification and resolution. File Transfer as a Service: Manage and support secure, reliable file transfer solutions across production systems. Documentation & Process Management: Skilled in documenting processes, incident reports, and application configurations for reference and compliance. Strong attention to detail to maintain an accurate and up-to-date KeDB. Exposure to Cloud & Containerization: Provide high-level support and guidance on cloud architecture, Kubernetes, and containerized environments, enhancing system scalability and modernization. Exposure to DevOps & CI/CD: Familiarity with DevOps practices and tools (e.g., Jenkins) for automated deployments and configuration management. Understanding of CI/CD pipelines and version control to manage application releases and updates. Technical & Behavioral Competencies Required Skills and Qualifications: Proven experience in Application Production Support / Platform Management. Strong knowledge of monitoring / log aggregation tools, including Dynatrace, Geneos, Grafana, and the ELK stack. Hands-on experience with automation using Ansible / shell scripting / PowerShell / Python. Proficiency in managing incidents and performing root cause analysis on IIS / Apache and database environments. Familiarity with Jenkins for continuous integration and deployment, as well as certificate management and renewal processes. Exposure to SQL for data extraction, debugging, and performance tuning. Exposure to cloud architectures, Kubernetes, and containerized infrastructure.
Preferred to have: ITIL, Docker/Kubernetes. Prior background in Application Production Support / Platform / Development. Skills Referential Behavioural Skills: Ability to deliver / Results driven; Communication skills - oral & written; Creativity & Innovation / Problem solving; Personal Impact / Ability to influence. Transversal Skills: Ability to develop and adapt a process; Ability to anticipate business / strategic evolution; Ability to manage / facilitate a meeting, seminar, committee, training; Ability to understand, explain and support change; Ability to develop others & improve their skills.

Posted 3 weeks ago
