The Production Infrastructure Manager is responsible for overseeing and maintaining the infrastructure that powers our payment gateway systems in a high-availability production environment. This role requires deep technical expertise in cloud platforms, networking, and security, along with strong leadership capability to guide a team of infrastructure engineers. You will ensure the system’s reliability, performance, and compliance with regulatory standards while driving continuous improvement.
Key Responsibilities
Infrastructure Management
- Manage and optimize infrastructure for payment gateway systems to ensure high availability, reliability, and scalability.
- Oversee daily operations of production environments, including AWS cloud services, load balancers, databases, and monitoring systems.
- Implement and maintain infrastructure automation, provisioning, configuration management, and disaster recovery strategies.
- Develop and maintain capacity planning, monitoring, and backup mechanisms to support peak transaction periods.
- Oversee regular patching, updates, and version control to minimize vulnerabilities.
Team Leadership
- Lead and mentor a team of infrastructure engineers and administrators.
- Provide technical direction to ensure efficient and effective implementation of infrastructure solutions.
Cross-Functional Collaboration
- Work closely with development, security, and product teams to ensure infrastructure aligns with business needs and regulatory requirements (PCI-DSS, GDPR).
- Ensure infrastructure practices meet industry standards and security requirements (PCI-DSS, ISO 27001).
Monitoring & Incident Management
- Monitor infrastructure performance using tools such as Prometheus, Grafana, and Datadog.
- Conduct incident response, root cause analysis, and post-mortems to prevent recurring issues.
- Manage and execute on-call duties, ensuring timely resolution of infrastructure-related issues.
Documentation
- Maintain comprehensive documentation, including architecture diagrams, processes, and disaster recovery plans.
Required Skills and Qualifications
- Bachelor’s degree in Computer Science, IT, or equivalent experience.
- 8+ years of experience managing production infrastructure in high-availability, mission-critical environments (fintech or payment gateways preferred).
- Expertise in AWS cloud environments.
- Strong experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
- Deep understanding of:
  - Networking (load balancers, firewalls, VPNs, distributed systems)
  - Database systems (SQL/NoSQL), HA & DR strategies
  - Automation tools (Ansible, Chef, Puppet) and containerization/orchestration (Docker, Kubernetes)
  - Security best practices, encryption, vulnerability management, PCI-DSS compliance
- Experience with monitoring tools (Prometheus, Grafana, Datadog).
- Strong analytical and problem-solving skills.
- Excellent communication and leadership capabilities.
Preferred
- Experience in fintech/payment industry with regulatory exposure.
- Ability to operate effectively under pressure and ensure service continuity.
Skills: Linux/Unix, SQL, Shell Scripting, Amazon Web Services (AWS), CI/CD, Jenkins, and Customer Support