Job Description
Role & Responsibilities:

Cloud Infrastructure Management
- Design, implement, and maintain scalable and secure AWS cloud solutions
- Provision and manage AWS resources including EC2 instances, S3 buckets, and VPCs
- Implement auto-scaling solutions to ensure resource availability during peak demand

Containerization and Orchestration
- Design and implement containerization strategies using Docker
- Develop and maintain Dockerfiles for application containerization
- Set up and manage Kubernetes clusters on AWS (e.g., using Amazon EKS)
- Implement Kubernetes deployments, services, and ingress configurations
- Optimize container resource allocation and performance

CI/CD Pipeline Development
- Design, develop, and manage CI/CD pipelines using tools like AWS CodePipeline or Jenkins
- Integrate Docker and Kubernetes into CI/CD workflows for containerized applications
- Automate build, test, and deployment processes to accelerate release cycles
- Monitor CI/CD processes for potential failures and bottlenecks

Observability and Traceability Implementation
- Design and implement comprehensive observability solutions across the entire application stack
- Set up and maintain distributed tracing systems (e.g., Jaeger, Zipkin) for microservices architectures
- Implement logging solutions for cloud resources and containerized applications
- Develop and maintain metrics collection and visualization using tools like Prometheus and Grafana
- Create and maintain dashboards for real-time system health monitoring
- Implement alerting mechanisms to proactively identify and address issues

Cost Optimization
- Analyze AWS spending using tools like AWS Cost Explorer
- Implement cost-saving measures such as utilizing Reserved Instances for long-running workloads
- Continuously optimize cloud and container resource usage to minimize expenses while maintaining performance

Security and Compliance

Automation and Scripting
- Develop scripts and tools to automate repetitive tasks and streamline operations
- Utilize Infrastructure as Code (IaC) tools like CloudFormation or Terraform
- Create Kubernetes manifests and Helm charts for application deployment

Monitoring and Troubleshooting
- Set up and maintain monitoring systems using tools like AWS CloudWatch and Kubernetes-native solutions
- Leverage observability data for efficient troubleshooting and performance optimization
- Perform root cause analysis and resolve production issues promptly in both cloud and container environments

Collaboration and Communication
- Work closely with development and operations teams to integrate DevOps, containerization, and observability practices
- Provide technical guidance and mentorship to team members on AWS, Docker, Kubernetes, and observability best practices

Skills & Qualifications:
- Bachelor's degree in Computer Science or related field
- Extensive experience with AWS services and cloud architecture
- Strong knowledge of Linux-based infrastructure
- Proficiency in scripting languages such as Python, Ruby, or Bash
- Experience with CI/CD tools like Jenkins, GitLab CI, or AWS CodePipeline
- Expertise in Docker containerization and Kubernetes orchestration
- Experience with container registries and image management
- Familiarity with service mesh technologies (e.g., Istio)
- Strong background in implementing and maintaining observability solutions
- Experience with distributed tracing systems (e.g., Jaeger, Zipkin)
- Proficiency in metrics collection and visualization tools (e.g., Prometheus, Grafana)
- Knowledge of database management (MySQL, MongoDB)
- Strong problem-solving and troubleshooting skills
- Excellent communication and teamwork abilities
- AWS certifications are a plus

Additional Skills:
- Experience with configuration management tools (Ansible, Puppet, Chef)
- Knowledge of Agile methodologies
- Familiarity with version control systems, preferably Git
- Experience with Helm for Kubernetes package management
- Knowledge of container security best practices
- Familiarity with log aggregation and analysis tools (e.g., ELK stack)