Do you like collaborating across teams to solve complex problems?
Do you enjoy solving large-scale distributed content delivery challenges?
Join our critical Platform and Reliability Engineering Team!
The Platform & Reliability Engineering team defines, measures, and optimizes key performance indicators for Akamai's global network. This role involves analyzing complex systems and identifying the metrics most critical to customer satisfaction. The team collaborates with many other teams to address intricate problems that require innovative solutions, and its expertise in software engineering and systems administration supports building resilient, robust infrastructure.
Partner with the best
As a Senior Site Reliability Engineer, you will influence teams through your expertise and support decision-making processes. You will develop automation to streamline daily operations, escalations, and proactive monitoring efforts. In this role, you will be responsible for:
- Working on Internet technologies to improve the performance, availability, and scalability of large distributed content delivery systems.
- Designing, implementing, and tuning monitoring and observability systems to meet defined SLIs and SLOs.
- Acting as an escalation point for support, platform, and product teams to ensure system issues are resolved.
- Leading incident response, using coding, data analysis, network diagnostics, and debugging tools for distributed systems.
- Collaborating with support, operations, and engineering teams to investigate issues and implement solutions that prevent recurrence.
- Collaborating with engineering and product teams to enhance reliability, scalability, performance, and usability of offerings.
- Identifying potential problems and creating scalable solutions that continuously improve quality of service (QoS).
- Staying updated on advancements in cloud computing, DevOps, and SRE best practices.
Do What You Love
To be successful in this role, you will:
- Have a background in Computer Science, Engineering, or a related field, with 5+ years of experience in Site Reliability Engineering.
- Demonstrate expertise in coding and scripting languages such as C/C++, Python, Bash, JavaScript, etc.
- Demonstrate expert troubleshooting in UNIX/Linux environments, emphasizing scalability and reliability in distributed systems.
- Demonstrate expertise in internet protocols and networking: DNS, HTTP/HTTPS, UDP, TCP/IP, TLS/SSL.
- Demonstrate expertise with observability tools such as Prometheus, Grafana, ADBMS, and Datadog for SLI/SLO management.
- Have working knowledge of cloud platforms (Azure, Databricks).
Build your career at Akamai
Our ability to shape digital life today relies on developing exceptional people like you. The kind that can turn impossible into possible. We’re doing everything we can to make Akamai a great place to work. A place where you can learn, grow, and have a meaningful impact. With our company moving so fast, it’s important that you’re able to build new skills, explore new roles, and try out different opportunities. There are so many different ways to build your career at Akamai, and we want to support you as much as possible. We have all kinds of development opportunities available, from programs such as GROW and Mentoring, to internal events like the APEX Expo and tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.
Learn more
Not sure if this job is the right match for you, or want to learn more about the job before you apply? Schedule a 15-minute exploratory call with the recruiter, who will be happy to share more details.