As a Principal Software Developer, you will design, build, and deliver scalable automation frameworks and advanced platforms leveraging AI/ML to drive operational excellence across OCI's global network. This includes working with network event-driven data (such as failures), hybrid classification, and both model training and inference. You are passionate about developing software that solves real-world operational challenges, thrive in a fast-paced team, and are comfortable working with complex distributed systems. You value simplicity, scalability, and collaboration.
Responsibilities:
- Architect, build, and support distributed systems for process control and execution based on Product Requirement Documents (PRDs).
- Develop and sustain DevOps tooling, new product process integrations, and automated testing.
- Develop ML in Python 3; build backend services in Go (Golang); create command-line interface (CLI) tools in Rust or Python 3; and integrate with other services as needed using Go, Python 3, or C.
- Build and maintain schemas/models to ensure every platform and service write is captured for monitoring, debugging, and compliance.
- Build and maintain dashboards that monitor the quality and effectiveness of service execution for the "process as code" your team delivers.
- Build automated systems that route code failures to the appropriate on-call engineers and service owners.
- Ensure high availability, reliability, and performance of developed solutions in production environments.
- Support the development of serverless workflows that call and utilize the above-mentioned services to support our GNOC, GNRE, and onsite operations and hardware support teams.
- Participate in code reviews, mentor peers, and help build a culture of engineering excellence.
- Operate in an Extreme Programming (XP) asynchronous environment (chat/tasks) without daily standups, and keep work visible by continuously updating task and ticket states in Jira.
Required Qualifications:
- 8 to 10 years of experience in process as code, software engineering, automation development, or similar roles
- Bachelor's degree in Computer Science and Engineering or a related engineering field
- Strong coding skills in Go and Python 3
- Experience with distributed systems, microservices, and cloud-native technologies
- Proficiency in Linux environments and scripting languages
- Proficiency with database creation, maintenance, and programming using SQL and Go or Python 3 libraries
- Understanding of network operations or large-scale IT infrastructure
- Excellent problem-solving, organizational, and communication skills
- Experience using AI coding assistants or AI-powered tools to help accelerate software development, including code generation, code review, or debugging
Preferred Qualifications:
- Process engineering experience (control systems, proportional-integral-derivative (PID) control, statistical process control (SPC))
- Proficiency with data modeling, data analysis, and reporting frameworks (e.g., SQL, Spark, Prometheus, Grafana)
- Experience with C, C++, Java, or Rust
- Experience developing automation and tools for network or large-scale cloud operations
- Background in creating dashboards, alerts, and real-time reporting platforms
- Familiarity with workflow automation (e.g., Apache Airflow), CI/CD pipelines, or infrastructure as code
- Previous experience supporting or building tools for (any) hyperscale or large-scale cloud network, compute, or storage operations
- Knowledge of REST APIs, remote procedure calls (RPCs), and service-oriented architectures (SOA)
- Familiarity with Extreme Programming (XP), Agile, and DevOps processes
- Experience with ticketing and version control systems (e.g., Jira, Git)