4.0 - 8.0 years
0 Lacs
pune, maharashtra
On-site
As a Senior Systems Engineer specializing in Data DevOps/MLOps, you will play a crucial role in our team by leveraging your expertise in data engineering, automation for data pipelines, and operationalizing machine learning models. This position requires a collaborative professional who can design, deploy, and manage CI/CD pipelines for data integration and machine learning model deployment. You will be responsible for building and maintaining infrastructure for data processing and model training using cloud-native tools and services. Your role will involve automating processes for data validation, transformation, and workflow orchestration, ensuring seamless integration of ML models into production. You will work closely with data scientists, software engineers, and product teams to optimize performance and reliability of model serving and monitoring solutions. Managing data versioning, lineage tracking, and reproducibility for ML experiments will be part of your responsibilities. You will also identify opportunities to enhance scalability, streamline deployment processes, and improve infrastructure resilience. Implementing security measures to safeguard data integrity and ensure regulatory compliance will be crucial, along with diagnosing and resolving issues throughout the data and ML pipeline lifecycle. To qualify for this role, you should hold a Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field, along with 4+ years of experience in Data DevOps, MLOps, or similar roles. Proficiency in cloud platforms like Azure, AWS, or GCP is required, as well as competency in using Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible. Expertise in containerization and orchestration technologies like Docker and Kubernetes is essential, along with a background in data processing frameworks such as Apache Spark or Databricks. 
Skills in Python programming, including proficiency in data manipulation and ML libraries like Pandas, TensorFlow, and PyTorch, are necessary. Familiarity with CI/CD tools such as Jenkins, GitLab CI/CD, or GitHub Actions, as well as understanding version control tools like Git and MLOps platforms such as MLflow or Kubeflow, will be valuable. Knowledge of monitoring, logging, and alerting systems (e.g., Prometheus, Grafana), strong problem-solving skills, and the ability to contribute independently and within a team are also required. Excellent communication skills and attention to documentation are essential for success in this role. Nice-to-have qualifications include knowledge of DataOps practices and tools like Airflow or dbt, an understanding of data governance concepts and platforms like Collibra, and a background in Big Data technologies like Hadoop or Hive. Qualifications in cloud platforms or data engineering would be an added advantage.
Posted 4 days ago
5.0 - 9.0 years
0 Lacs
karnataka
On-site
As a Senior Site Reliability Engineer for the Operational Readiness team at HashiCorp, you play a crucial role in enhancing the scalability, performance, and reliability of our cloud products. With over 5 years of experience in site reliability engineering or a related field, you lead efforts to identify performance bottlenecks, address operational challenges proactively, and ensure our services meet the highest standards of operational excellence. Your expertise in load testing, performance analysis, and system hardening is instrumental in maintaining the operational resilience of our enterprise and cloud-based products. You focus on ensuring high availability and performance across all of HashiCorp's offerings, with a holistic view of enterprise and cloud systems. In this role, you define and execute test plans, develop system-wide strategies for product load and performance testing, and explore new avenues to meet essential operational readiness criteria. You utilize troubleshooting techniques like Chaos engineering to identify and provide novel solutions for complex system issues that may impact customers.
Key Responsibilities:
- Implement best practices for system reliability, including proactive identification of potential failure points and automated mitigations.
- Design and execute comprehensive load testing strategies to identify performance bottlenecks and scalability limits.
- Improve system resilience by implementing best practices and technologies for high availability and fault tolerance.
- Collaborate with engineering and product teams to integrate operational readiness into the development lifecycle.
- Build tools and frameworks for automated testing, environment simulation, and incident reproduction to increase test coverage.
- Analyze testing results, document findings, and make actionable recommendations for system enhancements.
- Drive systemic improvements through Chaos Testing and work closely with product development teams.
- Share knowledge and expertise with team members, promoting a culture of learning and continuous improvement.
- Develop and implement disaster recovery and backup strategies to ensure data integrity and system resilience.
Ideal Candidate:
- 5+ years of experience in SRE, systems engineering, or non-functional testing roles with a focus on operational readiness and performance testing.
- Proficiency in high-level programming languages or scripting.
- Track record of leading successful load testing and performance optimization initiatives in cloud and on-prem environments.
- Experience in creating and managing test environments for automated testing.
- Strong understanding of CI/CD processes and maintaining quality pipelines.
- Familiarity with version control systems (e.g., Git) and agile project management methodologies.
- Knowledge of monitoring and alerting systems, with the ability to develop metrics and alarms reflecting system health and operational risks.
- Technical foundation in cloud technologies (AWS, Azure, or GCP) and container technologies like Nomad or Kubernetes.
- Experience with performance testing tools like K6, Artillery, Vegeta, Locust, etc.
- Effective communication and collaboration skills with cross-functional teams and diverse audiences.
- Familiarity with HashiCorp products and tools is a plus.
- Exposure to the disaster recovery domain is also a plus.
Posted 4 days ago
5.0 - 9.0 years
0 Lacs
pune, maharashtra
On-site
We are looking for a skilled and experienced Senior Network Engineer to join our team. The ideal candidate should have 5+ years of hands-on experience in managing and supporting enterprise network infrastructure, with expertise in routing, switching, firewalls, VPNs, and network security protocols. Your role will involve designing, implementing, and maintaining complex networks to ensure high availability, performance, and security for business-critical systems. It is essential to have experience with protocols like BGP, OSPF, and technologies such as load balancing and network monitoring tools. Experience with Juniper Hardware is a must, and relevant certifications such as CCNP, NSE, PCNSE, or JNCIA are preferred. As a Senior Network Engineer, your responsibilities will include managing IP connectivity and latency for all data center networks, configuring BGP transit and private peering, collaborating with the Architect team on networking solutions, and overseeing WAN installations. You will also be tasked with improving network performance, monitoring and scaling network bandwidth, implementing future service data centers, planning complex network upgrades and migrations, managing network security, and maintaining network hardware and software. We would like you to have at least five years of experience with Juniper Enterprise Routers and Switches, routing protocols such as BGP, OSPF, VRRP, and traffic engineering, network-based ACLs, policy-based routing, firewall management, NAT, VLANs, and switching. Your ability to generate and maintain technical documentation and network diagrams, work with carrier circuits, troubleshoot complex network issues, and contribute to monitoring and alerting systems is crucial. Experience with network automation tools, DevOps environments, and security protocols is highly desirable, along with operational knowledge of flow-based technologies, IPv6, and UNIX/Linux OS Networking. 
For this role, a Bachelor's degree in engineering or an equivalent degree from a well-known institute/university is required. PubMatic operates on a hybrid work schedule, with employees working three days in the office and two days remotely to maximize collaboration and productivity. Our benefits package includes paternity/maternity leave, healthcare insurance, broadband reimbursement, catered lunches, and more. Join PubMatic, a leading digital advertising platform, and be part of a team dedicated to providing transparent advertising solutions to publishers, media buyers, commerce companies, and data owners. Founded in 2006, PubMatic enables content creators to run a profitable advertising business that supports multi-screen and multi-format content demanded by consumers.
Posted 4 days ago
3.0 - 7.0 years
0 Lacs
surat, gujarat
On-site
As a Golang Developer, you will be responsible for designing, developing, and maintaining efficient, reusable, and reliable Go code. Your role will involve implementing and integrating with back-end services, databases, and APIs while ensuring clean, scalable, and testable code following best practices and design patterns. Collaborating with cross-functional teams to define, design, and ship new features will be a key part of your responsibilities. You will also play a crucial role in optimizing application performance for maximum speed and scalability, as well as identifying and addressing bottlenecks and bugs, devising effective solutions to these issues. Staying up-to-date with the latest industry trends, technologies, and best practices is essential for excelling in this role. To be successful in this position, you should have proven experience as a Golang Developer or in a similar software development role. Proficiency in the Go programming language, including paradigms, constructs, and idioms, is a must. Experience with server-side development, microservices architecture, and RESTful APIs is required, along with familiarity with common Go frameworks and tools such as Gin. Knowledge of implementing monitoring, logging, and alerting systems, as well as experience with SQL and NoSQL databases like PostgreSQL, MySQL, and MongoDB, is essential. Understanding of code versioning tools such as Git, a strong grasp of concurrency and parallelism in Go, excellent problem-solving skills, and attention to detail are also key qualifications for this role. The ability to work effectively both independently and as part of a team is crucial for success in this position. Additionally, experience with cloud platforms like AWS, GCP, or Azure, as well as familiarity with containerization technologies such as Docker and Kubernetes, would be considered a bonus. 
Join us in this exciting opportunity to contribute to cutting-edge technology solutions and enhance your skills as a Golang Developer.
Posted 1 week ago
2.0 - 6.0 years
0 Lacs
surat, gujarat
On-site
As a Golang Developer, your primary responsibility will be to design, develop, and maintain efficient, reusable, and reliable Go code. You will be required to implement and integrate with back-end services, databases, and APIs while ensuring that the code is clean, scalable, and testable following best practices and design patterns. Collaboration with cross-functional teams to define, design, and ship new features is crucial, along with optimizing application performance for maximum speed and scalability. Identifying and addressing bottlenecks and bugs, and devising solutions to these problems will also be part of your role. It is essential to stay up-to-date with the latest industry trends, technologies, and best practices. To qualify for this role, you should have proven experience as a Golang Developer or in a similar role in software development. Proficiency in the Go programming language, including paradigms, constructs, and idioms, is necessary. You must have experience with server-side development, microservices architecture, and RESTful APIs, along with familiarity with common Go frameworks and tools such as Gin. Knowledge of implementing monitoring, logging, and alerting systems, as well as experience with SQL and NoSQL databases like PostgreSQL, MySQL, and MongoDB, is required. Understanding of code versioning tools like Git, strong grasp of concurrency and parallelism in Go, excellent problem-solving skills, and attention to detail are essential. The ability to work effectively both independently and as part of a team is also a key requirement. Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with containerization technologies like Docker and Kubernetes would be considered a bonus. If you are passionate about Golang development and possess the necessary skills and qualifications, we encourage you to apply for this exciting opportunity.,
Posted 2 weeks ago
1.0 - 5.0 years
0 Lacs
karnataka
On-site
About HashiCorp
HashiCorp solves development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications. Our products enable companies large and small to mix and match AWS, Microsoft Azure, Google Cloud, and other clouds as well as on-premises environments, easing their ability to deliver new applications. We use the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.
Our Team
The HashiCorp Incident Excellence team is responsible for improving HashiCorp's incident response while maximizing learning from incidents. Our focus is on helping all engineers feel confident when they are on-call and improving communication to efficiently resolve incidents and build trust in our brand. We partner closely with teams to drive a holistic incident management strategy and share learnings to help our business continuously improve.
About This Role
This engineering role is on a nascent engineering team. The team is responsible for products that touch many areas of engineering organizations at HashiCorp, so applicants will need to excel at collaboration, have product-focused mindsets, and be comfortable iterating in an agile manner towards solutions. You will provide expert execution of the incident command process, including running and managing high-severity incident bridges and driving transparent communication that promotes maximum levels of internal and external customer satisfaction.
- Collaborate with an array of technical stakeholders and executives to drive resolution during incidents and improve overall response for future incidents and technical escalations.
- Utilize top-notch troubleshooting techniques to identify, organize, and advocate for novel solutions to remediate customer impact on complex interconnected systems.
- Participate in a closed-loop post-incident learning process driving insights and meaningful action.
- Iterative improvements in response through consistent drills, tabletops, and game-day exercises.
- Push the boundaries of innovation in incident management to deliver best-in-class incident response.
In This Role, You Can Expect To
- Be responsible for and drive incident management capabilities and culture.
- Contribute to incident command on-call.
- Build technical skills and relationships within a team of engineers and SREs.
- Lead and refine our incident response strategy, ensuring rapid and effective response to operational disruptions.
- Analyze incident trends and root causes to drive continuous improvements in system reliability and response processes.
- Develop and maintain tools for incident detection, analysis, and resolution, automating responses where possible to minimize human intervention.
- Create comprehensive incident response documentation and conduct training sessions to prepare all relevant teams for effective incident handling.
- Work closely with development, operations, and security teams to coordinate incident response efforts and post-incident analyses.
You may be a good fit for our team if:
- 5+ years of experience in site reliability engineering, systems administration, or software engineering, with a significant focus on incident response and operational reliability.
- 1+ years managing, coordinating, and ensuring resolution of major incidents.
- Professional experience with incident management in cloud environments.
- Enjoy working on a variety of scopes spanning software engineering, cloud infrastructure, and SRE.
- Proven track record of managing and resolving incidents in cloud-based environments, with expertise in major public cloud platforms (AWS, GCP, Azure).
- Understanding of fundamental network technologies like DNS, Load Balancing, SSL, TCP/IP, HTTP.
- Strong understanding of monitoring and alerting systems, with the ability to develop metrics and alarms that accurately reflect system health and operational risks.
- Experience with incident management tools and practices, including post-mortem analysis and root cause investigation.
- Passion for consistently responding to and leading complex incidents in a 24x7x365 environment utilizing a globalized follow-the-sun model.
- Customer-centric attitude with a focus on providing best-in-class incident response for customers and stakeholders.
- Familiarity with HashiCorp's product suite and infrastructure automation tools is a plus.
- Demonstrate strong leadership skills during periods of significant business impact, remaining calm and professional during high-pressure situations.
- A strong desire to drive customer success with partner teams and management on high-profile issues critical to the long-term success of the business.
- Outstanding verbal and written communication skills with the ability to convey information in a meaningful way to both engineers and executive-level management, during and outside of incidents.
- Adaptable to a wide variety of technologies and capable of incident response and troubleshooting activities in complex interconnected environments.
Posted 3 weeks ago
8.0 - 12.0 years
0 Lacs
karnataka
On-site
Job Summary
We are seeking a highly skilled Sr. R2 Engineer with 8 to 10 years of experience in SRE DevOps and SRE Concepts. The ideal candidate will be responsible for ensuring the reliability and performance of our systems. This role is hybrid with day shifts and no travel required. The candidate will play a crucial role in maintaining our infrastructure and improving our operational processes.
Responsibilities
- Lead the implementation of SRE DevOps practices to enhance system reliability and performance.
- Oversee the development and maintenance of automation tools to streamline operational processes.
- Provide expertise in SRE Concepts to design and implement robust monitoring and alerting systems.
- Collaborate with cross-functional teams to identify and resolve system issues promptly.
- Ensure the scalability and reliability of our infrastructure through continuous improvement initiatives.
- Develop and maintain documentation for operational procedures and best practices.
- Monitor system performance and implement proactive measures to prevent downtime.
- Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence.
- Mentor and guide junior engineers in SRE best practices and methodologies.
- Participate in on-call rotations to provide 24/7 support for critical systems.
- Drive the adoption of new technologies and tools to improve operational efficiency.
- Work closely with development teams to ensure seamless integration of new features and services.
- Contribute to the overall improvement of our SRE work model by providing feedback and suggestions.
Qualifications
- Must have extensive experience in SRE DevOps and SRE Concepts.
- Should possess strong knowledge of automation tools and practices.
- Must have experience in designing and implementing monitoring and alerting systems.
- Should be proficient in conducting root cause analysis and implementing corrective actions.
- Must have excellent problem-solving skills and the ability to work under pressure.
- Should have strong communication and collaboration skills.
- Nice to have experience in mentoring and guiding junior engineers.
- Should be familiar with the latest trends and technologies in SRE and DevOps.
- Must be able to work effectively in a hybrid work model.
- Should have a proactive approach to identifying and resolving system issues.
- Must be committed to continuous improvement and operational excellence.
- Should have a strong understanding of infrastructure scalability and reliability.
- Nice to have experience in participating in on-call rotations.
Posted 3 weeks ago
3.0 - 7.0 years
0 Lacs
karnataka
On-site
As an Engineer at Goldman Sachs, you will have the opportunity to make a significant impact by connecting people and capital with innovative ideas. You will be part of a dynamic environment that requires innovative strategic thinking and immediate, real solutions. Join our engineering teams to work on building massively scalable software and systems, designing low latency infrastructure solutions, proactively guarding against cyber threats, and leveraging machine learning alongside financial engineering to transform data into actionable insights. Be a part of creating new businesses, transforming finance, and exploring a world of opportunity at the speed of markets. Goldman Sachs Engineers are known for their innovation and problem-solving skills, as they build solutions in areas such as risk management, big data, mobile technology, and more. We are looking for creative collaborators who can evolve, adapt to change, and thrive in a fast-paced global environment. As a developer in the Site Reliability Engineering (SRE) team at Goldman Sachs, you will play a crucial role in improving the availability and reliability of the firm's critical platform services. You will collaborate with internal teams to build and operate sustainable production systems that can adapt to the fast-paced, global business environment. The SRE team develops and maintains platforms and tools that help other engineering teams at Goldman Sachs to build and operate reliable and resilient systems. Your responsibilities will include creating and supporting frontends for monitoring and alerting to enhance the reliability of the platforms and tools operated by the SRE team. You will ensure the technical feasibility of UI/UX designs and focus on building user-friendly interfaces. Additionally, you will adhere to and drive SRE disciplines and processes across the global team. 
To excel in this role, you should have a degree in computer science or engineering with a minimum of 3 years of industry experience. Proficiency in ReactJS, TypeScript, JavaScript, HTML, and CSS is essential. Strong programming skills, problem-solving abilities, and the capacity to work independently are key requirements. You should be comfortable with technical ownership, managing multiple stakeholders, and collaborating with a global team. Excellent communication skills and the ability to work collaboratively are also important for this role. Preferred experience includes expertise in testing UI applications with frameworks like Cypress, diagnosing and performance tuning UI applications with tools like Lighthouse, modern authentication patterns (OAuth 2.0 and OpenID Connect), REST API design, knowledge of cloud native solutions in AWS or GCP, and experience with monitoring and alerting systems. Goldman Sachs is committed to fostering diversity and inclusion in the workplace and providing opportunities for professional and personal growth. Our culture emphasizes the importance of individual growth through training and development opportunities, firmwide networks, benefits, wellness programs, and mindfulness initiatives. If you require reasonable accommodations during the recruiting process due to special needs or disabilities, please let us know. Learn more about our commitment to diversity and inclusion at GS.com/careers.,
Posted 3 weeks ago
7.0 - 11.0 years
0 Lacs
thiruvananthapuram, kerala
On-site
The company Armada is an edge computing startup that specializes in providing computing infrastructure to remote areas with limited connectivity and cloud infrastructure. They also focus on processing data locally for real-time analytics and AI at the edge. Armada is dedicated to bridging the digital divide by deploying advanced technology infrastructure rapidly. As they continue to grow, they are seeking talented individuals to join them in achieving their mission. As a DevOps Lead at Armada, you will play a crucial role in integrating AI-driven operations into the DevOps practices of the company. Your responsibilities will include leading a DevOps team, designing scalable systems, and implementing intelligent monitoring, alerting, and self-healing infrastructure. The role requires a strategic mindset and hands-on experience with a focus on Ops AI. This position is based at the Armada office in Trivandrum, Kerala. As the DevOps Lead, you will lead the DevOps strategy with a strong emphasis on AI-enabled operational efficiency. You will architect and implement CI/CD pipelines integrated with machine learning models and analytics. Additionally, you will develop and manage infrastructure as code using tools like Terraform, Ansible, or CloudFormation. Collaboration is key in this role, as you will work closely with data scientists, developers, and operations teams to deploy and manage AI-powered applications. You will also be responsible for enhancing system observability through intelligent dashboards and real-time metrics analysis. Furthermore, you will mentor DevOps engineers and promote best practices in automation, security, and performance. To be successful in this role, you should have a Bachelor's or Master's degree in Computer Science, Engineering, or a related field. You should also have at least 7 years of DevOps experience with a minimum of 2 years in a leadership role. 
Proficiency in cloud infrastructure management and automation is essential, along with experience in AIOps platforms and tools. Strong scripting abilities, familiarity with CI/CD tools, and expertise in containerization and orchestration are also required. Preferred qualifications include knowledge of MLOps, experience with serverless architectures, and certification in cloud platforms. Demonstrable experience in building and integrating software and hardware for autonomous or robotic systems is a plus. Strong analytical skills, time-management abilities, and effective communication are highly valued for this role. In return, Armada offers a competitive base salary along with equity options for India-based candidates. If you are a proactive individual with a growth mindset, strong problem-solving skills, and the ability to thrive in a fast-paced environment, you may be a great fit for this position at Armada. Join the team and contribute to the success and growth of the company while working collaboratively towards achieving common goals.,
Posted 3 weeks ago
5.0 - 10.0 years
20 - 32 Lacs
Hyderabad
Work from Office
We are looking for a Senior Production Engineer with strong expertise in performance optimization, system security, infrastructure hardening, and DevOps practices.
Required Candidate Profile: The ideal candidate will be responsible for maintaining high availability, scalability, and reliability of production systems, while ensuring security.
Posted 1 month ago