3 VictorOps Jobs

JobPe aggregates listings so they are easy to find, but you apply directly on the original job portal.

7.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

About the Role

We’re looking for an MLOps Engineer to build and operate reliable, secure, and scalable ML/LLM infrastructure, from data ingestion and training pipelines to model serving, monitoring, and continuous improvement. You’ll partner with Data Science, Platform, and Security teams to ship models to production with strong SLAs, observability, and cost control.

Responsibilities

Productionize models end-to-end: automate data ingestion, feature engineering, training, evaluation, packaging, and deployment (batch & real-time).
Model serving & orchestration: design/operate low-latency model endpoints and batch jobs using Kubernetes, Docker, job schedulers, and serving frameworks.
CI/CD for ML: implement reproducible pipelines (code, data, features, models) with unit/integration tests, approvals, and canary/blue-green rollouts.
Monitoring & reliability: build drift, performance, and data-quality monitors; set alerts and on-call runbooks; drive incident response and postmortems.
Observability: instrument tracing/logging/metrics (e.g., OpenTelemetry, Prometheus, Grafana) across data flows and model requests.
Model registry & governance: manage lineage, versioning, approvals, and audit trails; enforce security (IAM, secrets management) and compliance controls.
Cost & capacity management: optimize GPU/CPU usage, autoscaling, caching, batching, quantization, and instance right-sizing.
LLM & RAG pipelines (nice if applicable): stand up vector databases, retrieval flows, prompt/version management, guardrails, and evaluations.
Collaboration & enablement: create templates, docs, and self-service tooling for data scientists and app teams.

Required Qualifications

3–7 years in MLOps/Platform/DevOps/SRE roles supporting ML in production.
Strong with Python and one of Go/TypeScript/Bash; proficiency in Docker and Kubernetes.
Experience building ML pipelines with tools like Airflow/Prefect/Kedro/Flyte/Metaflow.
CI/CD expertise (GitHub Actions/GitLab/Jenkins/Argo), including artifact/version management and automated testing.
Data stack: object storage (S3/GCS/Azure Blob), data warehouses/lakes, message queues/streams (Kafka/PubSub), and caching layers.
Monitoring/observability: Prometheus, Grafana, ELK/EFK, alerting (PagerDuty/VictorOps), tracing (OpenTelemetry/Jaeger).
Security fundamentals: IAM, network policies, secrets (Vault/SSM), image signing, SBOMs.
Solid understanding of ML lifecycle: data versioning, feature stores, experiment tracking, evaluation, and rollback.

Posted 1 day ago

7.0 years

0 Lacs

Mumbai, Maharashtra, India

Remote

JOB DESCRIPTION

This position is accountable for the support of RTDS infrastructure and applications. The primary responsibilities include deploying and overseeing real-time software and infrastructure on the cloud/on-prem, ensuring system maintenance to optimize performance and uptime for end-users, and upholding a secure environment. The role also involves designing and implementing new technologies to meet client requirements. As an advanced technical position, it demands expert judgment and analysis in the design, development, and implementation of technical products and systems. The incumbent will be adept at resolving highly complex technical issues, conducting advanced research, and recommending process improvements to enhance product quality. Familiarity with various concepts, practices, and procedures in the field is essential. The successful candidate will leverage their extensive experience and sound judgment to effectively plan and achieve organizational goals.

RESPONSIBILITIES

SAFETY, SECURITY & COMPLIANCE

Maintains the highest standards of corporate governance, ensuring that all activities are conducted ethically and in compliance with the Company’s Security, Compliance & HSE policies, Management System, relevant laws, regulations, standards, and industry practices, and complies with the Company’s Rules to Live By.
Places Quality, Health & Safety, Security, and protection of the Environment as core values while never intentionally placing employees, our processes, customers, or the communities in which we live and work at risk.
Seeks continual improvement in Health, Safety, Security & protection of the Environment, taking into account responsible care, process vulnerabilities, public, customer and employee inputs, knowledge and technology, and best business practices to exceed customer expectations.
Supervisors & Managers should demonstrate effective safety leadership for the health and safety arrangements of all subordinates and for any persons visiting them while on the Company premises.

QUALITY

Work to maintain ISO27001:2022 certification: monthly data and reports for ISMS Committee Meetings; audit support; following and enforcing processes and procedures; maintaining the infrastructure, including backups and disaster recovery.
Be familiar with the Company's Quality policies and take an active role in the compliance and improvement of Weatherford’s Management System.
Maintain service quality as an immediate priority when working across all areas of the business and continually seek areas for improvement.

OPERATIONS

Work on a support team day to day with a proactive, results-oriented approach and provide technical assistance.
Assist deployment team members in designing the best possible real-time solution for clients on the rig, on the client site, in the data center, or in the cloud.
Ensure that the architecture and infrastructure on which the application will be deployed are robust and stable.
Design and architect production systems.
Re-engineer production systems when new technologies become available to increase performance and reliability.
Ensure the product/application has been correctly and completely integrated across the program.
Validate that the product has been correctly packaged before deployment and ensure that all release controls have been satisfied.
Follow deployment plans and schedules, ensuring alignment with project timelines and objectives.
Coordinate deployment activities with stakeholders, including operations teams, system administrators, and third-party vendors.
Conduct pre-deployment testing to identify and address potential issues, ensuring smooth integration with existing systems.
Troubleshoot deployment issues and implement corrective actions in a timely manner to minimize downtime.
Provide technical guidance and support, ensuring adherence to best practices and standards.
Document deployment processes, configurations, and procedures for future reference and knowledge sharing.
Continuously evaluate and improve deployment workflows to optimize efficiency, scalability, and reliability.
Stay updated on industry trends and emerging technologies related to real-time systems deployment.
Identify and address all security concerns/incidents.
Participate in training on IT and software components of real-time systems.
Resolve all issues escalated by IT and Operations teams and escalate if needed.
Provide detailed KPI reports as required for management.

COMMUNICATION

Know and understand the Weatherford Quality Policy and comply with all requirements of the Quality Systems Manual, Operating and Technical Procedures, and Workplace Instructions.
Maintain effective communications with all key stakeholders, both internal and, where appropriate, external.

FINANCIAL

All employees have an accountability to the organization to be financially responsible, whether they oversee a function budget or simply their own expenses.
Costs incurred should be within the approved budget, processed within agreed time frames, and follow the relevant financial policy and procedure.

QUALIFICATIONS

Experience & Education

Required
Minimum 7-12+ years of related experience.
Must have an engineering degree in Computer or Information Technology.
Certifications in relevant technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Experience with continuous integration/continuous deployment (CI/CD) pipelines and automated testing frameworks.
Knowledge of cybersecurity principles and best practices for securing real-time systems.
Familiarity with monitoring and logging tools for performance optimization and troubleshooting.

Preferred
Knowledge of ISO27001:2022 requirements: audit support; following and enforcing processes and procedures; maintaining the infrastructure, including backups and disaster recovery.

Required Knowledge, Skills & Abilities:

Experience in IT infrastructure services, including work with multiple technologies, support and implementation of IT infrastructure projects, and remote support for clients.
Identify and implement backup and disaster recovery solutions for mission-critical data and applications.
Demonstrated high level of responsibility for researching, purchasing, and configuring any equipment related to Information Technology.
Interface extensively with top-tier management, staff, peers, users, and other business partners.
Strong problem-solving and analytical abilities and strong written/verbal communication skills.
Cloud Computing: Proficiency in cloud platforms like AWS, Azure, or Google Cloud for deploying and managing real-time applications.
DevOps Tools: Experience with tools like Docker, Kubernetes, Ansible, or Terraform for containerization, orchestration, and automation of deployment processes.
Continuous Integration/Continuous Deployment (CI/CD): Knowledge of CI/CD pipelines to automate testing, building, and deploying code changes rapidly and reliably.
Networking: Understanding of networking concepts and protocols for configuring and optimizing real-time communication systems.
Monitoring and Logging: Ability to set up monitoring tools like Prometheus, Grafana, or the ELK stack for tracking system performance and troubleshooting issues in real time.
Security: Knowledge of security best practices for real-time systems, including encryption, authentication, and access control mechanisms.
Scripting and Automation: Proficiency in scripting languages like Python, Bash, or PowerShell for automating deployment tasks and managing infrastructure as code.
Database Management: Understanding of databases like Microsoft SQL Server, PostgreSQL, MongoDB, and Redis for storing and processing real-time data.
Version Control: Experience with version control systems like Git for managing code changes and collaborating with team members effectively.
Problem-Solving Skills: Ability to troubleshoot complex issues quickly and effectively in a real-time environment.
Team Collaboration: Strong communication and collaboration skills to work effectively with cross-functional teams and stakeholders.
Experience with ITSM/ITIL practices and processes (an understanding of ITIL concepts is expected; ITIL certification is not required).
Familiarity with load balancing concepts, server clustering, and hypervisor technology (Hyper-V, VMware).
Good understanding of high-availability tools (SQL FCI, Cluster, Mirroring, etc.).
Strong delegation, time management, and conflict resolution skills, and proven experience leading a team of 6-10 personnel with diverse experience.
Windows Technologies: Windows OS, Active Directory, MSSQL, IIS, Clustering, Load Balancing, WSUS, MDT
Cloud Infrastructure Management
Monitoring/Alerting Technologies: VictorOps, WhatsUp Gold, Sensu, Grafana, StatusCake
Tools/apps like Git, Bitbucket, Confluence, and Jira for source control, documentation, and project sharing
Networking Technologies: DNS and DHCP servers, TCP/IP protocol suite
Virtualization Technologies
Ticketing Systems: Zendesk, Jira, DevOps

Travel Requirement: This role may require domestic and potentially international travel of up to:

Posted 2 weeks ago

7.0 years

0 Lacs

Navi Mumbai, Maharashtra, India

Remote

JOB DESCRIPTION

This position is accountable for the support of RTDS infrastructure and applications. The primary responsibilities include deploying and overseeing real-time software and infrastructure on the cloud/on-prem, ensuring system maintenance to optimize performance and uptime for end-users, and upholding a secure environment. The role also involves designing and implementing new technologies to meet client requirements. As an advanced technical position, it demands expert judgment and analysis in the design, development, and implementation of technical products and systems. The incumbent will be adept at resolving highly complex technical issues, conducting advanced research, and recommending process improvements to enhance product quality. Familiarity with various concepts, practices, and procedures in the field is essential. The successful candidate will leverage their extensive experience and sound judgment to effectively plan and achieve organizational goals.

RESPONSIBILITIES

Infrastructure & Deployment

Design, implement, and support real-time infrastructure on cloud platforms (AWS, Azure) and on-premises environments.
Maintain stable and high-performing systems architecture to support operational excellence.
Build and manage CI/CD pipelines to automate deployment, testing, and updates.
Conduct system integration, validation, and pre-deployment testing to ensure seamless delivery.

System Operations & Support

Monitor, troubleshoot, and resolve complex technical issues with minimal downtime.
Implement disaster recovery, data backups, and high-availability solutions.
Maintain and document technical configurations, deployment procedures, and incident reports.
Collaborate with operations teams, system administrators, and vendors to ensure effective support and alignment.

Security, Compliance & Quality

Enforce cybersecurity best practices including encryption, access control, and vulnerability mitigation.
Support ISO27001:2022 certification efforts through audits, reporting, and process adherence.
Ensure compliance with company standards, HSE policies, and data protection regulations.
Continuously seek opportunities to improve system reliability, performance, and quality.

QUALIFICATIONS

Experience & Education

Required
Minimum 7-12+ years of related experience.
Must have an engineering degree in Computer or Information Technology.
Certifications in relevant technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Experience with continuous integration/continuous deployment (CI/CD) pipelines and automated testing frameworks.
Knowledge of cybersecurity principles and best practices for securing real-time systems.
Familiarity with monitoring and logging tools for performance optimization and troubleshooting.

Preferred
Knowledge of ISO27001:2022 requirements: audit support; following and enforcing processes and procedures; maintaining the infrastructure, including backups and disaster recovery.

Required Knowledge, Skills & Abilities:

Experience in IT infrastructure services, including work with multiple technologies, support and implementation of IT infrastructure projects, and remote support for clients.
Identify and implement backup and disaster recovery solutions for mission-critical data and applications.
Demonstrated high level of responsibility for researching, purchasing, and configuring any equipment related to Information Technology.
Interface extensively with top-tier management, staff, peers, users, and other business partners.
Strong problem-solving and analytical abilities and strong written/verbal communication skills.
Cloud Computing: Proficiency in cloud platforms like AWS, Azure, or Google Cloud for deploying and managing real-time applications.
DevOps Tools: Experience with tools like Docker, Kubernetes, Ansible, or Terraform for containerization, orchestration, and automation of deployment processes.
Continuous Integration/Continuous Deployment (CI/CD): Knowledge of CI/CD pipelines to automate testing, building, and deploying code changes rapidly and reliably.
Networking: Understanding of networking concepts and protocols for configuring and optimizing real-time communication systems.
Monitoring and Logging: Ability to set up monitoring tools like Prometheus, Grafana, or the ELK stack for tracking system performance and troubleshooting issues in real time.
Security: Knowledge of security best practices for real-time systems, including encryption, authentication, and access control mechanisms.
Scripting and Automation: Proficiency in scripting languages like Python, Bash, or PowerShell for automating deployment tasks and managing infrastructure as code.
Database Management: Understanding of databases like Microsoft SQL Server, PostgreSQL, MongoDB, and Redis for storing and processing real-time data.
Version Control: Experience with version control systems like Git for managing code changes and collaborating with team members effectively.
Problem-Solving Skills: Ability to troubleshoot complex issues quickly and effectively in a real-time environment.
Team Collaboration: Strong communication and collaboration skills to work effectively with cross-functional teams and stakeholders.
Experience with ITSM/ITIL practices and processes (an understanding of ITIL concepts is expected; ITIL certification is not required).
Familiarity with load balancing concepts, server clustering, and hypervisor technology (Hyper-V, VMware).
Good understanding of high-availability tools (SQL FCI, Cluster, Mirroring, etc.).
Windows Technologies: Windows OS, Active Directory, MSSQL, IIS, Clustering, Load Balancing, WSUS, MDT
Cloud Infrastructure Management
Monitoring/Alerting Technologies: VictorOps, WhatsUp Gold, Sensu, Grafana, StatusCake
Tools/apps like Git, Bitbucket, Confluence, and Jira for source control, documentation, and project sharing
Networking Technologies: DNS and DHCP servers, TCP/IP protocol suite
Virtualization Technologies
Ticketing Systems: Zendesk, Jira, DevOps

Travel Requirement: This role may require domestic and potentially international travel of up to: India-

ABOUT US

Weatherford is a leading global energy services company. Our world-class experts partner with customers to optimize their resources and realize the full potential of their assets. Across our operating locations, including manufacturing, research and development, service, and training facilities, operators choose us for strategic solutions that add efficiency, flexibility, and responsibility to any energy operation.

When you join Weatherford, you instantly feel connected to something bigger – a community that is grounded by our core values and driven to create innovative solutions for our customers. We celebrate each other’s successes, grow together, and learn from each other constantly. Individually, we are impressive. Together, we are unstoppable. We are One Weatherford.

Weatherford is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law.

Posted 1 month ago
