Opkey | Series B Funded | Noida, India (In-Office) | Full-Time
The Opportunity
Opkey, a Series B funded enterprise application lifecycle management platform, is looking for a Senior Data Engineer to join our team in Noida. We need someone who can build and scale the data infrastructure—pipelines, storage systems, and processing engines—that powers our platform.

We're not pitching a vision—we're scaling a reality. Our systems already process hundreds of gigabytes of enterprise data. Now we need an engineer who can make that infrastructure handle 10x more, 10x faster, with bulletproof reliability. This is your chance to be part of building something that will define a category.
About Us
Opkey is redefining how enterprises manage the lifecycle of their most critical applications. We've built the platform that takes organizations from Design to Configure to Test to Train, powered by agentic AI.

Our customers already include Fortune 500 companies and top global system integrators. They trust us with hundreds of gigabytes of their most sensitive enterprise data—payroll files, configuration exports, test results—because we've proven we can handle it.

We're already doing what others are only talking about. Our pipelines already process massive payroll files in real-time. Our systems already normalize chaotic enterprise data formats into clean, queryable structures. Our infrastructure already powers the AI and analytics that enterprises depend on.

Now we're scaling. And we need exceptional people to help us go from category creator to category leader.

This is founder mode, not corporate mode. We move fast, we solve hard problems, and we ship things that matter.
Why This Role Matters
Data scientists can't build models on broken pipelines. Analysts can't find insights in dirty data. The entire intelligence layer of our platform depends on rock-solid data infrastructure. You'll build the foundation everything else depends on.

You'll design the pipelines that ingest data from dozens of enterprise formats. You'll build the systems that diff millions of records in seconds. You'll create the infrastructure that lets our data scientists focus on algorithms instead of wrestling with data quality.

When a Fortune 500 company validates their payroll migration, your infrastructure makes that possible. When our ML models predict configuration failures, they're running on pipelines you built.

This is already happening at Opkey. You'll help us scale it to the world.
What You'll Do
You'll join a team that's already built production data infrastructure handling enterprise-scale workloads. Your job is to make it faster, more reliable, and ready for the next order of magnitude:
- Build & Optimize Data Pipelines: Design and implement ETL/ELT pipelines that
ingest data from diverse enterprise sources—Excel files, CSVs, API exports,
database extracts, proprietary formats—and transform it into clean, queryable structures.
- Design High-Performance Comparison Engines: Build systems that diff massive
datasets—payroll files with millions of records, configuration exports with thousands
of parameters—and surface differences in real-time (see the sketch after this list).
- Architect Scalable Data Storage: Design and manage data warehouses, data
lakes, and databases that handle terabytes of enterprise data. Make decisions about
partitioning, indexing, and storage formats.
- Ensure Data Quality & Reliability: Implement validation, monitoring, and alerting
systems that catch data issues before they affect downstream consumers. Build
self-healing, observable pipelines.
- Enable Analytics & ML Teams: Partner with data scientists to build the
infrastructure they need—feature stores, training data pipelines, model serving
infrastructure.
- Scale for Growth: Design systems that can handle 10x the data without 10x the cost
or complexity. Think ahead about bottlenecks and architect around them.
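To give a feel for the comparison problem described above, here's a deliberately minimal sketch of diffing two snapshots in Python. It's illustrative only, not Opkey's actual engine, and the column names (employee_id, gross_pay) are hypothetical:

```python
# Illustrative sketch only -- not Opkey's production engine.
# Assumes both snapshots share the same schema; column names are placeholders.
import pandas as pd

def diff_snapshots(before: pd.DataFrame, after: pd.DataFrame,
                   key: str = "employee_id") -> dict:
    """Diff two snapshots keyed on `key`; report added, removed, and changed rows."""
    before, after = before.set_index(key), after.set_index(key)

    added = after.loc[after.index.difference(before.index)]
    removed = before.loc[before.index.difference(after.index)]

    # For keys present in both snapshots, flag rows where any column differs.
    # (Note: `!=` also flags NaN-vs-NaN; a real engine needs explicit null rules.)
    common = before.index.intersection(after.index)
    changed_mask = (before.loc[common] != after.loc[common]).any(axis=1)
    changed = after.loc[common][changed_mask]

    return {"added": added, "removed": removed, "changed": changed}

old = pd.DataFrame({"employee_id": [1, 2, 3], "gross_pay": [5000, 6200, 7100]})
new = pd.DataFrame({"employee_id": [2, 3, 4], "gross_pay": [6200, 7300, 4800]})
print(diff_snapshots(old, new)["changed"])  # employee 3: gross_pay 7100 -> 7300
```

The real systems do this over millions of records, which is exactly why the role calls for distributed computing and careful partitioning rather than a single-machine DataFrame.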
Skills & Qualifications
Required Technical Skills
- Python for Data Engineering: 4+ years of production experience writing clean,
maintainable, performant Python code for data processing and pipeline development
- SQL Mastery: Expert-level SQL—complex queries, query optimization,
understanding execution plans. You can look at a slow query and know how to fix it.
- Data Pipeline Development: Hands-on experience building ETL/ELT pipelines that
run reliably in production. You've designed pipelines that process millions of records
without failing.
- Distributed Computing: Deep knowledge of frameworks like Apache Spark for
large-scale data processing. You understand partitioning strategies, shuffle
optimization, and memory management (a short PySpark illustration follows this list).
- Data Modeling & Warehousing: Strong foundation in data modeling—star
schemas, slowly changing dimensions, normalization vs. denormalization tradeoffs.
- Database Technologies: Experience with relational databases (PostgreSQL,
MySQL) and data warehouses (Redshift, Snowflake, BigQuery). You know when to use each.
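As a concrete example of the shuffle awareness we mean, here's a purely illustrative PySpark sketch; the bucket paths and column names are hypothetical:

```python
# Illustrative PySpark sketch -- paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("shuffle-demo").getOrCreate()

payroll = spark.read.parquet("s3://example-bucket/payroll/")      # large fact table
employees = spark.read.parquet("s3://example-bucket/employees/")  # small dimension

# Naive join: Spark shuffles BOTH sides across the cluster on the join key.
naive = payroll.join(employees, "employee_id")

# Broadcast join: the small dimension is shipped whole to every executor,
# so the large table is never shuffled.
better = payroll.join(broadcast(employees), "employee_id")

# repartition() pays for one shuffle up front; the groupBy can then reuse
# that partitioning instead of shuffling again -- a win when several
# downstream steps key on the same column.
by_dept = (payroll.repartition("department_id")
                  .groupBy("department_id")
                  .sum("gross_pay"))
```

If you can explain when that broadcast would backfire (say, a dimension table too large for executor memory), you'll feel at home here.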
Nice to Have
- Experience with streaming data systems (Kafka, Kinesis)
- Cloud platform expertise (AWS, GCP, Azure)
- Knowledge of orchestration tools (Airflow, Dagster, Prefect)
- Background in data comparison/diffing algorithms
- Experience with containerization (Docker, Kubernetes)
- Exposure to enterprise data formats and systems
Mindset & Approach
- Reliability-Obsessed: You've been paged at 2am, and you've built systems that
don't page you at 2am. You understand what it takes to run production infrastructure.
- Systems Thinker: You see how individual components fit into the larger
architecture. You make tradeoffs that optimize for the whole system.
- Ownership Mentality: You don't treat data quality as someone else's problem. You
own the pipeline end-to-end—from ingestion to the data scientist's query.
- Pragmatic Engineer: You know when to build for flexibility and when to optimize for
performance. You don't chase shiny tools when proven ones work better.
- Founder Mentality: You thrive in ambiguity, make architectural decisions with
incomplete information, and care about outcomes over perfect documentation.
What We're NOT Looking For
- Engineers who only want to work with cutting-edge tools regardless of fit
- People who treat data quality as someone else's problem
- Those who need a detailed roadmap handed to them
- Candidates who've never owned production systems end-to-end
What We Offer
- Competitive salary + meaningful equity in a company that's already winning
- The chance to architect data infrastructure that Fortune 500 companies depend on
- A team that values speed, ownership, and results over politics
- Direct impact—your pipelines will process enterprise data at a scale most engineers
never see
- The opportunity to be part of history—building the data foundation that powers
how enterprises manage their most critical applications
We've proven our infrastructure works. Now we need someone to scale it to the world.

Apply with your resume and a brief note about the most challenging data pipeline you've built.

Opkey is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Skills: Python, SQL, ETL, Apache Spark, PostgreSQL