Role: Data Engineer
Experience: 3-5 years
Role Overview
We are looking for a Data Engineer with deep AWS expertise. You will design, build, and maintain scalable data pipelines, implement secure cloud data architectures, optimize storage and compute performance, and ensure end-to-end operational reliability of our SaaS platform. Exposure to AI, vector search, or modern data processing techniques is a plus.
Key Responsibilities:
AWS-Centric Data Engineering
- Design and implement ETL/ELT pipelines using AWS Glue, EMR, Lambda, and Step Functions
- Develop optimized Athena queries, partitions, data catalog management, and metadata workflows (see the sketch after this list)
- Manage and optimize Amazon S3 data lakes (performance tuning, lifecycle policies, cost optimization)
- Set up and maintain PostgreSQL/RDS for operational and analytical workloads
- Implement VPC, subnets, security groups, NAT gateways, and private networking for secure data operations
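To give a flavor of the Athena work above, here is a minimal sketch of a partition-pruned query driven from Python with boto3. The database, table, partition column, and results bucket are hypothetical placeholders, not part of our actual stack.

```python
# Minimal sketch: run a partition-pruned Athena query and poll for its
# result. All names (analytics.events, dt, the S3 bucket) are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_partitioned_query() -> str:
    """Start a query that prunes on a date partition; return its execution ID."""
    resp = athena.start_query_execution(
        QueryString=(
            "SELECT tenant_id, COUNT(*) AS events "
            "FROM analytics.events "
            "WHERE dt = '2024-01-01' "  # filtering on the partition column limits the S3 scan
            "GROUP BY tenant_id"
        ),
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    return resp["QueryExecutionId"]

def wait_for_query(query_id: str) -> str:
    """Poll Athena until the query reaches a terminal state."""
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(2)
```

Pruning on the partition column keeps Athena from scanning the whole S3 prefix, which is where most query cost and latency comes from.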
Data Pipeline & Platform Operations
- Build highly scalable batch and real-time data workflows
- Develop connectors for diverse data sources (databases, cloud storage, SaaS APIs)
- Implement data quality validation, monitoring, and alerting across pipelines (see the sketch after this list)
- Ensure high availability, reliability, and performance of data systems for our SaaS platform
- Set up CI/CD pipelines, infrastructure automation using Terraform, and containerized workflows using Docker/Kubernetes
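As a flavor of the data quality and alerting work, here is a minimal sketch that publishes validation metrics to CloudWatch so an alarm can page on regressions. The metric namespace, column name, and thresholds are hypothetical.

```python
# Minimal sketch: a data-quality gate that pushes batch metrics to
# CloudWatch and raises if validation fails. Names are hypothetical.
import boto3
import pandas as pd

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def validate_batch(df: pd.DataFrame, min_rows: int = 1) -> None:
    """Publish quality metrics for a batch and fail loudly on bad data."""
    # Hypothetical check: share of rows missing the tenant identifier.
    null_ratio = float(df["tenant_id"].isna().mean()) if "tenant_id" in df else 1.0
    cloudwatch.put_metric_data(
        Namespace="DataPipelines/Quality",  # hypothetical namespace
        MetricData=[
            {"MetricName": "RowCount", "Value": float(len(df)), "Unit": "Count"},
            {"MetricName": "TenantIdNullRatio", "Value": null_ratio},
        ],
    )
    if len(df) < min_rows or null_ratio > 0.01:
        raise ValueError("batch failed data quality checks")
```

Emitting metrics and letting CloudWatch alarms handle the alerting keeps pipeline code simple and leaves the paging policy configurable outside the job.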
AI & Advanced Processing (Good to Have)
- Basic knowledge of embeddings, tokenization, and vector DBs (OpenSearch, Pinecone, Weaviate, Chroma); a toy example follows this list
- Experience with document processing, OCR, semantic chunking, or RAG pipelines
- Integration exposure with LLM services (OpenAI, Bedrock, Databricks)
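For candidates newer to this area, the core retrieval step of a RAG pipeline reduces to a few lines. This is a toy sketch under stated assumptions: the embeddings are random stand-ins, and a real system would use a vector DB (OpenSearch, Pinecone, etc.) instead of a NumPy scan.

```python
# Toy sketch: brute-force vector search with cosine similarity.
# Corpus and query embeddings here are random placeholders.
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus rows most similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per document
    return np.argsort(scores)[::-1][:k]

# Usage with stand-in embeddings; a real pipeline would generate these
# with an embedding model (e.g. via Bedrock or OpenAI).
corpus = np.random.rand(100, 384)
query = np.random.rand(384)
print(cosine_top_k(query, corpus))       # indices of the 3 nearest documents
```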
Required Technical Skills
AWS Services (Primary Focus): EMR, Glue, Athena, S3, RDS/Postgres, Lambda, IAM, VPC, CloudWatch
Data Engineering: Python (pandas, NumPy), SQL, ETL/ELT design, data warehousing, data modeling
Distributed Processing: Spark on EMR (mandatory), Ray (optional); a short Spark sketch follows this list
Streaming Systems: Kafka or Kinesis (good to have)
DevOps & Infra: Docker, Kubernetes, Terraform, GitHub Actions, CI/CD pipelines
AI/Data Tools (Advantage): Vector databases, MLflow/DVC, embedding generation, RAG pipelines
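Since Spark on EMR is mandatory, here is roughly the shape of the day-to-day work: a minimal PySpark job that compacts raw JSON events from S3 into a date-partitioned Parquet table. The bucket paths and the event_time field are hypothetical.

```python
# Minimal sketch: Spark-on-EMR job reading raw events from S3 and
# writing a date-partitioned Parquet table. Paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-compaction").getOrCreate()

events = spark.read.json("s3://example-raw/events/")
(events
    .withColumn("dt", F.to_date("event_time"))  # derive the partition column
    .repartition("dt")                          # group rows by partition before writing
    .write.mode("overwrite")
    .partitionBy("dt")
    .parquet("s3://example-lake/events/"))
```

Partitioning the lake by date is what makes the Athena-side partition pruning shown earlier possible.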
Required Experience
- 3-5 years of experience as a Data Engineer
- Strong hands-on experience with AWS analytics & data services
- Experience managing and optimizing S3-based data lakes
- Strong experience with PostgreSQL/RDS
- Prior work on secure cloud architectures (IAM, VPC, subnets)
- Experience with ETL/ELT pipeline design for high-volume workloads
- Exposure to SaaS platform operations or multi-tenant architectures is a plus
Key Skills for Success
- Strong analytical and problem-solving mindset
- Deep understanding of AWS data ecosystem
- Ability to work in a fast-paced startup environment
- Ownership mentality and willingness to build from the ground up
- Collaborative approach with cross-functional teams
What We Offer
- Work on cutting-edge Gen AI products solving real-world problems
- Collaborate with AWS on innovative AI initiatives
- Competitive salary + equity in a growing AI company
- Professional growth in a rapidly evolving AI landscape
- A flexible, innovative culture that values creativity and ownership