Role Overview (Hands-on Player-Coach)
This is a hands-on leadership role. You will both lead and actively contribute code. You'll work closely with engineers—reviewing designs and PRs, pairing on tricky problems, and raising the bar on architecture, reliability, security, and performance.

Key Responsibilities
- Own end-to-end delivery of projects: requirements, scoping, design, implementation, testing, deployment, and operations.
- Architect scalable, fault-tolerant systems for crawling, parsing, enrichment, and data processing.
- Write production code in Python; set standards via code reviews, design docs, and reference implementations.
- Mentor and grow engineers: coaching on design, coding best practices, observability, and operational excellence.
- Collaborate with stakeholders/clients: translate business needs into clear technical plans; manage risks and trade-offs.
- Establish engineering best practices (branching strategy, CI/CD, testing strategy, security baselines, SLAs/SLOs, incident/RCA processes).
- Partner with QA/Ops to ensure quality gates, deployment hygiene, and on-call readiness.
- Drive exploration and adoption of GenAI/AI-agent capabilities where they create clear value.

Required Qualifications
- 7–11 years in software engineering, including 2–3+ years in a senior/lead capacity.
- Expert in Python; strong command of data structures/algorithms, concurrency, and distributed-systems concepts.
- Deep experience with SQL and NoSQL (plus schema design/modeling); familiarity with vector databases.
- Proven track record designing and shipping cloud-native systems on AWS (S3, Lambda, ECS/EKS, SQS/SNS, RDS/DynamoDB, CloudWatch, IAM).
- Significant experience building and operating crawlers/parsers and robust ETL/ELT pipelines.
- Strong proficiency with Git, testing strategies (unit/integration/e2e), observability (logging/metrics/tracing), and performance tuning.
- Excellent communication: produces high-quality design docs and gives actionable, empathetic feedback.
Preferred / Good to Have (Prioritized)
- GenAI & LLMs: LangChain, CrewAI, LlamaIndex, prompt design, RAG, evaluation; vector stores. (Strongly preferred and prioritized.)
- CI/CD & Containers: GitHub Actions/Jenkins, Docker, Kubernetes.
- Data Pipelines/Big Data: Airflow, Spark, Kafka, or equivalents.
- Infra as Code & Cloud Ops: Terraform/CloudFormation; security hardening, cost/performance optimization, capacity planning.
- Frontend/JS: not required; basic JS or frontend exposure is nice-to-have only.
- Exposure to GCP/Azure.
- Experience with interviewing, onboarding, and developing talent.

What Success Looks Like
- Clear, incremental delivery with measurable reliability (SLOs) and strong documentation.
- Teams consistently ship high-quality code under your guidance; juniors level up through mentorship.
- Thoughtful trade-offs that balance delivery speed, cost, security, and maintainability.

Work-from-Home Requirements
- High-speed internet for calls and collaboration.
- A capable, reliable computer (modern CPU, 8 GB+ RAM).
- Headphones with clear audio quality.
- Stable power and backup arrangements.

Forage AI is an equal-opportunity employer. We value curiosity, craftsmanship, and collaboration.
We are seeking a Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience with web scraping frameworks, browser automation tools, and anti-scraping mechanisms.

About Forage AI:
Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly accurate firmographic and enterprise insights across numerous domains. Trusted by global clients in finance, real estate, and healthcare, Forage AI enables businesses to automate workflows, reduce manual rework, and access high-quality data at scale.

Key Responsibilities:
- Maintain and enhance existing web scraping and data crawling projects.
- Develop and refine crawlers using Python-based tools and frameworks.
- Use browser automation tools (e.g., Playwright, Selenium) to handle dynamic content.
- Clean, validate, and integrate extracted data into downstream storage systems.
- Implement and manage solutions for anti-bot measures (CAPTCHAs, IP rotation, etc.).
- Optimize crawling efficiency and ensure compliance with web crawling best practices.
- Collaborate with cross-functional teams to improve data acquisition strategies.

Required Skills & Qualifications:
- Proficiency in Python and 2 years of work experience with web scraping frameworks (especially Scrapy).
- Strong knowledge of browser automation tools such as Playwright or Selenium.
- Solid understanding of HTML, CSS, and selector languages (XPath/CSS).
- Experience handling anti-scraping challenges and ensuring robust data extraction.
- Familiarity with distributed scraping techniques and data pipelines.
- Ability to troubleshoot and optimize web crawlers for performance and reliability.
- Strong analytical and problem-solving skills with attention to detail.
- Excellent communication and interpersonal skills.

Other Infrastructure Requirements
Since this is a completely work-from-home position, you will also require the following:
- High-speed internet connectivity for video calls and efficient work.
- A capable business-grade computer (modern processor, 8 GB+ of RAM, and no other obstacles to uninterrupted, efficient work).
- Headphones with clear audio quality.
- Stable power connection and backups in case of internet/power failure.
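For a flavor of the selector-driven extraction work described above, here is a minimal sketch using only Python's standard library. The sample HTML and the `LinkExtractor`/`extract_links` names are invented for illustration; a production crawler at this level would instead use Scrapy/Parsel selectors (XPath/CSS) or Playwright for dynamic pages, as the role requires.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags.

    A stdlib-only stand-in for the CSS/XPath selector queries a real
    Scrapy spider would run against a crawled page.
    """

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the currently open <a>, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str):
    """Return all (href, anchor text) pairs found in an HTML fragment."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


sample = '<ul><li><a href="/about">About us</a></li><li><a href="/jobs">Jobs</a></li></ul>'
print(extract_links(sample))  # [('/about', 'About us'), ('/jobs', 'Jobs')]
```

Real crawlers layer fetching, politeness (robots.txt, rate limits), retries, and anti-bot handling around extraction logic like this.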