numberz.ai

5 Job openings at numberz.ai
QA Engineer, GenAI · India · 0 years · Not disclosed · On-site · Full Time

Make our Should-Costing & Negotiation Copilot answers correct, safe, and grounded, every time. You’ll define and own the quality bar for GenAI at Numberz.ai.

Outcomes (12 months)
Evaluation harness: Ship rubric + contract + grounding suites; gate releases in CI. Baseline rubric pass rate ≥ 85% on priority slices by day 60.
Safety: Zero critical safety escapes; refusal/jailbreak pass rate ≥ 99%.
Grounding: ≥ 95% grounded-claim precision; citation coverage ≥ 98% on fact-bearing responses.
Slice coverage: Expand suite to cover ≥ 90% of priority commodity × archetype × region cases.
Drift & staleness: Canary tests in place; MTTD < 24h for model/data drift; materials/FX freshness within SLA.
Observability: Traces (prompts/tools/citations) wired to dashboards; MTTA < 2h for red builds; triage playbook documented.
Cost/latency: p95 task latency ≤ 2.5s (no-tool tasks) / ≤ 6s (tooling); eval $/run within budget.

Skills: Must-Haves
Python + Pytest; JSON Schema/strict parsers; CI (GitHub Actions).
LLM/RAG Testing: Rubric design (must/should/forbid), semantic similarity, pairwise preferences, pass@K. Grounding checks for RAG; retrieval coverage & freshness controls.
Safety Testing: Jailbreaks, refusal quality, PII detection.
Data Craftsmanship: Curate goldens, create slices, measure IAA; basic SQL; Docker.

Competencies: How You Work
Pragmatism over dogma: Small, high-signal tests; avoid vanity coverage.
Systems thinking: Connect prompts, tools, retrieval, data freshness.
Ownership & clarity: Crisp writing, reproducible experiments, actionable dashboards.
Collaboration: Partner with Product/Eng to encode tone, banned phrases, and safety guardrails.
Security & ethics mindset: Handle data and evals responsibly.

AI Security and Reliability Engineer · India · 0 years · Not disclosed · On-site · Full Time

And you are...
Are you the kind of person who wonders what the big deal is with these “transformers”? You skimmed “Attention Is All You Need,” but your attention… wandered ;) You’ve watched Swordfish, Hackers, Sneakers, WarGames, maybe The Matrix, and thought: “Wait, was that command actually netstat -a on the screen?” You thought it was lame... And you know the truth: most “hacks” are doors someone left open. Security isn’t perfect; your job is to make it harder to break, faster to notice, and easier to fix. If you’re practical about basics and curious about GenAI’s new tricks, you’ll fit right in.

Mission: Keep Numberz.ai’s GenAI and app stack safe, sane, and grounded, and keep the team current on what’s changing out there.

No-nonsense About Numberz.ai
We build a should-costing + negotiation copilot for procurement. It ingests engineering diagrams, RFQs, quotes, and PO history; normalizes parts/BOMs; computes a should-cost and a cross-bid composite; explains the variance; and drafts counter-offers and talk tracks people actually use. Less noise, fair pricing, better terms.

Outcomes (first 12 months)
Real tests, not theater: Red-team checks for prompt/indirect injection, retrieval poisoning, tool-abuse, data leaks, and jailbreaks. Builds fail when protections slip.
OWASP-savvy: Map our controls to OWASP Top 10 (web) + OWASP Top 10 for LLM Apps. No critical gaps; a simple quarterly scorecard.
Fast intel → fast action: New attack write-ups/CVEs triaged in 48h; tests or mitigations land within 7 days.
Protect people & data: Keep PII/secrets from leaking; factual answers cite sources; retrieval stays fresh.
See it, fix it: Dashboards show attacks tried, blocked, or missed; when we miss, we learn and close the loop.
Monthly Security Brief: Ship a short GenAI + AppSec brief by the 5th; run a 20-min teach-in; track follow-ups to done.

What you’ll actually do
Turn new attack ideas into small, automated tests (add to CI).
Pair with engineers to bake security into prompts, tools, and retrieval.
Keep a living Latest Threats list (indirect injections in PDFs, tool-call escalation, vector-DB poisoning, provider/model changes, classic web vulns).
Write short, human-readable advisories and update runbooks.

Skills (the helpful kind)
Comfortable in Python and CI; you write tests and wire them up.
Solid AppSec basics (OWASP Top 10, auth, secrets handling).
Working knowledge of GenAI/RAG risks (prompt/indirect injection, retrieval poisoning, data exfiltration, jailbreaks).
Clear writing and calm debugging: you explain, you don’t mystify.
Bonus: JSON Schema, basic SQL, Docker; threat modeling (STRIDE/LINDDUN for GenAI).

Competencies (how you work)
Always learning: You skim the noise and turn it into 1–2 concrete actions.
Practical: You ship protections that actually reduce risk.
Teacher energy: Monthly briefs people read without groaning.
Partner mindset: Security as a habit, not a roadblock.

Founding Principal Engineer · India · 10 years · Not disclosed · On-site · Full Time

Who are you?
You like software that feels calm. You’ve shipped the release nobody noticed, because everything that needed to break already broke in dev and UAT (on purpose). You also know some truths only show up under real traffic. So you ship behind flags. If something bends, you roll forward or back fast, write it down, and make the next release calmer. You never say “it works on my laptop,” because your laptop mimics prod: containers, seeded fixtures, contract tests, and mocks for the sharp edges. That’s the craft. That’s exactly the energy we want.

We’re building Numberz.ai WealthTech: assisted trading with RMs (financial advisors) in the loop, a clean 360° portfolio view, and ideas that are actually acted on. It has to be fast, stable, and simple, because real clients (High Net-worth Individuals) and relationship managers will rely on it.

Mission: Ship simple, dependable features end‑to‑end (types first, tests included, infra ready).

What you’ll do
Build product surfaces in React/Next.js + TypeScript and services in Node.js/Nest/Express.
Keep contracts honest with TypeScript + Zod/JSON Schema and write unit/integration tests that run in CI.
Design and document APIs (REST/GraphQL), add tracing/logging, and watch p95s.
Run it in prod: Docker + Kubernetes, Postgres/Mongo, Redis; basic IaC to make it repeatable.
Review PRs with care (old‑school eyes welcome), mentor, and keep PRs small.
Use Cursor/Copilot to move faster; verify outputs, don’t outsource thinking.
Work async‑first: plan your time, communicate clearly in Slack/Docs, and ship.

How we build (principles)
Keep it small. Small PRs, small services (when needed), small blast radius.
Types earn trust. Model the domain; make invalid states unrepresentable.
Tests tell stories. Happy path + edge cases; guardrails in CI.
Prod reveals truth, safely. Feature flags, canaries, synthetic load; rollbacks are a skill, not a surprise.
Parity over excuses. “Works on my laptop” is banned; use containers, seeded data, contract tests, and mocks to mirror prod locally.
UAT isn’t ceremony. Masked prod‑like data, idempotent migrations, backfills, and chaos toggles run there first.
GenAI is a power tool, not a crutch. We review, we measure.
Infra is part of the job. Observability over optimism; boring deploys beat heroic fixes.
Jargon‑light, clarity‑heavy. Plain words beat buzzwords.

What you bring
7–10 years building user‑facing products (preferably in startups).
Strong TypeScript/MERN; confident with Kubernetes in production.
Taste for simple designs, steady delivery, and excellent code reviews.
Solid engineering hygiene: RFC → small PR → measure → iterate.
Ownership, time management, and crisp async communication.

Nice to have
Fintech/wealth/trading exposure; event‑driven patterns; auth (OIDC), caching; observability (OpenTelemetry/Prom/Grafana).

Extra credit: our tiny book club
You’ve read (and used ideas from) Analysis Patterns (Martin Fowler).
You’ve read Refactoring (Martin Fowler) and can explain one refactor you applied.
You bring a pragmatic mindset: tradeoffs, not dogma. (If you also like The Pragmatic Programmer, tell us a tip you actually use.)

How to apply
Send “Founding Principal Engineer - Story Edition” and 3 tiny stories: a time you deleted more than you wrote, a test that caught a bug early, and a moment you kept a system simple when complexity tempted. Include a link to a PR you’re proud of.

React Developer (jr) · India · 6 years · Not disclosed · Remote · Full Time

You like the front end and the seam where it meets the backend. You’ve walked a shop floor, priced a part, or at least wondered why two quotes for the same bolt don’t match. You turn messy BOMs, RFQs, quotes, and POs into clear screens. You move fast with Cursor and Tailwind to get a real UI in users’ hands, then you take a second pass to fix semantics and edge cases and tighten the contract with the Node.js backend so it keeps working.

We’re building the Numberz.ai Should‑Costing + Negotiation Copilot: normalize parts, compute should‑costs, explain variance, and draft counter‑offers. You’ll take feature idea → UI → API contract → production → iteration. You like object-oriented thinking and a business-first mentality, so you model objects and their interactions: understand your nouns and verbs.

Mission: Build fast, accessible React UIs for should‑costing workflows and keep frontend ↔ backend contracts tight.

What you’ll do
Implement product surfaces in React/Next.js + TypeScript; contribute to the design system (Tailwind/shadcn, Storybook).
Own UI ↔ API contracts: TypeScript types + Zod/JSON Schema, OpenAPI specs, and consumer‑driven contract tests (Pact/MSW).
Integrate with Node.js services (REST/GraphQL); plan errors, retries, optimistic updates, and offline‑tolerant states.
Ship core should‑costing flows: BOM Explorer, Should‑Cost Review (materials/labor/overhead/FX), Quote Workspace (cross‑bid composite), Counter‑Offer Builder, Supplier View.
Instrument usage/events; watch LCP/INP/CLS and domain metrics (time‑to‑compare quotes, variance explained, acceptance rate); release behind feature flags.
Write tests that matter (RTL/Jest/Vitest/Playwright); keep CI green and previews useful.
After release, read the data, fix papercuts, and simplify flows: the closed loop.

How you’ll use GenAI (Cursor & friends)
Scaffold components with Tailwind and generate Storybook stories; create MSW mocks from schemas.
Draft Zod/TypeScript types from OpenAPI; spot gaps; propose refactors.
Summarize PR diffs, write changelog notes, and synthesize test data, then verify before committing.
Move fast for first drafts, then fix as we go: semantics, a11y, contracts, and docs.

How we build (principles)
Prototype to learn; productionize to last.
Words first. Clear UX copy and obvious next steps.
Accessibility is default. Keyboard paths, labels, contrast, every time.
Performance has a budget. Measure and hold the line.
Parity over excuses. Containers, seeded fixtures, contract tests; no “works on my laptop.”
Close the loop. Ship small, observe real use, iterate.
Horses for courses. Pick the simplest tool that works.

What you bring
3–6 years building React/TypeScript apps; comfortable with Next.js and Tailwind/shadcn.
Experience integrating with Node.js backends; confident with types, schemas, and API contracts.
Testing discipline (RTL/Jest/Vitest/Playwright) and CI hygiene.
Curious about manufacturing: you’ve handled data like BOMs/RFQs/POs or want to learn fast.
Async communication and time management in a remote setup.

Nice to have
Exposure to Should‑Costing/Manufacturing workflows, supplier comparisons, or cost models.
Data‑viz (Recharts/Visx/D3), analytics/experimentation basics, and performance tuning.
Familiar with Cursor/Copilot and code review best practices.

PS: We aim to move fast and hope to close this in a week.

Business Analyst (jr) · India · 2 years · Not disclosed · Remote · Full Time

Team: Numberz.ai, Should‑Costing & Wealth Apps · Location: Remote (ET hours)

You like turning messy into clear. You ask good questions, sketch quick flows, and write down what “done” means. You pick tools for the job (horses for courses): a one‑pager to align, a Miro map to spot gaps, a Figma wire to test an idea, a spreadsheet to check the math. You notice how people actually work (on a shop floor pricing parts or in a portfolio review), and you enjoy helping them get from A to B with fewer steps. Bonus points if you love machines (BOMs, RFQs, POs make you curious) or you’ve placed a trade and felt that beat between click and confirm.

We’re building the Numberz.ai Wealth SuperApp and the Should‑Costing + Negotiation Copilot. You’ll help take features from discovery → spec → handoff → UAT → post‑launch iteration.

Mission: Turn ambiguity into crisp requirements, clean data, and simple flows, and close the loop after launch.

What you’ll do
Shadow users (buyers, suppliers, advisors); run short interviews; map current → target workflows.
Write one‑pagers and user stories with clear acceptance criteria (Given/When/Then); track decisions.
Draft wireframes in Figma and process maps in Miro to align on UX and edge cases.
Reconcile data across sources (BOM/RFQ/PO or holdings/IPS); define field mappings; prepare sample datasets.
Run UAT: create scripts, seed masked data, log issues, retest; help keep release notes tidy.
Build lightweight metrics (Sheets/Looker Studio/Retool) to measure if a change worked.
After release, review analytics/feedback, summarize insights, and propose the next iteration.

How you’ll use GenAI (Cursor/ChatGPT & friends)
Turn interview notes into draft user stories; generate edge‑case lists and test data.
Convert OpenAPI/CSV fields into draft mappings and basic SQL, then verify with the team.
Summarize PR diffs and write draft changelog notes; never ship blindly.

How we work (principles)
Prototype to learn; words first. Clear language before fancy UI.
Small batches. Align with a one‑pager, then iterate.
Source of truth. Decisions live where the team can find them.
Mirror prod safely. Masked data in UAT; contracts checked; no “works on my laptop.”
Close the loop. Ship, observe (metrics + user quotes), and adjust.
Jargon‑light, clarity‑heavy. Use names people can guess; design for recovery.

What you bring
0–2 years in product/ops/consulting/BA (internships welcome) and strong writing.
Comfortable with spreadsheets, tidy docs, and basics of Figma/Miro; SQL curiosity is a plus.
Organized, proactive, great at async communication and time management.
Domain curiosity in manufacturing/should‑costing or wealth; you like learning how work really happens.

Nice to have
Hands‑on with BOMs/RFQs/POs or portfolios/IPS; Excel power moves (Pivot/XLOOKUP).
Basics of analytics/experiments; Tableau/Looker Studio; Python/pandas curiosity.
You’ve toured a plant or traded yourself.

We move fast and promise to close the position in less than a week!