Job Summary
Caterpillar is seeking a Vector DB Engineer Data Scientist to join Applications Development & Intelligence Automation - CAT IT Division.
The incumbent will be responsible for designing, implementing, and optimizing vector databases that enable high-performance, large-scale data processing and retrieval. You will work closely with our data science, machine learning, and software engineering teams to build robust solutions that support our clients' data-intensive applications.
The preferred location for this role is Bangalore (Caterpillar PSN).
What you will do
- Design, implement, and manage vector databases to support large-scale data storage and retrieval, ensuring low latency and high availability.
- Develop efficient data models that facilitate fast vector operations such as similarity search, nearest neighbor search, and other vector-based queries.
- Optimize database performance through indexing, partitioning, sharding, and other techniques to handle large-scale datasets.
- Integrate vector databases with existing systems and applications, ensuring seamless data flow and accessibility.
- Design and implement solutions that scale with growing data volumes, ensuring the database infrastructure can handle increased load and complexity.
- Implement security best practices to protect data at rest and in transit, including encryption, access controls, and audit logging.
- Monitor database performance and troubleshoot issues as they arise, ensuring system reliability and availability.
- Work closely with data scientists, machine learning engineers, and software developers to understand their needs and provide database solutions that meet their requirements.
- Maintain comprehensive documentation for database schemas, configurations, and procedures to support operational excellence and knowledge sharing.
What you will have
Must Have Skills:
- Deep understanding and hands-on experience with vector databases, including their architecture, query languages, and optimization techniques.
- Strong programming skills in languages such as Python, C++, or Java, with experience in developing and optimizing database operations.
- Solid understanding of data structures, algorithms, and computational geometry, particularly related to vector search and similarity measures.
- Experience with cloud platforms (e.g., AWS, GCP, Azure) and managed database services.
- Understanding of machine learning concepts, particularly those related to embedding vectors and similarity searches.
- Strong problem-solving skills with a focus on performance optimization and scalability.
- Excellent communication skills, with the ability to articulate complex technical concepts to non-technical stakeholders.
- This position requires the candidate to work a 5-day-a-week schedule in the office.
- Shift timing: 01:00 PM - 10:00 PM IST
Skills Desired:
- Business Statistics: Knowledge of the statistical tools, processes, and practices to describe business results in measurable scales; ability to use statistical tools and processes to assist in making business decisions
- Level: Working Knowledge
- Explains the basic decision process associated with specific statistics
- Works with basic statistical functions on a spreadsheet or a calculator
- Explains reasons for common statistical errors, misinterpretations, and misrepresentations
- Describes characteristics of sample size, normal distributions, and standard deviation
- Generates and interprets basic statistical data
- Accuracy and Attention to Detail: Understanding the necessity and value of accuracy; ability to complete tasks with high levels of precision
- Level: Extensive Experience
- Evaluates and makes contributions to best practices
- Processes large quantities of detailed information with high levels of accuracy
- Productively balances speed and accuracy
- Employs techniques for motivating personnel to meet or exceed accuracy goals
- Implements a variety of cross-checking approaches and mechanisms
- Demonstrates expertise in quality assurance tools, techniques, and standards
- Analytical Thinking: Knowledge of techniques and tools that promote effective analysis; ability to determine the root cause of organizational problems and create alternative solutions that resolve these problems
- Level: Working Knowledge
- Approaches a situation or problem by defining the problem or issue and determining its significance
- Makes a systematic comparison of two or more alternative solutions
- Uses flow charts, Pareto charts, fishbone diagrams, etc., to disclose meaningful data patterns
- Identifies the major forces, events and people impacting and impacted by the situation at hand
- Uses logic and intuition to make inferences about the meaning of the data and arrive at conclusions
- Machine Learning: Knowledge of principles, technologies and algorithms of machine learning; ability to develop, implement and deliver related systems, products and services
- Level: Working Knowledge
- Completes specific tasks and initiatives utilizing machine learning technologies, such as search engine optimization
- Utilizes specific tools and techniques to process descriptive and inferential statistics
- Applies specific computing languages and tools in machine learning, such as R and Python
- Explores the use of machine learning in one's own areas to make business improvements
- Conducts data mining and cleaning initiatives
- Programming Languages: Knowledge of basic concepts and capabilities of programming; ability to use tools, techniques and platforms in order to write and modify programming languages
- Level: Working Knowledge
- Participates in the implementation and support of specialized programming languages
- Conducts basic reviews on writing a specific programming language within a specific platform
- Assists with the design and development of specialized programming languages
- Follows an organization's standards, policies and guidelines for structured programming specifications
- Diagnoses and reports minor or routine programming language problems
- Query and Database Access Tools: Knowledge of data management systems; ability to use, support and access facilities for searching, extracting and formatting data for further use
- Level: Working Knowledge
- Defines, creates and tests simple queries by using associated command language in a specific environment
- Applies appropriate query tools used to connect to the data warehouse
- Obtains and analyzes query access path information and query results
- Employs tested query statements to retrieve, insert, update and delete information
- Works with advanced features and functions including sorting, filtering and making simple calculations
- Requirements Analysis: Knowledge of tools, methods, and techniques of requirement analysis; ability to elicit, analyze and record required business functionality and non-functional requirements to ensure the success of a system or software development project
- Level: Working Knowledge
- Follows policies, practices and standards for determining functional and informational requirements
- Confirms deliverables associated with requirements analysis
- Communicates with customers and users to elicit and gather client requirements
- Participates in the preparation of detailed documentation and requirements
- Utilizes specific organizational methods, tools and techniques for requirements analysis