Question 1: Multimodal Retrieval in RAG
Question
In a Retrieval-Augmented Generation (RAG) system, how would you efficiently process and retrieve information from data of multiple modalities, such as text, images, audio, video, and tables?
For example:
Text documents may contain paragraphs, structured tables, and multi-column layouts
Images may contain charts or diagrams
Audio and video may contain spoken content
How would you design the data processing and retrieval pipeline so that the system can retrieve the most relevant information efficiently when a user asks a question?
Specifically, explain:
How each modality (text, tables, images, audio, video) would be processed
How the data would be converted into embeddings
How it would be stored in a vector database
How the system would perform efficient retrieval across different data types
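One possible shape for the storage and retrieval half of such a pipeline is sketched below. It assumes each modality has already been reduced to searchable text (OCR or captions for images and charts, transcripts for audio and video, row-serialized tables), and it keeps the modality plus a locator (page, slide, timestamp) as metadata on every chunk. `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `MultimodalIndex` is a hypothetical in-memory stand-in for a vector database; both names are illustrative only.

```python
import math
import re
import zlib
from dataclasses import dataclass, field
from typing import Optional

DIM = 256  # toy dimensionality; a real embedding model fixes its own output size


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Chunk:
    text: str          # paragraph, serialized table, image caption/OCR, or transcript segment
    modality: str      # "text", "table", "image", "audio", or "video"
    source: str        # originating file
    locator: str = ""  # page, slide, or timestamp range, so answers can cite the exact spot
    vector: list[float] = field(default_factory=list)


class MultimodalIndex:
    """In-memory stand-in for a vector database: one collection, modality kept as filterable metadata."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        chunk.vector = embed(chunk.text)
        self.chunks.append(chunk)

    def search(self, query: str, k: int = 3,
               modalities: Optional[set[str]] = None) -> list[tuple[float, Chunk]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        # This is a brute-force scan; a vector database would use an ANN index (e.g. HNSW).
        scored = [
            (sum(a * b for a, b in zip(q, c.vector)), c)
            for c in self.chunks
            if modalities is None or c.modality in modalities
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Keeping every modality in one collection lets a single query rank a chart caption against a transcript segment, while the `modalities` filter narrows the search when the question clearly targets one type. An alternative design embeds images directly with a joint text-image model such as CLIP instead of going through captions.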
Question 2: Resume Information Extraction
Question
You are given multiple resume templates, and each template represents dates in a different format.
Examples of date formats include:
Jan 2022 – Mar 2023
2021 - Present
03/2020 – 07/2022
March 2019 to June 2021
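A rule-based normalizer for the four formats listed above might look like the sketch below. It assumes month-level precision is enough, maps "Present" (and the hypothetical synonyms "current" and "now") to the current month, and treats a bare year as January when it starts a range and December when it ends one; those are illustrative assumptions, not requirements from the task. `parse_date_range` is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# One endpoint: "Jan 2022", "March 2019", "03/2020", "2021", or "Present"
_END = (r"(?:(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{4}"
        r"|\d{1,2}/\d{4}|\d{4}|present|current|now)")
# Separator: hyphen, en dash, em dash, or the word "to"
_RANGE = re.compile("(" + _END + r")\s*(?:[-\u2013\u2014]|to)\s*(" + _END + ")", re.IGNORECASE)


def _parse_endpoint(text: str, today: date, is_end: bool) -> date:
    t = text.strip().lower()
    if t in {"present", "current", "now"}:
        return date(today.year, today.month, 1)
    if "/" in t:                                  # "03/2020" as month/year
        month, year = t.split("/")
        return date(int(year), int(month), 1)
    parts = t.split()
    if len(parts) == 2:                           # "Jan 2022", "March 2019", "Sept. 2020"
        return date(int(parts[1]), MONTHS[parts[0][:3]], 1)
    return date(int(t), 12 if is_end else 1, 1)   # bare year: assumed Jan (start) or Dec (end)


def parse_date_range(text: str, today: Optional[date] = None) -> Optional[tuple[date, date]]:
    """Return (start, end) as first-of-month dates, or None if no range is found."""
    today = today or date.today()
    m = _RANGE.search(text)
    if m is None:
        return None
    return (_parse_endpoint(m.group(1), today, is_end=False),
            _parse_endpoint(m.group(2), today, is_end=True))
```

Normalizing every endpoint to a first-of-month `date` gives downstream steps (duration, total experience) a single representation regardless of the template. Formats outside these patterns simply return `None`, which is a natural place to fall back to a more general parser or an LLM-based extractor.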
Your task is to build a system that automatically extracts structured information from resumes, specifically:
Project Name
Project Duration
Total Years of Experience
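Summing project durations naively double-counts periods where projects overlap. A minimal sketch, assuming each project or role has already been normalized to a `(start, end)` pair of first-of-month dates and that both endpoint months count as worked, merges overlapping or adjacent ranges before counting months. `total_experience_years` is a hypothetical helper.

```python
from datetime import date


def _month_index(d: date) -> int:
    """Months since year 0, so consecutive months differ by exactly 1."""
    return d.year * 12 + (d.month - 1)


def total_experience_years(ranges: list[tuple[date, date]]) -> float:
    """Months covered by the union of ranges (endpoint months inclusive), in years."""
    spans = sorted((_month_index(s), _month_index(e)) for s, e in ranges if e >= s)
    total = 0
    cur_start = cur_end = None
    for s, e in spans:
        if cur_end is None or s > cur_end + 1:   # gap: close the current merged span
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:                                     # overlapping or adjacent: extend it
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return round(total / 12, 1)
```

For example, March 2019 to June 2021 plus March 2020 to July 2022 merge into one span of 41 months, about 3.4 years, rather than the 57 months a plain sum would give.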
Challenges:
Resumes follow different templates and layouts
Date formats are not consistent
Information may appear in different sections
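Because the same information can sit under differently named headings, one simple first step is to map heading variants to canonical section names and group the lines beneath them. The sketch below assumes plain text with one heading per line; the `SECTION_ALIASES` table and `split_sections` are hypothetical, and real templates (multi-column PDFs, headings embedded in tables) may need layout-aware parsing or a learned classifier instead.

```python
import re
from typing import Optional

# Hypothetical synonym table: canonical section -> heading variants seen across templates
SECTION_ALIASES = {
    "experience": ["experience", "work experience", "professional experience", "employment history"],
    "projects": ["projects", "key projects", "project details", "academic projects"],
    "education": ["education", "academic background"],
    "skills": ["skills", "technical skills"],
}


def _canonical_heading(line: str) -> Optional[str]:
    # Lowercase, turn punctuation and digits into spaces, collapse whitespace
    key = " ".join(re.sub(r"[^a-z ]", " ", line.lower()).split())
    for name, aliases in SECTION_ALIASES.items():
        if key in aliases:
            return name
    return None


def split_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent recognized heading ("header" before any)."""
    sections: dict[str, list[str]] = {"header": []}
    current = "header"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = _canonical_heading(line)
        if heading is not None:
            current = heading
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

Project names and durations can then be searched first in the `projects` and `experience` sections, falling back to the whole document when a template has no recognizable heading.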