Round 1:
How do you change a soft delete to a hard delete - CDC change logic
How do you handle a sudden surge in data?
Difference between a materialized view, a view, and a temp view
Have you been in a stakeholder meeting?
Dependency management in your ETL tool
How will you get notified when there is an error in the pipeline?
Difference between a data lake, a data warehouse, and Delta Lake
How do you handle duplicate data while ingesting?
Round 2:
SQL query optimization methods
What is Spark?
Which AWS services have you used?
Self rating in Python and SQL
Other than DE, where have you used Python?
Round 3:
Primary work, tools used
How do you ensure data quality in your pipelines?
Where do you deploy your code?
Why is there a need for separate databases, OLTP and OLAP?
Most recent project, use case and tools used
What is Auto Loader in Databricks?
How do you manage schema evolution?
Why was there a need for real-time streaming in your architecture?
When do you use nested queries versus CTEs?