data engineer interview questions shared by candidates
There is an AWS S3 bucket with two folders. Using the awswrangler module, please create a list of the files in the input folder that are not in the output folder. Here is the initial code:

    import awswrangler as wr
    input_folder = 's3://mf-pythontest/in'
    output_folder = 's3://mf-pythontest/out'

The required output is: ['doc_003.parquet']

You must use the awswrangler package: https://github.com/awslabs/aws-data-wrangler
You will need to have some AWS credentials to access this public bucket.

***TIP*** The solution should have no more than three lines of code.
    import awswrangler as wr
    from os import path

    input_folder = 's3://mf-pythontest/in'
    output_folder = 's3://mf-pythontest/out'

    # Strip the S3 prefix so files are compared by name only
    get_filenames = lambda folder_path: [path.basename(file_path) for file_path in wr.s3.list_objects(folder_path)]
    # Keep input files that have no same-named counterpart in the output folder
    [filename for filename in get_filenames(input_folder) if filename not in get_filenames(output_folder)]
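One possible variation on the answer above, offered as a sketch rather than the expected solution: the comparison can be separated from the S3 listing so that the output folder is listed only once and the logic can be checked without AWS credentials. The function names `files_missing_from_output` and `list_missing` are hypothetical, and the sample paths are made-up stand-ins for what `wr.s3.list_objects` might return for this bucket.

```python
from posixpath import basename  # S3 keys use "/" separators on every platform


def files_missing_from_output(input_paths, output_paths):
    """Return base names found in input_paths but not in output_paths."""
    # Build the set of output names once so each membership test is cheap.
    output_names = {basename(p) for p in output_paths}
    return [basename(p) for p in input_paths if basename(p) not in output_names]


def list_missing(input_folder, output_folder):
    """Hypothetical wrapper: list both S3 prefixes, then compare by file name."""
    import awswrangler as wr  # imported lazily so the helper above runs without AWS

    return files_missing_from_output(
        wr.s3.list_objects(input_folder),
        wr.s3.list_objects(output_folder),
    )


# Made-up listings (an assumption, not the real bucket contents).
sample_in = [
    "s3://mf-pythontest/in/doc_001.parquet",
    "s3://mf-pythontest/in/doc_002.parquet",
    "s3://mf-pythontest/in/doc_003.parquet",
]
sample_out = [
    "s3://mf-pythontest/out/doc_001.parquet",
    "s3://mf-pythontest/out/doc_002.parquet",
]

print(files_missing_from_output(sample_in, sample_out))  # ['doc_003.parquet']
```

This is longer than the three lines the tip asks for, so it is meant as a readable, testable form rather than an interview answer. With real credentials, `list_missing('s3://mf-pythontest/in', 's3://mf-pythontest/out')` would perform the same comparison against the bucket.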
There were no traditional-format questions that would be of interest here, just discussion of past technologies used and the challenges described above.