Question

    You are combining sales data from three different

    sources, each with slightly different column names for the same information (e.g., "Product_ID," "ProdID," and "PID"). What is the best way to handle this discrepancy?
    A Retain all variations of column names for completeness. Correct Answer Incorrect Answer
    B Standardize column names during data wrangling Correct Answer Incorrect Answer
    C Drop the columns to avoid confusion Correct Answer Incorrect Answer
    D Create a mapping table to relate different column names Correct Answer Incorrect Answer
    E Analyze each dataset separately before combining. Correct Answer Incorrect Answer

    Solution

    Standardizing column names ensures consistency, making it easier to merge and analyze datasets. By mapping all variations to a uniform name (e.g., "Product_ID"), you can avoid confusion and ensure that subsequent operations (e.g., joins or aggregations) are error-free. Option A : Retaining all variations increases complexity and redundancy in the dataset. Option C : Dropping the columns results in data loss, reducing analysis quality. Option D : A mapping table might help in understanding variations but doesn’t standardize the data for use. Option E : Analyzing separately prevents gaining a comprehensive view of the data.

    Practice Next