Question
Which data cleaning technique is most appropriate for handling missing data when missing values are randomly distributed across a dataset?
Solution
When missing values are randomly distributed, imputing them with a measure of central tendency is an effective technique: the mean for roughly symmetric continuous data, or the median when the distribution is skewed. This approach keeps every row and column, so it maintains the dataset’s overall structure and helps reduce the bias that missing values could introduce. By substituting central values for the gaps, analysts can preserve statistical relationships without significantly distorting the data, which supports a more accurate analysis.
Option A is incorrect because removing rows may cause significant data loss, especially if many rows contain missing values. Option C is incorrect because dropping columns with missing values reduces the number of features and may discard useful information. Option D is incorrect because placeholder values can introduce bias or mislead the analysis, especially if the placeholder skews the distribution. Option E is incorrect because ignoring missing values leaves gaps that make accurate analysis difficult.
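The imputation described in the solution can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the helper name `impute`, the use of `None` to mark a missing value, and the sample `ages` and `incomes` columns are all assumptions made for the example.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Return a copy of `values` with None replaced by the mean or median
    of the observed (non-missing) entries. Hypothetical helper."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Illustrative data only: a roughly symmetric column and a skewed one.
ages = [25.0, None, 35.0, 45.0]
incomes = [30000.0, 32000.0, None, 250000.0]

ages_filled = impute(ages, "mean")           # gap filled with the mean of 25, 35, 45
incomes_filled = impute(incomes, "median")   # gap filled with the median, unaffected by the outlier

print(ages_filled)
print(incomes_filled)
```

The median is used for `incomes` because the single large value would pull the mean far above the typical income. In a pandas workflow the same idea is usually expressed with `Series.fillna` combined with `Series.mean()` or `Series.median()`.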