Question

    Which Python library is primarily used for data manipulation and cleaning?

    A NumPy
    B pandas
    C Matplotlib
    D Seaborn
    E Scikit-learn

    Solution

    The correct answer is B (pandas). The pandas library is the go-to tool for data manipulation and cleaning in Python. It provides flexible, labeled data structures, DataFrame and Series, that make handling structured data (such as tables and spreadsheets) efficient. pandas lets you clean and preprocess data by handling missing values, merging datasets, filtering rows, and performing group-by operations. It is widely used alongside other libraries, such as NumPy for numerical computation and Matplotlib/Seaborn for visualization.

    The other options are incorrect because:

    • Option A (NumPy) is primarily used for numerical operations and is excellent for working with arrays, but it does not offer the labeled, table-oriented manipulation functions that pandas provides.
    • Option C (Matplotlib) is used for data visualization, not data manipulation.
    • Option D (Seaborn) is a statistical visualization library built on top of Matplotlib. It is useful for creating attractive plots, but it is not focused on data cleaning.
    • Option E (Scikit-learn) is a machine learning library that focuses on building and evaluating models. Its preprocessing utilities prepare features for a model rather than serve as a general tool for cleaning tabular data.
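
    The pandas features named in the solution (handling missing values, merging, filtering, and group-by) can be illustrated with a minimal sketch. The tables, column names, and values below are hypothetical and invented purely for illustration, and filling the missing amount with 0.0 is an arbitrary choice, not a recommended cleaning rule.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, None, 40.0, 15.0],  # one missing value
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Handle missing values: replace the missing amount with 0.0
# (an arbitrary choice for this sketch).
orders["amount"] = orders["amount"].fillna(0.0)

# Merge datasets: attach each customer's region to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Filter data: keep only orders from the "North" region.
north = merged[merged["region"] == "North"]

# Group-by: total order amount per customer in that region.
totals = north.groupby("customer_id")["amount"].sum()
print(totals)
```

    Each step returns a new DataFrame or Series, so the intermediate results (merged, north, totals) can be inspected individually when checking a cleaning pipeline.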
