Reduce memory usage in Pandas DataFrame using astype method
Pandas: Performance Optimization Exercise-4 with Solution
Write a Pandas program that uses the "astype" method to convert the data types of a DataFrame and measures the reduction in memory usage.
Sample Solution:
Python Code:
import pandas as pd # Import the Pandas library
import numpy as np # Import the NumPy library
# Create a sample DataFrame with mixed data types
np.random.seed(0) # Set seed for reproducibility
data = {
'int_col': np.random.randint(0, 100, size=100000),
'float_col': np.random.random(size=100000) * 100,
'category_col': np.random.choice(['A', 'B', 'C'], size=100000),
'object_col': np.random.choice(['foo', 'bar', 'baz'], size=100000)
}
df = pd.DataFrame(data)
# Print memory usage before optimization
print("Memory usage before optimization:")
df.info(memory_usage='deep') # info() prints its report itself and returns None
# Convert data types using astype method
df['int_col'] = df['int_col'].astype('int16')
df['float_col'] = df['float_col'].astype('float32')
df['category_col'] = df['category_col'].astype('category')
df['object_col'] = df['object_col'].astype('category')
# Print memory usage after optimization
print("\nMemory usage after optimization:")
df.info(memory_usage='deep')
Output:
Memory usage before optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype
---  ------        --------------   -----
 0   int_col       100000 non-null  int32
 1   float_col     100000 non-null  float64
 2   category_col  100000 non-null  object
 3   object_col    100000 non-null  object
dtypes: float64(1), int32(1), object(2)
memory usage: 12.4 MB

Memory usage after optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype
---  ------        --------------   -----
 0   int_col       100000 non-null  int16
 1   float_col     100000 non-null  float32
 2   category_col  100000 non-null  category
 3   object_col    100000 non-null  category
dtypes: category(2), float32(1), int16(1)
memory usage: 781.9 KB
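The exercise asks for a measured reduction, while info() only reports a total for the whole DataFrame. As a rough sketch, the saving per column could be computed with DataFrame.memory_usage(deep=True). The names optimized, report and saved_pct are made up for this illustration, and exact byte counts depend on the platform and the pandas version, so no figures are quoted here.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame from the solution.
np.random.seed(0)
n = 100000
df = pd.DataFrame({
    'int_col': np.random.randint(0, 100, size=n),
    'float_col': np.random.random(size=n) * 100,
    'category_col': np.random.choice(['A', 'B', 'C'], size=n),
    'object_col': np.random.choice(['foo', 'bar', 'baz'], size=n),
})

# Per-column memory in bytes; deep=True also counts the strings
# that object columns point to.
before = df.memory_usage(deep=True)

# Same conversions as in the solution, passed to astype as a
# column -> dtype mapping. The original df is left unchanged.
optimized = df.astype({
    'int_col': 'int16',
    'float_col': 'float32',
    'category_col': 'category',
    'object_col': 'category',
})
after = optimized.memory_usage(deep=True)

# Side-by-side comparison (illustrative column names).
report = pd.DataFrame({'before_bytes': before, 'after_bytes': after})
report['saved_pct'] = (1 - report['after_bytes'] / report['before_bytes']) * 100

total_before = before.sum()
total_after = after.sum()
print(report.round(1))
print(f"Total: {total_before / 1024**2:.1f} MB -> {total_after / 1024:.1f} KB "
      f"({(1 - total_after / total_before) * 100:.1f}% smaller)")
```

The report also contains an Index row, because memory_usage counts the index by default. Its size does not change here, since only the column dtypes are converted.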
Explanation:
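- Import libraries:
  - Import the Pandas library for data manipulation.
  - Import the NumPy library for generating random data.
- Create a sample DataFrame:
  - Set a seed for reproducibility using np.random.seed(0).
  - Create a dictionary, data, with four columns: random integers, random floats, and two columns of short strings ('category_col' and 'object_col'), both of which start out with the object dtype.
  - Generate a DataFrame df using the dictionary.
- Print memory usage before optimization:
  - Use df.info(memory_usage='deep') to display the memory usage of the DataFrame before optimization. The 'deep' setting makes pandas inspect the object columns and count the memory held by the strings themselves.
- Convert data types using the astype method:
  - Convert 'int_col' to 'int16'. The values lie between 0 and 99, so they fit comfortably in a 16-bit integer.
  - Convert 'float_col' to 'float32'. This halves the storage compared with float64 but keeps only about 7 significant decimal digits.
  - Convert 'category_col' and 'object_col' to 'category'. Each distinct string is then stored once, and the rows hold small integer codes that refer to it.
- Print memory usage after optimization:
  - Use df.info(memory_usage='deep') to display the memory usage of the DataFrame after optimization.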
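The conversions above are safe for this sample data, but on real data each one is worth checking first. The following is only a sketch under stated assumptions: fits_in is a hypothetical helper written for this example (not a pandas function), and the sample Series are generated here rather than taken from the solution.

```python
import numpy as np
import pandas as pd

# Small sample Series, generated only for this illustration.
rng = np.random.default_rng(0)
ints = pd.Series(rng.integers(0, 100, size=1000))
floats = pd.Series(rng.random(size=1000) * 100)
labels = pd.Series(rng.choice(['A', 'B', 'C'], size=1000))


def fits_in(series, int_dtype):
    """Hypothetical helper: True if all values fit in the given NumPy integer dtype."""
    bounds = np.iinfo(int_dtype)
    return bool(series.min() >= bounds.min and series.max() <= bounds.max)


# 1. Integers: confirm the value range before narrowing the dtype.
ints_ok = fits_in(ints, np.int16)
# 40000 exceeds the int16 maximum of 32767, so this check fails.
big_ok = fits_in(pd.Series([0, 40000]), np.int16)

# pd.to_numeric can also pick the smallest sufficient integer dtype itself.
auto_ints = pd.to_numeric(ints, downcast='integer')

# 2. Floats: measure the rounding error introduced by float32.
max_error = (floats - floats.astype('float32')).abs().max()

# 3. Strings: category pays off when few distinct values repeat many times.
distinct_ratio = labels.nunique() / len(labels)

print("int16 is safe for ints:", ints_ok)
print("int16 is safe for [0, 40000]:", big_ok)
print("dtype chosen by to_numeric:", auto_ints.dtype)
print("largest float32 rounding error:", max_error)
print("share of distinct labels:", distinct_ratio)
```

Whether a given rounding error or share of distinct values is acceptable depends on the data and on how the columns are used afterwards, so the sketch reports the numbers instead of enforcing thresholds.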