Versioned Datasets Management System with Python
Write a Python program that creates a system for managing versioned datasets with Git-like semantics.
The task is to build a system that manages versioned datasets with Git-like functionality. Users create commits, each capturing a snapshot of the dataset together with a commit message and a timestamp. They can list all commits, view the details of each commit, and roll the dataset back to any previous version. This makes changes easy to track and earlier states easy to restore.
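Before the full solution, the core snapshot-and-restore idea can be sketched on its own. This is a minimal illustration under stated assumptions, not the solution itself: the function names `take_snapshot` and `restore_snapshot`, the separate `store` directory, and the file `rows.csv` are all hypothetical. Snapshots are kept outside the data directory here so that a copy can never contain itself.

```python
import shutil
import tempfile
from pathlib import Path


def take_snapshot(data_dir: Path, store_dir: Path, version: int) -> Path:
    """Copy data_dir into store_dir/<version> and return the snapshot path."""
    target = store_dir / str(version)
    shutil.copytree(data_dir, target)
    return target


def restore_snapshot(data_dir: Path, store_dir: Path, version: int) -> None:
    """Replace the contents of data_dir with the stored snapshot."""
    source = store_dir / str(version)
    if not source.is_dir():
        raise FileNotFoundError(f"No snapshot for version {version}")
    shutil.rmtree(data_dir)
    shutil.copytree(source, data_dir)


def demo() -> str:
    """Snapshot a file, overwrite it, restore it, and return the restored text."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        store_dir = Path(tmp) / "store"
        data_dir.mkdir()
        store_dir.mkdir()
        data_file = data_dir / "rows.csv"  # hypothetical dataset file
        data_file.write_text("id,value\n1,10\n")
        take_snapshot(data_dir, store_dir, 1)
        data_file.write_text("id,value\n1,99\n")  # simulate a later change
        restore_snapshot(data_dir, store_dir, 1)
        return data_file.read_text()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the text that was in the file when the snapshot was taken, which shows that the later edit is undone by the restore.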
Sample Solution:
Python Code:
# Import necessary modules
import os
import shutil
import datetime


# Define the DatasetManager class
class DatasetManager:
    # Initialize the DatasetManager instance with the given dataset path
    def __init__(self, dataset_path):
        # Set the dataset path
        self.dataset_path = dataset_path
        # Set the metadata path inside the dataset
        self.dataset_metadata_path = os.path.join(dataset_path, ".metadata")
        # Initialize the current version to 0
        self.current_version = 0
        # Initialize the dataset
        self.initialize_dataset()

    # Method to initialize the dataset
    def initialize_dataset(self):
        # Create the dataset and metadata directories if they are missing
        os.makedirs(self.dataset_metadata_path, exist_ok=True)
        # Look for commits left by an earlier run
        versions = self._existing_versions()
        if versions:
            # Resume from the latest stored commit
            self.current_version = versions[-1]
        else:
            # Create the initial commit
            self.create_commit("Initial commit")

    # Helper to list the stored version numbers in ascending order
    def _existing_versions(self):
        entries = os.listdir(self.dataset_metadata_path)
        return sorted(int(entry) for entry in entries if entry.isdigit())

    # Method to create a new commit with a message
    def create_commit(self, message):
        # Number the commit after the latest stored one, so a commit made
        # after a rollback never collides with an existing version
        versions = self._existing_versions()
        self.current_version = (versions[-1] if versions else 0) + 1
        # Create a directory for the new commit
        commit_dir = os.path.join(self.dataset_metadata_path, str(self.current_version))
        os.makedirs(commit_dir)
        # Write the commit message to a file
        with open(os.path.join(commit_dir, "message.txt"), "w") as f:
            f.write(message)
        # Get the current timestamp
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        # Write the timestamp to a file
        with open(os.path.join(commit_dir, "timestamp.txt"), "w") as f:
            f.write(timestamp)
        # Take a snapshot of the dataset
        self.snapshot_dataset(commit_dir)

    # Method to snapshot the dataset
    def snapshot_dataset(self, commit_dir):
        # Store the snapshot inside the commit directory
        snapshot_dir = os.path.join(commit_dir, "snapshot")
        # Copy the dataset, skipping ".metadata" so the copy never contains itself
        shutil.copytree(
            self.dataset_path,
            snapshot_dir,
            ignore=shutil.ignore_patterns(".metadata"),
        )

    # Method to get the current version of the dataset
    def get_current_version(self):
        # Return the current version
        return self.current_version

    # Method to rollback to a specific version
    def rollback(self, version):
        # Check if the version number is valid
        if not isinstance(version, int) or version <= 0:
            print("Invalid version number")
            return
        # Define the path to the snapshot to rollback to
        snapshot_path = os.path.join(self.dataset_metadata_path, str(version), "snapshot")
        # Check if the snapshot does not exist
        if not os.path.isdir(snapshot_path):
            print("Version {} does not exist".format(version))
            return
        # Remove the current data files but keep the metadata directory
        for entry in os.listdir(self.dataset_path):
            if entry == ".metadata":
                continue
            path = os.path.join(self.dataset_path, entry)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        # Copy the snapshot back into the dataset directory (Python 3.8+)
        shutil.copytree(snapshot_path, self.dataset_path, dirs_exist_ok=True)
        # Set the current version to the rollback version
        self.current_version = version

    # Method to list all commits
    def list_commits(self):
        # Initialize an empty list to store commits
        commits = []
        # Iterate over the stored versions in numeric order
        for version in self._existing_versions():
            # Define the path to the commit
            commit_path = os.path.join(self.dataset_metadata_path, str(version))
            # Read the commit message from the file
            with open(os.path.join(commit_path, "message.txt"), "r") as f:
                message = f.read().strip()
            # Read the timestamp from the file
            with open(os.path.join(commit_path, "timestamp.txt"), "r") as f:
                timestamp = f.read().strip()
            # Append the commit details to the list
            commits.append((str(version), message, timestamp))
        # Return the list of commits
        return commits


# Example usage
if __name__ == "__main__":
    # Create an instance of DatasetManager with the dataset path "dataset1"
    # (on a fresh directory this also creates the initial commit)
    dataset_manager = DatasetManager("dataset1")
    # Create a new commit with the message "Add initial data"
    dataset_manager.create_commit("Add initial data")
    # Create another commit with the message "Update data"
    dataset_manager.create_commit("Update data")
    # Print the current version of the dataset
    print("Current version:", dataset_manager.get_current_version())
    # Print the list of commits
    print("Listing commits:")
    for commit in dataset_manager.list_commits():
        print(commit)
    # Rollback to version 1
    dataset_manager.rollback(1)
    # Print the current version after rollback
    print("After rollback, current version:", dataset_manager.get_current_version())
Output:
Current version: 3
Listing commits:
('1', 'Initial commit', '2024-05-21 11:30:05')
('2', 'Add initial data', '2024-05-21 11:30:05')
('3', 'Update data', '2024-05-21 11:30:05')
After rollback, current version: 1
Explanation:
- Import Modules: Necessary modules (os, shutil, datetime) are imported.
- Define DatasetManager Class: A class for managing versioned datasets.
- Initialize Class (__init__ Method): Sets up dataset paths and initializes dataset.
- Initialize Dataset Method: Creates dataset and metadata directories if they don't exist, and makes an initial commit.
- Create Commit Method: Increments version, creates a commit directory, writes a message and timestamp, and snapshots the dataset.
- Snapshot Dataset Method: Copies the current dataset to a snapshot directory.
- Get Current Version Method: Returns the current version number.
- Rollback Method: Reverts the dataset to a specified version, with checks for valid version numbers.
- List Commits Method: Lists all commits by reading messages and timestamps from the metadata directory.
- Example Usage: Demonstrates creating a 'DatasetManager' instance, making commits, printing the current version, listing commits, and rolling back to a previous version.
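The task also asks for a way to view the details of a single commit, which the methods listed above do not cover directly. The following is a minimal sketch, assuming the layout used by the solution: one numbered directory per commit under `.metadata`, each holding `message.txt` and `timestamp.txt`. The helper name `read_commit` and the sample values in `demo` are hypothetical and not part of the original class.

```python
import os
import tempfile


def read_commit(metadata_path, version):
    """Return one commit's details as a dict, or None if the commit is missing."""
    commit_path = os.path.join(metadata_path, str(version))
    message_file = os.path.join(commit_path, "message.txt")
    timestamp_file = os.path.join(commit_path, "timestamp.txt")
    # A commit counts as present only if both metadata files exist
    if not (os.path.isfile(message_file) and os.path.isfile(timestamp_file)):
        return None
    with open(message_file, "r") as f:
        message = f.read().strip()
    with open(timestamp_file, "r") as f:
        timestamp = f.read().strip()
    return {"version": int(version), "message": message, "timestamp": timestamp}


def demo():
    """Build a throwaway metadata directory and read an existing and a missing commit."""
    with tempfile.TemporaryDirectory() as metadata_path:
        commit_dir = os.path.join(metadata_path, "1")
        os.makedirs(commit_dir)
        with open(os.path.join(commit_dir, "message.txt"), "w") as f:
            f.write("Initial commit")
        with open(os.path.join(commit_dir, "timestamp.txt"), "w") as f:
            f.write("2024-05-21 11:30:05")
        return read_commit(metadata_path, 1), read_commit(metadata_path, 2)


if __name__ == "__main__":
    found, missing = demo()
    print(found)
    print(missing)
```

Returning `None` for a missing commit, rather than raising, mirrors the solution's approach of reporting an unknown version without stopping the program; raising an exception would be an equally reasonable choice.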