w3resource

C++ File handling: Split a large text file

C++ File handling: Exercise-10 with Solution

Write a C++ program to split a large text file into smaller files of equal size.

Sample Solution:

C++ Code:

#include <fstream>    // std::ifstream, std::ofstream
#include <iostream>   // std::cout, std::cerr
#include <string>     // std::string, std::to_string
#include <vector>     // std::vector

// Split inputFile into files named <outputPrefix><n>.txt, each holding at most chunkSize bytes
void splitFile(const std::string & inputFile, const std::string & outputPrefix, int chunkSize) {
  if (chunkSize <= 0) { // Guard against division by zero and an invalid buffer size
    std::cerr << "Chunk size must be positive." << std::endl;
    return;
  }

  std::ifstream input(inputFile, std::ios::binary); // Binary mode keeps byte counts exact
  if (!input.is_open()) {
    std::cerr << "Failed to open the input file." << std::endl;
    return;
  }

  // Determine the file size by seeking to the end and reading the position
  input.seekg(0, std::ios::end);
  const std::streamoff fileSize = input.tellg();
  input.seekg(0, std::ios::beg);
  if (fileSize < 0) { // tellg() reports -1 on failure
    std::cerr << "Failed to determine the input file size." << std::endl;
    return;
  }

  // Round up so that a final, shorter chunk is counted as well
  const std::streamoff numChunks = (fileSize + chunkSize - 1) / chunkSize;

  std::vector<char> buffer(chunkSize); // One buffer, reused for every chunk

  for (std::streamoff i = 0; i < numChunks; ++i) {
    const std::string outputName = outputPrefix + std::to_string(i + 1) + ".txt";
    std::ofstream output(outputName, std::ios::binary); // Create or overwrite the output file
    if (!output.is_open()) {
      std::cerr << "Failed to open output file: " << outputName << std::endl;
      return; // Stop rather than report success for an incomplete split
    }

    input.read(buffer.data(), chunkSize);        // The last read may return fewer bytes
    output.write(buffer.data(), input.gcount()); // Write only the bytes actually read
  } // Each output stream is closed automatically at the end of its iteration

  std::cout << "File split successfully." << std::endl;
} // The input stream is closed automatically when the function returns

int main() {
  std::string inputFile = "merged_test_file.txt"; // Input file
  std::string outputPrefix = "part_"; // Prefix for output files
  int chunkSize = 400; // Chunk size in bytes
  splitFile(inputFile, outputPrefix, chunkSize); // Call the function to split the file
  return 0; // Return 0 to indicate successful execution
}

Sample Output:

File split successfully.

Explanation:

In the above exercise,

  • The function splitFile() takes three parameters: inputFile (the name of the input file to be split), outputPrefix (the prefix for the output files), and chunkSize (the size of each chunk in bytes).
  • The program opens the input file using std::ifstream in binary mode. It then determines the size of the input file using the seekg() and tellg() functions.
  • Next, it calculates the number of chunks required to split the file based on the specified chunk size.
  • The program iterates over each chunk, creates or overwrites the corresponding output file using std::ofstream, and reads a chunk of data from the input file using a std::vector<char> buffer.
  • Each chunk is then written to the output file using the write function.
  • Each output file is closed once its chunk has been written. After the last chunk, the input file is closed and a success message is displayed.
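
The size and chunk-count steps described above can be sketched in isolation. This is a minimal sketch, not part of the original solution: the helper names fileSizeInBytes() and chunkCount() are hypothetical, and returning -1 for a file that cannot be opened is an assumption made for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: size of a file in bytes, or -1 if it cannot be opened.
std::streamoff fileSizeInBytes(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in.is_open()) {
    return -1;
  }
  in.seekg(0, std::ios::end);          // Jump to the end of the file
  const std::streamoff size = in.tellg(); // The position at the end equals the size
  return size;
}

// Hypothetical helper: number of chunks of chunkSize bytes needed to cover fileSize bytes.
// Adding chunkSize - 1 before dividing rounds the result up (ceiling division).
std::streamoff chunkCount(std::streamoff fileSize, std::streamoff chunkSize) {
  assert(fileSize >= 0 && chunkSize > 0); // Precondition assumed by this sketch
  return (fileSize + chunkSize - 1) / chunkSize;
}
```

For example, chunkCount(850, 400) gives 3: two full chunks of 400 bytes and a final chunk of 50 bytes. A file whose size is an exact multiple of the chunk size, such as chunkCount(800, 400), gives 2 with no extra empty chunk.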

Note:

Content of "merged_test_file.txt"

Many vendors provide C++ compilers, including the Free Software Foundation, LLVM, Microsoft, Intel, Embarcadero, Oracle, and IBM.

C++ is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup.

It is almost always implemented in a compiled language.

Modern C++ currently has object-oriented, generic, and functional features, in addition to facilities for low-level memory manipulation.

First released in 1985 as an extension of the C programming language, it has since expanded significantly over time.

Content of the split files

part_1.txt

Many vendors provide C++ compilers, including the Free Software Foundation, LLVM, Microsoft, Intel, Embarcadero, Oracle, and IBM.

C++ is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup.

It is almost always implemented in a compiled language.

Modern C++ currently has object-oriented, generic, and functional features, in addition to facil

part_2.txt

ities for low-level memory manipulation.

First released in 1985 as an extension of the C programming language, it has since expanded significantly over time.
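
The split above cuts the word "facilities" in two because the program counts bytes and ignores word or line boundaries. The following is a minimal in-memory sketch of the same behaviour, assuming a hypothetical helper named splitIntoChunks() that works on a std::string instead of a file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cut data into consecutive pieces of at most chunkSize bytes.
std::vector<std::string> splitIntoChunks(const std::string& data, std::size_t chunkSize) {
  assert(chunkSize > 0); // Precondition assumed by this sketch
  std::vector<std::string> chunks;
  for (std::size_t pos = 0; pos < data.size(); pos += chunkSize) {
    // substr() stops at the end of the string, so the last piece may be shorter
    chunks.push_back(data.substr(pos, chunkSize));
  }
  return chunks;
}
```

Calling splitIntoChunks("in addition to facilities for", 20) returns two pieces, "in addition to facil" and "ities for". Concatenating the pieces in order gives back the original text, so no data is lost even though a word is divided.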

Flowchart: Split large text file.
