Writing a large amount of data to an xlsx file with Python takes time, so I tried a C++ library instead. The use case is exporting a large amount of csv or tsv data to xlsx without using Excel. I forgot to measure the speeds, but the difference felt like night and day: the C++ version exported much faster. I will compare them properly soon.
An xlsx worksheet supports up to 1,048,576 rows and 16,384 columns.
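As a small illustration of these limits, the following sketch checks whether a cell position fits in one worksheet and estimates how many worksheets a given number of rows would need. The names `kMaxRows`, `kMaxCols`, `fits_in_sheet`, and `sheets_needed` are hypothetical helpers made up for this example; they are not part of any library.

```cpp
#include <cstdint>

// Worksheet limits of the xlsx format, as stated above.
constexpr std::uint64_t kMaxRows = 1048576;  // 2^20
constexpr std::uint64_t kMaxCols = 16384;    // 2^14

// Hypothetical helper: true when a 0-based (row, col) position
// lies inside a single worksheet.
constexpr bool fits_in_sheet(std::uint64_t row, std::uint64_t col) {
    return row < kMaxRows && col < kMaxCols;
}

// Hypothetical helper: number of worksheets needed to hold total_rows
// rows if every worksheet is filled up to the row limit.
constexpr std::uint64_t sheets_needed(std::uint64_t total_rows) {
    return (total_rows + kMaxRows - 1) / kMaxRows;
}
```

A converter could use a check like this either to stop at the limit or to continue on a new worksheet.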
libxlsxwriter
A simple xlsx export library for C. It is published in a GitHub repository.
A wealth of examples is included.
It can also be used without installing it. In that case, you need to specify the include and library paths when compiling.
$ git clone https://github.com/jmcnamara/libxlsxwriter.git
$ cd libxlsxwriter
$ make
$ sudo make install
The program below reads tab-delimited text from standard input and writes it in xlsx format to the file given as the argument. Each input line becomes one row, and each tab starts a new cell.
tsv2xlsx.cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include "xlsxwriter.h"

int main(int argc, char **argv) {
    // Argument check: stop here if no output file name was given.
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " output_filename.xlsx" << std::endl;
        return 1;
    }
    // Create the Excel workbook and worksheet.
    lxw_workbook *workbook = workbook_new(argv[argc - 1]);  // argv[argc-1] is the file name
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    const lxw_row_t max_rows = 1000000;  // stay below the 1,048,576-row limit
    const std::size_t max_cols = 16384;  // column limit of an xlsx worksheet
    std::vector<std::string> v_string;
    std::string input_text;  // one line of standard input, of any length
    lxw_row_t row = 0;       // 0-based index of the row to write next

    // Read until the input ends or the row cap is reached.
    while (row < max_rows && std::getline(std::cin, input_text)) {
        // A blank input line is not written; its row stays empty.
        if (!input_text.empty()) {
            // Separate the input at tabs.
            boost::split(v_string, input_text, boost::is_any_of("\t"));
            // Write one row, one cell per field.
            for (std::size_t col = 0; col < v_string.size() && col < max_cols; col++) {
                worksheet_write_string(worksheet, row, static_cast<lxw_col_t>(col),
                                       v_string[col].c_str(), NULL);
            }
        }
        row++;
    }
    workbook_close(workbook);
    return 0;
}
$ g++ tsv2xlsx.cpp -o tsv2xlsx -lxlsxwriter
$ cat test.tsv | ./tsv2xlsx test.xlsx
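If Boost is not available, the tab splitting can probably be done with the standard library alone. The following is a minimal sketch, not part of the original program: `split_tabs` is a hypothetical helper name, and it is meant to behave like `boost::split` with `boost::is_any_of("\t")` in its default mode, so empty fields are kept and columns stay aligned.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one line at every tab character.
// Empty fields are kept, so "a\t\tb" yields three fields.
std::vector<std::string> split_tabs(const std::string &line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type tab = line.find('\t', start);
        if (tab == std::string::npos) {
            // No more tabs: the rest of the line is the last field.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;  // continue after the tab
    }
    return fields;
}
```

With this helper, the `boost::split` call in tsv2xlsx.cpp could be replaced by `v_string = split_tabs(input_text);`, and the Boost include would no longer be needed.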