
I am developing a server script with Node.js/Express.js that receives uploaded .tar.gz archives containing multiple files. The script has to untar and gunzip the CSV files in each archive, parse them, and store some of the data in a database. There is no need to store the files on the server, just process them. To upload files I am using Multer without specifying a storage destination, so the uploads are only available in req.files as Buffer objects.

My question is: how can I untar and gunzip a Buffer to get the contents of the files? If I do something like:

const { unzipSync } = require('zlib');

const zipped = req.files[0];
const result = unzipSync(zipped.buffer); // unzipSync is synchronous, so await is a no-op here
const str = result.toString('utf-8');

I get not just the content of the file but the whole tar stream as a string, including file names, metadata, etc., which is tricky to parse. Is there a better way?


Asked Dec 4, 2019 at 5:33 by Alex Tok
  • Why not use actual tar and then load in the resulting data from disk? (using exec or spawn) – Mike 'Pomax' Kamermans Commented Dec 4, 2019 at 5:40
  • Yeah, or even easier, use a tar module for Node, such as npmjs.com/package/tar. I was just wondering if I could avoid saving the upload to disk and untar from the Buffer itself. – Alex Tok Commented Dec 4, 2019 at 6:12
  • If you want to unpack a tgz, you need to both unzip and untar. Right now you're only unzipping. – Mike 'Pomax' Kamermans Commented Dec 4, 2019 at 15:32
  • Yes. But how do you untar a Buffer in JavaScript? I've found many modules, but without such functionality; they mostly work with files in the file system or with readable streams. – Alex Tok Commented Dec 5, 2019 at 1:11
  • You literally link to a library that does what you need, but you can't find a specific detail, so you probably want to ask for that to be documented on their issue tracker. That way, everyone in the open-source community benefits. – Mike 'Pomax' Kamermans Commented Dec 5, 2019 at 16:01

1 Answer


I managed to untar and gunzip the Buffer using the tar-stream and streamifier libraries.

const tar = require('tar-stream');
const streamifier = require('streamifier');
const { unzipSync } = require('zlib');

// `buffer` is the .tar.gz file uploaded to the Express.js server via the
// Multer middleware with MemoryStorage.
const untar = ({ buffer }) => new Promise((resolve, reject) => {
  const textData = [];
  const extract = tar.extract();
  // The extract stream emits each tarred file as an entry, separating its
  // header from a readable stream of its contents:
  extract.on('entry', (header, stream, next) => {
    const chunks = [];
    stream.on('data', (chunk) => {
      chunks.push(chunk);
    });
    stream.on('error', (err) => {
      reject(err);
    });
    stream.on('end', () => {
      // Concatenate the chunks into a string and collect it; textData ends
      // up holding the contents of every file in the .tar.gz:
      const text = Buffer.concat(chunks).toString('utf8');
      textData.push(text);
      next();
    });
    stream.resume();
  });
  extract.on('finish', () => {
    // Resolve with the array of the tarred files' contents:
    resolve(textData);
  });
  // Gunzip the buffer synchronously, wrap the result in a readable stream,
  // and pipe it into tar-stream's extract:
  streamifier.createReadStream(unzipSync(buffer)).pipe(extract);
});

Using this approach I avoided storing any temporary files on the filesystem and processed all file contents entirely in memory.
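Once untar resolves, each element of the array is still a raw CSV string that needs parsing before anything goes to the database. A minimal, dependency-free sketch, assuming simple comma-separated values with no quoted fields (a real project would likely reach for a CSV library):

```javascript
// Splits one CSV string into an array of row arrays, dropping blank lines.
const parseCsv = (text) => text
  .split('\n')
  .filter((line) => line.trim() !== '')
  .map((line) => line.split(','));

// parseCsv('id,name\n1,Alice\n2,Bob\n')
// → [['id', 'name'], ['1', 'Alice'], ['2', 'Bob']]
```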
