Efficiently Streaming and Zipping Multiple Files from S3 using fs2
Posted on September 8, 2024 • 3 min read • 620 words
By Irodotos Apostolou (Software Engineer)
When working with large-scale report generation, where clients need to download hundreds of reports at once, performance and memory usage become critical factors. In one of our recent projects, we faced the challenge of allowing clients to download up to 500 reports at once. Instead of generating and serving these files directly, which could cause high memory consumption, we implemented a streaming solution in Scala using fs2.
This post will highlight how we leveraged fs2 streams to zip files from AWS S3, keeping memory usage minimal throughout the process.
Since the number of files and their individual sizes vary, we need to safeguard the download process against excessive memory usage.

To optimise the zip construction, we keep the original content separate from the optimised file format we use in the download process.
This also gives us the flexibility to evolve the two processes (reading and writing) independently as requirements change or further optimisations come along, since the generated format is only loosely coupled to the downloadable format.
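The `deflateStream` pipe referenced in the `transfer` function below isn't shown in the post. Assuming the optimised format is plain deflate compression, it could be little more than fs2's built-in compression pipe; a minimal sketch (the buffer size is an arbitrary choice):

```scala
import cats.effect.IO
import fs2.Pipe
import fs2.compression.{Compression, DeflateParams}

// Hypothetical definition of the pipe used in `transfer` below:
// deflate-compress the raw bytes before they are written back to S3.
val deflateStream: Pipe[IO, Byte, Byte] =
  Compression[IO].deflate(DeflateParams(bufferSize = 32 * 1024))
```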
Using fs2, we stream each file from S3 and add it to the zip being downloaded by the client, all with a small memory footprint:
```scala
def transfer(from: NonEmptyString, to: NonEmptyString, fileIdentifier: NonEmptyString): fs2.Stream[IO, ETag] =
  withS3(s3AsyncClient).flatMap { s3 =>
    // Read the original file, compress it and upload it to the bucket used for downloads
    val body = s3.readFile(BucketName(from), FileKey(fileIdentifier))
    body
      .through(deflateStream)
      .through(s3.uploadFile(BucketName(to), FileKey(fileIdentifier)))
  }
```

When a client requests a download, we read the compressed files back from S3 and feed them, one at a time, into a single zip stream:

```scala
def retrieveMultiple(compressedBucket: NonEmptyString, fileKeys: List[NonEmptyString], streamSize: Int): fs2.Stream[IO, Byte] =
  withS3(s3AsyncClient).flatMap { s3 =>
    fs2.Stream
      .emits(fileKeys)
      .map { fileKey =>
        // Lazily read each compressed file and pair it with its name
        val body = s3.readFile(BucketName(compressedBucket), FileKey(fileKey))
        FileArchive(fileKey.value, body)
      }
      .through(zipPipe(streamSize))
  }
```
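The `FileArchive` wrapper isn't shown in the post either. A minimal sketch, assuming it simply pairs the file name with the compressed body and decompresses it lazily (the constructor fields are assumptions; `asZipEntry` and `inflatedData` are the members used by `zipPipe` below):

```scala
import java.util.zip.ZipEntry
import cats.effect.IO
import fs2.compression.{Compression, InflateParams}

// Hypothetical shape of FileArchive: a file name plus its deflate-compressed bytes.
final case class FileArchive(name: String, compressedData: fs2.Stream[IO, Byte]) {
  // Zip entry under which the file will appear in the archive
  def asZipEntry: ZipEntry = new ZipEntry(name)

  // Decompress the stored bytes on the fly as they are written into the zip
  def inflatedData: fs2.Stream[IO, Byte] =
    compressedData.through(Compression[IO].inflate(InflateParams()))
}
```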
The zipPipe then turns the stream of FileArchives into an fs2.Stream[IO, Byte] which wraps a ZipOutputStream:

```scala
def zipPipe(chunkSize: Int): Pipe[IO, FileArchive, Byte] = { (fileArchives: fs2.Stream[IO, FileArchive]) =>
  fs2.io.readOutputStream[IO](chunkSize) { outputStream =>
    Resource.fromAutoCloseable(IO.delay(new ZipOutputStream(outputStream))).use { zipOut =>
      // Write each entry through the ZipOutputStream, letting the Resource close it at the end
      val writeOutput = fs2.io.writeOutputStream[IO](IO(zipOut), closeAfterUse = false)
      fileArchives.evalMap { (archive: FileArchive) =>
        IO.delay(zipOut.putNextEntry(archive.asZipEntry)) >>
          archive.inflatedData.through(writeOutput).compile.drain >>
          IO.delay(zipOut.closeEntry())
      }.compile.drain
    }
  }
}
```

We checked memory usage with VisualVM, using different test scenarios and configurations.
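One simple way to exercise the whole pipeline end to end (for example while watching it in VisualVM) is to drain the resulting stream to a local file instead of an HTTP response; a minimal sketch, assuming the definitions above, with a hypothetical helper name, output path, and chunk size:

```scala
import cats.effect.IO
import fs2.io.file.{Files, Path}

// Hypothetical smoke test: zip the given keys from the compressed bucket into reports.zip on disk.
def writeZipToDisk(bucket: NonEmptyString, keys: List[NonEmptyString]): IO[Unit] =
  retrieveMultiple(bucket, keys, streamSize = 32 * 1024)
    .through(Files[IO].writeAll(Path("reports.zip")))
    .compile
    .drain
```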



By leveraging fs2 for streaming and zipping files from S3, we created a solution that’s not only efficient but scalable. We were able to meet the challenge of providing clients with hundreds of reports in a single download without overwhelming our memory resources. Streaming files directly from S3, zipping them on the fly, and serving the final zip package ensures that our system can handle high volumes of data with ease.
If you’re dealing with similar challenges around large file downloads or need to optimize how you handle large datasets, consider giving fs2 a try! With its powerful streaming capabilities, you can keep things efficient and scalable, no matter the size of the workload.
Feel free to check out our implementation on GitHub, and let us know what you think!