Cloud

An awesome doc on the cloud and on deploying our apps with technologies like AWS, GitHub Actions, CI/CD, and Docker, plus comprehensive documentation on AWS.

Multipart File Upload in AWS S3

[!CAUTION]

You might get busted by a surprise bill if you miss this.

When Should You Use Multipart?

When your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
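As a rough sketch of that rule of thumb, the helper below decides between a single upload and a multipart upload and picks a part size. `planUpload`, `UploadPlan`, and the 8 MiB default part size are hypothetical names and values chosen for illustration; treating 100 MB as 100 MiB is also an assumption. The 5 MiB minimum part size and the 10,000-part limit are S3's documented multipart limits.

```typescript
const MiB = 1024 * 1024;
const MULTIPART_THRESHOLD = 100 * MiB; // rule of thumb from the text (assumed to mean MiB)
const MIN_PART_SIZE = 5 * MiB;         // S3 minimum for every part except the last
const MAX_PARTS = 10_000;              // S3 maximum number of parts per upload

interface UploadPlan {
  multipart: boolean;
  partSize: number;  // bytes per part (the last part may be smaller)
  partCount: number;
}

// Hypothetical helper: decide how to upload an object of `objectSize` bytes.
function planUpload(objectSize: number, preferredPartSize = 8 * MiB): UploadPlan {
  if (objectSize < MULTIPART_THRESHOLD) {
    return { multipart: false, partSize: objectSize, partCount: 1 };
  }
  // Grow the part size when the preferred one would need more than 10,000 parts.
  const partSize = Math.max(
    MIN_PART_SIZE,
    preferredPartSize,
    Math.ceil(objectSize / MAX_PARTS),
  );
  return { multipart: true, partSize, partCount: Math.ceil(objectSize / partSize) };
}

console.log(planUpload(10 * MiB));  // single upload
console.log(planUpload(100 * MiB)); // multipart, 13 parts of 8 MiB
```

The part size only grows past the preferred value for very large objects, which keeps the part count under the limit without changing behavior for typical files.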

Checksums

[!TIP]

You wanna upload something:

  1. Use a precalculated checksum.
  2. Use the full object checksum type.

Checksum Types

| Checksum algorithm | Full object | Composite |
| --- | --- | --- |
| CRC64NVME | Yes | No |
| CRC32 | Yes | Yes |
| CRC32C | Yes | Yes |
| SHA1 | No | Yes |
| SHA256 | No | Yes |

Full object checksums in multipart uploads are only available for CRC-based checksums because they can linearize into a full object checksum.
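The support matrix above can be encoded as a small lookup so an unsupported combination is caught before an upload starts. This is only a sketch: `SUPPORTED_TYPES` and `isSupported` are hypothetical names, and the string values are assumed to match the `ChecksumAlgorithm` and `ChecksumType` values used by the S3 API.

```typescript
type ChecksumAlgorithm = "CRC64NVME" | "CRC32" | "CRC32C" | "SHA1" | "SHA256";
type ChecksumType = "FULL_OBJECT" | "COMPOSITE";

// Mirrors the table: which checksum types each algorithm supports in multipart uploads.
const SUPPORTED_TYPES: Record<ChecksumAlgorithm, readonly ChecksumType[]> = {
  CRC64NVME: ["FULL_OBJECT"],
  CRC32: ["FULL_OBJECT", "COMPOSITE"],
  CRC32C: ["FULL_OBJECT", "COMPOSITE"],
  SHA1: ["COMPOSITE"],
  SHA256: ["COMPOSITE"],
};

// Hypothetical guard to call before starting a multipart upload.
function isSupported(algorithm: ChecksumAlgorithm, type: ChecksumType): boolean {
  return SUPPORTED_TYPES[algorithm].includes(type);
}

console.log(isSupported("CRC32", "FULL_OBJECT"));  // true
console.log(isSupported("SHA256", "FULL_OBJECT")); // false
```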

[!TIP]

To generate a base64-encoded SHA-256 checksum on Linux, you can simply run:

shasum -a 256 filename.ext | cut -d' ' -f1 | xxd -r -p | base64
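The same base64-encoded SHA-256 value can also be computed in Node.js with the built-in `node:crypto` module. The function names below are hypothetical; the file variant streams the file so it never has to fit in memory.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Base64-encoded SHA-256 digest of in-memory data.
function sha256Base64(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("base64");
}

// Same digest for a file, read chunk by chunk.
async function sha256Base64OfFile(path: string): Promise<string> {
  const hash = createHash("sha256");
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk);
  }
  return hash.digest("base64");
}

console.log(sha256Base64("hello"));
```

Remember from the table that SHA-256 only works with the composite checksum type in multipart uploads, so this value fits single-operation uploads or per-part checksums.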

I still dunno how to generate a CRC32 checksum in the terminal, so feel free to file an issue and lemme know how to do it.

How To Use The FULL_OBJECT Checksum Type

  1. You choose a checksum algorithm when you wanna start uploading your data.
  2. Calculate the checksum of your file:
    • Read the file chunk by chunk and upload each chunk while updating the checksum as you go, so there’s no need to load the entire file into memory first.
    • We’re not gonna use the checksum itself, but rather a hash of it, as I’ve already explained here. Feel free to upvote my answer there :).
  3. Amazon S3 uses that algorithm to compute a checksum on its own servers and validates it against the value you provided.
    • Amazon S3 also stores the checksum as part of the object metadata.
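The chunk-by-chunk calculation in step 2 can be sketched with a running CRC32 that is fed one chunk at a time. This is a hand-rolled illustration, not the author's implementation: `Crc32` and `crc32Base64OfStream` are hypothetical names, and encoding the result as base64 of the four big-endian checksum bytes is an assumption about the form S3's checksum headers expect.

```typescript
// Lookup table for the standard CRC-32 (reflected polynomial 0xEDB88320).
const CRC32_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC32_TABLE[n] = c >>> 0;
}

// Running CRC32: call update() once per chunk, in file order.
class Crc32 {
  private state = 0xffffffff;

  update(chunk: Uint8Array): this {
    let crc = this.state;
    for (let i = 0; i < chunk.length; i++) {
      crc = CRC32_TABLE[(crc ^ chunk[i]) & 0xff] ^ (crc >>> 8);
    }
    this.state = crc >>> 0;
    return this;
  }

  // Base64 of the 4 big-endian checksum bytes (assumed header format).
  digestBase64(): string {
    const bytes = Buffer.alloc(4);
    bytes.writeUInt32BE((this.state ^ 0xffffffff) >>> 0);
    return bytes.toString("base64");
  }
}

// Works with any chunk source, e.g. fs.createReadStream(path).
async function crc32Base64OfStream(chunks: AsyncIterable<Uint8Array>): Promise<string> {
  const crc = new Crc32();
  for await (const chunk of chunks) {
    crc.update(chunk); // the same chunk could be uploaded as a part here
  }
  return crc.digestBase64();
}

// Feeding the data in two chunks gives the same result as one pass.
console.log(new Crc32().update(Buffer.from("1234")).update(Buffer.from("56789")).digestBase64());
```

Because the state carries over between `update` calls, the checksum of the whole object is ready as soon as the last chunk has been read.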

Which Checksum Algorithm Should You Pick?

Checksum tradeoffs

To answer those questions you should know that:

I believe this picture is a nice way to visualize it:

[Image: Checksum generation at source]

[!CAUTION]

There are cases where your client has to use a multipart upload, and the file may well be bigger than 5 MB. You can upload it in parts, but now you need a single precalculated checksum for the whole file. In this case it is a good idea to use the FULL_OBJECT checksum type, which only works with CRC-based algorithms.

Process

[!TIP]

See how you can do it in NestJS here. Do not forget to gimme a star :).

  1. A CreateMultipartUpload call to start the process.

  2. As many individual UploadPart calls as needed. General doc: UploadPart.

  3. A CompleteMultipartUpload call to complete the process.
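The order of calls above can be sketched as follows. To keep the example self-contained it does not use the AWS SDK: `MultipartClient` is a hypothetical interface whose methods stand in for the CreateMultipartUpload, UploadPart, CompleteMultipartUpload, and AbortMultipartUpload calls, and `multipartUpload` is a hypothetical helper. Real code would issue the corresponding SDK commands and pass the checksum settings along.

```typescript
interface UploadedPart {
  partNumber: number;
  etag: string;
}

// Hypothetical minimal client: each method stands in for one S3 API call.
interface MultipartClient {
  createMultipartUpload(key: string): Promise<string>; // resolves to the upload ID
  uploadPart(key: string, uploadId: string, partNumber: number, body: Uint8Array): Promise<string>; // resolves to the part's ETag
  completeMultipartUpload(key: string, uploadId: string, parts: UploadedPart[]): Promise<void>;
  abortMultipartUpload(key: string, uploadId: string): Promise<void>;
}

async function multipartUpload(
  client: MultipartClient,
  key: string,
  parts: Uint8Array[],
): Promise<void> {
  // 1. Start the upload and remember its ID.
  const uploadId = await client.createMultipartUpload(key);
  try {
    // 2. Upload every part; part numbers start at 1.
    const uploaded: UploadedPart[] = [];
    for (let i = 0; i < parts.length; i++) {
      const partNumber = i + 1;
      const etag = await client.uploadPart(key, uploadId, partNumber, parts[i]);
      uploaded.push({ partNumber, etag });
    }
    // 3. Complete the upload with the collected part numbers and ETags.
    await client.completeMultipartUpload(key, uploadId, uploaded);
  } catch (error) {
    // Abort on failure so already-uploaded parts are not left stored (and billed).
    await client.abortMultipartUpload(key, uploadId);
    throw error;
  }
}
```

Parts are uploaded sequentially here for clarity; uploading them in parallel is possible as long as each part keeps its own part number and ETag.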

CreateMultipartUploadCommand

Call UploadPartCommand n Times

CompleteMultipartUploadCommand or AbortMultipartUploadCommand

Refs