
Ajax Fileupload to Google Cloud Platform with a Ruby Backend 2015-01-16

During my winter break from university I spent most of my free time working on features for a Rails project of mine. One of these features lets users of the web app upload files directly to Google's cloud storage service using Ajax. Doing it this way keeps the upload from occupying my Rails instance. Amazon offers a similar service called S3, which has existed for quite some time. As a result there are a lot of guides online for getting this upload technique to work with S3, but accomplishing it with Google's cloud service was quite finicky, and the lack of guides and examples, especially for Ruby, did not help. This is what ended up working for me.

On the client side we will need to fetch a signed url from our backend server. Our backend code will have to generate the url and a signature that authenticates the request. The signed url is just a url of the form

https://storage.googleapis.com/{bucket_name}/{filename}?Expires={timestamp}&Signature={signature}&GoogleAccessId={access_email}

where bucket_name is the name of a bucket we have created using the Google dashboard and filename is the name of the file we wish to either create or read, depending on which HTTP verb we use. Together, bucket_name and filename form what Google calls a resource. The url also carries three parameters: timestamp, a unix timestamp after which the signed url is no longer valid; signature, a signed hash of the HTTP verb, the content type, the expiration time and the resource; and access_email, the email address (not the id) of the service account. The hash needs to be signed using the private key of the service account for your project on Google's dashboard.
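To make the parts of a signed url concrete, here is a minimal Ruby sketch that takes one apart again. The helper name signed_url_parts, the bucket, the filename, the signature and the service account email are all made-up examples, and the url is not a working link.

```ruby
require "uri"

# Split a signed url into the parts described above.
# signed_url_parts is a hypothetical helper, not part of any library.
def signed_url_parts(url)
  uri = URI.parse(url)
  # The path looks like /bucket_name/filename
  _, bucket, filename = uri.path.split("/", 3)
  # decode_www_form also undoes the percent-encoding of the values
  query = URI.decode_www_form(uri.query).to_h
  {
    bucket: bucket,
    filename: filename,
    expires_at: Time.at(query["Expires"].to_i).utc,
    access_email: query["GoogleAccessId"],
    signature: query["Signature"]
  }
end

# Made-up example url, for illustration only.
example = "https://storage.googleapis.com/my-bucket/report.pdf" \
          "?Expires=1421400000&Signature=abc%2B123%3D" \
          "&GoogleAccessId=uploader%40example.iam.gserviceaccount.com"

parts = signed_url_parts(example)
parts.each { |name, value| puts "#{name}: #{value}" }
```

Something like this can be handy when debugging, for instance to check whether a url that Google rejects has simply expired.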

Client-side

On the client side we only need a little bit of html for the file upload form and then javascript to get the signed url from our server, which we'll use to send the file via ajax to Google.

This is what the html looks like:

<form action="#" id="fileupload" enctype="multipart/form-data" method="post">
    <input id="file-select" multiple="multiple" type="file"/>
</form>

And this is the javascript:

var form = document.getElementById('fileupload');
var fileSelect = document.getElementById('file-select');

fileSelect.onchange = function(event) {
    event.preventDefault();
    var files = fileSelect.files;
    // Loop through each of the selected files.
    for (var i = 0; i < files.length; i++) {
        var file = files[i];

        // Get a signed url for the file.
        // consultId is not defined in this snippet; it must come from the surrounding page.
        var gcloudUrl;
        {
            var params = 
            "consult_id=" + encodeURIComponent(consultId) +
            "&filename=" + encodeURIComponent(file.name) + 
            "&filetype=" + encodeURIComponent(file.type);

            var xmlhttp = new XMLHttpRequest();
            // `false` in open makes the request synchronous
            xmlhttp.open("GET", "/attachments/sign_upload?"+params, false); 
            xmlhttp.setRequestHeader("Content-Type", 
            "application/json;charset=UTF-8");
            xmlhttp.send(null);
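            if (xmlhttp.status === 200) {
                // The server responds with JSON of the form {"url": "..."}
                var data = JSON.parse(xmlhttp.responseText);
                gcloudUrl = data.url;
                console.log("PUSH URL IN:"+gcloudUrl);
            } else {
                alert('An error occurred during upload!');
                continue; // no signed url, so skip this file
            }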
        }

        // Set up the request to google.
        var xhr = new XMLHttpRequest();
        xhr.open('PUT', gcloudUrl, true);
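        xhr.onload = function () {
            // Use `this` (the request that finished): the `xhr` variable is
            // shared by all loop iterations and points at the last request.
            if (this.status == 200) {
                // File uploaded. uploadButton is not defined in this snippet;
                // it refers to a button elsewhere on the page.
                uploadButton.innerHTML = 'Upload';
            } else {
                alert('An error occurred during upload!');
            }
        };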
        xhr.setRequestHeader('Content-Type', file.type);
        xhr.send(file.slice());
    }
}

The javascript simply attaches an event handler to our input, which will be called whenever the file input field changes, such as when we have selected a file. For each selected file we get a signed url from our back-end code and send the file to that url in a PUT request to Google.

At first I tried this using jQuery, since the project uses it elsewhere. But it seems that the jQuery methods add extra headers to the request. I wanted the request to follow Google's specifications as closely as possible and not include anything superfluous that could invalidate the signature we generate on the server side. I am therefore using plain old XMLHttpRequest().

Now, sending PUT requests via ajax to a url on a different domain is not allowed by default and the browser will automatically block it, unless the receiving server tells us otherwise using Access-Control headers (CORS).

Allowing Cross-site requests

Google allows you to set Access-Control headers for each bucket using their command line tool. They explain the process very well here.

First we create a cors.json file:

[
    {
      "origin": ["ourdomain.com"],
      "responseHeader": ["Content-Type"],
      "method": ["GET", "DELETE", "PUT"],
      "maxAgeSeconds": 3600
    }
]

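In the cors file we first specify which domains/origins we wish to allow to make cross-site requests to our bucket, then which headers are allowed and which methods. For development, setting the origin to * might be a good idea. Note that the Origin header a browser sends includes the scheme, for example https://ourdomain.com, so if requests are rejected, check that the origin entry matches it exactly.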

To apply the config to our bucket we use the gsutil tool:

> gsutil cors set cors.json gs://bucket-name

When the client-side javascript tries to make a request to our bucket on Google's domain, the browser will automatically send a separate OPTIONS request (a preflight) to the same url to verify that the action is allowed. The browser will check whether the received Access-Control-Allow-Origin header (which is now set according to origin in our json file) matches the domain we are on. If the origin is set correctly, our original request is sent.
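To see what the browser does here, a rough Ruby sketch of the preflight can help. It only builds the OPTIONS request and mimics the origin check; nothing is sent over the network. The helper names build_preflight and origin_allowed?, the bucket url and the origin are assumptions for illustration, and a real browser checks more than this (allowed methods and headers as well).

```ruby
require "net/http"
require "uri"

# Build (but do not send) the kind of OPTIONS request a browser issues
# before a cross-origin PUT. Hypothetical helper for illustration.
def build_preflight(url, origin, http_method, headers = [])
  uri = URI.parse(url)
  request = Net::HTTP::Options.new(uri.request_uri)
  request["Origin"] = origin
  request["Access-Control-Request-Method"] = http_method
  request["Access-Control-Request-Headers"] = headers.join(", ") unless headers.empty?
  request
end

# Mimic the browser's origin check: the response header must be "*"
# or exactly the origin of the page making the request.
def origin_allowed?(response_headers, origin)
  allowed = response_headers["access-control-allow-origin"]
  allowed == "*" || allowed == origin
end

# Placeholder bucket url and origin.
request = build_preflight(
  "https://storage.googleapis.com/bucket-name/report.pdf",
  "https://ourdomain.com", "PUT", ["Content-Type"]
)
puts request.method
request.each_header { |name, value| puts "#{name}: #{value}" }
```

Sending such a request by hand (with Net::HTTP or curl) and reading the response headers is a quick way to find out whether the cors config has actually been applied to the bucket.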

Server-side Ruby

To generate a signed url we'll need the .p12 file for the Service Account and information about the file to upload which we'll parse from the request parameters.

Having to keep a .p12 file in my project is not ideal: it requires extra infrastructure to get the file distributed to a server, and storing it in version control is not a good idea. My project is set up to retrieve sensitive data from environment variables, much like Heroku does. The .p12 file is a binary format, so its content cannot be stored directly as a readable string. To overcome this I used Ruby and its crypto library to load the .p12 file and then print out the RSA key as a string, which I can distribute much more easily:

# get RSA string of .p12 file
require "openssl"
# read in binary mode, since the .p12 file is not text
file = File.binread("bucket.p12")
p12 = OpenSSL::PKCS12.new(file, "notasecret")

# the key is loaded, now print the RSA string (PEM format)
puts p12.key.to_s
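Once the RSA string is available, loading it back from an environment variable could look roughly like the sketch below. The variable name GCS_PRIVATE_KEY and the helper key_from_env are assumptions, and a throwaway key is generated so the example runs without a real service account. Some hosting setups flatten line breaks in environment variables into a literal backslash followed by n, so the sketch restores them before parsing.

```ruby
require "openssl"

# Load the private key from an environment variable holding the PEM text.
# GCS_PRIVATE_KEY is a hypothetical variable name.
def key_from_env(name = "GCS_PRIVATE_KEY")
  # Turn literal backslash-n sequences back into real line breaks.
  pem = ENV.fetch(name).gsub("\\n", "\n")
  OpenSSL::PKey::RSA.new(pem)
end

# Demo with a throwaway key standing in for the service account key.
throwaway = OpenSSL::PKey::RSA.new(2048)
# Simulate a host that stores the key on a single line.
ENV["GCS_PRIVATE_KEY"] = throwaway.to_pem.gsub("\n") { "\\n" }

key = key_from_env
puts key.private?
```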

According to the gcloud documentation the signature for our signed url is based on a string containing information about the request we want to allow in a certain order. This string will then need to be hashed using SHA256 and after that signed using the key of our Service account (we can use the RSA string that we got above instead) and then finally returned as a Base64 encoded string. Luckily ruby has a great crypto library that makes this easy.

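The string to sign has the following components: Verb, Content-MD5, Content-Type, Expiration and the resource (the path to the file in the bucket). They are separated by line breaks, and the line break must be there even if a component is empty! In Google's documentation the second slot is the optional Content-MD5 value, which is left empty here.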

require "openssl"
require "base64"

def gcloud_signature(verb, expiration, filetype, resource)
    string_to_sign = verb + "\n" +
        "\n" +                       # empty component, the line break is still required
        filetype.to_s + "\n" + 
        expiration + "\n" +
        resource
    # key.sign hashes the string with the given digest and signs the hash
    digest = OpenSSL::Digest::SHA256.new
    key = OpenSSL::PKey::RSA.new(Rails.application.secrets.google_cloud_storage_key)
    signature = key.sign(digest, string_to_sign)
    # Base64 encode, drop line breaks and percent-encode /, + and =
    return Base64.encode64(signature).gsub(/\n/, '').gsub(/\//,'%2F').gsub(/\+/,'%2B').gsub(/\=/,'%3D')
end

The string to sign first gets hashed using SHA256, then signed using our key, and the result is returned in Base64 encoding. This string also needs to be URL-encoded so that we can pass it as a url parameter.
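A signature can be sanity-checked locally before involving Google at all. The sketch below signs an example string and verifies it with the public half of the key. It uses a freshly generated throwaway key rather than a real service account key, and the verb, content type, timestamp and resource are example values only.

```ruby
require "openssl"
require "base64"

# Throwaway key standing in for the service account key.
key = OpenSSL::PKey::RSA.new(2048)

# Example values; the second (empty) entry keeps the blank line in place.
string_to_sign = ["PUT", "", "image/png", "1421400000", "/bucketname/cat.png"].join("\n")

signature = key.sign(OpenSSL::Digest::SHA256.new, string_to_sign)
encoded = Base64.strict_encode64(signature)

# Verification needs only the public half of the key.
public_key = key.public_key
ok = public_key.verify(OpenSSL::Digest::SHA256.new,
                       Base64.strict_decode64(encoded), string_to_sign)

# Changing any part of the signed string invalidates the signature.
tampered = public_key.verify(OpenSSL::Digest::SHA256.new,
                             signature, string_to_sign.sub("PUT", "GET"))

puts "valid signature accepted: #{ok}"
puts "tampered string accepted: #{tampered}"
```

This also shows why the request has to match the signed string exactly: a different verb, content type or path produces a string that no longer fits the signature.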

At first I used Ruby's URI.encode(string) method to do the URL encoding, but it did not catch all of the characters. Instead I manually replace the offending characters (/, + and =) with their percent-encoded equivalents. I also remove any line breaks.
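As an alternative to the manual replacements, the standard library can do the same job. The sketch below compares the manual approach with Base64.strict_encode64 (which never inserts line breaks) combined with URI.encode_www_form_component (which percent-encodes +, / and =). The helper names manual_encode and library_encode are made up for this comparison, and the sample bytes are arbitrary rather than a real signature.

```ruby
require "base64"
require "uri"

# The manual replacement approach, applied to raw bytes.
def manual_encode(bytes)
  Base64.encode64(bytes).gsub(/\n/, "").gsub(/\//, "%2F").gsub(/\+/, "%2B").gsub(/\=/, "%3D")
end

# Library-based variant of the same encoding.
def library_encode(bytes)
  URI.encode_www_form_component(Base64.strict_encode64(bytes))
end

# Every byte value once, so that +, / and = all show up in the Base64 text.
sample = (0..255).map(&:chr).join.b

puts manual_encode(sample) == library_encode(sample)
```

For Base64 text the two variants should produce the same result, since +, / and = are the only Base64 characters that need escaping in a query parameter.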

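Lastly, we need to respond to our ajax request and return a url. This code is in a Rails-specific context. Note that URI.encode was removed in Ruby 3.0, so on newer Ruby versions the resource path needs a different escaping method.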

def sign_upload
    filename = params["filename"]
    filetype = params["filetype"]
    resource = URI.encode("/bucketname/#{filename}")
    expiration = (Time.now + 10.minutes).to_i.to_s
    signature = gcloud_signature("PUT",expiration,filetype,resource)
    signed_url = ("https://storage.googleapis.com" + resource +
        "?GoogleAccessId=" + Rails.application.secrets.google_cloud_storage_access_id +
        "&Expires=" + expiration + 
        "&Signature=" + signature
    )

    render :json => {
        url: signed_url
    }
end

The GoogleAccessId is the email-address of the Service Account.


That's it! The web form should now be able to upload to Google directly using the signed url. The form will probably still need a bit more work to let the user know what is going on, such as adding a progress bar.

Here is a video of the finished form with extra UI work in my project, uploading files via ajax directly to Google cloud storage using signed urls:
