1.8 Using Cloud Storage
Cloud Storage, or just Storage, is GCP's Bucket solution. Buckets are places where you can store large data BLOBs like files. If you need to distribute documents, public or private, to end users, then using Cloud Storage for storing and accessing them is much better than streaming the files through your own API.
Again we are going to abandon the previous version of pingpong-service and move on to yet another one. Navigate into the pingpong-service-cloud-storage folder of the project.
In this example we're going to upload files, public and private, produce links to them, and verify that they are indeed public and private respectively. We'll also take the opportunity to learn about a crucial service-access mechanism called Service Accounts.
It is possible to mix public and private documents in the same Bucket, but in this example we'll use two different Buckets. This will also allow us to learn how to configure both a completely public and a completely private Bucket in GCP.
First, let's create the public Bucket. To work with Buckets you need the `gsutil` component installed in your SDK (unless you want to do this from the Cloud console of course). You can check whether you have `gsutil` using the following command:

```
gcloud components list
```
Then, in the output, there will be an entry for `gsutil` which looks like:

```
Installed │ Cloud Storage Command Line Tool │ gsutil │ 3.5 MiB
```
If it doesn't say `Installed` you don't have it. No worries, install it using the following command:

```
gcloud components install gsutil
```
Once we have it we can create a Bucket using the following command:

```
gsutil mb -b on -l europe-west3 gs://pingpong-public-docs/
```
This should be quite swift and you can confirm that the Bucket exists by browsing the Cloud console, choosing Storage and then Browser. The new Bucket should be listed there.
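If you prefer code over `gsutil`, the same Bucket can be created with the Cloud Storage client library. A minimal sketch in Java, assuming the google-cloud-storage library and your own credentials (this is an illustration, not part of the pingpong-service):

```java
import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class CreateBucket {
    public static void main(String[] args) {
        // Uses Application Default Credentials, e.g. your own gcloud login.
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Roughly what `gsutil mb -b on -l europe-west3 gs://pingpong-public-docs/` does;
        // `-b on` corresponds to uniform (Bucket-level) access control.
        Bucket bucket = storage.create(
            BucketInfo.newBuilder("pingpong-public-docs")
                .setLocation("europe-west3")
                .setIamConfiguration(
                    BucketInfo.IamConfiguration.newBuilder()
                        .setIsUniformBucketLevelAccessEnabled(true)
                        .build())
                .build());

        System.out.println("Created Bucket: " + bucket.getName());
    }
}
```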
We said that this should be a completely public Bucket, but it isn't (yet). Let's make it so. You can make a Bucket completely public by granting `allUsers` the `objectViewer` role. You can read more about Buckets and IAM, making data public, and Cloud Storage roles in the GCP documentation. Run the following command:
```
gsutil iam ch allUsers:objectViewer gs://pingpong-public-docs/
```
WARNING: This is something you DON'T EVER want to do on a real Bucket that contains files that are not 100% public by nature!!!
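The same IAM change can be made programmatically: fetch the Bucket's policy, add `allUsers` as `objectViewer`, and write it back. A sketch, again assuming the google-cloud-storage Java library:

```java
import com.google.cloud.Identity;
import com.google.cloud.Policy;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.cloud.storage.StorageRoles;

public class MakeBucketPublic {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Equivalent of `gsutil iam ch allUsers:objectViewer gs://pingpong-public-docs/`:
        // read the current IAM policy, add the grant, and apply the updated policy.
        Policy policy = storage.getIamPolicy("pingpong-public-docs");
        storage.setIamPolicy(
            "pingpong-public-docs",
            policy.toBuilder()
                .addIdentity(StorageRoles.objectViewer(), Identity.allUsers())
                .build());
    }
}
```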
A good way to verify that the Bucket is working and that the contents are public is to add a document into it and then simply fetch the same document through a browser using its public URL. There is a small sample PDF document, `ping.pdf`, in the k8s folder of this project to help you with that.
Run the following command, from the k8s folder, to upload the document into the Bucket:

```
gsutil cp ping.pdf gs://pingpong-public-docs
```
You should see something similar to this in the console output:

```
Copying file://ping.pdf [Content-Type=application/pdf]...
/ [1 files][  8.5 KiB/  8.5 KiB]
Operation completed over 1 objects/8.5 KiB.
```
Now we should be able to access/download the PDF through a completely public URL. The URL will be https://storage.googleapis.com/pingpong-public-docs/ping.pdf, that's just the way it is.

All Buckets are always available through storage.googleapis.com, given that you have the proper authority to access them of course, but you can also create and associate your own domain with a given Bucket. So we could register a domain for our Bucket, like storage.pingpong.com, and then tie it to our Bucket name/instance. Then the URL would be simplified to just https://storage.pingpong.com/ping.pdf. Here the Bucket name is implicit because of the domain <-> Bucket association. It would however still be possible to access the document using the original https://storage.googleapis.com/pingpong-public-docs/ping.pdf.
Open up an Incognito window, or use another browser where you are not authenticated to GCP, and go to the URL https://storage.googleapis.com/pingpong-public-docs/ping.pdf just to prove the point that it really is public.
You can now delete the document; we'll soon be uploading it through a REST-service instead.

```
gsutil rm gs://pingpong-public-docs/ping.pdf
```

By default the browser will cache the document for a little while, so it may appear to still be there for a few seconds/minutes.
Now let's create the private Bucket. In its simplest form this is done by just adding the Bucket as we did with the public one and then simply NOT making it public.

```
gsutil mb -b on -l europe-west3 gs://pingpong-private-docs/
```
That should be all. Let's upload the same document into that one as well and make sure we can't access it publicly.

```
gsutil cp ping.pdf gs://pingpong-private-docs
```

Open up an Incognito window, or use another browser where you are not authenticated to GCP, and go to the URL https://storage.googleapis.com/pingpong-private-docs/ping.pdf just to prove the point that it really is private. This time you should get an access denied error instead of the document.
Let's clean up and leave two empty Buckets before we start with the fun stuff.

```
gsutil rm gs://pingpong-private-docs/ping.pdf
```
Now we're gonna learn how to manipulate and access contents inside the Bucket(s) from a REST-service deployed in GKE. As with essentially all GCP SaaS services, you can access them from any location, be it inside or outside your GCP project. In the case of the PostgreSQL database and the Memorystore cache we accessed them through an IP on an internal network route in the Cloud. This was convenient and fast, but for Cloud Storage there is no internal IP address that we can use. In this case we need to use the Service Account approach, and in fact you can use that approach for any service, even databases and caches.
Service Accounts (SAs) are accounts intended for services/software/machines that want to access your project. This is in contrast to IAM accounts, which represent people that have access to your project. But, just like IAM accounts, SAs can be equipped with roles, and some resources can be mapped more granularly to a given SA and the permissions associated with it. That's the case with Buckets.
So, since we're gonna be using a REST-service, pingpong-service-cloud-storage, we'll create a service account for it. An SA can be created using the following command:

```
gcloud iam service-accounts create pingpong-cloud-storage-sa \
    --description="A service account for the Pingpong service using Cloud Storage" \
    --display-name="pingpong-cloud-storage-sa"
```
You can double check your SA in the Cloud console.
Now, for something external, like our REST-service, to assume the identity of this SA we need to create a key that it can use for authentication. A key can be created using the following command:

```
gcloud iam service-accounts keys create ./pingpong-cloud-storage-sa-key.json \
    --iam-account pingpong-cloud-storage-sa@pingpong-site1-gcp-demo.iam.gserviceaccount.com
```
This command will also download the key and place it inside the folder where the prepared kustomize scripts expect it; that's what the `./pingpong-cloud-storage-sa-key.json` path is about. So be sure that you are in the correct folder, `<git-root>/pingpong-service/pingpong-service-cloud-storage/src/k8s`, when running the command.

Using this key it will now be possible for our service to authenticate. However, this SA still has no privileges on the Buckets, so let's add some for viewing and creating documents.
```
gsutil iam ch serviceAccount:pingpong-cloud-storage-sa@pingpong-site1-gcp-demo.iam.gserviceaccount.com:legacyBucketWriter,legacyObjectReader gs://pingpong-public-docs/
gsutil iam ch serviceAccount:pingpong-cloud-storage-sa@pingpong-site1-gcp-demo.iam.gserviceaccount.com:legacyBucketWriter,legacyObjectReader gs://pingpong-private-docs/
```
After this you can inspect your Bucket(s) and see that permissions have indeed been granted on both.
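On the service side, authenticating as this SA essentially comes down to loading the key file and handing the credentials to the Storage client. A minimal sketch, assuming the google-cloud-storage Java client library (the actual pingpong-service wiring may differ):

```java
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.io.FileInputStream;
import java.io.IOException;

public class StorageClientFactory {

    // Builds a Storage client that acts as the SA whose key file is given.
    // In the cluster the key file would typically be mounted from a Kubernetes Secret.
    public static Storage fromKeyFile(String keyPath) throws IOException {
        GoogleCredentials credentials;
        try (FileInputStream keyStream = new FileInputStream(keyPath)) {
            credentials = GoogleCredentials.fromStream(keyStream);
        }
        return StorageOptions.newBuilder()
            .setCredentials(credentials)
            .build()
            .getService();
    }
}
```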
Now it's time to deploy our REST-service that will make use of the service account. If you've deleted the cluster used previously refer to the GKE section to set it up again. You need only a single node cluster for this.
From the k8s folder of the pingpong-service-cloud-storage perform a deploy as per usual, and when the service is up and running with an allocated LoadBalancer IP you can navigate to the following:

```
http://<YOUR_IP>:8080/swagger-ui.html
```
It's a bit different from the previous URLs and that's because we're using a Swagger UI for this service. It makes it a lot easier to POST a file to the service; that's mainly why it's there, actually! When you get to the page you're gonna see a rather colorful OpenAPI definition of this service. It offers 4 endpoints: two for reading public/private documents (GETs) and two for uploading public/private documents (POSTs). If you click on one of the POSTs and then the `Try it out` button you'll see the help. Swagger creates a file upload HTML component for us, so we don't need to create binary data and POST it using a cURL command. Thank you Swagger!

So why not try the endpoint for uploading a public document first? Click on the definition of `POST /ping/documents/public` if you haven't already, select the `ping.pdf` document that is placed under the k8s folder, and then just click the BIG blue `Execute` button.
Now if you browse the objects of the public Bucket, either by using the web based Cloud console or by using `gsutil`, you should see that the document is indeed inside the Bucket. The service has uploaded it using the authority of the service account. If you want to, you can remove the roles for the service account from the Bucket and try an `Execute` again; then it will fail. For the POST operation it's enough to remove the `legacyBucketWriter` role to make it fail. The service has poor error handling and will output something quite revealing, like so:
```json
{
  "timestamp": "2020-09-15T14:58:26.538+0000",
  "status": 500,
  "error": "Internal Server Error",
  "message": "pingpong-cloud-storage-sa@pingpong-site1-gcp-demo.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object.",
  "path": "/ping/documents/public"
}
```
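For reference, the upload itself is only a few lines with the client library. A sketch of roughly what the public upload endpoint might do (class and method names here are illustrative, not the service's actual code), using an authenticated Storage client like the one built above:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;

public class DocumentUploader {

    private final Storage storage;

    public DocumentUploader(Storage storage) {
        this.storage = storage;
    }

    // Uploads the given bytes as an object in the public Bucket and returns its URL.
    // Requires the SA to hold a role granting storage.objects.create on the Bucket.
    public String uploadPublic(String name, byte[] content) {
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("pingpong-public-docs", name))
            .setContentType("application/pdf")
            .build();
        storage.create(blobInfo, content);

        // Public objects are reachable directly via storage.googleapis.com.
        return "https://storage.googleapis.com/pingpong-public-docs/" + name;
    }
}
```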
Now that we have a document in the Bucket we should try to get a link to it as well. Use the `GET /ping/documents/public/{name}` endpoint to do so. It takes a `name` as parameter and you need to type the entire name, including the .pdf extension, so: `ping.pdf`. The service isn't very user friendly...
When you've executed that request you should get the following response body:

```json
{
  "url": "https://storage.googleapis.com/pingpong-public-docs/ping.pdf"
}
```
Now go ahead and upload the same document using the `POST /ping/documents/private` endpoint and then request a URL to that one by invoking the `GET /ping/documents/private/{name}` endpoint. This time you'll get a different URL, a much longer one:

```json
{
  "url": "https://storage.googleapis.com/pingpong-private-docs/ping.pdf?GoogleAccessId=pingpong-cloud-storage-sa@pingpong-site1-gcp-demo.iam.gserviceaccount.com&Expires=1600182292&Signature=YHvTVg1FCqXbj1DCcEQbAAZUrh%2FQ3eog3pDwTY1Pts5HwVg1d%2B8Mvfea23YC6WDlBZEUfj1nmxAKbNuKm%2BM94kXb1zKfp7Dx1ujIDDIErA7dti8ZrFy6QgExJY64bKakhAr9u%2BEBrHvnsVO%2B5sVrm4XXHGnZ2KdCCp3FHa6MN7%2F08Ul5HawYvpemv3x4SAYvkVIdnX6AO%2B8VYBiX%2FMAjNaRu1OmVN81S2sQBzPp7UD56%2BPcLim05OG9IUG4tq89Ibnutyl3Pc7KRB5FkRxXLs2KKTKkP9bhRxoVPdp%2B4Ue4eBzXFTVhhVtyVJZrw1u2CI4CKywUdUc6v%2BJejx4GHnw%3D%3D"
}
```
This is a signed URL and it will only be valid for 60 seconds, because that's how our pingpong-service is configured. It has the following entry in its `application.properties` file:

```properties
buckets.private-storage.default-duration=60
```
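Generating such a signed URL is essentially a one-liner with the client library. A sketch with illustrative names, assuming the duration comes from the property above (the SA's private key from the key file is what makes the signing possible):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;

import java.net.URL;
import java.util.concurrent.TimeUnit;

public class SignedUrlProducer {

    private final Storage storage;
    private final long durationSeconds; // e.g. buckets.private-storage.default-duration=60

    public SignedUrlProducer(Storage storage, long durationSeconds) {
        this.storage = storage;
        this.durationSeconds = durationSeconds;
    }

    // Produces a time-limited URL for a private object; Cloud Storage itself
    // rejects the URL once the duration has passed.
    public URL signedUrlFor(String name) {
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("pingpong-private-docs", name)).build();
        return storage.signUrl(blobInfo, durationSeconds, TimeUnit.SECONDS);
    }
}
```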
So this is a way to keep documents protected but still issue a link to an end user that is valid for a given time. You don't need to enforce the time limit yourself; the Cloud Storage platform does that for you.
That's pretty much it for this part. We've learned about:
- Public and private Buckets
- Service accounts
- Bucket IAM
- Signed URLs for non-public objects