Setup of the MASLO store - academic-colab/maslo-server GitHub Wiki


Introduction

To publish and retrieve MASLO content packs, you need a centralized store. While many cloud-based solutions could serve this purpose, the MASLO project recommends, and documents here, two ways to implement the MASLO storefront: Amazon Web Services (AWS) or a generic web server.

Amazon AWS

AWS is a reliable and cost-effective way to host MASLO content, and no hardware or software needs to be purchased. It is therefore the recommended method for institutions or individuals who do not already have a web service infrastructure in place.

Setup - Account creation:

To get started with AWS, create an AWS account at http://aws.amazon.com/. Once you’ve submitted your data, your account should be available immediately. If you are a new user, you may be eligible for Amazon’s free tier, which gives you free access to basic AWS services for one year (http://aws.amazon.com/free/). Depending on your needs, this may cover your first year of MASLO use.

Creating your EC2 and S3 instances

Your MASLO store relies on two AWS technologies: EC2, which runs the web server software, and S3, which persistently stores your content packs. Once you are logged in with your new account, you can open the AWS Management Console and create EC2 instances and S3 buckets.

To create a new S3 bucket, go to the S3 tab in the AWS management console. Pick a name for your bucket, but bear in mind that the bucket name needs to be globally unique. Later on, your new bucket will be available via http://bucketName.s3.amazonaws.com/, where “bucketName” is the name of your bucket. This bucket name will be used later in the software configuration step.
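Before creating the bucket, you may want to check a candidate name locally. The following Python sketch implements a subset of S3’s bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots; not formatted as an IP address) and builds the bucket address in the form given above. The function names and the example bucket name are hypothetical, and the S3 console remains the authority on which names are accepted and which are already taken.

```python
import ipaddress
import re

# 3-63 characters of lowercase letters, digits, dots and hyphens,
# beginning and ending with a letter or digit (used with fullmatch).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of S3's bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name) or ".." in name:
        return False
    try:
        ipaddress.ip_address(name)  # names like "192.168.5.4" are rejected
    except ValueError:
        return True
    return False


def bucket_url(name: str) -> str:
    """Return the bucket address in the http://bucketName.s3.amazonaws.com/ form."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return f"http://{name}.s3.amazonaws.com/"


print(bucket_url("my-maslo-store"))  # http://my-maslo-store.s3.amazonaws.com/
```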

To create a new EC2 instance, open the EC2 tab of the AWS Management Console and click “Launch Instance”. Select the “Classic Wizard”. We recommend selecting “Ubuntu Server 12.04 LTS 64 bit” as your instance’s operating system; if you choose a different operating system, parts of this document may not apply. Follow the rest of the wizard. In the firewall (security group) settings, open ports 80 and 443, which are needed to reach the instance via HTTP and HTTPS, as well as port 22, which is needed for SSH. Refer to the AWS documentation on security group rules if you need more detailed instructions.
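Once the instance is running, a short Python check like the one below can tell you whether the security group actually lets traffic through. It is a sketch using only the standard library (the function name is hypothetical): it attempts a TCP connection to each port and reports the ones that refuse or time out. A port only answers once a service listens on it, so expect port 80 to fail until Apache is installed and port 443 until HTTPS is configured.

```python
import socket

# Ports the security group should allow, per the setup steps above.
REQUIRED_PORTS = {22: "SSH", 80: "HTTP", 443: "HTTPS"}


def unreachable_ports(host, ports=REQUIRED_PORTS, timeout=3.0):
    """Return {port: label} for every port on `host` that can't be connected to."""
    failed = {}
    for port, label in ports.items():
        try:
            # Resolves `host` and performs a TCP handshake; nothing is sent.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # refused, timed out, or host name not resolvable
            failed[port] = label
    return failed
```

For example, `unreachable_ports("my-instance-at.amazonaws.com")` returning an empty dictionary means SSH, HTTP, and HTTPS all accept connections.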

Once you’ve created and started your instance, you can log in and set up the software. Follow Amazon’s documentation on how to SSH into your new instance; typically you need the instance’s public address and the private key file of the key pair you created during instance setup. Bear in mind that root login is disabled on these instances, so you have to log in via SSH as user “ubuntu”. A typical AWS SSH command looks like this:

ssh -i /path/to/aws/key ubuntu@my-instance-at.amazonaws.com

where “my-instance-at.amazonaws.com” is your instance’s address. Once logged in, you can run “sudo -s” to gain root access. You need to install a few packages; to do so, run (as root):

apt-get install apache2 libapache2-mod-php5 
apt-get install sqlite3 libsqlite3-dev zip php5-sqlite 

Installing these packages automatically starts the Apache2 web server. Make sure that it works and that your security group setup, in which you opened ports 80 and 443, took effect: open http://my-instance-at.ec2.amazonaws.com in a web browser. As before, the host name is only a placeholder for your instance’s actual address. If everything worked correctly, you should see the Apache test page.

You may also want to test that PHP works correctly. To do so, create a file named “test.php” in /var/www/ with the following content:

<?php
	echo "PHP test";
?>

In a web browser open http://my-instance-at.ec2.amazonaws.com/test.php. If all you see is “PHP test”, everything works correctly and you are good to go.

We also recommend configuring HTTPS; refer to the Apache mod_ssl documentation for how to set it up. Test your configuration by opening https://my-instance-at.ec2.amazonaws.com/test.php in a browser.
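The browser checks above can also be scripted. This Python sketch (standard library only; the function name is hypothetical) fetches a URL such as http://my-instance-at.ec2.amazonaws.com/test.php and compares the response body with the expected “PHP test” text. If PHP is not enabled, Apache typically returns the raw script source instead, and the comparison fails. With a self-signed HTTPS certificate, urllib rejects the connection with a certificate verification error.

```python
import urllib.request


def check_page(url, expected="PHP test", timeout=10.0):
    """Fetch `url`; return (HTTP status, whether the body equals `expected`).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError for connection or certificate problems.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        # Surrounding whitespace (e.g. a trailing newline) is ignored.
        return response.status, body.strip() == expected
```

Once Apache and PHP both work, `check_page("http://my-instance-at.ec2.amazonaws.com/test.php")` should return `(200, True)`; pass the https:// URL to test the HTTPS setup.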

Software

Your web server is now configured. From here on, the procedure largely matches the second method, the generic web server setup. You need a way to move the MASLO store software to your EC2 instance. Common approaches are to install Git on the EC2 instance and retrieve the software with it, or to download the software to your own machine and copy it to the instance with SCP. Assuming you have retrieved the store software and created a zip archive named “maslo-server.zip”, your upload command would look similar to

scp -i /path/to/aws/key maslo-server.zip ubuntu@my-instance-at.amazonaws.com:.

This command stores the archive in the home directory of user “ubuntu”, i.e. /home/ubuntu. From there, unpack it and install the scripts as described in the following section.
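If you cloned the repository with Git on your own machine, the “maslo-server.zip” archive could be built with Python’s standard library, as in this sketch. It assumes the clone’s top-level directory is passed in (the check for upload.php, one of the top-level scripts listed in the appendix, guards against passing the wrong directory); the function name is hypothetical. Entries are stored relative to that directory, so unzipping the archive inside the target “maslo” directory places the scripts directly in it.

```python
import shutil
from pathlib import Path


def build_archive(source_dir, archive_base="maslo-server"):
    """Zip the contents of `source_dir` into `<archive_base>.zip` and return its path."""
    source = Path(source_dir).resolve()
    # upload.php is one of the store's top-level scripts; if it is missing,
    # the wrong directory was probably passed in.
    if not (source / "upload.php").is_file():
        raise FileNotFoundError(f"{source} does not look like the maslo-server tree")
    # Entries are stored relative to `source`; a Git clone's .git folder
    # is included as well unless you remove it first.
    return shutil.make_archive(archive_base, "zip", root_dir=str(source))
```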

Using a generic web server as MASLO store

If you already have web server infrastructure up and running, the easiest way to set up a MASLO store is to download the store software and install it there. The requirements are:

  • Linux or Mac OS X server
  • Apache2 web server with PHP5 support
  • PHP5 with SQLite3 support
  • Python2 at version 2.5 or newer with SQLite3 support
  • Zip/UnZip
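Some of these requirements can be checked before installing. The Python 3 sketch below looks up the zip and unzip tools on the PATH and probes SQLite for FTS3 support, which the store’s search relies on. It is a hypothetical helper, not part of MASLO: it tests the SQLite library linked into the Python that runs it, which may differ from the one PHP uses, and it does not check Apache or PHP.

```python
import shutil
import sqlite3


def missing_requirements(tools=("zip", "unzip")):
    """Return the names of requirements that appear to be missing."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # Probe the SQLite library used by this Python for FTS3 support.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts3(body)")
    except sqlite3.OperationalError:  # e.g. "no such module: fts3"
        missing.append("SQLite FTS3")
    finally:
        conn.close()
    return missing
```

An empty list from `missing_requirements()` means both tools were found and FTS3 is available.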

Retrieve the server software from GitHub (https://github.com/academic-colab/maslo-server). In your web documents directory, create a directory named “maslo” and put all the server scripts retrieved from GitHub into it. On Linux, the default web documents directory is /var/www; check with your provider if this is different in your case. Some web hosting servers, for example, serve content from a “public_html” directory in your home folder. The “maslo” directory must have read-write-execute permissions for the web server user (often “nobody” or “www-data”). Preferably, the directory is owned by that user (on Ubuntu, “www-data”). To set this up, run the following command as user “root” from within the web documents directory:

chown -R www-data maslo

If you are using a distribution other than Ubuntu Linux, you may need to find out which user your web server runs as.
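After running chown, you can verify the result with a Python sketch like the following (standard library, Unix only; the function name is hypothetical). It walks the “maslo” directory and lists every path whose owner is not the given web server user; an empty list means ownership is as intended. Run it as a user that can read the whole tree.

```python
import os
import pwd


def paths_not_owned_by(root, user="www-data"):
    """Return every path under `root` (including `root`) not owned by `user`."""
    uid = pwd.getpwnam(user).pw_uid  # KeyError if the user does not exist
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat looks at symlinks themselves rather than at their targets.
    return [p for p in paths if os.lstat(p).st_uid != uid]
```

For example, `paths_not_owned_by("/var/www/maslo")` should return `[]` after a successful `chown -R www-data maslo` on Ubuntu.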

After copying all the scripts and making sure the permissions are correct, open the admin index.php in your browser, i.e. https://your.domain/maslo/admin/index.php (or http:// if you have not yet created HTTPS certificates; we highly recommend doing so). This creates the rest of the infrastructure for you.

If this doesn’t work right away (i.e. you see an empty page instead of a login prompt), try uninstalling and reinstalling php5-sqlite (as root):

apt-get remove php5-sqlite
apt-get install php5-sqlite

Then try accessing the admin index page again.

You should now be able to log in with the default admin user name “masloAdmin” and password “initMASLO”. We highly recommend changing the default password by editing the “masloAdmin” user in the user management console. In the management console you can also create new users and view or delete uploaded content packs. For your users to be able to push content packs to the new MASLO store, provide them with a login and the server URL https://your.domain/maslo/, which they enter in the settings of the MASLO authoring tool. At this point, your MASLO server is ready to use.

Software configuration

The software is kept relatively simple: the only configuration decision is whether or not to use S3 storage. More details can be found in the “Software configuration files” section of the technical details appendix.

Appendix: Technical details

This section contains detailed information about the server software: a detailed list of files and their responsibilities, and the database schemas.

Software content

Top level directory:

FTS.py
admin/
config.json
doZip.sh
index.html
index.php
login.php
s3sdk/
search.php
stopwords.txt
traverse.php
upload.php

Admin Panel content:

./admin:
admin.db
css/
icons/
images/
index.php
js/
modify.php
overview.php
util.php

CSS contents for admin panel

./admin/css

Icons for admin panel

./admin/icons

Images for admin panel look-and-feel

./admin/images

Admin panel JavaScript helpers: look-and-feel, encryption, and user authentication

./admin/js:

help.js
jquery-1.6.1.min.js
jquery-ui-1.8.14.custom.min.js
jquery.cuteTime.min.js
jquery.watermark.min.js
md5.js
sha256.js
user.js
util.js

Amazon S3 SDK:

./s3sdk:

README.md
authentication/
config.sample.inc.php
extensions/
lib/
sdk.class.php
services/
utilities/

Software configuration files

All configuration relates to S3 access. By default, S3 access is disabled in the configuration files, which means that content packs are stored on the machine the store software runs on. To enable S3 access, modify the following files:

Copy

./s3sdk/config.sample.inc.php

to

./s3sdk/config.inc.php

and enter your AWS credentials as described in the file. The only values that need to be changed are “key” and “secret”: “key” is your AWS access key ID, and “secret” is the corresponding secret access key.

Additionally, you need to change the following values in

config.json

Set “wantS3” to “true”, and change the bucket and base directory names to match your desired configuration. “bucket” must name a bucket that already exists in your S3 account.
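The config.json change can also be made programmatically. This Python sketch sets “wantS3” and “bucket” while leaving all other keys untouched; the function name is hypothetical. Two points are assumptions: whether the store expects a JSON boolean or the string “true” for “wantS3” (check the shipped config.json and pass the matching value), and the key name of the base directory setting, which is not given here and is therefore left alone.

```python
import json


def enable_s3(config_path, bucket, want_s3_value=True):
    """Set "wantS3" and "bucket" in config.json, keeping all other keys.

    Pass want_s3_value="true" if the shipped config.json uses strings.
    The file is rewritten with two-space indentation.
    """
    with open(config_path) as f:
        config = json.load(f)
    config["wantS3"] = want_s3_value
    config["bucket"] = bucket  # must be an existing bucket in your S3 account
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```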

What does the store software do?

When you access the admin panel for the first time, the store software will create a couple of folders and databases:

Folders:

uploads/
uploads/tmp/

The “uploads” folder is used to process and store uploaded content packs. If S3 storage is used, the folder only holds uploads while they are being processed, until they have been moved to S3.

For each upload, a separate temporary folder is created in “uploads/tmp” so that different users can upload in parallel.
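The per-upload folder scheme can be illustrated with Python’s tempfile module. In this sketch (the function name and the “upload-” prefix are hypothetical; the store itself handles uploads in upload.php), mkdtemp creates a uniquely named directory under uploads/tmp, so two simultaneous uploads cannot end up in the same folder.

```python
import os
import tempfile


def new_upload_dir(uploads_root="uploads"):
    """Create and return a fresh, uniquely named folder under <uploads_root>/tmp."""
    tmp_root = os.path.join(uploads_root, "tmp")
    os.makedirs(tmp_root, exist_ok=True)
    # mkdtemp picks a random name and creates the directory atomically
    # (mode 0700), retrying with a new name if one already exists.
    return tempfile.mkdtemp(prefix="upload-", dir=tmp_root)
```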

Databases:

users.db
admin/admin.db
uploads/search.db

The store software creates and uses three databases. The “users.db” database contains all users with upload privileges. The “admin.db” database, which has the same schema as “users.db”, contains all users with administrative privileges, i.e. users who are allowed to log into the admin management console, create new users, and delete users and content packs.

The role model is kept very simple. There are only three user roles:

  • Guests, who are allowed to see the overview of existing content packs,
  • Authors, who are additionally allowed to upload content packs, and
  • Administrators, who are additionally allowed to add/delete users and to delete content packs in the admin panel.
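Because each role includes the rights of the one before it, the model can be expressed as an ordered enumeration. The Python sketch below is only an illustration: the action names are hypothetical, and the store itself distinguishes authors and administrators by keeping them in “users.db” and “admin.db” rather than through a role field.

```python
from enum import IntEnum


class Role(IntEnum):
    """The three MASLO roles, ordered so each includes the rights below it."""
    GUEST = 0
    AUTHOR = 1
    ADMINISTRATOR = 2


# Minimum role required for each action (hypothetical action names).
REQUIRED_ROLE = {
    "view_overview": Role.GUEST,
    "upload_pack": Role.AUTHOR,
    "manage_users": Role.ADMINISTRATOR,
    "delete_pack": Role.ADMINISTRATOR,
}


def is_allowed(role: Role, action: str) -> bool:
    """True if `role` meets the minimum role for `action`."""
    return role >= REQUIRED_ROLE[action]
```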

The store software handles several tasks: user authentication, content pack upload handling, serving content pack requests, global search, and store and user administration.

User authentication

The first step in uploading a content pack is authentication. The store server has to make sure that the user attempting the upload has permission to do so. It therefore checks the uploading user’s credentials against its user database and denies upload access if authentication fails.

Involved scripts: upload.php, login.php
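A credential check of this kind could look like the following Python sketch. The table layout users(name, salt, pw_hash), the function names, and the use of salted PBKDF2-SHA256 are assumptions for illustration; this page does not document how users.db stores passwords. hmac.compare_digest compares the hashes in constant time.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # PBKDF2 work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, PBKDF2-SHA256 digest) for `password`."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(conn: sqlite3.Connection, username, password):
    """True if `username` exists in the hypothetical users table and the password matches."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE name = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how much of the hash matched.
    return hmac.compare_digest(candidate, stored)
```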

Content Pack Upload Handling

Content pack upload requires the store software to receive uploaded files, index their textual content, and create and maintain full-text search databases. Each content pack is uploaded as a number of 1 MB chunks. When the upload is complete, the software combines these chunks into one zip file, which is then unzipped and processed. For each uploaded content pack, a separate database is created that holds only that pack’s indexed content. The same data is also added to the global content pack database.

Involved scripts: upload.php, FTS.py, doZip.sh
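The chunk handling can be sketched in Python as follows. The client cuts the zip file into 1 MB pieces; the server concatenates the chunk files in order inside the upload’s temporary folder and then unzips the result. The file names “pack.zip” and “pack”, the function names, and the assumption that chunks are already in upload order are hypothetical; the actual logic lives in upload.php and its helpers.

```python
import zipfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MB


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Client side: cut the zip file's bytes into consecutive chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def assemble_and_extract(chunk_paths, upload_dir):
    """Server side: concatenate chunk files in order, then unzip the result.

    Returns the directory the pack was extracted into.
    """
    upload_dir = Path(upload_dir)
    archive = upload_dir / "pack.zip"
    with open(archive, "wb") as out:
        for chunk in chunk_paths:  # must already be in upload order
            out.write(Path(chunk).read_bytes())
    target = upload_dir / "pack"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```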

Serving Content Pack Requests

Mobile clients accessing the “store” frontend in their UI request a content pack summary from the store server. The server has to provide this summary, as well as a download link for each individual content pack.

Involved scripts: index.php, traverse.php

Global Search

MASLO search relies on the FTS3 full-text search support of SQLite3, and so does global search. Global search therefore issues the search query against the SQLite database that contains information about all content packs on a given store instance.

Involved scripts: search.php
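The following Python sketch shows the kind of FTS3 query involved. The virtual table name “content_search” is taken from the database schema section; its columns here (“pack”, “body”), the function names, and the sample data are hypothetical. FTS3’s standard query syntax supports plain terms (implicitly combined with AND), prefix queries such as “mito*”, and OR. The sketch requires an SQLite build with FTS3 enabled.

```python
import sqlite3


def build_index(conn, packs):
    """Index (pack name, text) pairs in an FTS3 table (columns hypothetical)."""
    conn.execute("CREATE VIRTUAL TABLE content_search USING fts3(pack, body)")
    conn.executemany("INSERT INTO content_search (pack, body) VALUES (?, ?)", packs)


def search(conn, query):
    """Return the names of packs whose indexed text matches an FTS3 query."""
    rows = conn.execute(
        "SELECT pack FROM content_search WHERE content_search MATCH ? ORDER BY rowid",
        (query,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("Biology 101", "cell division and mitosis"),
    ("Chemistry Basics", "acids, bases and salts"),
])
print(search(conn, "mitosis"))           # ['Biology 101']
print(search(conn, "mito*"))             # ['Biology 101']
print(search(conn, "mitosis OR acids"))  # ['Biology 101', 'Chemistry Basics']
```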

Store and User Administration

The MASLO store software comes with a web admin management console. This allows for convenient user management and offers a way to delete undesired content packs.

Involved scripts: admin/*

Database schemas

The database schemas are simple. The user and admin databases store information about users. This could be accomplished with a single database and a boolean “isAdmin” field. However, the intention is to keep administration separate from content provision: the system could operate without the admin console in place, hence the redundancy in user data.

The search tables use SQLite’s FTS functionality. Besides the virtual table “content_search”, basic information about each content pack is stored in the table “content”: pack name, path to the pack download file, pack version, and author of the pack.
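A minimal version of such a schema could look like the Python sketch below. The column names, function names, and sample values are hypothetical; only the table names “content” and “content_search” and the kinds of data they hold come from the description above. Each pack’s row in “content” and its FTS3 document share the same rowid, so a full-text match can be mapped back to the pack’s download path, version, and author.

```python
import sqlite3

SCHEMA = """
CREATE TABLE content (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,  -- pack name
    path    TEXT NOT NULL,  -- path to the pack download file
    version TEXT,
    author  TEXT
);
CREATE VIRTUAL TABLE content_search USING fts3(body);
"""


def add_pack(conn, name, path, version, author, text):
    """Store a pack's metadata and index its text under the same rowid."""
    cur = conn.execute(
        "INSERT INTO content (name, path, version, author) VALUES (?, ?, ?, ?)",
        (name, path, version, author),
    )
    conn.execute(
        "INSERT INTO content_search (rowid, body) VALUES (?, ?)",
        (cur.lastrowid, text),
    )
    return cur.lastrowid


def find_packs(conn, query):
    """Return (name, path, version, author) for every pack matching `query`."""
    return conn.execute(
        "SELECT name, path, version, author FROM content "
        "WHERE id IN (SELECT rowid FROM content_search WHERE content_search MATCH ?) "
        "ORDER BY id",
        (query,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_pack(conn, "Biology 101", "uploads/bio101.zip", "1.0", "Jane Doe", "cell division and mitosis")
print(find_packs(conn, "mitosis"))
```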

Appendix: Command Line Use References

How to use the Ubuntu command line: https://help.ubuntu.com/community/UsingTheTerminal/

How to use SSH: https://help.ubuntu.com/community/SSH/

How to use SCP: https://help.ubuntu.com/community/SSH/TransferFiles

What is HTTPS: http://en.wikipedia.org/wiki/HTTP_Secure