Librivox Audiobook player for Alexa - ranjitiyer/audiobooks GitHub Wiki

Background and Objective

Librivox is a worldwide group of volunteers who read and record public domain texts, creating free audiobooks that can be downloaded from the Librivox website and from other digital library hosting sites. The objective of this project is to make Librivox books available through an Amazon Alexa skill. Alexa is a voice assistant device from Amazon that can answer natural language questions, play music, and run voice-driven applications written by third-party developers, called Skills. This project aims to develop an Alexa skill that allows searching for and playing Librivox audiobooks on the Alexa device. Information about the Librivox project and Alexa developer skills can be found here and here.

Features

This skill, let's call it 'Free Books', should at a minimum provide the features that users have come to expect from an audiobook player. The key difference is that Free Books is a voice-driven application rather than a conventional one driven by a graphical user interface. A voice-driven audiobook player must allow users to search for books by criteria (author, title, genre, etc.), page through results, and finally play, pause, and resume books.

Design Considerations

Some of the core functions of this skill would be to:

  • Search for a book
    • By name
    • By author
    • By genre
    • By recency
  • Page through search results
  • Play a book
    • Play
    • Pause
    • Resume
    • Stop
  • Manage user state
    • Persist information like book, chapter, current offset inside a chapter to support pause and resume.
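
The state listed under 'Manage user state' could be modelled as a small record. The following is a minimal sketch, not the skill's actual implementation; the `PlaybackState` class and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class PlaybackState:
    """Hypothetical per-user record needed to resume a book."""
    book_id: int          # Librivox book id
    section: int          # chapter (section) number being played
    offset_ms: int = 0    # position inside the section, in milliseconds

    def to_json(self) -> str:
        # Serialize so the state can be stored in any key-value store
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "PlaybackState":
        return cls(**json.loads(raw))


# On pause: record where the user stopped; on resume: load it back
paused = PlaybackState(book_id=56, section=3, offset_ms=91500)
resumed = PlaybackState.from_json(paused.to_json())
```

Keeping the record serializable as JSON leaves the choice of storage backend open.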

Books Search

The search features of the skill can be implemented either with a combination of the Librivox and Archive public APIs, or by building, ahead of time, a private books database local to the skill (with a mechanism to keep it current) to support advanced searches and low-latency responses.

Review of public APIs:

To make a good design choice, it helps to look at examples of how users might search for books on Free Books. (The examples assume that the skill has already been launched via the voice command 'Alexa, launch Free Books'.)

  • Find a book by author <first/last name>
  • Find a book by genre
  • Find a book by title <full title|key words>
  • Play a book (a random book)

Most exploratory search queries will return a multi-record result set, which the user must be able to page through to select the book they want to play. I propose that creating a private copy of the entire Librivox books database, tuned to support fast and rich querying, is the best choice. I start by stating why the existing Librivox and Archive public APIs are not a good fit for book searches. Let's work through examples to understand the challenges with the public APIs.

  • Find a book by author <first/last name>

Although this search can be done in Librivox, the API seems limited to searching by an author's last name only. Also, in a quick test with a valid last name, I got 0 results back. It's unclear to me whether the API is functionally incomplete or insufficiently tested.

https://librivox.org/api/feed/authors?last_name=Mayde&format=json
{"authors":[]}

To prove this author exists, I had previously run a query to get back all the authors in Librivox:

https://librivox.org/api/feed/authors?format=json
{"authors":[{"id":"13487","first_name":"Richard","last_name":" Mayde","dob":"0","dod":"0"},
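
Note that the last name stored in this record carries a leading space (`" Mayde"`), which may be why the exact-match query returned nothing. One possible workaround, sketched below under that assumption, is to fetch the full author list once and match names locally after trimming whitespace. The function name `find_authors` is hypothetical.

```python
def find_authors(authors, last_name):
    """Return authors whose last name matches, ignoring case and stray whitespace."""
    wanted = last_name.strip().lower()
    return [a for a in authors
            if a.get("last_name", "").strip().lower() == wanted]


# Record copied from the full author listing shown above
authors = [{"id": "13487", "first_name": "Richard", "last_name": " Mayde",
            "dob": "0", "dod": "0"}]
matches = find_authors(authors, "Mayde")
```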

  • Find a book by genre

I did not find a public API to search books by genre.

  • Find a book by title

Internet Archive supports title search (it uses Lucene internally), but the results include non-Librivox items as well, so additional querying is required to narrow the results to the books we care about. For example, searching for the book 'Art of War' returned at least 4 books with Librivox ids.

https://archive.org/advancedsearch.php?q=title%3A%22art+of+war%22&fl%5B%5D=identifier&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json
{"responseHeader":{"status":0,"QTime":90,"params":{"query":"title:\"art of war\"","qin":"title:\"art of war\"","fields":"identifier","wt":"json","rows":"50","start":0}},"response":{"numFound":280,"start":0,"docs":[{"identifier":"art_of_war_librivox"},{"identifier":"cd_the-art-Havoc_legionnaire"},{"identifier":"winampskin_art_of_war_by_h2o"},{"identifier":"iuma-artofwar_1402_librivox"}]}}

artofwar_1402_librivox
art_of_war_librivox
artofwar3_pc_librivox
art_war_ps_librivox
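
Narrowing the mixed result set can be sketched as a simple filter on the identifier. This assumes, based only on the ids listed above, that Librivox items end in `_librivox`. The helper name `is_librivox` matches the one used later in the script, but its body here is a guess.

```python
def is_librivox(doc):
    """Heuristic: treat an item as a Librivox recording if its id ends in '_librivox'."""
    return doc.get("identifier", "").endswith("_librivox")


# 'docs' as returned under response.docs in the search output above
docs = [{"identifier": "art_of_war_librivox"},
        {"identifier": "cd_the-art-Havoc_legionnaire"},
        {"identifier": "winampskin_art_of_war_by_h2o"},
        {"identifier": "iuma-artofwar_1402_librivox"}]
librivox_ids = [d["identifier"] for d in docs if is_librivox(d)]
```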

In addition to these, I've experimented with a few other APIs that aren't complete in themselves. For example, for certain books the section mp3 links from Librivox are broken, or that information is missing from the metadata.

  • Export all books
https://librivox.org/api/feed/audiobooks
  • Get book metadata from Librivox
https://librivox.org/api/feed/audiobooks?id=47
  • Get book metadata from Archive
http://archive.org/metadata/artofwar_1402_librivox

Overall, I felt that the lack of adequate documentation and of a published latency SLA, along with what at times seems like functional incompleteness, makes it hard to rely on the public APIs to power searches in a production Skill (search, I'd argue, is crucial to the overall user experience). [I'm happy to be proven wrong here]. For these reasons, I decided to run a batch job that collects metadata about every book in Librivox using a combination of Librivox and Archive APIs and persists it in AWS S3. I then plan to use S3 Select (SQL on S3) to query the book metadata. This gives us the flexibility to modify the schema at will to support fast querying. There is also no longer a need to make multiple public API calls to collate information about a single book, which makes for a responsive search experience in the Skill.

Implementation

The script is implemented in Python. It starts by collecting a batch of books from Librivox and then makes Internet Archive (IA) API calls to get section mp3 information. It then normalizes some deeply nested structures to make them easier to query with SQL over JSON. I've provided snippets for some of the main steps in the script's flow.

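Get book metadata from Librivox

import requests

def get_books(limit, offset):
    # Fetch one page of extended book records from the Librivox feed
    url = "https://librivox.org/api/feed/audiobooks?limit={}&offset={}&extended=true" \
        .format(limit, offset)
    response = requests.get(url)
    return response  # the caller parses the body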
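
Collecting the whole catalog means calling `get_books` repeatedly with an increasing offset. The loop below is a sketch of that idea only; `iter_batches`, the `fetch` callback, and the stop-on-empty-batch rule are assumptions, not the script's actual logic.

```python
def iter_batches(fetch, limit=50):
    """Yield successive batches from fetch(limit, offset) until one comes back empty."""
    offset = 0
    while True:
        batch = fetch(limit, offset)
        if not batch:
            break
        yield batch
        offset += limit


# Stand-in for a real fetcher such as get_books, backed by a fake catalog of 5 ids
catalog = list(range(5))
fake_fetch = lambda limit, offset: catalog[offset:offset + limit]
batches = list(iter_batches(fake_fetch, limit=2))
```

Passing the fetcher in as a parameter keeps the paging logic testable without network access.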

Get an internet archive id for a book

import requests

def find_ia_id_from_title(title):
    # Search the Internet Archive for the title, asking only for identifiers
    _params = {'q': title, 'fl[]': 'identifier', 'page': 1, 'output': 'json'}
    url = 'https://archive.org/advancedsearch.php'
    response = requests.get(url, params=_params)
    docs = response.json()['response']['docs']
    # is_librivox and is_id_valid are helpers that are not shown in this snippet
    lib_docs = list(filter(is_librivox, docs))
    # Test all potential librivox ids until a valid one is found
    for doc in lib_docs:
        if is_id_valid(doc['identifier']):
            return doc['identifier']
    return None

Get Section Mp3 metadata

def get_sections(ia_id):
	# get file server base path
	path = get_ia_file_server_base_path(ia_id)

	# find section mp3 names
	sections={}
	url="http://archive.org/metadata/{}/files".format(ia_id)
	response=requests.get(url)
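
The remainder of this step turns the file listing into numbered sections. The sketch below assumes the listing is a list of file records with a `name` field and that section audio files end in `.mp3`. The helper name `build_sections` and the sort-by-filename ordering are illustrative assumptions. The output shape mirrors the `sections` object in the stored book JSON.

```python
def build_sections(files, base_path):
    """Map mp3 file records to {section_number: {number, name, url}}."""
    mp3_names = sorted(f["name"] for f in files
                       if f.get("name", "").endswith(".mp3"))
    return {
        str(i): {"number": i, "name": name, "url": "{}/{}".format(base_path, name)}
        for i, name in enumerate(mp3_names, start=1)
    }


# Hypothetical file listing; real listings also contain non-audio files
files = [{"name": "book_02.mp3"}, {"name": "book_meta.xml"}, {"name": "book_01.mp3"}]
sections = build_sections(files, "https://example.org/items/book")
```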

After normalizing authors and genres, we persist books in S3

import os
import boto3

BUCKET='librivox-audiobooks-db'
KEY='books.json'

s3=boto3.resource('s3',aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'))
# upload_file returns None, so there is no response to capture
s3.Bucket(BUCKET).upload_file('books.json', KEY)

An example book in S3 looks like this

Each book is stored as a JSON record like the one below (also available at https://gist.github.com/ranjitiyer/c1054835adf15cf5e05ae3d20c02c70b)

{
  "id": 56,
  "title": "Secret Garden",
  "description": "Mary Lennox is a spoiled, middle-class, self-centred child who has been recently orphaned. She is accepted into the quiet and remote country house of an uncle, who has almost completely withdrawn into himself after the death of his wife. Mary gradually becomes drawn into the hidden side of the house: why does she hear the crying of a unseen child? Why is there an overgrown, walled garden, its door long locked? (Summary by Peter)",
  "num_sections": 27,
  "sections": {
    "1": {
      "number": 1,
      "name": "01 - There Is No One Left",
      "url": "https://ia600204.us.archive.org/15/items/secret_garden_version2_librivox/secretgarden_01_burnett_64kb.mp3"
    },
     . . 
    "28": {
      "number": 28,
      "name": "28 - In The Garden, Part 2",
      "url": "https://ia600204.us.archive.org/15/items/secret_garden_version2_librivox/secretgarden_27b_burnett_64kb.mp3"
    }
  },
  "authors": [
    "Frances Hodgson Burnett"
  ],
  "genres": [
    "Children's Fiction"
  ],
  "totaltime": "9:08:25"
}
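
The flat `authors` and `genres` arrays above are the result of the normalization step. A sketch of that flattening follows. It assumes the raw Librivox record nests authors as objects with `first_name` and `last_name` (as in the authors feed shown earlier) and genres as objects with a `name` field, which is an assumption about the feed. `normalize_book` is a hypothetical name.

```python
def full_name(author):
    """Join the non-empty name parts of one author object."""
    parts = (author.get("first_name", ""), author.get("last_name", ""))
    return " ".join(p.strip() for p in parts if p.strip())


def normalize_book(raw):
    """Flatten nested author and genre objects into plain lists of strings."""
    book = dict(raw)  # shallow copy so the input record is left untouched
    book["authors"] = [full_name(a) for a in raw.get("authors", [])]
    book["genres"] = [g["name"] for g in raw.get("genres", [])]
    return book


raw = {"id": 56, "title": "Secret Garden",
       "authors": [{"first_name": "Frances Hodgson", "last_name": "Burnett"}],
       "genres": [{"id": "1", "name": "Children's Fiction"}]}
book = normalize_book(raw)
```

Flat string arrays are easier to match with simple SQL predicates than nested objects.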

All books (which I estimate number in the tens of thousands) will reside in a single .gz file and can be queried using SQL alone. In the example below, we fetch a single book to play (a possible starting point for a 'Play random book' feature; note that 'limit 1' returns the first record, not a random one).

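client = boto3.client('s3')  # select_object_content is a client method, not a resource one
r = client.select_object_content(Bucket=BUCKET,
            Key=KEY,
            ExpressionType='SQL',
            Expression="select * from s3object limit 1",
            # Assumes one JSON record per line; a .gz object also needs 'CompressionType': 'GZIP'
            InputSerialization={'JSON': {'Type': 'LINES'}},
            OutputSerialization={'JSON': {}})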
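
The call returns an event stream rather than the rows themselves. As a sketch of consuming it: boto3 exposes the stream under the response's `Payload` key, and each `Records` event carries a chunk of bytes. The helper name `read_records` is hypothetical, and the parsing assumes JSON output with one record per line.

```python
import json


def read_records(event_stream):
    """Concatenate 'Records' chunks from an S3 Select event stream and parse JSON lines."""
    raw = b"".join(event["Records"]["Payload"]
                   for event in event_stream if "Records" in event)
    return [json.loads(line)
            for line in raw.decode("utf-8").splitlines() if line.strip()]


# With a real response this would be: books = read_records(r["Payload"])
# Simulated stream; note that a record may be split across chunks
fake_stream = [{"Records": {"Payload": b'{"title": "Secret '}},
               {"Records": {"Payload": b'Garden"}\n'}},
               {"Stats": {}}, {"End": {}}]
books = read_records(fake_stream)
```

Joining all chunks before parsing matters because a single record is not guaranteed to arrive in one event.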

We can now easily search by authors (first and last name), genres, description, etc.

select s.title from s3object s where s.title like '%keyword%'
select s.title from s3object s where s.description like '%keyword%'
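
When the keyword comes from a spoken request, it has to be embedded into the SQL string safely. The following is a minimal sketch, assuming the expression is assembled by string formatting. `build_title_query` is a hypothetical helper; doubling single quotes is the standard SQL way to escape them. The sketch does not escape the `%` and `_` wildcards inside the keyword.

```python
def build_title_query(keyword):
    """Build an S3 Select expression matching titles that contain the keyword."""
    escaped = keyword.replace("'", "''")  # escape single quotes for the SQL literal
    return "select s.title from s3object s where s.title like '%{}%'".format(escaped)


expression = build_title_query("Gulliver's")
```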

Voice interaction

Here are some proposed user voice interactions with the Free Books Alexa skill

Session

U: Alexa, ask Free Books to read a book
A: Playing '<Title>'

The skill plays a random book from the book database
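
For the interaction above, the skill's reply would pair the spoken confirmation with an `AudioPlayer.Play` directive pointing at a section's mp3. The sketch below builds that response as a plain dictionary following the Alexa response format. The function name `build_play_response` and the token scheme are assumptions.

```python
def build_play_response(title, stream_url, token, offset_ms=0):
    """Return an Alexa response that announces the title and starts audio playback."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText",
                             "text": "Playing '{}'".format(title)},
            "directives": [{
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",  # replace anything already queued
                "audioItem": {"stream": {
                    "url": stream_url,          # must be served over HTTPS
                    "token": token,             # opaque id; here book id and section
                    "offsetInMilliseconds": offset_ms,
                }},
            }],
            "shouldEndSession": True,
        },
    }


response = build_play_response(
    "Secret Garden",
    "https://ia600204.us.archive.org/15/items/secret_garden_version2_librivox/"
    "secretgarden_01_burnett_64kb.mp3",
    token="56:1")
```

Passing a non-zero `offset_ms` from the persisted user state is how a resume could reuse the same builder.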

[TODO: More interactions need to be spec'd out]

Resources

https://librivox.org/api/info
https://archive.org/help/json.php
https://blog.archive.org/developers/
