Write your first connector - Phyks/konnectors GitHub Wiki

Introduction

First, let's explain what a Konnector is. It's a piece of software to fetch some data from a vendor website or API. It's useful when you want to retrieve information like your sleep activity, your phone bills or your last tweets.

Fetched data are stored in your Cozy so that other applications can reuse them. Data are stored as records in the database. Data provided as PDF files (like bills) can be downloaded too. They will appear in your Files application like any other file.

Adding a connector doesn't require understanding the full Cozy app architecture. You only need to be familiar with asynchronous programming in Node.js.

All the logic will be contained in one file. To add your connector, you have to add this file to the Konnectors folder.

In this guide, we will walk you through the steps to get your first connector to fetch Bills from a fictitious website "MyAwesomeWebsite".

First, you will need to add a myawesomewebsite.js file to the Konnectors folder. Your file should follow a specific structure to work properly. A typical connector file is divided into three parts:

  1. dependency declaration
  2. connector description
  3. fetching logic.

Let us look in more detail at what goes into each part.

Dependencies

As in any Node.js module, you have to declare your external dependencies at the beginning of the file. To do so, simply use the require function.

Typically, for our example connector, we will use:

const baseKonnector = require('../lib/base_konnector');
const async = require('async');
const moment = require('moment');
const request = require('request');
const cheerio = require('cheerio');

baseKonnector is the base declaration of a konnector, which you should extend. It is mandatory. The other dependencies listed above are the most commonly used ones. Check them against your needs and import only the libraries you actually use. Here is why these libraries might be useful for your connector:

  • async to run some operations asynchronously.
  • moment makes date handling much easier.
  • request is great for fetching the webpages you need to scrape.
  • cheerio lets you select elements from a downloaded HTML page (that is, scrape the webpage) as if you were using jQuery.

Note: There is also a request-json library, which can be used as a drop-in replacement for request and makes communicating with a JSON API much easier.

Konnector definition

⚠️ This section is still being written

The application needs a definition for your Konnector, composed of:

name
The displayed name of the Konnector.
vendorLink
The website of the vendor related to the konnector. A link to the website is displayed to the user so that they can check they are providing the right credentials. This link is displayed for information only and is not used for any fetching operation.
models
List of models used by the konnector (we just link previously declared models to the konnector). For more information about models, head over to the Cozy developer documentation (https://dev.cozy.io/#main-document-types). The list of models available in Konnectors is at https://github.com/cozy-labs/konnectors/tree/master/server/models.
fields
Map of the fields offered to the user to configure this connector (see below for a detailed list of available field types).
fetchOperations
List of operations to perform in order to fetch data from the target website.

In our example connector, the base description would be:

module.exports = baseKonnector.createNew({
  // Basic information on the connector
  name: 'My Awesome Website',
  vendorLink: 'https://www.example.com/',

  models: [
    Bill  // Our connector will only fetch Bills (the Bill model must be required first)
  ],

  // Required fields for our connector
  fields: {
    login: 'text',
    password: 'password',
    folderPath: 'folder'
  },

  fetchOperations: [
    // We will complete this part at the end of the next section
  ]
});

List of available field types

Here is a list of all the field types available for your connector:

text
simple plain text field
password
hidden text field
folder
selection box to choose a folder from the Files application (required to save downloaded files)

Fetching logic and layers

Konnectors uses an architecture based on layers to perform the fetching operations. Layers work the same way as Express middlewares and they will perform all the fetching, parsing and storing operations.

The parameters and storing data structures are passed from one layer to the next one. Each layer will push new data in these data structures, so the execution order matters. Each layer is a function with a special signature:

const myLayer = function (requiredFields, entries, data, next) {
  // We will discuss the arguments and what to put here in a few paragraphs
};

Layers are useful as they can be shared between connectors and avoid a lot of code duplication (it may be unclear at the moment, but will most likely make sense after you read this section). Typically, in this simple example

  fetchOperations: [
    logIn,
    parsePage,
    saveData
  ]

our connector will create a fetcher object that will run the three operation layers (logIn, parsePage and saveData) in this given order. They are the only thing specific to your connector that you should write and everything else is shared with other connectors. These three functions will perform the necessary steps to retrieve data and save them into the Cozy:

  • logIn will log in to the website the connector is using.
  • parsePage will scrape the page to extract the bills.
  • saveData will save the bills found into our Cozy.

Each operational layer is a function that should have a predefined signature, with the following arguments:

requiredFields
Values given by the user for each field listed in the description
entries
Object to store information related to entries to store into the cozy. This object is passed through all operation layers.
data
Object to store raw information retrieved from the target website. This object is passed through all operation layers.
next
The function to call when it's time to call the next layer. The operation ends when this function is called.
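
To make the role of next and of the shared entries and data objects more concrete, here is a minimal sketch of how a list of layers could be chained. This is not the actual Konnectors fetcher: runLayers, fetchRaw and parseRaw are hypothetical names used for illustration only, and passing an error to next is an assumption borrowed from the Express middleware convention.

```javascript
// Hypothetical, simplified runner: NOT the actual Konnectors fetcher.
function runLayers(layers, requiredFields, done) {
  const entries = {};  // Shared by all layers
  const data = {};     // Shared by all layers
  let index = 0;

  function next(err) {
    // Assumption: an error passed to next stops the chain (Express-style)
    if (err) { return done(err, entries, data); }
    if (index >= layers.length) { return done(null, entries, data); }
    const layer = layers[index];
    index += 1;
    layer(requiredFields, entries, data, next);
  }

  next();  // Start with the first layer
}

// Two toy layers: the first stores raw data, the second builds entries
function fetchRaw(requiredFields, entries, data, next) {
  data.raw = ['12.50', '8.00'];
  next();
}

function parseRaw(requiredFields, entries, data, next) {
  entries.fetched = data.raw.map(function (amount) {
    return { amount: parseFloat(amount), vendor: 'MyAwesomeWebsite' };
  });
  next();
}

let result = null;
runLayers([fetchRaw, parseRaw], { login: 'me' }, function (err, entries) {
  result = entries.fetched;
});
```

Because parseRaw reads what fetchRaw wrote into data, swapping the two layers would break the chain, which is why the execution order matters.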

Here is a typical implementation of the operational layers for our example connector:

/**
  * Login layer
  * 
  * This layer will perform the log in and save the session cookie in the request library, to stay logged in in further requests.
  */
function logIn (requiredFields, entries, data, next) {
  // Define request call options
  const options = {
    method: 'POST',
    jar: true,  // Jar is true to store the session cookies in request library
    url: 'https://example.com/login/',  // The login page
    form: {  // The POST data to send to perform login
      login: requiredFields.login,
      password: requiredFields.password,
    }
  };
  
  // Run the query
  request(options, function (err, res, body) {
    // Request stores session cookie in RAM. So now, we can request 
    // the website like we are logged in.
    // Typically, we should check that login was successful here,
    // but for this simple example, we will skip it and assume
    // everything went well
    next();  // Call next layer
  });
}

/**
  * Page parsing layer
  * 
  * This layer will parse the bill page and store the result in the entries field.
  */
function parsePage (requiredFields, entries, data, next) {
  const url = 'https://example.com/bills/';  // The page which lists the bills

  entries.fetched = [];  // This will store the fetched bills

  // Do the query (jar: true reuses the session cookies stored at login)
  request.get({ url: url, jar: true }, function (err, res, body) {
    if (err) {
      return next(err);  // Pass the error along, Express-style
    }
    // Now, let us parse the results
    // For more information on this, see the next page of this guide
    const $ = cheerio.load(body);
    $('.pane li').each(function () {
      // Extract the text of each node before converting it
      const amount = parseFloat($(this).find('.amount').text());
      const date = moment($(this).find('.date').text(), 'YYYYMM');

      // Push fetched entry
      entries.fetched.push({
        date: date,
        amount: amount,
        vendor: 'MyAwesomeWebsite',
      });
    });

    next();  // Call next layer once, after all the entries have been parsed
  });
}

/**
  * Save data to the Cozy layer
  */
function saveData (requiredFields, entries, data, next) {
  // Handle the fetched entries one after the other
  async.eachSeries(entries.fetched, function (bill, callback) {
    // Create a bill for every fetched entry
    Bill.create(bill, callback);
  }, next);
}
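
The conversion step inside parsePage boils down to turning strings found in the page into typed values. Here is a dependency-free sketch of that step. The helper names (parseAmount, parseYearMonth, toBillEntry) are hypothetical, and the input formats are assumptions: an amount such as '12,50 €' and a date in the 'YYYYMM' form used above. A real website will most likely need different rules.

```javascript
// Hypothetical helpers, for illustration only.

// Turn a displayed amount such as '12,50 €' into a number
function parseAmount(text) {
  const cleaned = text
    .replace(/\s/g, '')           // Drop spaces
    .replace(',', '.')            // Decimal comma to decimal point
    .replace(/[^0-9.\-]/g, '');   // Drop currency symbols and the like
  return parseFloat(cleaned);
}

// Turn a 'YYYYMM' string into a Date set to the first day of that month (UTC)
function parseYearMonth(text) {
  const year = parseInt(text.slice(0, 4), 10);
  const month = parseInt(text.slice(4, 6), 10);
  return new Date(Date.UTC(year, month - 1, 1));
}

// Build an entry with the same shape as the ones pushed in parsePage
function toBillEntry(rawAmount, rawDate) {
  return {
    date: parseYearMonth(rawDate),
    amount: parseAmount(rawAmount),
    vendor: 'MyAwesomeWebsite'
  };
}

const entry = toBillEntry('12,50 €', '201603');
```

Keeping this conversion in small pure functions makes it easy to check against a few sample strings copied from the website before wiring it into a layer.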

For more information on the scraping of a webpage, how to query a JSON API, etc, please have a look at the Tips for writing your own connector wiki page.

Common layers

Many operations are common to most of the connectors. That's why we created common layers that can be reused in several connectors.

Common layers can be required from the lib directory. They work slightly differently from regular layers: requiring one gives you a function that, once called with your parameters, generates the layer to use.

const filterExisting = require('../lib/filter_existing');
const saveDataAndFile = require('../lib/save_data_and_file');
const linkBankOperation = require('../lib/link_bank_operation');

// ...

const myKonnector = baseKonnector.createNew({
  // ...
  fetchOperations: [
    logIn,
    parsePage,
    // Read ahead for a description of these common layers.
    // `log` is assumed to be a printit logger, and the models are
    // assumed to have been required above.
    filterExisting(log, PhoneBill),
    saveDataAndFile(log, PhoneBill, 'bouygues', ['facture']),
    linkBankOperation({
      log: log,
      model: Bill,
      identifier: 'bouyg',
      minDateDelta: 4,
      maxDateDelta: 20,
      amountDelta: 0.1
    })
  ]
});

module.exports = myKonnector;

Here is a complete list of available common layers.

filterExisting
Action

It requires that the entries object has a fetched attribute which is an array of fetched data to save. It will build a filtered field which will be an array of data that will contain only data that are not currently stored in the database. It checks for data existence through the data attribute.
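
The filtering described above can be sketched as follows. This is a simplified illustration, not the real filterExisting code: filterFetched and byDate are hypothetical names, the already stored records are passed in as a plain array instead of being read from the database, and comparing records by a date string is an assumption made for the example.

```javascript
// Simplified illustration, NOT the real filterExisting code.
function filterFetched(entries, existingRecords, keyOf) {
  // Keys of the records that are already stored
  const known = new Set(existingRecords.map(keyOf));

  // Keep only the fetched entries whose key is not known yet
  entries.filtered = entries.fetched.filter(function (entry) {
    return !known.has(keyOf(entry));
  });
  return entries;
}

// Assumption: two records are the same if they share the same date string
const byDate = function (record) { return record.date; };

const entries = {
  fetched: [
    { date: '2016-01', amount: 10 },
    { date: '2016-02', amount: 12 }
  ]
};
const alreadyStored = [{ date: '2016-01', amount: 10 }];

filterFetched(entries, alreadyStored, byDate);
```

After the call, entries.fetched is left untouched and entries.filtered holds only the February entry, which is the one a later layer would save.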

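Parameters
  • log: a printit logger. See the printit documentation for details
  • model: a model object to check for.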

Example

filterExisting(log, Bill)  // Keep only the entries not already stored as Bill records

saveDataAndFile
Action

This layer will persist a Cozy document for each entry listed in the filtered field of the entries object. If a pdfurl field is set on an entry, the file is downloaded and stored in the Files application.

Note: pdfurl can point to any file, not necessarily a PDF file. The name comes from legacy code and has not been updated.

Parameters
  • log: a printit logger. See the printit documentation for details
  • model: a model object used to save data.
  • suffix: added to filename. This allows you to omit the "vendor" property on your bill object.
  • tags: to apply to created files

Example

// Save Github bills with a github suffix and a bill tag
saveDataAndFile(log, Bill, 'github', ['bill'])

linkBankOperation
Action

⚠️ This section is still being written

This layer is specific to bill data. It takes entries stored in the fetched field of the entries object. Then for each entry it will look for an operation that could match this entry. Once found, it attaches a binary to the bank operation. It's the same binary that is attached to the corresponding file object.

The criteria to find a matching operation are:

  • The operation label should contain one of the identifiers given as a parameter.
  • The date should be between (bill date - dateDelta) and (bill date + dateDelta), where dateDelta is given as a parameter, in days.
  • The amount should be between (bill amount - amountDelta) and (bill amount + amountDelta), where amountDelta is given as a parameter.

Parameters
  • log: a printit logger. See the printit documentation for details
  • model: a model object to check for.
  • identifier: a string or an array of strings to look for in the operation label (case insensitive: the layer will automatically set it to lowercase).
  • dateDelta: the number of days allowed between the bank operation date and the bill date (15 by default).
  • amountDelta: the difference between the bank operation amount and the bill amount (useful when the currency is not the same) (0 by default).
  • isRefund: boolean telling whether the operation is a refund (false by default). If set to true, only operations with a positive amount are matched.

Usage
linkBankOperation({
    log: log,
    model: Bill,
    identifier: 'github',
    dateDelta: 4,
    amountDelta: 5
})
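
The matching criteria listed above can be sketched as a simple predicate. This is an illustration only, not the real linkBankOperation layer: matchesBill is a hypothetical name, the shape of the operation objects (label, date, amount) is an assumption, and the isRefund option is left out.

```javascript
// Simplified illustration of the matching criteria, NOT the real layer.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function matchesBill(operation, bill, options) {
  // identifier can be a string or an array of strings (case insensitive)
  const identifiers = [].concat(options.identifier).map(function (id) {
    return id.toLowerCase();
  });
  // Defaults taken from the parameter list: 15 days and 0
  const dateDelta = options.dateDelta === undefined ? 15 : options.dateDelta;
  const amountDelta = options.amountDelta || 0;

  const label = operation.label.toLowerCase();
  const labelMatches = identifiers.some(function (id) {
    return label.indexOf(id) !== -1;
  });
  const daysApart = Math.abs(operation.date - bill.date) / MS_PER_DAY;
  const amountGap = Math.abs(operation.amount - bill.amount);

  return labelMatches && daysApart <= dateDelta && amountGap <= amountDelta;
}

const options = { identifier: 'github', dateDelta: 4, amountDelta: 5 };
const bill = { date: new Date('2016-03-08'), amount: 7 };
const operations = [
  { label: 'PRLV GITHUB INC', date: new Date('2016-03-10'), amount: 7 },
  { label: 'PRLV GITHUB INC', date: new Date('2016-03-20'), amount: 7 },
  { label: 'SUPERMARKET', date: new Date('2016-03-08'), amount: 7 }
];

const matching = operations.filter(function (operation) {
  return matchesBill(operation, bill, options);
});
```

Only the first operation matches here: the second one is 12 days away from the bill date, and the label of the third one does not contain the identifier.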

Please go on with the advanced [tips for writing your own connector](Tips for writing your own connector).
