Database Summary - SEL-Columbia/shared_solar_data_warehouse GitHub Wiki

This Repo maintains data from sharedsolar installations in Uganda and Mali.

List of tables:

sharedsolar@xxx-xxx-xxx-xxx:~$ psql -d sharedsolar -c "\dt;"
                    List of relations
 Schema |           Name           | Type  |    Owner    
--------+--------------------------+-------+-------------
 public | calc_tmp                 | table | sharedsolar
 public | circuit                  | table | sharedsolar
 public | circuit_reading          | table | sharedsolar
 public | circuit_reading_daily    | table | sharedsolar
 public | circuit_reading_hourly   | table | sharedsolar
 public | circuit_reading_minutely | table | sharedsolar
 public | power_reading            | table | sharedsolar
 public | raw_circuit_reading      | table | sharedsolar
(8 rows)

Where raw_circuit_reading is the raw data taken from the meters. There were lots of issues with this data:

  • Duplicates
  • Missing data
  • Clock skew introduced wrong timestamps

Though, I believe there were a few "good" sites/circuits that were used for analysis.

The circuit_reading table is the "cleaned" version of the raw where we tried to get rid of the most egregious issues (deduplicated, outlier removal):

$ psql -d sharedsolar -c "\d circuit_reading;"
               Table "public.circuit_reading"
      Column      |            Type             | Modifiers 
------------------+-----------------------------+-----------
 site_id          | character varying(8)        | not null
 ip_addr          | character varying(16)       | not null
 time_stamp       | timestamp without time zone | not null
 watts            | real                        | 
 watt_hours_sc20  | double precision            | 
 credit           | real                        | 
 watt_hours_delta | real                        | 
Indexes:
    "circuit_reading_pkey" PRIMARY KEY, btree (site_id, ip_addr, time_stamp) CLUSTER

And circuit_reading_{minutely,hourly,daily} are aggregates of the values in circuit_reading see aggregate.sql for how each field was aggregated.

Field notes: site_id: identifier for the mini-grid this circuit belonged to ip_addr: identifier for circuit (I believe these may have been swapped in some cases) time_stamp: timestamp of recording (skew in some cases) watts: peak power since last timestamp (?) watt_hours_delta: The energy used since the last timestamp credit: Running balance of credit