Crowdsourcing Map based data - datameet-pune/datameet-pune.github.io GitHub Wiki

Project: Website for crowdsourcing map-based data

Aim

To have a tool/platform with which we can collect (and share while we're collecting) map-based data on any topic.

Front end

Front end : viewing / extracting data

Check out https://overpass-turbo.eu/.

(Screenshot: Overpass Turbo)

Same functionality as the Overpass Turbo tool, i.e.:

  • The user runs a query using a tag, a key=value combination, or other such queries (see examples here)
  • The user can see the result and extract the data in multiple formats, just as Overpass Turbo does.

Difference from OSM : This website will not generate tilesets / slippy maps. It will simply make all the collected data available for bulk or custom extraction as Overpass Turbo does.
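
A minimal sketch of the query-and-extract idea, assuming data points are held as dictionaries with a tags mapping. The sample points, the function names (matches, to_geojson, run_query) and the choice of GeoJSON as the export format are all illustrative assumptions, not part of the plan above, and real Overpass queries are far richer than this.

```python
import json

# Hypothetical in-memory data points; a real site would read from its backend.
POINTS = [
    {"id": 1, "lat": 18.5204, "lon": 73.8567,
     "tags": {"amenity": "school", "name": "School A"}},
    {"id": 2, "lat": 18.5310, "lon": 73.8440,
     "tags": {"amenity": "school", "operator": "Pune Municipal Corporation"}},
    {"id": 3, "lat": 18.5100, "lon": 73.8700,
     "tags": {"shop": "bakery"}},
]


def matches(tags, query):
    """Return True if tags satisfy a 'key' or 'key=value' query."""
    key, sep, value = query.partition("=")
    if key not in tags:
        return False
    # A bare key matches any value; key=value needs an exact match.
    return not sep or tags[key] == value


def to_geojson(points):
    """Wrap data points as a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": p["id"],
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point",
                             "coordinates": [p["lon"], p["lat"]]},
                "properties": p["tags"],
            }
            for p in points
        ],
    }


def run_query(points, query):
    """Filter points by the query and return the result as GeoJSON."""
    return to_geojson([p for p in points if matches(p["tags"], query)])


if __name__ == "__main__":
    print(json.dumps(run_query(POINTS, "amenity=school"), indent=2))
```

The same filtered list could be written out as CSV or another format; GeoJSON is used here only because most map tools read it.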

Front end: contributing data

Here's where we differentiate from OSM. The website has different instances or topics created by its users, listed in the way open data portals are (example).

  1. User goes into a topic, for example "Government schools in Pune".
  2. There, a map shows the data collected so far, along with statistics. Further down the page is a detailed, blog-like listing of the data points gathered.
  3. User can place / drag a pin on the map to enter a new data point, enter relevant information and submit.
  4. "Relevant information" : this is where the need for this platform emerges.
  5. Some tags (key=value pairs) will be preset (or hard-coded) by the topic initiator. Example: amenity=school, operator=Pune Municipal Corporation. The contributor cannot edit these.
  6. Then there are tags such as name (i.e., the key is name and the contributor fills in the value) that are mandatory.
  7. And then there are optional tags that the topic initiator has offered, for example number_of_classrooms, which the user may fill in or leave empty.
  8. And THEN there are user-entered key-value pairs. The user can add as many key-value pairs as needed in addition to the ones already decided. (Screenshot: tags in OSM)
  9. There will be search-as-you-type auto-suggest for the new keys so that keys entered by other users for this topic are suggested first, followed by keys entered in the whole site.
  10. As the data collects, the topic initiator will get to see the different tags entered or skipped by contributors, and could update which tags are preset, mandatory, optional and whether new tags should be allowed or not.
  11. Reference: see OSM wiki entry for school for suggested tags that can go with amenity=school. These pages could be loaded when the topic initiator is creating or editing the topic, to suggest associated tags for including.
  12. Obviously the topic page also has an export link / section for exporting all the data gathered under this topic so far.
  13. Image uploading should also be supported. Explore the possibility of hosting the uploaded images on imgur, the way github or stackexchange does when we upload images while filing issues. See here for clues. Alternatively, take an image URL and instruct users to upload the image to imgur or elsewhere and share the link.
  14. Import : the user can bulk-import existing data from OSM for that topic. The OSM id and versioning will have to be preserved to ensure re-integration into OSM in future (if we push the collected data to OSM, these points should not duplicate). Also, compatibility with the preset and mandatory key-value pairs will have to be addressed.
  15. Bulk import : Enable submitting bulk data in CSV or other formats. This lets people who have already done some work on the topic pool it in.
  16. A comments / discussion space is provided for discussions. Specific data points can be referred to using shortcodes or similar.
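
The four kinds of tags described in this list (preset, mandatory, optional, user-entered) can be sketched as a small validation step. This is only an illustration under assumptions: the TopicSchema class, the build_tags function and the allow_new_tags flag are hypothetical names, and tag values are assumed to arrive as strings from a web form.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSchema:
    """Tag rules chosen by the topic initiator (hypothetical structure)."""
    preset: dict                      # fixed key=value pairs, not editable
    mandatory: set                    # keys the contributor must fill
    optional: set = field(default_factory=set)
    allow_new_tags: bool = True       # whether free-form keys are accepted


def build_tags(schema, submitted):
    """Validate a submission and return (tags, errors)."""
    errors = []
    # Drop empty values so a blank optional field counts as "not filled".
    filled = {k: v.strip() for k, v in submitted.items() if v and v.strip()}

    for key in filled:
        if key in schema.preset and filled[key] != schema.preset[key]:
            errors.append(f"'{key}' is preset and cannot be changed")
    for key in sorted(schema.mandatory):
        if key not in filled:
            errors.append(f"'{key}' is mandatory")
    known = set(schema.preset) | schema.mandatory | schema.optional
    if not schema.allow_new_tags:
        for key in sorted(set(filled) - known):
            errors.append(f"new tag '{key}' is not allowed for this topic")

    # Preset values always win over whatever was submitted.
    tags = {**filled, **schema.preset}
    return tags, errors


if __name__ == "__main__":
    schools = TopicSchema(
        preset={"amenity": "school",
                "operator": "Pune Municipal Corporation"},
        mandatory={"name"},
        optional={"number_of_classrooms"},
    )
    print(build_tags(schools, {"name": "School A", "playground": "yes"}))
```

Because the rules live in one object per topic, the topic initiator could later move a key between mandatory and optional, or switch off new tags, without touching the data already collected.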

Front end : Improving data

  1. This could start as a derivative of the data collecting instance discussed in the previous section.
  2. It could also be initiated from an import from OSM.
  3. Over here, instead of contributing new places, existing data is shown and users can click to improve data points by filling in additional key-value pairs or editing existing key-value pairs.
  4. Changes are tracked and can be reverted by topic owners and moderators: either for the whole topic or for a specific data point.
  5. Discussion space would be present here as well.
  6. The page links back to the data collector page in case the user decides some data needs to be added. Similarly, over there users will be given links to come here if they want to edit something and if this editing / improving instance exists for that topic.
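
Tracking and reverting changes, as described in this list, could look roughly like the sketch below. It assumes each data point keeps a simple in-memory history of its earlier tag states; the TrackedPoint class and revert_topic function are hypothetical names, and a real site would persist the history in its backend.

```python
class TrackedPoint:
    """A data point whose tag edits are recorded so they can be undone."""

    def __init__(self, point_id, tags):
        self.point_id = point_id
        self.tags = dict(tags)
        self.history = []  # list of (editor, tags as they were before the edit)

    def edit(self, editor, changes):
        """Apply changes; a value of None deletes the key."""
        self.history.append((editor, dict(self.tags)))
        for key, value in changes.items():
            if value is None:
                self.tags.pop(key, None)
            else:
                self.tags[key] = value

    def revert(self, steps=1):
        """Undo the last `steps` edits (all of them if fewer exist)."""
        for _ in range(min(steps, len(self.history))):
            _, self.tags = self.history.pop()


def revert_topic(points):
    """Roll every point in a topic back to its original tags."""
    for point in points:
        point.revert(steps=len(point.history))


if __name__ == "__main__":
    school = TrackedPoint(1, {"amenity": "school"})
    school.edit("contributor_a", {"name": "School A"})
    school.edit("contributor_b", {"name": "School B"})
    school.revert()      # a moderator undoes the latest edit to this point
    print(school.tags)
```

Storing the full earlier tag set per edit is the simplest option; storing only the changed keys would save space at the cost of a slightly more involved revert.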

Backend

Backend option 1 : Use OpenStreetMap to collect this data.

Pros

  • Existing database backend, so no need to develop our own.
  • Could be done at little to no cost for servers etc.

Cons

  • The data we want to collect may be map-based, but not cartographic or appropriate for putting on a general-purpose map. Example: rent prices, popular wada-pav stalls.
  • Since the data we'd want to collect can be raw, not thoroughly structured / referenced in nature, we wouldn't want to use a db where there is global moderator monitoring and things can get flagged and deleted beyond our control. We may want to push finalized data to OSM, but at collection stage it may be better to have a "safe space" so to say.

Backend option 2 : Use Mapbox API to collect this data

Pros: Same pros as the OSM option, plus:

  • It's our database and we don't have to fear sudden deletions by moderators from the other side of the planet.

Cons:

  • Free account has limitations. It might be good for the pilot phase but if we go live then it gets into paid territory.

Grace:

  • Mapbox has pay-as-you-go mode, which could be relied upon in initial days when we don't know how things will pan out.

Backend option 3 : Make own database

  • DB structure has to be compatible with the extensible key-value pair structure. A regular MySQL table with one fixed column per attribute won't do.
  • Need to figure out options for this. Have to find something that works on a server/config that doesn't cost the moon.
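
One possible layout for extensible key-value pairs, offered only as a sketch and not as a decision: keep the fixed fields (location, topic) in one table and put every tag in a separate key-value table. SQLite is used here purely because it ships with Python; the table names, column names and helper functions are assumptions made for illustration.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a file
# or a database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (
    id    INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    lat   REAL NOT NULL,
    lon   REAL NOT NULL
);
-- One row per tag, so new keys need no schema change.
CREATE TABLE tags (
    point_id INTEGER NOT NULL REFERENCES points(id),
    key      TEXT NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (point_id, key)
);
CREATE INDEX tags_key_value ON tags (key, value);
""")


def add_point(topic, lat, lon, tags):
    """Insert a point and its tags; return the new point id."""
    cur = conn.execute(
        "INSERT INTO points (topic, lat, lon) VALUES (?, ?, ?)",
        (topic, lat, lon),
    )
    point_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tags (point_id, key, value) VALUES (?, ?, ?)",
        [(point_id, k, v) for k, v in tags.items()],
    )
    return point_id


def find_by_tag(key, value):
    """Return (id, lat, lon) for every point carrying key=value."""
    rows = conn.execute(
        "SELECT p.id, p.lat, p.lon FROM points p "
        "JOIN tags t ON t.point_id = p.id "
        "WHERE t.key = ? AND t.value = ? ORDER BY p.id",
        (key, value),
    )
    return rows.fetchall()


if __name__ == "__main__":
    add_point("schools", 18.52, 73.85, {"amenity": "school", "name": "A"})
    add_point("stalls", 18.53, 73.84, {"amenity": "fast_food"})
    print(find_by_tag("amenity", "school"))
```

Document stores or JSON columns are other routes to the same flexibility; which one fits a low-cost server is still the open question noted above.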

Links:

Resources for a custom web form submitting to Google Forms

Brainstorming: What kind of data might we gather

  • groundwater data
  • urban farms
  • rainwater harvesting
  • solar water heater usage
  • composting
  • waste segregation
  • industries / factories
  • govt offices
  • rent, prices of housing and shops
  • create a base layer of societies and residential buildings in Pune, onto which other data like waste segregation status, RWH etc can be put
  • citizen reporting sightings of animals, birds, plants, mushrooms etc
  • prices of different items : food, FMCGs, hardware, services etc : tracking it over time