2017-09-26 Discussion with Jasper Heeffer (Gapminder / Open Numbers)
Gapminder is a team of 7 people, including 2 developers (some work is outsourced).
The old Gapminder World graph used Flash, with data kept in Google Spreadsheets. Open Numbers started about 1.5 years ago for the new version of Gapminder Tools.
Data is stored in CSV because it is easy to work with and is what researchers want: they can dig in, create their own datasets, etc.
However, CSV is not the best format to fetch data from, since it is not optimised for queries.
Python script => transformed to DDF => data stored in CSV
Semantic harmonization: for example, country names/IDs are changed so that they are consistent across all stored datasets, using tables that define and match the countries and territories frequently used across many other datasets (see all alternative names/IDs for countries).
The harmonized data is what gets stored, but an unchanged copy of the original is also kept.
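The harmonization step described above can be sketched roughly as follows. This is only an illustration, not Open Numbers' actual code: the alias table, the function names and the sample rows (with made-up numbers) are all assumptions introduced for the example.

```python
import csv
import io

# Hypothetical alias table: every alternative name/ID maps to one canonical ID.
# The real Open Numbers tables are much larger; these entries are illustrative.
ALIASES = {
    "united states": "usa",
    "united states of america": "usa",
    "us": "usa",
    "usa": "usa",
    "france": "fra",
    "fr": "fra",
    "fra": "fra",
}


def harmonize_country(name):
    """Return the canonical ID for a country name, or None if it is unknown."""
    return ALIASES.get(name.strip().lower())


def harmonize_rows(rows, column="country"):
    """Return harmonized copies of the rows; the input rows are left unchanged."""
    harmonized = []
    for row in rows:
        new_row = dict(row)  # copy, so the original row is preserved
        new_row[column] = harmonize_country(row[column])
        harmonized.append(new_row)
    return harmonized


# Made-up sample data standing in for a fetched CSV file.
RAW_CSV = "country,year,population\nUnited States,2015,320\nFR,2015,66\n"

raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))  # unchanged copy
clean_rows = harmonize_rows(raw_rows)                  # harmonized copy
```

Keeping `raw_rows` next to `clean_rows` mirrors the idea of storing the harmonized data while retaining an unchanged copy. Names missing from the table come back as `None`, so they can be flagged for manual matching.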
“DDF Chef”: a Python script (what we would call a fetcher).
“DDF Recipe”: describes how to mix and harmonize (“cook”) multiple datasets.
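To make the “recipe” idea concrete, here is a minimal sketch of cooking two already-harmonized datasets into one. It does not reproduce the real DDF Recipe format or tooling: the `cook` function, the recipe dictionary layout and the sample values are hypothetical and only illustrate the principle.

```python
def cook(recipe, datasets):
    """Merge the datasets listed in recipe["ingredients"] on recipe["key"]."""
    key_cols = recipe["key"]
    merged = {}
    for name in recipe["ingredients"]:
        for row in datasets[name]:
            key = tuple(row[col] for col in key_cols)
            # Rows sharing the same key are combined into a single record.
            merged.setdefault(key, {}).update(row)
    return [merged[key] for key in sorted(merged)]


# Made-up datasets that already use harmonized country IDs.
datasets = {
    "population": [
        {"country": "fra", "year": 2015, "population": 66},
    ],
    "gdp": [
        {"country": "fra", "year": 2015, "gdp": 2400},
        {"country": "usa", "year": 2015, "gdp": 18000},
    ],
}

# Hypothetical recipe: which datasets to mix and which columns identify a row.
recipe = {"ingredients": ["population", "gdp"], "key": ["country", "year"]}

dish = cook(recipe, datasets)
```

The merge only works because both datasets share the same country IDs, which is why harmonization has to happen before cooking.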
- Automate more
- Fetch more sources
- Crowdsource the harmonization
- Get an overview of fetchers (which ones will be updated, when, etc.)
Their strength is in data visualisation
Vizabi: powerful visualisation tools developed in-house (GitHub)
Example of dataviz project they’ve done: https://open-numbers.github.io/sodertornsmodellen/
Potential for collaboration
DB.nomics is, in a way, creating the “data architecture”: a platform on top of which other user-side projects could be built (visualisation, reuse, mixing, harmonization, etc.).
Users can be individuals but also other websites/platforms
Gapminder could be one of these users, focused on the dataviz, since the architecture (fetching, aggregating, etc.) is not their strength.
Data.world: similar idea
See also Quandl