A few details on the US EIA (and FERC) Electricity data



I noticed on the DBnomics site that you're pulling in US EIA data. Much of that is data Catalyst is integrating as well. It looks like you're pulling directly from the API that the EIA publishes and getting the multitude of individual time series. Have you looked at pulling their bulk download data as well, or at providing a similar all-in-one data product? Is that within the scope of your project?

Also, you might be interested to know that there's a lot of valuable US EIA data which is not available through the API. The most fine-grained electricity production and fuel consumption data, at the individual generator and boiler level, are not published in a cleanly structured, machine-readable form. For example, for the Comanche coal-fired power plant in Colorado, the EIA's API gives you plant-level fuel consumption and net generation at monthly resolution. The EIA also has that data split out individually for Comanche units 1, 2, and 3, but only via spreadsheets. Similarly, the most detailed data about oil and gas production is excluded from the API but available via spreadsheets (which are, sadly, apparently the original, authoritative source of the data). Catalyst is integrating that more detailed data along with other US energy-related data (like the FERC Form 1, available from FERC only as undocumented binary database files) for open publication, and linking the datasets from different agencies together to make them more useful.
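
To make the plant-level versus unit-level distinction concrete, here is a minimal sketch of how unit-level records would roll up into the kind of plant-level monthly series the API exposes. Everything in it is an assumption for illustration: the record layout, the field order, the months, and the numbers are made up and are not EIA figures, and `plant_level_totals` is a hypothetical helper, not part of any EIA or Catalyst tool.

```python
from collections import defaultdict

# Hypothetical unit-level records: (plant, unit, month, net_generation_mwh).
# The values are invented for illustration; they are NOT real EIA data.
unit_records = [
    ("Comanche", "1", "2018-01", 100.0),
    ("Comanche", "2", "2018-01", 120.0),
    ("Comanche", "3", "2018-01", 250.0),
    ("Comanche", "1", "2018-02", 90.0),
    ("Comanche", "2", "2018-02", 110.0),
    ("Comanche", "3", "2018-02", 240.0),
]


def plant_level_totals(records):
    """Sum unit-level net generation into one value per (plant, month)."""
    totals = defaultdict(float)
    for plant, _unit, month, mwh in records:
        totals[(plant, month)] += mwh
    return dict(totals)


plant_totals = plant_level_totals(unit_records)
for (plant, month), mwh in sorted(plant_totals.items()):
    print(plant, month, mwh)
```

The aggregation only goes one way: the plant-level series can be derived from the unit-level records, but the unit-level detail cannot be recovered from the plant-level series, which is why the spreadsheet-only data is worth integrating.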

Is that the kind of data you would want to re-publish? It might be more useful, but it would definitely be less official than what comes directly from the reporting agencies. If it is something you'd be interested in aggregating and re-publishing, what would be the easiest way to make that happen? If we were to publish it directly as data packages via a platform like https://datahub.io, is that something you might index? You seem focused on time series, but are you also publishing data about the entities referenced in the time series (e.g. power plants, generating units, the utilities that own them)? That data is also often updated on an annual or quarterly basis. Would that qualify as a time series for your purposes?