Air Supply

You need data; Air Supply will get it to you.

Air Supply is a versatile library to handle getting data from multiple sources in a sane way to use in your application build, data analysis, or whatever else requires fetching some data.

Why

Air Supply aims to address the need to bring in various data sources when making small or mid-size, self-contained projects. Air Supply was originally conceived while working at the Star Tribune, where we often create small projects and most of them need different, non-dynamic data sources. Air Supply essentially puts a consistent interface on lots of one-off code for getting and parsing data that was written and used over many years.

Pros

Can handle many sources of data, such as local files and directories, HTTP(S) sources, Google Docs and Sheets, many SQL sources, AirTable, and more. See Packages.
Can easily parse and transform data such as CSV-ish, MS Excel, YAML, Shapefiles, ArchieML, zip files, and more. See Parsers.
Caches by default.
Aimed at simple uses by just writing a JSON config, as well as more advanced transformations.
Loads dependency modules as needed and allows for overriding.

Cons

Not focused on performance (yet). The caching mitigates a lot of issues here, but the goal would be to use streams for everything where possible.
Not meant for very complex data pipelines. For instance, if you have to scrape a thousand pages, Air Supply doesn't currently fit well, but could still be used to pull the processed data into your application.

Similar projects

These projects do roughly similar things, but not to the same degree:

Installation

npm install air-supply --save

By default, Air Supply only installs the most common dependencies for its packages and parsers. This means that if you need more specific parsers and packages, you will need to install their dependencies as well. For instance:

npm install googleapis archieml

Command line use (global)

If you just want to use the command-line tool, install globally like:

npm install -g air-supply

If you plan to use a number of the packages and parsers, it could be easier (though it uses more disk space) to install all the "dev dependencies", which include all the package and parser dependencies:

NODE_ENV=dev npm install -g air-supply

Usage

Air Supply can be used as a regular Node library, or it can use config files that are run via the command-line tool or through Node.

Basics

Basic usage in Node involves defining the Packages when creating an AirSupply object.

const { AirSupply } = require('air-supply');
 
// Create new AirSupply object and tell it about
// the packages it needs
let air = new AirSupply({
  packages: {
    remoteJSONData: 'http://example.com/data.json',
    // To use Google Sheet package, make sure to install
    // the googleapis module:
    // npm install googleapis
    googleSheetData: {
      source: 'XXXXXXX',
      type: 'google-sheet'
    }
  }
});
 
// Get the data; caching will happen by default.
// Note that await needs to run inside an async function.
let data = await air.supply();
 
// Data will look something like this
{
  remoteJSONData: { ... },
  googleSheetData: [
    { column1: 'abc', column2: 234 },
    ...
  ]
}

Command line

The command line tool will look for configuration in multiple places. See Configuration files below. You can simply call it with:

air-supply

A configuration file, such as .air-supply.json, will be loaded and run through Air Supply, and the fetched and transformed data is written to the command line (stdout):

{
  "packages": {
    "cliData": "some-file.yml"
  }
}

You can also point the command-line tool to a specific file if you want:

air-supply -c air-supply.rc > data.json

Examples

Any AirSupply options are passed down to each Package, so we can set a custom ttl (cache time) on AirSupply and then override it for individual packages.

const { AirSupply } = require("air-supply");
// Since we are using the YAML parser, make sure module is installed
// npm install js-yaml
 
let air = new AirSupply({
  ttl: 1000 * 60 * 60,
  packages: {
    // This data will probably not change during our project
    unchanging: {
      ttl: 1000 * 60 * 60 * 24 * 30,
      source: "http://example.com/data.json"
    },
    defaultChanging: "https://example/data.yml"
  }
});
await air.supply();
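The cascade described above can be pictured with a small, self-contained sketch. This is not Air Supply's internal code: `resolvePackageOptions` is a hypothetical helper, and the example URLs are placeholders. It only illustrates the idea that top-level options act as defaults and package-level options win.

```javascript
// Hypothetical illustration of option cascading; not Air Supply's internals.
function resolvePackageOptions(globalOptions, packageConfig) {
  // A bare string is treated as shorthand for { source: "..." }
  const config =
    typeof packageConfig === "string" ? { source: packageConfig } : packageConfig;
  // Leave the packages map out so it is not copied into each package
  const { packages, ...inherited } = globalOptions;
  // Package-level keys override inherited ones
  return { ...inherited, ...config };
}

const HOUR = 1000 * 60 * 60;
const options = {
  ttl: HOUR,
  packages: {
    unchanging: { ttl: HOUR * 24 * 30, source: "http://example.com/data.json" },
    defaultChanging: "https://example.com/data.yml"
  }
};

const resolved = {};
for (const [name, config] of Object.entries(options.packages)) {
  resolved[name] = resolvePackageOptions(options, config);
}

console.log(resolved.unchanging.ttl); // 2592000000 (30 days)
console.log(resolved.defaultChanging.ttl); // 3600000 (1 hour, inherited)
```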

Each Package can be given a transform function to transform data. We can also alter when the caching happens. This is helpful here because caching after the transform means an expensive task, like parsing HTML, does not have to be repeated while the cache is valid.

// Cheerio: https://cheerio.js.org/
const cheerio = require("cheerio");
const { AirSupply } = require("air-supply");
 
let air = new AirSupply({
  packages: {
    htmlData: {
      // Turn off any parsing, since we will be using cheerio
      parser: false,
      source: "http://example.com/html-table.html",
      // Transform function
      transform(htmlData) {
        const $ = cheerio.load(htmlData);
        let data = [];
        $("table.example tbody tr").each(function(i, tr) {
          // Wrap the raw element so cheerio methods can be used on it
          const $tr = $(tr);
          data.push({
            column1: $tr.find("td.col1").text(),
            columnNumber: parseInt($tr.find("td.col2").text(), 10)
          });
        });
 
        return data;
      },
      // Alter the cachePoint so that AirSupply will cache this
      // after the transform
      cachePoint: "transform"
    }
  }
});
await air.supply();

You can easily read a directory of files. If you just give it a path to a directory, it will assume you mean a glob of **/* in that directory.

const { AirSupply } = require("air-supply");
 
let air = new AirSupply({
  packages: {
    directoryData: "path/to/directory/"
  }
});
await air.supply();

This might cause problems or otherwise be an issue as it will read every file recursively in that directory. So, it may be helpful to be more specific and define a glob to use. This requires being explicit about the type of Package. We can also use specific parserOptions to define how to parse files.

// In this example we are using the csv and yaml parsers, so make sure to:
// npm install js-yaml csv-parse
const { AirSupply } = require("air-supply");
 
let air = new AirSupply({
  packages: {
    directoryData: {
      source: "path/to/directory/**/*.{json,yml,csv,custom-ext}",
      type: "directory",
      // The Directory Package type will define the `parser` option as
      // { multiSource: true } which will tell the parser to treat it
      // as an object where each key is a source.  This means, we can
      // define specific options for specific files.
      parserOptions: {
        "file.custom-ext": {
          parser: "yaml"
        }
      }
    }
  }
});
await air.supply();

You can also achieve something similar by just overriding the parser configuration to handle other extensions. Here we will update the YAML matching for another extension.

// In this example we are using the csv and yaml parsers, so make sure to:
// npm install js-yaml csv-parse
const { AirSupply } = require("air-supply");
 
let air = new AirSupply({
  parserMethods: {
    yaml: {
      match: /(yaml|yml|custom-ext)$/i
    }
  },
  packages: {
    directoryData: {
      source: "path/to/directory/**/*.{json,yml,csv,custom-ext}",
      type: "directory"
    }
  }
});
await air.supply();

Here is an example that gets a shapefile from an FTP source, reprojects it, and turns it into TopoJSON:

const { AirSupply } = require("air-supply");
 
let air = new AirSupply({
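  // Optional: set a custom cache location. defaultCachePath is assumed
  // to be a path string defined elsewhere in your code.
  cachePath: defaultCachePath,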
  packages: {
    mnCounties: {
      // The FTP Package requires the ftp module
      // npm install ftp
      source:
        "ftp://ftp.commissions.leg.state.mn.us/pub/gis/shape/county2010.zip",
      // We need to ensure that the Package will pass the data as a buffer
      fetchOptions: {
        type: "buffer"
      },
      parsers: [
        // The shapefile parser requires specific modules
        // npm install shapefile adm-zip
        "shapefile",
        // We then reproject the geo data and need some more modules
        // npm install reproject epsg
        {
          parser: "reproject",
          parserOptions: {
            sourceCrs: "EPSG:26915",
            targetCrs: "EPSG:4326"
          }
        },
        // Finally, we make the data more compact with topojson
        // npm install topojson
        {
          parser: "topojson",
          name: "mnCounties"
        }
      ]
    }
  }
});
await air.supply();

Configuration files

Air Supply will look for a config file based on cosmiconfig rules, with a little customization. It reads the first of these files that it finds as it goes up the directory tree:

package.json # An 'air-supply' property 
.air-supply
.air-supply.json
.air-supply.json5
.air-supply.yaml
.air-supply.yml
.air-supply.js
air-supply.config.js

Note that any JSON will be read by the json5 module.
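As a rough sketch, a JavaScript config file from the list above could look like the following. It assumes that a config file accepts the same options as the AirSupply constructor, which the JSON example in the Command line section suggests but which is not confirmed here. The package names, file name, and URL are placeholders.

```javascript
// air-supply.config.js
// Sketch only: the keys mirror the constructor options shown in the examples;
// the package names and sources are placeholders.
const config = {
  ttl: 1000 * 60 * 60,
  packages: {
    cliData: "some-file.yml",
    remoteJSONData: {
      source: "http://example.com/data.json",
      // A JS config can hold functions, which a JSON config cannot
      transform(data) {
        return data;
      }
    }
  }
};

module.exports = config;
```

A JavaScript config is mainly worth the extra ceremony when a package needs a function-valued option such as transform; otherwise the JSON or YAML forms are simpler.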

Packages

Packages are the methods that define how to get raw data from sources. The following are the available packages; see the full API documentation for all the specific options available.

Packages get passed any options from the AirSupply object that is using them, and they also share some common options and usage:

Note that many packages require specific modules to be installed separately.

new AirSupply({
  ttl: 1000 * 60 * 10,
  packages: {
    things: {
      // Type is the kebab case of the package class name, i.e.
      // the package class name here would be PackageName.
      //
      // AirSupply will try to guess this given a source
      type: "package-name",
      // Almost all packages use the source option as their
      // main option to get data
      source: "the main source option for this package",
      // Depending on the package, any options for the
      // fetching of data are usually managed in fetchOptions
      fetchOptions: {
        fetchEverything: true
      },
      // Can override any defaults from the AirSupply object
      ttl: 1000 * 60 * 60,
      // Parsers are simple functions to transform the raw data.
      // This can be a string defining which parser to use,
      // an object of configuration, or an array of either if
      // you want to do multiple parsers.  The package
      // will guess what kind of parser is needed based on the source.
      parsers: ["zip", { multiSource: true }],
      // Custom transform function that will happen after parsing.
      transform(data) {
        return expensiveAlterFunction(data);
      },
      // Custom transform function that will happen after getting
      // all packages.
      bundle(allPackages) {
        return alterPackages(allPackages);
      },
      // By default, caching will happen after fetching the raw data and
      // any of the built-in parsing.  But, you can cache after the 'transform'
      // or after the 'bundle'.
      //
      // Overall, this is only needed if you have expensive transformations
      cachePoint: "transform",
      // Use the output option to save the fully loaded data
      // to the filesystem.  This is useful if you need to save files
      // that will get loaded into the client (asynchronously).
      output: "things.json"
    }
  }
});
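To make the effect of cachePoint more concrete, here is a minimal, synchronous sketch of the idea. It is not Air Supply's implementation: `loadPackage`, the in-memory `cache`, and the `"fetch"` label for the default cache point are all assumptions made for illustration. It shows that caching after the transform lets a later run skip the transform entirely.

```javascript
// Hypothetical sketch of the cachePoint idea; not Air Supply's actual code.
const cache = new Map();
let transformRuns = 0;

function loadPackage(name, { fetch, parse, transform, cachePoint = "fetch" }) {
  const key = `${name}:${cachePoint}`;

  // Cached after transform: return the finished data right away
  if (cachePoint === "transform" && cache.has(key)) {
    return cache.get(key);
  }

  let data;
  if (cachePoint === "fetch" && cache.has(key)) {
    // Cached after fetch and parse: the transform still runs below
    data = cache.get(key);
  } else {
    data = parse(fetch());
    if (cachePoint === "fetch") cache.set(key, data);
  }

  data = transform(data);
  if (cachePoint === "transform") cache.set(key, data);
  return data;
}

const pkg = {
  fetch: () => "1,2,3",
  parse: (raw) => raw.split(",").map(Number),
  transform: (numbers) => {
    transformRuns += 1;
    return numbers.map((n) => n * 2);
  }
};

loadPackage("defaultPoint", pkg);
loadPackage("defaultPoint", pkg);
const runsWithDefault = transformRuns; // 2: transform ran on both calls

transformRuns = 0;
loadPackage("afterTransform", { ...pkg, cachePoint: "transform" });
const cached = loadPackage("afterTransform", { ...pkg, cachePoint: "transform" });
const runsWithTransformPoint = transformRuns; // 1: second call came from cache

console.log(runsWithDefault, runsWithTransformPoint, cached);
```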

Package	Description	Docs	Dependencies
AirTable	Get data from an AirTable table.	API	`npm install airtable`
Data	Just pass JS data through.	API
Directory	Read files from a directory and parse each one if possible.	API
File	Read a file from the filesystem.	API
Ftp	Get a file from an FTP source.	API	`npm install ftp`
GoogleDoc	Get plain text version of a Google Doc and by default parse with ArchieML. Can be a "Published to the web" URL, or if given an ID will use Google's authentication.	API	`npm install googleapis`
GoogleSheet	Get tabular data from a Google Sheet and assumes first row is headers by default. Uses Google's authentication; if you want to use a public "Published to the web" CSV, just use the Http package with a CSV parser.	API	`npm install googleapis`
Http	Get data from an HTTP source.	API
Sql	Get data from SQL sources that are supported by sequelize.	API	`npm install sequelize`

Parsers

Parsers are simple functions to transform common data; mostly these are used to transform the raw data to more meaningful JSON data.

Note that most parsers require specific modules to be installed separately.

The parsers option can be defined in a few different ways:

If it is undefined, the package will try to determine which parser to use by looking at the source.
If it is false, then no parsing will happen.
If it is a string, such as 'csv', then it will use that parser with any default options.
If it is a function, then it will simply run the data through that function.
If it is an object, it should have a parser key which is one of the above options, and optionally a parserOptions key that will get passed to the parser function. Or it can just be { multiSource: true }, which will assume the data coming in is an object where each key is a source.
If it is an array, it is assumed to be multiple parsers, each defined with one of the above options.
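The forms listed above can be thought of as shorthand for one list of parsing steps. The sketch below shows one way to normalize them. `normalizeParsers` and the `"auto"` placeholder for "guess from the source" are hypothetical names invented for this illustration, not part of Air Supply's API.

```javascript
// Hypothetical helper; not Air Supply's internal code.
function normalizeParsers(parsers) {
  // undefined: let the package guess from the source ("auto" is a placeholder)
  if (parsers === undefined) return [{ parser: "auto", parserOptions: {} }];
  // false: no parsing at all
  if (parsers === false) return [];
  // array: normalize each entry and keep the order
  if (Array.isArray(parsers)) return parsers.flatMap(normalizeParsers);
  // string or function: use it with default options
  if (typeof parsers === "string" || typeof parsers === "function") {
    return [{ parser: parsers, parserOptions: {} }];
  }
  // object: keep its keys, filling in empty parserOptions if missing
  if (typeof parsers === "object" && parsers !== null) {
    return [{ parserOptions: {}, ...parsers }];
  }
  throw new TypeError("Unsupported parsers option");
}

console.log(normalizeParsers("csv"));
console.log(normalizeParsers(false));
console.log(normalizeParsers(["zip", { multiSource: true }]));
```

Normalizing to a single shape up front means the code that runs the parsers only has to handle one case: an ordered list of steps.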

The following parsers are available by default.

Parser	Description	Source match	Docs	Dependencies
archieml	Uses archieml.	`/aml$/i`	API	`npm install archieml`
csv	Uses csv-parse. Can be used for any delimited data.	`/csv$/i`	API	`npm install csv-parse`
gpx	Uses togeojson.	`/gpx$/i`	API	`npm install @mapbox/togeojson`
json	Uses json5.	`/json5?$/i`	API
kml	Uses togeojson.	`/kml$/i`	API	`npm install @mapbox/togeojson`
reproject	Reprojects GeoJSON using reproject.	NA	API	`npm install reproject epsg`
shapefile	Parses a Shapefile as a .zip or .shp file into GeoJSON using shapefile.	`/(shp.*zip|shp)$/i`	API	`npm install shapefile adm-zip`
topojson	Transforms GeoJSON to TopoJSON using topojson.	`/geo.?json$/i`	API	`npm install topojson`
xlsx	Parses MS Excel and others (.xlsx, .xls, .dbf, .ods) using xlsx.	`/(xlsx|xls|dbf|ods)$/i`	API	`npm install xlsx`
xml	Parses XML using xml2js.	`/xml$/i`	API	`npm install xml2js`
yaml	Uses js-yaml.	`/(yml|yaml)$/i`	API	`npm install js-yaml`
zip	Turns a zip file into an object where each key is the file name and the value is the text contents of that file using adm-zip.	`/zip$/i`	API	`npm install adm-zip`
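The "Source match" column suggests how a parser can be guessed from a source string. The sketch below tries a few of the patterns from the table in order and returns the first parser that matches. `guessParser` is a hypothetical function, and only a subset of the patterns is used: some patterns in the table overlap (for example, `/aml$/i` also matches a source ending in `.yaml`, and `/json5?$/i` also matches `.geojson`), and how Air Supply resolves such overlaps is not described here.

```javascript
// Patterns copied from a subset of the "Source match" column.
// The lookup logic itself is an illustration, not Air Supply's code.
const sourceMatches = [
  ["csv", /csv$/i],
  ["json", /json5?$/i],
  ["xlsx", /(xlsx|xls|dbf|ods)$/i],
  ["xml", /xml$/i],
  ["yaml", /(yml|yaml)$/i]
];

function guessParser(source) {
  // Return the name of the first parser whose pattern matches the source
  const found = sourceMatches.find(([, pattern]) => pattern.test(source));
  return found ? found[0] : undefined;
}

console.log(guessParser("http://example.com/data.json")); // "json"
console.log(guessParser("some-file.yml")); // "yaml"
console.log(guessParser("report.XLS")); // "xlsx"
console.log(guessParser("notes.txt")); // undefined
```

When the guess would be wrong or ambiguous, naming the parser explicitly in the package options avoids relying on the match at all.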

API

Full API documentation can be found at zzolo.org/air-supply.

Develop

Documentation

Use npm run docs:preview and open localhost:4001 in a browser.

Test

Run tests with: npm run test

Publish

NPM

Bump version in package.json and run npm install.
Commit.
Tag: git tag X.X.X
Push up: git push origin master --tags
Run npm publish

Docs

Build and publish to GitHub Pages (after NPM publish): npm run docs:publish
