valuepack-mine-github

Mines github for user and repository data used by valuepack.

npm install valuepack-mine-github

Supports valuepack, the community-driven rating system for nodejs modules on npm that helps you select the right one.

Read more about its goals.

functions

The entire public API is exposed via the index file:

exports.fetchGithubRepos    =  require('./lib/fetch-github-repos');
exports.storeGithubRepos    =  require('./lib/store-github-repos');
exports.updateGithubRepos   =  require('./lib/update-github-repos');
exports.clientIdSecretQuery =  require('./lib/client-id-secret-query');
exports.updateMultipleUsers =  require('./lib/update-multiple-github-users');

example

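var update = require('valuepack-mine-github').updateMultipleUsers;
var usernames = [ 'substack', 'isaacs', 'visionmedia' ];

// fetches and stores the github data for each of the given users
update(usernames, function () {
  console.error('Successfully updated ', usernames);
});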

scripts

You can play with scripts inside ./scripts.

environment variables

The following environment variables are considered by the scripts:

github app tokens

Used to increase the github rate limit to 5,000 requests per hour.

  • VALUEPACK_GITHUB_CLIENT_ID
  • VALUEPACK_GITHUB_CLIENT_SECRET
export VALUEPACK_GITHUB_CLIENT_ID=0i0d0
export VALUEPACK_GITHUB_CLIENT_SECRET=0s0e0c0r0e0t

You can run the scripts below without these, but you will hit the unauthenticated rate limit (60 requests per hour) fairly quickly.
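As a rough sketch of how these two variables could be turned into request credentials: the github API historically accepted `client_id` and `client_secret` as query parameters. The helper name `clientIdSecretQueryFrom` below is hypothetical and is not the module's own `clientIdSecretQuery` implementation.

```javascript
'use strict';

// Hypothetical helper, not part of valuepack-mine-github.
// Builds the client_id/client_secret query string from an environment object.
// Returns an empty string when either variable is missing, in which case
// requests would count against the unauthenticated rate limit.
function clientIdSecretQueryFrom(env) {
  var id = env.VALUEPACK_GITHUB_CLIENT_ID;
  var secret = env.VALUEPACK_GITHUB_CLIENT_SECRET;
  if (!id || !secret) return '';

  return '?client_id=' + encodeURIComponent(id) +
         '&client_secret=' + encodeURIComponent(secret);
}

// Report which mode would be used without printing the secret itself.
var query = clientIdSecretQueryFrom(process.env);
console.log(query ? 'using app credentials' : 'unauthenticated');
```

Returning an empty string rather than throwing keeps the scripts usable without credentials, which matches the behavior described above.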

database path

  • VALUEPACK_MINE_DB the path at which the leveldb data is stored (defaults to valuepack-mine-npm/store/valuepack-mine.db)

Make sure to include it every time you execute a script or add the following to your .bashrc (example):

initializing the data store

An init script is provided that creates a leveldb database at ~/.valuepack.

Do one of the following:

npm run init or cd scripts && ./init.sh

fetch and store user data

Run the update multiple users script:

./update-multiple-github-users.js substack isaacs visionmedia

Note: Only JavaScript or CoffeeScript (I'm being nice ;) ) repos are stored.

Note: Forks are currently ignored as well, but that may change.
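The two notes above amount to a simple filter over the fetched repositories. The following is a minimal sketch, assuming each repo is an object with the `language` and `fork` fields that the github API returns; the function name `selectStorableRepos` and the sample data are made up for illustration and are not taken from this module.

```javascript
'use strict';

// Languages that are kept, per the first note above.
var STORED_LANGUAGES = [ 'JavaScript', 'CoffeeScript' ];

// Hypothetical filter: drops forks and repos written in any other language.
function selectStorableRepos(repos) {
  return repos.filter(function (repo) {
    return !repo.fork && STORED_LANGUAGES.indexOf(repo.language) !== -1;
  });
}

// Made-up sample data, only to show the effect of the filter.
var sample = [
  { name: 'a', language: 'JavaScript',   fork: false },
  { name: 'b', language: 'CoffeeScript', fork: false },
  { name: 'c', language: 'JavaScript',   fork: true  },
  { name: 'd', language: 'Ruby',         fork: false },
  { name: 'e', language: null,           fork: false }
];

var kept = selectStorableRepos(sample).map(function (repo) { return repo.name; });
console.log(kept);
```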

query the data

Make sure to pass the --read flag in order to run queries. If it is not passed, the script assumes that you are passing the path to a json file that contains the user data you want to store.

# how many of substack's modules are prefixed with node-
./store-github-repos.js --read --keys | grep substack/node- | wc -l

# how many of TJ's modules are express related
./store-github-repos.js --read --keys | grep visionmedia/express | wc -l

# show all of isaacs' modules related to npm
./store-github-repos.js --read --keys | grep isaacs/npm

# show all users stored in your database
./store-github-repos.js --read --keys --users

# show all their metadata (i.e. when the user data was last modified)
./store-github-repos.js --read --values --meta
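The grep pipelines above suggest that repo keys have the form `user/repo`. Under that assumption, the same kind of query could be sketched in JavaScript once the keys are in an array. The helper `keysMatching` and the sample keys are hypothetical; how the keys are read from the database is not shown here.

```javascript
'use strict';

// Hypothetical helper: keeps the keys that contain the given fragment,
// mirroring what `grep fragment` does on the --keys output.
function keysMatching(keys, fragment) {
  return keys.filter(function (key) {
    return key.indexOf(fragment) !== -1;
  });
}

// Made-up keys in the assumed user/repo form.
var keys = [
  'substack/node-one',
  'substack/node-two',
  'substack/other',
  'isaacs/npm-thing'
];

// Equivalent of: ... --read --keys | grep substack/node- | wc -l
var nodePrefixedCount = keysMatching(keys, 'substack/node-').length;
console.log(nodePrefixedCount);
```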

TODO

It may be better to use a search query, since the Not-Modified indicator fails more often than not.

still unstable, but may work

The below would limit the amount of requests to get the latest repos, however we would not see updates to properties like stars.

Also, the larger share of requests is spent on getting all the followers and starred repos, and there seems to be no way to limit those with search.

curl -H 'Accept: application/vnd.github.preview' \
'https://api.github.com/search/repositories?q=@thlorenz+language:javascript+language:coffeescript+pushed:>=2013-08-08'

The below would yield only a small gain, but would at least make sure we get all necessary updates:

# not limiting by date
curl -H 'Accept: application/vnd.github.preview' \
'https://api.github.com/search/repositories?q=@thlorenz+language:javascript+language:coffeescript'
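Both curl calls differ only in the optional `pushed:>=` term, so the query could be assembled by one small function. This is a sketch only; the name `searchReposUrl` and its parameters are assumptions, and it builds the URL without sending a request. The `Accept: application/vnd.github.preview` header from the curl examples would still need to be set by whatever performs the request.

```javascript
'use strict';

// Hypothetical helper: builds the repository search URL used in the curl
// examples above. `pushedSince` (YYYY-MM-DD) is optional; when omitted the
// query is not limited by date.
function searchReposUrl(user, languages, pushedSince) {
  var terms = [ '@' + user ].concat(languages.map(function (language) {
    return 'language:' + language;
  }));
  if (pushedSince) terms.push('pushed:>=' + pushedSince);

  return 'https://api.github.com/search/repositories?q=' + terms.join('+');
}

var languages = [ 'javascript', 'coffeescript' ];
var limitedByDate = searchReposUrl('thlorenz', languages, '2013-08-08');
var notLimited = searchReposUrl('thlorenz', languages);

console.log(limitedByDate);
console.log(notLimited);
```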