PhET Simulations scraper
This scraper creates offline versions in ZIM format of PhET science simulations for Science and Math.
Requirements
It requires Node.js version 16 or higher.
Quick Start
npm i && npm start
The above will eventually output a ZIM file to dist/
Command line arguments
--withoutLanguageVariants
excludes languages that have a country variant. For example, en_CA
will not be present in the ZIM with this argument.
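A minimal sketch of this filter, assuming a country variant is any language code that contains an underscore (as in en_CA). The function name withoutLanguageVariants is hypothetical and is not taken from the scraper's source.

```typescript
// Hypothetical helper: drop language codes that carry a country variant.
// Assumption: variants are written as <lang>_<COUNTRY>, e.g. en_CA or pt_BR.
function withoutLanguageVariants(languages: string[]): string[] {
  return languages.filter((code) => !code.includes("_"));
}

// Only the base languages (en, fr) survive.
console.log(withoutLanguageVariants(["en", "en_CA", "fr", "pt_BR"]));
```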
Available only on GET step:
--withoutLanguageVariants ...
Available on GET and EXPORT steps only:
--includeLanguages lang_1 [lang_2] [lang_3] ...
--excludeLanguages lang_1 [lang_2] [lang_3] ...
Available on EXPORT step only:
# Skip ZIM files for individual languages
--mulOnly
Example:
npm run get -- --includeLanguages en ru fr
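The language flags can be pictured as a simple selection step. The sketch below assumes that --includeLanguages narrows the set first and --excludeLanguages then removes entries, and that EXPORT writes one ZIM per language plus a multilingual one (called "mul" here), which --mulOnly reduces to the multilingual ZIM alone. All names (LanguageOptions, selectLanguages, plannedZims, "mul") are hypothetical.

```typescript
// Hypothetical options mirroring the CLI flags; field names are illustrative.
interface LanguageOptions {
  includeLanguages?: string[]; // --includeLanguages: keep only these, if given
  excludeLanguages?: string[]; // --excludeLanguages: drop these
}

// Assumption: apply the include list first, then remove excluded languages.
function selectLanguages(available: string[], opts: LanguageOptions): string[] {
  const include = opts.includeLanguages;
  const exclude = new Set(opts.excludeLanguages ?? []);
  return available
    .filter((lang) => !include || include.length === 0 || include.includes(lang))
    .filter((lang) => !exclude.has(lang));
}

// Assumption: one ZIM per language plus a multilingual "mul" ZIM;
// --mulOnly skips the per-language ZIM files.
function plannedZims(languages: string[], mulOnly: boolean): string[] {
  return mulOnly ? ["mul"] : [...languages, "mul"];
}

const langs = selectLanguages(["en", "ru", "fr", "de"], {
  includeLanguages: ["en", "ru", "fr"],
  excludeLanguages: ["ru"],
});
console.log(langs); // ["en", "fr"]
console.log(plannedZims(langs, false)); // ["en", "fr", "mul"]
console.log(plannedZims(langs, true)); // ["mul"]
```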
Config
Another way to configure behaviour is through environment variables. Sample .env
file (with default values):
# requests per second, affects GET step only
PHET_RPS=8
# async workers on TRANSFORM step (keep it equal to the number of CPU cores)
PHET_WORKERS=10
# number of retries on GET step (delays grow with exponential backoff)
PHET_RETRIES=5
# display verbose errors
PHET_VERBOSE_ERRORS=false
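A minimal sketch of how these variables might be read, using the defaults from the sample file above. The names PhetConfig, loadConfig and backoffDelayMs are hypothetical, as are the 1000 ms base delay and the doubling rule for the exponential backoff; the 1000 / PHET_RPS spacing is only one way to picture the rate limit.

```typescript
// Hypothetical shape of the settings described by the sample .env file.
interface PhetConfig {
  rps: number; // PHET_RPS
  workers: number; // PHET_WORKERS
  retries: number; // PHET_RETRIES
  verboseErrors: boolean; // PHET_VERBOSE_ERRORS
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back to the default when unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const n = parseInt(value ?? "", 10);
  return Number.isNaN(n) ? fallback : n;
}

// Defaults are the values shown in the sample .env file.
function loadConfig(env: Env): PhetConfig {
  return {
    rps: intOr(env.PHET_RPS, 8),
    workers: intOr(env.PHET_WORKERS, 10),
    retries: intOr(env.PHET_RETRIES, 5),
    verboseErrors: env.PHET_VERBOSE_ERRORS === "true",
  };
}

// Assumption: each retry doubles a base delay (attempt 0 waits baseMs).
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}

// A real run would pass process.env; a literal keeps the sketch self-contained.
const config = loadConfig({ PHET_RPS: "4", PHET_VERBOSE_ERRORS: "true" });
const minGapMs = 1000 / config.rps; // 250 ms between requests at 4 per second
const schedule: number[] = [];
for (let i = 0; i < config.retries; i++) schedule.push(backoffDelayMs(i));
console.log(config, minGapMs, schedule); // schedule: 1000, 2000, 4000, 8000, 16000
```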
About
This project achieves multiple things:
- Download PhET content
- Generate an Index for said content
- Generate ZIM file(s) containing content and index
Things this project does not yet do, but should:
- Generate Android APK
Usage
The functionality is split into 5 npm scripts:
- npm run setup - deletes state from previous runs
- npm run get - downloads PhET simulations in the specified languages
- npm run transform - prepares the content and media files
- npm run export - generates ZIM file(s)
- npm start - runs all of the above in sequence
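npm start runs the other four scripts in order. The sketch below models that ordering; runAll and its run callback are hypothetical (a real runner might spawn npm run for each step), and stopping at the first failing step is an assumption rather than documented behaviour.

```typescript
// Order in which `npm start` runs the individual scripts.
const STEPS = ["setup", "get", "transform", "export"] as const;
type Step = (typeof STEPS)[number];

// Hypothetical runner: `run` executes one step and throws on failure,
// which aborts the remaining steps so TRANSFORM never sees a partial GET.
function runAll(run: (step: Step) => void): Step[] {
  const completed: Step[] = [];
  for (const step of STEPS) {
    run(step);
    completed.push(step);
  }
  return completed;
}

// Usage with a stand-in runner that only logs each step.
const done = runAll((step) => console.log(`running ${step}`));
console.log(done.join(" -> ")); // setup -> get -> transform -> export
```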
The steps get, transform and export have their own output directories:
- get outputs HTML and PNG files to state/get
- transform outputs intermediate files to state/transform
- export outputs HTML and PNG files to state/export AND ZIM file(s) to dist/
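The directory layout above can be captured in a small lookup table, for example to find the finished ZIM files after a run. OUTPUT_DIRS and zimFiles are hypothetical names, and the file names in the usage line are made up for illustration; in Node the listing of dist/ could come from fs.readdirSync("dist").

```typescript
type OutputStep = "get" | "transform" | "export";

// Output locations per step, as listed above (relative to the project root).
const OUTPUT_DIRS: Record<OutputStep, string[]> = {
  get: ["state/get"],
  transform: ["state/transform"],
  export: ["state/export", "dist"],
};

// Hypothetical helper: keep only the ZIM files from a listing of dist/.
function zimFiles(fileNames: string[]): string[] {
  return fileNames.filter((name) => name.toLowerCase().endsWith(".zim")).sort();
}

console.log(OUTPUT_DIRS.export); // ["state/export", "dist"]
console.log(zimFiles(["phet_mul.zim", "notes.txt", "phet_en.zim"])); // ["phet_en.zim", "phet_mul.zim"]
```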