mw-ocg-texter

Converts MediaWiki collection bundles (as generated by mw-ocg-bundler) to stripped plaintext.

npm install mw-ocg-texter

This is a proof of concept, but it could be used to archive or embed the textual content of Wikipedia in a minimal amount of space.

Installation

Node versions 0.8 and 0.10 are tested to work.

Install the Node package dependencies:

npm install

Install the other system dependencies:

apt-get install unzip
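A minimal sketch of a prerequisite check, assuming a Debian-style system where `node` and `unzip` are expected on the `PATH`. The functions `is_tested_node_version` and `check_prereqs` are hypothetical helpers, not part of mw-ocg-texter. Because the text above lists the versions that are tested rather than the only ones that work, an untested Node version produces a warning, not a failure.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "node --version" string (e.g. "v0.10.26")
# belongs to one of the tested series, 0.8 or 0.10.
is_tested_node_version() {
  case "$1" in
    v0.8.*|v0.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report missing tools (non-zero exit status) and
# warn, without failing, about an untested Node version.
check_prereqs() {
  status=0
  if ! command -v node >/dev/null 2>&1; then
    echo "node not found" >&2
    status=1
  elif ! is_tested_node_version "$(node --version)"; then
    echo "warning: node $(node --version) is not a tested version" >&2
  fi
  if ! command -v unzip >/dev/null 2>&1; then
    echo "unzip not found; try: apt-get install unzip" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing these functions, `check_prereqs && npm install` installs the Node dependencies only when both tools are present.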

Generating bundles

You may wish to install the mw-ocg-bundler npm package to create bundles from Wikipedia articles. The text below assumes that you have done so; if your bundles come from some other source, ignore the mw-ocg-bundler references.

Running

To generate a plaintext file named out.txt from the English Wikipedia article "United States":

mw-ocg-bundler -o us.zip --prefix en "United States"
bin/mw-ocg-texter -o out.txt us.zip
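The two commands above convert a single article. To convert several, a small loop can repeat them. This is a sketch that uses only the `-o` and `--prefix` options shown above and assumes mw-ocg-bundler is on the `PATH`; `TEXTER`, `PREFIX`, `slugify` and `convert_articles` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Path to the texter script; defaults to running from a source checkout.
TEXTER=${TEXTER:-bin/mw-ocg-texter}
# Wiki prefix passed to mw-ocg-bundler ("en" = English Wikipedia).
PREFIX=${PREFIX:-en}

# Hypothetical helper: turn an article title into a file name by replacing
# spaces and slashes, e.g. "United States" -> "United_States".
slugify() {
  printf '%s' "$1" | tr ' /' '__'
}

# Hypothetical helper: bundle and convert each title given as an argument,
# writing <slug>.zip and <slug>.txt to the current directory.
convert_articles() {
  for title in "$@"; do
    slug=$(slugify "$title")
    mw-ocg-bundler -o "$slug.zip" --prefix "$PREFIX" "$title" || return 1
    "$TEXTER" -o "$slug.txt" "$slug.zip" || return 1
  done
}
```

For example, `convert_articles "United States" "Canada"` would produce United_States.txt and Canada.txt, stopping at the first command that fails.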

The default output is word-wrapped at 80 columns. If you would prefer "semantic" newlines (newlines end paragraphs, and there are no newlines within a paragraph), use the --no-wrap option:

bin/mw-ocg-texter --no-wrap -o out.txt us.zip
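Because --no-wrap output puts each paragraph on a single line, it can be re-wrapped to a different width afterwards with the standard fold utility. This is an alternative using a separate tool, not how mw-ocg-texter wraps internally: `fold -s` breaks after the last blank that fits and leaves that blank at the end of the line, and some fold implementations count bytes rather than characters, so lines with non-ASCII text may come out visibly shorter than the target width. The `rewrap` function is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: re-wrap a --no-wrap output file (one paragraph per
# line) to the given width (default 80), breaking at blanks where possible.
rewrap() {
  fold -s -w "${2:-80}" "$1"
}
```

For example, `rewrap out.txt 72 > out72.txt` produces a copy wrapped at 72 columns while out.txt keeps its semantic newlines.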

For other options, see:

bin/mw-ocg-texter --help

License

GPLv2

(c) 2013-2014 by C. Scott Ananian
