JSON Cache system
An NDCODE project.
Overview
The `json_cache_rw` package exports a single constructor `JSONCacheRW(diag)`
which must be called with the `new` operator. The resulting cache object stores
arbitrary node.js JSON objects, which are read from disk files and modified
(repeatedly) during the execution of your program. The cache tracks the on-disk
pathname of each object and writes it back there after a delay time. A simple
locking algorithm is implemented to support atomic modifications.
Calling API
Suppose one has a `JSONCacheRW` instance named `jcrw`. It behaves somewhat like
an ES6 `Map` instance that maps pathname strings to JSON objects, except that
it has `jcrw.read()`, `jcrw.write()`, and `jcrw.modify()` functions instead of
`get` and `set`, and new objects are added to the cache by attempting to read
them.
The interfaces for the `JSONCacheRW`-provided instance functions are:
`await jcrw.read(key, default_value)` — retrieves the object stored under
`key`, which must be the on-disk path to the `*.json` or similarly-named file
that will eventually store the JSON object. If `default_value` is provided
and the on-disk file does not exist, then `default_value` is added to the
cache and returned directly. Otherwise, the on-disk file is read with `utf-8`
encoding, parsed with `JSON.parse()`, and then cached and returned. Disk file
reading or JSON parsing errors result in exceptions being thrown.
`await jcrw.write(key, value, timeout)` — caches the given `value` under
the given `key`, and dirties it so that it will be written after `timeout` ms
has elapsed. If the `key` already exists in the cache and is dirty, the new
`value` will be written when the original timeout elapses, and the `timeout`
specified here is ignored. This ensures that the on-disk contents cannot become
too old, even for frequently-modified files. If `timeout` is omitted or
`undefined`, it defaults to 5000 ms. The file is written to the pathname
corresponding to the `key`, which must be a string and usually refers to a
`*.json` or similar file, with `utf-8` encoding and `JSON.stringify()` plus a
newline. The function returns immediately (before the write is attempted), and
any later disk file writing error is logged to the console. Despite this, the
interface to the function is specified as `async` because concurrent
`jcrw.read()` or `jcrw.modify()` operations on the same `key` must be
`await`ed before updating the cache.
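The "first timeout wins" rule can be modelled in isolation as follows (a
simplified sketch, not the package's code; the `flush` callback stands in for
the actual disk write, and `pending` for the cache's dirty state):

```js
// Model of the write-coalescing rule: the first write of a dirty entry
// starts a timer; later writes only replace the pending value, so the flush
// happens at the time the original timeout scheduled.
const pending = new Map()  // key -> {value, timer}

function write(key, value, timeout = 5000, flush) {
  let entry = pending.get(key)
  if (entry) {
    entry.value = value  // already dirty: keep original timer, ignore timeout
    return
  }
  entry = {value: value}
  entry.timer = setTimeout(
    () => {
      pending.delete(key)
      flush(key, entry.value)  // stands in for the actual disk write
    },
    timeout
  )
  pending.set(key, entry)
}
```

Here a write with a 50 ms timeout followed by one with a 1000 ms timeout still
flushes after roughly 50 ms, carrying the latest value.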
`await jcrw.modify(key, default_value, modify_func, timeout)` — first does a
`jcrw.read()` call with the given `key` and `default_value`, then passes the
result of this to the user-specified `modify_func` callback, and then does a
`jcrw.write()` call with the given `key`, the `modify_func` result, and the
given `timeout`. In the meantime, the given cache entry is locked to prevent
any other accesses, thus allowing atomic modification of a given cache entry
(or equivalently, a given JSON file). The `modify_func` is specified as
`async`, so it can perform activities such as disk I/O, but it should not be
lengthy, since other cache accesses to the same key will block during the
`modify_func`.
The interface for the user-provided callback function `modify_func()` is:
`await modify_func(result)` — the user must either modify the JSON object in
`result.value`, or else set `result.value` to a different JSON object to be
written and stored in the cache. The first way is normally applicable when the
JSON object is an array or dictionary type, which can be modified in place. The
second way is normally applicable when the JSON object is a literal type, which
is immutable and thus must be replaced in order to modify it. (Doing it the
second way allows a single literal value, such as a string, a number, or a
flag, to be stored per disk file, which may be inefficient, but may also be
convenient.)
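The two styles might look like this as standalone callbacks (illustrative
sketches of the `result` object the callback receives; the names `bump` and
`bump_literal` are invented for the example):

```js
// First way: the stored object is a dictionary, so mutate it in place.
let bump = async result => {
  ++result.value.count
}

// Second way: the stored object is a bare number (a literal), which cannot
// be mutated, so replace result.value with a new value instead.
let bump_literal = async result => {
  result.value = result.value + 1
}
```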
Example
Consider a simple analytics application for web pages. Each time a page is
served, we will call the function `hit(slug)` with `slug` set to a value that
is unique to the page. We'll have an on-disk file `hit_count.json` which maps
the `slug` value to a counter. The counter for a page increments each time the
code executes. The code creates a new file and/or a new counter as required.
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let hit = async slug => {
  let hit_count = await json_cache_rw.read('hit_count.json', {})
  if (!Object.prototype.hasOwnProperty.call(hit_count, slug))
    hit_count[slug] = 0
  ++hit_count[slug]
  await json_cache_rw.write('hit_count.json', hit_count)
}
```
In the above example, the update has not been done atomically, since it does
not matter in which order hits are recorded for a page. It could be done
atomically like this:
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let hit = async slug => {
  await json_cache_rw.modify(
    'hit_count.json',
    {},
    async result => {
      if (!Object.prototype.hasOwnProperty.call(result.value, slug))
        result.value[slug] = 0
      ++result.value[slug]
    }
  )
}
```
Note that we used `Object.prototype.hasOwnProperty.call()` to guard against
the possibility that the JSON object contains unusual key names, such as the
key `'hasOwnProperty'` itself. This is annoying but essential JavaScript
practice.
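To see why the guard matters, consider a counter object whose key happens to
be `'hasOwnProperty'` (a self-contained illustration, independent of the
package):

```js
// Hypothetical data: a page whose slug is 'hasOwnProperty' shadows the
// inherited method with a plain number once it is stored in the object.
let hit_count = JSON.parse('{"hasOwnProperty": 3}')

// Calling the method directly would now invoke the stored number, which
// throws a TypeError if this function is ever called:
let direct = () => hit_count.hasOwnProperty('about')

// The Object.prototype form is immune to shadowing:
console.log(Object.prototype.hasOwnProperty.call(hit_count, 'about'))          // false
console.log(Object.prototype.hasOwnProperty.call(hit_count, 'hasOwnProperty')) // true
```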
About lock order
The atomic modification facility refers to a particular key (equivalently, a
particular file or JSON object), so if an atomic modification must be carried
out that involves several different JSON files, special precautions need to be
taken. We will use the example of a money-transfer application with two files,
`transactions.json` containing a log of transactions (an array) and
`balances.json` containing the account balances (a dictionary indexed by
account number). To modify the transaction log consistently with the account
balances in an atomic fashion, both files should be locked by nesting the
modifications. A consistent order of lock acquisition should be chosen to
avoid deadlock. In this example we will acquire `transactions.json` and then
`balances.json`:
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let deposit = async (account, amount) => {
  await json_cache_rw.modify(
    'transactions.json',
    [],
    async transactions => {
      await json_cache_rw.modify(
        'balances.json',
        {},
        async balances => {
          transactions.value.push(
            {
              'type': 'deposit',
              'account': account,
              'amount': amount
            }
          )
          if (!Object.prototype.hasOwnProperty.call(balances.value, account))
            balances.value[account] = 0
          balances.value[account] += amount
        }
      )
    }
  )
}
```
About system crashes
If the system crashes while writing a JSON file, a partially written file
will unavoidably be left on the disk after the system reboots. To be robust
against this situation, we write the modified JSON out to a temporary file
first (whose pathname is the `key` value plus `'.temp'`), and then rename it
into place. The only problem that can then occur is if the crash happens after
deleting the original but before renaming the temporary into its place. To
guard against this, when opening a file we check for the requested file; if it
does not exist, we attempt to rename a temporary file into place and then
retry.
We do not guarantee that atomic modifications spanning several files will be atomic across a system crash. The renaming system is only intended to guard against data loss. If desynchronization is an issue, then all files concerned should be scanned on system startup, and synchronization fixed up as necessary.
About asynchronicity
JSON files are read and written with `fs.readFile()` and `fs.writeFile()`,
thus `jcrw.read()` is fundamentally an asynchronous operation and therefore
returns a `Promise`, which we showed as `await jcrw.read()` above. The other
functions are also asynchronous, as they may have to wait for a concurrent
`jcrw.read()` to complete.
Also, the atomic modification itself may be asynchronous, and so
`modify_func()` is also expected to return a `Promise`. Obviously,
`jcrw.modify()` must wait for the `modify_func()` promise to resolve,
indicating that the new object is safely stored in the cache, so that it can
resolve the `jcrw.modify()` promise in turn.
About exceptions
Exceptions during atomic modification are handled by reflecting them through
both `Promise`s. The user should ensure that `result.value` is not modified
in this case — exceptions should be caught and any `result.value` changes
undone before the exception is rethrown from `modify_func` to `jcrw.modify()`.

Note that if several callers request the same key simultaneously and an
exception occurs during reading or parsing the JSON, each caller receives a
reference to the same shared exception object; thus when the `jcrw.read()`
`Promise` rejects, the rejection value (exception object) should be treated as
read-only.
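The catch-undo-rethrow pattern might look like this in a user-written callback
(a hypothetical `debit` operation; the callback itself is plain JavaScript, so
it is shown standalone):

```js
// A modify_func that debits an account but rolls back its in-place change
// if a later step fails, so the cached object is left untouched on error.
let debit = (account, amount) => async result => {
  if (!Object.prototype.hasOwnProperty.call(result.value, account))
    throw new Error('no such account')  // nothing modified yet, safe to throw
  let previous = result.value[account]  // snapshot before modifying
  try {
    result.value[account] -= amount
    if (result.value[account] < 0)
      throw new Error('insufficient funds')
  } catch (e) {
    result.value[account] = previous  // undo before rethrowing
    throw e
  }
}
```

With the package, this would be used as, e.g.,
`await jcrw.modify('balances.json', {}, debit(account, amount))`, and a
rejection leaves both the cache and the disk file unchanged.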
About deletions
There is no way to remove a JSON object from the cache at the moment. This
will be addressed in a future version of the API, which may provide a function
like `fs.unlink()` to both remove the on-disk file and uncache it
simultaneously. If only the in-memory version is to be deleted and not the
on-disk version, this should be left to a timeout routine to be added in
future; see below.
About on-disk modification
Do not modify the on-disk version of a file while the server is running and
the `json_cache_rw` cache may be active for that file. The modification will
not be detected, and cannot be handled in a consistent way. If read-only
access to JSON files is required, please use our `json_cache` module instead
of `json_cache_rw`; then, on-disk changes to the file will be detected and
visible to the application.

Also, do not run multiple node.js instances, or multiple `JSONCacheRW`
instances in the same node.js instance, that can refer to the same file.
Modifying the file in such circumstances counts as an on-disk modification,
which is not allowed.
About diagnostics
The `diag` argument to the constructor is a `bool` which, if `true`, causes
messages to be printed via `console.log()` for all activities except the
common case of retrieval when the object is already in cache. A `diag` value
of `undefined` is treated as `false`, thus it can be omitted in the usual
case.
The `diag` output is handy for development, and can also be handy in
production; e.g. our production server is started by `systemd`, which
automatically routes `stdout` output to the system log, and the cache access
diagnostics then act somewhat like an HTTP server's `access.log`, albeit
cache hits are not logged. It is particularly handy that write failures, such
as disk-full errors, are logged.
We have not attempted to provide comprehensive logging facilities or
log-routing, because the simple expedient is to turn off the built-in
diagnostics in complex cases and just do your own. In our server we use a
single `JSONCacheRW` instance for all `*.json` files, with `diag` set to
`true`.
To be implemented
It is intended that we will shortly add a timer function (or possibly just a
function that the user should call periodically) to flush objects from the
cache after a stale time, on the assumption that the object might not be
accessible or wanted anymore. Such a flush will be able to occur between a
`jcrw.read()` and a corresponding `jcrw.write()` call, hence the API for
`jcrw.write()` specifies that the `value` is mandatory, even if the cached
object was modified in place.
GIT repository
The development version can be cloned, downloaded, or browsed with `gitweb`
at: https://git.ndcode.org/public/json_cache_rw.git
License
All of our NPM packages are MIT licensed, please see LICENSE in the repository.
Contributions
The caching system is under active development (and is part of a larger project that is also under development) and thus the API is tentative. Please go ahead and incorporate the system into your project, or try out our example webserver built on the system, subject to the caution that the API could change. Please send us your experience and feedback, and let us know of improvements you make.
Contact: Nick Downing nick@ndcode.org