JSON Cache system
An NDCODE project.
Overview
The `json_cache_rw` package exports a single constructor `JSONCacheRW(diag)`
which must be called with the `new` operator. The resulting cache object stores
arbitrary node.js JSON objects, which are read from disk files and modified
(repeatedly) during the execution of your program. The cache tracks the on-disk
pathname of each object and writes it back there after a delay time. A simple
locking algorithm is implemented to support atomic modifications.
Calling API
Suppose one has a `JSONCacheRW` instance named `jcrw`. It behaves somewhat like
an ES6 `Map` instance that maps pathname strings to JSON objects, except that
it has `jcrw.read()`, `jcrw.write()`, and `jcrw.modify()` functions instead of
`get` and `set`, and new objects are added to the cache by attempting to read
them.
The interfaces for the `JSONCacheRW`-provided instance functions are:
`await jcrw.read(key, default_value)` — retrieves the object stored under
`key`, which must be the on-disk path to the `*.json` or similarly-named file
that will eventually store the JSON object. If `default_value` is provided
and the on-disk file does not exist, then `default_value` is added to the
cache and returned directly. Otherwise, the on-disk file is read with `utf-8`
encoding, parsed with `JSON.parse()`, and then cached and returned. Disk file
reading or JSON parsing errors result in exceptions being thrown.
`await jcrw.write(key, value, timeout)` — caches the given `value` under
the given `key`, and dirties it so that it will be written after `timeout` ms
has elapsed. If the `key` already exists in the cache and is dirty, the new
`value` will be written when the original timeout elapses, and the `timeout`
specified here is ignored. This ensures that the on-disk contents cannot become
too old, even for frequently-modified files. If `timeout` is omitted or
`undefined`, it defaults to 5000 ms. The file is written to the pathname
corresponding to the `key`, which must be a string and usually refers to a
`*.json` or similar file, with `utf-8` encoding and `JSON.stringify()` plus a
newline. The function returns immediately (before the write is attempted), and
any later disk file writing error is logged to the console. Despite this, the
interface to the function is specified as `async` because concurrent
`jcrw.read()` or `jcrw.modify()` operations on the same `key` must be
`await`ed before updating the cache.
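The "first timeout wins" rule can be modelled in isolation as follows (a
simplified sketch, not the package's code; the `flush` callback stands in for
the actual disk write, and `pending` for the cache's dirty state):

```js
// Model of the write-coalescing rule: the first write of a dirty entry
// starts a timer; later writes only replace the pending value, so the flush
// happens at the time the original timeout scheduled.
const pending = new Map()  // key -> {value, timer}

function write(key, value, timeout = 5000, flush) {
  let entry = pending.get(key)
  if (entry) {
    entry.value = value  // already dirty: keep original timer, ignore timeout
    return
  }
  entry = {value: value}
  entry.timer = setTimeout(
    () => {
      pending.delete(key)
      flush(key, entry.value)  // stands in for the actual disk write
    },
    timeout
  )
  pending.set(key, entry)
}
```

Here a write with a 50 ms timeout followed by one with a 1000 ms timeout still
flushes after roughly 50 ms, carrying the latest value.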
`await jcrw.modify(key, default_value, modify_func, timeout)` — first does a
`jcrw.read()` call with the given `key` and `default_value`, then passes the
result of this to the user-specified `modify_func` callback, and then does a
`jcrw.write()` call with the given `key`, the `modify_func` result, and the
given `timeout`. In the meantime, the given cache entry is locked to prevent
any other accesses, thus allowing atomic modification of a given cache entry
(or equivalently, a given JSON file). The `modify_func` is specified as
`async`, so it can perform activities such as disk I/O, but it should not be
lengthy, since other cache accesses to the same key will block during the
`modify_func`.
The interface for the user-provided callback function `modify_func()` is:
`await modify_func(result)` — the user must either modify the JSON object in
`result.value`, or else set `result.value` to a different JSON object to be
written and stored in the cache. The first way is normally applicable when the
JSON object is an array or dictionary type, which can be modified in place. The
second way is normally applicable when the JSON object is a literal type, which
is immutable and thus must be replaced in order to modify it. (Doing it the
second way allows a single literal value, such as a string, a number, or a
flag, to be stored per disk file, which may be inefficient, but may also be
convenient.)
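The two styles might look like this as standalone callbacks (illustrative
sketches of the `result` object the callback receives; the names `bump` and
`bump_literal` are invented for the example):

```js
// First way: the stored object is a dictionary, so mutate it in place.
let bump = async result => {
  ++result.value.count
}

// Second way: the stored object is a bare number (a literal), which cannot
// be mutated, so replace result.value with a new value instead.
let bump_literal = async result => {
  result.value = result.value + 1
}
```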
Example
Consider a simple analytics application for web pages. Each time a page is
served, we will call the function `hit(slug)` with `slug` set to a value that
is unique to the page. We'll have an on-disk file `hit_count.json` which maps
the `slug` value to a counter. The counter for a page increments each time the
code executes. The code creates a new file and/or a new counter as required.
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let hit = async slug => {
  let hit_count = await json_cache_rw.read('hit_count.json', {})
  if (!Object.prototype.hasOwnProperty.call(hit_count, slug))
    hit_count[slug] = 0
  ++hit_count[slug]
  await json_cache_rw.write('hit_count.json', hit_count)
}
```
In the above example, the update has not been done atomically, since it does
not matter in which order hits are recorded for a page. It could be done
atomically like this:
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let hit = async slug => {
  await json_cache_rw.modify(
    'hit_count.json',
    {},
    async result => {
      if (!Object.prototype.hasOwnProperty.call(result.value, slug))
        result.value[slug] = 0
      ++result.value[slug]
    }
  )
}
```
Note that we used `Object.prototype.hasOwnProperty.call()` to guard against
the possibility that the JSON object contains unusual key names, such as the
key `'hasOwnProperty'` itself. This is annoying but essential JavaScript
practice.
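To see why the guard matters, consider a counter object whose key happens to
be `'hasOwnProperty'` (a self-contained illustration, independent of the
package):

```js
// Hypothetical data: a page whose slug is 'hasOwnProperty' shadows the
// inherited method with a plain number once it is stored in the object.
let hit_count = JSON.parse('{"hasOwnProperty": 3}')

// Calling the method directly would now invoke the stored number, which
// throws a TypeError if this function is ever called:
let direct = () => hit_count.hasOwnProperty('about')

// The Object.prototype form is immune to shadowing:
console.log(Object.prototype.hasOwnProperty.call(hit_count, 'about'))          // false
console.log(Object.prototype.hasOwnProperty.call(hit_count, 'hasOwnProperty')) // true
```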
About lock order
The atomic modification facility refers to a particular key (equivalently, a
particular file or JSON object), so if an atomic modification must be carried
out that involves several different JSON files, special precautions need to be
taken. We will use the example of a money-transfer application with two files,
`transactions.json` containing a log of transactions (an array) and
`balances.json` containing the account balances (a dictionary indexed by
account number). To modify the transaction log consistently with the account
balances in an atomic fashion, both files should be locked by nesting the
modifications. A consistent order of lock acquisition should be chosen to
avoid deadlock. In this example we will acquire `transactions.json` and then
`balances.json`:
```js
let JSONCacheRW = require('@ndcode/json_cache_rw')
let json_cache_rw = new JSONCacheRW()

let deposit = async (account, amount) => {
  await json_cache_rw.modify(
    'transactions.json',
    [],
    async transactions => {
      await json_cache_rw.modify(
        'balances.json',
        {},
        async balances => {
          transactions.value.push(
            {
              'type': 'deposit',
              'account': account,
              'amount': amount
            }
          )
          if (!Object.prototype.hasOwnProperty.call(balances.value, account))
            balances.value[account] = 0
          balances.value[account] += amount
        }
      )
    }
  )
}
```
About system crashes
If the system crashes while writing a JSON file, a partially written file
will unavoidably be left on the disk after the system reboots. To be robust
against this situation, we write the modified JSON out to a temporary file
first (whose pathname is the `key` value plus `'.temp'`), and then rename it
into place. The only problem that can then occur is if the crash happens after
deleting the original but before renaming the temporary into its place. To
guard against this, when opening a file we check for the requested file; if it
does not exist, we attempt to rename a temporary file into place and then
retry.
We do not guarantee that atomic modifications spanning several files will be atomic across a system crash. The renaming system is only intended to guard against data loss. If desynchronization is an issue, then all files concerned should be scanned on system startup, and synchronization fixed up as necessary.
About asynchronicity
JSON files are read and written with `fs.readFile()` and `fs.writeFile()`,
thus `jcrw.read()` is fundamentally an asynchronous operation and therefore
returns a `Promise`, which we showed as `await jcrw.read()` above. The other
functions are also asynchronous, as they may have to wait for a concurrent
`jcrw.read()` to complete.
Also, the atomic modification itself may be asynchronous, and so
`modify_func()` is also expected to return a `Promise`. Obviously,
`jcrw.modify()` must wait for the `modify_func()` promise to resolve,
indicating that the new object is safely stored in the cache, so that it can
resolve the `jcrw.modify()` promise in turn.
About exceptions
Exceptions during atomic modification are handled by reflecting them through
both `Promise`s. The user should ensure that `result.value` is not modified
in this case — exceptions should be caught and any `result.value` changes
undone before the exception is rethrown from `modify_func` to `jcrw.modify()`.

Note that if several callers request the same key simultaneously and an
exception occurs during reading or parsing the JSON, each caller receives a
reference to the same shared exception object; thus when the `jcrw.read()`
`Promise` rejects, the rejection value (exception object) should be treated as
read-only.
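The catch-undo-rethrow pattern might look like this in a user-written callback
(a hypothetical `debit` operation; the callback itself is plain JavaScript, so
it is shown standalone):

```js
// A modify_func that debits an account but rolls back its in-place change
// if a later step fails, so the cached object is left untouched on error.
let debit = (account, amount) => async result => {
  if (!Object.prototype.hasOwnProperty.call(result.value, account))
    throw new Error('no such account')  // nothing modified yet, safe to throw
  let previous = result.value[account]  // snapshot before modifying
  try {
    result.value[account] -= amount
    if (result.value[account] < 0)
      throw new Error('insufficient funds')
  } catch (e) {
    result.value[account] = previous  // undo before rethrowing
    throw e
  }
}
```

With the package, this would be used as, e.g.,
`await jcrw.modify('balances.json', {}, debit(account, amount))`, and a
rejection leaves both the cache and the disk file unchanged.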
About deletions
There is no way to remove a JSON object from the cache at the moment. This
will be addressed in a future version of the API, which may provide a function
like `fs.unlink()` to both remove the on-disk file and uncache it
simultaneously. If only the in-memory version is to be deleted and not the
on-disk version, this should be left to a timeout routine to be added in
future; see below.
About on-disk modification
Do not modify the on-disk version of a file while the server is running and
the `json_cache_rw` cache may be active for that file. The modification will
not be detected, and cannot be handled in a consistent way. If read-only
access to JSON files is required, please use our `json_cache` module instead
of `json_cache_rw`; then, on-disk changes to the file will be detected and
visible to the application.

Also, do not run multiple node.js instances, or multiple `JSONCacheRW`
instances in the same node.js instance, that can refer to the same file.
Modifying the file in such circumstances counts as an on-disk modification,
which is not allowed.
About diagnostics
The `diag` argument to the constructor is a `bool` which, if `true`, causes
messages to be printed via `console.log()` for all activities except the
common case of retrieval when the object is already in cache. A `diag` value
of `undefined` is treated as `false`, thus it can be omitted in the usual
case.
The `diag` output is handy for development, and can also be handy in
production; e.g. our production server is started by `systemd`, which
automatically routes `stdout` output to the system log, and the cache access
diagnostics then act somewhat like an HTTP server's `access.log`, albeit
cache hits are not logged. It is particularly handy that write failures, such
as disk-full errors, are logged.
We have not attempted to provide comprehensive logging facilities or
log-routing, because the simple expedient is to turn off the built-in
diagnostics in complex cases and just do your own. In our server we use a
single `JSONCacheRW` instance for all `*.json` files, with `diag` set to
`true`.
To be implemented
It is intended that we will shortly add a timer function (or possibly just a
function that the user should call periodically) to flush objects from the
cache after a stale time, on the assumption that the object might not be
accessible or wanted anymore. Such a flush will be able to occur between a
`jcrw.read()` and a corresponding `jcrw.write()` call, hence the API for
`jcrw.write()` specifies that the `value` is mandatory, even if the cached
object was modified in place.
GIT repository
The development version can be cloned, downloaded, or browsed with `gitweb`
at: https://git.ndcode.org/public/json_cache_rw.git
License
All of our NPM packages are MIT licensed, please see LICENSE in the repository.
Contributions
The caching system is under active development (and is part of a larger project that is also under development) and thus the API is tentative. Please go ahead and incorporate the system into your project, or try out our example webserver built on the system, subject to the caution that the API could change. Please send us your experience and feedback, and let us know of improvements you make.
Contact: Nick Downing nick@ndcode.org