Configuration Reference

About Configuration Files

Cosnim’s configuration files control every facet of the software’s operations.

All essential settings for Cosnim software are contained within a single yaml or JSON file. This compact configuration is intentional; it is streamlined for simplicity, efficiency, and resilience, allowing for rapid deployment, easy distribution, and effective disaster recovery across environments.

Location of configuration files

Cosnim uses the first of the following locations to read its configuration:

  1. The file specified in the --config option of the cosnim commands.

  2. The file specified in the COSNIM_CONFIG environment variable.

  3. The file ‘config.yml’ in the current directory.

  4. The file ‘config/config.yml’ under the current directory.

Format of configuration

The standard and recommended format for Cosnim configurations is yaml, but Cosnim will accept configurations in JSON format. Configurations can be re-exported in another format using the cosnim show config command.

Splitting up configuration files with !include

It is not necessary to put the entire configuration in a single file. To construct a configuration from multiple files, use the yaml ‘!include’ directive to embed configurations from other files.

Example

continuum:
   ...CONTINUUM_OPTIONS...

gateways:
   !include config/gateways.yml

Standard Value Formats

Cosnim supports special value formats beyond yaml/json to facilitate configuration:

DURATION

Synopsis

NUMERIC_VALUE TIME_UNIT

Description

Specifies a time duration in a human-friendly format, such as ‘30.5 minutes’. The following units are supported. All units are internally converted to seconds. The default is ‘seconds’ if no unit is specified in a DURATION parameter.

TIME_UNIT                   Converted to (seconds)
ms, milliseconds            0.001
us, microseconds            0.000001
s, sec, secs                1
m, min, mins                60
h, hr, hrs, hour, hours     3,600
d, day, days                86,400
w, wks, week, weeks         604,800

Examples

Configuration Parameter     Result in seconds
delay: 30                   30
delay: 30 s                 30
delay: .5 minutes           30
delay: 50 ms                0.050
delay: 1.25 hours           4500

SIZE

Synopsis

NUMERIC_VALUE SIZE_UNITS

Description

Specifies a size, such as storage space or memory, in a human-friendly format, such as ‘10 MB’. The following units are supported; the case of the unit is not important: ‘20 kB’ and ‘20 KB’ are equivalent. All units are internally converted to bytes. The default is ‘bytes’ if no unit is specified in a SIZE parameter.

SIZE_UNIT    Converted to (bytes)
B            1
bytes        1
KiB          1,024
MiB          1,048,576
GiB          1,073,741,824
TiB          1,099,511,627,776
PiB          1,125,899,906,842,624
kB           1,000
KB           1,000
MB           1,000,000
GB           1,000,000,000
TB           1,000,000,000,000
PB           1,000,000,000,000,000

Examples

Configuration Parameter     Result in bytes
size: 256 MB                256,000,000
size: 256 MiB               268,435,456
size: 1.5 GB                1,500,000,000

BOOLEAN

Synopsis

[ yes | no | on | off | true | false ]

Description

Boolean values can be specified in yaml files using labels such as ‘yes’, ‘no’, ‘on’, ‘off’, ‘true’, and ‘false’. All boolean variants produce the same results; choosing one label over another is a matter of personal preference.

Examples

YAML Configuration Parameter    Result
active: on                      True
active: yes                     True
active: true                    True
active: off                     False
active: no                      False
active: false                   False

Cache Management

caches

This section defines the global RAM caches. Global caches are used primarily by storage hubs and reduce the frequency at which data is pulled from disk and cloud storage, which increases performance and efficiency.

Caches can be shared between all hubs or assigned to individual groups of hubs for more refined tuning.

Synopsis

caches:
  CACHE_NAME:
    type: [ unified ]
    size: SIZE
    reuse_ratio: NUMBER(0-1)
  ⋮

Options

CACHE_NAME

Name of the cache as referred to by the cache: parameter of the storage hubs.

type: [ unified ]

Type of cache to use. The default, and now the only option, is unified, which uses Cosnim’s unified cache manager.

size: SIZE

Defines the maximum amount of memory that can be used for caching. The more memory is used for caching, the higher the performance will be, provided the system has sufficient RAM.

Try to allocate at least 1 GB for optimal results if system resources permit. A cache size below 256 MB is not recommended without prior testing; depending on how the continuum is used, a very low cache size can significantly degrade performance.

reuse_ratio: NUMBER

Determines the amount of memory cache that should be prioritized for reused items as a ratio of the total cache size in the form of a numerical value between 0 and 1.

For example, if the cache size is 1 GB and the reuse_ratio is .75, 750 MB of the cache will be prioritized for previously reused items.

The reuse ratio is a fine-tuning parameter of Cosnim’s unified cache algorithm. A higher ratio increases the probability that older historical data remains longer in the cache, improving performance for random access patterns. In contrast, a lower ratio increases the likelihood that recently read data will be reused from the cache. The recommended reuse ratio is .75, but you can freely experiment with other values.

Examples

caches:
   primary:
      type: unified
      size: 768 MB
      reuse_ratio: .75
   data:
      type: unified
      size: 128 MB
      reuse_ratio: .50

This defines two caches: ‘primary’, a self-tuning large cache geared for general use, and ‘data’, a smaller separate cache for data where performance is not a concern. These caches are referenced by storage hubs with the cache: option.

Continuum Configuration

continuum

This section defines the global operation of the continuum.

Synopsis

continuum:
  primary_hub: HUB_NAME
  keychain: FILE
  capsule_version: [ 3 ]
  autocreate: [ BOOLEAN ]
  autoexpand: [ BOOLEAN ]
  relay: RELAY_NAME
  use_locks: [ BOOLEAN ]
  auto_retry: DURATION
  signature:
    private_key: KEY_NAME
    hash_algo: HASH_ALGORITHM
    sign:
      - files

Options

primary_hub: HUB_NAME

Name of the primary storage hub to use. It’s recommended to call this hub ‘primary’ to avoid confusion with other hubs.

keychain: FILE

File name and path of the keychain file to be used by this continuum for encryption and signatures. The keychain must contain at least an encryption key block named ‘default’.

capsule_version: [ 3 ]

Capsule version to use in this continuum. For now, only version 3 may be used.

autocreate: [ BOOLEAN ]

When this option is true, Cosnim will automatically create the continuum if any command is run and all storage hubs are uninitialized. If false, a cosnim create continuum command must be run explicitly to create the continuum.

autoexpand: [ BOOLEAN ]

When this option is true, Cosnim will automatically expand the continuum if some new storage hubs are uninitialized. This condition may occur if new storage hubs have been added to an existing continuum. When false, a cosnim expand continuum command must be run explicitly to expand the continuum to use new, previously uninitialized storage hubs.

relay: RELAY_NAME

Name of the primary relay to use to accelerate certain continuum operations. Optional.

use_locks: BOOLEAN

Use relay locks to add an extra layer of security when sharing continuums. THIS OPTION IS DEPRECATED. Use the share_level option in filesystems instead.

auto_retry: DURATION

Specifies the maximum time Cosnim will wait to access capsules that are temporarily unavailable. This setting is handy when:

  • Connecting to typically unshared continuums (e.g., backup destinations)

  • Working with storage hubs that upload control information or metadata to cloud/shared storage before completing full data transfer

The operation fails if Cosnim cannot access the capsule within the specified duration.

Default: 0 seconds (no waiting)
Recommended: 30 seconds for shared continuums

signature: ...

Option group related to digital signatures. See below.

private_key: KEY_NAME

Name of the private key in the keychain used to sign data modifications made by the user. The key name and public key associated with the private key are automatically recorded in the continuum when first used in a continuum.

hash_algo: ALGO

The hash algorithm to use for data signature digests. The default and recommended algorithm is ‘sha256’. Use the command cosnim test hashes to see other available hash algorithms. Ensure you use only secure hash algorithms to preserve the signatures’ integrity.

sign: [ files ]

Indicates what type of data should be signed automatically with digital signature cascades for the current user. At the moment, there is only one option:

files

Automatically sign files that are created or updated by the user. This indirectly spawns a cascade of signatures and countersignatures based on other file updates made by this or other users.

Activating or deactivating signatures for a user does not affect the previous signatures or the cascades produced by other users.

Examples

continuum:
  primary_hub: primary
  keychain: ./credentials/keychain
  autocreate: false
  autoexpand: false
  relay: myrelay_1
  auto_retry: 30 s
  signature:
    private_key: john.doe@cosnim.com
    sign:
      - files

Filesystem Configuration

This section defines the filesystems that are hosted on a continuum.

Note

Currently, operating only one active filesystem per continuum is recommended. If you need multiple filesystems, the preferred approach is to use different continuums. See the prefix and namespace options in storage hubs for easy ways of hosting multiple continuums on the same physical storage.

filesystems

Synopsis

filesystems:
  FILESYSTEM_NAME:
    version: [ 1 | 2 ]
    share_level: [ 0 - 3 ]
    merge_policy: [ drop | replace | rename_old | rename_new ]
    pacing: DURATION
    iosize: SIZE
    stripe_size: SIZE
    inline_size: SIZE
    dir_cache_size: COUNT
    file_cache_size: COUNT
    priority_data:
      - offset: 0
        size: NUMBER
    maxqueue: NUMBER
    maxfiles: NUMBER
    timetravel_marker: [ ~~~ | STRING ]
    trace_api: BOOLEAN
    dedup:
      algo: HASH_ALGO
      minsize: NUMBER
      update: BOOLEAN
      verify: BOOLEAN
      verify_file: BOOLEAN
    security:
      idmap:
        global:
          users:
            USER_NAME: UID
            ⋮
          groups:
            GROUP_NAME: GID
            ⋮
    rindex: BOOLEAN
    metadata:
      ds_store: BOOLEAN
      inode: BOOLEAN

FILESYSTEM_NAME

Name of the filesystem. The primary (or only) filesystem should be named ‘root’.

version: [ 1 | 2 ]

Filesystem version. Version 2 is recommended; version 1 exists only for legacy support with older versions of Cosnim.

This parameter only affects how new data is written. The filesystem always operates in hybrid mode, supporting multiple versions simultaneously. All existing data remains in the format in which it was originally written until it is updated or deleted.

share_level: [ 0 - 3 ]

Determines how the filesystem will be shared in updates with other users. Please use the appropriate level according to your actual needs. Higher sharing levels may be more efficient at sharing with other users but also add processing and storage overhead. You can freely intermix different share levels (except level 0) between users. You should, therefore, activate higher levels only when needed. Share levels apply only to updates, can be changed at any time, and never prevent simultaneous users from reading the filesystem or continuum.

Sharing Levels

0 - Unprotected

This share level offers no sharing protection and is optimized for the highest performance and storage efficiency. It is ideal for isolated single-use continuums where simultaneous updates by multiple users or Cosnim instances are impossible, either due to tight control over storage hubs, because it is an ad hoc setup, or due to operational certainty. Multiple users can access a filesystem and continuum for reading without constraints.

Warning

Using this share level in environments where multiple users or Cosnim instances could potentially attempt to update the continuum can lead to severe corruption. If uncertain, use share_level 1 instead.

1 - Single

Designed for filesystems where only one user is expected to update the continuum at a time. Unlike share level 0, the continuum and filesystem are fully protected against accidental sharing at the cost of slight processing and storage overhead. Should multiple users or instances running at this share level attempt to update the continuum simultaneously, one of the users/instances attempting updates will fail. This does not affect other users running at share level 2 or higher, nor does it prevent other users from simultaneously reading the continuum or filesystem.

2 - Multiple

Use this level for filesystems where multiple users may update the continuum simultaneously while looking for higher performance. Locks are implemented only for local users, and update conflicts between remote users are managed at the continuum level according to the merge_policy. A relay is not required for share level 2 operations, but using one will increase the speed at which users become aware of others’ updates.

3 - Global

NOTE: This share_level is temporarily disabled.

Use this level for globally shared filesystems with global lock propagation. This makes the filesystem operate as a local filesystem and with fully distributed locks. Relays are mandatory. Network latency or outages will affect the filesystem’s performance and availability and may cause applications to hang or fail; this sharing level should be used only in environments with strong network stability.

merge_policy: [ drop | replace | rename_old | rename_new ]

Determines how update conflicts are managed and merged when two users simultaneously update a given file or shared information such as directory metadata. Conflicts may occur more frequently when applications make changes without holding locks or using a share level of 2 or lower.

Conflicts are detected and managed by the “losing” user, that is, the user whose changes are determined to have collided with another user’s updates that have already been published in the continuum. The merge_policy parameter determines how the conflicting update is managed and merged into the continuum:

drop

The new conflicting updates are dropped and discarded. The original file or element is left intact.

replace

The new update is saved and replaces the other user’s changes. The original file or data element remains available in Time-Travel.

rename_old

The original file or data element is renamed by appending the current timestamp to its name, and the new update is recorded.

rename_new

The new update is renamed by appending the current timestamp to its name. The original file or data element is left unchanged.

pacing: DURATION

Determines how fast filesystem updates should be committed to physical storage.

When the pacing value is 0 (no pacing), all updates are instantly pushed to physical storage as soon as an application or the OS requests it, for example, with an fsync() or storage sync() request. Depending on the applications and the OS, this can result in a large number of very small updates and tiny capsules, which can be inefficient for storage.

When the pacing value is greater than 0, updates are provisionally accumulated in memory for the specified duration and then committed to physical storage. This can significantly optimize storage use.

Pacing affects a system’s Recovery Point Objective (RPO), that is, how much data could be lost in an outage or disaster. The effective RPO of a filesystem is equal to the pacing value plus the time it takes for the capsules to be transmitted to storage, which is influenced by how the storage hubs are configured.

The recommended pacing values are:

  • Live storage: 500 ms

  • Vaulting: 1 second

  • Backups: 5 to 30 seconds

iosize: SIZE

The official size of filesystem I/Os as communicated to applications and the OS. This is equivalent to the traditional “blocksize”, but without impacting the physical storage organization.

stripe_size: SIZE

The size of the data stripes that are written to physical storage. When large files or data elements are created or updated, Cosnim automatically splits them into smaller stripes to optimize physical I/O and deduplication efficiency. Stripes are bundled with metadata and other users’ or applications’ data inside capsules. Larger stripe sizes reduce the number of split capsules and data elements, improving I/O efficiency but consuming more storage during updates. Inversely, smaller stripe sizes optimize storage use, particularly when there are a lot of small or random updates, but increase the processing overhead and the size of the deduplication database.

The following stripe sizes are recommended:

  • Live storage: 256 KiB. This generally gives a good balance between performance and storage efficiency for relatively random data and regular updates. If there are a lot of very small updates, you can decrease the stripe size further, but you should also consider deactivating deduplication or increasing the deduplication ‘minsize’ to account for the fact that small, frequent updates rarely benefit from deduplication.

  • Vaulting & backups: 1 MiB to 2 MiB. This increases performance and optimizes deduplication by managing data in larger chunks. It is also a good choice when hosting large files such as video, audio and images.

inline_size: SIZE

The threshold at which file data is stored inline with the metadata instead of a separate element or capsule. This helps optimize performance and storage efficiency for very small files.

The recommended inline size is 64 bytes. You could decrease this value down to 16 bytes if you wish to keep metadata capsules the smallest possible, and you could increase this value up to 1 KiB to increase metadata bundling opportunities without too much impact. Going beyond these bounds will probably not result in additional benefits.

dir_cache_size: COUNT

The number of directories that should be kept in the internal filesystem cache. Keeping directories in cache can increase performance significantly, especially when searching through files and directories during Time-Travel explorations and full backup operations.

The recommended directory cache size is between 1,000 and 20,000, depending on the number of files and directories in the continuum.

file_cache_size: COUNT

The number of files that should remain in the cache after being closed. Both data and metadata are kept in memory. This can significantly improve performance when files are repeatedly closed and re-opened, which happens frequently with applications and system tools.

The recommended file cache size is 100. Beware that increasing the file cache can increase RAM usage considerably.

priority_data:

This is a performance option for specialized applications. It embeds part of the file’s data with the metadata to increase read performance when reading this particular data. It can help optimize the performance of special applications and scanning tools that look, for example, at magic numbers or special headers in a large number of files.

At the moment, only one priority data segment can be defined. This will change in the future.

offset: 0

The offset of the priority data. For now, only zero is accepted.

size: NUMBER

The size of the priority data. This depends on the applications and tools. Choose a size that matches how much data the applications or tools will typically attempt to read from the files. Tracing the applications can help determine the proper size.
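
For example, a hypothetical configuration that embeds the first 4 KiB of each file with the metadata (the 4096 value is illustrative; match it to how much data your applications actually read):

priority_data:
  - offset: 0
    size: 4096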

maxqueue: NUMBER

The maximum number of updated files and directories that can be pending while pacing is active. If the number of files and directories waiting to be flushed exceeds this number, they are flushed to storage immediately, even if the pacing interval has not yet expired. This helps to push changes to storage faster and more efficiently when there’s a lot of activity.

The recommended maxqueue is 1,000, with a minimum of 100 and a maximum of 10,000.

maxfiles: NUMBER

The maximum number of files that can be opened at the same time by the current Cosnim instance. This excludes other Cosnim instances or users accessing the filesystem simultaneously. When the number of open files exceeds this number, the open requests are rejected. This value should match or exceed your operating system’s limits.

The recommended maxfiles value is 1024, which is usually sufficient.

timetravel_marker: [ ~~~ | STRING ]

The file and directory name marker that is used to trigger Time-Travel within the filesystem. When this string is present in a name, and no existing file or directory matches that name, this opens up a Time-Travel portal into the data’s history.

The default and recommended value is “~~~”, the Cosnim standard.

trace_api: BOOLEAN

When this option is true, it turns on the internal Cosnim filesystem API traces to display all the internal requests made to the filesystem, either by direct applications or interfaces such as FUSE. Beware, this option can produce a lot of output.

dedup:

Controls the deduplication engine, which reduces storage consumption by re-using existing data stripes when applications and users produce duplicate data.

It works by calculating the hash value of data stripes as the data is written and updated by applications, and then searching an internal database for a duplicate data stripe with the same hash value. If duplicate data is detected, the new file reuses the same data stripe(s), thereby reducing storage consumption.

To enable deduplication, configure this section. The deduplication database is saved and maintained directly in the continuum. To temporarily suspend deduplication, use the ‘update’ parameter. To disable deduplication completely, omit this section; this stops deduplication but does not destroy the deduplication database. Deduplication can be enabled and disabled at any time, and users don’t need to enable deduplication at the same time.

The dedup options follow:

algo: HASH_ALGO

Identifies the hashing algorithm used to detect and catalog deduplicated data. It is best to choose a fast algorithm with excellent performance and low collision risk. You can view all of Cosnim’s supported hashing algorithms and their performance with the cosnim test hashes command.

The collision rate of a hash algorithm is important. If two different stripes of data produce the same hash and the ‘verify’ option is false, the deduplication engine may incorrectly consider them equal, leading to data corruption. To reduce the risk of collisions, use algorithms that produce large hashes. To eliminate the risks of collisions, set the ‘verify’ option to true.

Suggested algorithms:

  • algo=xxh3-128, verify=no: This provides an efficient compromise, with excellent performance and a reasonably low collision risk of 2.7 x 10^-9 with 1 billion deduplicated data stripes.

  • algo=sha256, verify=no: SHA256 provides practical collision-proof protection with a low collision risk of 4.3 x 10^-60 with 1 billion deduplicated data stripes. However, it consumes considerably more resources than xxh3-128 and produces a significantly larger deduplication database.

  • algo=xxh3-64, verify=yes: Produces the smallest deduplication database and is guaranteed never to collide. On the other hand, it induces substantial overhead, as it needs to re-read each deduplicated data stripe and compare it with the current stripe before confirming deduplication. Hence, it is not ideal in environments where the data needs to be re-read from cloud or remote storage.

minsize: NUMBER

The minimum stripe size to deduplicate. Stripes smaller than this are not deduplicated. This helps to optimize the deduplication process without compromising too much on the amount of data deduplicated.

The recommended deduplication minsize is 4 KiB, but you can go as low as 64 bytes for small private continuums, or as high as the stripe size if you want to deduplicate only very large files.

update: BOOLEAN

Indicates if the deduplication database should be updated to include newer data produced (true) or if the deduplication database should be used in read-only mode solely to detect if the data matches previously tracked data (false).

verify: BOOLEAN

Indicates if new data identified as potential duplicate should be compared and verified with actual data in storage to ensure it is perfectly identical (true) or if the hash values are sufficient to determine if the data is duplicated (false).

Verifying data before deduplication (true) can be costly in terms of time and resources. It should be activated only if the original data is readily accessible, for example, in local storage.

The recommended value is false combined with a hashing algorithm with a very low collision rate, such as xxh3-128 or higher.

verify_file: BOOLEAN

Indicates whether deduplicated data should be verified when the updated data’s deduplication hash matches the current file’s contents at the same file position. When such a duplicate is found in the same file at the same position, there is a high probability that the application or tool is simply re-writing the same data to the same file. Specifying false for this option and true for the ‘verify’ option reduces the overhead of same-file deduplication while ensuring global data integrity.
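
For example, a dedup section following the recommendations above could look like the sketch below (xxh3-128 and the 4 KiB minsize are the recommended values from this section; ‘update: true’ keeps the database growing with new data):

dedup:
  algo: xxh3-128
  minsize: 4096
  update: true
  verify: false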

security:

Manually controls the security profile of the filesystem. This section is currently used only for the global idmap.

The idmap translates foreign UIDs and GIDs that may be found in the shared filesystem back to the local filesystem. This may be necessary if systems sharing the continuum don’t have the same security definitions. It is defined as follows:

security:
  idmap:
    global:
      users:
        USER_NAME: UID
        ⋮
      groups:
        GROUP_NAME: GID
        ⋮

Where ‘USER_NAME: UID’ maps a global ‘UID’ found in the continuum to the local user ‘USER_NAME’, and ‘GROUP_NAME: GID’ maps a ‘GID’ found in the continuum to the local group ‘GROUP_NAME’.

For example, if a user ‘johndoe’ has a local UID of 1001, but their data should be stored under UID 201001 in the continuum, the following mapping will force all items created by this user to use a UID of 201001 while still reflecting this as UID 1001 on the local system.

security:
  idmap:
    global:
      users:
        johndoe: 201001

rindex: BOOLEAN

Indicates if the filesystem should automatically generate rindex values for new data. The rindex (randomness or ransomware index) is a number between 1 and 100 that indicates the randomness of data. It is recorded directly in the immutable metadata. The rindex can be queried during audits and can help detect abnormal changes to data that may be typical of a ransomware encryption attack on the source data.

metadata:

Indicates which metadata should be maintained for files and directories, as follows:

ds_store: BOOLEAN

Preserves (true) or ignores (false) ‘.DS_Store’ files on MacOS. These files contain user preference Finder information, such as folder view options. These files can consume storage and interfere with other users’ use of the shared storage. When this option is false, this reduces the load on the filesystem and the number of trivial Time-Travel points.

inode: BOOLEAN

When true, this freezes file inode numbers in metadata. When false, inode numbers are generated dynamically on an as-needed basis. This is for special applications only.

Recommended value: false
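
Examples

The following is a sketch of a complete ‘filesystems:’ section using the recommended values from this chapter. The filesystem name ‘root’ follows the naming recommendation above; the merge_policy and cache counts are illustrative assumptions to adjust to your workload.

filesystems:
  root:
    version: 2
    share_level: 1
    merge_policy: rename_old
    pacing: 500 ms
    stripe_size: 256 KiB
    inline_size: 64
    dir_cache_size: 5000
    file_cache_size: 100
    maxqueue: 1000
    maxfiles: 1024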

Global Defaults

defaults

This section defines configuration-wide default parameters for other sections, such as storage hubs. It helps reduce configuration file clutter and normalize parameters.

Synopsis

defaults:
  hubs:
    cache: ...
    encryption: ...
    group_size: ...
    key: ...
    compression: ...
    compression_level: ...
    max_capsule_size: ...
    max_elements: ...

Options

hubs: [...]

Sets the default parameters that apply to all storage hubs in the current configuration. This allows for better standardization. Please see the primary ‘hubs’ section for the definition of these parameters.
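
Examples

A sketch of a ‘defaults:’ section that standardizes common hub parameters. The values mirror the recommendations given in the Capsule Hubs section; the cache name ‘primary’ is an assumption and must match a cache defined in the ‘caches:’ section.

defaults:
  hubs:
    key: default
    cache: primary
    compression: zlib
    compression_level: 6
    encryption: aes-256-cbc
    group_size: 4096
    max_elements: 1000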

Gateways Configuration

gateways

This section defines storage gateways. Gateways are small servers that provide private cloud capsule storage services to continuums by making local or enterprise storage available to many users instead of relying on cloud services. Gateways may also be used to centralize or firewall access to cloud services to protect storage infrastructure further.

Gateways internally leverage their own relays to manage network communications and, therefore, share many of the same parameters. Refer to the ‘relays’ section for details on those parameters.

Synopsis

gateways:
  GATEWAY_NAME:
    client:
      url: URL
      security:
        private_key: KEY_NAME
        ca_cert_file: FILE
    server:
      type: capsule
      url: URL
      security:
        private_key_file: FILE
        ssl_cert_file: FILE
        auth_keychain: [ BOOLEAN | FILE ]
      ...HUBOPTIONS...
      immutable: BOOLEAN

Options

GATEWAY_NAME:

Name of the gateway. Refer to this name in storage hubs that use the gateway as a client and in the cosnim start gateway commands used to start the gateway servers.

A gateway definition can have a ‘client’ section, a ‘server’ section, or both. When sharing gateways with multiple users, consider storing the gateway configuration in a separate file, which can then be ‘!include’d in the clients’ configurations.

client:

Begins the definition of a gateway’s client parameters. The following options are identical to those of relays. See the ‘relays’ section for further details:

client:
  url: URL
  security:
    private_key: KEY_NAME
    ca_cert_file: FILE

server:

Begins the definition of a gateway’s server parameters. The following options are identical to those of relays. See the ‘relays’ section for further details on these:

server:
  url: URL
  security:
    private_key_file: FILE
    ssl_cert_file: FILE
    auth_keychain: [ BOOLEAN | FILE ]

The following options are exclusive to gateways and are described below:

server:
  type: capsule
  immutable: BOOLEAN
  ...HUBOPTIONS...

type: capsule

Identifies the type of storage hub that this gateway will be leveraging. At the moment, only Capsule Hubs are allowed.

...HUBOPTIONS...

Include in this section all the storage hub parameters that define the storage this gateway will serve. The parameters are identical to those of a Capsule Hub.

immutable: BOOLEAN

When true, the gateway operates purely as an immutable WORM-like (Write-Once-Read-Many) storage system, strictly prohibiting all capsule updates and deletions. This protects the physical storage against encryption, destruction and corruption attacks while providing all services to users. It is recommended to set this option to true.

This option can also be used to protect other types of hubs, such as cloud storage, to create a security firewall between clients and cloud storage providers, further shielding that physical storage against attacks.
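
Examples

A sketch of a gateway definition with both ‘client’ and ‘server’ sections. The gateway name, URLs, certificate paths, and storage path are hypothetical, and the server’s storage parameters follow the Capsule Hub conventions described in the ‘hubs’ section.

gateways:
  mygateway:
    client:
      url: https://gateway.example.com:7070
      security:
        private_key: gateway_client
        ca_cert_file: ./certs/ca.pem
    server:
      type: capsule
      url: https://0.0.0.0:7070
      security:
        private_key_file: ./certs/gateway_key.pem
        ssl_cert_file: ./certs/gateway_cert.pem
        auth_keychain: true
      immutable: true
      key: default
      provider:
        type: file
        path: /srv/cosnim/capsules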

Storage Hubs Configuration

hubs

This section defines storage hubs, which dictate how and where capsules are stored in the continuum. There are multiple types of hubs, each with a specific purpose:

Capsule Hubs

This foundation hub is responsible for assembling, compressing, encrypting, and storing capsules in a specific location, such as local storage, a gateway or a cloud storage service. All other hubs ultimately send their data to capsule hubs for actual storage.

Split Hubs

Redistribute elements to different hubs according to the element types, such as control data, metadata and actual user data. This effectively splits capsules by function. They help to optimize data storage usage and general performance.

Staging Hubs

Upload, download, and cache capsules in stages. Stages are useful to optimize I/O performance, manage local capsule storage caching, and reduce network congestion.

Scatter Hubs

Scatter, replicate and distribute capsules to multiple locations to provide high resiliency to outages and disasters. This is also known as asymmetric replication.

See also the Providers subsection for the definition of providers, which are the last-mile interface to actual storage.

Synopsis

hubs:
  HUB_NAME:
    type: [ capsule, split, staging, scatter ]
    ...HUB_CONFIGURATION...

HUB_NAME:

Name to give to the hub. The name must be unique among all hubs and providers. One hub has to be designated as the ‘primary_hub’ in the continuum configuration. Ideally, the hubs configuration section should start with this primary hub, often named ‘primary’, followed by the subordinate hubs.

type: [ capsule, split, staging, scatter ]

Type of hub being defined. The default is ‘capsule’ for Capsule Hubs.

...HUB_CONFIGURATION...

All other options depend on the hub’s type, as described below.

Capsule Hubs

These hubs assemble, compress, encrypt, and store data in capsules and reverse the process when retrieving data. Capsule hubs use providers to access physical storage, such as local, network, enterprise or cloud storage.

Synopsis

hubs:
  HUB_NAME:
    type: capsule
    key: KEY_NAME
    group_size: NUMBER
    max_capsule_size: SIZE
    max_elements: NUMBER
    compression: COMPRESSION_ALGO
    compression_level: NUMBER
    encryption: ENCRYPTION_ALGO
    cache: CACHE_NAME
    read_prio: PRIORITY
    write_prio: PRIORITY
    provider:
       ...PROVIDER_CONFIGURATION...

Options

type: capsule

Identifies this hub as a Capsule Hub. This is the default.

key: KEY_NAME

Name of the encryption key block in the keychain to encrypt the capsules’ contents.

The default key name is ‘default’.

group_size: NUMBER

The size of capsule groups. Most providers use a form of name hierarchy similar to directories to organize and retrieve capsules efficiently. The group size determines the maximum number of capsules that can be put in a given group/directory. This, in turn, controls the directory structure and hierarchy. The group size must be a power of 2 and is a permanent parameter – do not change it once capsules have started to be written to storage.

Recommended group_size: 4096

max_capsule_size: SIZE

Sets the ideal maximum size of a capsule in storage. Beware that capsules may be larger than this size due to mandatory payload.

This is an important tuning parameter that can significantly impact performance and storage efficiency. The recommendation is to first split capsules according to group type (see Split Hubs) and then use smaller sizes for control and metadata capsules and a large size for data capsules.

Split Hub objtype    Suggested max_capsule_size
control              16 KB
metadata             256 KB
data                 10 x the filesystem’s stripe_size or iosize

max_elements: NUMBER

Sets the maximum number of data elements (aka “objects”) that can be stored in a given capsule. This tuning parameter limits the size and density of capsules and helps avoid having too many small data elements in a given capsule. In most cases, the recommended values below will provide optimal results.

Recommended max_elements: 1000

compression: COMPRESSION_ALGO

Activates capsule data compression. The only compression algorithm currently available is ‘zlib’ (other algorithms will be added in the future). To disable compression, omit or set this option to ‘null’.

compression_level: NUMBER

Sets the compression level to NUMBER. For zlib, the default and recommended compression level is 6, but you can use any compression level between 1 and 9 according to your particular needs. Do not use compression level 0, which needlessly increases processing time; instead, remove or nullify the compression algorithm altogether for better performance.

encryption: ENCRYPTION_ALGO

Activates the encryption of capsules using the given algorithm. Use the command cosnim test encryption to see all encryption algorithms available and their performance on a particular platform.

Recommended value: aes-256-cbc

It is strongly recommended to encrypt capsules with AES-256 (CBC), as it is highly secure, fully vetted, quantum-safe, and gives good performance on all platforms.

cache: CACHE_NAME

Name of the cache in which capsules can be kept in RAM to improve performance. See the ‘caches’ section for additional information.

read_prio: PRIORITY

Sets the read priority of this hub relative to other hubs within a scatter group. A read_prio of 1 is the highest priority. See Scatter Hubs for details.

write_prio: PRIORITY

Sets the write priority of this hub relative to other hubs within a scatter group. A write_prio of 1 is the highest priority. See Scatter Hubs for details.

provider:

Defines the storage provider that connects the hub to the actual physical storage or cloud service. See the ‘providers’ section for the complete list of providers and their parameters.
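
Examples

A sketch of a Capsule Hub backed by local storage, using the recommended values from this section. The hub name, cache name, and storage path are hypothetical.

hubs:
  data:
    type: capsule
    key: default
    group_size: 4096
    max_capsule_size: 2 MiB
    max_elements: 1000
    compression: zlib
    compression_level: 6
    encryption: aes-256-cbc
    cache: primary
    provider:
      type: file
      path: /var/cosnim/capsules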

Split Hubs

Split hubs redistribute data elements (aka objects) to different capsules and hubs according to the type of those elements. This increases performance and reduces overhead for some data types, such as metadata, by regrouping them together in smaller, more efficient capsules. Using a split hub as the continuum’s primary hub is highly recommended.

Synopsis

hubs:
  HUB_NAME:
    type: split
    interleave: BOOLEAN
    hub_types:
      HUB_NAME:
        objtypes: OBJTYPES
      ⋮

Options

type: split

Required to identify a hub as a split hub.

interleave: BOOLEAN

Indicates if elements can interleave between capsules (true) or should be in isolated capsules (false). This option is no longer recommended and should always be omitted or set to false.

hub_types:

Defines how object types are split and which hubs the elements are redirected to. For each subordinate hub:

HUB_NAME

Name of the subordinate hub. Only the name of the hub is given here. The hub itself must be defined separately in the ‘hubs:’ section.

objtypes: OBJTYPES

Defines the types and groups of elements (objects) that should be assigned to this subordinate hub. You should use object type groups whenever possible:

objtypes group    Description
data              User data (file data, streams, …)
metadata          User metadata (file info, directories, embedded data, …)
dedup             Deduplication information
security          Signature cascades, certificates & security info
control           Internal continuum controls
system            General system management (Time-Travel, Reclaimer, Acceleration, …)

It’s strongly recommended to use split hubs and redirect at least data elements to a different hub and capsules – this increases efficiency and performance significantly. Large and shared continuums should also consider splitting control data capsules, which have very different storage & retrieval profiles than other capsules.

The recommended split hubs layouts are:

Shared continuums

For continuums that are shared with other users (filesys.share_level >= 2), a split hub that redistributes data to three hubs is recommended (one hub for data, one for control, and a default hub for everything else):

hubs:
  primary:
    type: split
    hub_types:
      control:
        objtypes: control
      system:
        objtypes: default
      data:
        objtypes: data

Private continuums

A two-tier setup is more efficient for small and private continuums that are not actively shared, such as backups and single-user mode (filesys.share_level <= 1).

hubs:
  primary:
    type: split
    hub_types:
      system:
        objtypes: default
      data:
        objtypes: data

Note

Please see the Capsule Hubs definition for recommended capsule sizes for each hub type.

Staging Hubs

Staging hubs are special Capsule Hubs that transport capsules from one provider to another in stages. They help to manage local caches for performance. They also help to smooth out temporary network congestion by first writing capsules to local storage and uploading them asynchronously to remote or cloud storage.

Synopsis

hubs:
  HUB_NAME:
    type: staging
    ...CAPSULE_HUB_CONFIGURATION...
    keep_cached: BOOLEAN
    safe_to_purge: FILE
    max_memory_size: SIZE
    stages:
     - provider:
       ...PROVIDER_CONFIGURATION...
     ⋮

Options

type: staging

Identifies this hub as a Staging Hub.

keep_cached: BOOLEAN

Specifies if the first stage hub is used for caching (true) or transit (false). When true, the first stage hub will always keep a copy of the capsules it handled, even if they are now in the second stage. When false, capsules are deleted from the first stage hub as soon as they are successfully written to the second stage.

safe_to_purge: FILE

Name and path of the safe-to-purge file that contains the list of all the capsules stored in the first stage that are safe to purge. This list can be used to safely delete capsules from local storage to free up space.

max_memory_size: SIZE

The maximum amount of RAM the staging hub can use to keep a copy of capsules waiting to be transmitted to the second stage. When this size is exceeded, the staging hub purges the pending capsules from RAM and instead re-reads the capsules from the first stage / local storage when the second stage is ready to accept them.

stages: [...]

Defines the capsule storage stages. Each stage must have a provider configuration that defines where the capsules will be physically stored. Currently, only two stages can be defined in a staging hub. The first stage should be faster storage than the second stage, for example, local storage. The second stage should be the ultimate destination of the capsules, for example, cloud or enterprise storage.

The order in which capsules are written to the stages is determined by the capsules’ content, purpose and priority. Most capsules are stored first in the first stage and then uploaded to the second stage as network conditions permit. Some capsules, such as those handling continuum controls, will instead be written to the second stage first to formally broadcast their contents and then written to the first stage for caching purposes.
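
Examples

A sketch of a staging hub that caches capsules on local storage and uploads them asynchronously to cloud storage. The hub name, paths, bucket, region, and credentials file are hypothetical; the nested ‘credentials’ form follows the providers synopsis below.

hubs:
  staged_data:
    type: staging
    key: default
    keep_cached: true
    safe_to_purge: ./state/safe_to_purge.list
    max_memory_size: 256 MiB
    stages:
      - provider:
          type: file
          path: /var/cosnim/stage
      - provider:
          type: amazon_s3
          region: us-east-1
          bucket_name: example-cosnim
          credentials:
            file: ./credentials/aws.json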

Scatter Hubs

Scatter hubs are used to scatter, distribute and replicate capsules asymmetrically to multiple locations to provide high resiliency to outages and disasters.

Synopsis

hubs:
  HUB_NAME:
    type: scatter
    hubs: [ HUB_NAME, ... ]
    read_policy: priority
    write_policy: [ roundrobin | priority | random ]
    read_prio: NUMBER
    write_prio: NUMBER
    min_copies: NUMBER
    min_hubs: NUMBER
    fast_resume: BOOLEAN
    distribute: BOOLEAN

Options

type: scatter

Identifies this hub as a Scatter Hub.

hubs: [ HUB_NAME, ... ]

Names of the subordinate hubs to which capsules will be distributed. The order is unimportant. The subordinate hubs’ ‘read_prio’ and ‘write_prio’ may impact capsule distribution depending on the scatter hub’s ‘read_policy’ and ‘write_policy’.

read_policy: priority

Determines how hubs are to be selected when reading capsules. Currently, the only option available is ‘priority’, meaning hubs will be selected in order of priority. Also see ‘read_prio’ in Capsule Hubs.

write_policy: [ roundrobin | priority | random ]

Determines how hubs are selected when writing capsules. The available options are:

priority

Hubs are selected in order of priority. See ‘write_prio’ in Capsule Hubs. You’ll probably also want to set distribute=false with this policy if you wish all capsules to be written primarily to the higher-priority hubs.

roundrobin

Hubs are selected in a round-robin fashion, one after the other, skipping over hubs that are not available.

random

Hubs are selected randomly among the currently available hubs.

min_copies: NUMBER

The minimum number of copies to write to the subordinate hubs. This establishes the outage tolerance of the scatter hub. For example, if min_copies=2, at least two copies of capsules will be written to different hubs, meaning that any one hub may go down without affecting data availability.

min_hubs: NUMBER

The minimum number of hubs that must be online for the scatter hub to be considered online. The default and minimum is the total number of hubs minus ‘min_copies’, plus 1. For example, a scatter hub with three hubs and a ‘min_copies’ of 2 would need a minimum (min_hubs) of 2 online hubs. You may set a min_hubs value higher than the minimum. If the number of available hubs goes below this threshold, the scatter hub’s status is changed to ‘fractured’. Depending on the other hubs’ configuration, this may escalate the fractured status up to the continuum, disabling its operation.

distribute: BOOLEAN

When true, the scatter hub attempts to distribute the capsules more or less evenly across all available hubs. When distribute is true, the write_policy only affects the order in which hubs are selected for writing; all hubs will ultimately receive more or less the same quantity of capsules, provided they are online.

When false, the scatter hub does not attempt to distribute capsules evenly and obeys the write_policy strictly. For example, if write_policy=priority, only the highest priority hubs will usually receive capsules. All other hubs will be ignored except when one or more higher-priority hubs are temporarily unavailable.
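
Examples

A sketch of a scatter hub that replicates capsules across three subordinate hubs (hub names are hypothetical). With three hubs and min_copies set to 2, min_hubs defaults to 2, so any single hub can go offline without affecting data availability.

hubs:
  scattered:
    type: scatter
    hubs: [ site_a, site_b, site_c ]
    read_policy: priority
    write_policy: roundrobin
    min_copies: 2
    distribute: true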

Providers

Providers are the bridge and last-mile interface between the storage hubs, which assemble and manage capsules, and the actual physical or cloud storage where capsules are stored. They leverage internal adapters that understand how a particular storage system or service should be accessed. Providers are integral to the storage hubs’ configuration and should be included anywhere there is a reference to PROVIDER_CONFIGURATION in the hub’s configuration.

Synopsis

There are multiple classes and types of providers, each with its own set of options:

Local/Enterprise Storage

provider:
  name: PROVIDER_NAME
  type: file
  path: PATH
  prefix: PREFIX
  namespace: NAMESPACE
  autocreate: BOOLEAN
  relay: RELAY_NAME
  remote: BOOLEAN
  maxqueue: NUMBER

Gateways

provider:
  name: PROVIDER_NAME
  type: gateway
  gateway: GATEWAY
  namespace: NAMESPACE

Cloud Object Storage

provider:
  name: PROVIDER_NAME
  type: amazon_s3 | azure_blob | cloudflare_r2 | tencent_cos | digitalocean_spaces | google_gcs | backblaze_s3 | ovh_s3 | wasabi
  region: REGION
  bucket_name: BUCKET_NAME
  prefix: PREFIX
  namespace: NAMESPACE
  autocreate: BOOLEAN
  host: HOSTNAME
  credentials: [ key: KEYNAME | file: FILE ]

DynamoDB

provider:
  name: PROVIDER_NAME
  type: amazon_dynamodb
  region: REGION
  bucket_name: BUCKET_NAME
  prefix: PREFIX
  namespace: NAMESPACE
  autocreate: BOOLEAN
  host: HOSTNAME
  credentials: [ key: KEYNAME | file: FILE ]
  read_capacity_units: NUMBER
  write_capacity_units: NUMBER
  exceed_capacity_backoff: NUMBER

Dropbox

provider:
  name: PROVIDER_NAME
  type: dropbox
  prefix: PREFIX
  namespace: NAMESPACE
  autocreate: BOOLEAN
  credentials: [ key: KEYNAME | file: FILE ]
  rate_limit_backoff: NUMBER

Options

name: PROVIDER_NAME

Assigns a specific name to the provider. If omitted, Cosnim generates a name from the hub’s configuration.

type: TYPE

Defines the type of provider:

file

Uses local native filesystems to store and retrieve capsules. Capsules are organized in directories to lighten the load on local filesystems.

gateway

Uses a Cosnim gateway to access capsule storage. The gateway acts as a bridge between a Cosnim instance and the actual storage provider. Gateways are often used to build a private cloud storage environment shared among many users as an alternative to cloud storage services.

amazon_s3 | azure_blob | cloudflare_r2 | tencent_cos | digitalocean_spaces | google_gcs | backblaze_s3 | ovh_s3 | wasabi

Uses public cloud object storage services to store and retrieve capsules.

amazon_dynamodb

Uses Amazon DynamoDB to store and retrieve capsules. This provider should be used only for the small ‘control’ and ‘system’ capsules that benefit highly from this type of storage. Larger data and metadata capsules should be written to other types of providers.

dropbox

Leverages Dropbox for capsule storage. Although perfectly functional, this provider has performance limitations due to Dropbox service limits; it should be used mostly for low-priority, low-cost storage.

autocreate: BOOLEAN

When true, the provider and underlying adapters can automatically create the underlying storage structures, such as the directories, buckets and paths required to store capsules. When false, providers and adapters may do this only during a cosnim create continuum or cosnim expand continuum command.

Applicable to all provider types except gateways.

path: PATH

Path to the local or enterprise storage filesystem where capsules will be stored.

Applicable to provider types: file

region: REGION

Identifies the cloud storage provider’s region where the bucket is located.

Applicable to the following providers:

  • amazon_dynamodb

  • amazon_s3

  • backblaze_s3

  • cloudflare_r2

  • ovh_s3

  • tencent_cos

  • wasabi

bucket_name: BUCKET_NAME

Name of the cloud object storage bucket in which capsules will be stored.

Applicable to all cloud storage providers.

prefix: PREFIX

This is the fixed prefix to prepend to all capsule names and paths prior to storing and retrieving them. It permanently subdivides the primary provider’s storage, for example, a bucket, for specific uses. The provider never attempts to access data, for example, objects, files or database records outside of this prefix. See ‘namespace’ below for further description of hub storage subdivision and capsule naming conventions and how they relate to prefixes.

Applicable to all providers.

namespace: NAMESPACE

Subdivides a provider’s storage for a specific continuum. The namespace is appended to the provider’s bucket name/filesystem path and the prefix to create a complete path to this continuum’s capsules. The full path of capsules then becomes:

For cloud storage: BUCKET_NAME / PREFIX / NAMESPACE / CAPSULE_PATH
For local storage: PATH / PREFIX / NAMESPACE / CAPSULE_PATH

Gateways use namespaces primarily to share a physical storage space with multiple users and continuums. When gateways are not used, a prefix is usually sufficient to subdivide storage.

The recommendation is to:

  • Use ‘bucket_name’ and ‘path’ parameters to formally delimit the physical boundaries of the provider’s storage.

  • Use ‘prefix’ to direct the provider to use only a subset of that storage. The provider will not attempt to access storage outside of this boundary. The prefix can be part of the provider’s security profile in the physical storage.

  • Use ‘namespace’ to identify a particular continuum in the provider’s storage. A provider may access multiple namespaces.

Applicable to all providers.
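
For example, with a hypothetical bucket_name of ‘acme-storage’, a prefix of ‘cosnim/prod’, and a namespace of ‘team1’, capsules would be stored under ‘acme-storage / cosnim/prod / team1 / CAPSULE_PATH’.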

credentials: [ key: KEYNAME | file: FILE ]

Supplies the access credentials the provider and its adapter need to connect to the cloud storage service. The credentials can be read from the keychain or a separate file.

The precise format and contents of the credentials depend on the cloud storage provider. Where possible, Cosnim adapters use the json format as follows:

Provider type       Credentials format
amazon_dynamodb     {"aws_access_key_id": "…", "aws_secret_access_key": "…"}
amazon_s3           {"aws_access_key_id": "…", "aws_secret_access_key": "…"}
azure_blob          DefaultEndpointsProtocol=https;AccountName=…;AccountKey=…;…
backblaze_s3        {"access_key_id": "…", "secret_access_key": "…"}
cloudflare_r2       {"access_key_id": "…", "secret_access_key": "…"}
dropbox             […]
google_gcs          {"type": "service_account", "project_id": "…", "private_key": "…", …}
ovh_s3              {"access_key_id": "…", "secret_access_key": "…"}
tencent_cos         {"secret_id": "…", "secret_key": "…"}
wasabi              {"access_key_id": "…", "secret_access_key": "…"}

Applicable to all providers.

gateway: GATEWAY

Name of the gateway to use. The remainder of the configuration is in the ‘gateways:’ section under ‘client:’.

Applicable to gateway providers.

host: HOSTNAME

Override the hostname of the cloud storage provider. This parameter is set by default by Cosnim adapters.

Applicable to cloud providers.

relay: RELAY_NAME

Name of the relay to use to accelerate access to this provider. Storage hubs use relays to cache and exchange capsule rosters (inventories) with other storage hubs and users and between executions; this reduces the number of queries a given provider will need to make to the cloud storage service, increasing performance and reducing cloud access costs. Relays can be used whether or not continuums are shared with other users.

Applicable to all providers.

remote: BOOLEAN

When true, the provider’s adapter is run in a separate remote process to provide better isolation. This may also improve some adapters’ performance.

Recommended value: true for ‘azure_blob’, ‘dropbox’ and ‘google_gcs’, false for all others.

maxqueue: NUMBER

Establishes how many concurrent requests may be queued to a remote adapter. When this number is exceeded, the storage hub will accumulate the additional requests until the adapter catches up.

The recommended value is 10.

Applicable only to providers using ‘remote=true’.

rate_limit_backoff: NUMBER

Set the starting backoff delay in seconds when an adapter exceeds the rate limit of cloud access requests. The adapter will slow down the rate of requests starting with this value and keep doubling it until the cloud service stops returning errors.

The default value is 0.05.

Applicable to Dropbox only.

read_capacity_units: NUMBER

Sets the rate (per second) at which read requests are provisioned to DynamoDB. The read_capacity_units are set by the adapter before accessing the table. This affects performance and billing. See AWS documentation for details.

The default is 25.

Applicable to Amazon DynamoDB only.

write_capacity_units: NUMBER

Sets the rate (per second) at which write requests are provisioned to DynamoDB. The write_capacity_units are set by the adapter before accessing the table. This affects performance and billing. See AWS documentation for details.

The default is 25.

Applicable to Amazon DynamoDB only.

exceed_capacity_backoff: NUMBER

Set the starting backoff delay in seconds when the adapter is starting to exceed the read or write capacity units. The adapter will slow down the rate of requests starting with this value and keep doubling it until DynamoDB stops returning errors.

The default value is 0.5.

Applicable to Amazon DynamoDB only.
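
Examples

A sketch of a Capsule Hub using a cloud object storage provider. The bucket, region, prefix, namespace, and credentials file are hypothetical.

hubs:
  data:
    type: capsule
    provider:
      name: s3_main
      type: amazon_s3
      region: eu-west-1
      bucket_name: example-continuum
      prefix: cosnim
      namespace: project1
      autocreate: false
      credentials:
        file: ./credentials/aws.json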

Licensing Control

license

Identifies the software license to use for this instance.

Synopsis

license: FILE

license: FILE

Name and path of the license file to use.

Note

The license file may contain multiple licenses and be shared between users and servers. Cosnim will automatically pick the first license that matches the current host and environment.

Logging Control

logging

Defines how logging is to be handled.

Synopsis

logging:
  - type: [ stdout | stderr | file ]
    path: FILE
    verbosity: NUMBER
    events:
      - EVENT_NAME

logging: [...]

Defines the logging facilities to use. There can be more than one logging facility, each with its particular options.

type: [ stdout | stderr | file ]

Type and destination of this logging facility. Output can be directed to standard output, standard error or a particular file.

The default is stderr.

path: FILE

Name and path of the file to receive logging. Use when type=file.

verbosity: NUMBER

Verbosity of the messages. Use positive numbers to increase the amount and details of messages, and negative numbers to reduce logging:

Verbosity level    Effect
0                  Normal logging. Includes informational messages (default).
-1                 Reduced logging. Includes notices, warnings, responses and errors.
-2                 Reduced logging. Includes warnings, responses and errors.
-3                 Minimal logging. Includes responses and errors only.
-4                 Suppressed logging. Includes errors only.
-5                 Suppressed logging. Includes fatal/disastrous errors only.
-6                 Disabled logging.
1                  Increased logging. Includes additional informational messages.
2                  Increased logging. Includes all informational messages.
3                  Increased logging. Includes informational messages and some traces.
4 and +            Debugging logging.

events: [...]

List of internal diagnostic events to include in the logging. Use as directed by Cosnim support.
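
Examples

A sketch of a logging section that keeps normal output on the console and writes a more detailed log to a file (the log file path is hypothetical).

logging:
  - type: stderr
    verbosity: 0
  - type: file
    path: ./logs/cosnim.log
    verbosity: 2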

Mounts

mount

Defines how the continuum filesystem should be mounted on the local machine to access it as a regular filesystem. This is used by the cosnim mount command.

Synopsis

mount:
  mountpoint: PATH
  volume: VOLUME_NAME
  fsname: FILESYS_NAME
  threads: BOOLEAN
  readonly: BOOLEAN
  default_permissions: BOOLEAN
  iosize: SIZE
  direct_io: BOOLEAN
  allow_root: BOOLEAN
  allow_other: BOOLEAN
  auto_cache: BOOLEAN
  auto_xattr: BOOLEAN
  fuse_debug: BOOLEAN
  driver:
    type: cosnim
    filesystem: root
mount:

Begins a mount section. At the moment, only one filesystem may be mounted. To mount multiple filesystems, run a cosnim mount command separately for each mountpoint.

mountpoint: PATH

The path where the filesystem will be mounted and where users & applications may access the continuum’s filesystem. The mountpoint directory is automatically created and deleted if the --mkdir option of the cosnim mount command is set or allowed to default.

volume: VOLUME_NAME

The name of the volume as seen by the host operating system. This is currently used only under MacOS.

fsname: FILESYS_NAME

The name of the filesystem type that is presented to the OS. This helps to differentiate a Cosnim filesystem from others hosted on the same OS.

The default is ‘cosnim-fs’.

threads: BOOLEAN

Enables or disables the use of threads when running the mounted filesystem. Threads increase performance and responsiveness, but may make debugging harder.

The recommended and default value is true. Disable threads only when requested by Cosnim support.

readonly: BOOLEAN

Set to true if the filesystem should be mounted read-only and false if read & write access is to be permitted. The --ro and --rw options of the cosnim mount command may be used to override this configuration parameter.

iosize: SIZE

Sets the size of I/O operations of the filesystem. This is currently used only under MacOS. It should usually match the iosize of the continuum filesystem this mountpoint is using.

The default is 131072 (128KiB), which provides a good balance of performance vs buffering.

direct_io: BOOLEAN

When true, this activates direct I/O with the operating system. Direct I/O bypasses OS buffering, which can increase performance, but it can also decrease performance if applications are making small I/Os.

The recommended value is false unless directed by Cosnim support.

allow_root: BOOLEAN

Allows the root user to access the mounted filesystem.

By default, only the actual user that runs the cosnim mount command can access the filesystem. All other users are locked out for security reasons. This option allows the root user to access the mounted filesystem with full authority.

allow_other: BOOLEAN

Allow other users to access the mounted filesystem.

By default, only the actual user that runs the cosnim mount command can access the filesystem. This option allows other non-root users to access the mounted filesystem.

Note

You may mount a filesystem using a regular user or root, independently of how the filesystem is shared with other users.

auto_cache: BOOLEAN

Turns on FUSE automatic file caching. This should always be turned on (default) as it improves performance significantly. You may disable auto_cache if you suspect automatic caching interferes with file contents.

The default and recommended value is true.

fuse_debug: BOOLEAN

When true, turn on FUSE debugging. Beware, this produces a large amount of output and should be used only for debugging.

driver:

Defines the backend driver and filesystem to serve on this mountpoint.

The default is to use the Cosnim filesystem named ‘root’. Change this option only if the filesystem defined in the ‘filesystems:’ section is named something other than ‘root’.

Example

driver:
  type: cosnim
  filesystem: root
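Putting the mount options together, a minimal sketch of a complete section; the mountpoint path is an illustrative assumption:

mount:
  mountpoint: /mnt/continuum    # hypothetical path where the filesystem appears
  readonly: false               # permit read & write access
  allow_other: true             # let other non-root users access the mount
  auto_cache: true              # keep FUSE file caching on (default)
  driver:
    type: cosnim
    filesystem: root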

Relays Configuration

relays

Relays are very small servers (servicelets) that cache and exchange information between running Cosnim instances and between Cosnim executions to improve performance and reduce costs. Relays host the following services:

Rosters

Relay rosters collect, cache and share minimal capsule information between running Cosnim instances. This reduces the number of requests storage hubs need to make to the storage adapters, increasing performance and reducing costs, especially in the cloud. Rosters do not store or share any user data, metadata or other data that could leak information about the continuum’s content. Relays operating rosters can be shut down and restarted with no impact.

Distributed Locks

Relays, when present, will sometimes be used to operate and share internal locks with other instances to optimize sharing activities. This reduces the instances’ overhead and increases the general performance of the continuum in a shared environment.

Gateways

Gateways also use internal relays to provide connectivity between clients and gateway servers. These relays are independent of the relays defined here.

Synopsis

relays:
  RELAY_NAME:
    client:
      url: URL
      security:
        private_key: KEY_NAME
        ca_cert_file: FILE
    server:
      url: URL
      security:
        private_key_file: FILE
        ssl_cert_file: FILE
        auth_keychain: [ BOOLEAN | FILE ]

Options

RELAY_NAME:

Name of the relay. Refer to this name in storage hubs that use the relay as a client and in the cosnim start relay commands used to start the relay servers.

A relay definition can have a ‘client’ section, a ‘server’ section, or both. When sharing relays with multiple users, consider storing the relay configurations in a separate file, which can then be ‘!include’d in the clients’ configurations.
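For example, a hypothetical layout (the file name is illustrative): keep the shared definitions in a file such as ‘config/shared_relays.yml’ and embed them with the ‘!include’ directive:

relays:
   !include config/shared_relays.yml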

client:

Begin the definition of a relay’s client parameters.

url: URL

URL of the relay, in the form tcp[s]://HOST:PORT, that clients should use to connect to the relay server.

security:

Section to define the security identifiers and certificates to use to connect to the relay server.

private_key: KEY_NAME

Name of the private key to use to authenticate with the server. The key is read from the continuum’s keychain. It is required if the relay server is configured to authenticate client connections (‘auth_keychain’). You can create a key using the cosnim generate key --private .. command.

ca_cert_file: FILE

Name and path of the CA certificate file that authenticates the relay server. This parameter is required when connecting with SSL/TLS (protocol ‘tcps’). See the Installation and Configuration Guide for instructions on how to generate a relay CA key and certificate.

server:

Begin the definition of a relay’s server parameters.

url: URL

URL that the relay server will bind to and listen for client connections. Specify in the form tcp[s]://HOST:PORT. Use a HOST of ‘0.0.0.0’ to accept connections from any interface. There are two protocols supported:

tcp

Serves client requests on unencrypted TCP communication channels. This protocol does not disclose any sensitive or valuable information in clear text. As long as storage hubs are configured with encryption, all user data, metadata, and control information are already fully encrypted and quantum-safe in capsules prior to transmission. No other sensitive information is transmitted in clear text, and all client authentication is performed using an SSL-like challenge protocol, fully preserving confidentiality over unencrypted channels.

tcps

Serves client requests over encrypted SSL/TLS communication channels. This is in addition to the quantum-safe encryption of capsules. An SSL/TLS channel can be used to increase the protection against man-in-the-middle attacks and make communication even more opaque. When ‘tcps’ is used, a ‘private_key_file’ and ‘ssl_cert_file’ must be provided to the server, and a ‘ca_cert_file’ must be provided to clients. See the Installation and Configuration Guide for instructions on generating these certificates.

security:

Defines the security options of the relay server (see below).

private_key_file: FILE

Name and path of the SSL/TLS private key. Keep this key in a secure location. See the Installation and Configuration Guide for instructions on how to generate this key.

ssl_cert_file: FILE

Name and path of the SSL/TLS certificate that will be presented to clients that connect to the server. See the Installation and Configuration Guide for instructions on how to generate this key.

auth_keychain: [ BOOLEAN | FILE ]

Determines how clients connecting to the relay are authenticated:

FILE

Name of a keychain that contains the public keys of clients authorized to connect to this server. The relay uses a challenge protocol and public-key authentication to confirm the identity of the clients connecting.

Only users defined in this keychain can connect to the server. See the Installation and Configuration Guide for instructions on how to manage relays and relay security.

true

When set to true, the continuum’s keychain is used to authenticate client connections.

false

When set to false, no connection security protocol is implemented; any client can connect to the relay server. However, this does not create a significant security risk: clients must still provide additional security information, such as internal UUIDs and storage hub namekeys, before they are allowed to participate in a relay. Without this critical information, which can only be obtained from authorized Cosnim instances, unauthenticated users cannot effectively use relays. Nevertheless, authentication security should be implemented when relays are accessed through the Internet or untrusted networks.
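A sketch of a complete relay definition with both client and server sections; the relay name, host, port, key name, and file paths are illustrative assumptions:

relays:
  shared_relay:
    client:
      url: tcps://relay.example.com:7400    # clients connect here over SSL/TLS
      security:
        private_key: relay-client-key       # key stored in the continuum's keychain
        ca_cert_file: certs/relay-ca.pem    # authenticates the relay server
    server:
      url: tcps://0.0.0.0:7400              # listen on all interfaces
      security:
        private_key_file: certs/relay-server.key
        ssl_cert_file: certs/relay-server.pem
        auth_keychain: true                 # authenticate clients with the continuum's keychain

Because the definition carries both sections, the same file can be ‘!include’d by the relay server and by its clients.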

Sync Profiles

sync_profiles

Defines the profiles used by the cosnim sync|backup|restore commands, which in turn define many of the commands’ parameters, avoiding the need to supply them to the commands.

There are three default profiles: ‘sync’, ‘backup’ and ‘restore’, which automatically match the cosnim command used. Other profiles may be created and selected with the ‘--profile’ option of the cosnim command.

Synopsis

sync_profiles:
  PROFILE_NAME:
    override:
      ...CONFIGURATION_OPTIONS...
      ⋮
    drift_ns: NANOSECONDS
    delete_pacing: DURATION
    timetravel_tag: STRING
    filters:
      - include: PATTERN
        exclude: PATTERN
        ignore: PATTERN
      ⋮
    paths:
      - source: SYNC_PATH
        destination: SYNC_PATH
      ⋮

Options

PROFILE_NAME

Name of the sync profile. It can be ‘sync’, ‘backup’ or ‘restore’, or any other name of your choosing as selected by the --profile option of the cosnim command.

override:

Overrides other configuration options when this profile is selected. This can be used to customize the operation of the continuum while sharing a standard configuration with other uses, such as mounting live filesystems. The option most frequently overridden during syncs is the filesystem’s pacing value. For example:

sync_profiles:
  backup:
    override:
      filesystems:
        root:
          pacing: 15 s

The above reconfigures the filesystem to pace commits every 15 seconds when running a cosnim backup command. Increasing the pacing value during backups is highly recommended: backup operations do not need continuous Time-Travel, since the objective of Time-Travel points during backups is to capture the state of the continuum at the end of the backup, not at intermediary points. A modest pacing value (for example, every 15 to 60 seconds), rather than a very high one, is still helpful because it accelerates resumed backups if they are ever interrupted.

drift_ns: NANOSECONDS

Determines how much file modification times are allowed to drift before files are considered unequal. This helps to compensate for differing filesystem time resolutions. See the --drift-ns option of the cosnim sync|backup|restore command for more information about the effects of this parameter.

If this option is specified both in the profile configuration and on the cosnim command options, the latter has precedence.

delete_pacing: DURATION

Delays file and directory deletions in the destination for an amount of time. This helps to smooth out events where applications continuously delete and recreate the same files and directories as a way of updating them atomically. Delete pacing forces Cosnim to wait a little bit before considering the file or directory as effectively deleted. This helps Time-Travel to reflect the actual net event (update) instead of reporting repeating deletions & re-creations.

Delete pacing may impact the RPO during small and frequent backups as it may force Cosnim to wait a little longer when deleting files.

The default and recommended delete_pacing value is 500 ms.

timetravel_tag: STRING

Customizes the Time-Travel tag that is created when a sync or equivalent operation completes. Time-Travel tags are sometimes referred to as “snapshots” because they identify a particular Time-Travel point by name. However, whether or not Time-Travel points are tagged does not affect how Time-Travel functions.

The timetravel_tag parameter is suffixed with the current time and the unique Time-Travel identifier to produce a full tag name. For example, with timetravel_tag=’Sync’, the tag created after a cosnim sync command completes combines ‘Sync’, the completion time, and the unique identifier.

By default, the timetravel_tag is named after the cosnim command run. It can also be overridden with the --timetravel-tag option of the cosnim sync|backup|restore command, which takes precedence over the profile’s value.
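A sketch combining these options in a custom profile; the profile name and values are illustrative:

sync_profiles:
  nightly:
    drift_ns: 1000000        # tolerate up to 1 ms of modification-time drift
    delete_pacing: 500 ms    # the default, shown here for illustration
    timetravel_tag: Nightly  # time and unique identifier are appended to form the full tag

Such a profile would be selected with the --profile option of the cosnim sync|backup|restore commands.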

filters: [ include | exclude | ignore : PATTERN, ... ]

Enumerates the list of filters that determine which files and directories are synchronized, backed up or restored. The filters are tested one by one in the order they appear in this section. The first pattern that matches the current file/directory is used. There are three types of filters:

include

Identifies files and directories that should be synchronized.

exclude

Identifies files and directories that should not be synchronized. If such files or directories appear in the destination, they are removed to keep the destination in sync with the source.

ignore

Identifies files and directories that should be ignored completely, both on the source and destination. Those files or directories are never synchronized nor removed from the destination if they don’t exist on the source.

Patterns are used to determine if a given file or directory matches. The following rules apply:

  • A pattern with no path delimiter (/) applies to any directory or subdirectory. For example, the pattern ‘tempfile’ would match any file named ‘tempfile’.

  • A pattern that contains a path delimiter is relative to the source. For example, if a backup is run on the source filesystem directory ‘/data/mydata/’ and the filter pattern is ‘mydir/myfile’, this would match the file ‘/data/mydata/mydir/myfile’ on the local filesystem.

  • A ‘?’ matches any single character. For example, the pattern ‘mydir/m?file’ would match ‘mydir/myfile’ and ‘mydir/mofile’, but not ‘mydir/mfile’.

  • A ‘*’ matches any number of characters, including no characters, within a given directory level. For example, the pattern ‘*/myfile’ would match ‘mydocs/myfile’ and ‘mydir/myfile’, but not ‘mydir/mydocs/myfile’.

  • A ‘**’ matches any number of characters, including no characters, in any number of directory levels. The pattern can be used at a path’s beginning, middle and/or end. For example, the pattern ‘**/tempdir/**’ would match any directory named ‘tempdir’ at any level, including all subdirectories.

Example:

sync_profiles:
  backup:
    filters:
      - exclude: '*.swp'             # Excludes all files that end with .swp
      - exclude: '.TemporaryItems'   # Excludes all files named '.TemporaryItems'
      - exclude: '**/cache/*.tmp'    # Excludes all files ending with .tmp under a directory named 'cache'
      - ignore: '.DS_Store'          # Ignores MacOS Finder settings files

paths: [ source: SOURCE_PATH, destination: DEST_PATH, ... ]

Specifies one or more pairs of source and destination paths that the sync, backup, or restore command will synchronize. For each pair, one path must be on a local filesystem and the other must be in the continuum. Local paths must start with a ‘./’ (current directory) or ‘/’ (absolute path). Paths on the continuum must begin with ‘cosnim:/’.

When running the cosnim backup command, all source paths must be on a local filesystem and all destination paths must be on a continuum. When running the cosnim restore command, the directions are reversed: source paths must be on a continuum and destination paths on a local filesystem. There is no restriction on direction when running the cosnim sync command.

Examples
sync_profiles:
  backup:
    paths:
      - source: /home/johndoe
        destination: cosnim:/Backups/user/johndoe
      - source: /opt/mysoft
        destination: cosnim:/Backups/mysoft