Storage Hubs

Summary

Storage hubs determine how and where internal data elements and capsules are distributed and stored.

There are four types of hubs:

  • Capsule hubs, which store actual capsules in physical/cloud storage.

  • Split hubs, which split capsules based on internal content and redistribute them to other hubs.

  • Staging hubs, which move/copy capsules in stages to provide, for example, local caching and network acceleration.

  • Scatter hubs, which scatter and redistribute capsules to various locations for availability and resiliency. This is where asymmetric replication is also performed.

Hub Hierarchy

Hubs are arranged in a hierarchy, with the ‘primary’ hub at the top and capsule hubs at the bottom.

The ‘primary’ hub is the one that is identified in the ‘continuum’ section of the configuration. All storage requests go first through the primary hub and down to subordinate hubs.

You are free to design your storage hub hierarchy any way you please. A typical multi-cloud environment could be configured as follows:

digraph hierarchy {
  node [shape=box];
  "Primary (split)" -> "System (staging)";
  "Primary (split)" -> "Data (staging)";
  "System (staging)" -> "System Local storage (stage 1)";
  "System Local storage (stage 1)" -> "System Scatter (stage 2)";
  "System Scatter (stage 2)" -> "System Cloud Storage (AWS)";
  "System Scatter (stage 2)" -> "System Cloud Storage (Google)";
  "System Scatter (stage 2)" -> "System Cloud Storage (Wasabi)";
  "Data (staging)" -> "Data Local storage (stage 1)";
  "Data Local storage (stage 1)" -> "Data Scatter (stage 2)";
  "Data Scatter (stage 2)" -> "Data Cloud Storage (AWS)";
  "Data Scatter (stage 2)" -> "Data Cloud Storage (Google)";
  "Data Scatter (stage 2)" -> "Data Cloud Storage (Wasabi)";
}

The above represents a continuum and hub configuration where:

  • Capsules are first split on the metadata/data boundary. This keeps metadata and system information in smaller capsules, which is more efficient for storage, and isolates data in separate, larger capsules so that large data transfers do not interfere with access to metadata and system information.

  • Capsules then go through staging hubs, where the first stage stores capsules on local disks. Depending on the hub’s configuration, the first stage may be used only for transit, or it may keep a copy of capsules on local storage for caching.

  • The second stage stores and scatters capsules to cloud storage, which is the capsules’ final destination. In this configuration, capsules are scattered among three different cloud providers. The scatter hub is usually also configured for asymmetric replication: a typical configuration would instruct the scatter hub to ensure at least two copies of each capsule are stored anywhere in any of the three providers, allowing the continuum to continue functioning even if one cloud provider goes offline. A configuration sketch for this hierarchy follows this list.
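
As a rough sketch only, the hierarchy above could be expressed along the following lines. Apart from the parameter names used elsewhere in this guide (‘type: split’, ‘hub_types’, ‘objtypes’, ‘keep_cached’, ‘min_copies’), every key, type value and hub name shown here (‘staging’, ‘scatter’, ‘capsule’, ‘stages’, ‘hubs’, ‘provider’ and the cloud hub names) is a hypothetical placeholder; refer to the Configuration Reference for the actual syntax. Only the ‘system’ branch is shown; the ‘data’ branch would mirror it:

hubs:
  primary:
    type: split              # split capsules on the metadata/data boundary
    hub_types:
      system:
        objtypes: default    # metadata and system information
      data:
        objtypes: data       # large data fragments
  system:
    type: staging            # hypothetical type value for a staging hub
    keep_cached: true        # keep first-stage copies as a local cache
    stages:                  # hypothetical key: stage 1, then stage 2
      - system_local
      - system_scatter
  system_local:
    type: capsule            # hypothetical type value; stores capsules on local disk
    provider: local_disk     # hypothetical reference to a local storage provider
  system_scatter:
    type: scatter            # hypothetical type value
    min_copies: 2            # ensure at least 2 copies across the cloud hubs below
    hubs:                    # hypothetical key listing subordinate capsule hubs
      - system_aws
      - system_google
      - system_wasabi
  # system_aws, system_google and system_wasabi would be capsule hubs backed by
  # the corresponding cloud storage providers; the 'data' branch mirrors 'system'.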

Note

Cosnim hubs all operate internally within a given Cosnim instance. These are not external services running somewhere else; they run directly within each Cosnim instance that is accessing a continuum on behalf of a given user, server, container or cloud instance.

Primary Hub

The primary storage hub is identified in the continuum’s configuration. It is usually named ‘primary’ and sits at the top of the storage hubs hierarchy. Depending on its configuration, the primary hub will either store capsules directly (very rarely) or distribute data and capsules to other hubs to provide for performance, caching, and multi-site / multi-cloud resiliency.

Split Hubs

Split hubs split and redistribute data elements to different capsules and storage hubs based on the data type. This can significantly improve performance and reduce overhead and cloud storage access costs, especially when small system and metadata information is maintained in small capsules, separate from the large data fragments which tend to produce large capsules.

Primary Hub Split Configuration

The primary hub should ideally split capsules across subordinate hubs in either a two-tier or three-tier structure, as follows:

Shared continuums

For continuums that are shared with other users, the primary hub should redistribute data to three hubs: one for data, one for control, and a default hub for everything else.

Below is a sample configuration which redistributes data to three subordinate hubs named ‘control’, ‘system’ and ‘data’. These subordinate hubs can then store capsules in a given storage system, or they may be structural hubs that further scatter, replicate or cache capsules in other locations.

Refer to the configuration reference guide for a complete description of these parameters, the data types that can be split and recommendations on how to configure subordinate capsule hubs in these types of configurations:

hubs:
  primary:
    type: split
    hub_types:
      control:
        objtypes: control   # control information
      system:
        objtypes: default   # everything not matched by another type
      data:
        objtypes: data      # data fragments

Private continuums

For simpler continuums which are not intended to be shared frequently with other users, for example, when used for backups or vaulting, a two-tier setup is sufficient:

hubs:
  primary:
    type: split
    hub_types:
      system:
        objtypes: default
      data:
        objtypes: data

Staging Hubs

These hubs are used to write capsules in “stages”, first in one storage location, then another. This is done mainly for caching and, to a lesser extent, to accelerate some processes when uploading data to the cloud over slower connections.

At the moment, only two stages are supported. This is typically configured as follows:

First Stage

The first stage typically directs capsules to local storage. In most (but not all) circumstances, Cosnim will write a capsule to the first stage before moving on to the second stage, which allows Cosnim processing to proceed faster.

The first stage can be just a transit, or it can also serve as a local cache. This is controlled by the ‘keep_cached’ configuration option (see the Configuration Reference guide). When used only for transit, a capsule written to the first stage is deleted as soon as it is successfully stored in the second stage. When caching is enabled, Cosnim keeps a copy of capsules that have gone through the first stage to speed up subsequent requests.

You should always enable the cache in control and system hubs. They occupy very little space and improve performance dramatically. You are free to turn caching on or off at any time.

Second Stage

The second stage is the ultimate destination of capsules. Depending on internal processing, Cosnim may write capsules to the first stage and then to the second stage (most frequent), or it may write capsules directly to the second stage (less frequent), for example, when managing key sharing information that is time-sensitive to other users. In the latter case, if the first stage is configured for caching, Cosnim caches the capsule back to the first stage after the write to the second stage completes. This special processing of the first and second stages is managed entirely by Cosnim and is transparent to the user.
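
As a hedged sketch of how a two-stage hub might be laid out, the staging hub for the system branch of the earlier example could look like the following. Only ‘keep_cached’ is a parameter name documented in this guide; the ‘staging’ type value and the ‘stages’ key are assumptions used purely for illustration (see the Configuration Reference for the real syntax):

hubs:
  system:
    type: staging          # hypothetical type value for a staging hub
    keep_cached: true      # keep a copy of capsules in the first stage (cache)
    stages:                # hypothetical key: first stage, then second stage
      - system_local       # stage 1: local disk, used as transit and cache
      - system_scatter     # stage 2: the capsules' final destination

Setting ‘keep_cached’ to false would turn the first stage into a pure transit area, with capsules deleted as soon as the second stage confirms storage.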

Scatter Hubs

These hubs scatter, distribute and replicate capsules and their contents to multiple storage locations. This is where multi-cloud data distribution, asymmetric replication and dynamic disaster recovery operate.

Basic Concepts

The concept behind scatter hubs is simple. They receive data elements and capsules from a higher hub, for example, a staging or split hub, and redistribute them to two or more subordinate hubs.

digraph hierarchy {
  node [shape=box];
  "Scatter Hub" -> "Subordinate Hub 1";
  "Scatter Hub" -> "Subordinate Hub 2";
  "Scatter Hub" -> "Subordinate Hub 3";
  "Scatter Hub" -> "Subordinate Hub 4";
  "Scatter Hub" -> "Subordinate Hub 5";
}

Contrary to traditional technologies, there is no actual mirroring, tracking or fixed distribution model to follow. You can configure scatter hubs any way you want, using any type and mixture of subordinate hubs and storage types; for example, you can intermix cloud object storage with shared and local file storage within the same scatter hub without restriction.

Scattering Policies

Three primary controls determine how capsules are scattered. Refer to the Configuration Reference for instructions on how to configure these parameters:

Write Policy

The write policy determines how the scatter hub will store and distribute capsules. It can be in a round-robin fashion, at random, or according to hub priorities.

You’ll typically choose a random or round-robin policy when you don’t really care where the data gets stored. For each capsule written, Cosnim simply picks one of the online hubs to receive it. This selection process is repeated for each capsule and readjusts based on hub availability.

You’ll choose a priority policy when you want precise control over which hubs will receive capsules. You assign each underlying hub a priority, and Cosnim will prefer the online hubs with the highest priority first. This selection process is repeated for each capsule written, so if an underlying hub temporarily goes down, capsules are automatically sent to other lower-priority hubs during the outage. When configuring hubs in priority, you’ll probably want to set the ‘distribute’ configuration option below to False.

You can change the scatter policies any time you want. Different users and instances may have their own write policies without impacting other users sharing the same continuum. The only thing that needs to be consistent across users is the set of underlying hubs that are configured; all users must have access to at least a majority of those hubs so that they retain at least minimal access to capsules.

Read Policy

The read policy dictates where Cosnim will first try to read capsules from. The only policy currently available is ‘priority’: you give each underlying hub a read priority, and Cosnim prefers higher-priority hubs over lower-priority ones.

Note that Cosnim does not blindly “try” to read a capsule from the highest priority hub first. Internal algorithms determine in advance where a given capsule is likely to be; if they determine that the capsule is in a lower-priority hub, Cosnim will read the capsule from that hub. The selection algorithm typically has a 100% success ratio in predicting which capsules and hubs can satisfy a request, so hubs with the highest priority rarely receive requests for capsules they don’t have.

Distribute

The ‘distribute’ configuration option in scatter hubs affects how capsules are actually distributed. When this option is ‘True’, Cosnim attempts to redistribute capsules evenly across hubs; the write policy then simply dictates which hub each distribution cycle should start from.

If you are using a priority write policy to control precisely where capsules will be stored, make sure to set this option to False.
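
Tying the three controls together, a hedged sketch of a scatter hub using a priority write policy with distribution disabled might look like this. The ‘distribute’ option and the policy concepts are described above, but the exact keys (‘write_policy’, ‘read_policy’, ‘priority’, ‘hubs’) and the ‘scatter’ type value are assumptions; the Configuration Reference has the authoritative syntax:

hubs:
  data_scatter:
    type: scatter              # hypothetical type value
    write_policy: priority     # hypothetical key: round-robin, random or priority
    read_policy: priority      # hypothetical key: reads prefer higher priorities
    distribute: false          # respect priorities strictly; no even redistribution
    hubs:                      # hypothetical key listing subordinate hubs
      site_a:
        priority: 10           # preferred hub for writes and reads
      site_b:
        priority: 5            # used only while site_a is offline

With this sketch, capsules would normally land on site_a; if site_a goes offline, Cosnim falls back to site_b until site_a returns.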

Asymmetric Replication

Cosnim implements a unique asymmetric replication model within scatter hubs to ensure resiliency against outages.

Contrary to traditional technologies such as mirroring, site-to-site replication, RAID and erasure coding, Cosnim’s asymmetric replication is not constrained to any particular set of hubs, storage locations or distribution strategy. Capsules are free to be scattered and replicated in any way that the current storage hubs’ availability permits, dynamically, without readjustments. You can still configure Cosnim to replicate capsules in a site-to-site model, similar to current technologies, by using priority write policies and disabling distribution, but you are not obligated to. Moreover, contrary to current technologies, even if you configure a scatter hub to function more like a traditional site-to-site replication system, asymmetric replication will still, in the event of an outage, dynamically redirect replication to another available hub without requiring recovery.

Asymmetric replication also helps when reading data. Contrary to traditional technologies, Cosnim has no fixed location to read capsules from. Any hub that has the capsule is able to serve requests, independently of when and how the capsules were originally stored or replicated. This allows continuous, optimal access to storage without the traditional limitations of site-to-site replication and resynchronizations after outages.

Replication is controlled by two key parameters in a scatter hub’s configuration (see the Configuration Reference for additional details):

min_copies

This parameter activates asymmetric replication. It instructs the hub to make sure that at least ‘min_copies’ copies of each capsule are written somewhere, anywhere, in the underlying hubs.

This value essentially dictates how many hubs can go offline without affecting the availability of the data. For example, if min_copies is 2 in a 3-hub scatter configuration, any one of the 3 hubs may go down without impact.

When a scatter hub is configured for asymmetric replication, the underlying hubs selected for replication are controlled by the write policy as previously described. For example, if you use a priority write policy, capsules are always replicated to the highest-priority hubs first, up to ‘min_copies’ copies; lower-priority hubs are only used when there’s an outage.

min_hubs

A scatter hub’s ‘min_hubs’ parameter determines how many hubs must be available and online for the hub to be considered operational. The default and minimum value is calculated automatically by Cosnim during activation based on the current min_copies value and the number of configured hubs. For example, in a four-hub configuration with ‘min_copies’ set to 2, at least three hubs need to be online to be sure that any given capsule can be read; if only two hubs remained online, both copies of a capsule could be sitting on the two offline hubs. You can increase ‘min_hubs’ above this threshold if you want to ensure that a minimum set of hubs is always available.

When the number of available hubs goes below this threshold, the scatter hub enters what’s called a “fractured” state. Fractured hubs are considered unstable. Depending on the upper hubs’ configuration and their dependency on the fractured scatter hub, the fractured state may escalate through hubs in the hierarchy, up to the continuum itself, in which case Cosnim will stop operating until the outage is resolved.
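
For illustration, a four-hub scatter configuration matching the example above could be sketched as follows. ‘min_copies’ and ‘min_hubs’ are documented parameters; the ‘scatter’ type value, the ‘hubs’ key and the hub names are hypothetical placeholders, and the explicit ‘min_hubs’ value simply spells out the default implied by the example (4 hubs − min_copies + 1 = 3):

hubs:
  data_scatter:
    type: scatter        # hypothetical type value
    min_copies: 2        # every capsule is stored on at least 2 of the hubs below
    min_hubs: 3          # assumed default here (4 - min_copies + 1); shown explicitly
    hubs:                # hypothetical key listing the subordinate hubs
      - cloud_aws
      - cloud_google
      - cloud_wasabi
      - cloud_backblaze

With these values, any single hub can go offline without impact; if a second hub goes down, the hub count drops below ‘min_hubs’ and the scatter hub enters the fractured state described above.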

Planning for Disaster Recovery

There are many ways to plan for disaster recovery. Scatter hubs are by far the most popular mechanism by which you can quickly recover from a disaster, which is why disaster recovery planning is described here, but you should also consider local storage and staging hubs as contributors.

How Cosnim ensures resiliency

In Cosnim, there is no actual “disaster recovery”, or even “recovery” per se. The resiliency of a continuum relies solely on the network of storage hubs and the ubiquitous mobility of capsules. This is because your data and Cosnim’s entire continuum control information are held entirely in capsules, stored directly in storage hubs. As long as capsules are available somewhere, anywhere, the continuum is fully operational. The type of storage used by hubs is irrelevant, and because capsules have the same internal format wherever they are stored, capsules can be freely moved and replicated between hubs at any time.

When Cosnim needs to read data, it pulls the information it needs from one or more capsules. Internal algorithms and mesh information within other capsules determine the precise capsule(s) that can serve the request. Another algorithm then dynamically determines the most probable location of those capsules. As long as storage was not altered manually, the success ratio of these algorithms is 100%, even during outages. This mechanism is insensitive to the capsules’ physical location or their age; this is how disaster recovery is essentially avoided in Cosnim, even during outages.

Hence, in a situation that would cause other storage systems to fail and enter into disaster recovery mode or require resynchronization, in Cosnim, availability is continuously assured by this automatic internal mechanism. As long as there is a capsule, anywhere, including in local storage, that can satisfy the request, services continue uninterrupted.

Configuring Hubs for Availability

To ensure availability, all you have to do is configure hubs in such a way that capsules will be available somewhere when they’re needed.

Scatter hubs are obviously ideal as they already replicate capsules to more than one location, specifically to ensure availability.

Staging hubs may also ensure availability when configured with caching. In fact, when staging hubs are used, Cosnim will always prioritize the first stage’s capsules before going to the second stage. Even if the second stage is down, as long as the first stage (or another hub) has the capsule, there is no outage.

Capsule Archival for Availability

Another way to ensure availability is to copy and archive the capsules found in storage hubs. You can do this by copying or replicating capsules directly in storage, without even telling Cosnim. In the case of a major disaster, you can either add a new storage hub pointing to that location or simply copy capsules back to the original location. You do not need to notify Cosnim of this data movement. Its internal mechanisms dynamically detect manual changes and automatically adjust to moved, copied or deleted capsules.

Capsule Hubs & Providers

These hubs are at the bottom of a hub hierarchy. They are the ones that store and manage capsules in physical storage.

Configuration

You should examine the Configuration Reference to familiarize yourself with the different parameters. They control how capsules are assembled, compressed, encrypted and moved to physical storage. Some parameters, such as ‘max_capsule_size’, significantly impact capsule and continuum efficiency. The Configuration Reference also recommends the best ways to configure capsule hubs.

Capsule hubs leverage providers to store and retrieve capsules physically. Providers are the only component of hubs that understands physical storage; they are covered in the next section.
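
As a hedged sketch of what a capsule hub definition might look like: ‘max_capsule_size’ is a documented parameter, but the ‘capsule’ type value, the ‘provider’ key and all values shown are assumptions for illustration only (the Configuration Reference has the authoritative parameters):

hubs:
  system_aws:
    type: capsule             # hypothetical type value for a capsule hub
    max_capsule_size: 64MB    # documented parameter; the value here is illustrative
    provider: aws_s3          # hypothetical reference to a provider definition
                              # (providers are covered in the next section)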

Creation and Initialization

Capsule hubs need a minimal “initialization” before they can be used. This involves the creation of a special encrypted capsule in the provider’s storage to allow Cosnim to confirm that the storage belongs to a given continuum. This is done by one of two cosnim commands:

cosnim create continuum
cosnim expand continuum

The capsule hubs will query these capsules and use them to dynamically construct the continuum model in which your data is stored and retrieved.

Gateways

A gateway is a special type of capsule hub that leverages internal Cosnim relays to access remote storage without going through cloud storage providers or traditional sharing services. Gateways are described in subsequent sections. They operate identically to capsule hubs, except that access to storage goes through the gateway instead of a local provider.