Configure HybridCache
Enabling hybrid cache
You can configure the Navy engine (the flash cache engine used in HybridCache mode) through NvmCache::Config::navyConfig in the cache config, using the APIs provided in navy::NavyConfig. For example:
Simple hybrid cache setup
#include "cachelib/allocator/CacheAllocator.h"
LruAllocator::Config lruConfig;
LruAllocator::NvmCacheConfig nvmConfig;
nvmConfig.navyConfig.setBlockSize(4096);
nvmConfig.navyConfig.setSimpleFile(FLAGS_navy_file_name,
                                   FLAGS_dipper_device_size_mb * 1024 * 1024 /* fileSize */,
                                   false /* truncateFile */);
nvmConfig.navyConfig.blockCache().setRegionSize(16 * 1024 * 1024);
lruConfig.enableNvmCache(nvmConfig);
All settings are optional, unless marked as "Required".
How to set Navy settings
1. Common Settings - Device
navyConfig.setSimpleFile(fileName, fileSize, truncateFile /*optional*/);
navyConfig.setBlockSize(blockSize);
navyConfig.setDeviceMetadataSize(deviceMetadataSize);
navyConfig.setDeviceMaxWriteSize(deviceMaxWriteSize);
OR
navyConfig.setRaidFiles(raidPaths, fileSize, truncateFile /*optional*/);
navyConfig.setBlockSize(blockSize);
navyConfig.setDeviceMetadataSize(deviceMetadataSize);
navyConfig.setDeviceMaxWriteSize(deviceMaxWriteSize);
file name/RAID paths (Required): file name (for a simple file) is the path of the file/device holding the cache; RAID paths (for RAID files) are multiple files/devices to be used as a single cache. Note they must be identical in size.
file size (Required): Size (in bytes) of a single file/device with cache.
truncate file (default: false): If true, ftruncate the file to the requested size.
block size (default: 4096): Device block size in bytes (minimum IO granularity).
device metadata size (default: 0): The size of the metadata partition on the Navy device.
device max write size (default: 1024 * 1024): The largest IO that can be written to the device. Any IO larger than this will be split and issued sequentially.
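For example, a minimal sketch of a RAID device setup might look like the following (the paths and sizes are hypothetical, not recommendations):
// Hypothetical values: two identically sized NVMe partitions used as one cache.
std::vector<std::string> raidPaths = {"/dev/nvme0n1p1", "/dev/nvme0n1p2"};
navyConfig.setRaidFiles(raidPaths,
                        100ULL * 1024 * 1024 * 1024 /* 100 GB per file */,
                        false /* truncateFile */);
navyConfig.setBlockSize(4096);
navyConfig.setDeviceMetadataSize(16 * 1024 * 1024); // reserve 16 MB for metadata
navyConfig.setDeviceMaxWriteSize(1024 * 1024);      // split IOs larger than 1 MB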
Optionally, when NavyRequestScheduler is used, the queue depth and IO engine of the Device layer can be adjusted.
navyConfig.enableAsyncIo(qdepth, enableIoUring);
qdepth (Required, default: 0): The depth of the async IO queue used by each Navy thread. By default, this is set automatically to <max num reads> / <reader threads> and <max num writes> / <writer threads> for reader and writer threads, respectively.
enableIoUring (Required, default: true): Selects the IO engine, io_uring or libaio. See Architecture Guide - Device for more details.
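For example, a sketch with illustrative (not tuned) values:
// Hypothetical values: 64 outstanding IOs per Navy thread, using io_uring.
navyConfig.enableAsyncIo(64 /* qdepth */, true /* enableIoUring */);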
Optionally, Flexible Data Placement (FDP) support can be enabled in the Device layer of Navy.
navyConfig.setEnableFDP(enableFDP);
enableFDP (= true): When set to true, FDP is enabled and the BigHash and BlockCache device writes get segregated within the SSD. For more details, refer to FDP enabled Cache.
2. Common Settings - Job Scheduler
Two types of Job scheduler are supported (see Architecture Guide - Navy overview). Common settings are as follows.
navyConfig.setNavyReqOrderingShards(navyReqOrderingShards);
request ordering shards (default: 20): If non-zero, request ordering is enabled: requests are put into 2^N shards (where N is this value) and each shard executes its requests in order.
Parameters specific to each job scheduler are as follows.
OrderedThreadPoolScheduler
navyConfig.setReaderAndWriterThreads(readerThreads, writerThreads);
reader threads (Required, default: 32): Number of threads available for processing read (lookup) requests.
writer threads (Required, default: 32): Number of threads available for processing write (insert and delete) requests.
NavyRequestScheduler
navyConfig.setReaderAndWriterThreads(readerThreads, writerThreads, maxNumReads, maxNumWrites, stackSizeKB);
reader threads (Required): Number of threads available for processing read requests. The recommended value is 4.
writer threads (Required): Number of threads available for processing write (insert and delete) requests. The recommended value is 4.
max num reads (Required): The maximum number of read requests that can be executed concurrently. The suggested value is 64 or 128, depending on the application. A larger value can increase throughput, but may also increase latency. This should be an integer multiple of the number of reader threads.
max num writes (Required): The maximum number of write requests that can be executed concurrently. The suggested value is 32 or 64, depending on the application. A larger value can increase throughput, but may also increase latency. This should be an integer multiple of the number of writer threads.
stack size in KB (Optional, default: 64): The size of the fiber stack. Navy allocates a stack of at least (<readerThreads> + <writerThreads> + <reclaimThreads>) x 100 x <stack size> in total. The default should work for most cases, but it can be decreased (e.g., to 16 KB) if memory footprint matters. Be aware that the stack must be large enough to run all callbacks (e.g., destructor callbacks) that can be executed by Navy.
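For example, a sketch using the suggested values above (illustrative, not a tuned recommendation):
// Hypothetical values: 4 reader and 4 writer threads, up to 128 concurrent
// reads and 64 concurrent writes, with the default 64 KB fiber stack.
navyConfig.setReaderAndWriterThreads(4 /* readerThreads */,
                                     4 /* writerThreads */,
                                     128 /* maxNumReads */,
                                     64 /* maxNumWrites */,
                                     64 /* stackSizeKB */);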
3. Common Settings - Other
navyConfig.setMaxConcurrentInserts(maxConcurrentInserts);
navyConfig.setMaxParcelMemoryMB(maxParcelMemoryMB);
max concurrent inserts (default: 1'000'000): Controls how many insertions can happen in parallel. This is an effective way to keep insertions from backing up and driving up write latency (which can happen if the use case is write-heavy).
max parcel memory (default: 256 MB): Total memory limit for in-flight parcels. Once this is reached, requests are rejected until parcel memory usage gets back under the limit.
4. Admission Policy Settings
There are 2 types of admission policy: "random" and "dynamic_random". Users can choose one of them to enable.
"random" policy
navyConfig.enableRandomAdmPolicy()
    .setAdmProbability(admissionProbability);
admission probability (Required): Acceptance probability. The value has to be in the range [0, 1].
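For example, a sketch that admits roughly half of the inserts (the probability is an illustrative value):
// Hypothetical value: accept ~50% of inserts to reduce the write rate to flash.
navyConfig.enableRandomAdmPolicy().setAdmProbability(0.5);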
"dynamic_random" policy
navyConfig.enableDynamicRandomAdmPolicy()
    .setAdmWriteRate(admissionWriteRate)
    .setMaxWriteRate(maxWriteRate)
    .setAdmSuffixLength(admissionSuffixLen)
    .setAdmProbBaseSize(admissionProbBaseSize);
admission write rate (Required, bytes/s): Average per-day write rate to target. Defaults to 0 if not explicitly set, meaning no rate limiting.
max write rate (default: 0 bytes/s): The maximum write rate to the device in bytes/s, to stay within the device's saturation limit and avoid latency increases. This ensures that writes in any given second do not exceed this limit, even though more could be written to stay on the target rate above.
admission suffix length (default: 0): Length of the key suffix to be ignored when hashing for the admission probability.
admission base size (default: 0): Navy item base size for the baseProbability calculation. Set this close to the mean object size; the probability is scaled for other sizes using this size as the pivot.
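For example, a sketch that targets a moderate average write rate with a hard cap (the rates are illustrative values, not recommendations):
// Hypothetical values: target ~50 MB/s average writes with a 100 MB/s hard cap.
navyConfig.enableDynamicRandomAdmPolicy()
    .setAdmWriteRate(50 * 1024 * 1024)    // bytes/s, averaged over the day
    .setMaxWriteRate(100 * 1024 * 1024);  // never exceed this in any second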
5. Engine Settings - Block Cache
navyConfig.blockCache()
    .enableSegmentedFifo(sFifoSegmentRatio)
    .enableHitsBasedReinsertion(hitsThreshold)
    .setCleanRegions(cleanRegions, cleanRegionThreads)
    .setRegionSize(regionSize)
    .setDataChecksum(false);
eviction policy (choose one of the following):
LRU: default policy
FIFO: once enabled, LRU will be disabled.
navyConfig.blockCache().enableFifo();
segmented FIFO: once enabled, LRU and FIFO will be disabled.
// sFifoSegmentRatio maps to segments in order from least important to most important.
// e.g. {1, 1, 1} gives an equal share to each of the 3 segments;
// {1, 2, 3} gives 1/6th of the items to the first segment (P0, least important),
// 2/6th of the items to the second segment (P1),
// and 3/6th of the items to the third segment (P2).
navyConfig.blockCache().enableSegmentedFifo(sFifoSegmentRatio);
reinsertion policy (choose one of the following, but not both):
hits based
If this is enabled, an item is reinserted if it has been accessed more than the threshold since the last time it was written into block cache. This can approximate an LRU better than the region-based LRU. Typically users configure this with a region-granularity FIFO policy or SFIFO policy. It cannot be enabled when the percentage-based reinsertion policy has been enabled.
navyConfig.blockCache().enableHitsBasedReinsertion(hitsThreshold);
percentage based
This is used for testing, where a certain fraction of evicted items (governed by the percentage) is always reinserted. The percentage value is between 0 and 100. It cannot be enabled when the hits-based reinsertion policy has been enabled.
navyConfig.blockCache().enablePctBasedReinsertion(pctThreshold);
clean regions and in-memory buffer
clean regions (default: 1): How many regions are reserved for future writes. Set this to roughly your per-second write rate (in regions); it should ensure your writes will not have to retry while waiting for a region reclamation to finish.
in-memory buffer (default: 2 * clean regions): All writes first go into a region-sized buffer. Once the buffer is full, the region is flushed to the device. This allows BlockCache to internally pack items closer to each other (saving space) and also improves device read latency (regular-sized write IOs mean better read performance).
clean region threads (default: 1): How many threads to use for reclaim and flush operations. 1 is enough for most use cases.
navyConfig.blockCache().setCleanRegions(cleanRegions, cleanRegionThreads);
region size (default: 16777216, i.e., 16 MB): Region size in bytes.
data checksum (default: true): Controls whether BlockCache verifies that the item's value is correct (matches its checksum). This should always be enabled, unless you are doing your own checksum logic at a higher layer.
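Putting this together, a sketch of a BlockCache configuration might look like the following (the values are illustrative assumptions, not tuned recommendations):
// Hypothetical values: segmented FIFO with three equal segments, hits-based
// reinsertion with a threshold of 1, and extra clean regions to absorb
// write bursts.
navyConfig.blockCache()
    .enableSegmentedFifo({1, 1, 1})
    .enableHitsBasedReinsertion(1)
    .setCleanRegions(4 /* cleanRegions */, 2 /* cleanRegionThreads */)
    .setRegionSize(16 * 1024 * 1024);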
6. Engine Settings - BigHash
navyConfig.bigHash()
    .setSizePctAndMaxItemSize(bigHashSizePct, bigHashSmallItemMaxSize)
    .setBucketSize(bigHashBucketSize)
    .setBucketBfSize(bigHashBucketBfSize);
size percentage (Required): Percentage of space to reserve for BigHash. Set the percentage > 0 to enable BigHash; the remaining space is used by BlockCache. The value has to be in the range [0, 100]. Default value is 0.
small item max size (Required, bytes): Maximum size of a small item to be stored in BigHash. Must be less than the bucket size.
bucket size (default: 4096 bytes): Bucket size in bytes.
bucket bloom filter size (default: 8): Bloom filter size, in bytes per bucket. Must be a power of two. 0 means no bloom filter will be applied.
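For example, a sketch that reserves a small slice of the flash cache for small items (the values are illustrative):
// Hypothetical values: 5% of flash for BigHash, small items up to 640 bytes,
// 4 KB buckets, 8-byte bloom filter per bucket.
navyConfig.bigHash()
    .setSizePctAndMaxItemSize(5, 640)
    .setBucketSize(4096)
    .setBucketBfSize(8);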
NavyConfig Data Output
NavyConfig provides a public function serialize() that users can call to print out the configured Navy settings, e.g.
XLOG(INFO) << "Using the following navy config"
<< folly::toPrettyJson(
folly::toDynamic(navyConfig.serialize()));
Admission
HybridCache can leverage an admission policy to control the burn rate of the underlying NVM devices (e.g., SSD drives). Using a suitable admission policy for your workloads can often improve not only device longevity but also the cache hit rate. You can configure an admission policy like the following example:
auto policy = std::make_shared<cachelib::RejectFirstAP<LruAllocator>>(/* ... */);
cacheConfig.setNvmCacheAdmissionPolicy(std::move(policy));
Random reject
This policy simply rejects P% of inserts, picking victims randomly. It lets the user reduce IOPS (and thus increase flash lifetime).
Reject first
This policy helps if the flash cache contains lots of items that are inserted and never accessed. It maintains a running window (sketch) of keys that were accessed. If a key is inserted for the first time, the policy rejects it; the second insert gets into the cache. A sketch consists of several splits. As time goes on, old splits are discarded. With a larger split, rejection gets more accurate (fewer false accepts).
Dynamic reject (or rate throttle)
This is a smarter random reject policy. Users specify the maximum size of data that can be written to the device per day. The policy monitors write traffic, and as it grows beyond the target (how much can be written up to this time of the day), it starts randomly rejecting inserts. It prefers to reject larger items to improve the hit ratio. This behavior is tunable to let users control flash wear-out.
ML-based admission policy
CacheLib also supports using an ML-based admission policy to make intelligent decisions about what to admit into NVM devices. However, the use of an ML policy requires careful analysis of cache workloads and setting up a training pipeline to train the model on a continuous basis. Please try out the other admission policies first; if you are not satisfied with them, reach out directly to the CacheLib team to discuss using an ML-based policy.