External Disks for Storing Data
Data processed in ClickHouse is usually stored in the local file system of the machine that runs the ClickHouse server. That requires large-capacity disks, which can be expensive. To avoid that, you can store the data remotely. The following storage types are supported:
- Amazon S3 object storage.
- Azure Blob Storage.
- Unsupported: The Hadoop Distributed File System (HDFS)
This page describes storage configuration for MergeTree family and Log family tables. External table engines are a separate feature:
- To work with data stored on Amazon S3, use the S3 table engine.
- To work with data stored in Azure Blob Storage, use the AzureBlobStorage table engine.
- Unsupported: to work with data in the Hadoop Distributed File System, use the HDFS table engine.
Configuring external storage
MergeTree and Log family table engines can store data in S3, AzureBlobStorage and HDFS (unsupported) using a disk of type s3, azure_blob_storage and hdfs (unsupported), respectively.
Disk configuration requires:
- type section, equal to one of s3, azure_blob_storage, hdfs (unsupported), local_blob_storage, web.
- Configuration of a specific external storage type.
Starting from ClickHouse version 24.1, a new configuration option is available. It requires specifying:
- type equal to object_storage
- object_storage_type, equal to one of s3, azure_blob_storage (or just azure from 24.3), hdfs (unsupported), local_blob_storage (or just local from 24.3), web.
Optionally, metadata_type can be specified. It is local by default, but it can also be set to plain, web and, starting from 24.4, plain_rewritable. Usage of the plain metadata type is described in the plain storage section. The web metadata type can be used only with the web object storage type. The local metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage and some additional meta information about them).
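For example, a disk declared with type s3 is equivalent, from 24.1, to a disk declared with type object_storage, object_storage_type s3 and metadata_type local.
Likewise, a disk declared with type s3_plain is equivalent to a disk declared with type object_storage, object_storage_type s3 and metadata_type plain.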
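The equivalence between the legacy and the 24.1 disk syntax can be sketched as follows. This is a minimal illustration, not the page's original example: the disk name s3 and the endpoint (bucket example-bucket) are placeholders chosen for this sketch.

```xml
<!-- Legacy form: the disk type names the storage directly. -->
<!-- The endpoint below is a hypothetical placeholder. -->
<s3>
    <type>s3</type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3>

<!-- Equivalent form from 24.1: generic type plus explicit storage and metadata types. -->
<s3>
    <type>object_storage</type>
    <object_storage_type>s3</object_storage_type>
    <metadata_type>local</metadata_type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3>
```

The two elements are alternatives for the same disk, so only one of them would appear inside the disks section of a real configuration.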
A full storage configuration looks like this:
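A minimal sketch of a full storage configuration, assuming a disk and a storage policy that are both named s3 and a placeholder endpoint (none of these values come from the text above):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- Hypothetical disk name and endpoint. -->
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <!-- A policy with a single volume that points at the disk above. -->
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```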
Starting with ClickHouse version 24.1, it can also look like:
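A sketch of the same kind of configuration in the 24.1 syntax, again assuming a hypothetical disk and policy named s3 and a placeholder endpoint:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- Hypothetical disk name and endpoint. -->
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```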
To make a specific kind of storage the default for all MergeTree tables, add the following section to the configuration file:
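A sketch of such a section, assuming a storage policy named s3 has been defined under storage_configuration (the policy name is a placeholder):

```xml
<clickhouse>
    <merge_tree>
        <!-- Name of a policy declared in storage_configuration/policies. -->
        <storage_policy>s3</storage_policy>
    </merge_tree>
</clickhouse>
```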
If you want to apply a storage policy only to a specific table, you can define it in the settings when creating the table:
You can also use disk instead of storage_policy. In this case a storage_policy section in the configuration file is not required; a disk section is enough.
Dynamic Configuration
Storage can also be configured without a disk predefined in a configuration file: the disk can be defined in the settings of a CREATE or ATTACH query.
The following example query builds on the above dynamic disk configuration and shows how to use a local disk to cache data from a table stored at a URL.
The example below adds cache to external storage.
In the settings highlighted below, notice that the disk of type=web is nested within the disk of type=cache.
The example uses type=web, but any disk type can be configured as dynamic, even a local disk. Local disks require the path argument to be inside the directory given by the server config parameter custom_local_disks_base_directory. That parameter has no default, so set it as well when using a local disk.
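A sketch of setting that parameter in the server configuration. The directory shown is a hypothetical choice, and placing the element directly under the clickhouse root is an assumption of this sketch:

```xml
<clickhouse>
    <!-- Dynamic local disks must use a path inside this directory (hypothetical value). -->
    <custom_local_disks_base_directory>/var/lib/clickhouse/disks/</custom_local_disks_base_directory>
</clickhouse>
```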
A combination of config-based configuration and sql-defined configuration is also possible:
where web is a disk from a server configuration file:
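A minimal sketch of such a server-side declaration. The endpoint is a placeholder, not the URL used in the original example:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- The disk name "web" is what the SQL-defined disk refers to. -->
            <web>
                <type>web</type>
                <endpoint>https://example.com/web-tables/</endpoint>
            </web>
        </disks>
    </storage_configuration>
</clickhouse>
```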
Using S3 Storage
Required parameters:
- endpoint - S3 endpoint URL in path or virtual hosted style. The endpoint URL should contain a bucket and a root path to store data.
- access_key_id - S3 access key id.
- secret_access_key - S3 secret access key.
Optional parameters:
- region - S3 region name.
- support_batch_delete - Controls the check for whether batch deletes are supported. Set this to false when using Google Cloud Storage (GCS): GCS does not support batch deletes, and skipping the check prevents error messages in the logs.
- use_environment_credentials - Reads AWS credentials from the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN if they exist. Default value is false.
- use_insecure_imds_request - If set to true, the S3 client uses an insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is false.
- expiration_window_seconds - Grace period for checking whether expiration-based credentials have expired. Optional, default value is 120.
- proxy - Proxy configuration for the S3 endpoint. Each uri element inside the proxy block should contain a proxy URL.
- connect_timeout_ms - Socket connect timeout in milliseconds. Default value is 10 seconds.
- request_timeout_ms - Request timeout in milliseconds. Default value is 5 seconds.
- retry_attempts - Number of retry attempts in case of a failed request. Default value is 10.
- single_read_retries - Number of retry attempts in case of a connection drop during read. Default value is 4.
- min_bytes_for_seek - Minimal number of bytes to use a seek operation instead of a sequential read. Default value is 1 Mb.
- metadata_path - Path on the local FS to store metadata files for S3. Default value is /var/lib/clickhouse/disks/<disk_name>/.
- skip_access_check - If true, disk access checks are not performed on disk start-up. Default value is false.
- header - Adds the specified HTTP header to a request to the given endpoint. Optional, can be specified multiple times.
- server_side_encryption_customer_key_base64 - If specified, the required headers for accessing S3 objects with SSE-C encryption are set.
- server_side_encryption_kms_key_id - If specified, the required headers for accessing S3 objects with SSE-KMS encryption are set. If an empty string is specified, the AWS managed S3 key is used. Optional.
- server_side_encryption_kms_encryption_context - If specified alongside server_side_encryption_kms_key_id, the given encryption context header for SSE-KMS is set. Optional.
- server_side_encryption_kms_bucket_key_enabled - If specified alongside server_side_encryption_kms_key_id, the header to enable S3 bucket keys for SSE-KMS is set. Optional, can be true or false, defaults to nothing (matches the bucket-level setting).
- s3_max_put_rps - Maximum PUT requests per second rate before throttling. Default value is 0 (unlimited).
- s3_max_put_burst - Maximum number of requests that can be issued simultaneously before hitting the requests per second limit. By default (0 value) equals s3_max_put_rps.
- s3_max_get_rps - Maximum GET requests per second rate before throttling. Default value is 0 (unlimited).
- s3_max_get_burst - Maximum number of requests that can be issued simultaneously before hitting the requests per second limit. By default (0 value) equals s3_max_get_rps.
- read_resource - Resource name to be used for scheduling of read requests to this disk. Default value is an empty string (IO scheduling is not enabled for this disk).
- write_resource - Resource name to be used for scheduling of write requests to this disk. Default value is an empty string (IO scheduling is not enabled for this disk).
- key_template - Defines the format in which object keys are generated. By default, ClickHouse takes the root path from the endpoint option and adds a randomly generated suffix. That suffix is a directory with 3 random symbols and a file name with 29 random symbols. With this option you have full control over how the object keys are generated. Some usage scenarios require random symbols in the prefix or in the middle of the object key, for example: [a-z]{3}-prefix-random/constant-part/random-middle-[a-z]{3}/random-suffix-[a-z]{29}. The value is parsed with re2, and only a subset of the syntax is supported, so check that your preferred format is supported before using this option. The disk is not initialized if ClickHouse is unable to generate a key from the value of key_template. This option requires the feature flag storage_metadata_write_full_object_key to be enabled, forbids declaring the root path in the endpoint option, and requires the option key_compatibility_prefix to be defined.
- key_compatibility_prefix - Required when the option key_template is in use. To be able to read object keys that were stored in metadata files with a metadata version lower than VERSION_FULL_OBJECT_KEY, the previous root path from the endpoint option should be set here.
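To show how the required and some of the optional parameters above fit together, here is a sketch of an S3 disk declaration. The disk name, endpoint, credentials, region, header value and proxy URLs are all placeholders invented for this illustration; the numeric values simply restate the defaults listed above.

```xml
<storage_configuration>
    <disks>
        <s3>
            <type>s3</type>
            <!-- Required parameters (placeholder values). -->
            <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
            <!-- Optional parameters. -->
            <region>eu-west-1</region>
            <header>Authorization: Bearer SOME-TOKEN</header>
            <proxy>
                <uri>http://proxy1</uri>
                <uri>http://proxy2</uri>
            </proxy>
            <connect_timeout_ms>10000</connect_timeout_ms>
            <request_timeout_ms>5000</request_timeout_ms>
            <retry_attempts>10</retry_attempts>
            <single_read_retries>4</single_read_retries>
            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
            <skip_access_check>false</skip_access_check>
        </s3>
    </disks>
</storage_configuration>
```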
Google Cloud Storage (GCS) is also supported using the type s3. See GCS backed MergeTree.
Using Plain Storage
In 22.10 a new disk type s3_plain was introduced, which provides write-once storage. Configuration parameters are the same as for the s3 disk type.
Unlike the s3 disk type, it stores data as is: instead of randomly generated blob names, it uses normal file names (the same way ClickHouse stores files on a local disk), and it does not store any metadata locally, i.e. the metadata is derived from the data on S3.
This disk type allows keeping a static version of the table, as it does not allow executing merges on the existing data and does not allow inserting new data.
A use case for this disk type is to create backups on it, which can be done via BACKUP TABLE data TO Disk('plain_disk_name', 'backup_name'). Afterwards you can do RESTORE TABLE data AS data_restored FROM Disk('plain_disk_name', 'backup_name') or use ATTACH TABLE data (...) ENGINE = MergeTree() SETTINGS disk = 'plain_disk_name'.
Configuration:
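A minimal sketch of an s3_plain disk, assuming a disk named s3_plain and a placeholder endpoint (neither comes from the text above):

```xml
<s3_plain>
    <type>s3_plain</type>
    <!-- Hypothetical endpoint; parameters are the same as for the s3 disk type. -->
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain>
```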
Starting from 24.1 it is possible to configure any object storage disk (s3, azure, hdfs (unsupported), local) using the plain metadata type.
Configuration:
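A sketch of the same disk expressed with the plain metadata type, using the object_storage syntax described earlier. The disk name and endpoint are placeholders:

```xml
<s3_plain>
    <type>object_storage</type>
    <object_storage_type>s3</object_storage_type>
    <!-- plain: no local metadata files, normal file names in the bucket. -->
    <metadata_type>plain</metadata_type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain>
```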
Using S3 Plain Rewritable Storage
A new disk type s3_plain_rewritable was introduced in 24.4.
Similar to the s3_plain disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3.
Unlike s3_plain disk type, s3_plain_rewritable allows executing merges and supports INSERT operations.
Mutations and replication of tables are not supported.
A use case for this disk type is non-replicated MergeTree tables. Although the s3 disk type is suitable for non-replicated MergeTree tables, you may opt for the s3_plain_rewritable disk type if you do not require local metadata for the table and are willing to accept a limited set of operations. This could be useful, for example, for system tables.
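Configuration: a disk declared with type s3_plain_rewritable is equivalent to a disk declared with type object_storage, object_storage_type s3 and metadata_type plain_rewritable.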
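The two equivalent declarations can be sketched as follows, assuming a disk named s3_plain_rewritable and a placeholder endpoint:

```xml
<!-- Dedicated disk type (endpoint is a hypothetical placeholder). -->
<s3_plain_rewritable>
    <type>s3_plain_rewritable</type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>

<!-- Equivalent generic form. -->
<s3_plain_rewritable>
    <type>object_storage</type>
    <object_storage_type>s3</object_storage_type>
    <metadata_type>plain_rewritable</metadata_type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```

As with the other equivalence sketches, the two elements are alternatives, and only one would appear in a real disks section.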
Starting from 24.5 it is possible to configure any object storage disk (s3, azure, local) using the plain_rewritable metadata type.
Using Azure Blob Storage
MergeTree family table engines can store data to Azure Blob Storage using a disk with type azure_blob_storage.
Configuration markup:
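A sketch of an Azure Blob Storage disk using the parameters documented in this section. The disk name blob_storage_disk, the container name and the account credentials are placeholders for illustration only:

```xml
<storage_configuration>
    <disks>
        <blob_storage_disk>
            <type>azure_blob_storage</type>
            <!-- Connection parameters. -->
            <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
            <container_name>container</container_name>
            <!-- Shared Key authentication (placeholder credentials). -->
            <account_name>account</account_name>
            <account_key>your_account_key</account_key>
            <!-- Other parameters. -->
            <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
            <skip_access_check>false</skip_access_check>
        </blob_storage_disk>
    </disks>
</storage_configuration>
```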
Connection parameters:
- storage_account_url - Required, Azure Blob Storage account URL, like http://account.blob.core.windows.net or http://azurite1:10000/devstoreaccount1.
- container_name - Target container name, defaults to default-container.
- container_already_exists - If set to false, a new container container_name is created in the storage account. If set to true, the disk connects to the container directly. If left unset, the disk connects to the account, checks whether the container container_name exists, and creates it if it doesn't exist yet.
Authentication parameters (the disk will try all available methods and Managed Identity Credential):
- connection_string - For authentication using a connection string.
- account_name and account_key - For authentication using Shared Key.
Limit parameters (mainly for internal usage):
- s3_max_single_part_upload_size - Limits the size of a single block upload to Blob Storage.
- min_bytes_for_seek - Limits the size of a seekable region.
- max_single_read_retries - Limits the number of attempts to read a chunk of data from Blob Storage.
- max_single_download_retries - Limits the number of attempts to download a readable buffer from Blob Storage.
- thread_pool_size - Limits the number of threads with which IDiskRemote is instantiated.
- s3_max_inflight_parts_for_one_file - Limits the number of put requests that can be run concurrently for one object.
Other parameters:
- metadata_path - Path on the local FS to store metadata files for Blob Storage. Default value is /var/lib/clickhouse/disks/<disk_name>/.
- skip_access_check - If true, disk access checks are not performed on disk start-up. Default value is false.
- read_resource - Resource name to be used for scheduling of read requests to this disk. Default value is an empty string (IO scheduling is not enabled for this disk).
- write_resource - Resource name to be used for scheduling of write requests to this disk. Default value is an empty string (IO scheduling is not enabled for this disk).
- metadata_keep_free_space_bytes - The amount of free metadata disk space to be reserved.
Examples of working configurations can be found in integration tests directory (see e.g. test_merge_tree_azure_blob_storage or test_azure_blob_storage_zero_copy_replication).
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
Using HDFS storage (Unsupported)
In this sample configuration:
- the disk is of type hdfs (unsupported)
- the data is hosted at hdfs://hdfs1:9000/clickhouse/
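A sketch of a configuration matching the two points above. The disk and policy name hdfs is an assumption of this sketch; the endpoint is the one stated in the list:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- Hypothetical disk name; HDFS disks are unsupported. -->
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
            </hdfs>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>
</clickhouse>
```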
By the way, HDFS is unsupported and therefore there might be issues when using it. Feel free to make a pull request with the fix if any issue arises.
Keep in mind that HDFS may not work in corner cases.
Using Data Encryption
You can encrypt the data stored on S3, or HDFS (unsupported) external disks, or on a local disk. To turn on the encryption mode, in the configuration file you must define a disk with the type encrypted and choose a disk on which the data will be saved. An encrypted disk ciphers all written files on the fly, and when you read files from an encrypted disk it deciphers them automatically. So you can work with an encrypted disk like with a normal one.
Example of disk configuration:
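A sketch consistent with the disk names and paths discussed in this section: disk1 is a local disk rooted at /path1/, and disk2 is an encrypted disk layered on disk1 under path2/. The key value is a 16-character placeholder and must not be used in practice:

```xml
<disks>
    <disk1>
        <type>local</type>
        <path>/path1/</path>
    </disk1>
    <disk2>
        <type>encrypted</type>
        <!-- Underlying disk on which the encrypted files are stored. -->
        <disk>disk1</disk>
        <!-- Relative to the root of disk1, giving /path1/path2/. -->
        <path>path2/</path>
        <!-- Placeholder 16-byte key for the default AES_128_CTR algorithm. -->
        <key>_16_ascii_chars_</key>
    </disk2>
</disks>
```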
For example, when ClickHouse writes data from some table to a file store/all_1_1_0/data.bin to disk1, then in fact this file will be written to the physical disk along the path /path1/store/all_1_1_0/data.bin.
When writing the same file to disk2, it will actually be written to the physical disk at the path /path1/path2/store/all_1_1_0/data.bin in encrypted mode.
Required parameters:
- type - encrypted. Otherwise the encrypted disk is not created.
- disk - Type of disk for data storage.
- key - The key for encryption and decryption. Type: Uint64. You can use the key_hex parameter to encode the key in hexadecimal form. You can specify multiple keys using the id attribute (see example below).
Optional parameters:
- path - Path to the location on the disk where the data will be saved. If not specified, the data is saved in the root directory.
- current_key_id - The key used for encryption. All the specified keys can be used for decryption, and you can always switch to another key while maintaining access to previously encrypted data.
- algorithm - Algorithm for encryption. Possible values: AES_128_CTR, AES_192_CTR or AES_256_CTR. Default value: AES_128_CTR. The key length depends on the algorithm: AES_128_CTR requires 16 bytes, AES_192_CTR requires 24 bytes, AES_256_CTR requires 32 bytes.
Example of disk configuration:
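A sketch of an encrypted disk over S3 with several keys, illustrating key_hex, the id attribute and current_key_id. The disk names, the endpoint and both key values are placeholders made up for this illustration:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3>
            <disk_s3_encrypted>
                <type>encrypted</type>
                <disk>disk_s3</disk>
                <algorithm>AES_128_CTR</algorithm>
                <!-- Two 16-byte keys in hexadecimal form (placeholder values). -->
                <key_hex id="0">00112233445566778899aabbccddeeff</key_hex>
                <key_hex id="1">ffeeddccbbaa99887766554433221100</key_hex>
                <!-- New data is encrypted with key 1; both keys can decrypt. -->
                <current_key_id>1</current_key_id>
            </disk_s3_encrypted>
        </disks>
    </storage_configuration>
</clickhouse>
```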
Using local cache
It is possible to configure local cache over disks in storage configuration starting from version 22.3.
For versions 22.3 - 22.7 cache is supported only for s3 disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc.
For versions >= 23.5 cache is supported only for remote disk types: S3, Azure, HDFS (unsupported).
Cache uses LRU cache policy.
Example of configuration for versions 22.8 or later:
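A sketch of a cache disk layered over an S3 disk, using the cache settings documented in this section. The disk names, endpoint, cache path and policy name are assumptions of this sketch:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
            <cache>
                <type>cache</type>
                <!-- The disk whose data is cached. -->
                <disk>s3</disk>
                <!-- Obligatory settings: cache directory and maximum size in bytes (10 GiB). -->
                <path>/s3_cache/</path>
                <max_size>10737418240</max_size>
            </cache>
        </disks>
        <policies>
            <s3_cache>
                <volumes>
                    <main>
                        <disk>cache</disk>
                    </main>
                </volumes>
            </s3_cache>
        </policies>
    </storage_configuration>
</clickhouse>
```

Tables that use the s3_cache policy read and write through the cache disk rather than addressing the s3 disk directly.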
Example of configuration for versions earlier than 22.8:
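A sketch of the older style, in which the cache is switched on inside the s3 disk itself. The parameter names data_cache_enabled and data_cache_max_size do not appear in the text above; they are given here as an assumption about the pre-22.8 syntax and should be checked against the documentation of the version in use. The endpoint is a placeholder.

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/example-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
                <!-- Assumed pre-22.8 cache switches; verify for your version. -->
                <data_cache_enabled>1</data_cache_enabled>
                <data_cache_max_size>10737418240</data_cache_max_size>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```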
File Cache disk configuration settings:
These settings should be defined in the disk configuration section.
- path - path to the directory with cache. Default: None, this setting is obligatory.
- max_size - maximum size of the cache in bytes or in readable format, e.g. ki, Mi, Gi, etc., for example 10Gi (this format works starting from version 22.10). When the limit is reached, cache files are evicted according to the cache eviction policy. Default: None, this setting is obligatory.
- cache_on_write_operations - allows turning on the write-through cache (caching data on any write operation: INSERT queries, background merges). Default: false. The write-through cache can be disabled per query using the setting enable_filesystem_cache_on_write_operations (data is cached only if both the cache config setting and the corresponding query setting are enabled).
- enable_filesystem_query_cache_limit - allows limiting the size of the cache which is downloaded within each query (depends on the user setting max_query_cache_size). Default: false.
- enable_cache_hits_threshold - when enabled, data is cached only after it has been read a given number of times. Default: false. The threshold is set by cache_hits_threshold. Default: 0, i.e. the data is cached at the first attempt to read it.
- enable_bypass_cache_with_threshold - allows skipping the cache completely when the requested read range exceeds the threshold. Default: false. The threshold is set by bypass_cache_threashold. Default: 268435456 (256Mi).
- max_file_segment_size - maximum size of a single cache file in bytes or in readable format (ki, Mi, Gi, etc., for example 10Gi). Default: 8388608 (8Mi).
- max_elements - a limit on the number of cache files. Default: 10000000.
- load_metadata_threads - number of threads used to load cache metadata at startup. Default: 16.
File Cache query/profile settings:
Some of these settings disable, per query or profile, cache features that are enabled by default or in the disk configuration settings. For example, you can enable the cache in the disk configuration and disable it for a query or profile by setting enable_filesystem_cache to false. Similarly, setting cache_on_write_operations to true in the disk configuration enables the write-through cache, and setting enable_filesystem_cache_on_write_operations to false disables it for a specific query or profile.
- enable_filesystem_cache - allows disabling the cache per query even if the storage policy was configured with the cache disk type. Default: true.
- read_from_filesystem_cache_if_exists_otherwise_bypass_cache - allows using the cache in a query only if it already exists; otherwise query data is not written to local cache storage. Default: false.
- enable_filesystem_cache_on_write_operations - turns on the write-through cache. This setting works only if the setting cache_on_write_operations in the cache configuration is turned on. Default: false. Cloud default value: true.
- enable_filesystem_cache_log - turns on logging to the system.filesystem_cache_log table. Gives a detailed view of cache usage per query. It can be turned on for specific queries or enabled in a profile. Default: false.
- max_query_cache_size - a limit on the cache size which can be written to local cache storage. Requires enable_filesystem_query_cache_limit to be enabled in the cache configuration. Default: false.
- skip_download_if_exceeds_query_cache - allows changing the behaviour of the setting max_query_cache_size. Default: true. If this setting is turned on and the cache download limit was reached during a query, no more data is downloaded to cache storage. If this setting is turned off and the cache download limit was reached during a query, the cache is still written at the cost of evicting data previously downloaded within the current query, i.e. the second behaviour preserves the last recently used behaviour while keeping the query cache limit.
Warning: Cache configuration settings and cache query settings correspond to the latest ClickHouse version; some of them might not be supported in earlier versions.
Cache system tables:
- system.filesystem_cache - system table which shows the current state of the cache.
- system.filesystem_cache_log - system table which shows detailed cache usage per query. Requires the enable_filesystem_cache_log setting to be true.
Cache commands:
- SYSTEM DROP FILESYSTEM CACHE (<cache_name>) (ON CLUSTER) -- ON CLUSTER is only supported when no <cache_name> is provided
- SHOW FILESYSTEM CACHES -- show the list of filesystem caches which were configured on the server. (For versions less than or equal to 22.8 the command is named SHOW CACHES)
Result:
- DESCRIBE FILESYSTEM CACHE '<cache_name>' -- show the cache configuration and some general statistics for a specific cache. The cache name can be taken from the SHOW FILESYSTEM CACHES command. (For versions less than or equal to 22.8 the command is named DESCRIBE CACHE)
Cache current metrics:
- FilesystemCacheSize
- FilesystemCacheElements
Cache asynchronous metrics:
- FilesystemCacheBytes
- FilesystemCacheFiles
Cache profile events:
- CachedReadBufferReadFromSourceBytes, CachedReadBufferReadFromCacheBytes
- CachedReadBufferReadFromSourceMicroseconds, CachedReadBufferReadFromCacheMicroseconds
- CachedReadBufferCacheWriteBytes, CachedReadBufferCacheWriteMicroseconds
- CachedWriteBufferCacheWriteBytes, CachedWriteBufferCacheWriteMicroseconds
Using static Web storage (read-only)
This is a read-only disk. Its data is only read and never modified. A new table is loaded to this disk via an ATTACH TABLE query (see example below). The local disk is not actually used; each SELECT query results in an HTTP request to fetch the required data. Any modification of the table data results in an exception, i.e. the following types of queries are not allowed: CREATE TABLE, ALTER TABLE, RENAME TABLE, DETACH TABLE and TRUNCATE TABLE.
Web storage can be used for read-only purposes. An example use is for hosting sample data, or for migrating data.
There is a tool clickhouse-static-files-uploader, which prepares a data directory for a given table (SELECT data_paths FROM system.tables WHERE name = 'table_name'). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via DiskWeb.
In this sample configuration:
- the disk is of type web
- the data is hosted at http://nginx:80/test1/
- a cache on local storage is used
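A sketch of a configuration matching the three points above. The endpoint is the one stated in the list; the disk names, policy names, cache path and cache size are assumptions of this sketch:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>http://nginx:80/test1/</endpoint>
            </web>
            <!-- Local cache layered over the web disk (hypothetical path and size). -->
            <cached_web>
                <type>cache</type>
                <disk>web</disk>
                <path>cached_web_cache/</path>
                <max_size>100000000</max_size>
            </cached_web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
            <cached_web>
                <volumes>
                    <main>
                        <disk>cached_web</disk>
                    </main>
                </volumes>
            </cached_web>
        </policies>
    </storage_configuration>
</clickhouse>
```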
If a web dataset is not expected to be used routinely, storage can also be configured temporarily within a query, which avoids editing the configuration file; see dynamic configuration.
A demo dataset is hosted in GitHub. To prepare your own tables for web storage, see the tool clickhouse-static-files-uploader.
In this ATTACH TABLE query the UUID provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.
A ready test case. You need to add this configuration to config:
And then execute this query:
Required parameters:
- type - web. Otherwise the disk is not created.
- endpoint - The endpoint URL in path format. The endpoint URL must contain the root path where the data was uploaded.
Optional parameters:
- min_bytes_for_seek - The minimal number of bytes to use a seek operation instead of a sequential read. Default value: 1 Mb.
- remote_fs_read_backoff_threashold - The maximum wait time when trying to read data for a remote disk. Default value: 10000 seconds.
- remote_fs_read_backoff_max_tries - The maximum number of attempts to read with backoff. Default value: 5.
If a query fails with an exception DB:Exception Unreachable URL, then you can try to adjust the settings: http_connection_timeout, http_receive_timeout, keep_alive_timeout.
To get files for upload run:
clickhouse static-files-disk-uploader --metadata-path <path> --output-dir <dir> (--metadata-path can be found in query SELECT data_paths FROM system.tables WHERE name = 'table_name').
When loading files by endpoint, they must be loaded into <endpoint>/store/ path, but config must contain only endpoint.
If URL is not reachable on disk load when the server is starting up tables, then all errors are caught. If in this case there were errors, tables can be reloaded (become visible) via DETACH TABLE table_name -> ATTACH TABLE table_name. If metadata was successfully loaded at server startup, then tables are available straight away.
Use http_max_single_read_retries setting to limit the maximum number of retries during a single HTTP read.
Zero-copy Replication (not ready for production)
Zero-copy replication is possible, but not recommended, with S3 and HDFS (unsupported) disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata (paths to the data parts) is replicated, not the data itself.
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
