@@ -1651,8 +1651,8 @@ public class ConfigOptions {
                     .enumType(DataLakeFormat.class)
                     .noDefaultValue()
                     .withDescription(
-                            "The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, and `lance`. "
-                                    + "In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. "
+                            "The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, `hudi`, and `lance`. "
+                                    + "In the future, more kinds of data lake format will be supported, such as DeltaLake. "
                                     + "Once the `table.datalake.format` property is configured, Fluss adopts the key encoding and bucketing strategy used by the corresponding data lake format. "
                                     + "This ensures consistency in key encoding and bucketing, enabling seamless **Union Read** functionality across Fluss and Lakehouse. "
                                     + "The `table.datalake.format` can be pre-defined before enabling `table.datalake.enabled`. This allows the data lake feature to be dynamically enabled on the table without requiring table recreation. "
@@ -2250,8 +2250,8 @@ public class ConfigOptions {
                     .enumType(DataLakeFormat.class)
                     .noDefaultValue()
                     .withDescription(
-                            "The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. "
-                                    + "In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi.");
+                            "The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, Hudi, and Lance. "
+                                    + "In the future, more kinds of data lake format will be supported, such as DeltaLake.");
 
     // ------------------------------------------------------------------------
     //  ConfigOptions for tiering service
2 changes: 1 addition & 1 deletion website/docs/engine-flink/options.md
@@ -83,7 +83,7 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties)
| table.kv.standby-replica.enabled | Boolean | (None) | Whether to enable standby replicas for primary key tables. Standby replicas maintain recent KV snapshots for fast leader promotion. Automatically set to `true` by the coordinator during table creation for new PK tables. Tables created before this option was introduced are treated as disabled. Can be dynamically enabled via `ALTER TABLE SET ('table.kv.standby-replica.enabled' = 'true')`. |
| table.log.tiered.local-segments | Integer | 2 | The number of log segments to retain locally for each table when log tiered storage is enabled. It must be greater than 0. The default is 2. |
| table.datalake.enabled | Boolean | false | Whether to enable lakehouse storage for the table. Disabled by default. When this option is set to true and the datalake tiering service is up, the table will be tiered and compacted into datalake format stored on lakehouse storage. |
-| table.datalake.format | Enum | (None) | The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, and `lance`. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. Once the `table.datalake.format` property is configured, Fluss adopts the key encoding and bucketing strategy used by the corresponding data lake format. This ensures consistency in key encoding and bucketing, enabling seamless **Union Read** functionality across Fluss and Lakehouse. The `table.datalake.format` can be pre-defined before enabling `table.datalake.enabled`. This allows the data lake feature to be dynamically enabled on the table without requiring table recreation. If `table.datalake.format` is not explicitly set during table creation, the table will default to the format specified by the `datalake.format` configuration in the Fluss cluster. |
+| table.datalake.format | Enum | (None) | The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, `hudi`, and `lance`. In the future, more kinds of data lake format will be supported, such as DeltaLake. Once the `table.datalake.format` property is configured, Fluss adopts the key encoding and bucketing strategy used by the corresponding data lake format. This ensures consistency in key encoding and bucketing, enabling seamless **Union Read** functionality across Fluss and Lakehouse. The `table.datalake.format` can be pre-defined before enabling `table.datalake.enabled`. This allows the data lake feature to be dynamically enabled on the table without requiring table recreation. If `table.datalake.format` is not explicitly set during table creation, the table will default to the format specified by the `datalake.format` configuration in the Fluss cluster. |
| table.datalake.freshness | Duration | 3min | It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table. Based on this target freshness, the Fluss service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target. If the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs. |
| table.datalake.auto-compaction | Boolean | false | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
| table.datalake.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake. It is disabled by default. |
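
Reviewer note: the `table.datalake.format` row in this hunk says that a table without an explicit format falls back to the cluster-level `datalake.format`. The sketch below only illustrates that documented fallback, now with `hudi` as an accepted value. The class `LakeFormatResolution`, its `Format` enum, and the `resolve` method are hypothetical stand-ins written for this note; they are not Fluss's `DataLakeFormat` or any Fluss API.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Fluss.
final class LakeFormatResolution {

    // Stand-in for the formats named in the option description after this change.
    enum Format { PAIMON, ICEBERG, HUDI, LANCE }

    // Returns the table's explicit format if set, otherwise the cluster default.
    // Throws IllegalArgumentException (from Enum.valueOf) for an unknown format name.
    static Optional<Format> resolve(
            Map<String, String> tableProperties, Optional<Format> clusterDefault) {
        String explicit = tableProperties.get("table.datalake.format");
        if (explicit != null) {
            return Optional.of(Format.valueOf(explicit.toUpperCase(Locale.ROOT)));
        }
        return clusterDefault;
    }

    public static void main(String[] args) {
        Map<String, String> withHudi =
                Collections.singletonMap("table.datalake.format", "hudi");
        // The explicit table property wins over the cluster default.
        System.out.println(resolve(withHudi, Optional.of(Format.PAIMON)));
        // No table property: fall back to the cluster default.
        System.out.println(resolve(Collections.emptyMap(), Optional.of(Format.ICEBERG)));
    }
}
```
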
2 changes: 1 addition & 1 deletion website/docs/maintenance/configuration.md
@@ -203,7 +203,7 @@ More metrics example could be found in [Observability - Metric Reporters](observ
| Option | Type | Default | Description |
|------------------|---------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| datalake.enabled | Boolean | (None) | Whether the Fluss cluster is ready to create and manage lakehouse tables. If unset, Fluss keeps the legacy behavior where configuring `datalake.format` also enables lakehouse tables. If set to `false`, Fluss pre-binds the lake format for newly created tables but does not allow lakehouse tables yet. If set to `true`, Fluss fully enables lakehouse tables. When this option is explicitly configured to true, `datalake.format` must also be configured. |
-| datalake.format  | Enum    | (None)  | The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |
+| datalake.format  | Enum    | (None)  | The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, Hudi, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake. |
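
Reviewer note: the `datalake.enabled` row above describes three cases (unset, `false`, `true`) that interact with `datalake.format`. A minimal sketch of that decision table follows, assuming nothing beyond the wording of the row. `ClusterLakeConfig`, `Mode`, and `resolve` are hypothetical names invented for this note, and treating `false` with no format as plainly disabled is an assumption, since the row does not cover that case.

```java
import java.util.Optional;

// Hypothetical illustration of the documented option semantics; not Fluss code.
final class ClusterLakeConfig {

    enum Mode {
        DISABLED,        // no lakehouse tables, no format bound
        FORMAT_PREBOUND, // format is bound for new tables, lakehouse tables not allowed yet
        ENABLED          // lakehouse tables fully enabled
    }

    static Mode resolve(Optional<Boolean> enabled, Optional<String> format) {
        if (!enabled.isPresent()) {
            // Legacy behavior: configuring datalake.format also enables lakehouse tables.
            return format.isPresent() ? Mode.ENABLED : Mode.DISABLED;
        }
        if (enabled.get()) {
            // Explicit true requires datalake.format to be configured as well.
            if (!format.isPresent()) {
                throw new IllegalArgumentException(
                        "datalake.format must be set when datalake.enabled is true");
            }
            return Mode.ENABLED;
        }
        // Explicit false: pre-bind the format if one is given (assumption: otherwise disabled).
        return format.isPresent() ? Mode.FORMAT_PREBOUND : Mode.DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.empty(), Optional.of("hudi")));
        System.out.println(resolve(Optional.of(false), Optional.of("hudi")));
        System.out.println(resolve(Optional.of(true), Optional.of("hudi")));
    }
}
```
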

## Kafka

6 changes: 3 additions & 3 deletions website/docs/maintenance/tiered-storage/lakehouse-storage.md
@@ -8,8 +8,8 @@ sidebar_position: 3
Lakehouse represents a new, open architecture that combines the best elements of data lakes and data warehouses.
Lakehouse combines data lake scalability and cost-effectiveness with data warehouse reliability and performance.

-Fluss leverages the well-known Lakehouse storage solutions like Apache Paimon, Apache Iceberg, Apache Hudi, Delta Lake as
-the tiered storage layer. Currently, only Apache Paimon, Apache Iceberg, Lance are supported, with more kinds of Lakehouse storage support are on the way.
+Fluss leverages the well-known Lakehouse storage solutions like Apache Paimon, Apache Iceberg, Apache Hudi, and Delta Lake as
+the tiered storage layer. Currently, Apache Paimon, Apache Iceberg, Apache Hudi, and Lance are supported, with more kinds of Lakehouse storage support on the way.

Fluss's datalake tiering service will tier Fluss's data to the Lakehouse storage continuously. The data in Lakehouse storage can be read both by Fluss's client in a streaming manner and accessed directly
by external systems such as Flink, Spark, StarRocks and others. With data tiered in Lakehouse storage, Fluss
@@ -134,4 +134,4 @@ The following table lists the options that can be used to configure the datalake

| Option | Type | Default | Description |
|-----------------------------------------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| lake.tiering.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, even if `table.datalake.auto-expire-snapshot` is false. |
+| lake.tiering.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, even if `table.datalake.auto-expire-snapshot` is false. |
2 changes: 1 addition & 1 deletion website/docs/maintenance/tiered-storage/overview.md
@@ -14,7 +14,7 @@ Fluss organizes data into different storage layers based on its access patterns,
Fluss ensures the recent data is stored locally for higher write/read performance and the historical data is stored in [remote storage](remote-storage.md) for lower cost.

What's more, since the native format of Fluss's data is optimized for real-time write/read, which is inevitably unfriendly to batch analytics, Fluss also introduces a [lakehouse storage](lakehouse-storage.md) which stores the data
-in the well-known open data lake format for better analytics performance. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake support are on the way. Keep eyes on us!
+in the well-known open data lake format for better analytics performance. Currently, supported formats are Paimon, Iceberg, Hudi, and Lance. In the future, more kinds of data lake support are on the way. Keep eyes on us!

The overall tiered storage architecture is shown in the following diagram:
