Our data platform is set up as a hybrid. Our primary data storage and processing run on-premises, while secondary/additional work runs in the cloud. Naturally, this requires a lot of data movement between the two environments.
While Delta Lake features (the Structured Streaming source/sink) have made this data flow much easier and more reliable, there are still certain use cases where we need HDFS DistCp-like functionality (usually when we need duplicate copies both on-prem and in the cloud). A DistCp that is specific to Delta datasets would be nice here.
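To illustrate what "Delta-specific" could mean here, below is a minimal sketch of one possible copy ordering, not an existing Delta Lake feature. It copies data files first, then the `_delta_log` entries in ascending name order, then `_last_checkpoint`, so that a reader at the destination should not see a commit before the files it references have arrived. The function names `plan_copy` and `copy_delta_table` are hypothetical, local paths stand in for HDFS and cloud storage, and the sketch assumes no writer or vacuum touches the source table while the copy runs.

```python
import shutil
from pathlib import Path

LOG_DIR = "_delta_log"  # Delta Lake's transaction log directory


def plan_copy(src: Path) -> list[Path]:
    """Return relative file paths in copy order: data, log entries, _last_checkpoint."""
    files = [p.relative_to(src) for p in src.rglob("*") if p.is_file()]

    def in_log(p: Path) -> bool:
        return p.parts[0] == LOG_DIR

    def by_name(p: Path) -> str:
        return p.as_posix()

    data = sorted((p for p in files if not in_log(p)), key=by_name)
    # Log file names are zero-padded versions, so name order is version order.
    log = sorted(
        (p for p in files if in_log(p) and p.name != "_last_checkpoint"),
        key=by_name,
    )
    # Copy the checkpoint pointer last, after the checkpoint it refers to.
    last = [p for p in files if in_log(p) and p.name == "_last_checkpoint"]
    return data + log + last


def copy_delta_table(src, dst) -> list[Path]:
    """Copy a table directory from src to dst and return the order used."""
    src, dst = Path(src), Path(dst)
    copied = []
    for rel in plan_copy(src):
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)
        copied.append(rel)
    return copied
```

A real tool would presumably run the copy in parallel, the way DistCp does, and would need to handle concurrent commits on the source; this sketch only shows the ordering idea.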