John Liu Blog

Every drop counts

SQL Read Parquet File

There are several ways to read Parquet file within SQL Server. 1. Using OPENROWSET (SQL2022+) Starting with SQL Server 2022, we can use OPENROWSET to query Parquet files directly from Azure Blob Storage, ADLS Gen2, or S3-compatible storage without creating a permanent table first. -- 1. Create credential CREATE DATABASE SCOPED CREDENTIAL [MyAzureCredential] WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = 'sv=2022-11-02&ss=b&srt=sco&sp=rwdl&se=2025-12-31...'; -- 2. Create the Data Source pointing to your container CREATE EXTERNAL DATA SOURCE MyCloudLogs WITH ( LOCATION = 'abs://yourcontainer@yourstorageaccount.

SQL export to Parquet file

In SQL 2022+, we can export SQL query results directly to a Parquet file using the CETAS (Create External Table As Select) statement using PolyBase. PolyBase exists in SQL Server 2016+ and Azure SQL Database, but CETAS is only supported in SQL Server 2022+ and Azure SQL Database. Here is an example of how to export SQL query results to a Parquet file stored in an S3 bucket or Azure storage:

SQL Server Change Data Capture (CDC)

CDC (Change Data Capture) was introduced in SQL2008 Enterprise only, only available to Standard edtions since SQL2016 SP1. It’s not available in Express edtion as Express edtion doesn’t have SQL Agent. CDC involves three-tier configuration: Server-level (agent job), Database-level and Table-level. It works in all recovery modes. When using CDC, we need to minotr log file growth and adjust retention period, especially if not using FULL recovery mode. CDC is like a video recording of every single change, where as Change Tracking (CT) is more like a “motion sensor”.

SQL Server Change Tracking (CT)

Change Tracking (CT) introduced in SQL2008, all edtitions, includes Express. CT is more like a “motion sensor”. CT tells you which rows changed and what the latest values are, but id doesn’t keep a history of every intermediate step. CDC is like a video recording of every single change. CT is much more lightweight than CDC. CT can only be enabled on table with PK. It’s a sync operation (overhead on every transaction).

SQL Server MCP

References: SQL AI Samples (Microsoft Azure Samples) MSSQL MCP (Aaronontheweb)