There are several ways to read Parquet file within SQL Server.
1. Using OPENROWSET (SQL2022+)
Starting with SQL Server 2022, we can use OPENROWSET to query Parquet files directly from Azure Blob Storage, ADLS Gen2, or S3-compatible storage without creating a permanent table first.
-- 1. Create credential
CREATE DATABASE SCOPED CREDENTIAL [MyAzureCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2022-11-02&ss=b&srt=sco&sp=rwdl&se=2025-12-31...';
-- 2. Create the Data Source pointing to your container
CREATE EXTERNAL DATA SOURCE MyCloudLogs
WITH (
LOCATION = 'abs://yourcontainer@yourstorageaccount.blob.core.windows.net',
CREDENTIAL = [MyAzureCredential] -- Links to the secret created above
);
-- 3. Query using the Data Source
SELECT * FROM OPENROWSET(
BULK '2024/sales_report.parquet',
DATA_SOURCE = 'MyCloudLogs',
FORMAT = 'PARQUET'
) AS [result];
If we didn’t create extenal data source as above, but using the the full storage URL as following, SQL Server looks for a Server-Level Credential where the name of the credential matches the URL of the storage.
SELECT TOP 10 *
FROM OPENROWSET(
BULK 'abs://yourcontainer@yourstorageaccount.blob.core.windows.net/2024/sales_report.parquet',
FORMAT = 'PARQUET'
) AS [data];
OPENROWSET with FORMAT = ‘PARQUET’ is offically supported for Azure/S3 cloud paths. For Parquet files on local disk, we typically use PolyBase or Python.
2. Using PolyBase
If we need to query the Parquet file frequently as if it were a regular table, we can use PolyBase to create an External Table. This works for both cloud storage and local Hadoop-style sources.
-- Create the External Table
CREATE EXTERNAL TABLE [dbo].[ReadParquetData] (
[SalesID] INT,
[Amount] DECIMAL(18,2),
[OrderDate] DATE
)
WITH (
LOCATION = '/data/sales.parquet',
DATA_SOURCE = MyAzureStorage,
FILE_FORMAT = ParquetFormat
);
-- Now query it like a normal table
SELECT * FROM [dbo].[ReadParquetData] WHERE Amount > 1000;
3. Using Python
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import pandas as pd
# Read from local disk
df = pd.read_parquet("C:/Exports/YourFile.parquet")
OutputDataSet = df
'
WITH RESULT SETS ((SalesID INT, Amount DECIMAL(18,2), OrderDate DATE));