SQL Read Parquet File-John Liu Blog

There are several ways to read a Parquet file from within SQL Server.

1. Using OPENROWSET (SQL Server 2022+)

Starting with SQL Server 2022, we can use OPENROWSET to query Parquet files directly from Azure Blob Storage, ADLS Gen2, or S3-compatible storage without creating a permanent table first.

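-- 0. Database scoped credentials require a database master key; create one if the database has none yet
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- 1. Create credential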
CREATE DATABASE SCOPED CREDENTIAL [MyAzureCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2022-11-02&ss=b&srt=sco&sp=rwdl&se=2025-12-31...';

-- 2. Create the Data Source pointing to your container
CREATE EXTERNAL DATA SOURCE MyCloudLogs
WITH (
    LOCATION = 'abs://yourcontainer@yourstorageaccount.blob.core.windows.net',
    CREDENTIAL = [MyAzureCredential] -- Links to the secret created above
);

-- 3. Query using the Data Source
SELECT * FROM OPENROWSET(
    BULK '2024/sales_report.parquet',
    DATA_SOURCE = 'MyCloudLogs',
    FORMAT = 'PARQUET'
) AS [result];
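The OPENROWSET queries in this section rely on SQL Server 2022's data virtualization, which runs on the PolyBase feature. A minimal sketch of the one-time instance setup, assuming the PolyBase Query Service feature was selected during installation:

```sql
-- Enable PolyBase on the instance (one-time, server-level setting)
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Confirm the setting took effect (value_in_use should be 1)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```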

If we don't create an external data source as above and instead put the full storage URL in the BULK path, as in the following query, SQL Server looks for a server-level credential whose name matches the storage URL.

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'abs://yourcontainer@yourstorageaccount.blob.core.windows.net/2024/sales_report.parquet',
    FORMAT = 'PARQUET'
) AS [data];
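For the URL-based query, the matching server-level credential might be created as sketched below. This is an assumption-laden sketch: the credential name assumes it must match the storage URL prefix used in the BULK path, as described above, and the SAS token stays elided. Check the exact naming rule for your SQL Server version.

```sql
-- Server-level credential (CREATE CREDENTIAL, not a database scoped credential).
-- Assumption: its name matches the storage URL prefix used in the BULK path.
CREATE CREDENTIAL [abs://yourcontainer@yourstorageaccount.blob.core.windows.net]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=2022-11-02&ss=b&srt=sco&sp=rwdl&se=2025-12-31...';
```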

OPENROWSET with FORMAT = 'PARQUET' is officially supported for cloud paths (Azure Blob Storage, ADLS Gen2, and S3-compatible storage). For Parquet files on local disk, we typically use PolyBase or Python.

2. Using PolyBase

If we need to query the Parquet file frequently as if it were a regular table, we can use PolyBase to create an external table. This works for cloud storage and, on SQL Server 2016 through 2019, for Hadoop sources as well (Hadoop connectivity was removed in SQL Server 2022).

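-- Prerequisite: a file format that tells PolyBase the files are Parquet
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Create the External Table
-- (MyAzureStorage is an external data source, created the same way as MyCloudLogs in section 1)
CREATE EXTERNAL TABLE [dbo].[ReadParquetData] (
    [SalesID] INT,
    [Amount] DECIMAL(18,2),
    [OrderDate] DATE
)
WITH (
    LOCATION = '/data/sales.parquet',   -- path relative to the data source
    DATA_SOURCE = MyAzureStorage,
    FILE_FORMAT = ParquetFormat
);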

-- Now query it like a normal table
SELECT * FROM [dbo].[ReadParquetData] WHERE Amount > 1000;
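Picking the column types for the external table by hand can be error-prone. One option, sketched here assuming SQL Server 2022 and the MyCloudLogs data source from section 1, is to let SQL Server report the types it infers from the Parquet file with sp_describe_first_result_set, then copy the name and system_type_name columns of its output into the CREATE EXTERNAL TABLE definition:

```sql
-- Inspect the column names and inferred types of the Parquet file
-- (single quotes inside the dynamic SQL string are doubled)
EXEC sp_describe_first_result_set N'
SELECT * FROM OPENROWSET(
    BULK ''2024/sales_report.parquet'',
    DATA_SOURCE = ''MyCloudLogs'',
    FORMAT = ''PARQUET''
) AS [result]';
```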

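3. Using Python

If SQL Server Machine Learning Services with Python is installed, we can call Python from T-SQL with sp_execute_external_script and let pandas read a Parquet file from local disk.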

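EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
import pandas as pd
# Read from local disk. pandas needs pyarrow or fastparquet installed in the
# SQL Server Python environment, and the account that runs external scripts
# needs read access to the file.
df = pd.read_parquet("C:/Exports/YourFile.parquet")
# OutputDataSet is returned to SQL Server; its column order must match WITH RESULT SETS
OutputDataSet = df
'
WITH RESULT SETS ((SalesID INT, Amount DECIMAL(18,2), OrderDate DATE));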
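sp_execute_external_script only runs once external scripts are enabled on the instance, and hard-coding the path inside the script makes it awkward to reuse. The sketch below shows both: the one-time setting, and a variant that passes the path in through @params. The variable names @path and @file_path are hypothetical, and the column names are assumed to match the file, as in the example above. Depending on the version, a service restart may be needed after changing the setting.

```sql
-- One-time setup: allow external scripts (requires Machine Learning Services)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- Hypothetical variable holding the file path
DECLARE @path NVARCHAR(400) = N'C:/Exports/YourFile.parquet';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
import pandas as pd
# file_path is the @file_path parameter declared in @params (without the @)
df = pd.read_parquet(file_path, columns=["SalesID", "Amount", "OrderDate"])
# Select the columns explicitly so the order matches WITH RESULT SETS
OutputDataSet = df[["SalesID", "Amount", "OrderDate"]]
',
  @params = N'@file_path NVARCHAR(400)',
  @file_path = @path
WITH RESULT SETS ((SalesID INT, Amount DECIMAL(18,2), OrderDate DATE));
```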
