John Liu Blog

Every drop counts

Generate Self-signed Certificate

There are times we might need a certificate, for example, to set up an Azure Service Principal without using a secret. We can use PowerShell or OpenSSL to generate a self-signed certificate.

# 1. Config
$certname = "SQLDataExportCert"
# password to secure the private key file
# use single quotes to quote the string, to handle special characters like $
$pwdstring = 'YourSecurePassword'
# Define the expiration date (e.g., 5 years from today)
$expiryDate = (Get-Date).AddYears(5)
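As a cross-check outside PowerShell/OpenSSL, here is a minimal Python sketch of the same idea using the cryptography package; the subject name, file names, passphrase, and 5-year validity are placeholders, not values from the post.

# Minimal sketch: self-signed certificate with the Python "cryptography" package.
# Subject name, output file names, passphrase, and validity are illustrative placeholders.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "SQLDataExportCert")])
now = datetime.datetime.now(datetime.timezone.utc)

cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                      # self-signed: issuer == subject
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=5 * 365))
    .sign(key, hashes.SHA256())
)

# Public certificate (the part uploaded to the Azure app registration)
with open("SQLDataExportCert.pem", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))

# Private key, protected with a passphrase
with open("SQLDataExportCert.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.PKCS8,
        serialization.BestAvailableEncryption(b"YourSecurePassword"),
    ))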

SQL Generate Parquet File

There are several ways to generate a Parquet file from SQL. PolyBase can write Parquet files to S3 or Azure storage, but we can't specify the name of the generated file. Refer to SQL export to Parquet file. An Azure Data Factory (ADF) or Fabric pipeline Copy Activity can generate a Parquet file with a specific file name. A Python script can also generate a Parquet file with a specific name. The following are examples of using a Python script in SQL to generate the Parquet file.
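The in-database examples are not reproduced in this excerpt. As a standalone sketch of the same idea, the Python below pulls query results with pyodbc and writes them to a Parquet file whose name we fully control; the connection string, query, and output file name are assumptions for illustration.

# Sketch: export query results to a named Parquet file with pandas/pyarrow.
# Connection string, query, and output path are illustrative placeholders.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDb;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

df = pd.read_sql("SELECT OrderId, OrderDate, Amount FROM dbo.Orders", conn)

# Unlike PolyBase/CETAS, the exact output file name is under our control here.
df.to_parquet("orders_2024.parquet", engine="pyarrow", index=False)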

SQL Read Parquet File

There are several ways to read a Parquet file within SQL Server.

1. Using OPENROWSET (SQL Server 2022+)

Starting with SQL Server 2022, we can use OPENROWSET to query Parquet files directly from Azure Blob Storage, ADLS Gen2, or S3-compatible storage without creating a permanent table first.

-- 1. Create credential
CREATE DATABASE SCOPED CREDENTIAL [MyAzureCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2022-11-02&ss=b&srt=sco&sp=rwdl&se=2025-12-31...';

-- 2. Create the Data Source pointing to your container
CREATE EXTERNAL DATA SOURCE MyCloudLogs
WITH ( LOCATION = 'abs://yourcontainer@yourstorageaccount.
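Outside SQL Server, the same file can be sanity-checked quickly with pandas/pyarrow, which is handy for confirming column names and types before writing an OPENROWSET WITH clause; the file name below is a placeholder.

# Quick sanity check of a Parquet file with pandas/pyarrow (file name is a placeholder).
import pandas as pd

df = pd.read_parquet("orders_2024.parquet", engine="pyarrow")
print(df.dtypes)    # column names and types
print(df.head(10))  # first few rows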

SQL export to Parquet file

In SQL Server 2022+, we can export SQL query results directly to a Parquet file using the CETAS (CREATE EXTERNAL TABLE AS SELECT) statement, which relies on PolyBase. PolyBase exists in SQL Server 2016+ and Azure SQL Database, but CETAS is only supported in SQL Server 2022+ and Azure SQL Database. Here is an example of how to export SQL query results to a Parquet file stored in an S3 bucket or Azure storage.
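The post's example is not included in this excerpt. As an alternative sketch (not CETAS), query results can also be pushed to Azure storage as Parquet from Python; the pandas/pyarrow/adlfs packages, the container and storage account names, and the key are all assumptions here.

# Alternative sketch (not CETAS): write a DataFrame to Azure storage as Parquet.
# Requires pandas, pyarrow, and adlfs; container, account, and key are placeholders.
import pandas as pd

df = pd.DataFrame({"OrderId": [1, 2], "Amount": [10.5, 20.0]})  # stand-in for query results

df.to_parquet(
    "abfs://yourcontainer@yourstorageaccount.dfs.core.windows.net/export/orders.parquet",
    engine="pyarrow",
    index=False,
    storage_options={"account_key": "<storage-account-key>"},
)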

Fabric Study Notes

Load data into Lakehouse.

%%python
# This creates a managed delta table; parquet files will be managed under the Tables folder.
# When the table is deleted, the associated parquet files will be auto-deleted as well.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').saveAsTable('test')

%%python
# We can also create an external delta table; parquet files will be saved under the external location specified.
# When the table is deleted, the associated parquet files will not be auto-deleted.
df = spark.
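The excerpt cuts off before the external-table cell; a minimal sketch of what it could look like, with the CSV path, external location, and table name as placeholder assumptions:

%%python
# Sketch only: create an external delta table whose parquet files live under an explicit path.
# The path below is a placeholder; in practice it is often an abfss:// URI or a Files/ location.
# Dropping the table later leaves those files in place.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').option('path', 'Files/External/sales_delta').saveAsTable('test_external')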