John Liu Blog

Every drop counts

SQL Query External Data

Read text file contents as a single value:

SELECT [BulkColumn]
FROM OPENROWSET(BULK 'C:\TEMP\test.json', SINGLE_CLOB) AS Contents

Read data from Excel:

Option 1: Using OPENROWSET. Note: if your column has data longer than 255 characters, try formatting the column as Text. Also try changing the TypeGuessRows registry value to 0 (the default is 8) under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for a 64-bit machine) or HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for a 32-bit machine). A restart of the SQL Server instance is required if you change this registry setting.
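For reference, a typical ad-hoc Excel query with OPENROWSET looks like the sketch below. The file path, sheet name, and provider version are assumptions (the provider must match the Access Database Engine installed on the server), and the instance also needs the 'Ad Hoc Distributed Queries' option enabled via sp_configure.

-- read Sheet1 of an Excel workbook; the first row supplies the column headers (HDR=YES)
SELECT *
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.16.0',
    'Excel 12.0;Database=C:\TEMP\test.xlsx;HDR=YES',
    'SELECT * FROM [Sheet1$]'
) AS ExcelData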

Spark examples

This example shows how to rename a column, change a column's data type, add a new column with no value, drop a column, and print a df/table schema for debugging. (The GeographicArea branch is truncated in the source, so a hypothetical completion is shown.)

import os
from pyspark.sql.types import DecimalType, StringType, TimestampType
from pyspark.sql.functions import current_timestamp, lit, col

# SOURCE_FOLDER_FILE_API_PATH and source_folder_abfs_path are defined elsewhere in the notebook
for table in os.listdir(SOURCE_FOLDER_FILE_API_PATH):
    tableFolderABFSPath = os.path.join(source_folder_abfs_path, table)
    df = spark.read.format("delta").load(tableFolderABFSPath)
    if table == "EmployeeLocation":
        # change column data type
        df = df.withColumn("LocationId", df["LocationId"].cast("string"))
    elif table == "EmployeeRelatedParty":
        # rename columns, add new columns with no value, and drop a column;
        # the null literals are cast so Delta does not reject NullType columns
        df = df.withColumnRenamed("RelationshipPeriodStartTimestamp", "PeriodStartTimestamp") \
               .withColumnRenamed("RelationshipPeriodEndTimestamp", "PeriodEndTimestamp") \
               .withColumn("SourceModifiedOn", lit(None).cast(TimestampType())) \
               .withColumn("SourceTable", lit(None).cast(StringType())) \
               .drop("RelationshipEstablishedDate")
    elif table == "GeographicArea":
        # hypothetical stand-in: the original snippet breaks off at "df = df."
        df = df.withColumn("AreaCode", col("AreaCode").cast("string"))
    # print the schema for debugging
    df.printSchema()
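When checking the result of changes like these, it can also help to inspect a Delta table's schema directly from SQL; a minimal sketch, assuming a placeholder table path:

-- describe a Delta table by path; replace the backticked path with your table folder
DESCRIBE TABLE delta.`abfss://container@account.dfs.core.windows.net/Tables/EmployeeLocation`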

Snowflake notes

To grant access to a role:

grant usage on warehouse compute_wh to role mytestrole;
grant usage on database test to role mytestrole;
grant usage on schema study to role mytestrole;

--select on an individual table
grant select on table test.study.sales to role mytestrole;

--full access to all existing tables
grant select, insert, update, delete on all tables in schema test.study to role mytestrole;

--automatically grant access to future tables
grant select on future tables in schema test.study to role mytestrole;
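These grants assume the role already exists and has been granted to a user; a minimal sketch of that setup, with a placeholder user name, plus a quick way to verify the result:

create role if not exists mytestrole;
grant role mytestrole to user my_test_user;  -- placeholder user name
show grants to role mytestrole;              -- list the privileges the role now holds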

Fabric Study Notes

Load data into Lakehouse.

%%python
# This creates a managed delta table; the parquet files are managed under the Tables folder.
# When the table is deleted, the associated parquet files are auto-deleted as well.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').saveAsTable('test')

%%python
# We can also create an external delta table; the parquet files are saved under the external
# location specified. When the table is deleted, the associated parquet files will NOT be
# auto-deleted. The source snippet breaks off here, so the lines below are a hypothetical
# completion with a placeholder external path.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').option('path', 'Files/External/sales').saveAsTable('test_external')
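To confirm whether a table ended up managed or external, and where its files live, you can describe it from a SQL cell; a quick sketch:

%%sql
-- the Type and Location rows in the output show MANAGED vs EXTERNAL and the storage path
DESCRIBE TABLE EXTENDED test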

SQL Server Firewall

In Windows, you can use the following PowerShell script to explicitly enable remote access to your SQL Server instance (substitute the DisplayName and LocalPort accordingly).

New-NetFirewallRule -DisplayName "SQLServer default instance" -Direction Inbound -LocalPort 1433 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "SQLServer Browser service" -Direction Inbound -LocalPort 1434 -Protocol UDP -Action Allow

You can find more information about the ports used by various SQL Server services in the article Configure the Windows Firewall to allow SQL Server access.
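To double-check which TCP port the instance is actually listening on before opening the firewall, one option is to query the connections DMV from an existing session (this needs VIEW SERVER STATE permission; the port comes back NULL for shared-memory or named-pipe sessions):

SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;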