John Liu Blog

Every drop counts

Spark examples

This example shows how to rename a column, change a column's data type, add a new column with no value, drop a column, and print a DataFrame/table schema for debugging.

import os

from pyspark.sql.types import DecimalType, StringType, TimestampType
from pyspark.sql.functions import current_timestamp, lit, col

for table in os.listdir(SOURCE_FOLDER_FILE_API_PATH):
    tableFolderABFSPath = os.path.join(source_folder_abfs_path, table)
    df = spark.read.format("delta").load(tableFolderABFSPath)
    if table == "EmployeeLocation":
        df = df.withColumn("LocationId", df["LocationId"].cast("string"))
    elif table == "EmployeeRelatedParty":
        df = df.withColumnRenamed("RelationshipPeriodStartTimestamp", "PeriodStartTimestamp") \
               .withColumnRenamed("RelationshipPeriodEndTimestamp", "PeriodEndTimestamp") \
               .withColumn("SourceModifiedOn", lit(None)) \
               .withColumn("SourceTable", lit(None)) \
               .drop("RelationshipEstablishedDate")
    elif table == "GeographicArea":
        df = df.
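For the schema-debugging step mentioned above, a minimal sketch (assuming the same df and spark session; the table name "myTable" is just a placeholder):

# Print the schema of the DataFrame loaded above
df.printSchema()

# Print the schema of a registered delta table ("myTable" is a placeholder name)
spark.table("myTable").printSchema()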

Snowflake notes

To grant access to a role:

grant usage on warehouse compute_wh to role mytestrole;
grant usage on database test to role mytestrole;
grant usage on schema study to role mytestrole;

-- select on an individual table
grant select on table test.study.sales to role mytestrole;

-- full access to all existing tables
grant select, insert, update, delete on all tables in schema test.study to role mytestrole;

-- automatically grant access to future tables
grant select on future tables in schema test.study to role mytestrole;
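To verify what a role ended up with, Snowflake's show grants command lists the privileges; a quick check against the role and table above:

-- privileges held by the role
show grants to role mytestrole;

-- grants on a specific table
show grants on table test.study.sales;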

Fabric Study Notes

Load data into a Lakehouse.

%%python
# This creates a managed delta table; the parquet files are kept under the Tables folder.
# When the table is deleted, the associated parquet files are deleted as well.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').saveAsTable('test')

%%python
# We can also create an external delta table; the parquet files are saved under the external
# location specified. When the table is deleted, the associated parquet files are not deleted.
df = spark.
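The excerpt stops before the external-table write; a minimal sketch of the usual Spark pattern, where the Files/External/sales path and the test_external table name are placeholders: supplying a path option makes saveAsTable register an external (unmanaged) table instead of a managed one.

%%python
# Specifying a path makes saveAsTable create an external (unmanaged) table;
# the path and table name here are placeholders.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').option('path', 'Files/External/sales').saveAsTable('test_external')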

SQL Server Firewall

In Windows, you can use the following PowerShell script to explicitly enable remote access to your SQL Server instance (substitute the DisplayName and LocalPort accordingly).

New-NetFirewallRule -DisplayName "SQLServer default instance" -Direction Inbound -LocalPort 1433 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "SQLServer Browser service" -Direction Inbound -LocalPort 1434 -Protocol UDP -Action Allow

You can find more information about the ports used by various SQL Server services in the article Configure the Windows Firewall to allow SQL Server access.
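To confirm the rules took effect, Get-NetFirewallRule can filter on the display names used above (a quick check, not part of the original script):

# List the two rules created above and their state
Get-NetFirewallRule -DisplayName "SQLServer*" | Format-Table DisplayName, Enabled, Direction, Action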

Fabric References

Some reference articles about Microsoft Fabric.

Moving Dataflows Gen 2 to different workspaces