Fabric Study Notes

Posted by John Liu on Wednesday, December 18, 2024

Load data into a Lakehouse.

%%python
# This creates a managed delta table; the parquet files are managed under the Tables folder.
# When the table is deleted, the associated parquet files are automatically deleted as well.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').saveAsTable('test')
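
To confirm the table really is managed, one option is to inspect its metadata; a minimal sketch, assuming the 'test' table created above:

%%python
# DESCRIBE EXTENDED reports the table Type (MANAGED here) and its Location
# under the lakehouse Tables folder.
spark.sql('DESCRIBE EXTENDED test').show(truncate=False)
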
%%python
# We can also create an external delta table; the parquet files are saved under the external
# location specified. When the table is deleted, the associated parquet files are not deleted.
df = spark.read.load(path='Files/Data/sales.csv', format='csv', header=True)
df.write.format('delta').saveAsTable('myExternalTable', path='Files/myexternaltable')
# After the external delta table is deleted, we can recreate the table from the parquet file.
df = spark.read.parquet('Files/myexternaltable/part-00000-9d57224a-f267-437a-8669-cb69566a853d-c000.snappy.parquet')
df.write.format('delta').mode('overwrite').saveAsTable(name='myExternalTable', path='Files/myexternaltable')
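
One way to observe this behavior is to drop the external table and list what remains at the path; a minimal sketch, assuming the table and Files/myexternaltable location from above, and that mssparkutils is available in the Fabric notebook session:

%%python
# Dropping the external table removes only the metastore entry;
# the parquet files (and the _delta_log folder) stay at the external location.
spark.sql('DROP TABLE myExternalTable')
for f in mssparkutils.fs.ls('Files/myexternaltable'):
    print(f.name)
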
%%sql
-- This command registers an external table from the parquet file. The table created will not be a delta table.
CREATE TABLE myExternalTable
USING parquet
LOCATION 'Files/myexternaltable/part-00000-9d57224a-f267-437a-8669-cb69566a853d-c000.snappy.parquet'
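
Because this table is registered as plain parquet, delta features such as time travel will not be available on it. A quick check of the provider, assuming the table above was created:

%%sql
-- The Provider row reports 'parquet' rather than 'delta' for this table.
DESCRIBE EXTENDED myExternalTable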