John Liu Blog

Every drop counts

Extract Excel Columns

You might want to grab all columns from all worksheets in an Excel. This simple Python script will help to do that. # pip install pandas openpyxl import pandas as pd # Path to your Excel file excel_file = "your excel file.xlsx" # Read the Excel file xls = pd.ExcelFile(excel_file) # Initialize an empty list to store tuples sheet_column_pairs = [] # Loop through each sheet and get column names for sheet_name in xls.

SQL Query External Dta

Read text file contents as a single value: SELECT [BulkColumn] FROM OPENROWSET(BULK 'C:\TEMP\test.json', SINGLE_CLOB) AS Contents Read data from Excel: Option1: Using OPENROWSET. Note: if your column has data more than 255 characters, try to format the column as Text. Also try to change following registry key TypeGuessRows to 0 (default is 8), under path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for 64-bit machine) or HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for 32-bit machine). Restart of SQL instance required if change registry setting.

Spark examples

This example shows how to rename a column, change column data type, add new column with no value, drop column, and debugging output df/table schema. from pyspark.sql.types import DecimalType, StringType, TimestampType from pyspark.sql.functions import current_timestamp, lit, col for table in os.listdir(SOURCE_FOLDER_FILE_API_PATH): tableFolderABFSPath = os.path.join(source_folder_abfs_path, table) df = spark.read.format("delta").load(tableFolderABFSPath) if table == "EmployeeLocation": df = df.withColumn("LocationId",df["LocationId"].cast("string")) elif table == "EmployeeRelatedParty": df = df.withColumnRenamed("RelationshipPeriodStartTimestamp","PeriodStartTimestamp") \ .withColumnRenamed("RelationshipPeriodEndTimestamp","PeriodEndTimestamp") \ .withColumn("SourceModifiedOn",lit(None)) \ .withColumn("SourceTable",lit(None)) \ .drop("RelationshipEstablishedDate") elif table == "GeographicArea": df = df.

Install Python Library in Visual Studio Code

In Visual Studio Code Python project, you can create a Python virtual environment for your project. Open Command Palette (Ctrl+Shift+P), search for Pyton: Create Environment command. This will auto set the workspace environment to the newly created virtual environment. Now in the terminal, run the pip install for the library needed. pip install <library> Alternativly, you can run following script to create the virtual environment (named venv or whatelse you prefer):

Setup R and Python on SQL Server 2022

When setup R or Python on SQL Server 2022, especially for named instance, some special notes required, apart from the offical documentation. You need to make sure when doing install, both Database Engine Services and Maching Learning Services and Language are installed. Unlike with earlier version of SQL, R and Python libraries are not installed/configured automatically in SQL 2022. You need to download and install latest R4.2. Then need to install some standard R dependencies.