John Liu Blog

Every drop counts

Convert Excel to CSV

This Python converts Excel worksheets to CSV comma delimited format and combine all worksheets into a single file with worksheet name as the section name. # pip install pandas openpyxl import pandas as pd # Path to your Excel file excel_file = "your excel file.xlsx" # Output text file output_file = "combined_output.csv" # Read the Excel file xls = pd.ExcelFile(excel_file) with open(output_file, "w", encoding="utf-8") as f: for sheet_name in xls.sheet_names: f.

Extract Excel Columns

You might want to grab all columns from all worksheets in an Excel. This simple Python script will help to do that. # pip install pandas openpyxl import pandas as pd # Path to your Excel file excel_file = "your excel file.xlsx" # Read the Excel file xls = pd.ExcelFile(excel_file) # Initialize an empty list to store tuples sheet_column_pairs = [] # Loop through each sheet and get column names for sheet_name in xls.

SQL Query External Dta

Read text file contents as a single value: SELECT [BulkColumn] FROM OPENROWSET(BULK 'C:\TEMP\test.json', SINGLE_CLOB) AS Contents Read data from Excel: Option1: Using OPENROWSET. Note: if your column has data more than 255 characters, try to format the column as Text. Also try to change following registry key TypeGuessRows to 0 (default is 8), under path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for 64-bit machine) or HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel (for 32-bit machine). Restart of SQL instance required if change registry setting.

Spark examples

This example shows how to rename a column, change column data type, add new column with no value, drop column, and debugging output df/table schema. from pyspark.sql.types import DecimalType, StringType, TimestampType from pyspark.sql.functions import current_timestamp, lit, col for table in os.listdir(SOURCE_FOLDER_FILE_API_PATH): tableFolderABFSPath = os.path.join(source_folder_abfs_path, table) df = spark.read.format("delta").load(tableFolderABFSPath) if table == "EmployeeLocation": df = df.withColumn("LocationId",df["LocationId"].cast("string")) elif table == "EmployeeRelatedParty": df = df.withColumnRenamed("RelationshipPeriodStartTimestamp","PeriodStartTimestamp") \ .withColumnRenamed("RelationshipPeriodEndTimestamp","PeriodEndTimestamp") \ .withColumn("SourceModifiedOn",lit(None)) \ .withColumn("SourceTable",lit(None)) \ .drop("RelationshipEstablishedDate") elif table == "GeographicArea": df = df.

Install Python Library in Visual Studio Code

In Visual Studio Code Python project, you can create a Python virtual environment for your project. Open Command Palette (Ctrl+Shift+P), search for Pyton: Create Environment command. This will auto set the workspace environment to the newly created virtual environment. Now in the terminal, run the pip install for the library needed. pip install <library> Alternativly, you can run following script to create the virtual environment (named venv or whatelse you prefer):