Azure Cosmos DB study notes

Posted by John Liu on Saturday, September 9, 2023

TTL (time-to-live):

The maximum TTL is 2,147,483,647 seconds (~68 years).
For TTL to work at either the container or the item level, it must first be enabled at the container level. A TTL configured only at the item level is ignored until TTL is enabled on the container.
By default, TTL is not configured, so items never expire.
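
The container-versus-item rule can be modeled in a few lines of Python. This is only a sketch of the resolution logic, not SDK code: the function name is made up, and the special value -1 ("enabled, but never expire") comes from the Cosmos DB documentation rather than from these notes.

```python
from typing import Optional


def effective_ttl(container_default: Optional[int], item_ttl: Optional[int]) -> Optional[int]:
    """Return how many seconds an item lives, or None if it never expires.

    container_default: None (TTL off), -1 (TTL on, no default expiry), or seconds.
    item_ttl: None (not set on the item), -1 (never expire), or seconds.
    """
    if container_default is None:
        # TTL is not enabled on the container, so any item-level value is ignored.
        return None
    if item_ttl is None:
        # The item inherits the container default; -1 means "no default expiry".
        return None if container_default == -1 else container_default
    if item_ttl == -1:
        # The item explicitly opts out of expiry.
        return None
    # An item-level value overrides the container default.
    return item_ttl
```

For example, `effective_ttl(None, 3600)` returns None because the container never enabled TTL, while `effective_ttl(-1, 3600)` returns 3600.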

Provisioned throughput vs. serverless:

Provisioned throughput is ideal for predictable traffic patterns that require sustained, predictable performance with minimal variance.
Serverless can handle wildly varying traffic and low average-to-peak traffic ratios.

Provisioned throughput makes a set number of request units available each second to each container.
Serverless doesn't require any capacity planning.

Provisioned throughput supports distributing data to an unlimited number of Azure regions.
Serverless can run in only a single Azure region.

Provisioned throughput has unlimited storage.
Serverless allows only up to 50 GB of storage.

Autoscale vs. standard (manual) throughput:

Standard throughput is for steady traffic.
Autoscale is better suited to unpredictable traffic.

Standard throughput requires a static number of RU/s to be assigned ahead of time.
Autoscale only needs the maximum RU/s to be set; the minimum is 10% of the maximum when there are no requests.

Standard throughput is for predictable throughput that will not change over time. It is also ideal when the full provisioned RU/s is consumed for more than 66% of the hours in a month.
Autoscale is helpful if throughput can't be predicted, or if the maximum throughput is used for less than 66% of the hours in a month.

Standard throughput is rate-limited once it reaches the provisioned RU/s.
Autoscale scales up to the maximum RU/s before rate-limiting is applied.
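
The two numeric rules above (the 10% floor and the 66% rule of thumb) are easy to turn into a quick check. A minimal sketch: the function names are made up, and the 730 hours per month is an assumed average, not something from the notes.

```python
def autoscale_floor(max_ru: int) -> int:
    """Lowest RU/s an autoscale container scales down to: 10% of the maximum."""
    return max_ru // 10


def suggest_throughput_mode(hours_at_full_throughput: float, hours_in_month: float = 730) -> str:
    """Apply the 66% rule of thumb from the notes above."""
    utilization = hours_at_full_throughput / hours_in_month
    # Above 66% of the hours at full throughput, standard is the better fit.
    return "standard" if utilization > 0.66 else "autoscale"
```

A container with a 4,000 RU/s autoscale maximum idles at `autoscale_floor(4000)`, which is 400 RU/s. A workload that runs flat out for 600 hours a month comes back as "standard".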

.NET SDK

Microsoft.Azure.Cosmos.CosmosClient
Microsoft.Azure.Cosmos.Database
Microsoft.Azure.Cosmos.Container

# Import the NuGet package
dotnet add package Microsoft.Azure.Cosmos

connection mode and consistency level

There are two connection modes in CosmosClientOptions: Gateway and Direct. Direct is the default.

There are five ConsistencyLevel values, from strongest to weakest: Strong, BoundedStaleness, Session (the default), ConsistentPrefix, and Eventual.

On the client connection you can only weaken the consistency level, and only for reads. It cannot be strengthened, and it does not apply to writes.

Strong consistency is NOT supported in multi-region write scenarios.
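
The "weaken only" rule can be written as an ordering check. This is a plain-Python model of the rule, not SDK code; the function name is made up.

```python
# Consistency levels ordered from strongest to weakest.
LEVELS = ["Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"]


def can_override(account_default: str, requested: str) -> bool:
    """A client may request the account's own level or any weaker one, never a stronger one."""
    return LEVELS.index(requested) >= LEVELS.index(account_default)
```

With an account default of Session, `can_override("Session", "Eventual")` is True, but `can_override("Session", "Strong")` is False.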

Common error codes

400: Bad request
401: Not authorized
403: Forbidden
404: Not found
413: Request entity too large
429: Too many requests
449: Concurrency error
500: Unexpected service error
503: Service unavailable

transactional batch

A transactional batch supports only operations that share the same logical partition key. A batch that mixes logical partition keys will fail.
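
One practical consequence is that a mixed list of operations has to be split by partition key before it is submitted. The sketch below shows only that grouping step in plain Python. The operation shape (a dict with an "item") and the "category" partition key property are hypothetical.

```python
from collections import defaultdict


def group_into_batches(operations, partition_key_property="category"):
    """Group operations so that each batch touches exactly one logical partition."""
    batches = defaultdict(list)
    for op in operations:
        key = op["item"][partition_key_property]
        batches[key].append(op)
    return dict(batches)
```

Each value in the returned dict could then be sent as its own transactional batch.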

CosmosDB SQL functions

IS_DEFINED(): built-in function that checks whether a property exists in an item
IS_ARRAY(): built-in function that checks whether a property is an array
IS_NULL()
IS_NUMBER()
IS_STRING()
IS_OBJECT()
IS_BOOLEAN()
CONCAT()
GetCurrentDateTime()
LOWER()
UPPER()
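
The difference between IS_DEFINED() and IS_NULL() is easy to mix up: a property set to null is still defined, and a missing property is not null. The Python below is a rough emulation of a few type-checking functions over a dict, meant only to show those semantics. It is not how the query engine is implemented.

```python
def is_defined(item: dict, prop: str) -> bool:
    """True when the property exists, even if its value is null."""
    return prop in item


def is_null(item: dict, prop: str) -> bool:
    """True only when the property exists and its value is null."""
    return prop in item and item[prop] is None


def is_number(item: dict, prop: str) -> bool:
    """True for numeric values; booleans are excluded (Python treats bool as int)."""
    value = item.get(prop)
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_array(item: dict, prop: str) -> bool:
    """True when the property holds a JSON array."""
    return isinstance(item.get(prop), list)
```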

Join

A JOIN in Azure Cosmos DB for NoSQL is different from a JOIN in a relational database as its only scope is a single item. A JOIN creates a cross-product between different sections of a single item.

SELECT 
    p.id,
    p.name,
    t.name AS tag
FROM 
    products p
JOIN
    t IN p.tags
	
SELECT 
    p.id,
    p.name,
    t.name AS tag
FROM 
    products p
JOIN
    (SELECT VALUE t FROM t IN p.tags WHERE t.class = 'trade-in') AS t
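
To make the "cross-product within a single item" idea concrete, here is a small Python emulation of the two queries above. The sample products are made up, and the code only mimics the result shape. It is not a query engine.

```python
# Hypothetical sample data shaped like the items the queries expect.
products = [
    {"id": "1", "name": "Road bike", "tags": [
        {"name": "new", "class": "promo"},
        {"name": "used", "class": "trade-in"},
    ]},
    {"id": "2", "name": "Helmet", "tags": []},
]


def join_tags(items, tag_filter=None):
    """Emit one row per (item, tag) pair; each item is joined only with its own tags."""
    rows = []
    for p in items:
        for t in p.get("tags", []):
            if tag_filter is None or tag_filter(t):
                rows.append({"id": p["id"], "name": p["name"], "tag": t["name"]})
    return rows


all_rows = join_tags(products)
trade_in_rows = join_tags(products, lambda t: t["class"] == "trade-in")
```

Here `all_rows` has two rows, both for product 1, and `trade_in_rows` has one. Product 2 produces no rows at all because its tags array is empty.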

Index

All data in Azure Cosmos DB for NoSQL containers is indexed by default.
The inverted index is updated on every create, update, or delete operation on an item.
All properties of every item are indexed automatically.
Range indexes are used for all strings and numbers.

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [
    {
      "path": "/\"_etag\"/?"
    }
  ]
}

The consistent indexing mode updates the index synchronously. The none indexing mode disables indexing on a container.

Three primary operators are used when defining an index property path:
The ? operator indicates that a path terminates with a string or number (scalar) value.
The [] operator indicates that the path includes an array, and avoids having to specify an array index value.
The * operator is a wildcard and matches any element beyond the current path.

When included and excluded path expressions conflict, the most precise path takes precedence. For example:
included path: /category/name/?
excluded path: /category/*
With this policy, no property within category is indexed except the name property.

Index policies must include the root path and all possible values (*) as either an included or excluded path.
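
The "most precise path wins" rule can be approximated with a small matcher. This is a simplified model that handles only the ? and * operators, with made-up function names. The real service also has to deal with [] and other details that are skipped here.

```python
def _match(pattern: str, path: str):
    """Return a precision score if the pattern covers the path, else None."""
    if pattern.endswith("/?"):
        prefix = pattern[:-2]
        # '?' covers exactly one scalar path; an exact match outranks a wildcard.
        return (prefix.count("/"), 1) if path == prefix else None
    if pattern.endswith("/*"):
        prefix = pattern[:-2]
        # '*' covers everything below the prefix (an empty prefix is the root).
        return (prefix.count("/"), 0) if path.startswith(prefix + "/") else None
    return None


def is_indexed(path: str, included: list, excluded: list) -> bool:
    """Pick the most precise matching pattern and report whether it is an include."""
    best = None  # (score, is_included)
    for patterns, flag in ((included, True), (excluded, False)):
        for pattern in patterns:
            score = _match(pattern, path)
            if score is not None and (best is None or score > best[0]):
                best = (score, flag)
    return best[1] if best else False


included = ["/*", "/category/name/?"]
excluded = ["/category/*"]
```

With these lists, `/category/name` is indexed, `/category/id` is not, and an unrelated path such as `/price` falls through to the root include.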

Embed or Reference data

When to embed data:
Data that is read or updated together
1:1 relationships
1:few relationships

When to reference data:
Data that is read or updated independently
1:many relationships
many:many relationships

Azure Cosmos DB has a maximum document size of 2 MB.

The maximum storage size of a physical partition is 50 GB, and its maximum throughput is 10,000 RU/s.

The minimum configurable throughput is 400 RU/s, and it can be increased in increments of 100 RU/s, up to 10,000 RU/s.
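
These limits lend themselves to quick back-of-the-envelope checks. The sketch below validates a manual throughput value against the 400 RU/s minimum and 100 RU/s step, and estimates how many physical partitions a container needs from the 50 GB and 10,000 RU/s per-partition limits. The function names are made up, and the estimate is an approximation rather than the service's actual placement logic.

```python
import math


def is_valid_manual_throughput(ru: int) -> bool:
    """Manual throughput starts at 400 RU/s and moves in steps of 100 RU/s."""
    return ru >= 400 and ru % 100 == 0


def estimate_physical_partitions(storage_gb: float, throughput_ru: int) -> int:
    """Rough lower bound: whichever of storage or throughput needs more partitions."""
    by_storage = math.ceil(storage_gb / 50)
    by_throughput = math.ceil(throughput_ru / 10_000)
    return max(1, by_storage, by_throughput)
```

For example, 120 GB at 8,000 RU/s needs at least 3 physical partitions because of storage, and 10 GB at 25,000 RU/s needs at least 3 because of throughput.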

Documents with the same partition key value belong to the same logical partition.

Individual logical partitions are moved to new physical partitions as a unit as the container grows.

A container can have an unlimited number of logical partitions.

The maximum size of a logical partition is 20 GB. However, a partition key with high cardinality lets us avoid this 20 GB limit by spreading data across a large number of logical partitions.

Region

Use the ApplicationRegion property to configure a single region for requests. Use the ApplicationPreferredRegions property to configure a list of preferred regions.

You can only set a conflict resolution policy on newly created containers.

Integrated Cache

Enabling the integrated cache takes two primary steps:
1. Create a dedicated gateway in the Azure Cosmos DB account.
2. Update the SDK code to use the gateway for requests.

Rate-limiting error (429) solutions

1. Rate-limiting due to large requests --> increase RU/s if it is not caused by a hot partition. If it is caused by a hot partition, consider changing the partition key.
2. Rate-limiting due to metadata requests --> do NOT increase RU/s, as it won't help at all. Consider a backoff policy that performs metadata requests at a lower rate, use a single DocumentClient instance for the lifetime of your application, or cache the names of the databases and containers.
3. Rate-limiting due to transient errors --> do NOT increase RU/s, as it won't help at all. Retrying the request is the only recommended solution.
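
The retry and backoff advice in items 2 and 3 can be sketched as a small helper. This is illustrative only: RateLimitedError is a hypothetical stand-in for a 429 response, and the delay values are arbitrary. The SDKs already retry 429 responses on their own by default, so a hand-rolled loop like this mostly matters once those built-in retries are exhausted.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a rate-limited operation with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait 0.1s, 0.2s, 0.4s, ... plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The `sleep` parameter exists only so that the waiting can be swapped out, for example in a test.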

Periodic backup mode

Periodic backups are taken automatically. By default, a full backup is taken every 4 hours and only the last two copies are kept.
By default, backups use geo-redundant storage. You can switch to zone-redundant or locally redundant storage.

You can change the backup interval to between 1 and 24 hours and retain backups for up to 720 hours (30 days).

If a container or database is deleted, the existing snapshots are retained for 30 days.

You can't restore by yourself with periodic backup; a support ticket is required. Every support plan except Basic can open a restore support request.
You need to be an owner or contributor, or have the Cosmos DB Operator role assigned, to request a restore from the portal.

Settings that are not restored when restoring from backup:
	1. VNET access control lists
	2. stored procedures, triggers, and user-defined functions
	3. multi-region settings

Continuous backup mode

Backups are taken continuously in every region where the account exists, whether read or write. The retention period is 30 days. You can do a point-in-time restore.
By default, backups use locally redundant storage in each region. When Availability Zones are enabled for a region, backups are stored in zone-redundant storage.
You can't change the storage redundancy when using continuous backup mode.

Continuous backup with 7-day retention is free. 30-day retention has an extra cost.

Settings that are not restored:
	1. Firewall, VNET, and private endpoint settings
	2. Consistency settings. By default, the account is restored with session consistency.
	3. Regions
	4. Stored procedures, triggers, and UDFs

The account owner can either start the restore or grant restore permission to roles or principals.
A restore always creates a new account.
Some features are not supported with continuous backup.

Azure CLI

There are two core CLI command groups: az cosmosdb and az cosmosdb sql

	az cosmosdb create \
		--name '<account name>' \
		--resource-group '<group name>'

	az cosmosdb sql database create \
		--account-name '<account name>' \
		--resource-group '<group name>' \
		--name '<db name>'

	az cosmosdb sql container create \
		--account-name '<account name>' \
		--resource-group '<group name>' \
		--database-name '<db name>' \
		--name '<container name>' \
		--throughput '400' \
		--partition-key-path '<partition path>'

ARM and Bicep templates

Azure Resource Manager (ARM) resources, listed under the Microsoft.DocumentDB resource provider:
Microsoft.DocumentDB/databaseAccounts
Microsoft.DocumentDB/databaseAccounts/sqlDatabases
Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers

A JSON ARM template requires an “empty” template skeleton. All resource definitions are then defined as objects in the resources array:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
  ]
}

A Bicep ARM template does not require any “empty” template syntax.

A Bicep template uses the parent property to define relationships, as opposed to the verbose dependsOn property in a JSON ARM template.

To deploy the JSON ARM template:

	az deployment group create \
		--resource-group '<group name>' \
		--template-file '.\template.json'

To deploy the Bicep ARM template:

	az deployment group create \
		--resource-group '<group name>' \
		--template-file '.\template.bicep'

Azure Resource Manager will skip a resource if it has already been created in a previous deployment.

Some syntax differences between Bicep and JSON templates. In a Bicep template:
Remove the double quotes from property names.
Change property values from double quotes to single quotes.
Remove the commas typically required in JSON.

SP/Function/Trigger

Stored procedures are registered in a container and run within the scope of that container. Stored procedures are scoped to a single logical partition. You cannot execute a stored procedure that performs operations across logical partition key values.

UDFs do not have access to the context object and are meant to be used as compute-only code.

A pre-trigger cannot accept any parameters, whereas a post-trigger can. As with stored procedures, triggers can access the context object.

In Cosmos DB, triggers are not executed automatically; they must be specified for each database operation where you want them to run, as part of the ItemRequestOptions:

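using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Triggers run only when they are named explicitly on the request.
ItemRequestOptions options = new()
{
	PreTriggers = new List<string> { "addLabel" },
	PostTriggers = new List<string> { "createView" }
};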