This post is the first block of the “Architect pills” series. What do I mean? Architect pills is a collection of topics for people who want to look beyond the drag & drop features of the Salesforce platform and understand its foundations and overall architecture. Today we’ll start with data normalization, database strategy and techniques for handling large data volumes in the system.
A relational database table is often described as “normalized” if it is in Third Normal Form (3NF). 3NF tables are free of insertion, update and deletion anomalies. An easy way to remember the 3NF definition is the following statement: “Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key” (Bill Kent). But what distinguishes 3NF from 1NF and 2NF? Here are some points that help keep their definitions in mind:
- Requiring the existence of “the key” ensures the table is in 1NF
- Requiring that non-key attributes depend on “the whole key” ensures 2NF
- Further requiring that non-key attributes depend on “nothing but the key” ensures 3NF
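To make the rule concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The schema and data are invented for illustration: a flat orders table where the customer’s city depends on the customer rather than on the order key would violate 3NF (a transitive dependency), so the city is moved into its own table and reconstructed with a join.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A flat table (order_id, customer_id, city, amount) would NOT be 3NF:
# city is a fact about the customer, not about the order key.
# The 3NF decomposition stores each fact with its own key.
cur.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    city        TEXT NOT NULL          -- fact about the customer key
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL          -- facts about the order key only
);
""")

cur.execute("INSERT INTO customer VALUES (1, 'Milan')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 99.0), (11, 1, 45.5)])

# The city is stored once; a join reconstructs the original flat view,
# so updating the city cannot leave two orders disagreeing about it.
rows = cur.execute("""
    SELECT o.order_id, c.city, o.amount
    FROM orders o JOIN customer c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Both orders see the single, consistent city value, which is exactly the anomaly-freedom 3NF buys you.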
Standard database design guidance is to create a fully normalized design (that is, 3NF), particularly for transactional (OLTP) databases. For OLAP / Business Intelligence workloads, selective de-normalization is often performed afterwards for performance reasons:
- Standard 3NF can be slow to query at large data volumes, because reading data spread across multiple related tables requires join operations
- Sometimes the logical design remains 3NF-normalized, but the DBMS uses techniques such as indexed views (SQL Server) or materialized views (Oracle) to pre-compute expensive joins
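The materialized-view idea can be sketched in a few lines of Python. SQLite has no native materialized views, so this illustration emulates one with a plain table that is re-built on demand; the table and function names are invented for the example.

```python
import sqlite3

# Emulating a materialized view: the logical design stays normalized,
# but a pre-joined physical copy serves read-heavy reporting queries.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                       customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Milan'), (2, 'Rome');
INSERT INTO orders   VALUES (10, 1, 99.0), (11, 2, 45.5);
""")

def refresh_order_mv(cur):
    """Re-materialize the joined result so readers skip the join."""
    cur.executescript("""
    DROP TABLE IF EXISTS order_mv;
    CREATE TABLE order_mv AS
        SELECT o.order_id, c.city, o.amount
        FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
    """)

refresh_order_mv(cur)
# Reporting queries now hit a single pre-joined table.
mv = cur.execute("SELECT * FROM order_mv ORDER BY order_id").fetchall()
print(mv)
```

The trade-off is the classic one: reads get cheaper, while writes pay the cost of keeping the materialized copy fresh (real DBMSs refresh indexed/materialized views automatically or on a schedule).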
Salesforce is built on a multi-tenant architecture, where a single instance of the application is shared by many organizations. Instead of providing a complete set of hardware and software resources to each organization, Salesforce inserts a layer of software between the single instance and each organization’s deployment. This layer is invisible to the organizations, which see only their own data and schemas, while Salesforce reorganizes the data behind the scenes to perform efficient operations.
It is a very common scenario: a company starts with a small Salesforce configuration and then wants to scale. Salesforce lets you scale in a very simple way, by changing your license plan, increasing the number of licenses, or adding storage for data and files. This scaling usually happens automatically, but as data sets grow, the time required for certain operations grows with them. It is therefore very important for architects to design the Salesforce application properly from the beginning, because configurations and customizations can increase or decrease that time by several orders of magnitude.
But what capabilities does Salesforce offer to support the performance of systems with large data volumes?
Lightning Platform Query Optimizer
This functionality helps the database produce effective execution plans for the queries behind reports, list views and SOQL, providing efficient data access in Salesforce. The Lightning Platform query optimizer determines the best index to start from based on the fields in the query’s WHERE clause (the “filters”), chooses the best table to drive the query when no selective filters are available, and injects custom foreign keys to create efficient join paths.
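To make the idea of a filter-driven plan concrete, here is a toy Python sketch of a cost-based choice of driving filter. The uniform-distribution selectivity estimate and the field names are illustrative assumptions; this is not the actual Lightning Platform algorithm.

```python
# Toy sketch of how a cost-based optimizer might pick a driving filter:
# estimate each indexed filter's selectivity from statistics and drive
# the query from the most selective one. Illustrative only.
def pick_driving_filter(filters, stats):
    """filters: {field: value}; stats: {field: distinct_count} for indexed fields."""
    best, best_selectivity = None, 1.0
    for field in filters:
        if field not in stats:              # no index / no statistics: can't drive
            continue
        selectivity = 1.0 / stats[field]    # naive uniform-distribution estimate
        if selectivity < best_selectivity:
            best, best_selectivity = field, selectivity
    # fall back to a full scan of the base table if nothing selective exists
    return best or "FULL_SCAN"

stats = {"StageName": 5, "OwnerId": 400}    # distinct values per indexed field
print(pick_driving_filter({"StageName": "Won", "OwnerId": "005..."}, stats))
# -> OwnerId (1/400 is far more selective than 1/5)
```

A real optimizer weighs much more (histograms, row counts, join costs), but the principle is the same: the most selective available index drives the query.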
Skinny Tables
A skinny table is a copy of a source object’s table, kept in sync with its source, that speeds up read operations (e.g. reports). It contains only frequently used fields and does not contain soft-deleted records. At the database level, Salesforce maintains separate tables for an object’s standard and custom fields; a skinny table avoids that join by denormalizing all the selected fields into the same row.
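A rough Python sketch of what a skinny table buys: a flat copy holding only the frequently read fields (standard and custom together), with soft-deleted rows left out, so reads need no join and no deleted-record filter. The record layout and field names here are invented for illustration, not Salesforce’s internal storage format.

```python
# Conceptual "standard-field table" rows and "custom-field table",
# normally joined at read time. Data is made up for the example.
standard_fields = [
    {"Id": "a01", "Name": "Acme",   "IsDeleted": False},
    {"Id": "a02", "Name": "Globex", "IsDeleted": True},
]
custom_fields = {
    "a01": {"Region__c": "EMEA"},
    "a02": {"Region__c": "APAC"},
}

def build_skinny(standard, custom, keep=("Name", "Region__c")):
    """Materialize a skinny copy: join once at write time, drop soft deletes."""
    skinny = []
    for row in standard:
        if row["IsDeleted"]:                    # skinny tables omit soft-deleted records
            continue
        merged = {**row, **custom[row["Id"]]}   # the join is paid here, not per read
        skinny.append({f: merged[f] for f in ("Id",) + tuple(keep)})
    return skinny

print(build_skinny(standard_fields, custom_fields))
# -> [{'Id': 'a01', 'Name': 'Acme', 'Region__c': 'EMEA'}]
```

On the platform you do not build this yourself: skinny tables are created by Salesforce support and kept in sync with the source object automatically.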
Database Statistics and Indexes
Every night the Salesforce database gathers statistics about the amount and types of data it stores, so that queries can be executed efficiently. Solution designers can contact Salesforce support to create custom indexes that speed up specific queries. The platform also maintains indexes on standard fields such as CreatedDate, Name, Email, RecordTypeId and Division. When a query is executed, the optimizer performs a pre-query against the statistics table to determine whether an index can speed up the query. The selectivity thresholds differ depending on whether the filtered field is standard or custom:
- Standard Indexed Fields: used if the filter matches less than 30% of the first million records and less than 15% of additional records, up to a maximum of one million records.
- Custom Indexed Fields: used if the filter matches less than 10% of the total records, up to a maximum of 333,333 records.
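The two thresholds above can be written out as simple formulas. This is a sketch of the documented rules, under the assumption that the standard-field rule combines 30% of the first million records with 15% of the remainder before applying the one-million cap; the function names are my own.

```python
# Selectivity thresholds: an index is considered only if the filter is
# expected to match fewer rows than the threshold for that field type.
def standard_index_threshold(total_records):
    """30% of the first 1M records + 15% of the rest, capped at 1M rows."""
    first = min(total_records, 1_000_000)
    rest = max(total_records - 1_000_000, 0)
    return min(0.30 * first + 0.15 * rest, 1_000_000)

def custom_index_threshold(total_records):
    """10% of total records, capped at 333,333 rows."""
    return min(0.10 * total_records, 333_333)

print(standard_index_threshold(2_000_000))  # 300,000 + 150,000 = 450000.0
print(custom_index_threshold(5_000_000))    # 500,000 capped at 333333
```

So on a 5-million-row object, a filter on a custom indexed field must match fewer than 333,333 rows to use the index; otherwise the optimizer falls back to another plan (another filter, or a full scan).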
Stay tuned for the next pill!