Managing Large Data Volume in Salesforce

Vinay Sail


The term “Large Data Volume” (LDV) has no exact definition. As a rule of thumb, if your org has tens of thousands of users and millions of records contributing to hundreds of gigabytes of data storage, then you have a large data volume.

LDV Implications

LDV Mitigation Strategy

Integration with external systems

  • Heroku:
    • Sync data with Heroku, or expose Heroku data in Salesforce using Heroku External Objects
    • Use Salesforce Connect to view archived data from Salesforce
  • Salesforce Connect
    • Connect to the external data source using OData 2.0/OData 4.0
  • API/Webservice Integration
    • Suited for small amounts of data and short interactions
    • Apex callouts use web services to exchange information with the external system in real time
    • Always fetches the latest data
    • Reporting and workflow automation are not possible on data retrieved this way
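A real-time callout of this kind can be sketched in Apex. The named credential `ERP_System`, the endpoint path, and the response shape below are illustrative assumptions, not a real API:

```apex
// Hypothetical example: fetch the latest order status from an external ERP.
// The named credential 'ERP_System' and the /orders path are assumptions.
public with sharing class ErpOrderService {
    public static String getOrderStatus(String orderNumber) {
        HttpRequest req = new HttpRequest();
        // A named credential keeps authentication details out of code
        req.setEndpoint('callout:ERP_System/orders/' + orderNumber);
        req.setMethod('GET');
        req.setTimeout(10000); // keep interactions short, per the LDV guidance

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() == 200) {
            Map<String, Object> body =
                (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
            return (String) body.get('status');
        }
        throw new CalloutException('ERP returned ' + res.getStatus());
    }
}
```

Because the data is fetched on demand and never stored in Salesforce, it is always current, but it cannot feed reports or workflow automation.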

Mashup

  • Canvas
    • Avoids bringing data from external systems into Salesforce
    • Use Canvas to embed the external system’s user interface inside Salesforce
  • iFrame
    • Embed external application using iFrame

Indexing

  • Too many indexes slow down DML operations, since every index must be maintained on each insert and update

Standard Index

Salesforce maintains indexes on the following fields for most objects:

  • RecordTypeId
  • Division
  • CreatedDate
  • SystemModstamp
  • Name
  • Email (for contacts and leads)
  • External Id and unique fields
  • Lookup and master-detail relationship fields
  • Salesforce record ID
 
Custom Index

Contact Salesforce Customer Support to create custom indexes.

Custom indexes are not supported for:

  • Multi-select picklists
  • Text areas (long)
  • Text areas (rich)
  • Non-deterministic formula fields
  • Encrypted text fields
 
Two-Column Custom Indexes
  • Useful for list views
  • Useful when one column selects the records to display and the other column sorts them
 
 

Data Archiving using:

  • ETL
  • Heroku
  • BIG Objects
  • AppExchange

 
 

Skinny Tables

  • If you want to use skinny tables, contact Salesforce Customer Support.
  • Normally, Salesforce stores standard and custom field data in separate database tables; a skinny table combines the frequently used fields into one table to avoid costly joins
  • It does not contain soft-deleted records
  • When enabled, skinny tables are created and used automatically where appropriate.
  • You can’t create, access, or modify skinny tables yourself.
  • To add new fields, you’ll need to contact Salesforce to update your skinny table definition.
  • Skinny tables can be created on custom objects and Account, Contact, Opportunity, Lead, and Case objects.
  • Skinny tables and skinny indexes can also contain encrypted data.
  • Skinny tables can contain a maximum of 100 columns.
  • Skinny tables can’t contain fields from other objects.
  • For Full sandboxes: Skinny tables are copied to your Full sandbox org.

BIG Objects

  • Store millions or even billions of records on the Salesforce platform
  • Data that is no longer needed for day-to-day operations can be archived into them from other objects
  • Use LWC with SOQL queries against the big object’s index to view the data
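An archiving pass into a big object might look like the following sketch. The big object `Order_Archive__b` and its fields `Account__c`, `Order_Date__c`, and `Amount__c` are assumed names, with `Account__c` as the leading index field:

```apex
// Hypothetical big object Order_Archive__b (assumed field names).
List<Order_Archive__b> archive = new List<Order_Archive__b>();
for (Order o : [SELECT AccountId, EffectiveDate, TotalAmount
                FROM Order
                WHERE EffectiveDate < LAST_N_YEARS:2]) {
    archive.add(new Order_Archive__b(
        Account__c   = o.AccountId,
        Order_Date__c = o.EffectiveDate,
        Amount__c    = o.TotalAmount));
}
// Big object writes happen outside the normal transaction
Database.insertImmediate(archive);

// Querying a big object: filter on the index fields, in index order
List<Order_Archive__b> archived = [
    SELECT Account__c, Order_Date__c, Amount__c
    FROM Order_Archive__b
    WHERE Account__c = :someAccountId
];
```

Once the archive copy is verified, the source records can be hard-deleted to reclaim storage.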
 

Lightning Platform Query Optimizer

  • The Lightning Platform query optimizer helps the underlying database’s optimizer prepare effective query execution plans for Salesforce queries.
  • It works on the automatically generated queries that handle reports, list views, and SOQL queries.
  • Best Practices:
    • Try to use fields available in skinny tables
    • Always use selective queries
    • Avoid filtering on NULL values, as the index cannot be used
    • Use indexed fields in the where clause of SOQL queries
    • Avoid wildcards in queries, e.g. %
    • Use query filters that bring back less than 10% of the data
    • Include only required fields in the Select clause of the query
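The contrast between a non-selective and a selective query can be illustrated with standard Account fields (the specific filters are illustrative, not a benchmark):

```apex
// Non-selective: the leading wildcard defeats the Name index,
// the != null comparison also prevents index use, and extra
// columns are selected that the caller may not need.
List<Account> nonSelective = [
    SELECT Id, Name, Industry, AnnualRevenue, BillingState
    FROM Account
    WHERE Name LIKE '%corp%' AND Industry != null
];

// Selective: filters on indexed fields (Name, CreatedDate),
// avoids wildcards at the start of the pattern, and returns
// only the columns that are actually required.
List<Account> selective = [
    SELECT Id, Name
    FROM Account
    WHERE Name LIKE 'Acme%' AND CreatedDate = LAST_N_DAYS:30
    LIMIT 200
];
```

In an LDV org the first query risks a full table scan, while the second lets the optimizer drive the lookup from an index.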

Reporting

  • Structure reports using filters that narrow the data to be queried.
  • Select report columns carefully: By reducing the columns selected, the number of joins is reduced, and report generation is faster.
  • Use the report or dashboard scheduling feature to run long-running reports off-peak and deliver them by email
  • Use report filters that emphasize standard or custom indexed fields whenever possible

Divisions

  • Divisions are a way of partitioning the data of large deployments to reduce the number of records returned by queries and reports.
    • e.g. region-specific divisions
  • Contact Salesforce Customer Support to enable divisions
 

Deleting Data

  • Soft Delete

    • Salesforce flags the data as deleted and makes it visible in the Recycle Bin
  • Hard Delete (Recommended Approach)
    • Bulk API supports a hard delete option
    • It allows records to bypass the Recycle Bin and become immediately eligible for physical deletion.
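The Bulk API hard delete itself is invoked from a data-loading tool rather than Apex, but the same effect can be approximated in Apex by emptying the Recycle Bin immediately after the delete. The filter below is only an example:

```apex
// Delete stale closed cases, then purge them from the Recycle Bin
// so they stop counting against storage and query performance.
List<Case> stale = [
    SELECT Id FROM Case
    WHERE IsClosed = true AND ClosedDate < LAST_N_YEARS:3
    LIMIT 10000
];
delete stale;                      // soft delete: rows go to the Recycle Bin
Database.emptyRecycleBin(stale);   // hard delete: rows become purgeable
```

Note that the Bulk API hard delete option requires the “Bulk API Hard Delete” permission on the running user’s profile.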
 

Defer Sharing Calculation

  • Allows admin users to defer the processing of sharing rules until a data load into Salesforce or a sharing configuration change is complete
  • Two types: group membership calculation and sharing rule calculation
  • Suitable for data migrations and role hierarchy updates
  • Contact Salesforce to enable this capability
  • Share table calculations across objects are deferred
  • Re-enable sharing calculations once the data load, role hierarchy updates, etc. are complete
  • Test the entire process in a sandbox environment first
  • Benefit: recalculations can be performed outside of business hours

Parallel Sharing Rule Re-calculation

  • With parallel sharing rule recalculation, sharing rule updates are placed in the standard background processing queue and processed asynchronously, with the processing split across multiple execution threads

Granular Locking

  • Under normal circumstances, Salesforce locks the entire table that tracks group membership whenever roles, public groups, or territories are updated, so simultaneous changes can block one another for the sake of data integrity
  • With granular locking, Salesforce allows multiple updates to proceed simultaneously as long as there is no hierarchical or other relationship between the roles or groups involved

Using SOQL and SOSL

  • SOQL:  You know in which objects or fields the data resides
  • SOSL: You don’t know in which object or field the data resides, and you want to find it in the most efficient way possible
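The distinction can be shown side by side; the email address and search term below are placeholders:

```apex
// SOQL: you know the object and field — query Contacts by Email
List<Contact> byEmail = [
    SELECT Id, Name
    FROM Contact
    WHERE Email = 'jane@example.com'
];

// SOSL: you don't know where 'Acme' appears — search across objects
List<List<SObject>> results = [
    FIND 'Acme' IN ALL FIELDS
    RETURNING Account(Id, Name), Contact(Id, Name), Lead(Id, Name)
];
List<Account> matchingAccounts = (List<Account>) results[0];
```

SOSL uses search indexes rather than the query optimizer, which is why it is the more efficient choice when the location of the data is unknown.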

Org Wide Defaults

  • Set the OWD for objects to Public Read/Write or Public Read Only wherever possible for non-confidential data to improve performance
  • Use ‘Controlled by Parent’ to avoid additional share tables
  • In master-detail relationships, the Object Sharing table for the master object controls access to the detail object

Ownership Skew

  • Cause:
    • When a single user owns more than 10,000 records of an object
    • e.g. Dummy user owner of unassigned leads or cases
  • Problems:
    • Affects sharing table calculations: a change in the role hierarchy or territory model forces the sharing rules to be recalculated for the skewed owner and for every user above that owner in the role hierarchy
  • Prevention:
    • Avoid ownership skew in the first place
    • Enforce Lead and Case assignment rules
    • Maintain good data quality so that Leads and Cases can leverage assignment rules
    • Ensure the skewed owner does not have a role
    • If a role hierarchy must be used, place the skewed owner at the top of the hierarchy in a role with no child roles
    • Do not make an integration user the record owner
    • Keep the skewed owner out of public groups that could be used in sharing rules
    • Carefully craft data migrations by splitting records across multiple users

Account Skew

  • Cause
    • When you have a very large number of child records associated with the same account in Salesforce
  • Problems
    • Record locking: when a large number of child records are updated in separate threads, Salesforce locks both the child records and the parent record to maintain database integrity
    • Degraded sharing performance
    • Even a minor change, e.g. updating the owner of a parent record, may cause the sharing rules on all child records to be recalculated
  • Prevention
    • Avoid account skew in the first place
    • Put automation in place to distribute child records across parent accounts

Lookup Skew

  • Similar to Account Skew but having an impact on a broader number of objects
  • Cause:
    • When a large number of records are associated with a single record via a lookup field
  • Problems:
    • Record locking: maintaining data integrity takes longer when many records reference the same locked record
  • Prevention:
    • Use a picklist field instead of a lookup if the number of lookup values is small
    • Distribute the skew: add additional lookup records to spread child records across them
    • Reduce the load: if automated processes cause record locking for end-user operations, run them serially during off-peak hours
    • Reduce record save time: process only the data that is required, remove unnecessary workflows, consolidate triggers, and follow Apex best practices
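Consolidating triggers usually means one trigger per object delegating to a handler class. A minimal sketch, assuming a hypothetical `bypass` switch for bulk loads (in a real org the trigger and handler live in separate files):

```apex
// One trigger per object keeps all save-time logic in one place
// and makes it easy to short-circuit during data migrations.
trigger OrderTrigger on Order (before insert, before update) {
    OrderTriggerHandler.run(Trigger.new, Trigger.oldMap);
}

public class OrderTriggerHandler {
    // Hypothetical switch: set to true from a data-load script to
    // skip non-essential logic and reduce record save time.
    public static Boolean bypass = false;

    public static void run(List<Order> newOrders, Map<Id, Order> oldMap) {
        if (bypass) return;
        // ... only the processing that is actually required on save
    }
}
```

A single consolidated handler also makes it obvious which logic runs on every save, which is the first step in trimming it.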
