Keith Stephens 18

Find Null Fields

Hello All,
I need a way to search the Salesforce database for fields or objects that were created but never used, so they contain no data.
In other words, if you were to select from these tables and fields, everything would be null.

Thanks,
Keith.
Raj Vakati

You can use Salesforce reports: create a report on each object and filter the data on null values.

Or you can export the data to external/CSV files and analyze the values there.
As far as I know, there is nothing native to Salesforce for this.


 
Keith Stephens 18
I have not tried the report or CSV route yet, but what I am looking for is a tool or script that can do this. I have a tool for SQL Server that, when you select a database, loops through all the tables and all the fields of each table and reports every table name and field name that is NULL and has no data.
Alain Cabon
Hi,

I have written some programs for this problem, and the easiest way is to read all the exported CSV files (Setup > Data Export).

Any CSV reader is sufficient: https://commons.apache.org/proper/commons-csv/

Comparison: https://github.com/uniVocity/csv-parsers-comparison
You can always read big exported CSV files (no time out, no governor limits).
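To illustrate that approach, here is a minimal sketch in Python (the function name and file layout are my own; it assumes a standard comma-separated export with a header row) that streams a CSV once and keeps only the columns that are empty on every row:

```python
import csv

def find_empty_columns(path):
    """Return, sorted, the CSV columns whose value is empty on every row."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        candidates = None  # columns that are still empty so far
        for row in reader:
            if candidates is None:
                candidates = set(reader.fieldnames)
            # Drop every candidate that has a non-empty value on this row.
            candidates = {col for col in candidates if not row[col]}
            if not candidates:
                break  # every column has data somewhere; stop early
    return sorted(candidates if candidates else [])
```

Because the file is read row by row, this works on exports of any size with constant memory, and it stops early as soon as every column has been seen with data.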

What is the maximum number of records of your objects and the number of custom fields?

With few custom fields and some hundreds of thousands of rows (up to about one million), dynamically generated SOQL queries over REST API calls (this can also be done in Apex) are sufficient: select myfield1__c from myObject__c where myfield1__c != null limit 1. You can nevertheless hit a timeout if you have big objects (not declared as big objects). When I launched parallel calls in Java (10 queries at a time, for instance), the responses were incredibly fast (the Salesforce database engine is impressive and very powerful despite the governor limits).
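The query generation above can be sketched in Python; the helper name is mine, and the simple_salesforce client shown in the comments is just one possible way to send the calls (it assumes you have API credentials):

```python
def null_check_query(sobject, field):
    """Build a SOQL probe: one returned row means the field holds data
    somewhere in the org; zero rows means it is empty everywhere."""
    return "SELECT Id FROM {0} WHERE {1} != null LIMIT 1".format(sobject, field)

# Hypothetical usage with the simple_salesforce library (pip install simple-salesforce):
# from simple_salesforce import Salesforce
# sf = Salesforce(username=..., password=..., security_token=...)
# fields = [f['name'] for f in sf.Account.describe()['fields']]
# empty = [f for f in fields
#          if sf.query(null_check_query('Account', f))['totalSize'] == 0]
```

The LIMIT 1 keeps each probe cheap: the database only has to find a single non-null row per field, which is why the parallel calls come back so fast.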

AppExchange tools: they are rarely free when powerful, or the free version is limited in practice (free and limited amount to the same thing), or they are buggy (crashes caused by governor limits or timeouts, which are very difficult to overcome with synchronous requests from a screen and overly simple algorithms). It is still worth testing these "free" tools if you do not have too many records and you process one object at a time (it works, but it is slow and involves many manual operations).

Field Trip (powered by RingLead): Run reports on standard and custom field usage.
Ever wish you could run reports on the fields you have in Salesforce? Take a Field Trip! This utility lets you analyze the fields of any object, including what percentage of the records (or a subset of your records) have that field populated.
https://appexchange.salesforce.com/appxListingDetail?listingId=a0N30000003HSXEEA4
 
Alain Cabon
I have also tried a tool called Pandas (https://pandas.pydata.org/).

In theory, it should be the "best" tool for analyzing CSV files thanks to its dataframes, but with really big files you have to complicate the queries (reading by chunks and accumulating the results), and a simple CSV reader is sufficient when you don't need complicated statistics.

I used Pandas, for example, to get the most frequently used values per field. Pandas is useful for producing this kind of result directly from CSV files (including with filters) with short Python code.
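For instance, a sketch of that "most used values" idea with a made-up Industry column (real data would come from pd.read_csv('Account.csv'); value_counts excludes nulls by default):

```python
import io
import pandas as pd

# Hypothetical extract standing in for a real Account.csv export.
csv_data = io.StringIO(
    "Id,Industry\n"
    "1,Technology\n"
    "2,Technology\n"
    "3,Energy\n"
    "4,\n"          # a null value, dropped by value_counts
    "5,Technology\n"
)
df = pd.read_csv(csv_data)

# Most frequent values of the column, most common first.
top = df['Industry'].value_counts().head(5)
print(top)
```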
 
Alain Cabon
A short example with Pandas and Account.csv

With a few lines of Python using pandas, you can directly get the list of all the empty columns.

account_columns_null.py
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

pd.set_option('display.max_info_columns', 500)

filename = 'Account.csv'

mychunksize = 100000

# System fields that are always populated, so not worth checking.
skipped = ['Id', 'Name', 'IsDeleted', 'CreatedDate', 'CreatedById',
           'RecordTypeId', 'ParentId', 'LastModifiedDate',
           'LastModifiedById', 'SystemModstamp']

null_columns = None
for chunk in pd.read_csv(filename, chunksize=mychunksize,
                         usecols=lambda col: col not in skipped):
    chunk.info()
    empty_here = list(chunk.columns[chunk.isnull().all()])
    if null_columns is None:
        null_columns = empty_here
    else:
        # Keep only the columns that were also empty in every previous chunk.
        null_columns = [c for c in null_columns if c in set(empty_here)]

for col in null_columns:
    print("Column: " + col)

print('fin')

Pandas automatically analyzes all the columns of its dataframe (except those excluded by the usecols filter: col not in ['Id', 'Name', ...]).
 
c:\panda>account_columns_null.py

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136 entries, 0 to 135
Data columns (total 62 columns):
MasterRecordId             0 non-null float64
Type                       117 non-null object
BillingStreet              129 non-null object
BillingCity                129 non-null object
BillingState               78 non-null object
BillingPostalCode          129 non-null object
BillingCountry             132 non-null object
BillingLatitude            0 non-null float64
BillingLongitude           0 non-null float64
BillingGeocodeAccuracy     0 non-null float64
ShippingStreet             129 non-null object
ShippingCity               129 non-null object
ShippingState              78 non-null object
ShippingPostalCode         128 non-null object
ShippingCountry            132 non-null object
ShippingLatitude           0 non-null float64
ShippingLongitude          0 non-null float64
ShippingGeocodeAccuracy    0 non-null float64
Phone                      132 non-null object
Fax                        118 non-null object
AccountNumber              1 non-null float64
Website                    129 non-null object
Sic                        0 non-null float64
Industry                   120 non-null object
AnnualRevenue              129 non-null float64
NumberOfEmployees          129 non-null float64
Ownership                  0 non-null float64
TickerSymbol               0 non-null float64
Description                0 non-null float64
Rating                     1 non-null object
Site                       0 non-null float64
OwnerId                    136 non-null object
LastActivityDate           3 non-null object
IsExcludedFromRealign      136 non-null int64
Jigsaw                     0 non-null float64
JigsawCompanyId            0 non-null float64
CleanStatus                136 non-null object
AccountSource              0 non-null float64
DunsNumber                 0 non-null float64
Tradestyle                 0 non-null float64
NaicsCode                  0 non-null float64
NaicsDesc                  0 non-null float64
YearStarted                0 non-null float64
SicDesc                    0 non-null float64
DandbCompanyId             136 non-null object
CustomerPriority__c        0 non-null float64
SLA__c                     3 non-null object
Active__c                  3 non-null object
NumberofLocations__c       1 non-null float64
UpsellOpportunity__c       0 non-null float64
SLASerialNumber__c         3 non-null float64
SLAExpirationDate__c       3 non-null object
ginid__c                   3 non-null float64
ZC__c                      2 non-null object
rrpu__Alert_Message__c     0 non-null float64
City__c                    1 non-null object
Country__c                 1 non-null object
CompletionValue__c         0 non-null float64
Status__c                  13 non-null object
Support_Level__c           0 non-null float64
Subcategories__c           1 non-null object
ListeBilingue__c           1 non-null object

dtypes: float64(32), int64(1), object(29)

memory usage: 65.9+ KB

Column: MasterRecordId
Column: BillingLatitude
Column: BillingLongitude
Column: BillingGeocodeAccuracy
Column: ShippingLatitude
Column: ShippingLongitude
Column: ShippingGeocodeAccuracy
Column: Sic
Column: Ownership
Column: TickerSymbol
Column: Description
Column: Site
Column: Jigsaw
Column: JigsawCompanyId
Column: AccountSource
Column: DunsNumber
Column: Tradestyle
Column: NaicsCode
Column: NaicsDesc
Column: YearStarted
Column: SicDesc
Column: CustomerPriority__c
Column: UpsellOpportunity__c
Column: rrpu__Alert_Message__c
Column: CompletionValue__c
Column: Support_Level__c

fin

c:\panda>

It is the shortest way, but you cannot read gigabytes of data directly with Pandas, and reading by chunks complicates the final results.