Datasets, terminology & data quality

How online, offline, and scrape-trace data fit together — plus practical guidance on validation and match accuracy.

Online primary dataset

Our primary dataset consists of online data — derived from online sources such as co-reg databases, online forms, and publisher relationships.

Collecting data from online sources is the most direct path to actionable email signals and intent. Consumer attributes from online profiles may be less exhaustive than offline / skip-traced attributes, so pair the online dataset with the offline dataset when you need fuller consumer detail.

Online primary dataset — fields
Field Description
FIRST_NAME
LAST_NAME
DIRECT_NUMBER
MOBILE_PHONE
PERSONAL_ADDRESS
PERSONAL_CITY
PERSONAL_PHONE
PERSONAL_STATE
PERSONAL_ZIP
PERSONAL_ZIP4
SOCIAL_CONNECTIONS
AGE_RANGE
CHILDREN
GENDER
HOMEOWNER
MARRIED
NET_WORTH
INCOME_RANGE
BUSINESS_EMAIL
BUSINESS_EMAIL_VALIDATION_STATUS
PROGRAMMATIC_BUSINESS_EMAILS
BUSINESS_EMAIL_LAST_SEEN
PERSONAL_EMAIL
ADDITIONAL_PERSONAL_EMAILS
PERSONAL_EMAIL_VALIDATION_STATUS
PERSONAL_EMAIL_LAST_SEEN
SHA256_PERSONAL_EMAIL
SHA256_BUSINESS_EMAIL
LAST_UPDATED
COMPANY_ADDRESS
COMPANY_DESCRIPTION
COMPANY_DOMAIN
COMPANY_EMPLOYEE_COUNT
COMPANY_LINKEDIN_URL
COMPANY_NAME
COMPANY_PHONE
COMPANY_REVENUE
COMPANY_SIC
COMPANY_NAICS
COMPANY_CITY
COMPANY_STATE
COMPANY_ZIP
COMPANY_INDUSTRY
COMPANY_LAST_UPDATED
DEPARTMENT
JOB_TITLE
LINKEDIN_URL
PROFESSIONAL_ADDRESS
PROFESSIONAL_ADDRESS_2
PROFESSIONAL_CITY
PROFESSIONAL_STATE
PROFESSIONAL_ZIP
PROFESSIONAL_ZIP4
SENIORITY_LEVEL
JOB_TITLE_LAST_UPDATED

Offline dataset

Our offline dataset supports direct outreach: outbound calling, canvassing, and high-confidence identity matching, where a record must correspond to the real-world person it claims to be.

Offline skiptrace dataset
Field Description
SKIPTRACE_MATCH_BY
SKIPTRACE_PERSON_TITLE_OF_RESPECT
SKIPTRACE_NAME
SKIPTRACE_ADDRESS
SKIPTRACE_CITY
SKIPTRACE_STATE
SKIPTRACE_ZIP
SKIPTRACE_LANDLINE_NUMBERS
SKIPTRACE_WIRELESS_NUMBERS
SKIPTRACE_CREDIT_RATING
SKIPTRACE_EXACT_AGE
SKIPTRACE_ETHNIC_CODE
SKIPTRACE_CARRIER_ROUTE
SKIPTRACE_LANGUAGE_CODE
SKIPTRACE_IP

Scrape-trace dataset

Scrape-trace adds validation for B2B by combining skip tracing with live web crawling — cross-checking records against the freshest public sources alongside offline matches.

Scrape-trace dataset
Field Description
SKIPTRACE_MATCH_BY
SKIPTRACE_PERSON_TITLE_OF_RESPECT
SKIPTRACE_NAME
SKIPTRACE_ADDRESS
SKIPTRACE_CITY
SKIPTRACE_STATE
SKIPTRACE_ZIP
SKIPTRACE_LANDLINE_NUMBERS
SKIPTRACE_WIRELESS_NUMBERS
SKIPTRACE_CREDIT_RATING
SKIPTRACE_EXACT_AGE
SKIPTRACE_ETHNIC_CODE
SKIPTRACE_CARRIER_ROUTE
SKIPTRACE_LANGUAGE_CODE
SKIPTRACE_IP

Core data set summary

Terminology

Term          Meaning
Online Data   Data acquired through online sources and publishers.
Offline Data  Data acquired through offline sources such as real estate and finance databases.
Skip Trace    Data that has been matched against multiple sources (offline and online) to determine its accuracy.
Skip Scraped  Skip-traced data combined with scraping: records are cross-checked across databases and also against the latest online sources.
B2C           Business to Consumer: personal consumer data.
B2B           Business to Business: business data, not personal data.
B2B2C         Business to Business to Consumer: business data matched at the personal level for greater accuracy.
Co-reg        Co-registration: opt-in data obtained from publishers across certain verticals.

Explanation of fields

Field Description
FIRST_NAME First name from online co-reg.
LAST_NAME Last name from online co-reg.
SHA256_PERSONAL_EMAIL SHA-256 hash of the personal email (most recent).
PERSONAL_EMAIL Personal email corresponding to the SHA-256 hash.
PERSONAL_EMAIL_VALIDATION_STATUS Validation signal of personal email.
PERSONAL_EMAIL_LAST_SEEN When the personal email was last seen by an ESP.
SKIPTRACE_MATCH_BY Fields used to match the online record to offline data during skip tracing, for greater accuracy.
SKIPTRACE_PERSON_TITLE_OF_RESPECT Title of respect (honorific) of the prospect.
SKIPTRACE_NAME Full name of the prospect, from offline data.
SKIPTRACE_ADDRESS Address of the prospect, from offline data.
SKIPTRACE_CITY City of the prospect, from offline data.
SKIPTRACE_STATE State of the prospect, from offline data.
SKIPTRACE_ZIP ZIP code of the prospect, from offline data.
SKIPTRACE_LANDLINE_NUMBERS Landline (home phone) numbers of the prospect, from offline data.
SKIPTRACE_WIRELESS_NUMBERS Wireless (mobile) numbers of the prospect, from offline data.
DNC National Do Not Call Registry tag.
SKIPTRACE_B2B_MATCH_BY Fields used to skip trace the B2B record against additional online and offline sources.
COMPANY_NAME Company name the prospect is associated with.
COMPANY_DOMAIN Domain of the company.
COMPANY_DESCRIPTION AI-generated description of the company and what it does.
BUSINESS_EMAIL Business email of the prospect.
BUSINESS_EMAIL_VALIDATION_STATUS Validation status of the business email, including whether it shows email signals.
BUSINESS_EMAIL_LAST_SEEN When the business email last had a signal via an ESP.
SKIPTRACE_B2B_ADDRESS Business address, skip traced from another dataset.
SKIPTRACE_B2B_LANDLINE_PHONE Business landline, skip traced from another dataset.
SKIPTRACE_B2B_WIRELESS_PHONE Wireless (mobile) phone, skip traced from another dataset.
SKIPTRACE_B2B_SOURCE Source of the additional business information.
SKIPTRACE_B2B_WEBSITE The business website (usually the root domain).
LINKEDIN_URL LinkedIn URL of the prospect.
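
To illustrate how a plaintext email relates to the SHA256_PERSONAL_EMAIL and SHA256_BUSINESS_EMAIL fields, here is a minimal Python sketch. It assumes the address is trimmed and lower-cased before hashing; the normalisation actually applied to the dataset is not documented here, so confirm it before relying on hash joins. The function names are hypothetical.

```python
import hashlib


def sha256_email(email: str) -> str:
    """Return the hex SHA-256 digest of an email address.

    Assumption: the address is trimmed and lower-cased first.
    The dataset's real normalisation rule is not stated in this page.
    """
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def matches_hash(email: str, sha256_hex: str) -> bool:
    """Check whether an email corresponds to a stored SHA-256 value."""
    return sha256_email(email) == sha256_hex.strip().lower()


if __name__ == "__main__":
    stored = sha256_email("jane.doe@example.com")   # stand-in for a dataset value
    print(matches_hash("  Jane.Doe@Example.com ", stored))
```

SHA-256 is a one-way hash, so the digest is useful for matching lists without exchanging plaintext addresses, not for recovering an address from the digest.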

Phone numbers — quality control

Several types of phone number exist inside the identity graph. For outbound dialing from call centres or similar workflows, rely on skip-traced numbers only, because they are reconciled across multiple offline sources.

Recommended fields:

  • SKIPTRACE_LANDLINE_NUMBERS
  • SKIPTRACE_WIRELESS_NUMBERS
  • DNC

Expect roughly 80–99% validity on hygiene checks after filtering to the recommended skip-trace fields; actual rates depend on list composition.
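
The filtering described above could be sketched as follows in Python. This is an illustration under assumptions, not a documented export format: it assumes each row is a dict keyed by field name, that the number fields hold comma-separated values, and that DNC is a row-level flag whose truthy values are "Y", "YES", "TRUE" or "1". Adjust these to match your actual export.

```python
def split_numbers(value):
    """Split a multi-value phone field into a clean list.

    Assumption: numbers are comma-separated within the field.
    """
    return [n.strip() for n in (value or "").split(",") if n.strip()]


def dialable_numbers(row, dnc_flags=("Y", "YES", "TRUE", "1")):
    """Return skip-traced numbers for a row, or [] if the row is DNC-tagged.

    Assumption: DNC is a single flag per row; the flag values are hypothetical.
    Only the skip-traced fields are read; other phone fields are ignored.
    """
    if (row.get("DNC") or "").strip().upper() in dnc_flags:
        return []
    numbers = (
        split_numbers(row.get("SKIPTRACE_WIRELESS_NUMBERS"))
        + split_numbers(row.get("SKIPTRACE_LANDLINE_NUMBERS"))
    )
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(numbers))


if __name__ == "__main__":
    row = {
        "MOBILE_PHONE": "5550199",  # not skip-traced, so not used for dialing
        "SKIPTRACE_WIRELESS_NUMBERS": "5550100, 5550101",
        "SKIPTRACE_LANDLINE_NUMBERS": "5550101,5550102",
        "DNC": "",
    }
    print(dialable_numbers(row))
```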

Email verification

B2B records resolve to employee-level granularity (often described as B2B2C) so you see both employer context and actionable contact points — usable for activation with stronger match coverage in paid environments.

We collect multiple email variants per person (programmatic vs deliverable workloads). Selecting the wrong validation tier can crater deliverability, so optimise per channel.

  • Column S (BUSINESS_EMAIL): the mailbox to use for cold outreach.
  • Column T (BUSINESS_EMAIL_VALIDATION_STATUS): verification tags.
    • Valid (Catch-all): typical organisation catch-all.
    • Valid (Digital): present in programmatic channels; may still be risky for SMTP sends.
    • Valid (ESP): receiving and sending telemetry from ESP partners; focus here for outbound email.
  • Column V (BUSINESS_EMAIL_LAST_SEEN): last seen, refreshed weekly / monthly / quarterly depending on ingestion batch freshness from ESP pipelines.
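
As a rough illustration of selecting the ESP-validated tier for outbound email, the Python sketch below keeps rows whose BUSINESS_EMAIL_VALIDATION_STATUS is exactly "Valid (ESP)" and whose BUSINESS_EMAIL_LAST_SEEN is recent. The ISO date format (YYYY-MM-DD) and the 90-day freshness window are assumptions chosen for the example, not documented values.

```python
from datetime import date, timedelta


def is_outbound_ready(row, today, max_age_days=90):
    """Return True if the row's business email looks safe for outbound sends.

    Assumptions: last-seen values start with an ISO date (YYYY-MM-DD),
    and max_age_days=90 is an arbitrary example threshold.
    """
    if not (row.get("BUSINESS_EMAIL") or "").strip():
        return False
    status = (row.get("BUSINESS_EMAIL_VALIDATION_STATUS") or "").strip()
    if status != "Valid (ESP)":
        return False
    raw = (row.get("BUSINESS_EMAIL_LAST_SEEN") or "").strip()
    try:
        last_seen = date.fromisoformat(raw[:10])
    except ValueError:
        return False  # missing or unparseable date: treat as not ready
    return (today - last_seen) <= timedelta(days=max_age_days)


if __name__ == "__main__":
    row = {
        "BUSINESS_EMAIL": "jane@example.com",
        "BUSINESS_EMAIL_VALIDATION_STATUS": "Valid (ESP)",
        "BUSINESS_EMAIL_LAST_SEEN": "2024-05-01",
    }
    print(is_outbound_ready(row, today=date(2024, 6, 1)))
```

Passing `today` explicitly keeps the check reproducible; rows tagged Valid (Catch-all) or Valid (Digital) are excluded here but may still suit other channels.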

Adjusting skiptrace match fields for accuracy

Dialled or postal programs should prioritise strictly skip-traced rows — deterministic matches across tens of corroborating sources.

Matches between the online spine and offline append are expressed under SKIPTRACE_MATCH_BY. Narrowing filters to combinations such as address + email typically boosts precision because multiple independent keys agree.
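
A minimal Python sketch of narrowing on SKIPTRACE_MATCH_BY follows. It assumes the field holds a comma-separated list of match keys and that the keys are spelled "ADDRESS" and "EMAIL"; both the delimiter and the key names are assumptions, so inspect real values before applying a filter like this.

```python
def match_keys(value):
    """Parse a SKIPTRACE_MATCH_BY value into a set of upper-cased keys.

    Assumption: keys are comma-separated, e.g. "ADDRESS,EMAIL".
    """
    return {k.strip().upper() for k in (value or "").split(",") if k.strip()}


def has_required_keys(row, required=("ADDRESS", "EMAIL")):
    """Return True when every required key took part in the match."""
    return set(required) <= match_keys(row.get("SKIPTRACE_MATCH_BY"))


if __name__ == "__main__":
    rows = [
        {"SKIPTRACE_NAME": "A", "SKIPTRACE_MATCH_BY": "ADDRESS,EMAIL"},
        {"SKIPTRACE_NAME": "B", "SKIPTRACE_MATCH_BY": "NAME"},
    ]
    strict = [r for r in rows if has_required_keys(r)]
    print([r["SKIPTRACE_NAME"] for r in strict])
```

Requiring several independent keys trades coverage for precision: fewer rows pass, but each surviving row is backed by more than one agreeing identifier.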