How SSA Baby Name Data Works: Sources, Limits, and What It Actually Measures
Nearly every U.S. baby name trend article you have ever read draws on a single dataset: the Social Security Administration's baby names file. It covers over 350 million name records stretching back to 1880 and is updated each May for the previous calendar year. But the dataset has significant blind spots that most articles never mention. This guide explains exactly how the data is collected, what the thresholds mean, and where the numbers fall short.
How SSA collects name data
The data comes from Social Security card applications (Form SS-5), not from birth certificates directly. When a parent applies for a Social Security number for their newborn, the name on that application enters the dataset. For people born before 1937, coverage is spotty because Social Security did not exist until 1935, and those who do appear applied later, mostly as working-age adults. Coverage became essentially universal by the late 1940s.
Each record includes the child's first name, sex assigned at birth, state of birth, and year of birth. Middle names are not recorded. Hyphens and spaces are removed when the data is tabulated, so a hyphenated first name is counted as a single entry (Mary-Jane becomes Maryjane, not Mary + Jane). Suffixes like Jr. or III are stripped.
The 5-occurrence privacy threshold
Any name given to fewer than 5 babies of the same sex in a given year is excluded from the public file. This is a privacy measure, not a data quality filter. It means the published dataset undercounts rare names significantly. In any given year, roughly 4,000 to 6,000 unique name-sex combinations fall below this threshold and are suppressed.
The practical effect: if you are researching a truly unusual name, it may be missing from the SSA file for a given year even though 1-4 babies received it. The national file and the state-level files apply this threshold independently, so a name with 3 occurrences in California and 3 in Texas (6 total) appears in the national file but not in either state file.
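The California and Texas case can be checked with a few lines of Python. This is a minimal sketch of the threshold logic only: the name "Zephyrine", the counts, and the `publish` helper are invented for illustration and are not SSA code or SSA figures.

```python
from collections import Counter

THRESHOLD = 5  # names with fewer occurrences than this are withheld

def publish(counts, threshold=THRESHOLD):
    """Return only the entries that meet the publication threshold."""
    return {key: n for key, n in counts.items() if n >= threshold}

# Hypothetical raw counts for one invented name, one sex, one birth year.
raw_by_state = {
    ("CA", "Zephyrine", "F"): 3,
    ("TX", "Zephyrine", "F"): 3,
}

# State files: the threshold is applied to each state on its own.
state_published = publish(raw_by_state)

# National file: counts are summed across states first, then thresholded.
national_raw = Counter()
for (_state, name, sex), n in raw_by_state.items():
    national_raw[(name, sex)] += n
national_published = publish(national_raw)

print(state_published)     # {}
print(national_published)  # {('Zephyrine', 'F'): 6}
```

One consequence is that summing the state files will not, in general, reproduce the national counts for rare names.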
What changed after 1937 and again after 1986
Two events reshaped the dataset. First, Social Security numbers were first issued in late 1936 and payroll contributions began in 1937, so people born before then appear in the data only if they applied later in life. Pre-1937 data is therefore heavily biased toward the names of adults who applied for work-related Social Security numbers.
Second, the Tax Reform Act of 1986 required parents to list a child's SSN on tax returns to claim the dependent deduction. Before 1986, many parents delayed SSN applications for years. After 1986, applications became nearly universal at birth. This is why post-1986 data is considered the most reliable era for name frequency analysis.
What the dataset cannot tell you
Several important dimensions are missing:
- Gender-neutral tracking: the dataset assigns every record to exactly one sex category. A name like "Avery" appears with both male and female counts, but you cannot tell how many parents chose it specifically as a gender-neutral name. There is no non-binary or unspecified category in historical data.
- Race and ethnicity: SSA does not record race. You cannot determine from this dataset alone whether a name is more common in one ethnic group.
- Spelling intent: Kaitlyn, Katelyn, Caitlin, and Katelynn are four separate entries. There is no built-in way to group spelling variants.
- Pronunciation: two identically spelled names with different pronunciations (e.g., "Andrea" with stress on the first vs. second syllable) are counted as one name.
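Because the file has no notion of spelling variants, any grouping has to come from a mapping you supply yourself. The sketch below shows one way to do that, assuming rows shaped as (name, sex, count). The `VARIANT_GROUPS` table, the choice of "Caitlin" as the canonical spelling, and all counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical variant map: each spelling points to a chosen canonical form.
VARIANT_GROUPS = {
    "Kaitlyn": "Caitlin",
    "Katelyn": "Caitlin",
    "Katelynn": "Caitlin",
    "Caitlin": "Caitlin",
}

def combine_variants(rows, groups=VARIANT_GROUPS):
    """Sum counts per (canonical name, sex); unmapped names pass through."""
    totals = defaultdict(int)
    for name, sex, count in rows:
        totals[(groups.get(name, name), sex)] += count
    return dict(totals)

# Invented counts for illustration only, not real SSA figures.
sample_rows = [
    ("Kaitlyn", "F", 400),
    ("Katelyn", "F", 300),
    ("Caitlin", "F", 200),
    ("Katelynn", "F", 100),
    ("Avery", "F", 50),
]

combined = combine_variants(sample_rows)
print(combined[("Caitlin", "F")])  # 1000
```

Which spellings belong together is a judgment call, so two analyses built on the same file can reasonably report different combined totals.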
How NameBlooms uses SSA data
Our name rankings pull directly from the most recent SSA national file and normalize for annual birth volume. When we say a name was "#47 in 2024," we mean it was the 47th most common name on SS-5 applications for babies born that year. We group common spelling variants in our explorer so you can see combined popularity, and our trend charts flag names that appear under both sexes.
For historical trends, we use the full 1880-present file but add a reliability flag for pre-1937 data. Any trend line before 1937 should be interpreted with caution due to the adult-application bias described above.
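To make ranking, normalization, and the reliability flag concrete, here is a rough sketch of how they could be computed from one year's rows. It is an illustration under stated assumptions, not a description of NameBlooms's actual pipeline. The function names are hypothetical, ranking within each sex and breaking ties alphabetically are choices made for the sketch, and the counts are invented.

```python
def rank_and_share(rows):
    """Rank names within each sex and compute a share per 1,000 listed babies.

    rows: list of (name, sex, count) tuples for a single birth year.
    Returns {(name, sex): (rank, share_per_1000)}.
    """
    results = {}
    for sex in sorted({s for _, s, _ in rows}):
        subset = sorted(
            (r for r in rows if r[1] == sex),
            key=lambda r: (-r[2], r[0]),  # count descending, then name A-Z
        )
        # The total covers only names present in the rows, so names
        # suppressed by the privacy threshold are not in the denominator.
        total = sum(count for _, _, count in subset)
        for rank, (name, _, count) in enumerate(subset, start=1):
            results[(name, sex)] = (rank, round(1000 * count / total, 1))
    return results

def is_low_reliability(year, cutoff=1937):
    """Flag birth years before the cutoff as low reliability."""
    return year < cutoff

# Invented counts for illustration only, not real SSA figures.
sample_rows = [
    ("Olivia", "F", 600),
    ("Emma", "F", 400),
    ("Liam", "M", 700),
    ("Noah", "M", 300),
]

ranked = rank_and_share(sample_rows)
print(ranked[("Emma", "F")])     # (2, 400.0)
print(is_low_reliability(1925))  # True
```

Using a share rather than a raw count makes years with different birth volumes comparable, which matters when a trend line spans decades.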
Annual release schedule
SSA publishes the previous year's data each May, typically in the second or third week. The exact date is not announced in advance. The release includes both the national data and 51 state-level files (50 states plus DC). Territories like Puerto Rico and Guam are not included in the standard release. The files are plain comma-separated text with no header row. National files, one per birth year, have three fields: name, sex, and count. State files add the state abbreviation and birth year to each row.
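Reading the three-field name, sex, count layout takes only the standard library. The sketch below parses text in that layout into tuples. It assumes there is no header row, and the sample lines are invented rather than copied from an SSA file. The `parse_national_rows` helper is a hypothetical name.

```python
import csv
import io

def parse_national_rows(text):
    """Parse 'name,sex,count' lines into (name, sex, count) tuples."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        name, sex, count = record
        rows.append((name, sex, int(count)))
    return rows

# Invented sample in the three-field layout, not real SSA figures.
sample = "Olivia,F,600\nEmma,F,400\nLiam,M,700\n"
rows = parse_national_rows(sample)
print(rows[0])  # ('Olivia', 'F', 600)
```

To read a downloaded file instead of a string, pass the result of `open(path, newline="").read()` to the same function.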
Frequently Asked Questions
How far back does SSA baby name data go?
The public file starts in 1880, but data before 1937 is unreliable because Social Security did not exist until 1935 and most records for those birth years came from adult applicants rather than infants. Post-1986 data is the most accurate.
Why does the SSA list show zero results for my name?
If fewer than 5 babies of the same sex received your name in a given year, the SSA suppresses it for privacy. The name may well have been used, but the count was too low to publish.
Does SSA track middle names?
No. Only the first name from the Social Security card application is recorded. Middle names, nicknames, and preferred names are not included in the dataset.
How accurate are the SSA name counts?
After 1986, when the Tax Reform Act made infant SSN applications nearly universal, the counts are considered highly accurate for names above the 5-occurrence threshold. They may undercount by 1-2% due to delayed applications.
Can I find name data broken down by race or ethnicity?
Not from SSA. The Social Security application does not collect race or ethnicity. For demographic breakdowns, researchers use state vital statistics records, which some states publish with race categories.
Why do different baby name sites show different rankings?
Most sites use the same SSA data but may group spelling variants differently, use different years, or count both sexes together. Always check which year and whether variants are merged.
Does SSA track non-binary or gender-neutral name assignments?
No. Every record in the SSA file is tagged as either male or female based on the sex recorded on the application. There is no gender-neutral or non-binary category in the historical data.