Name Data - synthetichealth/synthea GitHub Wiki

By default, the project contains English-style and Spanish-style names suitable for the United States in the src/main/resources/names.yaml file.

You can change those names or add names for other languages. A person's name is drawn from the lists for their primary language, and that language is determined by their ethnicity. To add other languages, or to change how languages are assigned, modify the src/main/java/org/mitre/synthea/world/geographic/Demographics.java file, specifically these two methods:

public String ethnicityFromRace(String race, Person person) { ... }
public String languageFromEthnicity(String ethnicity, Person person) { ... }
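
As a rough illustration of the kind of change involved, the following standalone sketch maps an ethnicity label to a language key. It is not Synthea's Demographics code: the class name, the simplified one-argument signature, and the ethnicity labels ("hispanic", "french_canadian") are assumptions made for this example. Only the language keys come from the supported-language list on this page.

```java
import java.util.Map;

/**
 * Hypothetical, standalone sketch; this is NOT Synthea's Demographics class.
 * The real method also takes a Person argument, which is omitted here.
 */
class LanguageAssignmentSketch {
  // Assumed ethnicity labels mapped to language keys used in names.yaml.
  private static final Map<String, String> LANGUAGE_BY_ETHNICITY = Map.of(
      "hispanic", "spanish",
      "french_canadian", "french");

  /** Returns the language key for an ethnicity, defaulting to "english". */
  static String languageFromEthnicity(String ethnicity) {
    if (ethnicity == null) {
      return "english";
    }
    return LANGUAGE_BY_ETHNICITY.getOrDefault(ethnicity, "english");
  }

  public static void main(String[] args) {
    System.out.println(languageFromEthnicity("hispanic")); // prints "spanish"
    System.out.println(languageFromEthnicity("other"));    // prints "english"
  }
}
```

Whatever language key such a method returns needs a matching entry in names.yaml, otherwise there are no names to draw from.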

Names File Format

language:
  M: [MaleName1,MaleName2,...,MaleNameN]
  F: [FemaleName1,FemaleName2,...,FemaleNameN]
  family: [Surname1,...,SurnameN]

street:
  type: [Street,Straat,Rue]
  secondary: [Apt]

Languages currently supported by the code (names are not necessarily defined for all of them in names.yaml):

english,spanish,french,french_creole,german,russian,portuguese,polish,greek,chinese,hindi,arabic

Numbers in the names?

The synthetic patients have numbers appended to their names so that it is painfully obvious they are fake and no one could mistake them for real patients. To turn this behavior off, set the following property to false:

# If true, person names have numbers appended to them to make them more obviously fake
generate.append_numbers_to_person_names = true
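
To make the effect of the flag concrete, here is a minimal sketch of how a boolean property like this could gate a number suffix. It is an illustration only and not Synthea's implementation: the class, the helper methods, and the choice of a random suffix below 1000 are assumptions. Only the property key is taken from this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Random;

/** Hypothetical sketch; NOT Synthea's actual name-generation code. */
class NameNumberSketch {
  static final String KEY = "generate.append_numbers_to_person_names";

  /** Parses properties-file text such as "key = value" into a Properties object. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Appends a number to the name unless the flag is set to false. */
  static String decorate(String name, Properties config, Random random) {
    // Treat a missing property as true, matching the default shown above.
    boolean append = Boolean.parseBoolean(config.getProperty(KEY, "true").trim());
    // Assumed suffix scheme: a random integer in [0, 1000).
    return append ? name + random.nextInt(1000) : name;
  }

  public static void main(String[] args) {
    Random random = new Random(42L);
    // With the flag on, the name gains a numeric suffix.
    System.out.println(decorate("Maria", parse(KEY + " = true"), random));
    // With the flag off, the name is returned unchanged: prints "Maria".
    System.out.println(decorate("Maria", parse(KEY + " = false"), random));
  }
}
```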