Gene Surfing

Adventures in Family History


Fun with GEDCOM files

I came across the GEDCOM file format some years ago and I have been surprised how little it has changed since it was originally defined back in the 1980s.


A GEDCOM file is used to transfer family trees between different software applications and websites or just to keep as a backup. The format is based on specifications defined and maintained by the Family History Department of The Church of Jesus Christ of Latter-day Saints.

I recently had a need to split a GEDCOM file, but despite how long they have been around there are very few options for doing this, although most applications will merge files.

Strangely it is more recent innovations in DNA ancestry that brought about this requirement; there are several websites and applications that will process your DNA files for you, but to make it more beneficial they require you to upload your tree using a GEDCOM file.

But these sites only need the file to contain your direct pedigree. Anyone who is not a blood relative is irrelevant and simply adds clutter to the matching. So how do you extract your direct pedigree from your tree?

There is one website that offers a splitting service but some have been suspicious of uploading their family tree onto the internet.

So I decided to have a look at the GEDCOM format in more detail and see if I could come up with something simple that would work as a Windows desktop application.

The first question is why has the format for this file changed so little in over three decades – surely using a modern XML file format would be the way to go?

The GEDCOM format was designed for an older time with simpler computer storage – the floppy disc!

Therefore the size of a GEDCOM file was very important – and is still important today.

Converting the file to an XML format would be fairly easy, but by my estimate it would also increase the file size at least tenfold. Although the result would still be pretty small compared to other files, it would be ten times larger than it needs to be.


Collections


The GEDCOM file format is fairly simple but it does have a few quirks and you need to know how to get at the information you need.

There are a number of different collections in the file.

Individuals and Families are the most important and the relationship between these two collections basically defines the tree – the other collections are just references and cross-references

Each item in the file is identified with a LEVEL. The main item of each record has a level of 0 (zero), so it is fairly easy to break up the file into individual records.

On the level 0 line the item ID is followed by a TAG that says what the record is. The successive lines carry other tags and references, each with a deeper tree LEVEL.

Technically there can be up to 99 levels but there are rarely more than three or four.
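Because every record starts with a level 0 line, breaking the file up can be sketched in a few lines of Python. This is only an illustration: the function name split_records and the sample lines are my own assumptions, and a real file would also need its character encoding handled.

```python
from typing import Iterable, List


def split_records(lines: Iterable[str]) -> List[List[str]]:
    """Group GEDCOM lines into records, starting a new one at each level 0 line."""
    records: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        level = line.split(None, 1)[0]
        if level == "0" or not records:
            records.append([line])  # a level 0 line opens a new record
        else:
            records[-1].append(line)  # deeper levels belong to the open record
    return records


# Hypothetical sample data; a real file would be read with open(...).
sample = [
    "0 HEAD",
    "1 SOUR Example",
    "0 @P1@ INDI",
    "1 FAMC @F3@",
    "1 FAMS @F2@",
    "0 TRLR",
]
records = split_records(sample)
print(len(records), "records")
```

Each inner list then holds one complete record (header, individual, family and so on), which is a convenient unit for everything that follows.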

0 @P1@ INDI

1 FAMC @F3@

1 FAMS @F2@

Notice one of the odd quirks in the file format – the position of the ID and TAG in the first record when compared with sub-items. I can only imagine that this is a belt-and-braces approach to be able to positively identify the first item in each record.
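To make the quirk concrete, here is a minimal Python sketch of reading one line into its parts. The names LINE_RE, GedLine and parse_line are hypothetical, and the pattern only covers the simple line shapes shown above, not every detail of the specification.

```python
import re
from typing import NamedTuple, Optional

# LEVEL [@ID@] TAG [VALUE]: the ID comes before the tag on the first line of a
# record, while on sub-items a reference comes after the tag, as the value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]  # the record's own ID, only present on some lines
    tag: str
    value: str  # free text or a reference to another record


def parse_line(text: str) -> GedLine:
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedLine(int(level), xref, tag, value or "")


print(parse_line("0 @P1@ INDI"))
print(parse_line("1 FAMC @F3@"))
```

The first call puts @P1@ in the xref field, while the second leaves xref empty and puts @F3@ in the value field, which is exactly the difference in position described above.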


Families


Each Individual can belong to multiple families, either as a parent or a child.

References are added to each individual indicating which families they belong to and each family record contains a reference to the individuals belonging to the family.

FAMS @F2@ The person is a spouse (parent) in this family.

FAMC @F3@ The person is a child in this family.

Person references are identified as @P1@ and family references as @F2@, but because every link is recorded in both the individual record and the family record, both collections have to be kept synchronised or everything falls apart.

From each family record individuals are referenced as a Husband, Wife or Child.

HUSB @P2@

WIFE @P3@

CHIL @P1@
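The two collections and the links between them can be pictured as a pair of lookup tables. The Python sketch below builds them from a small made-up sample and checks that every link appears on both sides. The names SAMPLE, build_indexes and find_mismatches are assumptions for illustration, and only level 0 and level 1 lines are looked at.

```python
SAMPLE = """\
0 @P1@ INDI
1 FAMC @F3@
1 FAMS @F2@
0 @P2@ INDI
1 FAMS @F3@
0 @P3@ INDI
1 FAMS @F3@
0 @F2@ FAM
1 HUSB @P1@
0 @F3@ FAM
1 HUSB @P2@
1 WIFE @P3@
1 CHIL @P1@
"""


def build_indexes(text):
    """Return (individuals, families), each keyed by the record ID."""
    individuals, families = {}, {}
    current = None  # the record that level 1 lines currently belong to
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            current = None  # HEAD, TRLR and other records are ignored here
            if len(parts) >= 3 and parts[2] == "INDI":
                current = individuals.setdefault(parts[1], {"FAMC": [], "FAMS": []})
            elif len(parts) >= 3 and parts[2] == "FAM":
                current = families.setdefault(
                    parts[1], {"HUSB": [], "WIFE": [], "CHIL": []})
        elif parts[0] == "1" and current is not None and len(parts) >= 3:
            if parts[1] in current:  # only the link tags are collected
                current[parts[1]].append(parts[2])
    return individuals, families


def find_mismatches(individuals, families):
    """List links that are recorded on one side only."""
    problems = []
    for person_id, person in individuals.items():
        for tag, roles in (("FAMC", ("CHIL",)), ("FAMS", ("HUSB", "WIFE"))):
            for fam_id in person[tag]:
                family = families.get(fam_id, {})
                members = [m for role in roles for m in family.get(role, [])]
                if person_id not in members:
                    problems.append((person_id, tag, fam_id))
    for fam_id, family in families.items():
        for role, back_tag in (("HUSB", "FAMS"), ("WIFE", "FAMS"), ("CHIL", "FAMC")):
            for person_id in family[role]:
                if fam_id not in individuals.get(person_id, {}).get(back_tag, []):
                    problems.append((fam_id, role, person_id))
    return problems


individuals, families = build_indexes(SAMPLE)
print(find_mismatches(individuals, families))
```

An empty list means the two collections agree. Removing the CHIL line from the sample would make the check report that @P1@ points at @F3@ without @F3@ pointing back.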

So in order to identify your pedigree you need to do the following:

1. Identify the Home person @P1@

2. Find the family where they are a child @F3@ using the FAMC reference

3. Find the Husband @P2@ in that family and repeat the process.

The whole process is a single procedure that calls itself repeatedly until it runs out of Husbands, at which point it comes back down the tree and does the same for the Wife in each of the families.
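The steps above can be sketched as one recursive Python function. This is an illustration under stated assumptions rather than the code used in the application: the dictionaries, the IDs beyond @P1@ to @P3@ and @F2@, @F3@, and the name collect_ancestors are all made up, and each family is assumed to have at most one HUSB and one WIFE.

```python
# Hypothetical data: each individual lists the families they are a child in
# (FAMC) or a spouse in (FAMS); each family lists its members by role.
individuals = {
    "@P1@": {"FAMC": ["@F3@"], "FAMS": ["@F2@"]},  # home person
    "@P2@": {"FAMC": ["@F4@"], "FAMS": ["@F3@"]},  # father
    "@P3@": {"FAMC": [], "FAMS": ["@F3@"]},        # mother
    "@P4@": {"FAMC": [], "FAMS": ["@F4@"]},        # father's father
    "@P5@": {"FAMC": [], "FAMS": ["@F4@"]},        # father's mother
    "@P6@": {"FAMC": [], "FAMS": ["@F2@"]},        # spouse, not a blood relative
}
families = {
    "@F2@": {"HUSB": "@P1@", "WIFE": "@P6@", "CHIL": []},
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@", "CHIL": ["@P1@"]},
    "@F4@": {"HUSB": "@P4@", "WIFE": "@P5@", "CHIL": ["@P2@"]},
}


def collect_ancestors(person_id, individuals, families, found=None):
    """Return a dict whose keys are the pedigree, in the order visited."""
    found = {} if found is None else found
    if person_id in found:
        return found  # already visited; also guards against loops in bad data
    found[person_id] = None
    for fam_id in individuals.get(person_id, {}).get("FAMC", []):
        family = families.get(fam_id, {})
        # Go up through the Husband first, then come back for the Wife.
        for role in ("HUSB", "WIFE"):
            parent = family.get(role)
            if parent:
                collect_ancestors(parent, individuals, families, found)
    return found


pedigree = list(collect_ancestors("@P1@", individuals, families))
print(pedigree)  # ['@P1@', '@P2@', '@P4@', '@P5@', '@P3@']
```

The spouse @P6@ never appears because the function only ever follows FAMC references. Python limits recursion depth (1000 calls by default), which is far deeper than any real pedigree is likely to go.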

If you want to export all of the family branch then you also have to process children and their descendants and spouses. This is the same process, but going down the tree using the FAMS record for each person instead of going up using the FAMC record.
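Going down the tree can be sketched the same way in Python. Again this is only an assumed illustration: the sample data and the name collect_descendants are hypothetical, and spouses are added to the result without their own ancestors being followed.

```python
# Hypothetical data in the same shape as a parsed GEDCOM file.
individuals = {
    "@P1@": {"FAMC": ["@F3@"], "FAMS": ["@F2@"]},  # home person
    "@P2@": {"FAMC": [], "FAMS": ["@F3@"]},        # parent of the home person
    "@P6@": {"FAMC": [], "FAMS": ["@F2@"]},        # spouse
    "@P7@": {"FAMC": ["@F2@"], "FAMS": ["@F5@"]},  # child
    "@P8@": {"FAMC": [], "FAMS": ["@F5@"]},        # child's spouse
    "@P9@": {"FAMC": ["@F5@"], "FAMS": []},        # grandchild
}
families = {
    "@F2@": {"HUSB": "@P1@", "WIFE": "@P6@", "CHIL": ["@P7@"]},
    "@F3@": {"HUSB": "@P2@", "CHIL": ["@P1@"]},
    "@F5@": {"HUSB": "@P8@", "WIFE": "@P7@", "CHIL": ["@P9@"]},
}


def collect_descendants(home_id, individuals, families):
    """Return the home person, all descendants, and the spouses met on the way."""
    descendants = set()
    spouses = set()

    def visit(person_id):
        if person_id in descendants:
            return  # already processed
        descendants.add(person_id)
        # FAMS lists the families in which this person is a spouse (parent).
        for fam_id in individuals.get(person_id, {}).get("FAMS", []):
            family = families.get(fam_id, {})
            for role in ("HUSB", "WIFE"):
                partner = family.get(role)
                if partner and partner != person_id:
                    spouses.add(partner)
            for child_id in family.get("CHIL", []):
                visit(child_id)

    visit(home_id)
    return descendants | spouses


branch = collect_descendants("@P1@", individuals, families)
print(sorted(branch))
```

The parent @P2@ is left out because nothing here follows a FAMC reference, so combining this with the upward pass is what produces a whole branch.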


GedMate


What I ended up with was a small desktop application named – for the moment anyway – GedMate which you can try for yourself.

GedMate has several options for processing your tree but the main purpose is to identify all of the ancestors of the selected home person.

This can be limited to either the Paternal or Maternal line if necessary.

You can choose whether to include all of the branches from this tree – both up the tree and down and back up again until all related individuals are included.

You can also choose to include the descendants of the Home person, so by selecting all of these options you should be able to export your whole tree.

However, this will not include any orphan records, so it can also be used as a way of purging your tree of any records that are not directly connected with the home person.

If necessary you can limit the number of generations that will be processed.
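For anyone curious how options like these might work, here is a rough Python sketch. It is not GedMate's actual code. It assumes that "Paternal line" means following only the Husband at every step (and "Maternal line" only the Wife), and that generation 0 is the home person. The names limited_ancestors, ROLES, line and max_generations are all hypothetical.

```python
from collections import deque

ROLES = {
    "paternal": ("HUSB",),
    "maternal": ("WIFE",),
    "both": ("HUSB", "WIFE"),
}

# Hypothetical sample data: home person, parents and paternal grandparents.
individuals = {
    "@P1@": {"FAMC": ["@F3@"]},
    "@P2@": {"FAMC": ["@F4@"]},
    "@P3@": {"FAMC": []},
    "@P4@": {"FAMC": []},
    "@P5@": {"FAMC": []},
}
families = {
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@", "CHIL": ["@P1@"]},
    "@F4@": {"HUSB": "@P4@", "WIFE": "@P5@", "CHIL": ["@P2@"]},
}


def limited_ancestors(home_id, individuals, families,
                      line="both", max_generations=None):
    """Collect ancestors breadth-first, optionally limiting line and depth."""
    roles = ROLES[line]
    found = {home_id}
    queue = deque([(home_id, 0)])  # (person, generations above the home person)
    while queue:
        person_id, generation = queue.popleft()
        if max_generations is not None and generation >= max_generations:
            continue  # do not climb above the requested generation
        for fam_id in individuals.get(person_id, {}).get("FAMC", []):
            family = families.get(fam_id, {})
            for role in roles:
                parent = family.get(role)
                if parent and parent not in found:
                    found.add(parent)
                    queue.append((parent, generation + 1))
    return found


print(sorted(limited_ancestors("@P1@", individuals, families, line="paternal")))
print(sorted(limited_ancestors("@P1@", individuals, families, max_generations=1)))
```

A queue is used here instead of recursion so that each person is first reached by the shortest route, which keeps the generation count honest when the same ancestor appears more than once in a tree.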


Download


If you would like to use GedMate then please download it using the link below and give it a go. It’s completely free.

Download GedMate

The application should run on any version of Windows from XP to Windows 10.

GedMate does not make any changes to your original GEDCOM file, but you should always back things up just in case.

The header details provided in the original file will be included unchanged – apart from the Home Person – in the exported file.
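An export step of this kind can be pictured as a filter over the original lines. The Python sketch below is an assumption about how such a filter could look, not a description of GedMate's internals: the header and any other non-tree records pass through untouched, INDI and FAM records are kept only if their ID is in a chosen set, and links to dropped records are removed. It does not update the Home Person in the header. The names filter_records, POINTER_TAGS and keep_ids are hypothetical.

```python
POINTER_TAGS = {"FAMC", "FAMS", "HUSB", "WIFE", "CHIL"}


def filter_records(lines, keep_ids):
    """Return the lines of a GEDCOM file restricted to the records in keep_ids."""
    output = []
    skip_above = None  # while set, lines deeper than this level are dropped
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        level = int(parts[0])
        if skip_above is not None:
            if level > skip_above:
                continue  # still inside the dropped record or link
            skip_above = None
        drop = False
        if level == 0 and len(parts) >= 3 and parts[2] in ("INDI", "FAM"):
            drop = parts[1] not in keep_ids  # unwanted individual or family
        elif level == 1 and len(parts) >= 3 and parts[1] in POINTER_TAGS:
            drop = parts[2] not in keep_ids  # link to a record being removed
        if drop:
            skip_above = level
            continue
        output.append(line)
    return output


# Hypothetical sample: @P9@ is an orphan and family @F2@ is not being kept.
original = [
    "0 HEAD",
    "1 SOUR Example",
    "0 @P1@ INDI",
    "1 NAME Home /Person/",
    "1 FAMC @F3@",
    "1 FAMS @F2@",
    "0 @P2@ INDI",
    "1 FAMS @F3@",
    "0 @P9@ INDI",
    "1 NAME Orphan /Record/",
    "0 @F3@ FAM",
    "1 HUSB @P2@",
    "1 CHIL @P1@",
    "0 TRLR",
]
exported = filter_records(original, {"@P1@", "@P2@", "@F3@"})
print("\n".join(exported))
```

The original list is never modified, which mirrors the point above about leaving the source file alone.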


Notes


A useful tool for checking the data in a GEDCOM file is Gedcom Validator from Chronoplex Software.

When checking the data exported by GedMate you will need to compare the results with those for the original GEDCOM file, because the validator reports a lot of errors that are not related to anything that GedMate has done.

You can then load the exported GEDCOM file into any desktop or web-based application, such as Chronoplex My Family Tree which is free, to check the structure of the tree.


Chris Sidney 2016