Gene Surfing

Adventures in Family History

Fun with GEDCOM files

I came across the GEDCOM data format some years ago and I have been surprised how little it has changed since it was originally defined back in the 1980s.


A GEDCOM file is used to transfer family trees between different software applications and websites or just to keep as a backup. The format is based on specifications defined and maintained by the Family History Department of The Church of Jesus Christ of Latter-day Saints.

I recently needed to split a GEDCOM file but, despite how long the format has been around, there are very few tools for doing this, although most applications will merge files.

Strangely it is more recent innovations in DNA ancestry that brought about this requirement; there are several websites and applications that will process your DNA files for you, but to make it more beneficial they require you to upload your tree using a GEDCOM file.

But these sites only need the file to contain your direct pedigree – anyone who is not a blood relative is irrelevant and simply adds clutter to the matching. So how do you extract your direct pedigree from your tree?

There is one website that offers a splitting service but some have been suspicious of uploading their family tree onto the internet.

So I decided to have a look at the GEDCOM format in more detail and see if I could come up with something simple that would work as a Windows desktop application.

The first question is why has the format for this file changed so little in over three decades – surely using a modern XML file format would be the way to go?

The GEDCOM format was designed for an older time with simpler computer storage – the floppy disc!

Therefore the size of a GEDCOM file was very important – and is still important today.

Converting the file to an XML format would be fairly easy, but would also increase the file size at least 10 times, and although that is still pretty small compared to other files it is still 10 times larger than it needs to be.


Collections


The GEDCOM file format is fairly simple but it does have a few quirks and you need to know how to get at the information you need.

There are a number of different collections in the file.

Individuals and Families are the most important, and the relationship between these two collections basically defines the tree – the other collections are just references and cross-references.

Each line in the file starts with a LEVEL number – the first line of each record has a level of 0 (zero), so it is fairly easy to break up the file into individual items.

On that first line the item ID is followed by a TAG that says what kind of record it is; the successive lines carry further references and details, each at a deeper LEVEL.

Technically there can be up to 99 levels but there are rarely more than three or four.

0 @P1@ INDI

1 FAMC @F3@

1 FAMS @F2@

Notice one of the odd quirks in the file format – on the first line of a record the ID comes before the TAG, whereas on sub-items the TAG comes first and the reference follows it. I can only imagine that this is a belt-and-braces approach to be able to positively identify the first item in each record.
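To make the line layout concrete, here is a minimal Python sketch of reading lines like the three above. It is an illustration only, not GedMate's code: the names parse_line and split_records are made up for this example, the extra family record in the sample is invented, and the regular expression covers only the simple level / optional ID / tag / value shape described here rather than the whole GEDCOM grammar.

```python
import re

# Level number, then an optional @ID@ (only on the first line of a record),
# then the tag, then whatever value is left on the line.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def parse_line(line):
    """Split one GEDCOM line into (level, xref_id, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref_id, tag, value = match.groups()
    return int(level), xref_id, tag, value or ""

def split_records(lines):
    """Group parsed lines into records; every level-0 line starts a new one."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between entries
        parsed = parse_line(line)
        if parsed[0] == 0:
            records.append([parsed])
        elif records:
            records[-1].append(parsed)
    return records

# Hypothetical sample: the individual shown above plus one family record.
sample = ["0 @P1@ INDI", "1 FAMC @F3@", "1 FAMS @F2@", "0 @F3@ FAM", "1 CHIL @P1@"]
records = split_records(sample)
for record in records:
    level, xref_id, tag, value = record[0]
    print(xref_id, tag, len(record) - 1, "sub-lines")
```

Because the ID is optional in the pattern, the same function handles both the first line of a record and its sub-items, which is one way of coping with the quirk described above.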


Families


Each Individual can belong to multiple families, either as a parent or a child.

References are added to each individual indicating which families they belong to and each family record contains a reference to the individuals belonging to the family.

FAMS @F2@ The person is a spouse (parent) in this family.

FAMC @F3@ The person is a child in this family.

Person references are identified as @P1@ and family references as @F2@, but because every link is recorded on both sides, both collections have to be kept synchronised or everything falls apart.

From each family record individuals are referenced as a Husband, Wife or Child.

HUSB @P2@

WIFE @P3@

CHIL @P1@
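Since every link is stored twice, once on the person and once on the family, it is possible to check that the two sides agree. The sketch below assumes the records have already been loaded into two plain dictionaries keyed by ID; that layout, the function name find_broken_links and the sample data (including the deliberately missing FAMS entry on @P3@) are all assumptions made for illustration.

```python
def find_broken_links(individuals, families):
    """Return (person, family, tag) links that have no matching back-reference."""
    broken = []
    # Person side: each FAMC/FAMS should be mirrored in the family record.
    for person_id, person in individuals.items():
        for fam_id in person.get("FAMC", []):
            if person_id not in families.get(fam_id, {}).get("CHIL", []):
                broken.append((person_id, fam_id, "FAMC"))
        for fam_id in person.get("FAMS", []):
            family = families.get(fam_id, {})
            if person_id not in (family.get("HUSB"), family.get("WIFE")):
                broken.append((person_id, fam_id, "FAMS"))
    # Family side: each HUSB/WIFE/CHIL should be mirrored on the person.
    for fam_id, family in families.items():
        for child_id in family.get("CHIL", []):
            if fam_id not in individuals.get(child_id, {}).get("FAMC", []):
                broken.append((child_id, fam_id, "CHIL"))
        for role in ("HUSB", "WIFE"):
            spouse_id = family.get(role)
            if spouse_id and fam_id not in individuals.get(spouse_id, {}).get("FAMS", []):
                broken.append((spouse_id, fam_id, role))
    return broken

# Hypothetical data: @P3@ is listed as WIFE in @F3@ but has no FAMS pointing back.
individuals = {
    "@P1@": {"FAMC": ["@F3@"], "FAMS": ["@F2@"]},
    "@P2@": {"FAMS": ["@F3@"]},
    "@P3@": {"FAMS": []},
}
families = {
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@", "CHIL": ["@P1@"]},
    "@F2@": {"HUSB": "@P1@", "CHIL": []},
}
broken = find_broken_links(individuals, families)
print(broken)
```

A tree walk that follows only one side of a broken link would silently miss people, so a check like this is worth running before any splitting.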

So in order to identify your pedigree you need to do the following:

1. Identify the Home person @P1@

2. Find the family where they are a child @F3@ using the FAMC reference

3. Find the Husband @P2@ in that family and repeat the process.

The whole process is a single procedure that calls itself repeatedly until it runs out of Husbands, at which point it comes back down the tree and does the same for the Wife in each of the families.
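The recursion described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code inside GedMate: the dictionary layout, the function name collect_ancestors and the sample IDs beyond @P1@, @P2@ and @P3@ are invented for the example.

```python
def collect_ancestors(person_id, individuals, families, found=None):
    """Recursively collect person_id and all of their ancestors."""
    if found is None:
        found = set()
    if person_id is None or person_id in found:
        return found  # no such parent, or already tagged on another path
    found.add(person_id)
    for fam_id in individuals.get(person_id, {}).get("FAMC", []):
        family = families.get(fam_id, {})
        # Go all the way up the Husband's line first, then the Wife's.
        collect_ancestors(family.get("HUSB"), individuals, families, found)
        collect_ancestors(family.get("WIFE"), individuals, families, found)
    return found

# Hypothetical tree: @P1@ is a child of @P2@ and @P3@; @P2@ is a child of
# @P4@ and @P5@; @P6@ is a sibling of @P1@ and so not in the pedigree.
individuals = {
    "@P1@": {"FAMC": ["@F3@"]},
    "@P2@": {"FAMC": ["@F4@"], "FAMS": ["@F3@"]},
    "@P3@": {"FAMS": ["@F3@"]},
    "@P4@": {"FAMS": ["@F4@"]},
    "@P5@": {"FAMS": ["@F4@"]},
    "@P6@": {"FAMC": ["@F3@"]},
}
families = {
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@", "CHIL": ["@P1@", "@P6@"]},
    "@F4@": {"HUSB": "@P4@", "WIFE": "@P5@", "CHIL": ["@P2@"]},
}
pedigree = collect_ancestors("@P1@", individuals, families)
print(sorted(pedigree))
```

Keeping the set of people already found means nobody is processed twice, even when the same ancestor is reachable along two different lines.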

If you want to export the whole family branch then you also have to process children, their descendants and their spouses; this is the same process but going down the tree using the FAMS record instead of going up using the FAMC record for each person.
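Going down the tree looks much the same. The sketch below is again an assumption-laden illustration (hypothetical function name, dictionary layout and sample IDs): it follows each person's FAMS families, pulls in the spouse from each one, and recurses into the children. Spouses are kept in a separate set so that a person first met as a spouse is still walked properly if they later turn up as a descendant.

```python
def collect_descendants(home_id, individuals, families):
    """Collect home_id, their descendants, and the spouses of all of them."""
    walked = set()   # people whose own families have been followed
    spouses = set()  # partners pulled in alongside them

    def walk(person_id):
        if person_id in walked:
            return
        walked.add(person_id)
        for fam_id in individuals.get(person_id, {}).get("FAMS", []):
            family = families.get(fam_id, {})
            for role in ("HUSB", "WIFE"):
                if family.get(role):
                    spouses.add(family[role])
            for child_id in family.get("CHIL", []):
                walk(child_id)

    walk(home_id)
    return walked | spouses

# Hypothetical tree: @P1@ and @P7@ have a child @P8@, who with @P9@ has @P10@.
# @P2@ is a parent of @P1@ and must not appear in the result.
individuals = {
    "@P1@": {"FAMC": ["@F3@"], "FAMS": ["@F2@"]},
    "@P2@": {"FAMS": ["@F3@"]},
    "@P7@": {"FAMS": ["@F2@"]},
    "@P8@": {"FAMC": ["@F2@"], "FAMS": ["@F5@"]},
    "@P9@": {"FAMS": ["@F5@"]},
    "@P10@": {"FAMC": ["@F5@"]},
}
families = {
    "@F3@": {"HUSB": "@P2@", "CHIL": ["@P1@"]},
    "@F2@": {"HUSB": "@P1@", "WIFE": "@P7@", "CHIL": ["@P8@"]},
    "@F5@": {"HUSB": "@P9@", "WIFE": "@P8@", "CHIL": ["@P10@"]},
}
branch = collect_descendants("@P1@", individuals, families)
print(sorted(branch))
```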


GedMate


What I ended up with was a small desktop application named – for the moment anyway – GedMate which you can try for yourself.

GedMate has several options for processing your tree but the main purpose is to identify all of the ancestors of the selected home person.

This can be limited to either the Paternal or Maternal line if necessary.

You can choose whether to include all of the branches from this tree – both up the tree and down and back up again until all related individuals are included.

You can also choose to include the descendants of the Home person, so by selecting all of these options you should be able to export your whole tree.

However, this will not include any orphan records, so it can also be used as a way of purging your tree of any records that are not directly connected with the home person.
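One way to picture the "up, down and back up again" walk is as a search that follows every family link in both directions until nothing new turns up; whatever is left over is an orphan. The following Python sketch makes that idea concrete. It is not how GedMate is known to work internally: the function name connected_people, the dictionary layout and the sample data are assumptions for illustration.

```python
from collections import deque

def connected_people(home_id, individuals, families):
    """Return everyone reachable from home_id through any family link."""
    seen = {home_id}
    queue = deque([home_id])
    while queue:
        person = individuals.get(queue.popleft(), {})
        # Follow families where the person is a child and where they are a spouse.
        for fam_id in person.get("FAMC", []) + person.get("FAMS", []):
            family = families.get(fam_id, {})
            members = [family.get("HUSB"), family.get("WIFE"), *family.get("CHIL", [])]
            for member_id in members:
                if member_id and member_id not in seen:
                    seen.add(member_id)
                    queue.append(member_id)
    return seen

# Hypothetical data: @P20@ and @P21@ form a separate little tree with no
# link to the home person @P1@.
individuals = {
    "@P1@": {"FAMC": ["@F3@"], "FAMS": ["@F2@"]},
    "@P2@": {"FAMS": ["@F3@"]},
    "@P3@": {"FAMS": ["@F3@"]},
    "@P7@": {"FAMS": ["@F2@"]},
    "@P20@": {"FAMS": ["@F9@"]},
    "@P21@": {"FAMC": ["@F9@"]},
}
families = {
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@", "CHIL": ["@P1@"]},
    "@F2@": {"HUSB": "@P1@", "WIFE": "@P7@", "CHIL": []},
    "@F9@": {"HUSB": "@P20@", "CHIL": ["@P21@"]},
}
connected = connected_people("@P1@", individuals, families)
orphans = set(individuals) - connected
print(sorted(orphans))
```

A queue is used here instead of recursion simply because the walk has no natural direction; either approach would visit the same people.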

If necessary you can limit the number of generations that will be processed.
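As a sketch of how a generation limit and a Paternal or Maternal restriction might fit into the ancestor walk, consider the Python below. Several things here are assumptions rather than a description of GedMate: the home person is counted as generation 1, "paternal" is read as following only the Husband at every step (and "maternal" only the Wife), and the function name, dictionary layout and sample IDs are invented.

```python
def collect_ancestors_limited(home_id, individuals, families,
                              max_generations, line="both"):
    """Walk up the tree, recording the generation at which each person is found.

    The home person counts as generation 1 (an assumption; counting from 0
    would shift the limit by one).
    """
    found = {}  # person id -> shallowest generation at which they were reached

    def walk(person_id, generation):
        if person_id is None or generation > max_generations:
            return
        # Only skip a person if they were already reached at the same or a
        # shallower generation; otherwise their ancestors could be cut short.
        if person_id in found and found[person_id] <= generation:
            return
        found[person_id] = generation
        for fam_id in individuals.get(person_id, {}).get("FAMC", []):
            family = families.get(fam_id, {})
            if line in ("both", "paternal"):
                walk(family.get("HUSB"), generation + 1)
            if line in ("both", "maternal"):
                walk(family.get("WIFE"), generation + 1)

    walk(home_id, 1)
    return found

# Hypothetical three-generation tree.
individuals = {
    "@P1@": {"FAMC": ["@F3@"]},
    "@P2@": {"FAMC": ["@F4@"]},
    "@P3@": {},
    "@P4@": {},
    "@P5@": {},
}
families = {
    "@F3@": {"HUSB": "@P2@", "WIFE": "@P3@"},
    "@F4@": {"HUSB": "@P4@", "WIFE": "@P5@"},
}
two_generations = collect_ancestors_limited("@P1@", individuals, families, 2)
paternal_only = collect_ancestors_limited("@P1@", individuals, families, 3, line="paternal")
print(two_generations)
print(paternal_only)
```

Whether the home person is generation 0 or generation 1 changes which limit captures a given ancestor, so it is worth stating the convention wherever a limit is offered.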


Download


If you would like to use GedMate then please download it using the link below and give it a go – it’s completely free.

Download GedMate

The application should run on any version of Windows from XP to Windows 10.

GedMate does not make any changes to your original GEDCOM file, but you should always back things up just in case.

The header details provided in the original file will be included unchanged – apart from the Home Person – in the exported file.


Notes


A useful tool for checking the data in a GEDCOM file is Gedcom Validator from Chronoplex Software.

When checking the exported data from GedMate you will need to make a comparison with the original GEDCOM to identify differences; the validator identifies a lot of errors that are not related to anything that GedMate has done.

You can then load the exported GEDCOM file into any desktop or web-based application, such as Chronoplex My Family Tree, which is free, to check the structure of the tree.


Chris Sidney 2016


 

 

 

 


3 responses to “Fun with GEDCOM files”

  1. Konrad April 28, 2016 at 11:48 am

    Hello,

    I recently discovered your tool “GedMate”. When trying to split my GEDCOM-File, I noticed that it doesn’t include all persons belonging to my tree. Currently there are 2444 persons in the GEDCOM-File, but only 1816 belong to my tree.

    The figure of 1816 persons in my tree is given independently by the older tool “GedSplit” and by the German genealogy application “Ages”, so I think this number must be correct.

    GedMate gives me just 1540 Persons (with ancestors, descendants, branches, paternal, maternal, 99 generations), and it says that it processed 13 generations, but when I restrict the number of generations manually to 13, it gives me 1524 persons. I have to restrict the number of generations manually to 14 to get 1540 persons.

    Kind regards,
    Konrad


    • Farmer Bullshot April 28, 2016 at 5:36 pm

      Hi Konrad, thanks for commenting – I think you may be the first person to try GedMate so your feedback is appreciated.

      I think the 14 generation issue is because I started counting from zero and it would be easy to change to start from generation 1. As for the differences in the persons it is quite difficult to check without having access to your tree data and to see exactly what the code is doing. All I can say is that GedMate “walks” through the tree and tags each person it finds so the same person is never counted twice – even if they have married their cousin. So unless GedMate is missing a generation or a branch somewhere I can only think that this may be the difference. Do you have any links to the other products that you mentioned? If I can do the same comparison on my own tree I may be able to get closer to an answer.
      Best wishes

      Chris


  2. Konrad May 12, 2016 at 11:19 am

    Hi Chris,

    sorry for my late answer. Here are two links for the mentioned applications:

    GEDSplit: http://www.rootsweb.ancestry.com/~gumby/ged.html

    This older tool seems to work with ANSI files, but possibly not with UTF-8 files. This should be tested, though.
    And there is a little bug: when a “root” person is defined (TAG “_HOME” within “0 HEAD”) and you save the selected persons, the “root” person will be saved too (regardless of whether you selected this person or not). So I edited a copy of the GEDCOM file and deleted this line for this purpose.
    And if you want to open the helpfile (.hlp) you need the WinHlp32.exe from Microsoft ( https://support.microsoft.com/de-de/kb/917607 ).

    Ages: This is a German genealogy application. It reads and writes GEDCOM files directly.
    You can download a free demoversion from http://www.daubnet.com/de/downloads .
    With this demo version you can add only up to 50 persons, but you can open an existing GEDCOM file which contains more than 50 persons. After starting, you can change the language, choose “portable mode”, and then open your GEDCOM file. If you click on “more” and choose “unconnected subtrees”, you will see your subtrees with the number of persons. The number of persons seems to be correct. You could also save your subtree (“export”, “to file”), but this didn’t work correctly, again concerning the number of exported persons (too few).

    Best wishes
    Konrad

