Genealogy Program Specifications

John H. Yates

Last Update: Tue Aug 02 18:53 EDT 2011

Initial Version Released: Dec 29, 2010

It is my view that we are still waiting for the best and brightest combination of genealogists and programmers to put their minds toward design and delivery of an adequate genealogy program.

These are my own top personal requirements for a viable genealogy program. Lesser requirements will be fleshed out as major ones are met by viable programs.

See my challenge to genealogy program vendors regarding source referencing at: .

And my latest work along the lines of improving the genealogy data model at: A Genealogy Data Model (Matrix Algebra) Specification. This specification explains in full detail a model that meets requirement 2 below.

Also see the notes of my thoughts at the bottom of this page on how this analysis affects GEDCOM and leads me to suggest an API model for programs to do On Line Data Mining.

My Top Requirements for Genealogy Software Design
  1. It shall use modern, multiplatform (Windows, MacOS, and Linux) native programming languages, architectures, and databases: ones that can be expected to have a natural upgrade path to the future, and whose code can easily be adapted to, and thus delivered on, all major platforms.
  2. It shall store all required information in an extensible data model (i.e. new objects may be easily defined, even by the end user, perhaps through a metalanguage) general enough to allow all data to be exported (and backed up) completely, so that it can be imported without data loss. This also implicitly allows a program to be written to move all data to another program, or at least the subset the deficient program is able to understand, e.g. a GEDCOM export. See the bottom of this page for a note about GEDCOM.
  3. It shall have an easy to use and intuitive GUI (Graphical User Interface), as judged by user consensus.
  4. It shall have the ability to easily update via the Internet.
  5. It shall allow Evidence Style sources and citations, including Full, Short, and List types, and use them appropriately, with proper punctuation and style, in all reports, preferably through end-user-written, sharable templates and a markup language (a source reference metalanguage) for all facts recorded. The templates shall be shareable among end users by simple export/import, individually or in sets. This will allow users to apply whatever style they prefer, write their own style, or subscribe to a standard style (such as Evidence Style) whose templates were written by someone else. This model will also allow easy adaptation to changing standards, additions, corrections, etc. [I have presented a proof-of-concept model for this, including a simple template metalanguage and markup rules for all 170 QuickCheck Evidence Style templates (Mills, Elizabeth Shown. Evidence Explained: Citing History Sources from Artifacts to Cyberspace. Second Edition. Baltimore: Genealogical Publishing Co., 2009) at ].
  6. It shall allow all facts to be tagged by source (original or derivative), information (primary or secondary), and evidence (direct or indirect). [Still mulling this one over.] See:
    Mills, Elizabeth Shown. Evidence Explained: Citing History Sources from Artifacts to Cyberspace. Second Edition. Baltimore: Genealogical Publishing Co., 2009.
    Board for Certification of Genealogists. BCG Genealogical Standards Manual. Provo: Ancestry Publishing, 2000.
    Rose, Christine. Genealogical Proof Standard. Norwood: CR Publications, 2009.
    and Merriman, Brenda Dougall. Genealogical Standards of Evidence. Toronto: Dundurn Press, 2010.
  7. It shall allow research notes to be attached to all facts.
  8. It shall allow fields for confidence level, or the equivalent, to be assigned for all facts and conclusions or proof.
  9. It shall offer a robust set of accepted standard reports and charts, and in multiple formats (pdf, rtf, text, html, ...). [perhaps expand this when other more basic requirements are met].
  10. It shall have a printed, or an easily printable manual available for users to learn from.
  11. It shall have the promise of keeping or acquiring significant market share to ensure program longevity.
  12. Its company shall credibly inspire confidence that it will remain in business (market share, company management, estimated capitalization, vision, track record, ...).
  13. It shall use at least three name fields (3NF). See justification at: A Genealogy Data Model (Matrix Algebra) Specification. The user must have fine control and feedback on which specific field a name part is stored to. (This is really covered in 2 above, but it is probably worthwhile making it its own requirement to make clear that it is required.)
  14. It shall use at least 12 address fields (12AF) for an address, which will allow general, worldwide address specifications. See justification at: A Genealogy Data Model (Matrix Algebra) Specification. The user must have fine control and feedback on which specific field an address part is stored to. (This is really covered in 2 above, but it is probably worthwhile making it its own requirement to make clear that it is required.)
  15. It shall have the ability to generate a navigable HTML web version of a full tree that can be uploaded to users' own private web sites (as opposed to being locked into publishing, or wounded publishing, to vendor web sites where users lose control over their data).
  16. It shall have a flag to enable setting of Asian name order. This one came up in my discussions of the inadequacy of having only two name fields, and is a simple and viable extension that all programs need. A simple flag, settable per person (arguments could be made for setting it by family name, I suppose, but per person would allow full compliance with minimal programming, at first anyway). The proper ethnic name order could then be produced in reports, etc.
  17. It shall have search tools built into the program to allow on line database mining. This one is going to be controversial because it pretty much eliminates all programs except FTM. However, since I have been able to use FTM natively on the Mac (since December 2010), I have found that the "leaves" that appear within the program are so valuable for instant clues and primary data that I cannot see myself ever doing without them. Ancestry has set a new standard, and one that I'm not sure competitors can reach. As long as it is the only program that can do this, FTM will have to be in my arsenal of programs. Some day I hope to have only one program in that arsenal, assuming one can meet all of the criteria that I require. I have had some further thoughts on this; see the relevant note at the bottom of this page.
  18. It shall succeed in importing a 100k-person GEDCOM file in a time comparable to its competitors, specifically one test case that was supplied to me by a reader. Vendors may contact me, and with the owner's permission, I shall release the file to them for their own testing.
  19. Advanced features shall be optional so that the needs of name collectors are met.
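A minimal sketch of how requirements 2, 13, 14, and 16 fit together, in Python. All field names here are hypothetical (the real superset of variables would come from a specification like the one referenced above): a person record with three name fields, a 12-field address, a per-person name-order flag, and a lossless export/import round trip.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical 12 address fields (req. 14); the real names would come from
# the data-model specification, not from this sketch.
ADDRESS_FIELDS = [
    "sub_building", "building", "house_number", "street", "district",
    "village", "city", "county", "state_province", "postal_code",
    "country", "historical_jurisdiction",
]

@dataclass
class Person:
    given: str = ""                    # three name fields (req. 13)
    middle: str = ""
    family: str = ""
    family_name_first: bool = False    # per-person name-order flag (req. 16)
    address: dict = field(default_factory=lambda: {f: "" for f in ADDRESS_FIELDS})
    extra: dict = field(default_factory=dict)  # end-user-defined fields (req. 2)

    def display_name(self) -> str:
        parts = [self.given, self.middle, self.family]
        if self.family_name_first:     # e.g. Asian name order
            parts = [self.family, self.given, self.middle]
        return " ".join(p for p in parts if p)

def export_person(p: Person) -> str:
    """Lossless export: every field, including user-defined ones, is serialized."""
    return json.dumps(asdict(p))

def import_person(s: str) -> Person:
    return Person(**json.loads(s))

p = Person(given="Ji-woo", family="Kim", family_name_first=True)
p.extra["dna_haplogroup"] = "R1b"              # an end-user-defined field
assert import_person(export_person(p)) == p    # round trip without data loss
print(p.display_name())                        # Kim Ji-woo
```

Because the export walks every field, including ones the program's author never anticipated, the round trip stays lossless no matter what the end user adds, which is the whole point of requirement 2.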
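The template approach in requirement 5 can be sketched with a toy metalanguage. Here Python's string.Template placeholders stand in for the richer markup the requirement calls for; the template wording below is illustrative only, not Evidence Explained's actual punctuation.

```python
import string

# Toy source-reference metalanguage: one template per citation type.
TEMPLATES = {
    "Full":  string.Template("$author, $title ($place: $publisher, $year), $page."),
    "Short": string.Template("$author, $title, $page."),
    "List":  string.Template("$author. $title. $place: $publisher, $year."),
}

# The recorded source data, kept separate from any one citation style.
source = {
    "author": "Mills, Elizabeth Shown",
    "title": "Evidence Explained",
    "place": "Baltimore",
    "publisher": "Genealogical Publishing Co.",
    "year": "2009",
    "page": "42",
}

def cite(style: str, data: dict) -> str:
    """Render one source in the requested citation type."""
    return TEMPLATES[style].substitute(data)

print(cite("Short", source))
# Mills, Elizabeth Shown, Evidence Explained, 42.
```

Because each template is just text, sharing or subscribing to a style is a matter of exporting and importing these strings, and updating a style touches the templates rather than every stored fact.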

Software programs shall be evaluated by how well they achieve each of the above requirements, in my personal opinion. The evaluations below are my own personal opinion, formed from a combination of first-hand experience, information gained from software descriptions, and/or collective wisdom on the Internet. The evaluations for the selected programs here are subject to change at any time, as is the list of requirements and programs. Constructive comments and suggestions are welcome. Email me. Note that the ordering above is not significant.

Columns: Program, Version, then one column per requirement 1-19, then Notes:
1 MP | 2 GDM | 3 IGUI | 4 Update | 5 ES | 6 Tags | 7 Notes | 8 ConLev | 9 Rpt/Chr | 10 PrtMan | 11 MktShr | 12 RIB | 13 3NF | 14 12AF | 15 NW | 16 ANO | 17 OLDM | 18 100k | 19 NamCol
FTM Windows  ***** ******** *FAIL  FAIL ******** **** ***** FAIL FAIL FAIL FAIL *****   *****  
FTM Mac* ***** ******** *FAIL  FAIL ******** *** **** FAIL FAIL FAIL FAIL ***** FAIL ***** 100k GEDCOM hangs at 13 min (saving to database)
TMG*  FAIL*** ********** ***FAIL  **** ********* *** ** FAIL FAIL   FAIL FAIL   *****  
Legacy*  FAIL*** ********** ****FAIL  **** ******** *** **** FAIL FAIL   FAIL FAIL ***** ***** 100k GEDCOM took 18 min
Reunion* 9.0c FAIL*** ********* FAILFAIL  FAIL ****** ** ** FAIL FAIL ***** FAIL FAIL ***** ***** 100k GEDCOM took 15 min
MacFamilyTree 6.0.10 Light Edition FAIL      FAIL       *** ** FAIL FAIL   FAIL FAIL   *****  100k GEDCOM fails immediately on syntax
Gramps*  ***     FAILFAIL       ** *** FAIL FAIL   FAIL FAIL   *****  
GenealogyJ*  ***     FAILFAIL       ** *** FAIL FAIL   FAIL FAIL   *****  
RootsMagic  FAIL      FAIL       *** **** FAIL FAIL   FAIL FAIL   *****  

A star (*) at the end of a program name indicates that I have not only gathered wisdom from other sources but also own and run a copy myself, though perhaps not the latest version.

(blank) - Not Evaluated
FAIL   - Failure
*         - Weak
**       - Adequate
***     - Good
****   - Very Good
***** - Excellent


FTM = Family Tree Maker
TMG = The Master Genealogist
MP = MultiPlatform
GDM = General Data Model
IGUI = Intuitive GUI
ES = Evidence Style
ConLev = Confidence Level
Rpt/Chr = Reports and Charts
MktShr = Market Share
PrtMan = Printable Manual
RIB = Remaining In Business
3NF = Three Name Fields
12AF = 12 Address Fields
NW = Navigable Web
ANO = Asian Name Order
OLDM = On Line Database Mining
NamCol = Name Collectors

A Note concerning GEDCOM:

My view of GEDCOM (GEnealogical Data COMmunication) implementations is that they are an antiquated, outdated bolt-on to an old data model: useful a long time ago, but deserving no place in a contemporary data model, and in fact no longer needed, since the capability is inherently defined in contemporary models (although perhaps not in the model that your favorite program uses!).

As I have shown in my data model treatments (see the top of this page), for those who have taken the time to understand them, there is only one simple, and achievable requirement for full data export and import. It is the same requirement that is needed for full data representation.

The one requirement is that you define a full set of data variables. There, you are done. Full export and import is now a simple programming exercise, as is moving your data from one program to another. Simple.

The hard part is defining this superset of data variables. This is the work of intelligent and knowledgeable genealogists, and a great starting set could be had in under five meetings of even a small gathering of the best and brightest genealogists and programmers at the round table. No open democracy; we don't want to carp about it for decades, nor waste time listening to those who don't "get it" (yes, I said it!). (Just think if the Internet protocols had been subject to any user's "input" on their design. No thank you.) The group can issue a report for comment by all for a period of time, meet again at the end of that time, decide what points have merit, and discard those they deem do not. That is the way progress will be made. Simple. The data variables would consist of name fields (three, please), address fields (at least 12, please), source reference data fields (I delineate 577 on my web model data site, as I recall at this writing), and the rest of the required fields that genealogists and vendors writing the codes to date have learned.

This list will evolve over time, of course. But the closer to a complete set it is, the smaller the tweaks will be, until reasonable genealogists cannot think of any required extensions. And with new and modern codes, small tweaks will be much easier to achieve than in the legacy (small l!) codes of today.

If one program develops such a data model, it will be trivial to export its data so that any other program will be able to import it. With one big caveat: the exported data has to be dumbed down to the capability of the program that will be importing it. But if the data represents a full set of variables, this WILL ALWAYS be possible. It will, of course, NOT be possible for a data-deficient program to export data so that it can be imported intelligently by a smarter (full data model) program. But once a full data model variable set is defined by intelligent genealogists, who would want to use a program that does not accommodate that full set? Not me.
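A sketch of that dumbing-down step, with hypothetical field names. Projecting a full-model record onto a deficient program's coarser schema is a mechanical merge; the reverse direction (splitting merged fields back apart) is not possible, which is the asymmetry described above.

```python
# A record in the (hypothetical) full model.
full_record = {
    "name_given": "Anna", "name_middle": "Marie", "name_family": "Berg",
    "addr_street": "12 Kirkgate", "addr_city": "Leeds", "addr_country": "England",
}

# The subset an importing program understands: each of its coarse fields
# is a merge of one or more full-model fields.
TARGET_FIELDS = {
    "name":  ["name_given", "name_middle", "name_family"],
    "place": ["addr_city", "addr_country"],
}

def dumb_down(record: dict) -> dict:
    """Project a full-model record onto the importing program's smaller schema.

    Always possible in this direction; a two-name-field program cannot do
    the reverse, because "Anna Marie Berg" no longer says where the given
    name ends and the family name begins.
    """
    return {
        target: " ".join(record[f] for f in sources if record.get(f))
        for target, sources in TARGET_FIELDS.items()
    }

print(dumb_down(full_record))
# {'name': 'Anna Marie Berg', 'place': 'Leeds England'}
```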

I thought I'd restate all this here. It is also explained in much more detail at: A Genealogy Data Model (Matrix Algebra) Specification. My hope is that program vendor managements will study and understand the data model and its implications. They all seem to be locked into the past, with outdated data models and programmers merely tweaking outdated models and codes.

I have tuned out of the BetterGEDCOM effort (sadly, keeping GEDCOM in the name is pejorative, in my opinion). I think all they, or some group, needs to do is produce the full list of data variables. They would then be done, except for turning it over to programmers to implement. If I were younger and more energetic, I would see a great opening for a business plan pitch to venture capitalists to fund a genealogy program start-up. But the larger vendors of today are so close, yet so far. Maybe they will get there some day. I've been waiting for four years now. Despair caused me to produce my analyses.

A note concerning On Line Database Mining:

As pointed out above, FTM has built this into their program and thus makes their program very attractive to all genealogists. However, many genealogists strongly prefer maintaining and displaying their data with other programs.

Perhaps a good solution for both the program vendors and Ancestry (and the LDS Church's FamilySearch, with which they seem to be partnering more and more) would be for them to develop a genealogy data mining API (Application Programming Interface) that they could license to other program vendors.

This would not prevent them from offering their own genealogy program, but it would open the door to them making licensing fees from users of other programs. I would think nearly all users of all programs would be happy to pay a fair sum to be able to directly mine the data at Ancestry and FamilySearch.
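None of the following classes or calls exist; this is only a sketch, in Python, of the shape such a licensed mining API might take from a desktop program's point of view.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hint:
    provider: str      # e.g. "Ancestry" or "FamilySearch"
    record_type: str   # "census", "vital", ...
    score: float       # match confidence, 0.0 to 1.0

class MiningAPI:
    """Hypothetical interface a data provider could license to third-party
    genealogy programs, so their users get "leaves" without switching tools."""

    def __init__(self, license_key: str):
        self.license_key = license_key   # vendor's per-install license

    def search(self, given: str, family: str,
               year: Optional[int] = None) -> List[Hint]:
        # A real implementation would query the provider's servers; this
        # stub only shows the contract a desktop program would code against.
        return [Hint("ExampleProvider", "census", 0.87)]

api = MiningAPI(license_key="demo")
hints = api.search("John", "Yates", year=1850)
print(hints[0].record_type)   # census
```

The desktop program would call search() whenever a person record changes and surface the returned hints in its own GUI, leaving the provider free to meter and bill through the license key.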

I would think that a cost model for using the API could be made proportional to the installed user base of a given program, and the vendor could choose how to pass that along to the end user: at purchase time, as an added feature, or... This would allow the small "mom and pop" program vendors to buy into this feature at a low entry cost (hopefully).

Developing such an API should be a joint effort of a "best and brightest" team of programmers and genealogists from the companies and users, if and when such a model becomes attractive to all stakeholders.

Judging from their behavior to date, I'm not sure that Ancestry really wants to be in the advanced, professional genealogy program business, so allowing competitors that do offer more professional programs, and collecting fees from them, may be an attractive model for Ancestry, other program vendors, and users. Ancestry could then focus on "getting the data right" and less so on the tool that allows them to make more money from the data. (I suspect programming a genealogy program is not a core interest of the company versus making the data available, but I could be wrong.)

In the absence of this model, I predict that any vendor that cannot offer direct data mining will slowly be squeezed out of market share (even in smaller niche markets?) and probably eventually out of business, as FTM (or other direct data mining programs) evolves to handle professional genealogists' needs.