Genealogy as a Form of Data Analysis

This paper is loosely based on the Wikipedia article entitled “Data Analysis” and the book Mastering Genealogical Proof.

Genealogists use raw data to accumulate and analyze patterns and trends toward establishing a Genealogical Proof. Evidence in the genealogical community is generally understood as pieces of data that are assembled through collection, sifting, and arranging. Evidence, positive or negative, is acquired through examining and modeling data using generally accepted processes. One such process is to use the computer application Evidentia. Other processes enabling the development of evidence are those used in the legal and forensic professions (e.g., DNA analysis).

Each point of data genealogists use is inspected, cleansed, transformed, and modelled. Most serious genealogists use the Genealogical Proof Standard. While this standard is more qualitative than quantitative, the result is the same: actionable information used to make decisions.

Put simply, the Genealogical Proof Standard process runs as follows: formulate a research question, gather data sources, consider the information in those sources, formulate evidence from that information, and finally construct a proof statement. The process is generally iterative, since there is no such thing as a final statement of proof in genealogy.

While traditional data analysis is generally thought to be quantitative, there is much similarity to the genealogical research process. The steps in data analysis are analogous to the process used by genealogy professionals. Data analysis begins with a research question, followed by compiling source information, and finally, generating actionable conclusions.

Research Question

Sometimes thought of as a hypothesis, the research question is the beginning of both genealogical research and data analysis. Genealogists formulate a question by asking something such as “Who was Joan Jones’ mother?” Data analysts ask, “How is product A better than product B?” The answers come in basically the same way for both.

Data Collection, Processing, and Cleaning

To answer the research question, both genealogists and data analysts collect, process and classify data relevant to the issue. Almost all data is seen as relevant to analysts, but genealogists often go further, collecting source material relevant not only to the issue, but also surrounding the issue. Data analysts, on the other hand, are more focused on the question itself, locating only data relevant to products A and B.

The difference between traditional and genealogical data analysis is that genealogists have much fuzzier information to deal with. Items like local and regional history books may include data about their question. Such items are generally not relevant to a data analyst focused on a product research project, unless it involves cultural appropriation, e.g., the Korean carmaker Hyundai’s Tucson vehicle. 😊

Exploratory Analysis

Genealogists often explore different sets of data to glean information and evidence relevant to their questions. A traditional data analyst does the same, focusing more on specific items than on general ones.

Modelling and Algorithms

There are no “real” algorithms for genealogists to apply to their data findings. There is, however, a Genealogical Data Model, which was constructed to help genealogists apply their data to real-world projects. The Genealogical Data Model was originally constructed to be a basis for software, but since it was completed, no software has used the GDM (except for The Master Genealogist, which used large parts of it).

Data Products and Communications

Genealogists use a proof model to present data and their formulation of the evidence they’ve compiled. A traditional data analyst uses a tool such as business intelligence software to present their findings. The only real difference between the two is that they present findings in a different way.

NPM

Fascinating New Find About Richard Mellen

Just found a newish website whose author is apparently unwilling to commit to scholarly diligence and credibility. Doug Sinclair wrote a paper about Richard Mellen of Massachusetts, alleged father of Simon Mellen of Sherborne and Framingham.

Mr. Sinclair states that his website doesn’t claim to meet the standard of scholarly journals, and it certainly doesn’t.

Further, he ignores the clear, cogent, and concise statements at the beginning of my paper on Richard Mellen, and the genealogy of Simon Mellen, that they are each “extended literature review[s].” Moreover, the next paragraph in those publications states that the focus of each was on “published record sets.” What is not credible or clear about that?

Mr. Sinclair fails to cite his sources for the statements concerning my work, yet appears to cite everything else he writes about. Nothing new was presented that hadn’t already been written about by me or others, just pictures. One picture in particular clearly shows he read my blog article and paper about Richard “Maling” aka “Waling.” The picture he posted clearly shows a “W”.

Any diligent genealogical researcher will use both of our works as clues only. Any diligent genealogical researcher will also follow the BCG Code of Ethics in using our works.

NPM

Crafting a research question

Many good genealogy programs can help you get started crafting a good research question with their to-do features. RootsMagic has a good one, illustrated below.

The questions to ask before adding a new task are:

  • Who are you going to research?
  • What do you want to learn about the person?
  • Where was the person you are researching?
  • When was the person there?

Additionally, you might ask: Why was the person there at that time? This might seem like an existential question, but it is a good idea to add context to your family history.

These five questions get you started on the way to learning more about your ancestor.

The who is simple enough. The what can include any number of items like where/when were they born, when did they immigrate, where did they emigrate to, and who did they live with/marry/divorce, and so on. Where and when are a bit more complex due to the possible lack of information.

For instance, Lydia Peirce Gorton was born on 28 January 1822. I’ve got her birth date but no birthplace. I want to know where she was born, so I ask, “Where were Lydia Peirce Gorton’s parents, Daniel and Lydia (Peirce) Gorton, when Lydia was born in 1822?” The who, what, and when parts of this question are answered, but the best part is still unanswered: “where”?
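For readers who like to keep their questions in a structured form, here is a minimal sketch of the Lydia example as a small Python record. The class name, field names, and the `unanswered` helper are my own assumptions for illustration; they do not come from RootsMagic or any other genealogy program.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchQuestion:
    """Hypothetical record of a research question; None marks an unknown element."""
    who: str
    what: str
    where: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None  # the optional context question

    def unanswered(self) -> List[str]:
        """Return the names of the elements that are still unknown."""
        return [name for name in ("where", "when", "why")
                if getattr(self, name) is None]


# Values taken from the example above; the birthplace is the open element.
lydia = ResearchQuestion(
    who="Lydia Peirce Gorton",
    what="birthplace",
    when="28 January 1822",
)
print(lydia.unanswered())  # ['where', 'why']
```

Listing the unknown elements this way makes it obvious which part of the question the research has to answer.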

The records I’ve got so far say different things: that she was born in Massachusetts, in New York, or in Vermont. Most likely she was born in Vermont, though. I can make this hypothesis because her older brother was born there, and a few original records say so. This leads me to focus my question even more on Vermont records. Massachusetts records are very complete for the time, and there is no indication her siblings were born there. New York state records, on the other hand, are problematic, so they will have to wait for a while.

In this particular question, I ask why the parents weren’t in the records for Lydia’s potential birthplace. Were they there, just not recorded anywhere? These questions lead me to ask about the area where they may have been, to find out more about possible record sources. I also learn about the culture in that area, why the records may not exist, and what the economic conditions were during that period.

The process of crafting a specific question to be answered is key to great research. Answering the question is done during the research phase of the project. I’ll write more about the research project later this month.

Thoughts?

NPM

Sunday’s Obituary: Basil A. Malof, San Francisco, California

Rev. Malof, Author, Dies at 70

The Rev. Basil A. Malof, Baptist minister and author, died Thursday at the age of 70 in Herrick Memorial Hospital, Berkeley.

The Rev. Mr. Malof was born Basil A. Fetler in Riga, Latvia. He wrote under the name of Malof and had his name changed legally.

When a student he was exiled to Siberia for religious activities. Released, he attended Spurgeon College, a Baptist seminary in London.

From 1929 to 1939 he headed a large church in Riga. In the latter year he came to the United States and founded the Russian Bible Society in Washington, D. C. He was editor of “Russia Calling,” the society’s paper.

In addition to other books he was author of “Sentenced to Siberia,” an autobiography.

The Rev. Mr. Malof moved to Berkeley six months ago.

He is survived by his wife, Barbara, of 2442 Piedmont avenue, Berkeley, and 13 children: Daniel Fetler, New York City; Timothy Fetler, Fullerton; Lydia Hartsock, Silver Spring, Md.; Mary Miller, Mundelein, Ill.; Paul Fetler, Minneapolis; Philip Fetler, Riverside; John Fetler, Colorado Springs; Elizabeth Bregenzer, Arlington, Va.; Andrew Fetler, Chicago; David Fetler, Rochester, N.Y.; Peter Malof, Arlington, Va.; James Fetler, Sausalito; and Joseph Malof, Venice, Los Angeles county.

Funeral services were pending last night.

San Francisco Chronicle, 17 August 1957, Page 12, column 2.

5 Steps to Great Research

There are five related steps to take to get good results from your research. We create a specific question to be answered, a research plan using the question, a research log, and a research report. Optionally, we create a biographical sketch from information in the research report.

Steps to Create a Research Question

First, we craft a question to answer. Use these four elements (who, where, when, and what) to focus on specific items that you want to learn more about. Being as specific as you can goes a long way toward getting reliable results from your research.

Steps to Create a Research Plan

Next, we examine the research question and gather more information about the subject we are interested in. We find sources relevant to the person, place, and time span involved. Sources such as locality guides, histories, and archives catalogs can provide good results for further searches.

Steps to Create a Research Log

After we have looked into each of the record types in the research plan, we can start actively searching for the best records available to us. We want to focus on relevant records that are likely to answer the research question. Prioritize the research items to gather information from the easiest to the hardest and organize your research plan accordingly.
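The easiest-to-hardest ordering can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `LogItem` fields, the 1-to-3 difficulty scale, and the sample entries are all hypothetical and are not taken from any real research log.

```python
from dataclasses import dataclass


@dataclass
class LogItem:
    record_set: str     # what to search
    difficulty: int     # assumed scale: 1 = easiest to obtain, higher = harder
    done: bool = False  # already searched?


def prioritize(items):
    """Return the unfinished items ordered from easiest to hardest."""
    return sorted((i for i in items if not i.done), key=lambda i: i.difficulty)


# Hypothetical entries, for illustration only.
log = [
    LogItem("state archives visit", 3),
    LogItem("online vital records index", 1),
    LogItem("county history book", 2),
    LogItem("family bible transcription", 1, done=True),
]

for item in prioritize(log):
    print(item.difficulty, item.record_set)
```

Items already searched drop out of the list, so what remains is the working order for the next research session.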

Steps to Create a Research Report

When we have completely researched the question, we can then create the research report. I am a fan of the write-as-you-cite method. This means when I am researching, I am also drafting parts of the research report. It is not a step back, but it is not a speedy process either. Take time to really look at the records and save time in the long run so you do not have to go back and revisit them.

Steps to Create a Biographical Sketch

The final element of great research is to make a biographical sketch. There are many ways to create a sketch. I have written a few posts about this topic, but one of the recommended ways is to use the NEHGS Register style. Whole books have been written about writing a family history sketch, so I will leave that choice to you.