Genealogy as a Form of Data Analysis

This paper is loosely based on the Wikipedia article entitled “Data Analysis” and the book Mastering Genealogical Proof.

Genealogists accumulate raw data and analyze it for patterns and trends, working toward a Genealogical Proof. In the genealogical community, evidence is generally understood as pieces of data assembled through collection, sifting, and arranging. Evidence, positive or negative, is acquired by examining and modeling data using generally accepted processes. One such process is to use the computer application Evidentia. Other processes that enable the development of evidence are those used in the legal and forensic professions (e.g., DNA analysis).

Each point of data genealogists use is inspected, cleansed, transformed, and modelled. Most serious genealogists follow the Genealogical Proof Standard. While this standard is more qualitative than quantitative, the result is the same: actionable information used to formulate decisions.

Put simply, the Genealogical Proof Standard works as follows: formulate a research question, gather data sources, consider the information in those sources, formulate evidence from that information, and finally construct a proof statement. The process is generally iterative, since there is no such thing as a final statement of proof in genealogy.
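
For readers who think in code, the steps just listed can be sketched as a small data structure. This is only an illustration, not part of the Genealogical Proof Standard or of any genealogy software: the class names, the keyword filter that stands in for a genealogist's judgment, and the sample parish register entry are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A record or publication consulted (hypothetical structure)."""
    title: str
    information: list[str] = field(default_factory=list)  # statements found in the source


@dataclass
class ResearchCase:
    """One pass through the steps: question, sources, information, evidence, proof."""
    question: str
    sources: list[Source] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)

    def gather(self, source: Source) -> None:
        # Step 2: add a data source to the case.
        self.sources.append(source)

    def formulate_evidence(self, keyword: str) -> None:
        # Steps 3 and 4: a crude stand-in for human analysis that keeps
        # only the statements mentioning the keyword.
        self.evidence = [
            f"{src.title}: {item}"
            for src in self.sources
            for item in src.information
            if keyword.lower() in item.lower()
        ]

    def proof_statement(self) -> str:
        # Step 5: never final; it can be regenerated as new sources arrive.
        if not self.evidence:
            return f"{self.question} Unresolved; more research needed."
        return f"{self.question} Supported by {len(self.evidence)} item(s) of evidence."


# Invented sample data, for illustration only.
case = ResearchCase("Who was Joan Jones' mother?")
case.gather(Source("Hypothetical parish register",
                   ["Joan, daughter of Mary Jones, baptized"]))
case.formulate_evidence("daughter of")
print(case.proof_statement())
```

Because `formulate_evidence` and `proof_statement` can be called again whenever a new source is gathered, the sketch mirrors the iterative nature of the process.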

While traditional data analysis is generally thought to be quantitative, there is much similarity to the genealogical research process. The steps in data analysis are analogous to the process used by genealogy professionals. Data analysis begins with a research question, followed by compiling source information, and finally, generating actionable conclusions.

Research Question

Sometimes thought of as a hypothesis, the research question is the beginning of both genealogical research and data analysis. Genealogists formulate a question by asking something such as “Who was Joan Jones’ mother?” Data analysts ask, “How is product A better than product B?” The answers are reached in much the same way for both.

Data Collection, Processing, and Cleaning

To answer the research question, both genealogists and data analysts collect, process, and classify data relevant to the issue. Analysts treat almost all data bearing on the question as relevant, but genealogists often go further, collecting source material relevant not only to the issue itself but also to its surroundings. Data analysts, on the other hand, are more focused on the question itself, locating only data relevant to products A and B.

The main difference between traditional and genealogical data analysis is that genealogists have much fuzzier information to deal with. Items like local and regional history books may include data bearing on their question. Such items are generally not relevant to a data analyst focused on a product research project, unless it involves cultural appropriation, e.g., the Korean carmaker Hyundai’s Tucson vehicle. 😊

Exploratory Analysis

Genealogists often explore different sets of data to glean information and evidence relevant to their questions. A traditional data analyst does the same, focusing more on specific items than on general ones.

Modelling and Algorithms

There are no “real” algorithms for genealogists to apply to their data findings. There is, however, a Genealogical Data Model, which was constructed to help genealogists apply their data to real-world projects. The Genealogical Data Model was originally constructed to be a basis for software, but since it was completed, no software has used the GDM (except for The Master Genealogist, which used large parts of it).

Data Products and Communications

Genealogists use a proof model to present data and their formulation of the evidence they’ve compiled. A traditional data analyst uses a tool such as business intelligence software to present their findings. The only real difference between the two is that they present findings in a different way.

NPM

Fascinating New Find About Richard Mellen

Just found a newish website whose author is apparently unwilling to commit to scholarly diligence and credibility. Doug Sinclair wrote a paper about Richard Mellen of Massachusetts, alleged father of Simon Mellen of Sherborne and Framingham.

Mr. Sinclair states that his website doesn’t claim to meet the standard of scholarly journals, and it certainly doesn’t.

Further, he ignores the clear, cogent, and concise statements at the beginning of my paper on Richard Mellen, and of my genealogy of Simon Mellen, that they are each “extended literature review[s].” The next paragraph in each of those publications states that the focus was on “published record sets.” What is not credible or clear about that?

Mr. Sinclair fails to cite his sources for the statements concerning my work, yet appears to cite everything else he writes about. Nothing new was presented that hadn’t already been written about by me or others, just pictures. One picture in particular clearly shows he read my blog article and paper about Richard “Maling” aka “Waling.” The picture he posted clearly shows a “W”.

Any diligent genealogical researcher will use both of our works as clues only. Any diligent genealogical researcher will also follow the BCG Code of Ethics in using our works.

NPM

Staying Relevant in the Online World

[Fish market, Bergen, Norway] (LOC) (Photo credit: The Library of Congress)

Note: This piece is opinion, and you may or may not agree with the points raised.

Several genealogists have questioned the value of social media as a means of getting business for themselves. Does it work? Is it worthwhile to do constant social marketing? My answer: no, not really. Social marketing only turns us into social butterflies flitting from one thing to the next, searching for relevance. That doesn’t mean that, with a little focus, we can’t be more effective in our own patches.

We all have our own niches where we are relevant and effective. Where are we most useful in the larger scheme of things? At home in our own patches. Where is your patch? It might be New England, the Pacific Northwest, the Deep South, or elsewhere. This answers the question of using social media effectively in one respect: locality.

Do we always have to go outside of our own patch to find clients and customers? No. The thing is, we need to focus on what we know and keep it up at a level, and for a length of time, that makes sense for us. The simple answer is that clients will come to us, looking for us; we don’t have to go to them anymore. That’s the value of the Internet. Push marketing is outdated. Pull marketing is the way things are now.

What attracts clients in the first place? Pull marketing. Pull marketing is the goodwill we generate in our own niche markets. Do you have an effective website? Do you focus on what you know in your area? Are these items present in your marketing online? This is what social marketers (all of us, really) need to focus on, not plugging something from someone else; that’s giving away your time for little or no gain. Focus on your own gain, in your own market niche, and you’ll be fine.

Does that mean that you can’t market outside of your niche market? No, but does such marketing show any effect? Not really, especially if there’s no response at all most of the time, which is what you’ll find when you do venture in that direction. It’s less effective in the long term and, in the short term, a waste of time.

Most of the social marketing we do consists of plugging, liking, and linking to products; that is about all we can do with it. It’s socializing with others, seeking their approval and approving things we like. As far as I’m concerned, this sort of marketing is not business marketing, but marketing others’ products for them. Constantly doing such things is time-consuming and, because of the low return on the investment of your time, worthless.

Just focus on the who, what, where, when, and how of what you specialize in, and you will be more effective in getting clients.

NPM

© 2012 N. P. Maling – Sea Genes Family History & Genealogy Research

A business ideally is continually seeking feedback from customers: Are the products helpful? Are their needs being met? Constructive criticism helps marketers adjust offerings to meet customer needs. (Photo credit: Wikipedia)

In case you were wondering . . .

I’ve restarted my bookselling business and will be focusing on that for the time being. I’ve kept the Seattle Book Scouts’ Blog, here on WordPress.com, for some time, but haven’t maintained it lately. I might open that up again in the future.

In the meantime, feel free to browse the archives here and there, and look for the very occasional post here.

NPM

Series Introduction: 1940 Obituaries

Over the next three months, I plan to post obituaries from the Pacific Northwest states. These posts will ostensibly be part of the Geneabloggers “Sunday’s Obituary” prompt. The overall theme, though, is that they all come from Alaska, Oregon, and Washington, and from the months of January, February, and March 1940.

Washington and Oregon have the best coverage of the four states that I focus on, so I’ve decided to concentrate on these two. For Washington, Spokane’s Spokesman-Review, rather than a newspaper from my current location, Seattle, will provide the post content. Oregon’s Oregonian, out of Portland, will provide obituaries from there.

One interesting thing is that the two Alaska newspapers I’ve looked at for source materials, from Anchorage and Fairbanks, have no obituaries in them. My tentative workaround is to find articles about deaths through accident, murder, or other events, including, of course, old age.

For Idaho, the University of Washington’s Suzzallo & Allen Library’s Microfilm and News department doesn’t have anything for the right time period. The closest interesting newspaper microfilm from that place and time period is at the Washington State University library in Pullman. Thus, Idaho will not be covered in the series, even though it is a Pacific Northwest state.

It will be an interesting series to read and I’m having fun putting it together. Enjoy.

NPM

© 2012 N. P. Maling