UNDERSTANDING THE DATA

What is linkable administrative data, and why would you use it to do research? To find out, check out this short video by the Children's Data Network (at the University of Southern California):

The Manitoba Centre for Health Policy (MCHP) houses a large number of health and social databases. These databases are de-identified and can be linked at the individual level.
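To make "linked at the individual level" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records are made up, and the field name `scrambled_id` and the helper `link_by_id` are assumptions for illustration, not how the databases are actually stored or linked. The idea is only that two tables with no names or addresses can still be joined when they share the same de-identified person key.

```python
# Hypothetical, made-up records. Neither table holds a name or address,
# only a de-identified key (the field name "scrambled_id" is an assumption).
physician_visits = [
    {"scrambled_id": "A17", "visit_year": 2001},
    {"scrambled_id": "A17", "visit_year": 2003},
    {"scrambled_id": "B42", "visit_year": 2002},
]
education_records = [
    {"scrambled_id": "A17", "grade": 9},
    {"scrambled_id": "C08", "grade": 12},
]


def link_by_id(left, right, key="scrambled_id"):
    """Return one combined record for every left/right pair sharing the key."""
    # Index the right-hand table by key so each lookup is a dictionary access.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    linked = []
    for row in left:
        for match in right_index.get(row[key], []):
            linked.append({**row, **match})
    return linked


linked = link_by_id(physician_visits, education_records)
for record in linked:
    print(record)
```

Only person "A17" appears in both tables, so the result contains that person's two visits, each paired with the education record. People found in just one table drop out, as in an inner join.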

Some of our data go back to 1970, which means we can do really cool longitudinal studies.  Click here to see the years of data availability for each dataset. 

Concept Dictionary and Deliverables

What kinds of data are available?

There are many different data sets available to be linked for your projects. The MCHP website has some great resources to start exploring these data sets. To find information about a specific data set, go to Data Repository > Data List.

On this page, there is a link to information on each individual data set.

For example, if you want information on physician visits, scroll down to Medical Claims / Medical Services.

This page gives you:

  • A summary 

  • The Source Agency

  • Purpose

  • Years (of data available)

  • Type

  • Size

  • Data Level

  • Components

  • Scope

  • Data collection method

  • Data Highlights

  • Access Requirements

  • More Information

  • List of Studies that have used these data

Concept Dictionary: The Concept Dictionary is another great place to see how the data are used to define specific concepts and outcomes. This dictionary has glossary entries and concepts. Glossary entries give background information and definitions of terms. Concepts provide information on how these terms are defined using the data at MCHP.

For example, if you are interested in looking at diabetes in Manitoba, enter 'diabetes' into the search bar.  This search has more than 80 results; here are the top 5:

If you click on the first result, you will find a glossary entry for diabetes:

At the bottom of the page, there are also links to related concepts, terms and references.

If you are interested in how the prevalence of diabetes is measured using the data at MCHP, click on the 'Diabetes - Measuring Prevalence' concept.

This link will take you to a page that has:

  • Definition of Diabetes

  • Literature Review

  • Manitoba Diabetes Algorithms

  • Calculating Population-based Prevalence Rates

  • Cautions / Limitations
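As a rough illustration of what a population-based prevalence rate is, the sketch below divides the number of people identified as cases by the size of the population. It is not the Manitoba diabetes algorithm described on that concept page. The counts are invented, and the function name `prevalence_per_100` is a hypothetical helper.

```python
def prevalence_per_100(cases, population):
    """Crude prevalence: identified cases per 100 people in the population."""
    if population <= 0:
        raise ValueError("population must be a positive number")
    return 100 * cases / population


# Invented counts for two hypothetical age groups: (cases, population).
counts_by_age_group = {
    "20-39": (50, 1000),
    "40-59": (120, 800),
}

for age_group, (cases, population) in counts_by_age_group.items():
    rate = prevalence_per_100(cases, population)
    print(f"{age_group}: {rate:.1f} per 100")
```

In a real study, the hard part is deciding who counts as a case and who belongs in the denominator, which is what the concept page documents.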

Deliverables: Another great way to understand the data housed at MCHP is to look through past deliverables. Deliverables are reports prepared by scientists at MCHP to answer questions posed by Manitoba Health, Seniors and Healthy Living. A lot of work goes into these reports, and a broad range of outcomes and predictors are examined. 

Meta-data

Now that you know the kinds of data that you could link, you may be wondering what variables are actually in each dataset. The best way to find out is to access the meta-data for each data source. Unfortunately, if you do not work at MCHP, the only way to access this meta-data is in the Training Room at MCHP (which is open to everyone). The steps to access this information are:

  1. Log into the VDI on one of the computers in the Training Room. The username is 'train' and the password is 'training'.

  2. Visit the internal MCHP site: portal.cpe.umanitoba.ca

       This is what the site looks like:

  3. Click on the 'Metadata Repository' tab under 'MCHP Applications'.

  4. This will take you to a page that looks like this:

a. Click on 'Selected Database' to choose the database you are interested in.

Here, we have the Families First Screening database.

b. Click on 'Dataset to View' to choose the dataset within the database you want to see. For example, within Families First Screening, you can select one of the following datasets to view:

c. Once you have selected your desired dataset, the 'Overview' tab is a great place to start to understand it.

d. The 'Data Dictionary' tab has detailed information on the variables in each dataset. Click on any variable name to see the values of that variable and their frequencies. For example, we clicked on the 'AGE_MOM' variable, which indicates the mother's age, and it gave us an overview of the ages that were entered and their frequencies.

Selecting any of the other tabs will give you more detailed information on some of the other aspects of these datasets.
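The frequency view for a variable can be pictured as a simple count of each value. Here is a minimal sketch in Python, assuming a made-up list of mother's ages. The values are invented and are not taken from the Families First Screening data, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, counts[value], round(100 * counts[value] / total, 1))
        for value in sorted(counts)
    ]


# Invented ages standing in for a variable such as AGE_MOM.
age_mom = [19, 24, 24, 31, 27, 24, 31]

table = frequency_table(age_mom)
for value, count, percent in table:
    print(f"{value:>3}  n={count}  {percent}%")
```

Looking at a table like this before starting an analysis is a quick way to spot unexpected values or sparsely populated categories.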

Training and Simulated Data

Another way to get to know the data is to use the training or simulated datasets. 

Training Data - this is a 10 percent sample of the population data at MCHP that may be used to take a preliminary look at the feasibility of your project (e.g., methodology, sufficient events). These data are only available for use on site at MCHP.

Simulated Data - these are simulated data that reflect some of the characteristics of the data in the Population Data Research Repository, but do not contain real data on any individuals. These are the datasets used in the SAS workshop. These data are available upon request and can be used off-site.

To obtain access to either of these datasets, contact access@cpe.umanitoba.ca