Data Warehouses and Data Mining

Database Examples in Healthcare Operations

So far, much of this course has been devoted to understanding what databases are, how they are structured, and some of their uses. A substantial healthcare organization such as a hospital will use not just one but several large databases. We have learned that each database contains multiple tables. Now, we should envision that most healthcare organizations maintain multiple computerized systems and databases. Here are just a few:

Example databases in a hospital:

· Laboratory information system (LIS)

· Radiology information system (RIS, and PACS)

· Pharmacy information system (PIS)

· Patient registration: admission, discharge, and transfer systems (ADT)

· Master Patient Index (MPI)

· Electronic health record system (EHR)

· Release of Information system (ROI)

· Scheduling systems

· Claims coding and billing systems

· Materials management systems

Each of these systems and databases has been designed with a particular purpose in mind: to rapidly and efficiently support the processes and work of the staff in the related department and its functions. The speed of processing is critical. When a specific patient’s record is brought up in the EHR, detailed medication lists, medication allergies, and history are expected to be available to staff and practitioners nearly instantly. Anything slower would not support the clinical provider’s role.

It is very important to remember that not all of a healthcare organization’s clinical data may be stored within a single, main EHR system. Many clinical systems exist independently from, but interface with, the electronic health record. When called upon to retrieve patient information, these systems may deliver the results to the EHR’s screens using HL7 interfaces and other interoperability features. Now, we consider details on a few of these various databases.

When it comes to radiology systems, the images and film reads/interpretations need to be stored in an accessible manner, too. Older films and digital files need to be processed rapidly. When the radiologist or other physician goes to compare what changes have happened over time in a fracture or soft tissue analysis, they expect that the films can be retrieved within a few minutes’ time. Radiology files can be very large because they include images as bitmap digital files, not just simple text and numeric data. JPEG files are an example of one bitmap format that has come into common usage by consumers; radiology systems typically store images in the specialized DICOM format. Other image file formats exist as well.

Laboratory information systems are designed to rapidly store and retrieve laboratory test types and results. Laboratory and pharmacy systems were some of the earliest clinical systems to be designed and digitized. Lab tests are easily characterized by test type/name, date of sample collection, results, and so forth.

Pharmacy systems need to maintain accurate records on each and every pharmaceutical item dispensed, for what patient, in what dosage amount and dosage form (tablet, capsule, liquids for IV), frequency and route of administration. Start and stop dates, number of refills, prescribing provider and other prescription details also must be tracked. Obviously, the pharmacist needs to be able to rapidly retrieve any prescription’s data, plus have decision support software that checks for duplicate prescriptions and potential warnings, such as drug interactions.

The prescribing practitioner also has CPOE (computerized practitioner order entry) software that conveys the prescribing information to the pharmacist and includes “smart checks” decision support for some drug interactions. CPOE is often one component of an EHR. However, a patient may be seeing multiple practitioners and specialists in the outpatient setting at practice locations whose systems are not connected. The pharmacist ultimately has the data on prescriptions from various providers as long as the patient uses the same pharmacy (or pharmacy chain) for all prescriptions. Both CPOE and pharmacy systems need very rapid response times as the prescription is dispensed, checked against allowable insurance payment, and reviewed for potential contraindications. It is “too late” to check for safety issues once the patient has left the pharmacy or taken the drug in the hospital.

Release of information systems (ROI) may be familiar. These are used to make sure that proper patient authorizations have been provided for release of records, to identify just what medical record items were released, to whom and when, and to process applicable charges for the released PHI. Release of information systems, along with computer-assisted coding systems, are examples of administrative information systems. The patient’s direct care does not depend on having rapid response from these systems. However, the healthcare organization’s ability to conduct its business functions for billing, payments and the like, will still need rapid response for efficiency.

Data Warehouse as a Shopping Mall

By now you may be getting the picture that there are many different systems and places where data are located in a sizable healthcare organization. It takes great effort to keep all the information needed for patient care updated and available for analysis.

Let’s consider the data warehouse as a shopping mall for the purpose of demonstration. Instead of one large “store” of data, we have many “shops” of databases of various sizes within a hospital to go to for information. Some are very large, complex databases; the EHR, for example, is like the mall’s large and complex department store. Other databases are more like small specialty shops. Think of a bookstore or a place to purchase just greeting cards, as these are more specialized vendors. A release of information system or a surgical materials management system is more specialized and does not need as many connections to other databases as the larger clinical databases. All these systems have value to the organization, though, just as both specialty shops and department stores add value to a mall.

When conducting a particular study and data request for financial, administrative, or quality-of-care purposes, we might need to go to a variety of “shops” (databases) to get all the information needed. The various databases may not be connected the way we want them to be (say, to support a study), because they were constructed to do the day-to-day business of taking care of patients or getting proper payments for the care. When different databases are involved (not just different tables within one database), SQL queries alone can be inadequate to extract all that is necessary. Not everything needed is in the same place. We have to painstakingly find out which system contains the needed information, what the data attribute names are, and what format the data were stored in, and then try to fill the “shopping cart” with the data before even starting any analysis. It can be very time-consuming just to create a file to support the study.
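As a minimal sketch of this “shopping cart” work, assuming hypothetical table names, column names, and a shared medical record number as the matching identifier, the following pulls rows from two separate databases and matches them in application code, since a single query usually cannot span two independent systems:

```python
import sqlite3

# Hypothetical, simplified schemas: in practice each system has its own
# vendor-specific structure, attribute names, and formats.
ehr = sqlite3.connect(":memory:")  # stands in for the EHR database
lis = sqlite3.connect(":memory:")  # stands in for the separate lab system

ehr.execute("CREATE TABLE patients (mrn TEXT, name TEXT, birth_year INTEGER)")
ehr.execute("INSERT INTO patients VALUES ('1001', 'A. Jones', 1954)")

lis.execute("CREATE TABLE results (mrn TEXT, test_name TEXT, value REAL)")
lis.execute("INSERT INTO results VALUES ('1001', 'HbA1c', 7.2)")

# "Fill the shopping cart": pull from each source separately, then match
# records on the shared identifier (here, the medical record number).
patients = {mrn: (name, by) for mrn, name, by in
            ehr.execute("SELECT mrn, name, birth_year FROM patients")}

study_rows = []
for mrn, test, value in lis.execute("SELECT mrn, test_name, value FROM results"):
    if mrn in patients:
        name, birth_year = patients[mrn]
        study_rows.append((mrn, name, birth_year, test, value))

print(study_rows)  # [('1001', 'A. Jones', 1954, 'HbA1c', 7.2)]
```

Even this toy version shows where the time goes: each source must be understood, queried, and reconciled before any analysis can begin.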

Picture this: If you have to drive all over town to complete the following tasks: buy a new bicycle for a kid, get a watch battery replaced, buy shampoo, and get a new windshield wiper for the car — this can take much of a day. If all those errands can be completed within the same large “shopping mall” or large department store, though, then our errand time is vastly reduced. One stop, one parking place to find — take care of the list and you are done.

The Data Warehouse is a lot like the shopping mall. A large storehouse of data is created that has already pulled together many information items that the organization thinks might be needed down the road for analysis. Instead of “running all over town” for the data, meaning going to many different database sources, it has been already pulled from the existing operations databases and plopped into a single, coherently organized place — a data warehouse.

The data warehouse is a location and structure for data envisioned and constructed before any study has even been started. Much thought goes into the strategy and plan for the data warehouse. It tries to “anticipate” the many different purposes and studies our data will be needed for, to draw conclusions and answer future questions. If enough usable data items have been anticipated and pulled together into the data warehouse, with a good interface for requesting what is needed, then more management questions can be readily answered, more study data pulled, and so on. The organization can better survive with good information that is easy to retrieve; in addition, facility needs, such as reducing costs and improving care, can more easily be met.

OLTP vs. OLAP Systems

The various systems in the hospital explained so far, with the exception of the data warehouse, use what is called an Online Transaction Processing (OLTP) system. Understand that this is not so much a “structure” as the way the system runs its programs and updates. These database systems need to accept updates (new data entry) rapidly, update the database entries almost instantly, and return any current information the user needs. The data need to be entered, the storage of that data updated, and the updated results made available very rapidly. This is truly an amazing era that we live in. Now, a pharmacist processes hundreds of prescriptions a day with safety checks in place and inventory automatically updated. Physicians see dozens of patients a day with much of the patient record updated that same day.

Think of OLTP as “fast” processing and quickly changing. Even when you go online to purchase a book or a blouse, this process is made possible by OLTP systems. The order is placed, warehouses checked for availability, accurate pricing and taxes rendered, and shipping arranged all within a matter of minutes.
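A minimal sketch of an OLTP-style transaction, using hypothetical pharmacy tables and Python’s built-in sqlite3 module: one dispensing event updates inventory and records the dispense as a single, fast, all-or-nothing unit, and the data are current the instant it commits.

```python
import sqlite3

# Hypothetical pharmacy tables; real systems track far more detail.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (ndc TEXT PRIMARY KEY, on_hand INTEGER)")
db.execute("CREATE TABLE dispenses (ndc TEXT, mrn TEXT, qty INTEGER)")
db.execute("INSERT INTO inventory VALUES ('0001-1111', 100)")

# One dispensing event, processed as a single transaction: either both
# updates happen or neither does.
with db:  # sqlite3 commits (or rolls back) the block as one transaction
    db.execute("UPDATE inventory SET on_hand = on_hand - 30 "
               "WHERE ndc = '0001-1111'")
    db.execute("INSERT INTO dispenses VALUES ('0001-1111', '1001', 30)")

print(db.execute("SELECT on_hand FROM inventory").fetchone())  # (70,)
```

The same pattern, repeated hundreds of times a day, is why OLTP databases change so quickly.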

All that speed of carrying on everyday operations leads to rapid change in the databases, though. It is great to have speed to take care of patients. Rapid changes and updates are not so good, though, for conducting management studies. This leads to the second reason for having a data warehouse (or data mart). The kind of processing used to conduct research and management studies needs stable data that stay the same for a while, not items whose values are changing all day long as patient care is updated.

An ideal processing method for the data warehouse is called an OLAP system. OLAP means Online Analytical Processing. It is used for analysis, not everyday operations. This kind of database will not be updated every time a new patient comes in. For example, to perform a study on patients over the past two years who experienced readmissions within 30 days, we do not need instant, or even daily, updates. In fact, rapid changes would interfere with doing the study efficiently. Therefore, the data warehouse might be constructed as an OLAP type. OLAP is good for trend analysis (data change slowly or infrequently). OLTP (which changes fast) makes trend analysis difficult and has often “lost” the history of the data. An OLAP database and warehouse would be updated only once daily (or perhaps weekly) instead of many times each day. Reading OLAP data is more comparable to reading a newspaper that is published and updated only once a day.
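As an illustrative sketch, with invented patient numbers and dates, a 30-day readmission check over a stable warehouse snapshot might look like this: sort each patient’s stays chronologically, then flag any admission that falls within 30 days of the prior discharge.

```python
from datetime import date

# Hypothetical snapshot of (admit_date, discharge_date) pairs per patient,
# as it might be pulled from a warehouse refreshed nightly.
stays = {
    "1001": [(date(2023, 1, 3), date(2023, 1, 8)),
             (date(2023, 1, 30), date(2023, 2, 2))],
    "1002": [(date(2023, 5, 1), date(2023, 5, 4))],
}

readmitted = []
for mrn, visits in stays.items():
    visits.sort()  # order stays chronologically by admission date
    for prev, nxt in zip(visits, visits[1:]):
        # Readmission: next admission within 30 days of prior discharge.
        if (nxt[0] - prev[1]).days <= 30:
            readmitted.append(mrn)
            break

print(readmitted)  # ['1001']
```

Because the snapshot does not change mid-study, the same query run tomorrow gives the same answer, which is exactly the stability OLAP provides.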

Trend Data

Trend data is data that is collected and analyzed over time. In healthcare, trend data exists not only within an organization but about an organization as well. It is important to understand trend components and when each component is reported and needed.

Trend data within a healthcare organization consists of many different facets. Traditional business operations concerns such as revenue, expenditures, staffing levels, and quality of care must be measured and analyzed. In addition, trends in patient demographics must be analyzed to make decisions on items ranging from staffing levels to organizational growth opportunities. Trends in the kinds of patient diagnoses observed and procedures performed need to be analyzed to determine how the healthcare organization is doing at meeting healthcare needs. They can also inform marketing decisions or building expansions. Revenues are also impacted by trends in procedures performed and their frequency.

External entities, such as Federal insurers, like CMS, state health departments and private insurance companies, also consume trend data from healthcare organizations and turn it into national and regional trend analyses. Governmental agencies and research institutions are constantly reviewing trends for epidemiological and public health reasons. The HIM department and healthcare organization must report certain kinds of cases to national registries, including cancer registries.

Diseases of special interest, such as Zika virus and AIDS, are also tracked by the U.S. Centers for Disease Control and Prevention (CDC), which issues frequent reports to consumers, governments, and providers. This trend data can be used to mitigate epidemics, prioritize vaccine research and production, and improve the public’s overall health. Generally speaking, the cancer registries and the CDC will request that data be provided to them only in specific formats, known as data sets. Only by receiving such sets in the specified formats and organization can the CDC rapidly update information for providers and the public, such as its weekly influenza outbreak reports and maps. One CDC site containing such influenza updates is FluView, the weekly U.S. influenza surveillance report. The HIM department may be responsible for submitting the data sets in the prescribed formats.

Almost anything that involves measurement over time and is not just a simple “naming” field can be used for trend analysis — patient blood pressures, numbers of cases of lead-based paint poisoning, obesity levels in a practice or population, cases of Clostridium difficile in a region or hospital (since C. difficile is a common nosocomial infection), times of day when the most patients arrive for treatment to urgent care or emergency departments — any of these items and many more can be subjects for trend analysis.
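As a small illustration, using invented monthly counts, a simple moving average is one common way to smooth month-to-month noise so the underlying trend is easier to see:

```python
# Hypothetical monthly counts of C. difficile cases at one hospital.
monthly_cases = [4, 6, 5, 9, 8, 11, 10, 13]

# A 3-month moving average: each point is the mean of a 3-month window.
window = 3
trend = [round(sum(monthly_cases[i:i + window]) / window, 1)
         for i in range(len(monthly_cases) - window + 1)]

print(trend)  # [5.0, 6.7, 7.3, 9.3, 9.7, 11.3]
```

Here the smoothed values climb steadily, suggesting a real upward trend rather than random variation, which would prompt an infection-control review.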

The HIM professional will need to ensure that management studies conducted for trend analysis do not inadvertently reveal Protected Health Information (PHI) or allow a single patient, or a few patients, to be identified from a study, even if personal information was not part of the report. For instance, if there was only one case of C. difficile in a hospital during the month studied, care would need to be taken not to report too many details about that case.
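One common safeguard for this situation is small-cell suppression: masking any count below a threshold before a report is released. A minimal sketch, with invented counts and an illustrative threshold of 5 (actual suppression policies vary by organization and agency):

```python
# Hypothetical infection counts by month. Small non-zero counts are
# masked so a single patient cannot be singled out; zeros and larger
# counts are released as-is. The threshold of 5 is illustrative only.
counts = {"Jan": 0, "Feb": 1, "Mar": 7, "Apr": 12}
THRESHOLD = 5

released = {month: (n if n == 0 or n >= THRESHOLD else "<5")
            for month, n in counts.items()}

print(released)  # {'Jan': 0, 'Feb': '<5', 'Mar': 7, 'Apr': 12}
```

February’s single case is reported only as “fewer than five,” so the trend report stays useful without pointing to one identifiable patient.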

Data Mining

Identifying patterns within large sets of diverse data is called data mining. Data mining can require analytical models or other methods to identify unique patterns, involving the efforts of highly trained analytics personnel. Not every analysis and research question asked by an organization requires a “data mining” effort. Data mining is often more of an exploratory effort than an effort to answer a single, well-defined research question, but its findings are often used to support decision making in healthcare organizations.

Exploratory data mining requires excellent knowledge of statistics: usually not simply descriptive statistics, such as lengths of stay or mortality rates, but also inferential statistics. Inferential statistics can include linear regression, logistic regression analysis, and others. A good knowledge of calculus is suggested to fully understand methods of multivariate linear regression and logistic regression. Simpler forms of inferential statistics, which examine just a single dependent variable and a single independent variable, include ANOVA and chi-squared tests.
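As a small worked example of the chi-squared test on invented data, the statistic can be computed directly from a 2x2 table of observed counts (rows: enrolled in a hypothetical follow-up program, yes/no; columns: readmitted within 30 days, yes/no):

```python
# Invented 2x2 contingency table of observed counts.
observed = [[30, 70],   # not enrolled: 30 readmitted, 70 not
            [10, 90]]   # enrolled:     10 readmitted, 90 not

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-squared statistic: sum of (observed - expected)^2 / expected,
# where each expected count assumes the two variables are independent.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected) ** 2 / expected

print(chi_squared)  # 12.5
# A 2x2 table has 1 degree of freedom; the 0.05 critical value is about
# 3.841, so 12.5 indicates a statistically significant association.
```

In practice an analyst would use a statistics package rather than computing this by hand, but the arithmetic shows what the test is actually measuring.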

Other kinds of data mining involve using large and diverse data sets, such as within an insurance company or government agency, which both have access to large billed-claims data sets. As mentioned, certain kinds of healthcare decision-making can be done if an organization has access to data from many healthcare organizations. In this kind of data mining activity, a healthcare insurer may look at many claims across numerous healthcare providers and different hospitals and clinics. The insurer has greater access to such large data sets than an individual provider. However, the quality of data may vary from healthcare organization to healthcare organization. Not all institutions will be at the same stage of development, database definitions, and data gathering efforts. Drawing conclusions from massive numbers of claims originating from different healthcare entities, therefore, requires caution.

Data mining may be used to support efforts at fraud detection and utilization review as well. Healthcare insurers may monitor submitted claims for changes and issues over time. If they observe a systematic change in billing codes submitted over time from a hospital or within a region, for example, this could require further research. Billing changes might result from verifiable change in patient care patterns, or they could result from a change in coding practice. Insurers are always on the lookout for systematic “up-coding” and other practices that can increase hospital and healthcare claims revenue.
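A toy illustration of this kind of monitoring, with invented code labels and counts: compare the share of higher-severity billing codes between two periods and flag a large shift for further review.

```python
# Hypothetical counts of two related billing codes at one hospital in
# two consecutive years; labels and numbers are illustrative only.
year1 = {"lower-severity": 800, "higher-severity": 200}
year2 = {"lower-severity": 550, "higher-severity": 450}

def share(counts, code):
    """Fraction of all claims billed under the given code."""
    return counts[code] / sum(counts.values())

# Change in the higher-severity share from year 1 to year 2.
shift = share(year2, "higher-severity") - share(year1, "higher-severity")
print(round(shift, 2))  # 0.25
```

A 25-point jump in the higher-severity share does not prove up-coding; it might reflect a sicker patient mix or a documented coding-practice change, but it is exactly the kind of systematic shift that would prompt an insurer to investigate.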

The success of data mining relies heavily on the quality of data within the database itself. Quality data only comes from a quality patient record, and the HIM department is the steward of the patient record. HIM oversees patient information. HIM ensures the patient record in its totality, everything from registration, through an inpatient stay, on to discharge, outpatient care, emergency care, and follow-up care, is complete, thorough, and accurate. Therein lies the promise of quality information in an organization’s database. This allows an organization to depend on its data for direction in decisions, such as marketing efforts or building expansions, and for successful audits, federal regulation reviews, research, and much, much more. The functions and responsibility of an HIM department and HIM professionals are well demonstrated in the quality of an organization’s data and databases. Quality data input results in quality database outputs and quality data-driven analysis and decisions.