
A Step-by-Step Guide to the Data Analysis Process


Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it’s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.

In this post, we’ll explore the main steps in the data analysis process. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we’ll also use examples and highlight a few tools to make the journey easier. When you’re done, you’ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.

Here are the steps we’ll take you through:

  • Defining the question
  • Collecting the data
  • Cleaning the data
  • Analyzing the data
  • Sharing your results
  • Embracing failure

By popular request, we’ve also developed a video based on this article. Scroll further down to watch it.

The five steps in the data analysis process: Define the question, gather your data, clean the data, analyze it, visualize and share your findings

Ready? Let’s get started with step one.

1. Step one: Defining the question

The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization’s senior management might pose an issue, such as: “Why are we losing customers?” It’s possible, though, that this doesn’t get to the core of the problem. A data analyst’s job is to understand the business and its goals in enough depth that they can frame the problem the right way.

Let’s say you work for a fictional company called TopNotch Learning. TopNotch creates custom training software for its clients. While it is excellent at securing new clients, it has much lower repeat business. As such, your question might not be, “Why are we losing customers?” but, “Which factors are negatively impacting the customer experience?” or better yet: “How can we boost customer retention while minimizing costs?”

Now you’ve defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you’ve noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don’t come back? Which sources of data will help you answer this question?

Tools to help define your objective

Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you’ll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports can allow you to track problem points in the business. Some KPI dashboards come with a fee, like Databox and DashThis. However, you’ll also find open-source software like Grafana, Freeboard, and Dashbuilder. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.

2. Step two: Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. All data fit into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.

What is first-party data?

First-party data are data that you, or your company, have directly collected from customers. It might come in the form of transactional tracking data or information from your company’s customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that they are usually structured, and although they will be less relevant than first-party data, they also tend to be quite reliable. Examples of second-party data include website, app or social media activity, like online purchase histories, or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.

Tools to help you collect data

Once you’ve devised a data strategy (i.e. you’ve identified which data you need, and how best to go about collecting them) there are many tools you can use to help you. One thing you’ll need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. Some well-known enterprise DMPs include Salesforce DMP, SAS, and the data integration platform Xplenty. If you want to play around, you can also try some open-source platforms like Pimcore or D:Swarm.

Want to learn more about what data analytics is and the process a data analyst follows? We cover this topic (and more) in our free introductory short course for beginners. Check out tutorial one: An introduction to data analytics.

3. Step three: Cleaning the data

Once you’ve collected your data, the next step is to get it ready for analysis. This means cleaning, or ‘scrubbing’, it, which is crucial in making sure that you’re working with high-quality data. Key data cleaning tasks include the following (a short pandas sketch follows the list):

  • Removing major errors, duplicates, and outliers —all of which are inevitable problems when aggregating data from numerous sources.
  • Removing unwanted data points —excluding irrelevant observations that have no bearing on your intended analysis.
  • Bringing structure to your data —general ‘housekeeping’, i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
  • Filling in major gaps —as you’re tidying up, you might notice that important data are missing. Once you’ve identified gaps, you can go about filling them.
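
To make these tasks concrete, here is a minimal pandas sketch of a typical cleaning pass. The column names, values, and thresholds are hypothetical and purely illustrative; your own checks will depend on the dataset at hand.

```python
import pandas as pd
import numpy as np

# Hypothetical raw export from several sources; values are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4, 5],
    "order_value": [120.0, 120.0, np.nan, 95.0, 10_000.0, 87.5],
    "signup_date": ["2023-01-04", "2023-01-04", "2023-02-11",
                    "2023-02-30", "2023-03-02", "2023-03-15"],
})

# 1. Remove exact duplicates created when aggregating multiple sources.
df = df.drop_duplicates()

# 2. Remove implausible outliers (hypothetical business rule: orders above 5,000).
df = df[df["order_value"].isna() | df["order_value"].between(0, 5_000)]

# 3. Bring structure: parse dates, coercing impossible ones (e.g. Feb 30) to NaT.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# 4. Fill major gaps, e.g. impute missing order values with the median.
df["order_value"] = df["order_value"].fillna(df["order_value"].median())

print(df)
```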

A good data analyst will spend around 70-90% of their time cleaning their data. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. It might even send you back to square one…so don’t rush it! You’ll find a step-by-step guide to data cleaning here. You may be interested in this introductory tutorial to data cleaning, hosted by Dr. Humera Noor Minhas.

Carrying out an exploratory analysis

Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let’s use our fictional learning company as an example again. Carrying out an exploratory analysis, perhaps you notice a correlation between how much TopNotch Learning’s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, take this into account.
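
If you want to see what such an exploratory check might look like in code, here is a tiny sketch using made-up TopNotch figures (the column names and numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical TopNotch client records.
clients = pd.DataFrame({
    "annual_fee":      [12000, 18000, 25000, 31000, 45000, 52000],
    "months_retained": [26, 22, 18, 14, 9, 7],
})

# Quick exploratory check: how strongly is price related to retention?
print(clients.corr(numeric_only=True))
# A strong negative correlation would suggest cost, not just customer
# experience, deserves a closer look when refining the hypothesis.
```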

Tools to help you clean your data

Cleaning datasets manually—especially large ones—can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as OpenRefine, are excellent for basic data cleaning, as well as high-level exploration. However, free tools offer limited functionality for very large datasets. Python libraries (e.g. Pandas) and some R packages are better suited for heavy data scrubbing. You will, of course, need to be familiar with the languages. Alternatively, enterprise tools are also available. For example, Data Ladder is one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?

4. Step four: Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.

Descriptive analysis

Descriptive analysis identifies what has already happened . It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let’s refer back to our fictional learning provider once more. TopNotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they’ll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.

Learn more: What is descriptive analytics?

Diagnostic analysis

Diagnostic analytics focuses on understanding why something has happened . It is literally the diagnosis of a problem, just as a doctor uses a patient’s symptoms to diagnose a disease. Remember TopNotch Learning’s business problem? ‘Which factors are negatively impacting the customer experience?’ A diagnostic analysis would help answer this. For instance, it could help the company draw correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, customer sector, etc.) Let’s imagine that, using diagnostic analytics, TopNotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that they’re losing customers because they lack expertise in this sector. And that’s a useful insight!
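
A diagnostic slice like the one described above often starts as a simple group-by. The sketch below uses invented client records to show the idea; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical client list; 'churned' marks clients who did not renew.
clients = pd.DataFrame({
    "sector":  ["retail", "retail", "finance", "health", "retail", "finance"],
    "churned": [1, 1, 0, 0, 1, 0],
})

# Churn rate per customer sector: a much higher rate in one sector
# points to where the diagnosis should dig deeper.
print(clients.groupby("sector")["churned"].mean().sort_values(ascending=False))
```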

Predictive analysis

Predictive analysis allows you to identify future trends based on historical data . In business, predictive analysis is commonly used to forecast future growth, for example. But it doesn’t stop there. Predictive analysis has grown increasingly sophisticated in recent years. The speedy evolution of machine learning allows organizations to make surprisingly accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents. As a result, they’ll hike up customer insurance premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform their strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.
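
As a rough illustration of the idea (not a production model), here is a scikit-learn sketch that fits a simple classifier on hypothetical historical data and estimates the churn probability of a new client. The features, values, and the choice of logistic regression are all assumptions for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical features per client: [project_cost_k, delivery_weeks]
X = np.array([[10, 4], [12, 5], [30, 9], [28, 10], [15, 6], [35, 12]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = client churned, 0 = client stayed

model = LogisticRegression().fit(X, y)

# Predicted churn probability for a new, unseen client.
new_client = np.array([[22, 8]])
print(model.predict_proba(new_client)[0, 1])
```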

Prescriptive analysis

Prescriptive analysis allows you to make recommendations for the future. This is the final step in the analytics part of the process. It’s also the most complex. This is because it incorporates aspects of all the other analyses we’ve described. A great example of prescriptive analytics is the algorithms that guide Google’s self-driving cars. Every second, these algorithms make countless decisions based on past and present data, ensuring a smooth, safe ride. Prescriptive analytics also helps companies decide on new products or areas of business to invest in.

Learn more:  What are the different types of data analysis?

5. Step five: Sharing your results

You’ve finished carrying out your analyses. You have your insights. The final step of the data analytics process is to share these insights with the wider world (or at least with your organization’s stakeholders!) This is more complex than simply sharing the raw results of your work—it involves interpreting the outcomes, and presenting them in a manner that’s digestible for all types of audiences. Since you’ll often present information to decision-makers, it’s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.

How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That’s why it’s very important to provide all the evidence that you’ve gathered, and not to cherry-pick data. Ensuring that you cover everything in a clear, concise way will prove that your conclusions are scientifically sound and based on the facts. On the flip side, it’s important to highlight any gaps in the data or to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!

Tools for interpreting and sharing your findings

There are tons of data visualization tools available, suited to different experience levels. Popular tools requiring little or no coding skills include Google Charts, Tableau, Datawrapper, and Infogram. If you’re familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!
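
To give a flavor of the code-based route, here is a small matplotlib sketch that turns hypothetical retention figures into a stakeholder-friendly bar chart (the sectors and rates are invented):

```python
import matplotlib.pyplot as plt

# Hypothetical retention figures to present to stakeholders.
sectors = ["Retail", "Finance", "Health", "Education"]
retention_rate = [0.45, 0.78, 0.81, 0.69]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(sectors, retention_rate)
ax.set_ylabel("12-month retention rate")
ax.set_title("Client retention by sector")
ax.set_ylim(0, 1)
for i, rate in enumerate(retention_rate):
    ax.text(i, rate + 0.02, f"{rate:.0%}", ha="center")  # label each bar
plt.tight_layout()
plt.show()
```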

You can learn more about storytelling with data in this free, hands-on tutorial .  We show you how to craft a compelling narrative for a real dataset, resulting in a presentation to share with key stakeholders. This is an excellent insight into what it’s really like to work as a data analyst!

6. Step six: Embrace your failures

The last ‘step’ in the data analytics process is to embrace your failures. The path we’ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you’d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. This might be caused by mistakes in the data, or human error earlier in the process.

While these pitfalls can feel like failures, don’t be disheartened if they happen. Data analysis is inherently chaotic, and mistakes occur. What’s important is to hone your ability to spot and rectify errors. If data analytics was straightforward, it might be easier, but it certainly wouldn’t be as interesting. Use the steps we’ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.

In this post, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question —What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data —Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data —Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don’t rush…take your time!
  • Analyze the data —Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Share your results —How best can you share your insights and recommendations? A combination of visualization tools and communication is key.
  • Embrace your mistakes —Mistakes happen. Learn from them. This is what transforms a good data analyst into a great one.

What next? From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the data analysis process, and see what tools you can find. As long as you stick to the core principles we’ve described, you can create a tailored technique that works for you.

To learn more, check out our free, 5-day data analytics short course . You might also be interested in the following:

  • These are the top 9 data analytics tools
  • 10 great places to find free datasets for your next project
  • How to build a data analytics portfolio

Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process: the first is data organization. The second is summarization and categorization, which together reduce the data and help find patterns and themes for easy identification and linking. The third and last is the analysis itself, which researchers carry out in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Why analyze data in research?

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when initiating the analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes something once a specific value has been assigned to it. For analysis, you need to organize these values, and process and present them in a given context, to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, then we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion counts as qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and so on all come under this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; an item included in categorical data cannot belong to more than one group. Example: a person responding to a survey by indicating their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data; a minimal sketch follows this list.
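
Here is a minimal sketch of that chi-square test using SciPy. The survey responses are fabricated for illustration; in practice you would build the contingency table from your own categorical columns:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical survey responses: marital status vs. smoking habit.
responses = pd.DataFrame({
    "marital_status": ["single", "married", "single", "married", "single", "married"] * 20,
    "smoker":         ["yes",    "no",      "no",     "no",      "yes",    "yes"]    * 20,
})

table = pd.crosstab(responses["marital_status"], responses["smoker"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")  # a small p-value suggests the categories are related
```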


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insights from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is manual. Here the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
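
Although this kind of reading is often done by hand, a simple word count can speed it up. The sketch below uses a handful of invented open-ended answers and a hypothetical stop-word list:

```python
import re
from collections import Counter

# Hypothetical open-ended survey answers.
answers = [
    "There is not enough food and hunger is getting worse",
    "Food prices keep rising",
    "Hunger and access to clean water are the biggest problems",
]

stopwords = {"there", "is", "not", "and", "are", "the", "to", "keep"}
words = re.findall(r"[a-z]+", " ".join(answers).lower())
counts = Counter(w for w in words if w not in stopwords)

print(counts.most_common(5))  # frequent words like 'food' and 'hunger' flag themes to explore
```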


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it examines how specific pieces of text are similar to or different from each other.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis:  It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, using grounded theory to analyze qualitative data is the best resort. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in quantitative research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey, or that the interviewer asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They conduct necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher will create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
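
A common way to code such a variable is to bin it. Here is a short pandas sketch using invented ages and hypothetical bracket boundaries:

```python
import pandas as pd

# Hypothetical ages from a survey (truncated sample).
ages = pd.Series([19, 23, 31, 37, 42, 55, 61, 68])

# Code raw ages into brackets so responses can be analyzed in buckets.
age_bracket = pd.cut(
    ages,
    bins=[18, 25, 35, 50, 65, 100],
    labels=["18-25", "26-35", "36-50", "51-65", "65+"],
)
print(age_bracket.value_counts().sort_index())
```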


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis plans are the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods are classified into two groups: ‘descriptive statistics’, used to describe the data, and ‘inferential statistics’, which help compare the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond describing the data at hand; any conclusions drawn are still based on the hypothesis researchers have formulated so far. Here are a few major types of descriptive analysis methods (a short sketch computing each of them follows the lists below).

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to describe a distribution through its central values.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • Variance and standard deviation express how far observed scores deviate from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to showcase how spread out the data is, since the spread directly affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
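
Here is the promised sketch, computing one example of each group of measures with pandas on a small set of invented test scores:

```python
import pandas as pd

scores = pd.Series([56, 61, 61, 68, 72, 75, 79, 83, 90, 95])  # hypothetical test scores

# Measures of frequency
print("count:", scores.count())
# Measures of central tendency
print("mean:", scores.mean(), "median:", scores.median(), "mode:", scores.mode().tolist())
# Measures of dispersion or variation
print("range:", scores.max() - scores.min(),
      "variance:", scores.var(), "std dev:", scores.std())
# Measures of position
print("quartiles:", scores.quantile([0.25, 0.5, 0.75]).tolist())
```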

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think of the best method for research and data analysis suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether the new shade of lipstick recently launched is good or not, or whether multivitamin capsules help children perform better at games. (A small sketch of estimating a parameter with a confidence interval follows this list.)
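
As promised, here is a minimal sketch of estimating a population parameter from the movie-theater sample, using a normal-approximation confidence interval. The sample counts are hypothetical:

```python
import math
from scipy.stats import norm

# Hypothetical sample: 85 of 100 moviegoers surveyed say they like the film.
n, liked = 100, 85
p_hat = liked / n

# 95% confidence interval for the population proportion (normal approximation).
z = norm.ppf(0.975)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"Estimated share who like the movie: {p_hat:.0%} "
      f"(95% CI: {p_hat - margin:.0%} to {p_hat + margin:.0%})")
```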

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables,  cross-tabulation  is used to analyze the relationship between multiple variables.  Suppose provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strong relationship between two variables, researchers do not look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to have been ascertained in an error-free random manner.
  • Frequency tables: This statistical procedure is used to summarize how often each value or response occurs, making it easy to spot the most and least common answers at a glance.
  • Analysis of variance: The statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means research findings were significant. In many contexts, ANOVA testing and variance analysis are similar. (A short sketch of cross-tabulation and ANOVA follows this list.)
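
Here is the sketch referenced above: a cross-tabulation with pandas followed by a one-way ANOVA with SciPy. All column names and values are invented for illustration:

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical survey data.
df = pd.DataFrame({
    "gender":    ["F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "age_group": ["18-25", "18-25", "26-35", "26-35", "36-50",
                  "36-50", "18-25", "26-35", "36-50"],
    "score":     [7, 6, 8, 5, 9, 6, 8, 5, 7],
})

# Cross-tabulation: respondents per age group and gender.
print(pd.crosstab(df["age_group"], df["gender"]))

# One-way ANOVA: do mean scores differ across age groups?
groups = [g["score"].values for _, g in df.groupby("age_group")]
f_stat, p_value = f_oneway(*groups)
print(f"F={f_stat:.2f}, p={p_value:.3f}")
```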
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps design the survey questionnaire, select data collection methods, and choose samples.


  • The primary aim of research data analysis is to derive insights that are unbiased. Any mistake, or a biased mindset, in collecting data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No degree of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data altering, data mining, or developing graphical representations.

The sheer amount of data generated daily is frightening, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that the enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to the new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.


Your Modern Business Guide To Data Analysis Methods And Techniques


Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery , improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions, explore analytical methods and techniques, while demonstrating how to perform analysis in the real world with a 17-step blueprint for success.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.

To explain the key differences between qualitative and quantitative research, here’s a video for your viewing pleasure:

Gaining a better understanding of different techniques and methods in quantitative research as well as qualitative insights will give your analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in. Additionally, you will be able to create a comprehensive analytical report that will skyrocket your analysis.

Apart from qualitative and quantitative categories, there are also other types of data that you should be aware of before dividing into complex data analysis processes. These categories include: 

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real time data: As its name suggests, real time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by a machine such as phones, computers, or even websites and embedded systems, without previous human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making : From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization, and with the help of dashboard software , present the data in a professional and interactive way to different stakeholders.
  • Reduce costs : Another great benefit is to reduce costs. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources on implementing the wrong strategies. And not just that, by predicting different scenarios such as sales and demand you can also anticipate production and supply. 
  • Target customers better : Customers are arguably the most crucial element in any business. By using analytics to get a 360° vision of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, it will drive success to your marketing strategies, allow you to identify new potential customers, and avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your client’s reviews or your customer service department’s performance.

What Is The Data Analysis Process?


When we talk about analyzing data there is an order to follow in order to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them more in detail later in the post, but to start providing the needed context to understand what is coming next, here is a rundown of the 5 essential steps of data analysis. 

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data, it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful; when collecting big amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted data. To avoid this, before you start working with your data you need to make sure to erase any white spaces, duplicate records, or formatting errors. This way you avoid hurting your analysis with bad-quality data.
  • Analyze : With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the identify stage. Various technologies in the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others. 
  • Interpret: Last but not least you have one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some limitations and work on them. 

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important that we quickly go over the main analysis categories. Moving from descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. Although it is relevant to mention that this analysis on its own will not allow you to predict future outcomes or tell you the answer to questions like why something happened, it will leave your data organized and ready to conduct further investigations.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of the exploratory analysis is to explore. Prior to it, there is still no notion of the relationship between the data and the variables. Once the data is investigated, exploratory analysis helps you to find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the world’s most important methods in research, among its other key organizational functions such as retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analysis, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - How will it happen.

This is another of the most effective types of analysis methods in research. Prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using it as a powerful fix to emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics , and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data or data that can be turned into numbers (e.g. category variables like gender, age, etc.) to extract valuable insights. It is used to extract valuable conclusions about relationships, differences, and test hypotheses. Below we discuss some of the key quantitative methods. 

1. Cluster analysis

The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best personalized service, but let's face it, with a large customer base, it is practically impossible to do that. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
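
To make the idea tangible, here is a small scikit-learn sketch that segments invented customers with k-means. The features, the number of clusters, and the figures are all assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [annual_spend, orders_per_year]
customers = np.array([
    [200, 2], [250, 3], [230, 2],         # occasional buyers
    [1200, 15], [1100, 14], [1300, 18],   # frequent mid-spenders
    [5000, 40], [5200, 38],               # high-value regulars
])

X = StandardScaler().fit_transform(customers)           # put features on the same scale
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(labels)  # each customer is assigned to one of three segments
```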

2. Cohort analysis

This type of data analysis approach uses historical data to examine and compare a determined segment of users' behavior, which can then be grouped with others with similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  

A useful tool for getting started with cohort analysis is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide. In the image below, you see an example of how you can visualize a cohort in this tool. The segments (devices traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

Cohort analysis chart example from google analytics
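
If you prefer to build a cohort table yourself rather than rely on a dedicated tool, a rough pandas sketch looks like this (the activity log below is invented):

```python
import pandas as pd

# Hypothetical sign-up and activity log.
events = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "signup_month": ["2024-01", "2024-01", "2024-01", "2024-01", "2024-01",
                     "2024-02", "2024-02", "2024-02", "2024-02"],
    "active_month": ["2024-01", "2024-02", "2024-03", "2024-01", "2024-02",
                     "2024-02", "2024-03", "2024-04", "2024-02"],
})

# Months elapsed between sign-up and each activity record.
signup = pd.to_datetime(events["signup_month"])
active = pd.to_datetime(events["active_month"])
events["period"] = (active.dt.year - signup.dt.year) * 12 + (active.dt.month - signup.dt.month)

# Cohort table: unique users from each sign-up month still active N months later.
cohort = events.pivot_table(index="signup_month", columns="period",
                            values="user_id", aggfunc="nunique")
print(cohort)
```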

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let's break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or if any new ones appeared during 2020. For example, you couldn’t sell as much in your physical store due to COVID lockdowns. Therefore, your sales could’ve either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
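
As a rough sketch of the sales example above, here is a multiple linear regression fitted with scikit-learn. The features and monthly figures are invented purely to show the mechanics:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly data: [marketing_spend_k, store_visits_k, online_sessions_k]
X = np.array([
    [10, 50, 20], [12, 55, 22], [9, 40, 25], [15, 60, 30],
    [11, 20, 45], [14, 15, 60], [13, 10, 70], [16, 12, 80],
])
y = np.array([200, 220, 190, 260, 210, 240, 245, 280])  # monthly sales (thousands)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)   # estimated impact of each independent variable
print("intercept:", model.intercept_)
print("R^2:", model.score(X, y))      # share of the variation the model explains
```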

4. Neural networks

The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 

Here is an example of how you can use the predictive analysis tool from datapine:

Example on how to use predictive analytics tool from datapine

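
Setting vendor tools aside, here is a bare-bones neural network sketch using scikit-learn's MLPRegressor. It is not datapine's implementation, just a generic illustration on invented figures:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical history: [ad_spend_k, avg_discount_pct] -> weekly revenue (k)
X = np.array([[5, 0], [6, 5], [8, 5], [9, 10], [11, 10], [12, 15], [14, 15], [15, 20]])
y = np.array([50, 58, 66, 74, 81, 90, 97, 104])

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, y)

print(net.predict([[13, 12]]))  # forecast revenue for a planned campaign
```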

5. Factor analysis

Factor analysis, also called “dimension reduction”, is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogenous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.

If you want to start analyzing data using factor analysis we recommend you take a look at this practical guide from UCLA.
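
The UCLA guide covers the statistics in depth; as a quick code-level illustration, the sketch below simulates ratings driven by two hidden factors and recovers them with scikit-learn's FactorAnalysis. Everything here (variables, factor names, sample size) is invented:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical product ratings on six observed variables, secretly driven
# by two latent factors ("design" and "usability").
design = rng.normal(size=200)
usability = rng.normal(size=200)
ratings = np.column_stack([
    design + rng.normal(scale=0.3, size=200),      # color
    design + rng.normal(scale=0.3, size=200),      # materials
    design + rng.normal(scale=0.3, size=200),      # trends
    usability + rng.normal(scale=0.3, size=200),   # comfort
    usability + rng.normal(scale=0.3, size=200),   # wearability
    usability + rng.normal(scale=0.3, size=200),   # frequency of use
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
print(fa.components_.round(2))  # loadings show which variables group into which factor
```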

6. Data mining

Data mining is an umbrella term for the process of engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success, and as such it’s an area worth exploring in greater detail.

An excellent use case of data mining is datapine’s intelligent data alerts. With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you’re monitoring supply chain KPIs, you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down into the issue and fix it swiftly and effectively.

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

Example on how to use intelligent alerts from datapine
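As a rough, tool-agnostic illustration of the alerting idea (not how any specific product implements it), the sketch below flags daily KPI values that fall outside expected ranges. The metrics, numbers, and thresholds are hypothetical.

```python
# Minimal "data alert" sketch: flag daily metrics outside expected ranges.
import pandas as pd

# Hypothetical daily KPI feed.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-03"]),
    "orders": [120, 4, 135],
    "sessions": [3000, 2900, 90000],
    "revenue": [5400.0, 180.0, 6100.0],
})

# Expected (min, max) ranges per metric -- illustrative assumptions.
expected = {"orders": (50, 500), "sessions": (1000, 10000), "revenue": (1000, 20000)}

for metric, (low, high) in expected.items():
    out_of_range = daily[(daily[metric] < low) | (daily[metric] > high)]
    for _, row in out_of_range.iterrows():
        print(f"ALERT: {metric}={row[metric]} on {row['date'].date()} "
              f"is outside the expected range [{low}, {high}]")
```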

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Although analysts use this method to monitor data points at regular intervals rather than intermittently, time series analysis is not used solely for collecting data over time. Instead, it allows researchers to understand whether variables changed over the course of the study, how the different variables depend on one another, and how the data arrived at its final values.

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
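To see the seasonality idea in code, here is a minimal sketch that decomposes a synthetic monthly sales series into trend and seasonal components with statsmodels. The "swimwear" numbers are generated for illustration, not real data.

```python
# Minimal time-series sketch: decompose monthly sales into trend + seasonality.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic 4 years of monthly swimwear sales with a summer peak.
months = pd.date_range("2020-01-01", periods=48, freq="MS")
trend = np.linspace(100, 160, 48)
seasonal = 30 * np.sin(2 * np.pi * (months.month.to_numpy() - 1) / 12)
noise = np.random.default_rng(0).normal(0, 5, 48)
sales = pd.Series(trend + seasonal + noise, index=months)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12).round(1))  # the recurring within-year pattern
```

The extracted seasonal component is what you would use to anticipate demand peaks and plan production accordingly.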

8. Decision Trees 

Decision tree analysis aims to act as a support tool for making smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful for analyzing quantitative data, and they improve decision-making by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision you need to make and branches out based on the different outcomes and consequences of each choice. Each outcome outlines its own consequences, costs, and gains, and at the end of the analysis you can compare them and make the smartest decision.

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide whether to update your software app or build a new app entirely. Here you would compare the total costs, the time that needs to be invested, potential revenue, and any other factor that might affect your decision. In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
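A closely related, data-driven use of trees is letting an algorithm learn the branching rules from historical data. Below is a minimal, illustrative sketch with scikit-learn; the "churn" features and labels are invented to keep the example self-contained.

```python
# Minimal decision-tree sketch with scikit-learn (illustrative churn data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [support tickets, months as customer]; label: churned (1) or not (0).
X = [[5, 3], [1, 24], [4, 6], [0, 36], [6, 2], [2, 18], [7, 4], [1, 30]]
y = [1, 0, 1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the branching rules so each outcome path can be inspected.
print(export_text(tree, feature_names=["support_tickets", "tenure_months"]))
print(tree.predict([[3, 10]]))  # prediction for a new customer
```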

9. Conjoint analysis 

Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more feature-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more.

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 
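One common way to approximate conjoint "part-worth" utilities is a dummy-coded regression of profile ratings on attribute levels. The sketch below shows that simplified approach with entirely hypothetical cupcake profiles; a full conjoint study would use a proper experimental design and many more respondents.

```python
# Simplified conjoint-style sketch: estimate part-worth utilities from
# ratings of product profiles via dummy-coded regression (illustrative data).
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical cupcake profiles rated 1-10 by respondents.
profiles = pd.DataFrame({
    "topping": ["sugary", "healthy", "healthy", "sugary", "healthy", "sugary"],
    "gluten":  ["regular", "free", "regular", "free", "free", "regular"],
    "price":   ["low", "low", "high", "high", "low", "high"],
    "rating":  [6, 9, 7, 5, 10, 4],
})

X = pd.get_dummies(profiles[["topping", "gluten", "price"]], drop_first=True)
y = profiles["rating"]

model = LinearRegression().fit(X, y)
partworths = pd.Series(model.coef_, index=X.columns)
print(partworths.round(2))  # positive values = preferred levels vs. the dropped baseline
```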

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value” for each cell, obtained by multiplying the cell’s row total by its column total and dividing by the table’s grand total. The expected value is then subtracted from the observed value, leaving a “residual” that allows you to draw conclusions about relationships and distribution. The results of this analysis are later displayed using a map that represents the relationships between the different values: the closer two values are on the map, the stronger the relationship. Let’s put it into perspective with an example.

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
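Here is a minimal sketch of the expected-value and residual step described above, computed on a hypothetical brand-versus-attribute contingency table. The counts are made up for illustration.

```python
# Minimal sketch of expected values and residuals from a contingency table.
import numpy as np
import pandas as pd

observed = pd.DataFrame(
    [[40, 10, 25],    # Brand A
     [15, 35, 20],    # Brand B
     [20, 25, 30]],   # Brand C
    index=["Brand A", "Brand B", "Brand C"],
    columns=["innovation", "durability", "quality materials"],
)

grand_total = observed.values.sum()
row_totals = observed.sum(axis=1).values.reshape(-1, 1)
col_totals = observed.sum(axis=0).values.reshape(1, -1)

expected = row_totals @ col_totals / grand_total  # row total x column total / grand total
residuals = observed - expected
print(residuals.round(1))  # positive: stronger-than-expected association
```

In this toy table, a positive residual for Brand A on "innovation" and a negative one on "durability" would mirror the interpretation in the example above.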

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects, which can be colors, brands, people, geographical coordinates, and more. The objects are plotted on an “MDS map” that positions similar objects close together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed on a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for “don’t believe in the vaccine at all”, 10 for “firmly believe in the vaccine”, and the values from 2 to 9 for in-between responses. When analyzing an MDS map, the only thing that matters is the distance between objects; the orientation of the dimensions is arbitrary and has no meaning at all.

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how it is positioned compared to competitors, it can define two or three dimensions such as taste, ingredients, or shopping experience, and run a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading.

Another business example is in procurement when deciding on different suppliers. Decision makers can generate an MDS map to see how the different prices, delivery times, technical services, and more of the different suppliers differ and pick the one that suits their needs the best. 

A final example comes from a research paper, "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers picked a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They used 36 sentiment words and distributed them based on their emotional distance, as shown in the image below, where the words "outraged" and "sweet" sit on opposite sides of the map, marking the distance between the two emotions very clearly.

Example of multidimensional scaling analysis

Aside from being a valuable technique for analyzing dissimilarities, MDS also serves as a dimension-reduction technique for high-dimensional data.
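For a hands-on feel, here is a minimal sketch using scikit-learn's MDS on a hypothetical, precomputed dissimilarity matrix between four brands; the distances are invented.

```python
# Minimal multidimensional-scaling sketch with scikit-learn
# (illustrative pairwise dissimilarities between four brands).
import numpy as np
from sklearn.manifold import MDS

labels = ["Brand A", "Brand B", "Brand C", "Brand D"]
dissimilarities = np.array([
    [0.0, 2.0, 6.0, 5.0],
    [2.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 1.5],
    [5.0, 6.0, 1.5, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarities)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")  # nearby points = similar brands
```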

B. Qualitative Methods

Qualitative data analysis methods work with non-numerical data gathered through observational techniques such as interviews, focus groups, and questionnaires. Compared to quantitative methods, qualitative data is more subjective, and it is highly valuable for analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions behind a text, for example, whether it's positive, negative, or neutral, and then assign it a score based on the factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic, check out this insightful article.

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 
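As a toy illustration of the scoring idea only, here is a tiny lexicon-based sentiment scorer. Real projects would rely on a trained model or an established sentiment library rather than a hand-written word list.

```python
# Toy lexicon-based sentiment scorer (illustration only).
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "hate"}

def sentiment_score(text: str) -> int:
    """Return a simple score: each positive word +1, each negative word -1."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

reviews = [
    "Great product, fast delivery and helpful support.",
    "Terrible experience, the app is slow and broken.",
]
for review in reviews:
    score = sentiment_score(review)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{label:>8} ({score:+d}): {review}")
```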

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first one is the conceptual analysis which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second one is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context. 

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, you need a clearly defined research question.
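Here is a minimal sketch of conceptual content analysis: counting how often predefined concepts appear across a handful of made-up customer reviews. The concept list and review texts are illustrative assumptions.

```python
# Minimal conceptual content-analysis sketch: frequency of predefined concepts.
from collections import Counter
import re

concepts = ["price", "quality", "delivery", "support"]
reviews = [
    "Good quality for the price, but delivery was slow.",
    "Support was helpful, delivery on time, quality is fine.",
    "Too expensive for the quality; price should be lower.",
]

counts = Counter()
for review in reviews:
    tokens = re.findall(r"[a-z]+", review.lower())
    for concept in concepts:
        counts[concept] += tokens.count(concept)

print(counts.most_common())  # frequency of each concept across the corpus
```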

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps in identifying and interpreting patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative data. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service.

Thematic analysis is a very subjective technique that relies on the researcher's judgment. Therefore, to avoid bias, it follows six steps: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways, and it can be hard to decide which data is most important to emphasize.

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.  

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on. 

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and then collect data to test that hypothesis. Grounded theory takes the opposite route: it doesn't require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it isn't necessary to finish collecting the data before starting the analysis; researchers usually begin finding valuable insights while they are still gathering it.

All of these elements make grounded theory a very valuable method, as theories are fully backed by data instead of initial assumptions. It is a great technique for analyzing poorly researched topics or finding the causes behind specific company outcomes. For example, product managers and marketers might use grounded theory to investigate high levels of customer churn, looking into customer surveys and reviews to develop new theories about the causes.

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

17 top data analysis techniques by datapine

Now that we’ve answered the question “what is data analysis?”, explained why it is important, and covered the different data analysis types, it’s time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate your needs

Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down with all key stakeholders in your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best support your progress or give you the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To ensure your data works for you, you have to ask the right data analysis questions.

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights that can be shared with the rest of the company interactively.

Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

data connectors from datapine

4. Think of governance 

When collecting data in a business or research context, you always need to think about security and privacy. With data breaches becoming a topic of concern for businesses, the need to protect your clients' or subjects' sensitive information becomes critical.

To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner, this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for more efficient analysis as a whole.

5. Clean your data

After harvesting data from so many sources, you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you may be faced with incorrect data that can mislead your analysis. The smartest thing you can do to avoid dealing with this later is to clean the data. This is fundamental before visualizing it, as it ensures that the insights you extract are correct.

There are many things to look for in the cleaning process. The most important one is to eliminate duplicate observations, which usually appear when using multiple internal and external sources of information. You can also add missing codes, fix empty fields, and eliminate incorrectly formatted data.

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.
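To ground these steps, here is a minimal pandas sketch that removes duplicates, normalizes formats, and fills missing values on a small, made-up customer table; the column names and fill strategies are illustrative choices, not rules.

```python
# Minimal pandas cleaning sketch: duplicates, missing values, bad formats.
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 104],
    "country": ["DE", "DE", "us", None, "FR"],
    "revenue": ["1200", "1200", "950", "n/a", "780"],
})

clean = (
    raw.drop_duplicates()                                    # remove duplicate observations
       .assign(
           country=lambda d: d["country"].str.upper().fillna("UNKNOWN"),
           revenue=lambda d: pd.to_numeric(d["revenue"], errors="coerce"),
       )
)
# Fill the unparseable revenue value with the median (one possible strategy).
clean["revenue"] = clean["revenue"].fillna(clean["revenue"].median())
print(clean)
```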

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI : transportation-related costs. If you want to see more go explore our collection of key performance indicator examples .

Transportation costs logistics KPIs

7. Omit useless data

Having bestowed your data analysis tools and techniques with true purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

8. Build a data management roadmap

While, at this point, this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data management roadmap will help your data analysis methods and techniques succeed on a more sustainable basis. These roadmaps, if developed properly, can also be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard. That's a data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

For a look at the power of software for the purpose of analysis and to enhance your methods of analyzing, glance over our selection of dashboard examples .

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard .

An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer.

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMO) an overview of relevant metrics to help them understand if they achieved their monthly goals.

In detail, this example generated with a modern dashboard creator displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports .

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to drive a concise conclusion from the analysis results. Since most of the time companies are dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations. 

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This tendency leads to one of the most common mistakes in interpretation: confusing correlation with causation. Although the two can exist simultaneously, it is not correct to assume that because two things happened together, one caused the other. A piece of advice to avoid this mistake: never trust intuition alone, trust the data. If there is no objective evidence of causation, then always stick to correlation.
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: In short, statistical significance helps analysts understand whether a result is actually meaningful or whether it happened because of a sampling error or pure chance. The level of statistical significance needed might depend on the sample size and the industry being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a huge mistake. A minimal example of a significance test is sketched right after this list.
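As flagged in the list above, here is a minimal significance-test sketch using SciPy's independent-samples t-test on invented figures for two campaign variants; the 5% threshold is a common convention, not a universal rule.

```python
# Minimal significance-test sketch with SciPy (illustrative campaign metrics).
from scipy import stats

variant_a = [12.1, 11.8, 12.6, 12.0, 11.9, 12.4, 12.2, 12.3]
variant_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0, 13.2, 12.6]

t_stat, p_value = stats.ttest_ind(variant_a, variant_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("Difference could plausibly be due to chance.")
```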

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most invaluable data using various BI dashboard tools , you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner predicts that by the end of this year, 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

In order to perform high-quality analysis of data, it is fundamental to use tools and software that will ensure the best results. Here we leave you a small summary of four fundamental categories of data analysis tools for your organization.

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. Through this, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and use them for your company's good. datapine is an amazing online BI software that is focused on delivering powerful online analysis features that are accessible to beginner and advanced users. In this way, it offers a full-service solution that includes cutting-edge analysis of data, KPI visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool to perform this type of analysis is R-Studio as it offers a powerful data modeling and hypothesis testing feature that can cover both academic and general data analysis. This tool is one of the favorite ones in the industry, due to its capability for data cleaning, data reduction, and performing advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach it can help your company find relevant insights to drive better decisions. SPSS also works as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language often used to handle structured data in relational databases. Tools like these are popular among data scientists as they are extremely effective in unlocking these databases' value. Undoubtedly, one of the most used SQL software in the market is MySQL Workbench . This tool offers several features such as a visual tool for database modeling and monitoring, complete SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs.
  • Data Visualization: These tools are used to represent your data through charts, graphs, and maps that allow you to find patterns and trends in the data. datapine's already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits. Some of them include: delivering compelling data-driven presentations to share with your entire company, the ability to see your data online with any device wherever you are, an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way, and to perform online self-service reports that can be used simultaneously with several other people to enhance team productivity.

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of some science quality criteria. Here we will go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these steps in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in. 

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words, internal validity measures the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are conducting an interview to ask people if they brush their teeth twice a day. While most of them will answer yes, you may notice that their answers correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can't be 100% sure whether respondents actually brush their teeth twice a day or just say that they do; therefore, the internal validity of this interview is very low.
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability : If your research is reliable, it means that it can be reproduced. If your measurement were repeated under the same conditions, it would produce similar results. This means that your measuring instrument consistently produces reliable results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. Then, various other doctors use this questionnaire but end up diagnosing the same patient with a different condition. This means the questionnaire is not reliable in detecting the initial disease. Another important note here is that in order for your research to be reliable, it also needs to be objective. If the results of a study are the same, independent of who assesses them or interprets them, the study can be considered reliable. Let’s see the objectivity criteria in more detail now. 
  • Objectivity: In data science, objectivity means that the researcher needs to stay fully objective during the analysis. The results of a study need to be based on objective criteria and not on the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that doesn't influence the results. Paired with this, objectivity also needs to be considered when interpreting the data. If different researchers reach the same conclusions, then the study is objective. For this last point, you can set predefined criteria for interpreting the results to ensure all researchers follow the same steps.

The discussed quality criteria cover mostly potential influences in a quantitative context. Analysis in qualitative research has by default additional subjective influences that must be controlled in a different way. Therefore, there are other quality criteria for this kind of research such as credibility, transferability, dependability, and confirmability. You can see each of them more in detail on this resource . 

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization it doesn't come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s see them more in detail. 

  • Lack of clear goals: No matter how good your data or analysis might be, if you don't have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don't require a predefined hypothesis, it is always better to enter the analytical process with some clear guidelines about what you expect to get out of it, especially in a business context in which data is used to support important strategic decisions.
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them work for all purposes. Choosing the wrong visual can not only damage your analysis but also mislead your audience, so it is important to understand when to use each type of chart depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them.
  • Flawed correlation : Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues previously in the post, but it is an important barrier that we can't avoid addressing here as well. Flawed correlations occur when two variables appear related to each other but they are not. Confusing correlations with causation can lead to a wrong interpretation of results which can lead to building wrong strategies and loss of resources, therefore, it is very important to identify the different interpretation mistakes and avoid them. 
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. In order for the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask the question “do you like working here?” to 40 employees, of which 38 say yes, which means 95%. Now, imagine you ask the same question to all 1,000 employees and 950 say yes, which also means 95%. Saying that 95% of employees like working in the company when the sample size was only 40 is not a representative or trustworthy conclusion. The significance of the results is far more reliable when surveying a bigger sample size.
  • Privacy concerns: In some cases, data collection can be subjected to privacy regulations. Businesses gather all kinds of information from their customers from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, you need to collect only the data that is needed for your research and, if you are using sensitive facts, make it anonymous so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy. 
  • Lack of communication between teams : When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way. 
  • Innumeracy : Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data. 

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools, the process is far more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data; we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think outside the box. That might sound like a strange statement considering that data is often tied to facts. However, a great deal of critical thinking is required to uncover connections, come up with a valuable hypothesis, and extract conclusions that go a step beyond the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers.
  • Data cleaning: Anyone who has ever worked with data will tell you that the cleaning and preparation process accounts for around 80% of a data analyst's work, which makes this skill fundamental. What's more, failing to clean the data adequately can significantly damage the analysis and lead to poor decision-making in a business scenario. While there are multiple tools that automate the cleaning process and reduce the possibility of human error, it is still a valuable skill to master.
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026 the industry of big data is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation. 
  • Companies that exploit the full potential of their data can increase their operating margins by 60% .
  • We have already covered the benefits of artificial intelligence throughout this article. The industry's financial impact is expected to grow to $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Consider autonomous technology
  • Build a narrative
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and make your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .


Data Analysis – Process, Methods and Types


Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The following is a step-by-step guide to the data analysis process:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.
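A minimal sketch of these descriptive statistics with pandas, computed on a small set of made-up order values:

```python
# Minimal descriptive-analysis sketch with pandas (illustrative order values).
import pandas as pd

order_values = pd.Series([23.5, 41.0, 18.2, 55.9, 33.3, 41.0, 27.8, 60.1])

print(order_values.describe())   # count, mean, std, min, quartiles, max
print("median:", order_values.median())
print("mode:  ", order_values.mode().tolist())
print("range: ", order_values.max() - order_values.min())
```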

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.

Diagnostic Analysis

This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL : A programming language used to manage and manipulate relational databases.
  • R : An open-source programming language and software environment for statistical computing and graphics.
  • Python : A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS : A statistical analysis software used for data management, analysis, and reporting.
  • SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab : A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
  • Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.


Introduction

Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the aim of discovering useful information, drawing conclusions and making better decisions. In today's world, data analysis has become an essential aspect of businesses, organizations, and industries. It provides insights that help businesses to address problems effectively, optimize their operations, and increase profits. This guide aims to explain the importance of data analysis and provide an overview of the process involved.

Importance of Data Analysis

  • Data analysis helps businesses to make informed decisions based on facts and insights rather than opinions and guesswork
  • It helps to identify patterns, trends, and relationships in data that can be used to predict the future
  • Data analysis provides insights that help businesses to optimize their operations, reduce costs, and increase efficiency
  • It helps organizations to evaluate the effectiveness of their strategies and initiatives and make necessary adjustments

Overview of the Data Analysis Process

The data analysis process is a systematic approach to analyzing data. The following are the key steps involved in the process:

  • Define the problem or research question
  • Collect and organize the data
  • Clean and preprocess the data
  • Analyze and model the data
  • Interpret and communicate the results

Each step of the process is equally important and requires careful consideration to ensure that the final results are reliable and accurate. In the next sections, we will delve deeper into each step of the process.

Step 1: Defining the Research Question

Defining the research question is the initial step of the data analysis process. In this step, researchers must identify the research question and hypotheses that can be tested with data analysis.

Identifying the Research Question

The research question should be clear, specific, and answerable with the available data. It should be relevant to the research objectives and provide insights that can contribute to decision-making.

Formulating Hypotheses

Hypotheses are educated guesses about the relationship between variables, which can be tested with data analysis. Researchers must formulate hypotheses that are aligned with the research question and can be supported or refuted by data.

Creating an Analytical Plan

After identifying the research question and hypotheses, researchers must create an analytical plan that outlines the data sources, variables, and methods for data collection and analysis. The analytical plan should be comprehensive, systematic, and transparent, and should include measures to ensure data quality and validity.

Defining the research question is a critical step in the data analysis process, as it provides the foundation for subsequent data collection and analysis. By formulating a clear research question and hypotheses, researchers can ensure that their analysis is relevant, rigorous, and contributes to the knowledge base in their field.

Step 2: Collecting Data

Collecting data is a crucial step in any data analysis process. It involves identifying the right data sources and gathering relevant information to support your analysis. Here are some tips on how to collect data for your analysis:

Choosing the Right Data Sources

Before you start collecting data, it's important to identify the right sources. This will ensure that you have access to accurate and reliable information that is relevant to your analysis. Some tips for choosing the right data sources include:

  • Identify the purpose and scope of your analysis
  • Determine what kind of data is required to support your analysis
  • Consider the reliability and validity of the data sources
  • Ensure that the data sources are up-to-date and relevant
  • Consider the cost and accessibility of the data sources

Collecting Relevant Data

After you have identified the right data sources, the next step is to collect relevant data. This involves selecting the specific information that you need to support your analysis. Some tips for collecting relevant data include:

  • Define the variables that you need to measure
  • Identify the data points that are relevant to your analysis
  • Ensure that the data is accurate, complete, and consistent
  • Consider the format in which the data is presented
  • Use data analysis tools to collect and organize the data

Organizing Data in a Suitable Format

Once you have collected the relevant data, the next step is to organize it in a suitable format. This involves structuring the data in a way that is easy to analyze and interpret. Some tips for organizing data in a suitable format include:

  • Choose a format that is appropriate for your analysis (e.g., tables, graphs, charts)
  • Ensure that the data is properly labeled and categorized
  • Consider the order in which the data is presented
  • Use data analysis tools to create visualizations and summaries of the data

By following these tips, you can ensure that you collect relevant and accurate data that will support your analysis.

Step 3: Cleaning and Preparing Data

Before delving into the actual data analysis, it is crucial to ensure that the data to be analyzed is complete, accurate, and free of errors. This step involves detecting any missing data, outliers, and errors, and subsequently fixing them to prepare the data for analysis.

Detecting Missing Data, Outliers, and Errors

The first part of the cleaning and preparation process is to identify any missing data, outliers, and errors that could skew the results of the analysis. This is typically done by using statistical methods and software tools that can detect patterns and anomalies in the data set.

  • Missing Data: Missing data can occur when there are blank fields in the data set, or if certain data points were not collected. To detect missing data, the data set can be analyzed using tools such as histograms, scatter plots, and correlation matrices. If missing data is detected, there are several methods to fill in the blanks, including mean imputation, regression imputation, and multiple imputation.
  • Outliers: Outliers are data points that deviate significantly from the rest of the data set. They can occur due to measurement errors, data entry errors, or extreme values in the data. To detect outliers, various statistical techniques can be used, such as box plots, scatter plots, and z-scores. Once detected, outliers can either be removed or corrected depending on the cause.
  • Errors: Errors can be introduced into the data set due to mistakes in data entry, measurement, or calculations. To detect errors, data can be cross-checked with source documents or verified with independent sources. Once identified, errors can be corrected by either removing the erroneous data points or making appropriate changes to the data values.

Fixing Missing Data, Outliers, and Errors

Once the missing data, outliers, and errors have been identified, the next step is to fix them to ensure that the data is accurate and complete. This involves selecting the appropriate method for each type of issue identified and applying it to the data set.

  • Missing Data: As mentioned earlier, there are several methods to fill in missing data, including mean imputation, regression imputation, and multiple imputation. The best method depends on the type and amount of missing data, as well as the nature of the data set.
  • Outliers: Outliers can either be removed or corrected depending on the cause. For example, if an outlier is due to a measurement error, it may be appropriate to remove it. However, if the outlier is a legitimate value, it may need to be corrected to reflect the true nature of the data.
  • Errors: Errors can be corrected by either removing the erroneous data points or making appropriate changes to the data values. The best approach depends on the nature and cause of the error, as well as the context of the data set.

Overall, the data cleaning and preparation process is critical for ensuring that the subsequent analysis is accurate, reliable, and meaningful. By taking the time to identify and fix any missing data, outliers, and errors, analysts can have confidence in their findings and ensure that their insights are based on a solid foundation of accurate data.
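
A brief pandas sketch of the checks described above might look like the following; the column names, values, and the z-score threshold are all illustrative choices rather than fixed rules.

```python
# Detecting and fixing missing values and outliers with pandas (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, 28, 47, 29, 33, 41, 38, 26, 30, 200],      # 200 is a suspicious entry
    "income": [40, 52, 38, 61, None, 45, 48, 55, 43, 47, 50],  # in $1,000s, one value missing
})

print(df.isna().sum())                                   # detect missing values per column
df["income"] = df["income"].fillna(df["income"].mean())  # mean imputation

# Flag outliers with a z-score rule (the 2.5 cut-off here is adjustable).
z = (df["age"] - df["age"].mean()) / df["age"].std()
print(df[z.abs() > 2.5])                                 # rows to review, correct, or remove
```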

Step 4: Analyzing Data

Once you have gathered all the necessary data, it is time to analyze it in order to draw meaningful insights and conclusions. The data analysis process involves a number of techniques that can be used to examine and interpret the data to identify patterns, trends, and relationships.

Techniques for Analyzing Data

There are several techniques for analyzing data, depending on the type and scope of the data. Some common techniques include:

  • Descriptive statistics: This involves using numerical measures such as mean, median, and standard deviation to summarize and describe the data.
  • Regression analysis: This technique is used to model the relationship between two or more variables and predict their future behavior.
  • Hypothesis testing: This involves using statistical tests to determine whether the results of a study are statistically significant or simply due to chance.

Depending on the nature of your data, you may need to use one or all of these techniques to fully analyze and interpret your findings. By carefully examining and understanding your data, you can gain valuable insights that can inform your decision-making and help you achieve your goals.
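
As a rough sketch of how these techniques look in practice, the example below uses NumPy and SciPy on invented measurements: descriptive summaries, a two-sample t-test, and a simple linear regression.

```python
# Descriptive statistics, hypothesis testing, and regression on illustrative data.
import numpy as np
from scipy import stats

group_a = np.array([12.1, 13.4, 11.8, 14.2, 12.9, 13.1])
group_b = np.array([14.8, 15.2, 13.9, 16.1, 15.4, 14.7])

# Descriptive statistics: centre and spread of each group
print(group_a.mean(), np.median(group_a), group_a.std(ddof=1))

# Hypothesis testing: is the difference between the group means statistically significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)

# Simple linear regression: model group_b as a function of an index variable
x = np.arange(1, 7)
fit = stats.linregress(x, group_b)
print(fit.slope, fit.intercept, fit.rvalue ** 2)
```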

Step 5: Visualizing Data

After completing the data analysis process, it's time to communicate your insights in a clear and concise way. This is where data visualizations come in handy. By presenting your data in an easily digestible format, you can effectively communicate your findings to your audience.

Demonstrating how to use visualizations

Visualizations are a powerful tool for telling a story with data. There are many different types of visualizations you can use, such as charts, graphs, and maps. In this step, we will demonstrate how to use various types of visualizations to effectively communicate your insights.

  • Choose the right type of visualization for your data
  • Create clear and easy-to-understand visualizations
  • Use color, labels, and annotations to enhance your visualizations
  • Practice effective data storytelling through your visualizations

By following these guidelines, you can effectively use data visualizations to communicate your findings and insights to your audience.

Step 6: Interpreting and Communicating Results

After performing data analysis, it is crucial to interpret the findings accurately and communicate them effectively to various stakeholders. In this step, we will provide tips for interpreting the results of your analysis and communicating them to different audiences.

Interpreting Results

  • Ensure that you understand the limitations of your analysis and report them accurately to stakeholders
  • Look for trends and patterns in the data to identify key insights
  • Use visual aids such as charts and graphs to help explain the data
  • Consider performing additional analysis to support or challenge your findings

Communicating Results

It is important to adjust your communication style depending on the audience you are presenting to.

  • Provide the key findings upfront and in a clear and concise manner
  • Use relevant examples and stories to help illustrate the data
  • Be mindful of technical jargon and explain any complex terms or concepts
  • Use visual aids to help convey the data in an engaging way
  • Allow time for questions and feedback from the audience

By following these tips, you will be able to effectively interpret the results of your data analysis and communicate them to different stakeholders, ensuring that your findings are understood and impactful.

Conclusion: Summary of Key Takeaways and Tips for Improving Your Data Analysis Skills

Throughout this article, we have discussed the data analysis process and the steps involved in it. We have also covered different techniques and tools that can help you improve your data analysis skills. To summarize, here are the key takeaways:

  • Start with defining your problem and identifying the data you need to solve it.
  • Clean and preprocess your data to make sure it’s accurate and relevant.
  • Explore your data using different methods such as visualization and descriptive statistics.
  • Formulate hypotheses and test them using statistical methods and machine learning algorithms.
  • Publish and communicate your findings to stakeholders using clear and concise reports and presentations.

If you want to improve your data analysis skills, here are some tips:

  • Enhance your programming skills by learning languages like Python or R.
  • Familiarize yourself with statistical concepts and methods.
  • Practice working with different types of data and datasets.
  • Stay up-to-date with new tools and techniques in the data analysis field.
  • Participate in online courses and training programs to improve your skills.

By following these tips and regularly engaging in data analysis projects, you can significantly improve your skills and become a proficient data analyst.

What Is Data Analysis? (With Examples)

Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions.


"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock Holme's proclaims in Sir Arthur Conan Doyle's A Scandal in Bohemia.

This idea lies at the root of data analysis. When we can extract meaning from data, it empowers us to make better decisions. And we’re living in a time when we have more data than ever at our fingertips.

Companies are wising up to the benefits of leveraging data. Data analysis can help a bank to personalize customer interactions, a health care system to predict future health needs, or an entertainment company to create the next big streaming hit.

The World Economic Forum Future of Jobs Report 2023 listed data analysts and scientists as one of the most in-demand jobs, alongside AI and machine learning specialists and big data specialists [ 1 ]. In this article, you'll learn more about the data analysis process, different types of data analysis, and recommended courses to help you get started in this exciting field.

Read more: How to Become a Data Analyst (with or Without a Degree)

Beginner-friendly data analysis courses

Interested in building your knowledge of data analysis today? Consider enrolling in one of these popular courses on Coursera:

In Google's Foundations: Data, Data, Everywhere course, you'll explore key data analysis concepts, tools, and jobs.

In Duke University's Data Analysis and Visualization course, you'll learn how to identify key components for data analytics projects, explore data visualization, and find out how to create a compelling data story.

Data analysis process

As the data available to companies continues to grow both in amount and complexity, so too does the need for an effective and efficient process by which to harness the value of that data. The data analysis process typically moves through several iterative phases. Let’s take a closer look at each.

  • Identify the business question you’d like to answer. What problem is the company trying to solve? What do you need to measure, and how will you measure it?
  • Collect the raw data sets you’ll need to help you answer the identified question. Data collection might come from internal sources, like a company’s client relationship management (CRM) software, or from secondary sources, like government records or social media application programming interfaces (APIs).
  • Clean the data to prepare it for analysis. This often involves purging duplicate and anomalous data, reconciling inconsistencies, standardizing data structure and format, and dealing with white spaces and other syntax errors.
  • Analyze the data. By manipulating the data using various data analysis techniques and tools, you can begin to find trends, correlations, outliers, and variations that tell a story. During this stage, you might use data mining to discover patterns within databases or data visualization software to help transform data into an easy-to-understand graphical format.
  • Interpret the results of your analysis to see how well the data answered your original question. What recommendations can you make based on the data? What are the limitations to your conclusions?

Learn more about data analysis in this lecture by Kevin, Director of Data Analytics at Google, from Google's Data Analytics Professional Certificate :

Read more: What Does a Data Analyst Do? A Career Guide

Types of data analysis (with examples)

Data can be used to answer questions and support decisions in many different ways. To identify the best way to analyze your data, it can help to familiarize yourself with the four types of data analysis commonly used in the field.

In this section, we’ll take a look at each of these data analysis methods, along with an example of how each might be applied in the real world.

Descriptive analysis

Descriptive analysis tells us what happened. This type of analysis helps describe or summarize quantitative data by presenting statistics. For example, descriptive statistical analysis could show the distribution of sales across a group of employees and the average sales figure per employee. 

Descriptive analysis answers the question, “what happened?”
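
A minimal pandas sketch of that sales example might look like this; the employee names and amounts are made up.

```python
# Descriptive analysis: distribution of sales and average sales per employee (toy data).
import pandas as pd

sales = pd.DataFrame({
    "employee": ["Ana", "Ana", "Ben", "Ben", "Cal", "Cal"],
    "amount": [1200, 950, 800, 1100, 1500, 1350],
})

print(sales["amount"].describe())                  # overall distribution of sales
print(sales.groupby("employee")["amount"].mean())  # average sales figure per employee
```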

Diagnostic analysis

If the descriptive analysis determines the “what,” diagnostic analysis determines the “why.” Let’s say a descriptive analysis shows an unusual influx of patients in a hospital. Drilling into the data further might reveal that many of these patients shared symptoms of a particular virus. This diagnostic analysis can help you determine that an infectious agent—the “why”—led to the influx of patients.

Diagnostic analysis answers the question, “why did it happen?”

Predictive analysis

So far, we’ve looked at types of analysis that examine and draw conclusions about the past. Predictive analytics uses data to form projections about the future. Using predictive analysis, you might notice that a given product has had its best sales during the months of September and October each year, leading you to predict a similar high point during the upcoming year.

Predictive analysis answers the question, “what might happen in the future?”

Prescriptive analysis

Prescriptive analysis takes all the insights gathered from the first three types of analysis and uses them to form recommendations for how a company should act. Using our previous example, this type of analysis might suggest a market plan to build on the success of the high sales months and harness new growth opportunities in the slower months. 

Prescriptive analysis answers the question, “what should we do about it?”

This last type is where the concept of data-driven decision-making comes into play.

Read more : Advanced Analytics: Definition, Benefits, and Use Cases

What is data-driven decision-making (DDDM)?

Data-driven decision-making (sometimes abbreviated to DDDM) can be defined as the process of making strategic business decisions based on facts, data, and metrics instead of intuition, emotion, or observation.

This might sound obvious, but in practice, not all organizations are as data-driven as they could be. According to global management consulting firm McKinsey Global Institute, data-driven companies are better at acquiring new customers, maintaining customer loyalty, and achieving above-average profitability [ 2 ].

Get started with Coursera

If you’re interested in a career in the high-growth field of data analytics, consider these top-rated courses on Coursera:

Begin building job-ready skills with the Google Data Analytics Professional Certificate . Prepare for an entry-level job as you learn from Google employees—no experience or degree required.

Practice working with data with Macquarie University's Excel Skills for Business Specialization . Learn how to use Microsoft Excel to analyze data and make data-informed business decisions.

Deepen your skill set with Google's Advanced Data Analytics Professional Certificate . In this advanced program, you'll continue exploring the concepts introduced in the beginner-level courses, plus learn Python, statistics, and Machine Learning concepts.

Frequently asked questions (FAQ)

Where is data analytics used?

Just about any business or organization can use data analytics to help inform their decisions and boost their performance. Some of the most successful companies across a range of industries, from Amazon and Netflix to Starbucks and General Electric, integrate data into their business plans to improve their overall business performance.

What are the top skills for a data analyst?

Data analysis makes use of a range of analysis tools and technologies. Some of the top skills for data analysts include SQL, data visualization, statistical programming languages (like R and Python), machine learning, and spreadsheets.

Read: 7 In-Demand Data Analyst Skills to Get Hired in 2022

What is a data analyst job salary?

Data from Glassdoor indicates that the average base salary for a data analyst in the United States is $75,349 as of March 2024 [ 3 ]. How much you make will depend on factors like your qualifications, experience, and location.

Do data analysts need to be good at math?

Data analytics tends to be less math-intensive than data science. While you probably won’t need to master any advanced mathematics, a foundation in basic math and statistical analysis can help set you up for success.

Learn more: Data Analyst vs. Data Scientist: What’s the Difference?

Article sources

1. World Economic Forum. "The Future of Jobs Report 2023," https://www3.weforum.org/docs/WEF_Future_of_Jobs_2023.pdf. Accessed March 19, 2024.

2. McKinsey & Company. "Five facts: How customer analytics boosts corporate performance," https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/five-facts-how-customer-analytics-boosts-corporate-performance. Accessed March 19, 2024.

3. Glassdoor. "Data Analyst Salaries," https://www.glassdoor.com/Salaries/data-analyst-salary-SRCH_KO0,12.htm. Accessed March 19, 2024.



Data Analysis for Qualitative Research: 6 Step Guide

Data analysis for qualitative research is not intuitive. This is because qualitative data stands in opposition to traditional data analysis methodologies: while data analysis is concerned with quantities, qualitative data is by definition unquantified. But there is an easy, methodical approach that anyone can use to get reliable results when performing data analysis for qualitative research. The process consists of 6 steps that I’ll break down in this article:

  • Perform interviews (if necessary)
  • Gather all documents and transcribe any non-paper records
  • Decide whether to code analytical data, analyze word frequencies, or both
  • Decide what interpretive angle you want to take: content analysis, narrative analysis, discourse analysis, framework analysis, and/or grounded theory
  • Compile your data in a spreadsheet using document saving techniques (Windows and Mac)
  • Identify trends in words, themes, metaphors, natural patterns, and more

To complete these steps, you will need:

  • Microsoft Word
  • Microsoft Excel
  • Internet access

You can get the free Intro to Data Analysis eBook to cover the fundamentals and ensure strong progression in all your data endeavors.

What is qualitative research?

Qualitative research is not the same as quantitative research. In short, qualitative research is the interpretation of non-numeric data. It usually aims at drawing conclusions that explain why a phenomenon occurs, rather than simply establishing that it does. Here’s a great quote from a nursing magazine about quantitative vs qualitative research:

“A traditional quantitative study… uses a predetermined (and auditable) set of steps to confirm or refute [a] hypothesis. In contrast, qualitative research often takes the position that an interpretive understanding is only possible by way of uncovering or deconstructing the meanings of a phenomenon. Thus, a distinction between explaining how something operates (explanation) and why it operates in the manner that it does (interpretation) may be [an] effective way to distinguish quantitative from qualitative analytic processes involved in any particular study.” (EBN)


Step 1a: Data collection methods and techniques in qualitative research: interviews and focus groups

Step 1 is collecting the data that you will need for the analysis. If you are not performing any interviews or focus groups to gather data, then you can skip this step. It’s for people who need to go into the field and collect raw information as part of their qualitative analysis.

Since the whole point of an interview and of qualitative analysis in general is to understand a research question better, you should start by making sure you have a specific, refined research question. Whether you’re a researcher by trade or a data analyst working on a one-time project, you must know specifically what you want to understand in order to get results.

Good research questions are specific enough to guide action but open enough to leave room for insight and growth. Examples of good research questions include:

  • Good : To what degree does living in a city impact the quality of a person’s life? (open-ended, complex)
  • Bad : Does living in a city impact the quality of a person’s life? (closed, simple)

Once you understand the research question, you need to develop a list of interview questions. These questions should likewise be open-ended and provide liberty of expression to the responder. They should support the research question in an active way without prejudicing the response. Examples of good interview questions include:

  • Good : Tell me what it’s like to live in a city versus in the country. (open, not leading)
  • Bad : Don’t you prefer the city to the country because there are more people? (closed, leading)

Some additional helpful tips include:

  • Begin each interview with a neutral question to get the person relaxed
  • Limit each question to a single idea
  • If you don’t understand, ask for clarity
  • Do not pass any judgements
  • Do not spend more than 15m on an interview, lest the quality of responses drop

Focus groups

The alternative to interviews is focus groups. Focus groups are a great way for you to get an idea of how people communicate their opinions in a group setting, rather than a one-on-one setting as in interviews.

In short, focus groups are gatherings of small groups of people from representative backgrounds who receive instruction, or “facilitation,” from a focus group leader. Typically, the leader will ask questions to stimulate conversation, reformulate questions to bring the discussion back to focus, and prevent the discussion from turning sour or giving way to bad faith.

Focus group questions should be open-ended like their interview neighbors, and they should stimulate some degree of disagreement. Disagreement often leads to valuable information about differing opinions, as people tend to say what they mean if contradicted.

However, focus group leaders must be careful not to let disagreements escalate, as anger can make people lie to be hurtful or simply to win an argument. And lies are not helpful in data analysis for qualitative research.

Step 1b: Tools for qualitative data collection

When it comes to data analysis for qualitative analysis, the tools you use to collect data should align to some degree with the tools you will use to analyze the data.

As mentioned in the intro, you will be focusing on analysis techniques that only require the traditional Microsoft suite programs: Microsoft Excel and Microsoft Word . At the same time, you can source supplementary tools from various websites, like Text Analyzer and WordCounter.

In short, the tools for qualitative data collection that you need are Excel and Word , as well as web-based free tools like Text Analyzer and WordCounter . These online tools are helpful in the quantitative part of your qualitative research.

Step 2: Gather all documents & transcribe non-written docs

Once you have your interviews and/or focus group transcripts, it’s time to decide if you need other documentation. If you do, you’ll need to gather it all into one place first, then develop a strategy for how to transcribe any non-written documents.

When do you need documentation other than interviews and focus groups? Two situations usually call for it. First, if you have little funding, you can’t afford to run expensive interviews and focus groups, so existing documents become your main data source.

Second, social science researchers typically focus on documents since their research questions are less concerned with subject-oriented data, while hard science and business researchers typically focus on interviews and focus groups because they want to know what people think, and they want to know today.

Non-written records

Other factors at play include the type of research, the field, and specific research goal. For those who need documentation and to describe non-written records, there are some steps to follow:

  • Put all hard copy source documents into a sealed binder (I use plastic paper holders with elastic seals ).
  • If you are sourcing directly from printed books or journals, then you will need to digitize them by scanning them and making them text readable by the computer. To do so, turn all PDFs into Word documents using online tools such as PDF to Word Converter. This process is never foolproof, and it may be a source of error in the data collection, but it’s part of the process.
  • If you are sourcing online documents, try as often as possible to get computer-readable PDF documents that you can easily copy/paste or convert. Locked PDFs are essentially a lost cause .
  • Transcribe any audio files into written documents. There are free online tools available to help with this, such as 360converter . If you run a test through the system, you’ll see that the output is not 100%. The best way to use this tool is as a first draft generator. You can then correct and complete it with old fashioned, direct transcription.

Step 3: Decide on the type of qualitative research

Before step 3 you should have collected your data, transcribed it all into written-word documents, and compiled it in one place. Now comes the interesting part. You need to decide what you want to get out of your research by choosing an analytic angle, or type of qualitative research.

The available types of qualitative research are as follows. Each of them takes a unique angle that you must choose to get what information you want from the analysis . In addition, each of them has a different impact on the data analysis for qualitative research (coding vs word frequency) that we use.

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis
  • Grounded theory

From a high level, content, narrative, and discourse analysis are actionable independent tactics, whereas framework analysis and grounded theory are ways of honing and applying the first three.

Content analysis

  • Definition : Content analysis is the identification and labelling of themes of any kind within a text.
  • Focus : Identifying any kind of pattern in written text, transcribed audio, or transcribed video. This could be thematic, word repetition, or idea repetition. Most often, the patterns we find are ideas that make up an argument.
  • Goal : To simplify, standardize, and quickly reference ideas from any given text. Content analysis is a way to pull the main ideas from huge documents for comparison. In this way, it’s more a means to an end.
  • Pros : The huge advantage of doing content analysis is that you can quickly process huge amounts of text using the simple coding and word frequency techniques we will look at below. To use a metaphor, it is to qualitative analysis documents what SparkNotes are to books.
  • Cons : The downside to content analysis is that it’s quite general. If you have a very specific, narrative research question, then tracing “any and all ideas” will not be very helpful to you.

Narrative analysis

  • Definition : Narrative analysis is the reformulation and simplification of interview answers or documentation into small narrative components to identify story-like patterns.
  • Focus : Understanding the text based on its narrative components as opposed to themes or other qualities.
  • Goal : To reference the text from an angle closer to the nature of texts in order to obtain further insights.
  • Pros : Narrative analysis is very useful for getting perspective on a topic in which you’re extremely limited. It can be easy to get tunnel vision when you’re digging for themes and ideas from a reason-centric perspective. Turning to a narrative approach will help you stay grounded. More importantly, it helps reveal different kinds of trends.
  • Cons : Narrative analysis adds another layer of subjectivity to the instinctive nature of qualitative research. Many see it as too dependent on the researcher to hold any critical value.

Discourse analysis

  • Definition : Discourse analysis is the textual analysis of naturally occurring speech. Any oral expression must be transcribed before undergoing legitimate discourse analysis.
  • Focus : Understanding ideas and themes through language communicated orally rather than pre-processed on paper.
  • Goal : To obtain insights from an angle outside the traditional content analysis on text.
  • Pros : Provides a considerable advantage in some areas of study in order to understand how people communicate an idea, versus the idea itself. For example, discourse analysis is important in political campaigning. People rarely vote for the candidate who most closely corresponds to his/her beliefs, but rather for the person they like the most.
  • Cons : As with narrative analysis, discourse analysis is more subjective in nature than content analysis, which focuses on ideas and patterns. Some do not consider it rigorous enough to be considered a legitimate subset of qualitative analysis, but these people are few.

Framework analysis

  • Definition : Framework analysis is a kind of qualitative analysis that includes 5 ordered steps: coding, indexing, charting, mapping, and interpreting. In most ways, framework analysis is a synonym for qualitative analysis; the significant difference is the importance it places on the perspective used in the analysis.
  • Focus : Understanding patterns in themes and ideas.
  • Goal : Creating one specific framework for looking at a text.
  • Pros : Framework analysis is helpful when the researcher clearly understands what he/she wants from the project, as it is a deliberately limiting approach. Since each of its steps has defined parameters, framework analysis is very useful for teamwork.
  • Cons : It can lead to tunnel vision.

Grounded theory

  • Definition : The use of content, narrative, and discourse analysis to examine a single case, in the hopes that discoveries from that case will lead to a foundational theory used to examine other like cases.
  • Focus : A vast approach using multiple techniques in order to establish patterns.
  • Goal : To develop a foundational theory.
  • Pros : When successful, grounded theories can revolutionize entire fields of study.
  • Cons : It’s very difficult to establish grounded theories, and there’s an enormous amount of risk involved.

Step 4: Coding, word frequency, or both

Coding in data analysis for qualitative research is the process of writing 2-5 word codes that summarize at least one paragraph of text (not writing computer code). This allows researchers to keep track of and analyze those codes. On the other hand, word frequency is the process of counting the presence and orientation of words within a text, which makes it the quantitative element in qualitative data analysis.

Video example of coding for data analysis in qualitative research

In short, coding in the context of data analysis for qualitative research follows 2 steps (video below):

  • Reading through the text one time
  • Adding 2-5 word summaries each time a significant theme or idea appears

Let’s look at a brief example of how to code for qualitative research in this video:

Click here for a link to the source text. 1

Example of word frequency processing

And word frequency is the process of finding a specific word or identifying the most common words through 3 steps:

  • Decide if you want to find 1 word or identify the most common ones
  • Use Word’s “Replace” function to find a word or phrase
  • Use Text Analyzer to find the most common terms

Here’s another look at word frequency processing and how to do it. Let’s look at the same example above, but from a quantitative perspective.

Imagine we are already familiar with melanoma and KITs, and we want to analyze the text based on these keywords. One thing we can do is look for these words using the Replace function in Word:

  • Locate the search bar
  • Click replace
  • Type in the word
  • See the total results

Here’s a brief video example:

Another option is to use an online Text Analyzer. This methodology won’t help us find a specific word, but it will help us discover the top performing phrases and words. All you need to do is put in a link to a target page or paste a text. I pasted the abstract from our source text, and what turns up is as expected. Here’s a picture:

[Image: Text Analyzer output for the pasted abstract]
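
If you prefer to stay in code, the same word-frequency idea can be sketched in a few lines of Python; the sample sentence below is invented and simply reuses the melanoma/KIT keywords from the example.

```python
# Word frequency with Python's collections.Counter (sample text is invented).
from collections import Counter
import re

text = "Melanoma cases with KIT mutations respond differently; KIT status can guide melanoma treatment."
words = re.findall(r"[a-z]+", text.lower())

counts = Counter(words)
print(counts["kit"], counts["melanoma"])  # occurrences of specific keywords
print(counts.most_common(5))              # most frequent terms overall
```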

Step 5: Compile your data in a spreadsheet

After you have some coded data in the Word document, you need to get it into Excel for analysis. This process requires saving the Word doc with an .htm extension, which turns it into a web page. Once you have the web page, it’s as simple as opening that page, scrolling to the bottom, and copying/pasting the comments, or codes, into an Excel document.

You will need to wrangle the data slightly in order to make it readable in Excel. I’ve made a video to explain this process and placed it below.

Step 6: Identify trends & analyze!

There are literally thousands of different ways to analyze qualitative data, and in most situations, the best technique depends on the information you want to get out of the research.

Nevertheless, there are a few go-to techniques. The most important of these is counting occurrences. In this short video, we finish the example from above by counting the number of times our codes appear. In this way, it’s very similar to word frequency (discussed above).

A few other options include:

  • Ranking each code on a set of relevant criteria and clustering
  • Pure cluster analysis
  • Causal analysis

We cover different types of analysis like this on the website, so be sure to check out other articles on the home page .

How to analyze qualitative data from an interview

To analyze qualitative data from an interview, follow the same 6 steps used for qualitative data analysis:

  • Perform the interviews
  • Transcribe the interviews onto paper
  • Decide whether to code analytical data (open, axial, selective), analyze word frequencies, or both
  • Decide what interpretive angle you want to take (content, narrative, discourse, framework, and/or grounded theory)
  • Compile your data in a spreadsheet using document saving techniques (for Windows and Mac)
  • Identify trends in words, themes, metaphors, natural patterns, and more

1. Source text [ ↩ ]


Quantitative Data Analysis: A Comprehensive Guide

By: Ofem Eteng | Published: May 18, 2022

A healthcare giant successfully introduces the most effective drug dosage through rigorous statistical modeling, saving countless lives. A marketing team predicts consumer trends with uncanny accuracy, tailoring campaigns for maximum impact.


These trends and dosages are not just any numbers but are a result of meticulous quantitative data analysis. Quantitative data analysis offers a robust framework for understanding complex phenomena, evaluating hypotheses, and predicting future outcomes.

In this blog, we’ll walk through the concept of quantitative data analysis, the steps required, its advantages, and the methods and techniques that are used in this analysis. Read on!

What is Quantitative Data Analysis?

Quantitative data analysis is a systematic process of examining, interpreting, and drawing meaningful conclusions from numerical data. It involves the application of statistical methods, mathematical models, and computational techniques to understand patterns, relationships, and trends within datasets.

Quantitative data analysis methods typically work with algorithms, mathematical analysis tools, and software to gain insights from the data, answering questions such as how many, how often, and how much. Data for quantitative data analysis is usually collected from closed-ended surveys, questionnaires, polls, etc. The data can also be obtained from sales figures, email click-through rates, number of website visitors, and percentage revenue increase.

Quantitative Data Analysis vs Qualitative Data Analysis

When we talk about data, we directly think about the pattern, the relationship, and the connection between the datasets – analyzing the data in short. Therefore when it comes to data analysis, there are broadly two types – Quantitative Data Analysis and Qualitative Data Analysis.

Quantitative data analysis revolves around numerical data and statistics, which are suitable for functions that can be counted or measured. In contrast, qualitative data analysis includes description and subjective information – for things that can be observed but not measured.

Let us differentiate between Quantitative Data Analysis and Qualitative Data Analysis for a better understanding.

Data Preparation Steps for Quantitative Data Analysis

Quantitative data has to be gathered and cleaned before proceeding to the analysis stage. Below are the steps to prepare data for quantitative analysis:

  • Step 1: Data Collection

Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as interviews, focus groups, surveys, and questionnaires.

  • Step 2: Data Cleaning

Once the data is collected, begin the data cleaning process by scanning through the entire dataset for duplicates, errors, and omissions. Keep a close eye out for outliers (data points that are significantly different from the majority of the dataset) because they can skew your analysis results if they are not removed.

This data-cleaning process ensures data accuracy, consistency, and relevance before analysis.

  • Step 3: Data Analysis and Interpretation

Now that you have collected and cleaned your data, it is time to carry out the quantitative analysis. There are two methods of quantitative data analysis, which we will discuss in the next section.

However, if you have data from multiple sources, collecting and cleaning it can be a cumbersome task. This is where Hevo Data steps in. With Hevo, extracting, transforming, and loading data from source to destination becomes a seamless task, eliminating the need for manual coding. This not only saves valuable time but also enhances the overall efficiency of data analysis and visualization, empowering users to derive insights quickly and with precision.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.


Now that you are familiar with what quantitative data analysis is and how to prepare your data for analysis, the focus will shift to the purpose of this article, which is to describe the methods and techniques of quantitative data analysis.

Methods and Techniques of Quantitative Data Analysis

Broadly, quantitative data analysis employs two techniques to extract meaningful insights from datasets. The first method is descriptive statistics, which summarizes and portrays essential features of a dataset, such as the mean, median, and standard deviation.

Inferential statistics, the second method, extrapolates insights and predictions from a sample dataset to make broader inferences about an entire population, such as hypothesis testing and regression analysis.

An in-depth explanation of both the methods is provided below:

  • Descriptive Statistics
  • Inferential Statistics

1) Descriptive Statistics

Descriptive statistics, as the name implies, is used to describe a dataset. It helps you understand the details of your data by summarizing it and finding patterns in the specific data sample. Descriptive statistics provide absolute numbers obtained from a sample but do not necessarily explain the rationale behind those numbers, and they are mostly used for analyzing single variables. The methods used in descriptive statistics include the following (a short code sketch follows the list):

  • Mean:   This calculates the numerical average of a set of values.
  • Median: This is used to get the midpoint of a set of values when the numbers are arranged in numerical order.
  • Mode: This is used to find the most commonly occurring value in a dataset.
  • Percentage: This is used to express how a value or group of respondents within the data relates to a larger group of respondents.
  • Frequency: This indicates the number of times a value is found.
  • Range: This shows the highest and lowest values in a dataset.
  • Standard Deviation: This indicates how dispersed a range of numbers is, i.e. how closely the values cluster around the mean.
  • Skewness: It indicates how symmetrical a range of numbers is, showing if they cluster into a smooth bell curve shape in the middle of the graph or if they skew towards the left or right.
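
Most of these measures take a single line of Python. Below is a purely illustrative sketch using the standard library plus SciPy on a made-up sample of customer ratings; the numbers carry no real meaning.

```python
from collections import Counter
from statistics import mean, median, mode, stdev
from scipy.stats import skew  # SciPy provides skewness

# Illustrative sample: customer satisfaction ratings on a 1-10 scale.
ratings = [7, 8, 8, 9, 6, 7, 8, 10, 5, 8, 7, 9]

print("Mean:", mean(ratings))
print("Median:", median(ratings))
print("Mode:", mode(ratings))
print("Range:", (min(ratings), max(ratings)))
print("Standard deviation:", round(stdev(ratings), 2))
print("Skewness:", round(skew(ratings), 2))

# Frequency and percentage of each rating.
counts = Counter(ratings)
for value, count in sorted(counts.items()):
    print(f"Rating {value}: {count} times ({count / len(ratings):.0%})")
```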

2) Inferential Statistics

Quantitative analysis aims to turn raw numbers into meaningful insight. Descriptive statistics explains the details of a specific dataset using numbers, but it does not explain the motives or causes behind those numbers; hence the need for further analysis using inferential statistics.

Inferential statistics aims to make predictions or highlight possible outcomes based on the sample data summarized by descriptive statistics. It is used to generalize results, compare groups, show relationships between multiple variables, and test hypotheses about changes or differences.

There are various statistical analysis methods used within inferential statistics; a few are discussed below.

  • Cross Tabulations: Cross tabulation or crosstab is used to show the relationship that exists between two variables and is often used to compare results by demographic groups. It uses a basic tabular form to draw inferences between different data sets and contains data that is mutually exclusive or has some connection with each other. Crosstabs help understand the nuances of a dataset and factors that may influence a data point.
  • Regression Analysis: Regression analysis estimates the relationship between a set of variables. It shows the correlation between a dependent variable (the variable or outcome you want to measure or predict) and any number of independent variables (factors that may impact the dependent variable). The purpose of regression analysis is therefore to estimate how one or more variables might affect the dependent variable, in order to identify trends and patterns and to forecast possible future values (see the sketch after this list). There are many types of regression analysis, and the model you choose will be determined by the type of data you have for the dependent variable; common types include linear regression, non-linear regression, and binary logistic regression.
  • Monte Carlo Simulation: Monte Carlo simulation, also known as the Monte Carlo method, is a computerized technique for generating models of possible outcomes and showing their probability distributions. It considers a range of possible outcomes and then calculates how likely each outcome is to occur. Data analysts use it to perform advanced risk analyses to help forecast future events and make decisions accordingly.
  • Analysis of Variance (ANOVA): This is used to test the extent to which two or more groups differ from each other. It compares the mean of various groups and allows the analysis of multiple groups.
  • Factor Analysis:   A large number of variables can be reduced into a smaller number of factors using the factor analysis technique. It works on the principle that multiple separate observable variables correlate with each other because they are all associated with an underlying construct. It helps in reducing large datasets into smaller, more manageable samples.
  • Cohort Analysis: Cohort analysis can be defined as a subset of behavioral analytics that operates from data taken from a given dataset. Rather than looking at all users as one unit, cohort analysis breaks down data into related groups for analysis, where these groups or cohorts usually have common characteristics or similarities within a defined period.
  • MaxDiff Analysis: This is a quantitative data analysis method used to gauge customers’ purchase preferences and determine which attributes they rank above others in the decision process.
  • Cluster Analysis: Cluster analysis is a technique used to identify structures within a dataset. Cluster analysis aims to be able to sort different data points into groups that are internally similar and externally different; that is, data points within a cluster will look like each other and different from data points in other clusters.
  • Time Series Analysis: This is a statistical analytic technique used to identify trends and cycles over time. It is simply the measurement of the same variables at different times, like weekly and monthly email sign-ups, to uncover trends, seasonality, and cyclic patterns. By doing this, the data analyst can forecast how variables of interest may fluctuate in the future. 
  • SWOT analysis: This is a quantitative data analysis method that assigns numerical values to the strengths, weaknesses, opportunities, and threats of an organization, product, or service, giving a clearer picture of the competitive landscape and fostering better business strategies.
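
To make two of the methods above concrete, here is a short SciPy sketch: a simple linear regression on invented advertising and sales figures, and a one-way ANOVA comparing three synthetic customer groups. It only illustrates the mechanics; a real analysis would also check assumptions such as linearity and normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simple linear regression: does ad spend (in $1000s) predict weekly sales?
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])          # invented figures
sales = 10 + 3.2 * ad_spend + rng.normal(0, 0.5, ad_spend.size)
result = stats.linregress(ad_spend, sales)
print(f"slope={result.slope:.2f}, r^2={result.rvalue**2:.2f}, p={result.pvalue:.4f}")

# One-way ANOVA: do three customer groups give different mean ratings?
group_a = rng.normal(7.5, 1.0, 30)
group_b = rng.normal(7.9, 1.0, 30)
group_c = rng.normal(6.8, 1.0, 30)
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")
```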

How to Choose the Right Method for your Analysis?

Choosing between descriptive and inferential statistics can often be confusing. You should consider the following factors before choosing the right method for your quantitative data analysis:

1. Type of Data

The first consideration in data analysis is understanding the type of data you have. Different statistical methods have specific requirements based on these data types, and using the wrong method can render results meaningless. The choice of statistical method should align with the nature and distribution of your data to ensure meaningful and accurate analysis.

2. Your Research Questions

When deciding on statistical methods, it’s crucial to align them with your specific research questions and hypotheses. The nature of your questions will influence whether descriptive statistics alone, which reveal sample attributes, are sufficient or if you need both descriptive and inferential statistics to understand group differences or relationships between variables and make population inferences.

Pros and Cons of Quantitative Data Analysis

Pros of Quantitative Data Analysis

1. Objectivity and Generalizability:

  • Quantitative data analysis offers objective, numerical measurements, minimizing bias and personal interpretation.
  • Results can often be generalized to larger populations, making them applicable to broader contexts.

Example: A study using quantitative data analysis to measure student test scores can objectively compare performance across different schools and demographics, leading to generalizable insights about educational strategies.

2. Precision and Efficiency:

  • Statistical methods provide precise numerical results, allowing for accurate comparisons and prediction.
  • Large datasets can be analyzed efficiently with the help of computer software, saving time and resources.

Example: A marketing team can use quantitative data analysis to precisely track click-through rates and conversion rates on different ad campaigns, quickly identifying the most effective strategies for maximizing customer engagement.

3. Identification of Patterns and Relationships:

  • Statistical techniques reveal hidden patterns and relationships between variables that might not be apparent through observation alone.
  • This can lead to new insights and understanding of complex phenomena.

Example: A medical researcher can use quantitative analysis to pinpoint correlations between lifestyle factors and disease risk, aiding in the development of prevention strategies.

Cons of Quantitative Data Analysis

1. Limited Scope:

  • Quantitative analysis focuses on quantifiable aspects of a phenomenon, potentially overlooking important qualitative nuances such as emotions, motivations, or cultural contexts.

Example: A survey measuring customer satisfaction with numerical ratings might miss key insights about the underlying reasons for their satisfaction or dissatisfaction, which could be better captured through open-ended feedback.

2. Oversimplification:

  • Reducing complex phenomena to numerical data can lead to oversimplification and a loss of richness in understanding.

Example: Analyzing employee productivity solely through quantitative metrics like hours worked or tasks completed might not account for factors like creativity, collaboration, or problem-solving skills, which are crucial for overall performance.

3. Potential for Misinterpretation:

  • Statistical results can be misinterpreted if not analyzed carefully and with appropriate expertise.
  • The choice of statistical methods and assumptions can significantly influence results.

This blog discusses the steps, methods, and techniques of quantitative data analysis. It also gives insights into the methods of data collection, the type of data one should work with, and the pros and cons of such analysis.

Gain a better understanding of data analysis with these essential reads:

  • Data Analysis and Modeling: 4 Critical Differences
  • Exploratory Data Analysis Simplified 101
  • 25 Best Data Analysis Tools in 2024

Carrying out successful data analysis requires prepping the data and making it analysis-ready. That is where Hevo steps in.



What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" ( Responsible Conduct in Data Management ). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible  data management , and maintaining "honest and accurate analysis" ( Responsible Conduct in Data Management ). 

In order to understand data analysis further, it can be helpful to take a step back and ask "What is data?". Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" ( OMB Circular 110 ). This broad definition can include information in many formats.

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments ( CMU Data 101 )

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle (see the research data lifecycle diagram from the University of Virginia).

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures ( Creswell & Creswell, 2018 , p. 4). Quantitative analysis usually uses deductive reasoning. 

Qualitative  research typically involves words and "open-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" ( 2018 , p. 4). Thus, qualitative analysis usually invokes inductive reasoning. 

Mixed methods  research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" ( Creswell & Creswell, 2018 , p. 4). 



Qualitative Data Analysis: Step-by-Step Guide (Manual vs. Automatic)

When we conduct qualitative research, need to explain changes in metrics, or want to understand people's opinions, we turn to qualitative data. Qualitative data is typically generated through:

  • Interview transcripts
  • Surveys with open-ended questions
  • Contact center transcripts
  • Texts and documents
  • Audio and video recordings
  • Observational notes

Compared to quantitative data, which captures structured information, qualitative data is unstructured and has more depth. It can answer our questions, help formulate hypotheses, and build understanding.

It's important to understand the differences between quantitative and qualitative data. Unfortunately, analyzing qualitative data is difficult. While tools like Excel, Tableau, and Power BI crunch and visualize quantitative data with ease, there are only a limited number of mainstream tools for analyzing qualitative data. The majority of qualitative data analysis still happens manually.

That said, there are two new trends that are changing this. First, there are advances in natural language processing (NLP) which is focused on understanding human language. Second, there is an explosion of user-friendly software designed for both researchers and businesses. Both help automate the qualitative data analysis process.

In this post we want to teach you how to conduct a successful qualitative data analysis. There are two primary approaches: manual and automatic. We'll guide you through the steps of a manual analysis, look at what is involved, and show the role technology (software solutions powered by NLP) can play in automating the process.

More businesses are switching to fully-automated analysis of qualitative customer data because it is cheaper, faster, and just as accurate. Primarily, businesses purchase subscriptions to feedback analytics platforms so that they can understand customer pain points and sentiment.

Overwhelming quantity of feedback

We’ll take you through 5 steps to conduct a successful qualitative data analysis. Within each step, we will highlight the key differences between the manual and the automated approach. Here's an overview of the steps:

The 5 steps to doing qualitative data analysis

  • Gathering and collecting your qualitative data
  • Organizing and connecting your qualitative data
  • Coding your qualitative data
  • Analyzing the qualitative data for insights
  • Reporting on the insights derived from your analysis

What is Qualitative Data Analysis?

Qualitative data analysis is a process of gathering, structuring and interpreting qualitative data to understand what it represents.

Qualitative data is non-numerical and unstructured. Qualitative data generally refers to text, such as open-ended responses to survey questions or user interviews, but also includes audio, photos and video.

Businesses often perform qualitative data analysis on customer feedback. And within this context, qualitative data generally refers to verbatim text data collected from sources such as reviews, complaints, chat messages, support centre interactions, customer interviews, case notes or social media comments.

How is qualitative data analysis different from quantitative data analysis?

Understanding the differences between quantitative and qualitative data is important. When it comes to analyzing data, qualitative data analysis serves a very different role from quantitative data analysis. But what sets them apart?

Qualitative Data Analysis dives into the stories hidden in non-numerical data such as interviews, open-ended survey answers, or notes from observations. It uncovers the ‘whys’ and ‘hows’ giving a deep understanding of people’s experiences and emotions.

Quantitative Data Analysis on the other hand deals with numerical data, using statistics to measure differences, identify preferred options, and pinpoint root causes of issues.  It steps back to address questions like "how many" or "what percentage" to offer broad insights we can apply to larger groups.

In short, Qualitative Data Analysis is like a microscope,  helping us understand specific detail. Quantitative Data Analysis is like the telescope, giving us a broader perspective. Both are important, working together to decode data for different objectives.

Qualitative Data Analysis methods

Once all the data has been captured, there are a variety of analysis techniques available and the choice is determined by your specific research objectives and the kind of data you’ve gathered.  Common qualitative data analysis methods include:

Content Analysis

This is a popular approach to qualitative data analysis, and other qualitative analysis techniques (thematic analysis, for example) can fit within its broad scope. Content analysis is used to identify the patterns that emerge from text by grouping content into words, concepts, and themes. It is useful for quantifying the relationships between the grouped content. The Columbia School of Public Health has a detailed breakdown of content analysis.

Narrative Analysis

Narrative analysis focuses on the stories people tell and the language they use to make sense of them.  It is particularly useful in qualitative research methods where customer stories are used to get a deep understanding of customers’ perspectives on a specific issue. A narrative analysis might enable us to summarize the outcomes of a focused case study.

Discourse Analysis

Discourse analysis is used to get a thorough understanding of the political, cultural and power dynamics that exist in specific situations.  The focus of discourse analysis here is on the way people express themselves in different social contexts. Discourse analysis is commonly used by brand strategists who hope to understand why a group of people feel the way they do about a brand or product.

Thematic Analysis

Thematic analysis is used to deduce the meaning behind the words people use. This is accomplished by discovering repeating themes in text. These meaningful themes reveal key insights into data and can be quantified, particularly when paired with sentiment analysis . Often, the outcome of thematic analysis is a code frame that captures themes in terms of codes, also called categories. So the process of thematic analysis is also referred to as “coding”. A common use-case for thematic analysis in companies is analysis of customer feedback.

Grounded Theory

Grounded theory is a useful approach when little is known about a subject. It starts by formulating a theory around a single data case, which is what makes the theory “grounded”: the analysis is based on actual data rather than speculation. Additional cases can then be examined to see whether they are relevant and can add to the original theory.

Methods of qualitative data analysis; approaches and techniques to qualitative data analysis

Challenges of Qualitative Data Analysis

While Qualitative Data Analysis offers rich insights, it comes with its challenges. Each unique QDA method has its unique hurdles. Let’s take a look at the challenges researchers and analysts might face, depending on the chosen method.

  • Time and Effort (Narrative Analysis): Narrative analysis, which focuses on personal stories, demands patience. Sifting through lengthy narratives to find meaningful insights is time-consuming and requires dedicated effort.
  • Being Objective (Grounded Theory): Grounded theory, building theories from data, faces the challenges of personal biases. Staying objective while interpreting data is crucial, ensuring conclusions are rooted in the data itself.
  • Complexity (Thematic Analysis): Thematic analysis involves identifying themes within data, a process that can be intricate. Categorizing and understanding themes can be complex, especially when each piece of data varies in context and structure. Thematic Analysis software can simplify this process.
  • Generalizing Findings (Narrative Analysis): Narrative analysis, dealing with individual stories, makes drawing broad conclusions challenging. Extending findings from a single narrative to a broader context requires careful consideration.
  • Managing Data (Thematic Analysis): Thematic analysis involves organizing and managing vast amounts of unstructured data, like interview transcripts. Managing this can be a hefty task, requiring effective data management strategies.
  • Skill Level (Grounded Theory): Grounded theory demands specific skills to build theories from the ground up. Finding or training analysts with these skills poses a challenge, requiring investment in building expertise.

Benefits of qualitative data analysis

Qualitative Data Analysis (QDA) is like a versatile toolkit, offering a tailored approach to understanding your data. The benefits it offers are as diverse as the methods. Let’s explore why choosing the right method matters.

  • Tailored Methods for Specific Needs: QDA isn't one-size-fits-all. Depending on your research objectives and the type of data at hand, different methods offer unique benefits. If you want emotive customer stories, narrative analysis paints a strong picture. When you want to explain a score, thematic analysis reveals insightful patterns.
  • Flexibility with Thematic Analysis: Thematic analysis is like a chameleon in the QDA toolkit. It adapts well to different types of data and research objectives, making it a top choice for almost any qualitative analysis.
  • Deeper Understanding, Better Products: QDA helps you dive into people's thoughts and feelings. This deep understanding helps you build products and services that truly match what people want, ensuring satisfied customers.
  • Finding the Unexpected: Qualitative data often reveals surprises that we miss in quantitative data. QDA offers new ideas and perspectives, and surfaces insights we might otherwise miss.
  • Building Effective Strategies: Insights from QDA act as strategic guides, helping businesses craft plans that match people’s desires.
  • Creating Genuine Connections: Understanding people’s experiences lets businesses connect on a real level. This genuine connection helps build trust and loyalty, which are priceless for any business.

How to do Qualitative Data Analysis: 5 steps

Now we are going to show how you can do your own qualitative data analysis. We will guide you through this process step by step. As mentioned earlier, you will learn how to do qualitative data analysis manually , and also automatically using modern qualitative data and thematic analysis software.

To get the best value from the analysis and research process, it’s important to be very clear about the nature and scope of the question being researched. This will help you select the data collection channels most likely to help you answer your question.

Depending on if you are a business looking to understand customer sentiment, or an academic surveying a school, your approach to qualitative data analysis will be unique.

Once you’re clear, there’s a sequence to follow. And, though there are differences in the manual and automatic approaches, the process steps are mostly the same.

The use case for our step-by-step guide is a company looking to collect customer feedback data and analyze it in order to improve customer experience. By analyzing the customer feedback, the company derives insights about its business and its customers. You can follow these same steps regardless of the nature of your research. Let’s get started.

Step 1: Gather your qualitative data and conduct research

The first step of qualitative research is data collection: gathering all of your data for analysis. Commonly, qualitative data is spread across various sources.

Classic methods of gathering qualitative data

Most companies use traditional methods for gathering qualitative data: conducting interviews with research participants, running surveys, and running focus groups. This data is typically stored in documents, CRMs, databases and knowledge bases. It’s important to examine which data is available and needs to be included in your research project, based on its scope.

Using your existing qualitative feedback

As it becomes easier for customers to engage across a range of different channels, companies are gathering increasingly large amounts of both solicited and unsolicited qualitative feedback.

Most organizations have now invested in Voice of Customer programs , support ticketing systems, chatbot and support conversations, emails and even customer Slack chats.

These new channels provide companies with new ways of getting feedback, and also allow the collection of unstructured feedback data at scale.

The great thing about this data is that it contains a wealth of valuable insights and that it’s already there! When you have a new question about user behavior or your customers, you don’t need to create a new research study or set up a focus group. You can find most answers in the data you already have.

Typically, this data is stored in third-party solutions or a central database, but there are ways to export it or connect to a feedback analysis solution through integrations or an API.

Utilize untapped qualitative data channels

There are many online qualitative data sources you may not have considered. For example, you can find useful qualitative data in social media channels like Twitter or Facebook. Online forums, review sites, and online communities such as Discourse or Reddit also contain valuable data about your customers, or research questions.

If you are considering performing a qualitative benchmark analysis against competitors - the internet is your best friend. Gathering feedback in competitor reviews on sites like Trustpilot, G2, Capterra, Better Business Bureau or on app stores is a great way to perform a competitor benchmark analysis.

Customer feedback analysis software often has integrations into social media and review sites, or you could use a solution like DataMiner to scrape the reviews.

G2.com reviews of the product Airtable. You could pull reviews from G2 for your analysis.

Step 2: Connect & organize all your qualitative data

Now you have all this qualitative data, but there’s a problem: the data is unstructured. Before feedback can be analyzed and assigned any value, it needs to be organized in a single place. Why is this important? Consistency!

If all data is easily accessible in one place and analyzed in a consistent manner, you will have an easier time summarizing and making decisions based on this data.

The manual approach to organizing your data

The classic method of structuring qualitative data is to plot all the raw data you’ve gathered into a spreadsheet.

Typically, research and support teams would share large Excel sheets and different business units would make sense of the qualitative feedback data on their own. Each team collects and organizes the data in a way that best suits them, which means the feedback tends to be kept in separate silos.

An alternative and a more robust solution is to store feedback in a central database, like Snowflake or Amazon Redshift .

Keep in mind that when you organize your data in this way, you are often preparing it to be imported into another software. If you go the route of a database, you would need to use an API to push the feedback into a third-party software.

Computer-assisted qualitative data analysis software (CAQDAS)

Traditionally within the manual analysis approach (but not always), qualitative data is imported into CAQDAS software for coding.

In the early 2000s, CAQDAS software was popularised by developers such as ATLAS.ti, NVivo and MAXQDA and eagerly adopted by researchers to assist with the organizing and coding of data.  

The benefits of using computer-assisted qualitative data analysis software:

  • Assists in organizing your data
  • Opens you up to exploring different interpretations of your data analysis
  • Makes it easier to share your dataset and collaborate as a group (allowing for secondary analysis)

However, you still need to code the data, uncover the themes, and do the analysis yourself. It is therefore still a manual approach.

The user interface of CAQDAS software 'NVivo'

Organizing your qualitative data in a feedback repository

Another solution to organizing your qualitative data is to upload it into a feedback repository where it can be unified with your other data , and easily searchable and taggable. There are a number of software solutions that act as a central repository for your qualitative research data. Here are a couple solutions that you could investigate:  

  • Dovetail: Dovetail is a research repository with a focus on video and audio transcriptions. You can tag your transcriptions within the platform for theme analysis. You can also upload your other qualitative data such as research reports, survey responses, support conversations, and customer interviews. Dovetail acts as a single, searchable repository, and makes it easier to collaborate with others on your qualitative research.
  • EnjoyHQ: EnjoyHQ is another research repository with similar functionality to Dovetail. It boasts a more sophisticated search engine, but it has a higher starting subscription cost.

Organizing your qualitative data in a feedback analytics platform

If you have a lot of qualitative customer or employee feedback, from the likes of customer surveys or employee surveys, you will benefit from a feedback analytics platform. A feedback analytics platform is a software that automates the process of both sentiment analysis and thematic analysis . Companies use the integrations offered by these platforms to directly tap into their qualitative data sources (review sites, social media, survey responses, etc.). The data collected is then organized and analyzed consistently within the platform.

If you have data prepared in a spreadsheet, it can also be imported into feedback analytics platforms.

Once all this rich data has been organized within the feedback analytics platform, it is ready to be coded and themed, within the same platform. Thematic is a feedback analytics platform that offers one of the largest libraries of integrations with qualitative data sources.

Some of qualitative data integrations offered by Thematic

Step 3: Coding your qualitative data

Your feedback data is now organized in one place. Either within your spreadsheet, CAQDAS, feedback repository or within your feedback analytics platform. The next step is to code your feedback data so we can extract meaningful insights in the next step.

Coding is the process of labelling and organizing your data in such a way that you can then identify themes in the data, and the relationships between these themes.

To simplify the coding process, you will take small samples of your customer feedback data, come up with a set of codes (categories capturing themes), and systematically label each piece of feedback for patterns and meaning. Then you will take a larger sample of data, revising and refining the codes for greater accuracy and consistency as you go.

If you choose to use a feedback analytics platform, much of this process will be automated and accomplished for you.

The terms to describe different categories of meaning (‘theme’, ‘code’, ‘tag’, ‘category’ etc) can be confusing as they are often used interchangeably.  For clarity, this article will use the term ‘code’.

To code means to identify key words or phrases and assign them to a category of meaning. “I really hate the customer service of this computer software company” would be coded as “poor customer service”.

How to manually code your qualitative data

  • Decide whether you will use deductive or inductive coding. Deductive coding is when you create a list of predefined codes, and then assign them to the qualitative data. Inductive coding is the opposite of this, you create codes based on the data itself. Codes arise directly from the data and you label them as you go. You need to weigh up the pros and cons of each coding method and select the most appropriate.
  • Read through the feedback data to get a broad sense of what it reveals. Now it’s time to start assigning your first set of codes to statements and sections of text.
  • Keep repeating step 2, adding new codes and revising the code description as often as necessary.  Once it has all been coded, go through everything again, to be sure there are no inconsistencies and that nothing has been overlooked.
  • Create a code frame to group your codes. The code frame is the organizational structure of all your codes. There are two commonly used types of coding frames: flat and hierarchical. A hierarchical code frame will make it easier for you to derive insights from your analysis.
  • Based on the number of times a particular code occurs, you can now see the common themes in your feedback data. This is insightful! If ‘bad customer service’ is a common code, it’s time to take action. A minimal coding sketch follows these steps.
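
To make the deductive approach concrete, here is a minimal coding sketch in Python. The code frame, keywords, and feedback snippets are all hypothetical; real coding involves far more judgment than simple keyword matching.

```python
# A minimal, illustrative deductive coding pass.
from collections import Counter

code_frame = {
    "poor customer service": ["customer service", "support", "helpdesk"],
    "pricing": ["price", "expensive", "cost"],
    "ease of use": ["easy to use", "intuitive", "simple"],
}

feedback = [
    "I really hate the customer service of this computer software company",
    "Great product, but far too expensive for what it does",
    "The app is intuitive and the support team was helpful",
]

def code_response(text: str) -> list[str]:
    """Return every code whose keywords appear in the response."""
    text = text.lower()
    return [code for code, keywords in code_frame.items()
            if any(kw in text for kw in keywords)]

coded = {response: code_response(response) for response in feedback}
for response, codes in coded.items():
    print(codes, "<-", response)

# Code frequency: the starting point for spotting common themes.
print(Counter(code for codes in coded.values() for code in codes))
```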

We have a detailed guide dedicated to manually coding your qualitative data .

Example of a hierarchical coding frame in qualitative data analysis

Using software to speed up manual coding of qualitative data

An Excel spreadsheet is still a popular method for coding. But various software solutions can help speed up this process. Here are some examples.

  • CAQDAS / NVivo - CAQDAS software has built-in functionality that allows you to code text within their software. You may find the interface the software offers easier for managing codes than a spreadsheet.
  • Dovetail/EnjoyHQ - You can tag transcripts and other textual data within these solutions. As they are also repositories you may find it simpler to keep the coding in one platform.
  • IBM SPSS - SPSS is a statistical analysis software that may make coding easier than in a spreadsheet.
  • Ascribe - Ascribe’s ‘Coder’ is a coding management system. Its user interface will make it easier for you to manage your codes.

Automating the qualitative coding process using thematic analysis software

In solutions which speed up the manual coding process, you still have to come up with valid codes and often apply codes manually to pieces of feedback. But there are also solutions that automate both the discovery and the application of codes.

Advances in machine learning have now made it possible to read, code and structure qualitative data automatically. This type of automated coding is offered by thematic analysis software .

Automation makes it far simpler and faster to code the feedback and group it into themes. By incorporating natural language processing (NLP) into the software, the AI looks across sentences and phrases to identify common themes and meaningful statements. Some automated solutions detect repeating patterns and assign codes to them; others require you to train the AI by providing examples. You could say that the AI learns the meaning of the feedback on its own.
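
As a toy illustration of the general idea (and only that; this is not how any particular commercial product works), feedback can be vectorized with TF-IDF and clustered so that the top terms of each cluster suggest candidate themes. The snippet below uses scikit-learn on invented feedback.

```python
# Toy sketch of automated theme discovery: cluster feedback by text
# similarity, then inspect the top terms per cluster as candidate themes.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "Support took days to reply to my ticket",
    "Customer service never answered my emails",
    "The new dashboard is confusing to navigate",
    "I can't find anything in the redesigned dashboard",
    "Pricing doubled this year, way too expensive",
    "The subscription cost is not worth it anymore",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(feedback)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()

for cluster in range(3):
    centre = kmeans.cluster_centers_[cluster]
    top_terms = [terms[i] for i in centre.argsort()[::-1][:3]]
    print(f"Candidate theme {cluster}: {top_terms}")
```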

Thematic automates the coding of qualitative feedback regardless of source. There’s no need to set up themes or categories in advance. Simply upload your data and wait a few minutes. You can also manually edit the codes to further refine their accuracy.  Experiments conducted indicate that Thematic’s automated coding is just as accurate as manual coding .

Paired with sentiment analysis and advanced text analytics - these automated solutions become powerful for deriving quality business or research insights.

You could also build your own , if you have the resources!

The key benefits of using an automated coding solution

Automated analysis can often be set up fast and there’s the potential to uncover things that would never have been revealed if you had given the software a prescribed list of themes to look for.

Because the model applies a consistent rule to the data, it captures phrases or statements that a human eye might have missed.

Complete and consistent analysis of customer feedback enables more meaningful findings. Leading us into step 4.

Step 4: Analyze your data: Find meaningful insights

Now we are going to analyze our data to find insights. This is where we start to answer our research questions. Keep in mind that step 4 and step 5 (tell the story) have some overlap, because creating visualizations is part of both the analysis process and the reporting process.

The task of uncovering insights is to scour through the codes that emerge from the data and draw meaningful correlations from them. It is also about making sure each insight is distinct and has enough data to support it.

Part of the analysis is to establish how much each code relates to different demographics and customer profiles, and identify whether there’s any relationship between these data points.

Manually create sub-codes to improve the quality of insights

If your code frame only has one level, you may find that your codes are too broad to be able to extract meaningful insights. This is where it is valuable to create sub-codes to your primary codes. This process is sometimes referred to as meta coding.

Note: If you take an inductive coding approach, you can create sub-codes as you are reading through your feedback data and coding it.

While time-consuming, this exercise will improve the quality of your analysis. Here is an example of what sub-codes could look like.

Example of sub-codes

You need to carefully read your qualitative data to create quality sub-codes. But as you can see, the depth of analysis is greatly improved. By calculating the frequency of these sub-codes, you can get insight into which customer service problems you can immediately address.

Correlate the frequency of codes to customer segments

Many businesses use customer segmentation, and you may have your own respondent segments that you can apply to your qualitative analysis. Segmentation is the practice of dividing customers or research respondents into subgroups.

Segments can be based on:

  • Demographics
  • Any other data type that you care to segment by

It is particularly useful to see the occurrence of codes within your segments. If one of your customer segments is considered unimportant to your business, but they are the cause of nearly all customer service complaints, it may be in your best interest to focus attention elsewhere. This is a useful insight!
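
One simple way to check this is a cross-tabulation of codes against segments. The sketch below uses pandas on a hypothetical coded-feedback table; the segment and code labels are invented.

```python
import pandas as pd

# Hypothetical coded feedback: one row per (response, code) pair,
# with the customer segment each respondent belongs to.
coded_feedback = pd.DataFrame({
    "segment": ["Enterprise", "Enterprise", "SMB", "SMB", "SMB", "Free tier"],
    "code": ["poor customer service", "pricing", "poor customer service",
             "poor customer service", "ease of use", "pricing"],
})

# How often does each code occur within each segment?
print(pd.crosstab(coded_feedback["code"], coded_feedback["segment"]))
```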

Manually visualizing coded qualitative data

There are formulas you can use to visualize key insights in your data. The formulas we will suggest are imperative if you are measuring a score alongside your feedback.

If you are collecting a metric alongside your qualitative data, this is a key visualization. Impact answers the question: “What’s the impact of a code on my overall score?”. Using Net Promoter Score (NPS) as an example, you first need to:

  • Calculate the overall NPS (A)
  • Calculate the NPS of the subset of responses that do not contain that theme (B)
  • Subtract B from A

Then you can use this simple formula to calculate the impact of a code on NPS (a small worked sketch follows).

Visualizing qualitative data: Calculating the impact of a code on your score
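
Here is a minimal Python sketch of that calculation, assuming hypothetical responses that each carry an NPS score and a list of codes. It uses the standard NPS definition: percentage of promoters (9-10) minus percentage of detractors (0-6).

```python
# Hypothetical responses: each has an NPS score (0-10) and its codes.
responses = [
    {"score": 9, "codes": ["ease of use"]},
    {"score": 3, "codes": ["poor customer service", "pricing"]},
    {"score": 10, "codes": ["ease of use"]},
    {"score": 6, "codes": ["poor customer service"]},
    {"score": 8, "codes": []},
    {"score": 2, "codes": ["poor customer service"]},
]

def nps(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

overall = nps([r["score"] for r in responses])                       # A
without = nps([r["score"] for r in responses
               if "poor customer service" not in r["codes"]])        # B
print(f"Impact of 'poor customer service' on NPS: {overall - without:.1f} points")
```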

You can then visualize this data using a bar chart.

You can download our CX toolkit - it includes a template to recreate this.

Trends over time

This analysis can help you answer questions like: “Which codes are linked to decreases or increases in my score over time?”

We need to compare two sequences of numbers: NPS over time and code frequency over time. Using Excel (or a few lines of code), calculate the correlation between the two sequences, which can be either positive (the more mentions of the code, the higher the NPS) or negative (the more mentions, the lower the NPS).
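
If you prefer code to a spreadsheet, the same correlation can be computed with NumPy. The monthly figures below are invented purely to illustrate the calculation.

```python
import numpy as np

# Hypothetical monthly series: NPS and how often a code was mentioned.
nps_by_month = np.array([32, 30, 28, 25, 27, 24])
code_frequency = np.array([12, 15, 18, 25, 22, 27])   # e.g. "slow support"

correlation = np.corrcoef(nps_by_month, code_frequency)[0, 1]
print(f"Correlation with NPS: {correlation:.2f}")
# A strongly negative value suggests the code is linked to drops in the score.
```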

Now you need to plot code frequency against the absolute value of code correlation with NPS. Here is the formula:

Analyzing qualitative data: Calculate which codes are linked to increases or decreases in my score

The visualization could look like this:

Visualizing qualitative data trends over time

These are two examples, but there are more. For a third manual formula, and to learn why word clouds are not an insightful form of analysis, read our visualizations article .

Using a text analytics solution to automate analysis

Automated text analytics solutions enable codes and sub-codes to be pulled out of the data automatically. This makes it far faster and easier to identify what’s driving negative or positive results. And to pick up emerging trends and find all manner of rich insights in the data.

Another benefit of AI-driven text analytics software is its built-in capability for sentiment analysis, which provides the emotive context behind your feedback and other qualitative textual data therein.

Thematic provides text analytics that goes further by allowing users to apply their expertise on business context to edit or augment the AI-generated outputs.

Since the move away from manual research is generally about reducing the human element, adding human input to the technology might sound counter-intuitive. However, this is mostly to make sure important business nuances in the feedback aren’t missed during coding. The result is a higher accuracy of analysis. This is sometimes referred to as augmented intelligence .

Codes displayed by volume within Thematic. You can 'manage themes' to introduce human input.

Step 5: Report on your data: Tell the story

The last step of analyzing your qualitative data is to report on it, to tell the story. At this point, the codes are fully developed and the focus is on communicating the narrative to the audience.

A coherent outline of the qualitative research, the findings and the insights is vital for stakeholders to discuss and debate before they can devise a meaningful course of action.

Creating graphs and reporting in Powerpoint

Typically, qualitative researchers take the tried and tested approach of distilling their report into a series of charts, tables and other visuals which are woven into a narrative for presentation in Powerpoint.

Using visualization software for reporting

With data transformation and APIs, the analyzed data can be shared with data visualisation software such as Power BI, Tableau, Google Data Studio, or Looker. Power BI and Tableau are among the most preferred options.

Visualizing your insights inside a feedback analytics platform

Feedback analytics platforms, like Thematic, incorporate visualisation tools that intuitively turn key data and insights into graphs.  This removes the time consuming work of constructing charts to visually identify patterns and creates more time to focus on building a compelling narrative that highlights the insights, in bite-size chunks, for executive teams to review.

Using a feedback analytics platform with visualization tools means you don’t have to use a separate product for visualizations. You can export graphs into Powerpoints straight from the platforms.

Two examples of qualitative data visualizations within Thematic

Conclusion - Manual or Automated?

There are those who remain deeply invested in the manual approach - because it’s familiar, because they’re reluctant to spend money and time learning new software, or because they’ve been burned by the overpromises of AI.  

For projects that involve small datasets, manual analysis makes sense, for example when the objective is simply to quantify a simple question like “Do customers prefer X concepts to Y?”. If the findings are being extracted from a small set of focus groups and interviews, sometimes it’s easier to just read them.

However, as new generations come into the workplace, it’s technology-driven solutions that feel more comfortable and practical. And the merits are undeniable.  Especially if the objective is to go deeper and understand the ‘why’ behind customers’ preference for X or Y. And even more especially if time and money are considerations.

The ability to collect a free flow of qualitative feedback data at the same time as the metric means AI can cost-effectively scan, crunch, score and analyze a ton of feedback from one system in one go. And time-intensive processes like focus groups, or coding, that used to take weeks, can now be completed in a matter of hours or days.

But aside from the ever-present business case to speed things up and keep costs down, there are also powerful research imperatives for automated analysis of qualitative data: namely, accuracy and consistency.

Finding insights hidden in feedback requires consistency, especially in coding.  Not to mention catching all the ‘unknown unknowns’ that can skew research findings and steering clear of cognitive bias.

Some say that without manual data analysis researchers won’t get an accurate “feel” for the insights. However, the larger the data sets are, the harder it is to sort through and organize feedback pulled from different places. And the harder it is to stay on course, the greater the risk of drawing incorrect or incomplete conclusions.

Though the process steps for qualitative data analysis have remained pretty much unchanged since psychologist Paul Felix Lazarsfeld paved the path a hundred years ago, the impact digital technology has had on the types of qualitative feedback data and the approach to the analysis is profound.

If you want to try an automated feedback analysis solution on your own qualitative data, you can get started with Thematic .



What Is Data Analysis: A Comprehensive Guide

In the contemporary business landscape, gaining a competitive edge is imperative, given the challenges such as rapidly evolving markets, economic unpredictability, fluctuating political environments, capricious consumer sentiments, and even global health crises. These challenges have reduced the room for error in business operations. For companies striving not only to survive but also to thrive in this demanding environment, the key lies in embracing the concept of data analysis . This involves strategically accumulating valuable, actionable information, which is leveraged to enhance decision-making processes.

If you're interested in forging a career in data analysis and wish to discover the top data analysis courses in 2024, we invite you to explore our informative video. It will provide insights into the opportunities to develop your expertise in this crucial field.

Data analysis inspects, cleans, transforms, and models data to extract insights and support decision-making. As a data analyst , your role involves dissecting vast datasets, unearthing hidden patterns, and translating numbers into actionable information.

Data analysis plays a pivotal role in today's data-driven world. It helps organizations harness the power of data, enabling them to make decisions, optimize processes, and gain a competitive edge. By turning raw data into meaningful insights, data analysis empowers businesses to identify opportunities, mitigate risks, and enhance their overall performance.

1. Informed Decision-Making

Data analysis is the compass that guides decision-makers through a sea of information. It enables organizations to base their choices on concrete evidence rather than intuition or guesswork. In business, this means making decisions more likely to lead to success, whether choosing the right marketing strategy, optimizing supply chains, or launching new products. By analyzing data, decision-makers can assess various options' potential risks and rewards, leading to better choices.

2. Improved Understanding

Data analysis provides a deeper understanding of processes, behaviors, and trends. It allows organizations to gain insights into customer preferences, market dynamics, and operational efficiency .

3. Competitive Advantage

Organizations can identify opportunities and threats by analyzing market trends, consumer behavior , and competitor performance. They can pivot their strategies to respond effectively, staying one step ahead of the competition. This ability to adapt and innovate based on data insights can lead to a significant competitive advantage.


4. Risk Mitigation

Data analysis is a valuable tool for risk assessment and management. Organizations can assess potential issues and take preventive measures by analyzing historical data. For instance, data analysis detects fraudulent activities in the finance industry by identifying unusual transaction patterns. This not only helps minimize financial losses but also safeguards the reputation and trust of customers.

5. Efficient Resource Allocation

Data analysis helps organizations optimize resource allocation. Whether it's allocating budgets, human resources, or manufacturing capacities, data-driven insights can ensure that resources are utilized efficiently. For example, data analysis can help hospitals allocate staff and resources to the areas with the highest patient demand, ensuring that patient care remains efficient and effective.

6. Continuous Improvement

Data analysis is a catalyst for continuous improvement. It allows organizations to monitor performance metrics, track progress, and identify areas for enhancement. This iterative process of analyzing data, implementing changes, and analyzing again leads to ongoing refinement and excellence in processes and products.

The data analysis process is a structured sequence of steps that leads from raw data to actionable insights. Here are the main steps (a compact end-to-end sketch follows the list):

  • Data Collection: Gather relevant data from various sources, ensuring data quality and integrity.
  • Data Cleaning: Identify and rectify errors, missing values, and inconsistencies in the dataset. Clean data is crucial for accurate analysis.
  • Exploratory Data Analysis (EDA): Conduct preliminary analysis to understand the data's characteristics, distributions, and relationships. Visualization techniques are often used here.
  • Data Transformation: Prepare the data for analysis by encoding categorical variables, scaling features, and handling outliers, if necessary.
  • Model Building: Depending on the objectives, apply appropriate data analysis methods, such as regression, clustering, or deep learning.
  • Model Evaluation: Depending on the problem type, assess the models' performance using metrics like Mean Absolute Error, Root Mean Squared Error , or others.
  • Interpretation and Visualization: Translate the model's results into actionable insights. Visualizations, tables, and summary statistics help in conveying findings effectively.
  • Deployment: Implement the insights into real-world solutions or strategies, ensuring that the data-driven recommendations are implemented.
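
As a rough illustration only, the sketch below strings several of these steps together with pandas and scikit-learn. The file weekly_sales.csv and the columns ad_spend, discount_rate, and sales are hypothetical, and a plain linear regression stands in for whichever model your objective actually calls for.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1-2. Collect and clean (hypothetical file and column names).
df = pd.read_csv("weekly_sales.csv").drop_duplicates().dropna()

# 3-4. Explore and transform.
print(df.describe())
X = df[["ad_spend", "discount_rate"]]
y = df["sales"]

# 5. Build a model on a training split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 6. Evaluate on held-out data.
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# 7-8. Interpret and deploy: inspect coefficients, share findings, act on them.
print(dict(zip(X.columns, model.coef_)))
```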

1. Regression Analysis

Regression analysis is a powerful method for understanding the relationship between a dependent and one or more independent variables. It is applied in economics, finance, and social sciences. By fitting a regression model, you can make predictions, analyze cause-and-effect relationships, and uncover trends within your data.

2. Statistical Analysis

Statistical analysis encompasses a broad range of techniques for summarizing and interpreting data. It involves descriptive statistics (mean, median, standard deviation), inferential statistics (hypothesis testing, confidence intervals), and multivariate analysis. Statistical methods help make inferences about populations from sample data, draw conclusions, and assess the significance of results.

3. Cohort Analysis

Cohort analysis focuses on understanding the behavior of specific groups or cohorts over time. It can reveal patterns, retention rates, and customer lifetime value, helping businesses tailor their strategies.
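
One common way to run a cohort analysis is to group customers by the month of their first purchase and track how many remain active in later months. The pandas sketch below shows the idea on a tiny, made-up orders table; real analyses would use much larger datasets and additional metrics such as revenue per cohort.

```python
# A sketch of a monthly retention cohort table with pandas. The 'orders'
# DataFrame and its columns are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-01-20",
        "2024-03-02", "2024-02-14", "2024-03-01", "2024-04-11",
    ]),
})

orders["order_month"] = orders["order_date"].dt.to_period("M")
orders["cohort"] = orders.groupby("customer_id")["order_month"].transform("min")
orders["period"] = (orders["order_month"] - orders["cohort"]).apply(lambda d: d.n)

# Number of distinct active customers per cohort and months since first order
cohort_counts = (orders.groupby(["cohort", "period"])["customer_id"]
                       .nunique().unstack(fill_value=0))

# Retention rate relative to each cohort's size in its first month (period 0)
retention = cohort_counts.divide(cohort_counts[0], axis=0)
print(retention)
```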

4. Content Analysis

Content analysis is a qualitative data analysis method used to study the content of textual, visual, or multimedia data. Social sciences, journalism, and marketing often employ it to analyze themes, sentiments, or patterns within documents or media. Content analysis can help researchers gain insights from large volumes of unstructured data.

5. Factor Analysis

Factor analysis is a technique for uncovering underlying latent factors that explain the variance in observed variables. It is commonly used in psychology and the social sciences to reduce the dimensionality of data and identify underlying constructs. Factor analysis can simplify complex datasets, making them easier to interpret and analyze.

6. Monte Carlo Method

This method is a simulation technique that uses random sampling to solve complex problems and make probabilistic predictions. Monte Carlo simulations allow analysts to model uncertainty and risk, making it a valuable tool for decision-making.
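
As a simple illustration, the sketch below simulates an uncertain project cost many times and estimates the probability of exceeding a budget; the distributions and figures are assumptions chosen purely for demonstration.

```python
# A minimal Monte Carlo sketch: simulating total project cost when three
# cost components are uncertain, then estimating the risk of exceeding budget.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000

labor = rng.normal(50_000, 8_000, n_sims)       # assumed cost distributions
materials = rng.normal(30_000, 5_000, n_sims)
overruns = rng.exponential(5_000, n_sims)

total_cost = labor + materials + overruns
budget = 95_000

print("expected cost:", round(total_cost.mean(), 2))
print("P(cost > budget):", (total_cost > budget).mean())
```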

7. Text Analysis

Also known as text mining, this method involves extracting insights from textual data. It analyzes large volumes of text, such as social media posts, customer reviews, or documents. Text analysis can uncover sentiment, topics, and trends, enabling organizations to understand public opinion, customer feedback, and emerging issues.

8. Time Series Analysis

Time series analysis deals with data collected at regular intervals over time. It is essential for forecasting, trend analysis, and understanding temporal patterns. Time series methods include moving averages, exponential smoothing, and autoregressive integrated moving average (ARIMA) models. They are widely used in finance for stock price prediction, meteorology for weather forecasting, and economics for economic modeling.
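
The sketch below illustrates two of the simpler techniques mentioned here, a moving average and exponential smoothing, on a synthetic monthly series using pandas; a full forecasting workflow (for example, with ARIMA) would add model selection and validation on top of this.

```python
# Illustrating two basic time series techniques, a moving average and
# exponential smoothing, on a synthetic monthly sales series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
dates = pd.date_range("2023-01-01", periods=24, freq="MS")
sales = pd.Series(100 + np.arange(24) * 2 + rng.normal(0, 5, 24), index=dates)

moving_avg = sales.rolling(window=3).mean()     # 3-month moving average
exp_smooth = sales.ewm(alpha=0.3).mean()        # exponential smoothing

print(pd.DataFrame({"sales": sales,
                    "moving_avg": moving_avg,
                    "exp_smooth": exp_smooth}).tail())
```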

9. Descriptive Analysis

Descriptive analysis   involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.

10. Inferential Analysis

Inferential analysis   aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.

11. Exploratory Data Analysis (EDA)

EDA   focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.

12. Diagnostic Analysis

Diagnostic analysis aims to understand the cause-and-effect relationships within the data. It investigates the factors or variables that contribute to specific outcomes or behaviors. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis.

13. Predictive Analysis

Predictive analysis   involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.

14. Prescriptive Analysis

Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.

Data analysis is a versatile and indispensable tool that finds applications across various industries and domains. Its ability to extract actionable insights from data has made it a fundamental component of decision-making and problem-solving. Let's explore some of the key applications of data analysis:

1. Business and Marketing

  • Market Research: Data analysis helps businesses understand market trends, consumer preferences, and competitive landscapes. It aids in identifying opportunities for product development, pricing strategies, and market expansion.
  • Sales Forecasting: Data analysis models can predict future sales based on historical data, seasonality, and external factors. This helps businesses optimize inventory management and resource allocation.

2. Healthcare and Life Sciences

  • Disease Diagnosis: Data analysis is vital in medical diagnostics, from interpreting medical images (e.g., MRI, X-rays) to analyzing patient records. Machine learning models can assist in early disease detection.
  • Drug Discovery: Pharmaceutical companies use data analysis to identify potential drug candidates, predict their efficacy, and optimize clinical trials.
  • Genomics and Personalized Medicine: Genomic data analysis enables personalized treatment plans by identifying genetic markers that influence disease susceptibility and response to therapies.

3. Finance

  • Risk Management: Financial institutions use data analysis to assess credit risk, detect fraudulent activities, and model market risks.
  • Algorithmic Trading: Data analysis is integral to developing trading algorithms that analyze market data and execute trades automatically based on predefined strategies.
  • Fraud Detection: Credit card companies and banks employ data analysis to identify unusual transaction patterns and detect fraudulent activities in real time.

4. Manufacturing and Supply Chain

  • Quality Control: Data analysis monitors and controls product quality on manufacturing lines. It helps detect defects and ensure consistency in production processes.
  • Inventory Optimization: By analyzing demand patterns and supply chain data, businesses can optimize inventory levels, reduce carrying costs, and ensure timely deliveries.

5. Social Sciences and Academia

  • Social Research: Researchers in social sciences analyze survey data, interviews, and textual data to study human behavior, attitudes, and trends. It helps in policy development and understanding societal issues.
  • Academic Research: Data analysis is crucial to scientific research in fields such as physics, biology, and environmental science. It assists in interpreting experimental results and drawing conclusions.

6. Internet and Technology

  • Search Engines: Google uses complex data analysis algorithms to retrieve and rank search results based on user behavior and relevance.
  • Recommendation Systems: Services like Netflix and Amazon leverage data analysis to recommend content and products to users based on their past preferences and behaviors.

7. Environmental Science

  • Climate Modeling: Data analysis is essential in climate science, where temperature, precipitation, and other environmental data are analyzed to understand climate patterns and predict future trends.
  • Environmental Monitoring: Remote sensing data analysis monitors ecological changes, including deforestation, water quality, and air pollution.

1. Descriptive Statistics

Descriptive statistics provide a snapshot of a dataset's central tendencies and variability. These techniques help summarize and understand the data's basic characteristics.

2. Inferential Statistics

Inferential statistics involve making predictions or inferences based on a sample of data. Techniques include hypothesis testing, confidence intervals, and regression analysis. These methods are crucial for drawing conclusions from data and assessing the significance of findings.

3. Regression Analysis

Regression analysis explores the relationship between one or more independent variables and a dependent variable. It is widely used for prediction and for understanding causal links. Linear, logistic, and multiple regression are common in various fields.

4. Clustering Analysis

Clustering analysis is an unsupervised learning method that groups similar data points. K-means clustering and hierarchical clustering are examples. This technique is used for customer segmentation, anomaly detection, and pattern recognition.
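
A minimal K-means sketch with scikit-learn, using invented customer features, might look like this:

```python
# A short K-means sketch for customer segmentation on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical features: annual spend and number of orders per customer
X = np.column_stack([rng.normal(500, 150, 300), rng.poisson(12, 300)])

X_scaled = StandardScaler().fit_transform(X)    # scale before clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

print(kmeans.labels_[:10])        # cluster assignment for the first 10 customers
print(kmeans.cluster_centers_)    # centroids in scaled feature space
```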

5. Classification Analysis

Classification analysis assigns data points to predefined categories or classes. It's often used in applications like spam email detection, image recognition, and sentiment analysis. Popular algorithms include decision trees, support vector machines, and neural networks.
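
As an illustration, the sketch below trains a small decision tree on scikit-learn's built-in iris dataset; any real classification task (spam, sentiment, image labels) would follow the same train, fit, evaluate pattern.

```python
# A classification sketch using a decision tree on scikit-learn's built-in
# iris dataset, standing in for tasks like spam detection or sentiment analysis.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```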

6. Time Series Analysis

Time series analysis deals with data collected over time, making it suitable for forecasting and trend analysis. Techniques like moving averages, autoregressive integrated moving average (ARIMA) models, and exponential smoothing are applied in fields like finance, economics, and weather forecasting.

7. Text Analysis (Natural Language Processing - NLP)

Text analysis techniques, part of NLP, enable the extraction of insights from textual data. These methods include sentiment analysis, topic modeling, and named entity recognition. Text analysis is widely used for analyzing customer reviews, social media content, and news articles.

8. Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction technique that simplifies complex datasets while retaining important information. It transforms correlated variables into a set of linearly uncorrelated components, making it easier to analyze and visualize high-dimensional data.
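
A short scikit-learn sketch on synthetic correlated features shows the basic PCA workflow:

```python
# A PCA sketch: reducing correlated synthetic features to two principal
# components and checking how much variance they retain.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
base = rng.normal(size=(200, 1))
# Five correlated features derived from the same underlying factor
X = np.hstack([base * w + rng.normal(0, 0.3, size=(200, 1)) for w in (1, 2, 3, 4, 5)])

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)

print("explained variance ratio:", pca.explained_variance_ratio_)
X_reduced = pca.transform(X_scaled)     # 200 x 2 matrix for further analysis
print(X_reduced.shape)
```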

9. Anomaly Detection

Anomaly detection identifies unusual patterns or outliers in data. It's critical in fraud detection, network security, and quality control. Techniques like statistical methods, clustering-based approaches, and machine learning algorithms are employed for anomaly detection.
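
As one simple statistical approach, the sketch below flags transactions whose z-score is more than three standard deviations from the mean; the transaction amounts are synthetic.

```python
# A simple statistical anomaly detection sketch: flagging transactions whose
# z-score exceeds 3 standard deviations from the mean.
import numpy as np

rng = np.random.default_rng(9)
amounts = np.concatenate([rng.normal(100, 20, 995),    # typical transactions
                          [950, 1200, 700, 1030]])     # injected high-value outliers

z_scores = (amounts - amounts.mean()) / amounts.std()
anomalies = amounts[np.abs(z_scores) > 3]
print("flagged transactions:", np.sort(anomalies))
```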

10. Data Mining

Data mining involves the automated discovery of patterns, associations, and relationships within large datasets. Techniques like association rule mining, frequent pattern analysis, and decision tree mining extract valuable knowledge from data.

11. Machine Learning and Deep Learning

ML and deep learning algorithms are applied for predictive modeling, classification, and regression tasks. Techniques like random forests, support vector machines, and convolutional neural networks (CNNs) have revolutionized various industries, including healthcare, finance, and image recognition.

12. Geographic Information Systems (GIS) Analysis

GIS analysis combines geographical data with spatial analysis techniques to solve location-based problems. It's widely used in urban planning, environmental management, and disaster response.

  • Uncovering Patterns and Trends: Data analysis allows researchers to identify patterns, trends, and relationships within the data. By examining these patterns, researchers can better understand the phenomena under investigation. For example, in epidemiological research, data analysis can reveal the trends and patterns of disease outbreaks, helping public health officials take proactive measures.
  • Testing Hypotheses: Research often involves formulating hypotheses and testing them. Data analysis provides the means to evaluate hypotheses rigorously. Through statistical tests and inferential analysis, researchers can determine whether the observed patterns in the data are statistically significant or simply due to chance.
  • Making Informed Conclusions: Data analysis helps researchers draw meaningful and evidence-based conclusions from their research findings. It provides a quantitative basis for making claims and recommendations. In academic research, these conclusions form the basis for scholarly publications and contribute to the body of knowledge in a particular field.
  • Enhancing Data Quality: Data analysis includes data cleaning and validation processes that improve the quality and reliability of the dataset. Identifying and addressing errors, missing values, and outliers ensures that the research results accurately reflect the phenomena being studied.
  • Supporting Decision-Making: In applied research, data analysis assists decision-makers in various sectors, such as business, government, and healthcare. Policy decisions, marketing strategies, and resource allocations are often based on research findings.
  • Identifying Outliers and Anomalies: Outliers and anomalies in data can hold valuable information or indicate errors. Data analysis techniques can help identify these exceptional cases, whether in medical diagnosis, financial fraud detection, or product quality control.
  • Revealing Insights: Research data often contain hidden insights that are not immediately apparent. Data analysis techniques, such as clustering or text analysis, can uncover these insights. For example, sentiment analysis of social media data can reveal public opinion and trends on various topics in the social sciences.
  • Forecasting and Prediction: Data analysis allows for the development of predictive models. Researchers can use historical data to build models forecasting future trends or outcomes. This is valuable in fields like finance for stock price predictions, meteorology for weather forecasting, and epidemiology for disease spread projections.
  • Optimizing Resources: Research often involves resource allocation. Data analysis helps researchers and organizations optimize resource use by identifying areas where improvements can be made or costs can be reduced.
  • Continuous Improvement: Data analysis supports the iterative nature of research. Researchers can analyze data, draw conclusions, and refine their hypotheses or research designs based on their findings. This cycle of analysis and refinement leads to continuous improvement in research methods and understanding.

Data analysis is an ever-evolving field driven by technological advancements. The future of data analysis promises exciting developments that will reshape how data is collected, processed, and utilized. Here are some of the key trends shaping that future:

1. Artificial Intelligence and Machine Learning Integration

Artificial intelligence (AI) and machine learning (ML) are expected to play a central role in data analysis. These technologies can automate complex data processing tasks, identify patterns at scale, and make highly accurate predictions. AI-driven analytics tools will become more accessible, enabling organizations to harness the power of ML without requiring extensive expertise.

2. Augmented Analytics

Augmented analytics combines AI and natural language processing (NLP) to assist data analysts in finding insights. These tools can automatically generate narratives, suggest visualizations, and highlight important trends within data. They enhance the speed and efficiency of data analysis, making it more accessible to a broader audience.

3. Data Privacy and Ethical Considerations

As data collection becomes more pervasive, privacy concerns and ethical considerations will gain prominence. Future data analysis trends will prioritize responsible data handling, transparency, and compliance with regulations like GDPR . Differential privacy techniques and data anonymization will be crucial in balancing data utility with privacy protection.

4. Real-time and Streaming Data Analysis

The demand for real-time insights will drive the adoption of real-time and streaming data analysis. Organizations will leverage technologies like Apache Kafka and Apache Flink to process and analyze data as it is generated. This trend is essential for fraud detection, IoT analytics, and monitoring systems.

5. Quantum Computing

Quantum computing has the potential to revolutionize data analysis by solving complex problems exponentially faster than classical computers. Although the field is still in its infancy, its impact on optimization, cryptography, and simulations will be significant once practical quantum computers become available.

6. Edge Analytics

With the proliferation of edge devices in the Internet of Things (IoT), data analysis is moving closer to the data source. Edge analytics allows for real-time processing and decision-making at the network's edge, reducing latency and bandwidth requirements.

7. Explainable AI (XAI)

Interpretable and explainable AI models will become crucial, especially in applications where trust and transparency are paramount. XAI techniques aim to make AI decisions more understandable and accountable, which is critical in healthcare and finance.

8. Data Democratization

The future of data analysis will see more democratization of data access and analysis tools. Non-technical users will have easier access to data and analytics through intuitive interfaces and self-service BI tools , reducing the reliance on data specialists.

9. Advanced Data Visualization

Data visualization tools will continue to evolve, offering more interactivity, 3D visualization, and augmented reality (AR) capabilities. Advanced visualizations will help users explore data in new and immersive ways.

10. Ethnographic Data Analysis

Ethnographic data analysis will gain importance as organizations seek to understand human behavior, cultural dynamics, and social trends. Combining this qualitative approach with quantitative methods will provide a holistic understanding of complex issues.

11. Data Analytics Ethics and Bias Mitigation

Ethical considerations in data analysis will remain a key trend. Efforts to identify and mitigate bias in algorithms and models will become standard practice, ensuring fair and equitable outcomes.

Our Data Analytics courses have been meticulously crafted to equip you with the necessary skills and knowledge to thrive in this swiftly expanding industry. Our instructors will lead you through immersive, hands-on projects, real-world simulations, and illuminating case studies, ensuring you gain the practical expertise necessary for success. Through our courses, you will acquire the ability to dissect data, craft enlightening reports, and make data-driven choices that have the potential to steer businesses toward prosperity.

Having addressed the question of what is data analysis, if you're considering a career in data analytics, it's advisable to begin by researching the prerequisites for becoming a data analyst. You may also want to explore the Post Graduate Program in Data Analytics offered in collaboration with Purdue University. This program offers a practical learning experience through real-world case studies and projects aligned with industry needs. It provides comprehensive exposure to the essential technologies and skills currently employed in the field of data analytics.

Program comparison:

  • Data Analyst (Simplilearn): All Geos; 11 months; no coding experience required; skills include Python, MySQL, Tableau, NumPy and more (10+ skills); additional benefits: Applied Learning via Capstone and 20+ industry-relevant Data Analytics projects; cost: $$
  • Post Graduate Program In Data Analytics (Purdue): All Geos; 8 months; basic coding experience required; skills include Data Analytics, Statistical Analysis using Excel, Data Analysis with Python and R, and more; additional benefits: Purdue Alumni Association Membership, free six-month IIMJobs Pro-Membership, access to Integrated Practical Labs; cost: $$$$
  • Data Analytics Bootcamp (Caltech): US; 6 months; no coding experience required; skills include Data Visualization with Tableau, Linear and Logistic Regression, Data Manipulation and more; additional benefits: Caltech CTME Circle Membership; cost: $$$$

1. What is the difference between data analysis and data science? 

Data analysis primarily involves extracting meaningful insights from existing data using statistical techniques and visualization tools. Data science, by contrast, encompasses a broader spectrum: it incorporates data analysis as a subset while also involving machine learning, deep learning, and predictive modeling to build data-driven solutions and algorithms.

2. What are the common mistakes to avoid in data analysis?

Common mistakes to avoid in data analysis include neglecting data quality issues, failing to define clear objectives, overcomplicating visualizations, not considering algorithmic biases, and disregarding the importance of proper data preprocessing and cleaning. Additionally, avoiding making unwarranted assumptions and misinterpreting correlation as causation in your analysis is crucial.

An Introduction to Data Analysis

Fabio Nelli

In this chapter, you take the first steps in the world of data analysis, learning in detail about all the concepts and processes that make up this discipline. The concepts discussed in this chapter are helpful background for the following chapters, where these concepts and procedures are applied in the form of Python code, through the use of several libraries that are discussed in later chapters.

Nelli, F. (2023). An Introduction to Data Analysis. In: Python Data Analytics. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9532-8_1

Mastering Qualitative Data Analysis: The Step-by-Step Process & 5 Essential Methods

Wondering how to analyze qualitative data and get actionable insights? Search no further!

This article will help you analyze qualitative data and fuel your product growth . We’ll walk you through the following steps:

  • 5 Qualitative data analysis methods.
  • 5 Steps to analyzing qualitative data.
  • How to act on research findings.

Let’s get started!

  • Qualitative data analysis turns non-numerical data into insights, including customer feedback , surveys, and interviews.
  • Qualitative data provides rich insights for refining strategies and uncovering growth opportunities.
  • The benefits of qualitative data analysis include deep insight, flexibility, contextual understanding, and amplifying participant voices.
  • Challenges include data overload, reliability, and validity concerns, as well as time-intensive nature.
  • Qualitative and quantitative data analysis differ in analyzing numerical vs. non-numerical data.
  • Qualitative data methods include content analysis, narrative analysis, discourse analysis, thematic analysis, and grounded theory analysis.
  • Content analysis involves systematically analyzing text to identify patterns and themes.
  • Narrative analysis interprets stories to understand customer feelings and behaviors.
  • The thematic analysis identifies patterns and themes in data.
  • Grounded theory analysis generates hypotheses from data.
  • Choosing a method depends on research questions, data type, context, expertise, and resources.
  • The qualitative data analysis process involves defining questions, gathering data, organizing, coding, and making hypotheses.
  • Userpilot facilitates qualitative data collection through surveys and offers NPS dashboard analytics.
  • Building in-app experiences based on qualitative insights enhances user experience and drives satisfaction.
  • The iterative qualitative data analysis process aims to refine understanding of the customer base.
  • Userpilot can automate data collection and analysis, saving time and improving customer understanding. Book a demo to learn more!

What is qualitative data analysis?

Qualitative data analysis is the process of turning qualitative data — information that can’t be measured numerically — into insights.

This could be anything from customer feedback, surveys , website recordings, customer reviews, or in-depth interviews.

Qualitative data is often seen as more “rich” and “human” than quantitative data, which is why product teams use it to refine customer acquisition and retention strategies and uncover product growth opportunities.

Benefits of qualitative data analysis

Here are the key advantages of qualitative data analysis that underscore its significance in research endeavors:

  • Deep Insight: Qualitative data analysis allows for a deep understanding of complex patterns and trends by uncovering underlying meanings, motivations, and perspectives.
  • Flexibility: It offers flexibility in data interpretation, allowing researchers to explore emergent themes and adapt their analysis to new insights.
  • Contextual Understanding: Qualitative analysis enables the exploration of contextual factors, providing rich context to quantitative findings and uncovering hidden dynamics.
  • Participant Voice: It amplifies the voices of participants, allowing their perspectives and experiences to shape the analysis and resulting interpretations.

Challenges of qualitative data analysis

While qualitative data analysis offers rich insights, it comes with its challenges:

  • Data Overload and Management: Qualitative data often comprises large volumes of text or multimedia, posing challenges in organizing, managing, and analyzing the data effectively.
  • Reliability and Validity: Ensuring the reliability and validity of qualitative findings can be complex, as there are fewer standardized measures compared to quantitative analysis, requiring meticulous attention to methodological rigor.
  • Time-Intensive Nature: Qualitative data analysis can be time-consuming, involving iterative processes of coding, categorizing, and synthesizing data, which may prolong the research timeline and increase resource requirements.

Quantitative data analysis vs. Qualitative data analysis

Here let’s understand the difference between qualitative and quantitative data analysis.

Quantitative data analysis is analyzing numerical data to locate patterns and trends. Quantitative research uses numbers and statistics to systematically measure variables and test hypotheses.

Qualitative data analysis, on the other hand, is the process of analyzing non-numerical, textual data to derive actionable insights from it. This data type is often more "open-ended" and can be harder to draw conclusions from.

However, qualitative data can provide insights that quantitative data cannot. For example, qualitative data can help you understand how customers feel about your product, their unmet needs , and what motivates them.

What are the 5 qualitative data analysis methods?

There are 5 main methods of qualitative data analysis. Which one you choose will depend on the type of data you collect, your preferences, and your research goals.

Content analysis

Content analysis is a qualitative data analysis method that systematically analyzes a text to identify specific features or patterns. This could be anything from a customer interview transcript to survey responses, social media posts, or customer success calls.

The data is first coded, which means assigning it labels or categories.

For example, if you were looking at customer feedback , you might code all mentions of “price” as “P,” all mentions of “quality” as “Q,” and so on. Once manual coding is done, start looking for patterns and trends in the codes.

Content analysis is a prevalent qualitative data analysis method, as it is relatively quick and easy to do and can be done by anyone with a good understanding of the data.
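
To make the coding step concrete, here is a toy Python sketch that tags feedback snippets with category codes and counts them. Keyword matching stands in for the manual judgment a human coder would apply, and the snippets and codebook are invented for illustration.

```python
# A toy illustration of the coding step described above: tagging feedback
# snippets with category codes (via simple keyword matching, standing in
# for manual coding) and counting how often each code appears.
from collections import Counter

feedback = [
    "The price is too high for small teams",
    "Great quality, but the price keeps going up",
    "Support quality has improved a lot",
]

codebook = {"P": ["price", "cost"], "Q": ["quality"]}   # hypothetical codes

codes = []
for snippet in feedback:
    text = snippet.lower()
    for code, keywords in codebook.items():
        if any(word in text for word in keywords):
            codes.append(code)

print(Counter(codes))   # Counter({'P': 2, 'Q': 2})
```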

The advantages of content analysis

  • Rich insights: Content analysis can provide rich, in-depth insights into how customers feel about your product, what their unmet needs are, and their motives.
  • Easily replicable: Once you have developed a coding system, content analysis is relatively quick and easy because it’s a systematic process.
  • Affordable: Content analysis requires very little investment since all you need is a good understanding of the data, and it doesn’t require any special software.

The disadvantages of content analysis

  • Time-consuming: Coding the data is time-consuming, particularly if you have a large amount of data to analyze.
  • Ignores context: Content analysis can ignore the context in which the data was collected, which may lead to misinterpretations.
  • Reductive approach: Some people argue that content analysis is a reductive approach to qualitative data because it involves breaking the data down into smaller pieces.

Narrative analysis

Analysing qualitative data with narrative analysis involves identifying, analyzing, and interpreting customer or research participants’ stories. The input can be in the form of customer interviews, testimonials, or other text data.

Narrative analysis helps product managers understand customers' feelings toward the product, identify trends in customer behavior, and personalize their in-app experiences .

The advantages of narrative analysis

  • Provide a rich form of data: The stories people tell give a deep understanding of customers’ needs and pain points.
  • Collects unique, in-depth data based on customer interviews or testimonials.

The disadvantages of narrative analysis

  • Hard to implement in studies with large numbers of participants.
  • Time-consuming: Transcribing customer interviews or testimonials is labor-intensive.
  • Hard to reproduce since it relies on unique customer stories.

Discourse analysis

Discourse analysis is about understanding how people communicate with each other. It can be used to analyze written or spoken language. For instance, product teams can use discourse analysis to understand how customers talk about their products on the web.

The advantages of discourse analysis

  • Uncovers motivation behind customers’ words.
  • Gives insights into customer data.

The disadvantages of discourse analysis

  • Takes a large amount of time and effort as the process is highly specialized and requires training and practice. There’s no “right” way to do it.
  • Focuses solely on language.

Thematic analysis

Thematic analysis is a popular qualitative data analysis method that identifies patterns and themes in data. The process of thematic analysis involves coding the data, which means assigning it labels or categories.

It can be paired with sentiment analysis to determine whether a piece of writing is positive, negative, or neutral. This can be done using a lexicon (i.e., a list of words and their associated sentiment scores).
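
As a rough illustration of the lexicon approach, the sketch below scores responses with a tiny, made-up word list; production tools use much larger lexicons or trained models.

```python
# A minimal lexicon-based sentiment sketch: each word carries a score and the
# sum classifies the response. The lexicon here is a tiny hypothetical example.
lexicon = {"love": 2, "great": 1, "slow": -1, "broken": -2, "confusing": -1}

def sentiment(text: str) -> str:
    score = sum(lexicon.get(word, 0) for word in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Love the dashboard, great export feature"))    # positive
print(sentiment("The editor is slow and the search is broken"))  # negative
```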

A common use case for thematic analysis in SaaS companies is customer feedback analysis with NPS surveys and NPS tagging to identify patterns among your customer base.

The advantages of thematic analysis

  • Doesn’t require training: Anyone with little training on how to label the data can perform thematic analysis.
  • It’s easy to draw important information from raw data: Surveys or customer interviews can be easily converted into insights and quantitative data with the help of labeling.
  • An effective way to process large amounts of data if done automatically: you will need AI tools for this.

The disadvantages of thematic analysis

  • Doesn’t capture complex narratives: If the data isn’t coded correctly, it can be difficult to identify themes since it’s a phrase-based method.
  • Difficult to implement from scratch because a perfect approach must be able to merge and organize themes in a meaningful way, producing a set of themes that are not too generic and not too large.

Grounded theory analysis

Grounded theory analysis is built on the constant comparative method, meaning qualitative researchers analyze and code the data as it is collected.

The grounded theory approach is useful for product managers who want to understand how customers interact with their products . It can also be used to generate hypotheses about how customers will behave in the future.

Suppose product teams want to understand the reasons behind a high churn rate. They can use customer surveys and grounded theory to analyze responses and develop hypotheses about why users churn and how to reengage inactive ones .

You can filter the disengaged/inactive user segment to make analysis easier.

The advantages of grounded theory analysis

  • Because it is grounded in actual data, the analysis is more accurate than methods that rely on prior assumptions.
  • Useful for poorly researched topics, since it generates hypotheses directly from the data.
  • Reduces bias in interpreting qualitative data, since the data is analyzed and coded as it is collected.

The disadvantages of grounded theory analysis

  • Overly theoretical
  • Requires a lot of objectivity, creativity, and critical thinking

Which qualitative data analysis method should you choose?

We have covered different qualitative data analysis techniques along with their pros and cons. Choosing the appropriate method depends on various factors, including:

  • Research Question : Different qualitative methods are suitable for different research questions.
  • Nature of Data : Consider the type of data you have collected—interview transcripts, reviews, or survey responses—and choose a method that aligns with the data’s characteristics. For instance, thematic analysis is versatile and can be applied to various types of qualitative data, while narrative analysis focuses specifically on stories and narratives.
  • Research Context : Take into account the broader context of your research. Some qualitative methods may be more prevalent or accepted in certain fields or contexts.
  • Researcher Expertise : Consider your own skills and expertise in qualitative analysis techniques. Some methods may require specialized training or familiarity with specific software tools. Choose a method that you feel comfortable with and confident in applying effectively.
  • Research Goals and Resources : Evaluate your research goals, timeline, and resources available for analysis. Some methods may be more time-consuming or resource-intensive than others. Consider the balance between the depth of analysis and practical constraints.

How to perform the qualitative data analysis process, step by step

With all that theory covered, we've distilled the essential steps of qualitative research into a simple guide for gathering and analyzing qualitative data.

Let’s dive in!

Step 1: Define your qualitative research questions

The qualitative analysis research process starts with defining your research questions . It’s important to be as specific as possible, as this will guide the way you choose to collect qualitative research data and the rest of your analysis.

Examples are:

  • What are the primary reasons customers are dissatisfied with our product?
  • How does X group of users feel about our new feature?
  • What are our customers’ needs, and how do they vary by segment?
  • How do our products fit into our customers’ lives?
  • What factors influence the low feature usage rate of the new feature?

Step 2: Gather your qualitative customer data

Now, you decide what type of data collection to use based on previously defined goals. Here are 5 methods to collect qualitative data for product companies:

  • User feedback

  • NPS follow-up questions

  • Review sites

  • User interviews
  • Focus groups

We recommend using a mix of in-app surveys and in-person interviews. The former helps to collect rich data automatically and on an ongoing basis. You can collect user feedback through in-product surveys, NPS platforms, or use Zoom for live interviews.

The latter enables you to understand the customer experience in the business context as you can ask clarifying questions during the interviews.

Step 3: Organize and categorize collected data

Before analyzing customer feedback and assigning any value, unstructured feedback data needs to be organized in a single place. This will help you detect patterns and similar themes more easily.

One way to do this is to create a spreadsheet with all the data organized by research questions. Then, arrange the data by theme or category within each research question.

You can also organize NPS responses with Userpilot . This will allow you to quickly calculate scores and see how many promoters, passives, and detractors there are for each research question.

Step 4: Use qualitative data coding to identify themes and patterns

Themes are the building blocks of analysis and help you understand how your data fits together.

For product teams, an NPS survey might reveal the following themes: product defect, pricing, and customer service. Thus, the main themes in SaaS will be around identifying friction points, usability issues, UI issues, UX issues, missing features, etc.

You need to define specific themes and then identify how often they occur. A pattern, in turn, is a relationship between two or more elements (e.g., users with a specific JTBD complain about a specific missing feature).

You can detect those patterns from survey analytics.

Pair themes with in-app customer behavior and product usage data to understand whether different user segments fall under specific feedback themes.

Following this step, you will get enough data to improve customer loyalty .

Step 5: Make hypotheses and test them

The last step in qualitative research is to analyze the collected data to find insights. Segment your users based on in-app behavior, user type, company size, or job to be done to draw meaningful conclusions.

For instance, you may notice that negative feedback stems from the customer segment that recently engaged with XYZ features. Just like that, you can pinpoint friction points and the strongest sides of your product to capitalize on.

How to perform qualitative data analysis with Userpilot

Userpilot is a product growth platform that helps product managers collect and analyze qualitative data. It offers a suite of features to make it easy to understand how users interact with your product, their needs, and how you can improve user experience.

When it comes to performing qualitative research, Userpilot is not a dedicated qualitative data analysis tool, but it has some very useful features you can use.

Collect qualitative feedback from users with in-app surveys

Userpilot facilitates the collection of qualitative feedback from users through in-app surveys.

These surveys can be strategically placed within your application to gather insights directly from users while they interact with your product.

By leveraging Userpilot’s in-app survey feature, you can gather valuable feedback on user experiences, preferences, pain points , and suggestions for improvement.

Benefit from NPS dashboard and survey analytics

With Userpilot, you can harness the power of the NPS (Net Promoter Score) dashboard and survey analytics to gain valuable insights into user sentiment and satisfaction levels.

The NPS dashboard provides a comprehensive overview of your NPS scores over time, allowing you to track changes and trends in user loyalty and advocacy.

Additionally, Userpilot’s survey analytics offer detailed insights into survey responses, enabling you to identify common themes, uncover actionable feedback, and prioritize areas for improvement.

Build different in-app experiences based on the insights from qualitative data analysis

By analyzing qualitative feedback collected through in-app surveys, you can segment users based on these insights and create targeted in-app experiences designed to address specific user concerns or enhance key workflows.

Whether it’s guiding users through new features, addressing common user challenges, or personalizing the user journey based on individual preferences, Userpilot empowers you to deliver a more engaging and personalized user experience that drives user satisfaction and product adoption.

The qualitative data analysis process is iterative and should be revisited as new data is collected. The goal is to constantly refine your understanding of your customer base and how they interact with your product.

Want to get started with qualitative analysis? Get a Userpilot Demo and automate the data collection process. Save time on mundane work and understand your customers better!

Qualitative Data Analysis

20 Preparing and Managing Qualitative Data

Mikaila Mariel Lemonik Arthur

When you have completed data collection for a qualitative research project, you will likely have voluminous quantities of data—thousands of pages of fieldnotes , hundreds of hours of interview recordings, many gigabytes of images or documents—and these quantities of data can seem overwhelming at first. Therefore, preparing and managing your data is an essential part of the qualitative research process. Researchers must find ways to organize the voluminous quantities of data into a form that is useful and workable. This chapter will explore data management and data preparation as steps in the research process, steps that help facilitate data analysis. It will also review methods for data reduction, a step designed to help researchers get a handle on the volumes of data they have collected and coalesce the data into a more manageable form. Finally, it will discuss the use of computer software in qualitative data analysis.

Data Management

Even before the first piece of data is collected, a data management system is a necessity for researchers. Data management helps to ensure that data remain safe, organized, and accessible throughout the research process and that data will be ready for analysis when that part of the project begins. Miles and Huberman (1994) outline a series of processes and procedures that are important parts of data management.

First, researchers must attend to the formatting and layout of their data. Developing a consistent template for storing fieldnotes, interview transcripts, documents, and other materials, and including consistent metadata (data about your data) such as time, date, pseudonym of interviewee, source of document, person who interacted with the data, and other details will be of much use later in the research process.

Similarly, it is essential to keep detailed records of the research process and all research decisions that are made. Storing these inside one’s head is insufficient. Researchers should keep a digital file or a paper notebook in which all details and decisions are recorded. For instance, how was the sample conducted? Which potential respondents never ended up going through with the interview? What software decisions were made? When did the digital voice recorder fail, and for how long? What day did the researcher miss going into the field because they were ill? And, going forward, what decisions were made about each step in the analytical process?

As data begin to be collected, it is necessary to have appropriate, well-developed physical and/or digital filing systems to ensure that data are safely stored, well-organized, and easy to retrieve when needed. For paper storage, it is typical to use a set of file folders organized chronologically, by respondent, or by some other meaningful system. For digital storage, researchers might use a similar set of folders or might keep all data in a single folder but use careful file naming conventions (e.g. RespondentPseudonym_Date_Transcript) to make it easy to find each piece of data. Some researchers will keep duplicate copies of all data and use these copies to begin to sort, mark, and organize data in ways that enable the presence of relationships and themes to emerge. For instance, researchers might sort interview transcripts by the way respondents answered a particular key question. Or they might sort fieldnotes by the central activities that took place in the field that day. Activities such as these can be facilitated by the use of index cards, color-coding systems, sticky notes, marginal annotations, or even just piles. Cross-referencing systems may be useful to ensure that thematic files can be connected to respondent-based files or to other relevant thematic files. Finally, it is essential that researchers develop a system of backups to ensure that data is not lost in the event of a catastrophic hard drive failure, a house fire, lack of access to the office for an extended period, or some other type of disaster.
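
As one possible way to operationalize such conventions, the Python sketch below saves each transcript under a predictable name with a small metadata sidecar file; the field names and folder layout are illustrative, not a prescribed standard.

```python
# A small sketch of consistent file naming plus a metadata record, in the
# spirit of the conventions described above. Fields and paths are illustrative.
import json
from datetime import date
from pathlib import Path

def save_transcript(pseudonym: str, interview_date: date, text: str,
                    root: Path = Path("data/transcripts")) -> Path:
    """Store a transcript with a predictable name and a metadata sidecar."""
    root.mkdir(parents=True, exist_ok=True)
    stem = f"{pseudonym}_{interview_date.isoformat()}_Transcript"
    (root / f"{stem}.txt").write_text(text, encoding="utf-8")
    metadata = {"pseudonym": pseudonym, "date": interview_date.isoformat(),
                "collected_by": "researcher", "source": "interview"}
    (root / f"{stem}.json").write_text(json.dumps(metadata, indent=2))
    return root / f"{stem}.txt"

print(save_transcript("Rowan", date(2024, 3, 14), "Interviewer: ..."))
```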

One more issue to attend to in data management is research ethics. It is essential to ensure that confidential data is protected from disclosure; that identifying information (including signed consent forms) are not kept with or linkable to data; and that all researchers, analysts, interns, and administrative personnel involved in a study sign statements of confidentiality to ensure they understand the importance of nondisclosure (Berg 2009). Note that such documents will not protect researchers and research personnel from subpoena by the courts—if research documents will contain information that could expose participants to criminal or legal liability, there are additional concerns to consider and researchers should do due diligence to protect themselves and their respondents (see, e.g., Khan 2019), though the methods and mechanisms for doing so are beyond the scope of this text. Researchers must attend to data security protocols, many of which were likely agreed to in the IRB submission process. For example, paper research records should be locked securely where they cannot be seen by visitors or by personnel or accessed by accident. Digital records should be securely stored in password protected files that meet current standards for strong passwords. Cloud storage or backups should have similar protections, and researchers should carefully review the terms of service to ensure that they continue to own their data and that the data are protected from disclosure.

Preparing Data

In most cases, data are not entirely ready for analysis at the moment at which they are collected. Additional steps must be taken to prepare data for analysis, and these steps are somewhat different depending on the form in which the data exists and the approach to data collection that was used: fieldnotes from observation or ethnography, interviews and other recorded data, or documentary data like texts and images.

When researchers conduct ethnographic or observational research, they typically do not have the ability to maintain verbatim recordings. Instead, they maintain fieldnotes. Maintaining fieldnotes is a tricky and time-consuming process! In most instances, researchers cannot take notes—at least not too many—while present in the research site without making themselves conspicuous. Therefore, they need to limit themselves to a couple of jotted words or sentences to help jog their memories later on, though the quantity of notes that can be taken in the field is higher these days because of the possibility of taking notes via smartphone, a notetaking process largely indistinguishable from the socially-ubiquitous practices of text messaging and social media posts. Immediately after leaving the site, researchers use the skeleton of notes they have taken to write up full notes recording everything that happened. And later, within a day or so, many researchers go back over the fieldnotes to edit and refine the fieldnotes into a useful document for later analysis. As this process suggests, analysis is already beginning even while the research is ongoing, as researchers make notes and annotations about theoretical ideas, connections to explore, potential answers to their research questions, and other things in the process of refining their fieldnotes.

When fleshing out fieldnotes, researchers should be attentive to the distinctions between recollections they believe are accurate, interpretations and reflections they have made, and analytical thoughts that develop later through the process of refining the fieldnotes. It is surprisingly easy for a slight mistake in recording, say, which people did what, or in what sequence a series of events occurred, to entirely change the interpretation of circumstances observed in the field. To demonstrate how such issues can arise, consider the following two hypothetical fieldnote excerpts:

In Excerpt A, the most reasonable interpretation of events is probably that Sarah walked into the room and found Marisol, the victim of an accident, and was concerned about her. In Excerpt B, in contrast, Sarah probably caused the accident herself. Yet the words are exactly the same in both excerpts—they have just been slightly rearranged. This example highlights how important careful attention to detail is in recording, refining, and analyzing fieldnotes (and other forms of qualitative data, for that matter).

Fieldnotes contain within them a vast array of different types of data: records of verbal interactions between people, observations about social practices and interactions, researchers’ inferences and interpretations of social meanings and understandings, and other thoughts (Berg 2009). Therefore, as researchers work to prepare their fieldnotes for analysis, they may need to work through them again to organize and categorize different types of notes for different uses during analysis. The data collected from ethnographic or observational research can also include documents, maps, images, and recordings, which then need to be prepared and managed alongside the fieldnotes.

Interviews & Other Recordings

First of all, interview researchers need to think carefully about the form in which they will obtain their data. While most researchers audio- or video-record their interviews, it is useful to keep additional information alongside the recordings. Typically, this might include a form for keeping track of themes and data from each interview, including details of the context in which the interview took place, such as the location and who was present; biographical information about the participant; notes about theoretical ideas, questions, or themes that occur to the researcher during the interview; and reminders of particularly notable or valuable points during the interview. These information sheets should also contain the same pseudonym or respondent number that is used during the interview recording, and thus can be helpful in matching biographical details to participant quotes at the time of ultimate writeup. Interviewers may also want to consider taking notes throughout the interview, as notes can highlight elements of body language, facial expression, or more subtle comments that might not be picked up on audio recordings. While video recordings can pick up such details, they tend to make participants more self-conscious than do audio recordings.

Once the interview has concluded, recordings need to be transcribed. While automated transcription has improved in recent years, it still falls far short of what is needed to make an accurate transcript. Transcription quality is typically assessed using a metric called the Word Error Rate—basically, dividing the number of incorrect words by the number of words that should appear in the passage—though there are other, more complex assessment metrics that take into consideration individual words' importance to meaning. As of 2020, automated transcription services still tended to have Word Error Rates of over 10%, which may be sufficient for general understanding (such as in the case of apps that convert voicemails to text) but which is definitely too high of an error rate for use in data analysis. And error rates increase when audio recordings contain background noise, accented speech, or the use of dialects other than Standard American English (SAE). There can also be ethical concerns about data privacy when automated services are used (Khamsi 2019). However, automated services can be cost-effective, with a typical cost of about 25 cents per minute of audio (Brewster 2020). For a typical study involving 40 interviews averaging 90 minutes each, this would come to a total cost of about $900, far less than the cost of human transcription, which averages about $1 per minute these days. Human transcription is far more accurate, with extremely low Word Error Rates, especially for words essential to meaning. But human transcribers also suffer from increased error when transcribing audio with noisy backgrounds, where multiple speakers may be interrupting one another (for instance in recordings of focus groups), or in cases where speakers have stronger accents or speak in dialects other than Standard American English. For example, a study examining court reporters—professional transcribers with special experience and training at transcribing speech in legal contexts—working in Philadelphia who were assigned to transcribe African American English had average Word Error Rates of above 15%, and these errors were significant enough to fundamentally alter meaning in over 30% of the speech segments they transcribed (Jones et al. 2019).
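
In its simplest form, the Word Error Rate calculation looks like the sketch below; note that production metrics align the two transcripts with edit distance so that insertions and deletions are counted properly, which this toy version does not do. The example sentences are invented.

```python
# Word Error Rate in its simplest form: the share of words in the reference
# transcript that the automated transcript gets wrong. (Real metrics use edit
# distance to handle insertions and deletions; this toy compares positions.)
reference = "please state your name for the record".split()
hypothesis = "please say your name for the record".split()

errors = sum(r != h for r, h in zip(reference, hypothesis))
errors += abs(len(reference) - len(hypothesis))   # extra or missing words
wer = errors / len(reference)
print(f"WER = {wer:.0%}")                         # one substitution in seven words
```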

Researchers can, of course, transcribe their recordings themselves, an option that vastly reduces cost but adds an enormous amount of time to the data preparation process. Specialized software or devices like foot-pedal-controlled playback can make transcription easier, but it can still take up to four hours to transcribe one hour of recording. This is because people speak far faster than they type: a typical person speaks at a rate of about 150 words per minute but types at more like 30-60 words per minute. Another possibility is a hybrid approach in which the researcher uses automated transcription or voice recognition to get a basic, if error-laden, transcript and then corrects it by hand. Given the time that will be invested in correcting the transcript by listening to the recording while reviewing the text, even lower-quality transcription services may be acceptable, such as the automated captioning that video services like YouTube offer, though these services also present data privacy concerns. Alternatively, researchers might use voice-recognition software, whose accuracy can typically be improved by training it on the user’s voice. This approach can be especially helpful when interview respondents speak with accents, as the researcher can re-record the interview in their own voice and feed it into software already trained to understand the researcher’s voice.

Table 1 below compares different approaches to transcription in terms of financial cost, time, error rate, and ethical concerns. Costs for transcription by the researcher and for hybrid approaches are typically limited to acquiring software and hardware to aid the transcription process. For a new researcher, this might entail several hundred dollars for a foot pedal, a good headset with microphone, and software, though these are often one-time costs not repeated with each project. In contrast, even automated transcription can cost nearly a thousand dollars per project, with costs far higher for hired human transcriptionists, who have much better accuracy. In terms of time, though, automated and hired services require far less of the researcher’s time. Hired services will require some turnaround time, more if the volume of data is high, but the researcher can work on other things during that period. For self and hybrid transcription approaches, researchers can expect to put in much more time on transcription than they did conducting interviews. For a typical project involving 40 interviews averaging 90 minutes each, the time required to conduct the interviews and transcribe them (not including time spent preparing for interviews, recruiting participants, traveling, analyzing data, or any other task) can easily exceed 300 hours. If a researcher has 10 hours per week to devote to their project, it would take over 30 weeks just to collect and transcribe the data before analysis could begin. And after transcription is complete, most researchers find it useful to listen to the recordings again, transcript in hand, to correct any lingering errors and make notes about avenues for exploration during data analysis.

Table 1. Comparing Transcription Approaches for a Typical Interview-Based Research Project
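The time and cost figures above follow directly from a few multiplications. The sketch below reproduces them for the hypothetical project of 40 interviews averaging 90 minutes each, using the ballpark rates cited in the text (about 25 cents per minute for automated transcription, about $1 per minute for hired transcription, and roughly 4 hours of self-transcription per hour of audio); actual vendor rates and transcription speeds will vary.

```python
N_INTERVIEWS = 40
MINUTES_EACH = 90

audio_minutes = N_INTERVIEWS * MINUTES_EACH      # 3,600 minutes of audio
audio_hours = audio_minutes / 60                 # 60 hours spent interviewing

# Ballpark per-minute rates cited in the text.
automated_cost = audio_minutes * 0.25            # about $900 per project
hired_cost = audio_minutes * 1.00                # about $3,600 per project

# Self-transcription at roughly 4 hours of work per hour of recording.
transcription_hours = audio_hours * 4            # 240 hours
total_hours = audio_hours + transcription_hours  # 300 hours interviewing + transcribing

weeks_needed = total_hours / 10                  # 30 weeks at 10 hours per week
print(automated_cost, hired_cost, total_hours, weeks_needed)
```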

Documents and Images

Data preparation is far different when data consists of documents and images, as these materials already exist in a fixed form and do not need to be transcribed. Here, concerns are more likely to revolve around storage, filing, and organization, which will be discussed later in this chapter. However, it can be important to conduct a preliminary review of the data to better understand what is there. And for visual data, it may be especially useful to take notes on the content of each visual and the researcher’s impressions of it, as a starting point for thinking about how to work further with the materials (Saldaña 2016).

There are special concerns about research involving documents and images that are worth noting here. First of all, it is important to keep sampling issues in mind when using documents. Sampling is not always a concern (for instance, research involving newspaper articles may involve a well-conducted random sample, or photographs may have been taken by the researchers themselves according to a clear purposive sampling process), but many projects involving textual data have used sampling procedures where it remains unclear how representative the sample is of the universe of data. Researchers must keep careful notes on where the documents and images included in their data came from and what sorts of limitations may exist in the data, and they should include a discussion of these issues in any reporting on their research.

When writing about interview data, it is typical to include excerpts from the interview transcripts. Similarly, when using documents or visual materials, it is preferable to include some of the original data. However, this can be more complex due to copyright concerns. When using published works, there are real legal limits on the quantity of text that you can include without getting permission from the copyright owner, who may make you pay for the privilege. This is not an issue for works created or published more than 95 years ago, as their copyrights have expired. For more recent works, using more than a small portion of the text typically violates copyright, and the use of an image is almost never permitted unless it has been specifically released from copyright (or created by the researcher themselves). Archival data may also be subject to specific usage restrictions imposed by the archive or donor. Copyright can thus make it very difficult to provide the data in a form useful to the reader, so you might need to obtain copyright clearance or find other creative ways of providing the data.

Data Reduction

In qualitative data analysis, data collection and data analysis are often not two distinct research phases. Rather, as researchers collect data, they begin to develop themes, ask analytical questions, write theoretical memos, and otherwise begin the work of analysis. And when researchers are analyzing data, they may find they need to go back and collect more to flesh out certain areas that need further elaboration (Taylor, Bogdan, and DeVault 2016). But as researchers move further toward analysis, one of the first steps is reading through all of the data they have collected. Many qualitative researchers recommend taking notes on the data and/or annotating it with simple notations like circles or highlighting to draw attention to passages that seem especially fruitful for later analysis (Saldaña 2016). This is often called “pre-coding.” Other approaches to pre-coding include noting hypotheses about what might emerge elsewhere in the data, summarizing the main ideas of each piece of data and annotating it with details about the respondent or the circumstances of its creation, and taking preliminary notes about concepts or ideas that emerge.

This sort of work is often called “preliminary analysis,” as it enables researchers to start making connections and working with themes and theoretical ideas before reaching the point of drawing actual conclusions. It is also a form of data reduction. In qualitative analysis, the volume of data collected in any given research project is often enormous, far more than can be productively dealt with in any particular project or publication. Thus, data reduction refers to the process of reducing large volumes of data so that the more meaningful or important parts are accessible. As sociologist Kristen Luker points out in her text Salsa Dancing into the Social Sciences (2008), what we are really trying to do is recognize patterns, and data reduction is a process of sifting through, digesting, and thinking about our data until we can see patterns we might not have seen before. Luker argues that one important way to help ourselves see patterns is to talk about our data with others (lots of others, and not just other social scientists) until what we are explaining starts to make sense.

There are a variety of approaches to data reduction. Which of these is useful for a particular project depends on the type and form of data, the priorities of the researcher, and the goals of the research project, so each researcher must decide for themselves how to proceed. One approach is summarization. Here, researchers write short summaries of the data: summaries of individual interview transcripts, of particular days or weeks of fieldnotes, or of documents. These summaries can then be used for preliminary analysis rather than requiring full engagement with the larger body of data. Another approach involves writing memos about the data in which connections, patterns, or theoretical ideas can be laid out with reference to particular segments of the data. A third approach is annotation, in which marginal notes are used to highlight or draw attention to particularly important or noteworthy segments of the data. And Luker’s suggestion of conversations about our data with others can be understood as a form of data reduction, especially if we record notes about those conversations.

One of the approaches to data reduction that many analysts find most useful is the creation of typologies, or systems by which objects, events, people, or ideas can be classified into categories. In constructing typologies, researchers develop a set of mutually exclusive categories (no case can be placed into more than one category of the typology; Berg 2009) that are, ideally, also exhaustive, so that no case is left out of the set of categories (an “other” category can always be used for those that are hard to classify). They then go through all their pieces of data or data elements, be they interview participants, events recorded in fieldnotes, photographs, tweets, or something else, and place each one into a category. Then, they examine the contents of each category to see what common elements and analytical ideas emerge and write notes about those elements and ideas.

One approach to data reduction that qualitative researchers often fall back on, but with which they should be extremely careful, is quantification. Quantification involves the transformation of non-numerical data into numerical data. For example, if a researcher counts the number of interview respondents who talk about a particular issue, that is a form of quantification. Some limited quantification is common in qualitative analysis, though its use should be particularly rare in ethnographic research, given that ethnographic research typically relies on one or a very small number of cases. However, quantification should be limited to circumstances where it provides particularly useful or illuminating descriptive information about the data; it should not serve as a core analytical tool. In addition, given that it is exceptionally uncommon for qualitative research projects to produce generalizable findings, any discussion of quantified data should focus on counts rather than percentages. Counts are descriptive (“35 out of 40 interview respondents said they had argued with housemates over chores in the past week”), while percentages suggest broader and more generalizable claims (“87.5% of respondents said they had argued with housemates over chores in the past week”).

Qualitative Data Analysis Software

As part of the process of preparing data for analysis and planning an analysis strategy, many (though not all) qualitative researchers today use software applications to facilitate their work. The use of such technologies has had a profound impact on the way research is carried out, as have many technological changes throughout history. Take a much older example: the development of technology permitting the audio recording of interviews. This technology made it possible to develop verbatim transcripts, whereas prior interview-based research had to rely on handwritten notes conveying the interview content or, if the interviewer had significant financial resources, perhaps on a stenographer. Recordings and verbatim transcripts also made it possible for researchers to minutely analyze speech patterns, specific word choices, tones of voice, and other elements that could not previously have been preserved.

Today’s technologies make it easier to store and retrieve data, make it faster to process and analyze data, and provide access to new analytical possibilities. On a basic level, software can allow for more sophisticated possibilities for linking data to memos and other documents. And there are a variety of other benefits (Adler and Clark 2008) to the use of software-aided analysis (often referred to as CAQDAS, or computer-aided qualitative data analysis software). It can allow for more attention to detail, more systematic analysis, and the use of more cases, especially when dealing with large data sets or in circumstances where some quantification is desirable. The use of CAQDAS can enhance the perception of rigor, which can be useful when bringing qualitative data to bear in settings where those using data are more used to quantitative analysis. When coding (to be discussed further in the chapter on qualitative coding), software enhances flexibility and complexity, and may enliven the coding process. And software can provide complex relational analysis tools that go well beyond what would be possible by hand.

However, there are limitations to the use of CAQDAS as well (Adler and Clark 2008). Software can promote ways of thinking about data that are disconnected from qualitative ideals, whether through reductions in the connection between data and context or the increased pressure to quantify. Each individual software application creates a specific model of the architecture of data and knowledge, and analysis may become shaped or constrained by this architecture. Coding schemes, taxonomies, and strategies may reflect the capacities available in and the structures prioritized by the software rather than reflecting what is actually happening in the data itself, and this can further homogenize research, as researchers draw from a few common software applications rather than from a wide variety of personal approaches to analysis. Software can also increase the psychic distance between the researcher or analyst and their data and reduce the likelihood of researchers understanding the limitations of their data. The tools available in CAQDAS applications tend to emphasize typical data rather than unusual data, and so outliers or negative cases may be missed. Finally, CAQDAS does not always reduce the amount of time that a research project takes, especially for newer users and in cases with smaller sets of data. This is because there can be very steep learning curves and prolonged set-up procedures.

The fact that this list of limitations is somewhat longer than the list of benefits should not be taken to suggest that researchers avoid CAQDAS-based approaches. Software truly does make forms of research possible that would not have been possible without it, speeds data processing tasks, and makes a variety of analytical tasks much easier to do, especially when they require attention to detail. And digital technologies, including both software applications and hardware devices, facilitate much of how qualitative researchers work today. There are a wide variety of technological aids to the qualitative research process, each with different functions.

First of all, digital technologies can be used for capturing qualitative data. This may seem obvious, but as the example of audio recording above suggests, the development of technologies like audio and film recording, especially via cellphone or other small personal devices, led to profound changes in the way qualitative research is carried out as well as an expansion in the types of research that are possible. Other technologies that have had similar impacts include the photocopier and scanner, and more recently the possibility to use a cell phone to capture photographs of documents in archives (without the flash on to avoid damaging delicate items). Finally, videoconferencing software makes it possible to interview people who are halfway around the world, and most videoconferencing platforms have a built-in option to save a video record of the conversation, and potentially autocaption it. It’s also worth noting that digital technologies provide access to sources of data that simply did not exist in the past, whether interviewing via videoconferencing, content analysis of social media, or ethnography of massively-multiplayer online games or worlds.

Software applications are very useful for data management tasks. The ability to store, file, and search electronic documents makes the management of huge quantities of data much more feasible. Storing metadata with files can help enormously with the management of visual data and other files. Word processing programs are also relevant here. They help us produce and revise text and reports, compile and edit our fieldnotes and transcriptions, write memos, make tables, count words, and search for and count specific words and phrases. Graphics programs can also facilitate the creation of graphs, charts, infographics, and other data displays. Finally, speech recognition programs aid our transcription process and, for some of us, our writing process.
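As a small illustration of the kind of counting and searching that word processors and related tools automate, the sketch below tallies word frequencies and searches for a phrase in an invented snippet of transcript. It is a toy example, not a substitute for a program’s built-in search features.

```python
from collections import Counter
import re

def word_counts(text: str) -> Counter:
    """Count how often each word appears in a transcript or set of fieldnotes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

transcript = "We argued about chores. Chores again! I always do the chores."
counts = word_counts(transcript)

print(counts.most_common(3))                      # the three most frequent words
print(counts["chores"])                           # 3 occurrences of "chores"
print(transcript.lower().count("argued about"))   # simple phrase search: 1 match
```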

Coding programs fall somewhere between data reduction and data analysis in their functions. Such software applications typically provide researchers with the ability to apply one or more codes to specific segments of text, search for and retrieve all segments that have had particular codes applied to them, and look at relationships between different codes. Some also provide data management features, allowing researchers to store memos, documents, and other materials alongside the coded text, and allow for interrater reliability testing (to be discussed in another chapter). Finally, there are a variety of data analysis tools. These tools allow researchers to carry out functions like organizing coded data into maps or diagrams, testing hypotheses, merging work carried out by different researchers, building theory, utilizing formal comparative methods, creating diagrams of networks, and others. Many of these features will be discussed in subsequent chapters.
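The core operations that coding programs provide, applying one or more codes to a segment and then retrieving every segment tagged with a given code, can be pictured with a very simple data structure. The sketch below is purely conceptual, with invented segments and codes; it is not how any particular CAQDAS package stores its data.

```python
from collections import defaultdict

# Each coded segment records where it came from, its text, and the codes applied to it.
segments = [
    {"source": "Interview 07", "text": "The training never really stuck with me.",
     "codes": ["training quality", "retention"]},
    {"source": "Interview 12", "text": "Support answered quickly, but the fixes did not last.",
     "codes": ["support", "training quality"]},
    {"source": "Fieldnotes 3/14", "text": "Two staff members mention repeating the same onboarding.",
     "codes": ["training quality"]},
]

# Build an index from code to segments; retrieval by code is then a simple lookup.
index = defaultdict(list)
for segment in segments:
    for code in segment["codes"]:
        index[code].append(segment)

for segment in index["training quality"]:
    print(segment["source"], "-", segment["text"])
```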

Choosing the Right Software

There are so many programs out there that carry out each of the functions discussed above, with new ones appearing constantly. Because the state of technology changes all the time, it is outside the scope of this chapter to detail specific options for software applications, though online resources can be helpful in this regard (see, e.g., University of Surrey n.d.). But researchers still need to make decisions about which software to use. So, how do researchers choose the right qualitative software application or applications for their projects? There are four primary sets of questions researchers should ask themselves to help with this decision.

First, what functions does the researcher need and what does the project require? As discussed above, programs have very different functions. In many cases, researchers may need to combine multiple programs to get access to all the functions they need. In other cases, researchers may need only a simple software application already available on their computers.

Second, researchers should consider how they use technology. There are a variety of questions that are relevant here. For example, what kind of device will be used, a desktop computer, laptop, tablet, or phone? What operating system, Windows, Mac/iOS, Chrome, or Android? How much experience and skill do researchers have with computers—do they need software applications that are very easy to use, or can they handle command-line interfaces that require some programming skills? Do they prefer software that is installed on their devices or a cloud-based approach? And will the researcher be working alone or as part of a team where multiple people need to contribute and share access to the same materials?

Third, what type of data will be used? Will it be textual, visual, audio, or video? Will data come from multiple sources and styles, or will it all be consistent? Is the data organized or free-form? What is the magnitude of the data that will be analyzed?

Finally, what resources does the researcher already have available? What software can they access, whether already available on their personal computing devices or via licenses provided by their employer or college/university? What degree of technical support can they access, and are technical support personnel familiar with CAQDAS? And how much money do they have available to pay for software on a one-time or ongoing basis? Note that some software can be purchased, while other software is provided as a service with a monthly subscription fee. And even when software is purchased, licenses may only provide access for a limited time period such as a year. Thus, both short-term and long-term financial costs and resource availability should be assessed prior to committing to a software package.

  • Transcribe about 10 minutes of an audio interview—one good source might be your local NPR station’s website. Be sure that your transcription is an exact record of what was said, including any pauses, laughter, vulgarities, or other kinds of things you might not typically write in an academic context, and that you transcribe both questions and responses. What was it like to complete this exercise?
  • Use the course listings at your college or university as a set of data. Develop a typology of different types of courses— not based on the department or school offering them or the course number alone—and classify courses within this typology. What does this exercise tell you about the curriculum at your college or university?
  • Review the notes, documents, and other materials you have already collected from this course and develop a new system of file management for them, with digital or physical folders, subfolders, and labels or file names that make items easy to locate.

Glossary

Fieldnotes: Qualitative notes recorded by researchers in relation to their observation and/or participation of participants, social circumstances, events, etc. in which they document occurrences, interactions, and other details they have observed in their observational or ethnographic research.

Data management: The process of organizing, preserving, and storing data so that it can be used effectively.

Metadata: Data about other data.

Data reduction: The process of reducing the volume of data to make it more usable while maintaining the integrity of the data.

Summarization: The process of creating abridged or shortened versions of content or texts that still keep intact the main points and ideas they contain.

Typologies: Classification systems.

Quantification: The transformation of non-numerical data into numerical data.

CAQDAS: An acronym for "computer-aided qualitative data analysis software," or software that helps to facilitate qualitative data analysis.

Coding: The process of assigning observations to categories.

Interrater reliability: The extent to which multiple raters or coders assign the same or a similar score, code, or rating to a given text, item, or circumstance.

Social Data Analysis Copyright © 2021 by Mikaila Mariel Lemonik Arthur is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Published on 18.4.2024 in Vol 26 (2024)

Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis


Original Paper

  • H Echo Wang, DrPH 1
  • Jonathan P Weiner, DrPH 1, 2
  • Suchi Saria, PhD 3
  • Hadi Kharrazi, MD, PhD 1, 2

1 Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States

2 Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States

3 Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States

Corresponding Author:

Hadi Kharrazi, MD, PhD

Bloomberg School of Public Health

Johns Hopkins University

624 N Broadway, Hampton House

Baltimore, MD

United States

Phone: 1 443 287 8264

Email: [email protected]

Background: The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited.

Objective: This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics.

Methods: We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations.

Results: The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations.

Conclusions: Caution must be taken when interpreting fairness measures’ face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities.

Introduction

Background of Algorithmic Bias

Predictive algorithms and machine learning tools are increasingly integrated into clinical decision-making and population health management. However, with the increasing reliance on predictive algorithms comes a growing concern of exacerbating health disparities [ 1 - 3 ]. Evidence has shown that widely used algorithms that use past health care expenditures to predict high-risk patients have systematically underestimated the health care needs of Black patients [ 4 ]. In addition, studies have shown that predictive performances of models predicting intensive care unit mortality, 30-day psychiatric readmission, and asthma exacerbation were worse in populations with lower socioeconomic status [ 5 , 6 ].

With algorithmic bias as a potentially pervasive issue, a few checklists have been published to qualitatively identify and understand the potential biases derived from predictive models [ 7 , 8 ]. However, no agreed-upon quantitative method exists to routinely assess whether deployed models will lead to biased results and exacerbate health disparities faced by marginalized groups [ 2 , 9 ]. In this study, we define algorithmic bias as the differential results or performance of predictive models that may lead to differential allocation or outcomes between subgroups [ 10 - 12 ]. In addition, we define disparity as the difference in the quality of health care (the degree to which health services increase the likelihood of desired health outcomes) received by a marginalized population that is not due to access-related factors, clinical needs, preferences, and appropriateness of intervention [ 10 , 13 ]. Fairness metrics, which are a set of mathematical expressions that formalize certain equality between groups (eg, equal false negative rates [FNRs]), were proposed to measure and detect biases in machine learning models [ 12 , 14 ]. Although the machine learning community has shown that fairness metrics are a promising way to identify algorithmic bias, these metrics are criticized for being insufficient to reflect the heterogeneous and dynamic nature of health care [ 15 , 16 ]. Fairness metrics can also be misleading or conflicting due to their narrow focus on equal rates between groups [ 12 , 15 ]. Furthermore, these metrics could be interpreted without context-specific judgment or domain knowledge, thus failing to connect predictions to interventions and the downstream health care disparity [ 15 , 17 ]. Most importantly, these measures are often not fully tested in real-world predictive tasks and lack evidence on how well these measures’ interpretation could guide intervention planning.

Background of Disparity in 30-Day Hospital Readmission

Predicting hospital readmissions is widely studied in health care management and delivery [ 18 - 21 ]. Hospital readmissions, especially unplanned or avoidable readmissions, are not only associated with a high risk of in-hospital mortality but are also costly and burdensome to the health care system [ 19 , 22 ]. Since 2012, the Hospital Readmission Reduction Program by the Centers for Medicare & Medicaid Services (CMS) has imposed financial penalties on hospitals with excessive readmission rates [ 22 ]. CMS has consequently incentivized hospitals to segment patients by risk so that hospitals can target the delivery of resource-intensive interventions, such as transitional care and better discharge planning, to the patients at greatest risk [ 19 , 23 , 24 ]. Many hospital readmission predictive models have been published, with >350 models predicting 30-day readmission identified in prior systematic reviews and our prior work [ 7 , 18 , 19 , 21 , 25 ]. The disparity in hospital readmission rates is well studied. For example, past studies have shown that Black patients have higher readmission rates after adjusting for demographic and clinical characteristics [ 26 - 29 ]. In addition to racial disparity, patients receiving care at racial and ethnic minority-serving hospitals [ 29 , 30 ] or living in disadvantaged neighborhoods have higher rates of readmission [ 31 - 33 ]. Research has also shown that disparity in health care use, including hospital readmission, is related not only to individuals’ racial and ethnic identity but also to their communities [ 34 ]. Other research has also suggested that social environments, either the place of residence or the hospital where one receives care, may explain a meaningful portion of health disparity [ 35 , 36 ].

Despite model abundance and known disparity in hospital readmissions, research has been limited in evaluating how algorithmic bias or the disparate performances of these predictive models may impact patient outcomes and downstream health disparities once deployed. The lack of evidence is even more pronounced regarding how model-guided intervention allocation may reduce or aggravate existing health disparities between populations. To address this gap in evidence, in this study, we aimed to (1) implement a selection of fairness metrics to evaluate whether the application of common 30-day readmission predictive models may lead to bias between racial and income groups and (2) interpret the selected fairness metrics and assess their usefulness in the context of facilitating equitable allocation of interventions. In this paper, we take the perspective of a health system or payer who uses an established, validated algorithm to identify patients at high risk of unplanned readmission so that targeted intervention can be planned for these patients. Thus, our main concern for algorithmic bias is the unequal allocation of intervention resources and the resulting unequal health outcomes. Specifically, we are concerned about risk scores systematically underestimating or overestimating needs for a certain group, assuming the model we deploy is validated and has acceptable overall predictive performance.

Study Population and Data

This retrospective study included 1.9 million adult inpatient discharges in Maryland and 8.7 million inpatient discharges in Florida from 2016 to 2019. The State Inpatient Databases (SIDs), which are maintained by the United States Agency for Healthcare Research and Quality as part of the Healthcare Cost and Utilization Project (HCUP), were used for this analysis. The SIDs include longitudinal hospital care data in the United States, inclusive of all insurance payers (eg, Medicare, Medicaid, private insurance, and the uninsured) and all patient ages [ 37 ]. The SIDs capture >97% of all eligible hospital discharges in each state [ 38 ]. Maryland and Florida were selected due to their different population sizes, compositions (eg, racial and ethnic distribution and urban to rural ratio), and health care environments (Maryland’s all-payer model vs Medicaid expansion not adopted in Florida) [ 39 , 40 ]. In addition, Maryland and Florida are among a small subset of states in which the SIDs contain a “VisitLink” variable that tracks unique patients within the state and across years from 2016 to 2019, allowing for the longitudinal analysis of readmissions across hospitals and different calendar years [ 41 ]. The SIDs were also linked to the American Hospital Association’s Annual Survey Database to obtain hospital-level information. The study population excluded admissions where patients were aged <18 years, died in hospitals, were discharged against medical advice, or had insufficient information to calculate readmission (eg, missing the VisitLink variable or length of stay).

Study Outcome

The calculation of 30-day readmission followed the definition used by the HCUP [ 42 ]. Any inpatient admission was counted as an index admission. The all-cause 30-day readmission rate was defined as the number of admissions with at least 1 subsequent hospital admission within 30 days, divided by the total number of admissions during the study period. Unplanned, all-cause 30-day hospital readmissions were identified using the methodology developed by CMS [ 43 , 44 ]. The study cohort selection process and determination of unplanned readmission are outlined in Figure 1 .

[Figure 1. Study cohort selection process and determination of unplanned readmission.]
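As an illustration of how the outcome could be derived from discharge records, the sketch below flags each admission followed by another admission within 30 days of discharge and computes the all-cause rate as the mean of that flag. This is a sketch only, not the authors’ SAS/R implementation: the column names visit_link, admit_date, and discharge_date are hypothetical stand-ins for SID variables, and the separate CMS algorithm for classifying readmissions as planned or unplanned is not shown.

```python
import pandas as pd

def flag_30day_readmission(admissions: pd.DataFrame) -> pd.DataFrame:
    """Flag each index admission followed by another admission within 30 days.
    Assumes one row per admission with hypothetical columns: visit_link (patient ID),
    admit_date, and discharge_date (both datetime)."""
    df = admissions.sort_values(["visit_link", "admit_date"]).copy()
    next_admit = df.groupby("visit_link")["admit_date"].shift(-1)
    days_to_next = (next_admit - df["discharge_date"]).dt.days
    df["readmit_30d"] = (days_to_next >= 0) & (days_to_next <= 30)
    return df

# The all-cause 30-day readmission rate is then the mean of the flag, eg:
# rate = flag_30day_readmission(admissions)["readmit_30d"].mean()
```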

Predictive Models

The LACE index [ 45 ], the HOSPITAL score [ 46 ], and the CMS hospital-wide all-cause readmission measure [ 43 ] were included in the analysis as they were validated externally and commonly used in practice based on our prior review [ 7 ]. The LACE index and the HOSPITAL score were designed for hospital staff to identify patients at high risk of readmission for targeted intervention efforts and have been converted to a scoring system and extensively validated. Thus, the 2 models were applied to obtain the predicted risk scores without retraining, to mimic how the models were used in practice. In total, 2 of the HOSPITAL score predictors—low hemoglobin and low sodium levels at discharge—were not available in the SIDs, and thus were excluded. The total risk scores were adjusted as a result. Details of model variables and how the 2 models were implemented are reported in Multimedia Appendices 1 and 2 . The CMS measure was evaluated using 2 approaches: applied as-is with existing coefficients and retrained to generate new coefficients using 50% of the sample. To ensure comparability between the CMS measure and other models, the predicted patient-level risk was used without the hospital-level effect from the original measure, and the CMS measure was limited to the “medicine cohort” [ 43 ]. On the basis of the CMS measure’s specification report, the patient population was divided into 5 mutually exclusive cohorts: surgery or gynecology, cardiorespiratory, cardiovascular, neurology, and medicine. The cohorts were determined using the Agency for Healthcare Research and Quality Clinical Classifications Software categories [ 43 ]. The medicine cohort was randomly split 50-50 into a retraining and testing data set. The CMS measure includes age and >100 variables, representing a wide range of condition categories. The measure was trained on the retraining data set with 5 cross-validations and then run on the testing data set using the new coefficients to obtain the performance and bias metrics for the CMS retrained model. Separately, the CMS measure with the published coefficients was run on the full medicine cohort data set to obtain performance and bias metrics for the CMS as-is model. The existing model thresholds were used to classify a positive, or high-risk, class: 10 points for LACE, and high-risk (5 in the adjusted scoring) for modified HOSPITAL. The optimal threshold identified using the Youden Index [ 47 ] on the receiver operating characteristic curve was used for the 2 CMS measures.
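For the two CMS models, the classification threshold was chosen with the Youden index, the point on the receiver operating characteristic curve that maximizes sensitivity + specificity - 1 (equivalently, TPR - FPR). A minimal sketch of that selection, written with scikit-learn rather than the authors’ R code, might look like this:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Return the score threshold maximizing Youden's J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]

# Admissions at or above the threshold are then classified as high risk, eg:
# high_risk = predicted_probability >= youden_threshold(y_true, predicted_probability)
```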

We measured predictive performances and biases between Black and White subpopulations and between low-income and other-income subpopulations. Race is a normalized variable in the HCUP that indicates race and ethnicity. The low-income group was defined as the fourth quartile of the median state household income, whereas the remaining 3 quartiles were grouped as other income. The median state income quartiles were provided in the HCUP SIDs and were calculated based on the median income of the patient’s zip code. Predictive performances of each model were derived for the overall population and for each subpopulation using the area under the curve (AUC), the Brier statistic, and the Hosmer-Lemeshow goodness of fit. Bias was represented by the group difference of the mathematical measures: false positive rate (FPR) difference (eg, the FPR difference between Black and White patients), FNR difference, 0-1 loss difference, and generalized entropy index (GEI). The FNR was calculated as the ratio between false negatives (those predicted as low risk while having an unplanned 30-day readmission) and the total number of positive cases. Similarly, the FPR was calculated as the ratio of false positives to the total number of negative cases. The 0-1 loss is the normalized total error rate, calculated as the percentage of incorrect predictions. Bias measured by FPR, FNR, and 0-1 loss differences focuses on unequal error rates. The GEI is a measure originally developed for income inequality that has been proposed to measure algorithmic fairness between groups; it ranges from 0 to infinity, with lower scores representing more equity [ 48 ].
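The group-level error rates described above are straightforward to compute from binary labels and predictions. The sketch below is illustrative only; the variable names are hypothetical, and the generalized entropy index, which the study computed with a dedicated fairness package, is not shown. It reports FNR, FPR, and 0-1 loss per group, from which the between-group differences follow directly.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Per-group FNR, FPR, and 0-1 loss from binary labels and predictions."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        rates[g] = {
            "FNR": np.mean(yp[yt == 1] == 0),   # missed readmissions among true positives
            "FPR": np.mean(yp[yt == 0] == 1),   # false alarms among true negatives
            "0-1 loss": np.mean(yp != yt),      # overall share of incorrect predictions
        }
    return rates

# Bias is then the between-group difference, eg:
# rates = group_error_rates(y_true, y_pred, race)
# fnr_difference = rates["Black"]["FNR"] - rates["White"]["FNR"]
```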

Ethical Considerations

This study was not human subjects research, as determined by the Johns Hopkins School of Public Health Institutional Review Board. No compensation was provided.

Statistical Analysis

Primary analyses were conducted using R (version 4.0.2; R Foundation for Statistical Computing). The aggregate condition categories required to calculate unplanned readmissions and the CMS measures were calculated in SAS software (version 9.4; SAS Institute) using the programs provided by the agencies [ 49 , 50 ]. GEI measures were calculated using the AI Fairness 360 package published by IBM Corp [ 51 ]. The unit of analysis was the admission. FNR and FPR results were first stratified by individual hospital and visualized in a scatter plot. The racial bias results were then stratified by hospital population composition (eg, percentage of Black patients), which has been shown to be associated with a hospital’s overall outcomes [ 35 ]. Hospitals were binned by the percentage of Black patients served (eg, >10% and >20%), and the racial bias measures with their 95% CIs were calculated for each bin. For the FNR difference, FPR difference, and 0-1 loss difference, the distribution across the 2 groups was calculated, and the significance of the difference was assessed using the Student t test (2-tailed) under the null hypothesis that the group difference was equal to 0. For all statistical tests, an α of .05 was used.
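A rough sketch of the group comparison and stratification logic described above, written in Python rather than the authors’ R code; the data frame columns (y_true, y_pred, race, pct_black) are hypothetical stand-ins, and the snippet mirrors the described analysis rather than reproducing it.

```python
import pandas as pd
from scipy.stats import ttest_ind

def fnr_difference_test(df: pd.DataFrame):
    """FNR difference (Black - White) among true positives, with a two-sample
    t test of the per-admission false-negative indicator between the groups."""
    positives = df[df["y_true"] == 1].copy()
    positives["miss"] = (positives["y_pred"] == 0).astype(int)
    black = positives.loc[positives["race"] == "Black", "miss"]
    white = positives.loc[positives["race"] == "White", "miss"]
    diff = black.mean() - white.mean()
    _, p_value = ttest_ind(black, white)
    return diff, p_value

# Stratification: bin hospitals by the share of Black patients served, then
# recompute the bias measure within each bin, eg:
# df["pct_black_bin"] = pd.cut(df["pct_black"], bins=[0, 0.1, 0.2, 0.5, 1.0])
# df.groupby("pct_black_bin").apply(fnr_difference_test)
```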

Demographic and Clinical Characteristics

As presented in Table 1 , among the 1,857,658 Maryland inpatient discharges from 2016 to 2019, a total of 55.41% (n=1,029,292) were White patients and 33.71% (n=626,280) were Black patients, whereas in Florida, 64.49% (5,632,318/8,733,002) of the inpatient discharges were White patients and 16.59% (1,448,620/8,733,002) were Black patients.

White patients in both states were older, more likely to be on private insurance, and less likely to reside in large metropolitan areas or be treated in major teaching or large hospitals in urban areas. Compared to White patients, Black patients in Maryland had a longer length of inpatient stay, more inpatient procedures, fewer inpatient diagnoses, higher inpatient charges, and more comorbidities and were more likely to be discharged to home or self-care. Black patients in Florida, however, had fewer inpatient diagnoses, fewer procedures, and lower total charges; they also had longer lengths of inpatient stay, more comorbidities, and were more likely to be discharged to home or self-care. In both Maryland and Florida, those in the lowest income quartile were younger, had a longer length of inpatient stay, had higher inpatient charges, had more comorbidities, and had fewer procedures than other-income groups. The low-income group was less likely to reside in metropolitan areas but was more likely to be treated in major teaching hospitals. Except for those noted in footnote c of Table 1, all characteristics showed statistically significant differences between racial and income groups (all P values <.001).

a MD: Maryland.

b FL: Florida.

c P values were computed between racial groups and between income groups, respectively. All P values are <.001 except for the ones in this footnote: P value for female between income groups=.80 and for discharge type between income groups=.99.

d CCI: Charlson Comorbidity Index.

Predictive Performance

The observed 30-day unplanned readmission rates in Maryland were higher in the Black and low-income patient groups (ie, 11.13% for White patients, 12.77% for Black patients, 10.59% for other-income patients, and 12.73% for low-income patients; Table 2 ).

a Predicted: the predicted readmission rates for LACE and HOSPITAL were calculated as the percentage of patients at high risk of unplanned readmission based on the model output for the group; and the predicted readmission rates for the two CMS models were the predicted probability of being at high risk of unplanned readmission for the group.

b LACE: The LACE Index for readmission risk.

c HOSPITAL: The modified HOSPITAL score for readmission risk.

d CMS: Centers for Medicare & Medicaid Services (readmission measure).

e MD: Maryland.

f FL: Florida.

A fair and well-calibrated predictive model would be expected to overpredict or underpredict readmission rates to a similar degree across racial or income groups. Compared to the observed readmission rates, the LACE index overestimated readmission rates in all subpopulations, and the overestimation was more pronounced in the Black and low-income populations. The readmission rates estimated by the modified HOSPITAL score were closest to the observed rates. The CMS as-is model underestimated readmission rates across subpopulations, with similar estimated rates between subpopulations, while the retrained CMS model overestimated in all subpopulations to a similar degree. In Florida, the observed 30-day unplanned readmission rates were higher than those in Maryland in all populations. Similar to Maryland, Florida’s observed readmission rates were also higher in the Black and low-income groups (ie, 13.94% for White populations, 17.14% for Black populations, 13.6% for other-income populations, and 16.03% for low-income populations), and the models showed similar overestimation and underestimation patterns ( Table 2 ).

As presented in Table 3 , in Maryland, the retrained CMS model had better predictive performance (AUC 0.74 in all subpopulations) than the other 3 models, which only achieved moderate predictive performance (AUC between 0.65 and 0.68). The modified HOSPITAL score had the best calibration (Brier score=0.16−0.19 in all subpopulations), whereas the CMS as-is model performed poorly on the Brier score. Calibration was better in the White (compared to the Black) population and other-income (compared to low-income) populations in both states, and the AUC was higher or similar in the Black (compared to the White) population. In Florida, the CMS retrained model also performed better than the other models in all subpopulations (AUC 0.68-0.72), and the modified HOSPITAL score had the best calibration (Brier score 0.19-0.21). All models demonstrated excellent goodness of fit across subpopulations ( Table 3 ).

a LACE: The LACE Index for readmission risk.

b HOSPITAL: The modified HOSPITAL score for readmission risk.

c CMS: Centers for Medicare & Medicaid Services (readmission measure).

d MD: Maryland.

e FL: Florida.

f AUC: area under the curve.

Bias Measures

Misclassification rates (ie, FPR difference and FNR difference) indicate relative between-group bias, whereas 0-1 loss differences indicate the overall error rates between groups. The between-group GEI indicates how unequally an outcome is distributed between groups [ 48 ]. In Maryland, the retrained CMS model and the modified HOSPITAL score had the lowest racial and income bias ( Table 4 ).

Specifically, the modified HOSPITAL score demonstrated the lowest racial bias based on 0-1 loss, FPR difference, and GEI, and the lowest income bias based on FPR difference and GEI. The retrained CMS model demonstrated the lowest racial bias based on 0-1 loss and FNR difference, and the lowest income bias on all 4 measures. In Florida, racial biases based on FPR and FNR differences were generally greater than those in Maryland, especially for FNR differences. In Florida, the modified HOSPITAL score showed the lowest racial bias based on 0-1 loss, FPR difference, and GEI; the LACE index showed the lowest racial bias in FNR difference. Each model scored best on at least one measure of income bias, but overall the modified HOSPITAL score and the retrained CMS model showed the lowest income bias in Florida. In both states, the White and other-income patient groups had a higher FNR, indicating that they were more likely to be predicted as low risk while having a 30-day unplanned readmission. The Black and low-income patient groups had a higher FPR, indicating that they were more likely to be predicted as high risk without having a 30-day unplanned readmission. The overall error rates were higher in the Black and low-income patient groups than in the White and other-income patient groups, respectively. Except for the GEI and the values noted with a footnote in Table 4 , all other measures showed statistically significant differences (all P values <.001) between racial and income groups, respectively.

a The columns Difference (B-W) and Difference (L-O) indicate algorithmic bias measured as the difference in the bias measure (eg, FNR and FPR) between Black and White patients and between low-income and other-income groups.

c FNR: false negative rate.

d All P values of the bias measures are <.001 except for the ones in this footnote: the P value for FNR difference of LACE in MD is .41, and the FNR difference of CMS retrained in MD is .45, and FNR difference of CMS retrained in FL is .005. Statistical tests were not conducted for the GEI as this measure produces one value for the population.

e FPR: false positive rate.

f GEI: generalized entropy index.

g N/A: not applicable.

h HOSPITAL: The modified HOSPITAL score for readmission risk.

i CMS: Centers for Medicare & Medicaid Services (readmission measure).

Stratification Analyses

The results were first stratified by hospital and then by patient population composition (percentage of Black patients). As shown in Figure 2 , the models’ FNR differences and FPR differences between Black and White patients varied by hospital within each state, indicating that algorithmic bias shifts from hospital to hospital even when the same model is applied. The modified HOSPITAL score was more likely than the other models to cluster near the “equality lines” (ie, where the FNR or FPR difference is 0) in both states. Points representing LACE and CMS as-is were mostly distributed in the first quadrant in Maryland, indicating that the majority of hospitals had a positive FPR difference (ie, Black patients with a higher FPR) and a negative FNR difference (ie, White patients with a higher FNR) when applying these 2 models ( Figure 2 ). Despite most hospitals falling in the first quadrant, the variance between hospitals appeared to be greater in Florida ( Figure 3 ). In addition, more hospitals in Florida fell in the far corners of the first and fourth quadrants than in Maryland, indicating more hospitals with severe bias (eg, large racial differences in FPR or FNR). Refer to Multimedia Appendix 3 for the measures of income bias and hospital distribution for Maryland and Florida.

[Figures 2 and 3. FNR and FPR differences between Black and White patients, by hospital, in Maryland (Figure 2) and Florida (Figure 3).]

Hospitals with a higher percentage of Black patients have been shown to be associated with low resources and poorer outcomes for their patients [ 35 ]; thus, the results were stratified by the proportion of Black patients served in a hospital. In Figures 4 and 5 , each data point represents the racial bias (FNR difference or FPR difference) in a stratum of hospitals with a certain percentage of Black patients (eg, hospitals with at least 20% of Black patients). The error bars show the 95% CI of the bias measure in the strata. In both figures, the racial biases of all models, represented as FNR and FPR differences, decreased and approached zero as the hospital population became more diverse. In Maryland, the diminishing racial bias was particularly notable in hospitals where >50% of patients were Black ( Figure 4 ). The diminishing racial bias was also observed in Florida’s hospitals ( Figure 5 ). The direction of bias flipped for the LACE index and the modified HOSPITAL score in Florida hospitals with >50% of Black patients. In hospitals with a lower percentage of Black patients, Black patients had a lower FNR compared to White patients, while in hospitals with a higher percentage of Black patients, White patients had a higher FNR ( Figure 4 ). In Florida, the widening gap shown in the 2 CMS models for hospitals serving >60% of Black patients was likely attributed to the small number of hospitals and small sample size in the strata ( Figure 5 ). Refer to Multimedia Appendix 4 for the details on the bias measures stratified by payers for both Maryland and Florida.

[Figures 4 and 5. Racial bias (FNR and FPR differences) stratified by the percentage of Black patients served per hospital, in Maryland (Figure 4) and Florida (Figure 5).]

Overall Findings

The abundance of research on fairness and bias has provided potential means to quantify bias, but there has been a gap in operationalizing these metrics, interpreting them in specific contexts, and understanding their impact on downstream health disparities [ 7 ]. Our analysis demonstrated a practical use case for measuring algorithmic bias when applying or deploying previously validated 30-day hospital readmission predictive models in a new setting. Our approach to testing the fairness measures could serve as a framework for routine assessment of algorithmic bias in health care predictive models, and our results also revealed the complexity and limitations of using mathematical bias measures. According to these bias measures, the retrained CMS model and the modified HOSPITAL score showed the best predictive performance and the lowest bias in Maryland and Florida. However, the CMS as-is model showed subpar performance in both states, indicating that retraining on local data not only improved predictive performance but also reduced group bias. In addition, large variations were detected between hospitals, and system- or hospital-level factors need to be considered when interpreting algorithmic bias.

Measure Interpretation

Caution must be taken when using algorithmic bias to guide equitable intervention allocation, as the bias measures may not include key context. When designing a risk-based intervention based on model output, we would naturally be more concerned about the FNR, as a higher FNR means that a group is more likely to be predicted as low risk of readmission and yet be readmitted, indicating missed opportunities for intervention [ 52 ]. Looking at bias measures alone, our results suggest that common readmission models produce a systematically higher proportion of false negatives for White and higher-income patients, suggesting more missed opportunities to intervene and prevent unplanned readmissions. This observation is contrary to our assumption, and other parts of the results show that White and higher-income patients were less sick with lower readmission rates. One explanation is that the higher FNR observed in the White and higher-income patient groups might be attributable to health care use patterns. For example, research has shown that White individuals and higher socioeconomic patient groups were more likely to overuse health care resources, while Black patients and disadvantaged groups tended to underuse them [ 53 - 55 ]. The overutilizers could have more unplanned visits to the hospital when the risk was not high, while the underusing group may be more likely to defer or skip care and only use costly hospital resources when they must. Similarly, a higher FPR in Black and low-income patient groups would indicate more wasted resources on “false positives.” However, such a conclusion did not align with the rest of the study findings. These subpopulations, on average, had more chronic comorbidities and longer inpatient stays, indicating that Black and low-income patient groups were more likely to have conditions that warrant an unplanned readmission but did not show up in the observed data, potentially alluding to a health care access gap in these groups. In this case, drawing a conclusion simply based on the face value of a higher FPR would lead to a reduction in the resources allocated to the sicker, more vulnerable populations. It is also important to note that, despite the racial difference in health behaviors and outcomes, race merely represents a social classification rather than the driver of the observed differences [ 56 ]. Although the performance of the evaluated readmission models differed by race, we do not recommend including race as a variable in a predictive model unless race is a biological or clinical risk factor for the predictive outcome.

The interpretation of measurable bias requires considering models’ predictive performance, the nature of health data, analytic frameworks, and the underlying health care delivery system. In our analysis, all models had modest performance, and the high FNRs may deter their application in a real setting, especially for the score-based LACE and HOSPITAL models (ie, FNR ranges from 0.63 to 0.75). When calculating these measures, we assumed the observed outcome (ie, 30-day unplanned readmission) to be the ground truth; however, it is important to recognize the key limitations of this truth and of the measured bias. First, despite the HCUP state inpatient data being one of the most comprehensive and high-quality data sources for studying readmission, no guarantee existed that all readmissions and their causes were captured. It is possible that a patient had conditions that warranted an unplanned revisit to the hospital but that the revisit either did not occur due to the patient’s unwillingness to seek treatment in time [ 57 , 58 ] or was not documented (eg, out-of-state admissions were not captured in HCUP’s state-wide inpatient data by design). Such underdocumentation was more likely to impact disadvantaged populations and those with fragmented care, thus introducing embedded bias into the underlying data. Second, a higher percentage of Black patients sought care in academic teaching institutions (eg, 120,649/626,280, 19.26% of Black patients in Maryland and 231,379/1,448,620, 15.97% of Black patients in Florida, compared to 181,493/1,029,292, 17.63% of White patients in Maryland and 576,819/5,632,318, 10.24% of White patients in Florida), which were generally considered to deliver high-quality care [ 35 , 59 , 60 ]. These hospitals may have more effective readmission prevention programs while serving sicker patients, contributing to a higher FPR among Black and low-income patients. Third, as shown in Figure 2 , we observed that hospitals that served a high proportion of Black patients had a lower algorithmic bias. For example, in Maryland, the majority-Black hospitals (>70% of patients served are Black) were in resource-poor neighborhoods, and both White and Black patients had similar, higher-than-average readmission rates in these hospitals (data not shown). The fairer model performance in these hospitals was not necessarily a reflection of a higher quality of care, as all patients served in those hospitals had higher unplanned readmission rates. Finally, whether a readmission was unplanned or planned was determined using a well-established algorithm developed by CMS [ 43 , 44 ], which categorized readmissions based on the nature of the diagnoses and procedures (eg, acute vs routine). Research has demonstrated that different diagnosis intensities existed between regions and hospitals and that a higher intensity of services was associated with a higher prevalence of common chronic diseases [ 61 ]. If diagnosis was not just a patient attribute but indeed reflected the systematic characteristics of the health care environment [ 62 ], the quality of the unplanned readmission classification and other predictors in our models would be subject to bias encoded in the health care system. In fact, in our population, the average number of diagnoses was higher in White patients than in Black patients and higher in Maryland than in Florida, indicating the presence of such systematic variation ( Table 1 ).
Of course, this is not a unique issue with our data set; electronic health records and other health data sets also reflect histories of unequal access to health care and carry racial, ethnic, socioeconomic, and other societal biases due to how the data are collected [ 2 , 3 , 63 ].

Utility of Bias Measures

Once the limitations of real-world health data are acknowledged, the expectation of equity and the interpretation of measurable bias should adjust accordingly. First, it would be too restrictive to expect mathematical equality for measurable bias; rather, it is best viewed as a relative value to aid in the selection of a less biased model. Most real-world problems are based on imperfect data, and forcing a model to perform equally on these measures will inevitably create unintended results (eg, sacrificing accuracy and potentially increasing bias for other subpopulations) [15]. Second, a validated and accurate model may reveal the gap between the "supposed-to-be" state and the reality in the underlying data, showing areas of unmet need [16,64], as we observed in our Black and low-income populations. Finally, the bias measures alone provide limited evidence about which group is being biased against and in what way. A conclusion based solely on the face value of a few bias measures can be misleading and may exacerbate the disparity already faced by marginalized groups. These quantitative bias measures are useful for evaluating a model's disparate group performance on a given data set, but they are insufficient to inform intervention allocation or the mechanisms of potential bias, which are key to mitigation strategies [15]. In addition, our study did not evaluate other definitions of bias, such as calibration or predictive parity, which do not focus on error rates and may require unique interpretation considerations.
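
As an illustration of the calibration-style measures mentioned above, which this study did not evaluate, the following sketch bins predicted probabilities and compares mean predicted risk with the observed readmission rate within each group; roughly similar gaps across groups would suggest similar calibration. The column names are hypothetical and the code is illustrative only.

```python
import numpy as np
import pandas as pd

def calibration_by_group(df, group_col, y_true_col, p_col, n_bins=10):
    """Compare mean predicted risk with the observed outcome rate per bin, within each group."""
    out = []
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for group, sub in df.groupby(group_col):
        binned = pd.cut(sub[p_col], bins=bins, include_lowest=True)
        summary = sub.groupby(binned, observed=True).agg(
            mean_predicted=(p_col, "mean"),      # average predicted risk in the bin
            observed_rate=(y_true_col, "mean"),  # fraction actually readmitted in the bin
            n=(y_true_col, "size"),
        )
        summary[group_col] = group
        out.append(summary.reset_index())
    return pd.concat(out, ignore_index=True)

# Hypothetical usage with predicted probabilities rather than thresholded labels:
# cal = calibration_by_group(df, "race", "readmitted_30d", "predicted_prob")
```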

This analysis addressed a fundamental gap in operationalizing fairness techniques. Selecting a bias definition and appropriate bias measures is as important as detecting bias itself, yet it has remained a blind spot in practice [2]. Beyond the fact that these mathematical notions cannot all be satisfied simultaneously, choosing the appropriate measures is also highly contextual and data dependent [65,66]. For example, requiring equal positive prediction rates across groups (known as demographic or statistical parity) would not be a meaningful measure for an inherently unbalanced outcome such as 30-day readmission; yet, under that fairness concept, satisfying any single bias measure would nominally qualify a model as fair. In this study, the 4 evaluated bias measures showed consistent results, despite each reflecting a different definition of bias. All selected measures demonstrated the magnitude of bias, but the FNR and FPR differences were the most informative, as they indicated the direction of bias and were more interpretable in the context of mitigation actions. In our attempt to translate the algorithmic bias findings into intervention planning, we found that the bias measures could serve as a quick, routine assessment to compare algorithms, subpopulations, or localities (eg, hospitals) and help target further investigation into the drivers of potential disparity. However, relying solely on these statistical notions to make decisions could obscure or underplay the causes of health care disparities, and a more comprehensive approach is necessary. In real-world applications, the practical goal of predictive modeling must incorporate predictive accuracy and algorithmic bias, among other operational considerations. Because there is usually a trade-off between these 2 performance goals, the best model is likely the one that balances them rather than the one achieving the highest possible accuracy or fairness alone.
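
As a concrete, purely illustrative way to operationalize the accuracy-fairness trade-off described above, the sketch below scores candidate models by penalizing an overall accuracy summary (here, AUC, as one possible choice) with the largest between-group FNR gap. The metric values and the weighting term are made up; they are not results or recommendations from this study.

```python
# Illustrative model selection balancing overall accuracy and an error-rate gap.
# Each candidate is summarized by its AUC and its largest between-group FNR difference.
candidates = {
    "LACE":     {"auc": 0.66, "fnr_gap": 0.10},   # hypothetical numbers
    "HOSPITAL": {"auc": 0.68, "fnr_gap": 0.12},
    "CMS-like": {"auc": 0.70, "fnr_gap": 0.05},
}

LAMBDA = 0.5  # weight on the fairness penalty; choosing it is itself a policy decision

def combined_score(metrics, lam=LAMBDA):
    """Higher is better: reward discrimination, penalize disparate FNR."""
    return metrics["auc"] - lam * metrics["fnr_gap"]

best = max(candidates, key=lambda name: combined_score(candidates[name]))
print(f"Selected model: {best}")
```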

Limitations

Our analysis has several limitations and caveats. First, none of the models evaluated in this analysis had high accuracy, which may affect the measurement of misclassifications. For simplicity, and to keep the focus on interpreting the bias measures, we did not evaluate machine learning models, which usually improve local accuracy [20]. While the LACE index and the HOSPITAL score were used by hospitals to manage readmissions, the CMS measure was mostly used in payer operations or population health management in addition to CMS purposes (eg, budget allocation and hospital penalties); thus, it was not used as a typical predictive model. Although we believe the models evaluated in this study represent practical scenarios, we were unable to assess whether particular types of models, variables, weights, or modeling structures were more likely to be algorithmically biased. Second, we did not evaluate the scenario in which models are optimized to minimize or constrain bias during training or retraining. Model optimization has been a popular approach to developing fair models, but it was considered out of scope because this analysis focused on model application and bias identification. Third, we only included bias measures that are algorithm-agnostic and can be routinely calculated; thus, the set of measures was neither comprehensive nor exhaustive. Fourth, the conclusions were based on Maryland and Florida data, which may not represent all states or the national average. For example, Maryland is a small state with an all-payer model payment system [39] and a high percentage of patients seeking care in neighboring states, whereas Florida is a large state with a large Hispanic population that has not adopted Medicaid expansion [40]. In addition, the data set we used was administrative in nature and lacked the detailed medical information (eg, medications, laboratory results, and clinical notes) needed to fully evaluate the potential drivers of our results, such as selection bias [67], data quality factors [68], and more accurate ascertainment of the outcome (ie, unplanned readmissions).

Conclusions

In conclusion, our analysis found that fairness metrics are useful as a routine assessment to detect disparate model performance across subpopulations and to compare predictive models. However, these metrics have limited interpretability and are insufficient to inform the mechanisms of bias or guide intervention planning. Further testing and demonstration will be required before mathematical fairness measures are used to guide key decision-making or policy changes. Despite these limitations, demonstrating differential model performance (eg, misclassification rates) is often the first step in recognizing potential algorithmic bias, which will be necessary as health care organizations move toward data-driven improvement in response to existing health care disparities. The subtle, and not so subtle, imperfections of the underlying health data, analytic frameworks, and health care delivery system must be carefully considered when evaluating the potential bias within predictive models. Finally, future research is required to improve the methodology of measuring algorithmic bias and to test more fairness definitions and measures (eg, calibration parity) through an operational lens. Future studies should also explore how modeling factors influence algorithmic bias (eg, how variable inclusion, weights, or scoring schemes affect a model's differential performance). We hope that algorithmic bias assessment can be incorporated into routine model evaluation and ultimately inform meaningful actions to reduce health care disparities.

Acknowledgments

The authors acknowledge the contributions of Dr Darrell Gaskin and Dr Daniel Naiman of Johns Hopkins University for their input into the study conceptualization and results interpretation.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability

The data sets analyzed during this study are available for a fee from the Agency for Healthcare Research and Quality Healthcare Cost and Utilization Project [37].

Authors' Contributions

HEW and HK conceived the study concept, and all authors contributed to the study design. HEW analyzed the data and wrote the manuscript. HK and HEW interpreted the results, and all authors provided input for the interpretations. All authors reviewed and contributed to the final manuscript.

Conflicts of Interest

HEW is an employee of Merck & Co, and the employer had no role in the development or funding of this work. SS has received funding from the NIH, NSF, CDC, FDA, DARPA, AHA, and the Gordon and Betty Moore Foundation. She is an equity holder in Bayesian Health, a clinical AI platform company, and in Duality Tech, a privacy-preserving technology company, and sits on the scientific advisory boards of large life sciences companies (eg, Sanofi) and digital health startups (eg, Century Health). She has received honoraria for talks from a number of biotechnology, research, and health-tech companies. This arrangement has been reviewed and approved by Johns Hopkins University in accordance with its conflict-of-interest policies.

Multimedia Appendices

  • The LACE index.
  • The modified HOSPITAL score.
  • Income bias and hospital distribution in Maryland and Florida.
  • Racial and income bias measures by payer in Maryland and Florida.

References

  • Rojas JC, Fahrenbach J, Makhni S, Cook SC, Williams JS, Umscheid CA, et al. Framework for integrating equity into machine learning models: a case study. Chest. Jun 2022;161(6):1621-1627. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ferryman K. Addressing health disparities in the Food and Drug Administration's artificial intelligence and machine learning regulatory framework. J Am Med Inform Assoc. Dec 09, 2020;27(12):2016-2019. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in healthcare. Annu Rev Biomed Data Sci. Jul 2021;4:123-144. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. Oct 25, 2019;366(6464):447-453. [ CrossRef ] [ Medline ]
  • Juhn YJ, Ryu E, Wi C, King KS, Malik M, Romero-Brufau S, et al. Assessing socioeconomic bias in machine learning algorithms in health care: a case study of the HOUSES index. J Am Med Inform Assoc. Jun 14, 2022;29(7):1142-1151. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen IY, Szolovits P, Ghassemi M. Can AI help reduce disparities in general medical and mental health care? AMA J Ethics. Feb 01, 2019;21(2):E167-E179. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang HE, Landers M, Adams R, Subbaswamy A, Kharrazi H, Gaskin DJ, et al. A bias evaluation checklist for predictive models and its pilot application for 30-day hospital readmission models. J Am Med Inform Assoc. Jul 12, 2022;29(8):1323-1333. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Obermeyer Z, Nissan R, Stern M. Algorithmic bias playbook. Center for Applied AI. 2021. URL: https:/​/www.​chicagobooth.edu/​research/​center-for-applied-artificial-intelligence/​research/​algorithmic-bias/​playbook [accessed 2024-03-05]
  • Xu J, Xiao Y, Wang WH, Ning Y, Shenkman EA, Bian J, et al. Algorithmic fairness in computational medicine. EBioMedicine. Oct 2022;84:104250. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Institute of Medicine (US) Committee on Understanding and Eliminating Racial and Ethnic Disparities in Health Care, Smedley BD, Stith AY, Nelson AR. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. Washington, DC. National Academies Press; 2003.
  • Rathore SS, Krumholz HM. Differences, disparities, and biases: clarifying racial variations in health care use. Ann Intern Med. Oct 19, 2004;141(8):635-638. [ CrossRef ] [ Medline ]
  • Verma S, Rubin J. Fairness definitions explained. In: Proceedings of the 40th International Workshop on Software Fairness. 2018. Presented at: FairWare '18; May 29, 2018;1-7; Gothenburg, Sweden. URL: https://dl.acm.org/doi/10.1145/3194770.3194776
  • Institute of Medicine (US) Committee on Quality of Health Care in America. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC. National Academies Press; 2001.
  • Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv. Jul 13, 2021;54(6):1-35. [ CrossRef ]
  • Pfohl SR, Foryciarz A, Shah NH. An empirical characterization of fair machine learning for clinical risk prediction. J Biomed Inform. Jan 2021;113:103621. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Allen A, Mataraso S, Siefkas A, Burdick H, Braden G, Dellinger RP, et al. A racially unbiased, machine learning approach to prediction of mortality: algorithm development study. JMIR Public Health Surveill. Oct 22, 2020;6(4):e22400. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • DeCamp M, Lindvall C. Latent bias and the implementation of artificial intelligence in medicine. J Am Med Inform Assoc. Dec 09, 2020;27(12):2020-2023. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Artetxe A, Beristain A, Graña M. Predictive models for hospital readmission risk: a systematic review of methods. Comput Methods Programs Biomed. Oct 2018;164:49-64. [ CrossRef ] [ Medline ]
  • Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. Oct 19, 2011;306(15):1688-1698. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Huang Y, Talwar A, Chatterjee S, Aparasu RR. Application of machine learning in predicting hospital readmissions: a scoping review of the literature. BMC Med Res Methodol. May 06, 2021;21(1):96. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mahmoudi E, Kamdar N, Kim N, Gonzales G, Singh K, Waljee AK. Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review. BMJ. Apr 08, 2020;369:m958. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hospital readmissions reduction program (HRRP). Centers for Medicare & Medicaid Services. Sep 2023. URL: https:/​/www.​cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Readmissions-Reduction-Program [accessed 2023-12-05]
  • Teo K, Yong CW, Muhamad F, Mohafez H, Hasikin K, Xia K, et al. The promise for reducing healthcare cost with predictive model: an analysis with quantized evaluation metric on readmission. J Healthc Eng. 2021;2021:9208138. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Romero-Brufau S, Wyatt KD, Boyum P, Mickelson M, Moore M, Cognetta-Rieke C. Implementation of artificial intelligence-based clinical decision support to reduce hospital readmissions at a regional hospital. Appl Clin Inform. Aug 02, 2020;11(4):570-577. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zhou H, Della PR, Roberts P, Goh L, Dhaliwal SS. Utility of models to predict 28-day or 30-day unplanned hospital readmissions: an updated systematic review. BMJ Open. Jun 27, 2016;6(6):e011060. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Pandey A, Keshvani N, Khera R, Lu D, Vaduganathan M, Joynt Maddox KE, et al. Temporal trends in racial differences in 30-day readmission and mortality rates after acute myocardial infarction among medicare beneficiaries. JAMA Cardiol. Feb 01, 2020;5(2):136-145. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Rodriguez-Gutierrez R, Herrin J, Lipska KJ, Montori VM, Shah ND, McCoy RG. Racial and ethnic differences in 30-day hospital readmissions among US adults with diabetes. JAMA Netw Open. Oct 02, 2019;2(10):e1913249. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jiang HJ, Andrews R, Stryer D, Friedman B. Racial/ethnic disparities in potentially preventable readmissions: the case of diabetes. Am J Public Health. Sep 2005;95(9):1561-1567. [ CrossRef ] [ Medline ]
  • Tsai TC, Orav EJ, Joynt KE. Disparities in surgical 30-day readmission rates for medicare beneficiaries by race and site of care. Ann Surg. Jun 2014;259(6):1086-1090. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Joynt KE, Orav EJ, Jha AK. Thirty-day readmission rates for medicare beneficiaries by race and site of care. JAMA. Feb 16, 2011;305(7):675-681. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kind AJ, Jencks S, Brock J, Yu M, Bartels C, Ehlenbach W, et al. Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Ann Intern Med. Dec 02, 2014;161(11):765-774. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hu J, Kind AJ, Nerenz D. Area deprivation index predicts readmission risk at an urban teaching hospital. Am J Med Qual. 2018;33(5):493-501. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Gershon AS, Thiruchelvam D, Aaron S, Stanbrook M, Vozoris N, Tan WC, et al. Socioeconomic status (SES) and 30-day hospital readmissions for chronic obstructive pulmonary (COPD) disease: a population-based cohort study. PLoS One. 2019;14(5):e0216741. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Gaskin DJ, Dinwiddie GY, Chan KS, McCleary R. Residential segregation and disparities in health care services utilization. Med Care Res Rev. Apr 2012;69(2):158-175. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • López L, Jha AK. Outcomes for whites and blacks at hospitals that disproportionately care for black medicare beneficiaries. Health Serv Res. Feb 2013;48(1):114-128. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • LaVeist T, Pollack K, Thorpe Jr R, Fesahazion R, Gaskin D. Place, not race: disparities dissipate in southwest Baltimore when blacks and whites live under similar conditions. Health Aff (Millwood). Oct 2011;30(10):1880-1887. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Healthcare cost and utilization project (HCUP). Agency for Healthcare Research and Quality. 2023. URL: https://www.ahrq.gov/data/hcup/index.html [accessed 2024-03-05]
  • Metcalfe D, Zogg CK, Haut ER, Pawlik TM, Haider AH, Perry DC. Data resource profile: state inpatient databases. Int J Epidemiol. Dec 01, 2019;48(6):1742-172h. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Maryland all-payer model. Center for Medicare & Medicaid Services. Aug 2022. URL: https://www.cms.gov/priorities/innovation/innovation-models/maryland-all-payer-model [accessed 2023-03-05]
  • Status of state action on the Medicaid expansion decision. Kaiser Family Foundation. Sep 2022. URL: https://www.cms.gov/priorities/innovation/innovation-models/maryland-all-payer-model [accessed 2023-03-05]
  • User guide: HCUP supplemental variables for revisit analyses. Agency for Healthcare Research and Quality. 2022. URL: https://www.hcup-us.ahrq.gov/toolssoftware/revisit/UserGuide-SuppRevisitFilesCD.pdf [accessed 2024-03-06]
  • Statistical brief no. 248. characteristics of 30-day all-cause hospital readmissions, 2010-2016. Healthcare Cost and Utilization Project. Feb 2019. URL: https://hcup-us.ahrq.gov/reports/statbriefs/sb248-Hospital-Readmissions-2010-2016.jsp [accessed 2023-03-05]
  • Hospital-wide all-cause risk-standardized readmission measure: measure methodology report. Center for Medicare & Medicaid Services. Jul 2012. URL: https://www.cms.gov/priorities/innovation/files/fact-sheet/bpciadvanced-fs-nqf1789.pdf [accessed 2023-03-04]
  • Horwitz LI, Grady JN, Cohen DB, Lin Z, Volpe M, Ngo CK, et al. Development and validation of an algorithm to identify planned readmissions from claims data. J Hosp Med. Oct 2015;10(10):670-677. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • van Walraven C, Dhalla IA, Bell C, Etchells E, Stiell IG, Zarnke K, et al. Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ. Apr 06, 2010;182(6):551-557. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Donzé J, Aujesky D, Williams D, Schnipper JL. Potentially avoidable 30-day hospital readmissions in medical patients: derivation and validation of a prediction model. JAMA Intern Med. Apr 22, 2013;173(8):632-638. [ CrossRef ] [ Medline ]
  • Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biom J. Aug 2005;47(4):458-472. [ CrossRef ] [ Medline ]
  • Speicher T, Heidari H, Grgic-Hlaca N. A unified approach to quantifying algorithmic unfairness: measuring individual and group unfairness via inequality indices. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018. Presented at: KDD '18; August 19-23, 2018;2239-2248; London, UK. URL: https://dl.acm.org/doi/10.1145/3219819.3220046 [ CrossRef ]
  • Clinical classifications software (CCS) for ICD-10-PCS (beta version). Healthcare cost and utilization project (HCUP). Agency for Healthcare Research and Quality. Rockville, MD.; Nov 2019. URL: https://hcup-us.ahrq.gov/toolssoftware/ccs10/ccs10.jsp [accessed 2021-11-10]
  • Risk adjustment 2020 model software/ICD-10 mappings 2020. Center for Medicare & Medicaid Services. Sep 2020. URL: https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors [accessed 2023-12-05]
  • AI fairness 360. IBM Corp. 2020. URL: https://aif360.res.ibm.com/ [accessed 2021-04-05]
  • Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. Dec 18, 2018;169(12):866-872. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chandler D. The underutilization of health services in the black community: an examination of causes and effects. J Black Stud. Aug 06, 2008;40(5):915-931. [ CrossRef ]
  • Best MJ, McFarland EG, Thakkar SC, Srikumaran U. Racial disparities in the use of surgical procedures in the US. JAMA Surg. Mar 01, 2021;156(3):274-281. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kressin NR, Groeneveld PW. Race/Ethnicity and overuse of care: a systematic review. Milbank Q. Mar 2015;93(1):112-138. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jones CP. Levels of racism: a theoretic framework and a gardener's tale. Am J Public Health. Aug 2000;90(8):1212-1215. [ CrossRef ] [ Medline ]
  • Hewins-Maroney B, Schumaker A, Williams E. Health seeking behaviors of African Americans: implications for health administration. J Health Hum Serv Adm. 2005;28(1):68-95. [ Medline ]
  • Moser DK, Kimble LP, Alberts MJ, Alonzo A, Croft JB, Dracup K, et al. Reducing delay in seeking treatment by patients with acute coronary syndrome and stroke: a scientific statement from the American Heart Association Council on cardiovascular nursing and stroke council. Circulation. Jul 11, 2006;114(2):168-182. [ CrossRef ] [ Medline ]
  • Allison JJ, Kiefe CI, Weissman NW, Person SD, Rousculp M, Canto JG, et al. Relationship of hospital teaching status with quality of care and mortality for medicare patients with acute MI. JAMA. Sep 13, 2000;284(10):1256-1262. [ CrossRef ] [ Medline ]
  • Popescu I, Nallamothu BK, Vaughan-Sarrazin MS, Cram P. Racial differences in admissions to high-quality hospitals for coronary heart disease. Arch Intern Med. Jul 26, 2010;170(14):1209-1215. [ CrossRef ] [ Medline ]
  • Song Y, Skinner J, Bynum J, Sutherland J, Wennberg JE, Fisher ES. Regional variations in diagnostic practices. N Engl J Med. Jul 01, 2010;363(1):45-53. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Welch HG, Sharp SM, Gottlieb DJ, Skinner JS, Wennberg JE. Geographic variation in diagnosis frequency and risk of death among medicare beneficiaries. JAMA. Mar 16, 2011;305(11):1113-1118. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ghassemi M, Nsoesie EO. In medicine, how do we machine learn anything real? Patterns (N Y). Jan 14, 2022;3(1):100392. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat Med. Jan 2020;26(1):16-17. [ CrossRef ] [ Medline ]
  • Srivastava M, Heidari H, Krause A. Mathematical notions vs. human perception of fairness: a descriptive approach to fairness for machine learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. Presented at: KDD '19; August 4-8, 2019;2459-2468; Anchorage, AK. URL: https://dl.acm.org/doi/10.1145/3292500.3330664 [ CrossRef ]
  • Wawira Gichoya J, McCoy LG, Celi LA, Ghassemi M. Equity in essence: a call for operationalising fairness in machine learning for healthcare. BMJ Health Care Inform. Apr 2021;28(1):e100289. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Rusanov A, Weiskopf NG, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. Jun 11, 2014;14:51. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform. Oct 2013;46(5):830-836. [ FREE Full text ] [ CrossRef ] [ Medline ]

Abbreviations

CMS: Centers for Medicare & Medicaid Services
FNR: false negative rate
FPR: false positive rate
HCUP: Healthcare Cost and Utilization Project

Edited by A Mavragani; submitted 11.03.23; peer-reviewed by D Nerenz, J Herington; comments to author 07.12.23; revised version received 28.12.23; accepted 27.02.24; published 18.04.24.

©H Echo Wang, Jonathan P Weiner, Suchi Saria, Hadi Kharrazi. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


Key facts about Americans and guns

A customer shops for a handgun at a gun store in Florida.

Guns are deeply ingrained in American society and the nation’s political debates.

The Second Amendment to the United States Constitution guarantees the right to bear arms, and about a third of U.S. adults say they personally own a gun. At the same time, in response to concerns such as rising gun death rates and mass shootings, President Joe Biden has proposed gun policy legislation that would expand on the bipartisan gun safety bill Congress passed last year.

Here are some key findings about Americans' views of gun ownership, gun policy and other subjects, drawn primarily from a Pew Research Center survey conducted in June 2023.

Pew Research Center conducted this analysis to summarize key facts about Americans and guns. We used data from recent Center surveys to provide insights into Americans’ views on gun policy and how those views have changed over time, as well as to examine the proportion of adults who own guns and their reasons for doing so.

The analysis draws primarily from a survey of 5,115 U.S. adults conducted from June 5 to June 11, 2023. Everyone who took part in the surveys cited is a member of the Center's American Trends Panel (ATP), an online survey panel that is recruited through national, random sampling of residential addresses. This way nearly all U.S. adults have a chance of selection. The survey is weighted to be representative of the U.S. adult population by gender, race, ethnicity, partisan affiliation, education and other categories. Read more about the ATP's methodology.
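
For readers unfamiliar with how such demographic weighting is typically done, below is a minimal sketch of raking (iterative proportional fitting) over two margins. It illustrates the general technique only; it is not Pew Research Center's actual weighting procedure, and the respondent data and population targets are made-up numbers.

```python
import pandas as pd

# Hypothetical respondent data and made-up population targets.
respondents = pd.DataFrame({
    "gender":    ["M", "F", "F", "M", "F", "M", "F", "F"],
    "education": ["HS", "BA", "HS", "BA", "BA", "HS", "HS", "BA"],
})
targets = {
    "gender":    {"M": 0.48, "F": 0.52},
    "education": {"HS": 0.60, "BA": 0.40},
}

weights = pd.Series(1.0, index=respondents.index)

# Raking: repeatedly rescale weights so each margin matches its target share.
for _ in range(50):
    for var, shares in targets.items():
        current = weights.groupby(respondents[var]).sum() / weights.sum()
        adjustment = respondents[var].map(
            {cat: shares[cat] / current[cat] for cat in shares}
        )
        weights *= adjustment

# After convergence, weighted margins approximate the targets.
print(weights.groupby(respondents["gender"]).sum() / weights.sum())
```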

Here are the questions used for the analysis on gun ownership, the questions used for the analysis on gun policy, and the survey's methodology.

Additional information about the fall 2022 survey of parents and its methodology can be found at the link in the text of this post.

Measuring gun ownership in the United States comes with unique challenges. Unlike many demographic measures, there is not a definitive data source from the government or elsewhere on how many American adults own guns.

The Pew Research Center survey conducted June 5-11, 2023, on the Center's American Trends Panel, asks about gun ownership using two separate questions to measure personal and household ownership. About a third of adults (32%) say they own a gun, while another 10% say they do not personally own a gun but someone else in their household does. These shares have changed little from surveys conducted in 2021 and 2017. In each of those surveys, 30% reported they owned a gun.

These numbers are largely consistent with rates of gun ownership reported by Gallup, but somewhat higher than those reported by NORC's General Social Survey. Those surveys also find only modest changes in recent years.

The FBI maintains data on background checks on individuals attempting to purchase firearms in the United States. The FBI reported a surge in background checks in 2020 and 2021, during the coronavirus pandemic. The number of federal background checks declined in 2022 and through the first half of this year, according to FBI statistics.

About four-in-ten U.S. adults say they live in a household with a gun, including 32% who say they personally own one, according to an August report based on our June survey. These numbers are virtually unchanged since the last time we asked this question in 2021.

There are differences in gun ownership rates by political affiliation, gender, community type and other factors.

  • Republicans and Republican-leaning independents are more than twice as likely as Democrats and Democratic leaners to say they personally own a gun (45% vs. 20%).
  • 40% of men say they own a gun, compared with 25% of women.
  • 47% of adults living in rural areas report personally owning a firearm, as do smaller shares of those who live in suburbs (30%) or urban areas (20%).
  • 38% of White Americans own a gun, compared with smaller shares of Black (24%), Hispanic (20%) and Asian (10%) Americans.

A bar chart showing that nearly a third of U.S. adults say they personally own a gun.

Personal protection tops the list of reasons gun owners give for owning a firearm.  About three-quarters (72%) of gun owners say that protection is a major reason they own a gun. Considerably smaller shares say that a major reason they own a gun is for hunting (32%), for sport shooting (30%), as part of a gun collection (15%) or for their job (7%). 

The reasons behind gun ownership have changed only modestly since our 2017 survey of attitudes toward gun ownership and gun policies. At that time, 67% of gun owners cited protection as a major reason they owned a firearm.

A bar chart showing that nearly three-quarters of U.S. gun owners cite protection as a major reason they own a gun.

Gun owners tend to have much more positive feelings about having a gun in the house than non-owners who live with them. For instance, 71% of gun owners say they enjoy owning a gun – but far fewer non-gun owners in gun-owning households (31%) say they enjoy having one in the home. And while 81% of gun owners say owning a gun makes them feel safer, a narrower majority (57%) of non-owners in gun households say the same about having a firearm at home. Non-owners are also more likely than owners to worry about having a gun in the home (27% vs. 12%, respectively).

Feelings about gun ownership also differ by political affiliation, even among those who personally own firearms. Republican gun owners are more likely than Democratic owners to say owning a gun gives them feelings of safety and enjoyment, while Democratic owners are more likely to say they worry about having a gun in the home.

A chart showing the differences in feelings about guns between gun owners and non-owners in gun households.

Non-gun owners are split on whether they see themselves owning a firearm in the future. About half (52%) of Americans who don’t own a gun say they could never see themselves owning one, while nearly as many (47%) could imagine themselves as gun owners in the future.

Among those who currently do not own a gun:

A bar chart that shows non-gun owners are divided on whether they could see themselves owning a gun in the future.

  • 61% of Republicans and 40% of Democrats who don’t own a gun say they would consider owning one in the future.
  • 56% of Black non-owners say they could see themselves owning a gun one day, compared with smaller shares of White (48%), Hispanic (40%) and Asian (38%) non-owners.

Americans are evenly split over whether gun ownership does more to increase or decrease safety. About half (49%) say it does more to increase safety by allowing law-abiding citizens to protect themselves, but an equal share say gun ownership does more to reduce safety by giving too many people access to firearms and increasing misuse.

A bar chart that shows stark differences in views on whether gun ownership does more to increase or decrease safety in the U.S.

Republicans and Democrats differ on this question: 79% of Republicans say that gun ownership does more to increase safety, while a nearly identical share of Democrats (78%) say that it does more to reduce safety.

Urban and rural Americans also have starkly different views. Among adults who live in urban areas, 64% say gun ownership reduces safety, while 34% say it does more to increase safety. Among those who live in rural areas, 65% say gun ownership increases safety, compared with 33% who say it does more to reduce safety. Those living in the suburbs are about evenly split.

Americans increasingly say that gun violence is a major problem. Six-in-ten U.S. adults say gun violence is a very big problem in the country today, up 9 percentage points from spring 2022. In the survey conducted this June, 23% say gun violence is a moderately big problem, and about two-in-ten say it is either a small problem (13%) or not a problem at all (4%).

Looking ahead, 62% of Americans say they expect the level of gun violence to increase over the next five years. This is double the share who expect it to stay the same (31%). Just 7% expect the level of gun violence to decrease.

A line chart that shows a growing share of Americans say gun violence is a 'very big' national problem.

A majority of Americans (61%) say it is too easy to legally obtain a gun in this country. Another 30% say the ease of legally obtaining a gun is about right, and 9% say it is too hard to get a gun. Non-gun owners are nearly twice as likely as gun owners to say it is too easy to legally obtain a gun (73% vs. 38%). Meanwhile, gun owners are more than twice as likely as non-owners to say the ease of obtaining a gun is about right (48% vs. 20%).

Partisan and demographic differences also exist on this question. While 86% of Democrats say it is too easy to obtain a gun legally, 34% of Republicans say the same. Most urban (72%) and suburban (63%) dwellers say it’s too easy to legally obtain a gun. Rural residents are more divided: 47% say it is too easy, 41% say it is about right and 11% say it is too hard.

A bar chart showing that about 6 in 10 Americans say it is too easy to legally obtain a gun in this country.

About six-in-ten U.S. adults (58%) favor stricter gun laws. Another 26% say that U.S. gun laws are about right, and 15% favor less strict gun laws. The percentage who say these laws should be stricter has fluctuated a bit in recent years. In 2021, 53% favored stricter gun laws, and in 2019, 60% said laws should be stricter.

A bar chart that shows women are more likely than men to favor stricter gun laws in the U.S.

About a third (32%) of parents with K-12 students say they are very or extremely worried about a shooting ever happening at their children’s school, according to a fall 2022 Center survey of parents with at least one child younger than 18. A similar share of K-12 parents (31%) say they are not too or not at all worried about a shooting ever happening at their children’s school, while 37% of parents say they are somewhat worried.

Among all parents with children under 18, including those who are not in school, 63% see improving mental health screening and treatment as a very or extremely effective way to prevent school shootings. This is larger than the shares who say the same about having police officers or armed security in schools (49%), banning assault-style weapons (45%), or having metal detectors in schools (41%). Just 24% of parents say allowing teachers and school administrators to carry guns in school would be a very or extremely effective approach, while half say this would be not too or not at all effective.

A pie chart showing that 19% of K-12 parents are extremely worried about a shooting happening at their children's school.

There is broad partisan agreement on some gun policy proposals, but most are politically divisive, the June 2023 survey found. Majorities of U.S. adults in both partisan coalitions somewhat or strongly favor two policies that would restrict gun access: preventing those with mental illnesses from purchasing guns (88% of Republicans and 89% of Democrats support this) and increasing the minimum age for buying guns to 21 years old (69% of Republicans, 90% of Democrats). Majorities in both parties also oppose allowing people to carry concealed firearms without a permit (60% of Republicans and 91% of Democrats oppose this).

A dot plot showing bipartisan support for preventing people with mental illnesses from purchasing guns, but wide differences on other policies.

Republicans and Democrats differ on several other proposals. While 85% of Democrats favor banning both assault-style weapons and high-capacity ammunition magazines that hold more than 10 rounds, majorities of Republicans oppose these proposals (57% and 54%, respectively).

Most Republicans, on the other hand, support allowing teachers and school officials to carry guns in K-12 schools (74%) and allowing people to carry concealed guns in more places (71%). These proposals are supported by just 27% and 19% of Democrats, respectively.

Gun ownership is linked with views on gun policies. Americans who own guns are less likely than non-owners to favor restrictions on gun ownership, with a notable exception. Nearly identical majorities of gun owners (87%) and non-owners (89%) favor preventing mentally ill people from buying guns.

A dot plot that shows, within each party, gun owners are more likely than non-owners to favor expanded access to guns.

Within both parties, differences between gun owners and non-owners are evident – but they are especially stark among Republicans. For example, majorities of Republicans who do not own guns support banning high-capacity ammunition magazines and assault-style weapons, compared with about three-in-ten Republican gun owners.

Among Democrats, majorities of both gun owners and non-owners favor these two proposals, though support is greater among non-owners. 

Note: This is an update of a post originally published on Jan. 5, 2016.
