A Step-by-Step Guide to the Data Analysis Process

Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it’s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.

In this post, we’ll explore the main steps in the data analysis process. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we’ll also use examples and highlight a few tools to make the journey easier. When you’re done, you’ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.

Here are the steps we’ll take you through:

  • Defining the question
  • Collecting the data
  • Cleaning the data
  • Analyzing the data
  • Sharing your results
  • Embracing failure

Ready? Let’s get started with step one.

1. Step one: Defining the question

The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization’s senior management might pose an issue, such as: “Why are we losing customers?” It’s possible, though, that this doesn’t get to the core of the problem. A data analyst’s job is to understand the business and its goals in enough depth that they can frame the problem the right way.

Let’s say you work for a fictional company called TopNotch Learning. TopNotch creates custom training software for its clients. While it is excellent at securing new clients, it has much lower repeat business. As such, your question might not be, “Why are we losing customers?” but, “Which factors are negatively impacting the customer experience?” or better yet: “How can we boost customer retention while minimizing costs?”

Now you’ve defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you’ve noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don’t come back? Which sources of data will help you answer this question?

Tools to help define your objective

Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you’ll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports allow you to track problem points in the business. Some KPI dashboards come with a fee, like Databox and DashThis. However, you’ll also find open-source software like Grafana, Freeboard, and Dashbuilder. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.
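
To make KPI tracking concrete, here is a minimal pandas sketch computing a simple customer retention rate from monthly figures. The column names and numbers are hypothetical, purely for illustration.

```python
import pandas as pd

# Hypothetical monthly customer counts; in practice these would come
# from your CRM or reporting database.
kpis = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "customers_start": [200, 210, 205],
    "customers_end": [210, 205, 215],
    "new_customers": [25, 10, 20],
})

# Retention rate = (customers at end - new customers) / customers at start
kpis["retention_rate"] = (
    (kpis["customers_end"] - kpis["new_customers"]) / kpis["customers_start"]
)

print(kpis[["month", "retention_rate"]])
```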

2. Step two: Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. All data fit into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.

What is first-party data?

First-party data is data that you or your company have collected directly from customers. It might come in the form of transactional tracking data or information from your company’s customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that it is usually structured, and although it will be less relevant than first-party data, it also tends to be quite reliable. Examples of second-party data include website, app, or social media activity, like online purchase histories or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.

Tools to help you collect data

Once you’ve devised a data strategy (i.e. you’ve identified which data you need, and how best to go about collecting them) there are many tools you can use to help you. One thing you’ll need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. Some well-known enterprise DMPs include Salesforce DMP, SAS, and the data integration platform, Xplenty. If you want to play around, you can also try some open-source platforms like Pimcore or D:Swarm.

Want to learn more about what data analytics is and the process a data analyst follows? We cover this topic (and more) in our free introductory short course for beginners. Check out tutorial one: An introduction to data analytics.

3. Step three: Cleaning the data

Once you’ve collected your data, the next step is to get it ready for analysis. This means cleaning, or ‘scrubbing’ it, and is crucial in making sure that you’re working with high-quality data. Key data cleaning tasks include:

  • Removing major errors, duplicates, and outliers—all of which are inevitable problems when aggregating data from numerous sources.
  • Removing unwanted data points—extracting irrelevant observations that have no bearing on your intended analysis.
  • Bringing structure to your data—general ‘housekeeping’, i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
  • Filling in major gaps—as you’re tidying up, you might notice that important data are missing. Once you’ve identified gaps, you can go about filling them.

A good data analyst will spend around 70-90% of their time cleaning their data. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. It might even send you back to square one…so don’t rush it! You’ll find a step-by-step guide to data cleaning here. You may be interested in this introductory tutorial to data cleaning, hosted by Dr. Humera Noor Minhas.

Carrying out an exploratory analysis

Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let’s use our fictional learning company as an example again. Carrying out an exploratory analysis, perhaps you notice a correlation between how much TopNotch Learning’s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, take this into account.
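
As a quick illustration, an exploratory check like the one above might look like the following pandas sketch; the client data and column names are invented for the example.

```python
import pandas as pd

# Hypothetical TopNotch Learning client records
clients = pd.DataFrame({
    "monthly_fee": [500, 1200, 800, 2000, 650, 1500],
    "months_retained": [18, 7, 14, 5, 16, 6],
})

# A strong negative correlation would hint that higher-paying clients
# churn faster, i.e. cost may matter more than experience quality.
print(clients["monthly_fee"].corr(clients["months_retained"]))
```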

Tools to help you clean your data

Cleaning datasets manually—especially large ones—can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as OpenRefine, are excellent for basic data cleaning, as well as high-level exploration. However, free tools offer limited functionality for very large datasets. Python libraries (e.g. Pandas) and some R packages are better suited for heavy data scrubbing. You will, of course, need to be familiar with the languages. Alternatively, enterprise tools are also available. One example is Data Ladder, which is one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?
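
To show what pandas-based scrubbing can look like, here is a minimal sketch covering the cleaning tasks listed earlier in this step. The file name, columns, and thresholds are hypothetical; a real cleaning script depends entirely on your own data.

```python
import pandas as pd

# Hypothetical raw export, e.g. from a CRM or survey tool
df = pd.read_csv("raw_customer_data.csv")

# 1. Remove duplicates and obvious errors
df = df.drop_duplicates()
df = df[df["age"].between(0, 120)]  # discard impossible ages

# 2. Remove unwanted data points (columns irrelevant to this analysis)
df = df.drop(columns=["internal_notes"], errors="ignore")

# 3. Bring structure: tidy inconsistent labels and stray whitespace
df["country"] = df["country"].str.strip().str.title()

# 4. Fill in major gaps: impute missing numeric values with the median
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())

df.to_csv("clean_customer_data.csv", index=False)
```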

4. Step four: Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.

Descriptive analysis

Descriptive analysis identifies what has already happened . It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let’s refer back to our fictional learning provider once more. TopNotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they’ll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.

Learn more: What is descriptive analytics?

Diagnostic analysis

Diagnostic analytics focuses on understanding why something has happened . It is literally the diagnosis of a problem, just as a doctor uses a patient’s symptoms to diagnose a disease. Remember TopNotch Learning’s business problem? ‘Which factors are negatively impacting the customer experience?’ A diagnostic analysis would help answer this. For instance, it could help the company draw correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, customer sector, etc.) Let’s imagine that, using diagnostic analytics, TopNotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that they’re losing customers because they lack expertise in this sector. And that’s a useful insight!
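
A diagnostic check like the one described often starts with a simple group-by comparison, sketched below with pandas. The client table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical client table: sector and whether the client came back
clients = pd.DataFrame({
    "sector": ["Retail", "Retail", "Finance", "Health", "Retail", "Finance"],
    "repeat_business": [0, 0, 1, 1, 0, 1],
})

# Repeat-business rate per sector; an unusually low rate for Retail would
# support the idea that sector-specific expertise is the issue.
print(clients.groupby("sector")["repeat_business"].mean())
```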

Predictive analysis

Predictive analysis allows you to identify future trends based on historical data . In business, predictive analysis is commonly used to forecast future growth, for example. But it doesn’t stop there. Predictive analysis has grown increasingly sophisticated in recent years. The speedy evolution of machine learning allows organizations to make surprisingly accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents. As a result, they’ll hike up customer insurance premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform their strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.
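
Here is a deliberately simple, hedged sketch of a predictive model, assuming scikit-learn is available. The features and figures are invented; real churn or risk models use far richer inputs and proper validation.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: contract value (in thousands),
# support tickets raised, and whether the client churned
history = pd.DataFrame({
    "contract_value": [5, 20, 8, 30, 12, 25, 6, 18],
    "support_tickets": [1, 9, 2, 12, 4, 10, 1, 7],
    "churned": [0, 1, 0, 1, 0, 1, 0, 1],
})

model = LogisticRegression()
model.fit(history[["contract_value", "support_tickets"]], history["churned"])

# Predicted churn probability for a new (hypothetical) client
new_client = pd.DataFrame({"contract_value": [22], "support_tickets": [8]})
print(model.predict_proba(new_client)[:, 1])
```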

Prescriptive analysis

Prescriptive analysis allows you to make recommendations for the future. This is the final step in the analytics part of the process. It’s also the most complex. This is because it incorporates aspects of all the other analyses we’ve described. A great example of prescriptive analytics is the algorithms that guide Google’s self-driving cars. Every second, these algorithms make countless decisions based on past and present data, ensuring a smooth, safe ride. Prescriptive analytics also helps companies decide on new products or areas of business to invest in.

Learn more:  What are the different types of data analysis?

5. Step five: Sharing your results

You’ve finished carrying out your analyses. You have your insights. The final step of the data analytics process is to share these insights with the wider world (or at least with your organization’s stakeholders!) This is more complex than simply sharing the raw results of your work—it involves interpreting the outcomes, and presenting them in a manner that’s digestible for all types of audiences. Since you’ll often present information to decision-makers, it’s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.

How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That’s why it’s very important to provide all the evidence that you’ve gathered, and not to cherry-pick data. Ensuring that you cover everything in a clear, concise way will prove that your conclusions are scientifically sound and based on the facts. On the flip side, it’s important to highlight any gaps in the data or to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!

Tools for interpreting and sharing your findings

There are tons of data visualization tools available, suited to different experience levels. Popular tools requiring little or no coding skills include Google Charts, Tableau, Datawrapper, and Infogram. If you’re familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!
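
For instance, a minimal Matplotlib sketch for presenting a finding might look like this; the sectors and rates are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical repeat-business rates by client sector
sectors = ["Retail", "Finance", "Health", "Education"]
repeat_rate = [0.22, 0.61, 0.58, 0.49]

plt.bar(sectors, repeat_rate)
plt.ylabel("Repeat-business rate")
plt.title("Repeat business by client sector")
plt.tight_layout()
plt.savefig("repeat_business_by_sector.png")
```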

You can learn more about storytelling with data in this free, hands-on tutorial. We show you how to craft a compelling narrative for a real dataset, resulting in a presentation to share with key stakeholders. This is an excellent insight into what it’s really like to work as a data analyst!

6. Step six: Embrace your failures

The last ‘step’ in the data analytics process is to embrace your failures. The path we’ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you’d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. This might be caused by mistakes in the data, or human error earlier in the process.

While these pitfalls can feel like failures, don’t be disheartened if they happen. Data analysis is inherently chaotic, and mistakes occur. What’s important is to hone your ability to spot and rectify errors. If data analytics was straightforward, it might be easier, but it certainly wouldn’t be as interesting. Use the steps we’ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.

In this post, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question —What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data —Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data —Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don’t rush…take your time!
  • Analyze the data —Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Share your results —How best can you share your insights and recommendations? A combination of visualization tools and communication is key.
  • Embrace your mistakes —Mistakes happen. Learn from them. This is what transforms a good data analyst into a great one.

What next? From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the data analysis process, and see what tools you can find. As long as you stick to the core principles we’ve described, you can create a tailored technique that works for you.

To learn more, check out our free, 5-day data analytics short course. You might also be interested in the following:

  • These are the top 9 data analytics tools
  • 10 great places to find free datasets for your next project
  • How to build a data analytics portfolio

Data Analysis in Research: Types & Methods

Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is summarization and categorization, which together act as a method of data reduction; they help find patterns and themes in the data for easy identification and linking. The third and last way is data analysis, which researchers carry out in both top-down and bottom-up fashion.

LEARN ABOUT: Research Process Steps

On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “data analysis and data interpretation is a process that represents the application of deductive and inductive logic to research data.”

Why analyze data in research?

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? Well, it is possible to explore data even without a problem; we call it ‘data mining’, which often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.

Types of data in research

Every kind of data has the quality of describing things once a specific value is assigned to it. For analysis, you need to organize these values, and process and present them in a given context, to make them useful. Data can be in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all produce this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups. However, an item included in the categorical data cannot belong to more than one group. Example: a person responding to a survey by stating their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data (a short sketch follows this list).
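
To illustrate the chi-square test mentioned above, here is a minimal sketch using SciPy; the contingency table is hypothetical.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = smoking habit (yes/no),
# columns = marital status (single/married)
table = [
    [30, 10],
    [20, 40],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
```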

Learn More: Examples of Qualitative Data in Education

Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual. Here the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.

LEARN ABOUT: Level of Analysis

The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example, researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, used to assess how specific pieces of text are similar to or different from each other.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from enormous datasets.

LEARN ABOUT: Qualitative Research Questions and Questionnaires

Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is a widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context in which the communication between the researcher and respondent takes place. Discourse analysis also pays attention to the respondent’s lifestyle and day-to-day environment when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter their explanations or produce new ones until they arrive at a conclusion.

LEARN ABOUT: 12 Best Tools for Researchers

Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four different stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interviewer-led survey, that the interviewer asked every question devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They conduct the necessary basic checks and outlier checks to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation. It is associated with grouping and assigning values to survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
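
As a small, hypothetical sketch of data coding, here is how age brackets might be created with pandas; the bracket edges are arbitrary.

```python
import pandas as pd

# Hypothetical respondent ages from a survey
ages = pd.Series([19, 24, 31, 45, 52, 67, 38, 29])

# Code raw ages into brackets so responses can be analyzed per group
age_bracket = pd.cut(
    ages,
    bins=[0, 25, 40, 60, 120],
    labels=["18-25", "26-40", "41-60", "60+"],
)

print(age_bracket.value_counts())
```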

LEARN ABOUT: Steps in Qualitative Research

Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis is certainly the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The method is again classified into two groups: first, ‘descriptive statistics’, used to describe data; second, ‘inferential statistics’, which helps in comparing the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that pattern in the data starts making sense. Nevertheless, the descriptive analysis does not go beyond making conclusions. The conclusions are again based on the hypothesis researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to demonstrate distribution by various points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance is the average squared difference between each observed score and the mean; the standard deviation is the square root of the variance.
  • These measures identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.

For quantitative research, descriptive analysis often gives absolute numbers, but it is not sufficient on its own to demonstrate the rationale behind those numbers. Nevertheless, it is necessary to think of the best method for research and data analysis, suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average voting done in two different cities, descriptive statistics are enough.
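
Here is a minimal sketch of these descriptive measures computed with pandas; the scores are made up.

```python
import pandas as pd

# Hypothetical test scores for a single class
scores = pd.Series([55, 67, 72, 72, 80, 85, 90, 94])

print("count:", scores.count())                                   # frequency
print("mean / median:", scores.mean(), scores.median())           # central tendency
print("mode:", scores.mode().tolist())
print("range:", scores.max() - scores.min())                      # dispersion
print("variance / std dev:", scores.var(), scores.std())
print("quartiles:", scores.quantile([0.25, 0.5, 0.75]).tolist())  # position
```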

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you could ask 100-odd audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.
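
A hedged sketch of that movie-theater example: estimating a confidence interval for the true proportion using a normal approximation. The sample size and counts are hypothetical.

```python
import math

n = 100      # hypothetical sample size
liked = 85   # hypothetical number who said they liked the movie

p_hat = liked / n
z = 1.96     # roughly a 95% confidence level

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"Estimated share who like the movie: {p_hat:.0%} "
      f"(95% CI: {p_hat - margin:.0%} to {p_hat + margin:.0%})")
```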

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand if the new shade of lipstick recently launched is good or not, or if the multivitamin capsules help children to perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category (see the sketch after this list).
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rely on the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with one or more independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner.
  • Frequency tables: A frequency table records how often each value or category occurs in the data. It is a simple way to summarize responses and to spot dominant categories before applying more advanced tests.
  • Analysis of variance (ANOVA): This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
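
As referenced above, a minimal pandas cross-tabulation sketch might look like this; the survey responses are invented.

```python
import pandas as pd

# Hypothetical survey responses
responses = pd.DataFrame({
    "age_group": ["18-25", "26-40", "18-25", "41-60", "26-40", "41-60"],
    "gender": ["F", "M", "M", "F", "F", "M"],
})

# Contingency table: number of respondents per age group and gender
print(pd.crosstab(responses["age_group"], responses["gender"]))
```
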
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps design the survey questionnaire, select data collection methods, and choose samples.

LEARN ABOUT: Best Data Collection Tools

  • The primary aim of research data analysis is to derive ultimate insights that are unbiased. Any mistake, or any bias, in collecting the data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No level of sophistication in the analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity might mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data altering, data mining, or developing a graphical representation.

LEARN MORE: Descriptive Research vs Correlational Research

The sheer amount of data generated daily is frightening, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in a hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.

QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.

Data Analysis 6 Steps: A Complete Guide Into Data Analysis Methodology

We explore the 6 key steps in carrying out a data analysis process through examples and a comprehensive guide.

Although data analysis is very much linked to technology, it is still a science. Like any science, a data analysis process involves a rigorous and sequential procedure based on a series of steps that cannot be skipped. Discover the essential steps of a data analysis process through examples and a comprehensive guide.

Often, when we talk about data analysis, we focus on the tools and technological knowledge associated with this scientific field which, although fundamental, are subordinate to the methodology of the data analysis process.

In this article we focus on the 6 essential steps of a data analysis process with examples, addressing the core points of the process’s methodology: how to establish the objectives of the analysis, how to collect the data and how to perform the analysis. Each of the steps listed in this publication requires different expertise and knowledge. However, understanding the entire process is crucial to drawing meaningful conclusions.

Don't miss: The Role of Data Analytics in Business

On the other hand, it is important to note that an enterprise data analytics process depends on the maturity of the company's data strategy . Companies with a more developed data-driven culture will be able to conduct deeper, more complex and more efficient data analysis.

If you are interested in improving your corporate data strategy or in discovering how to design an efficient data strategy, we encourage you to download the e-book: "How to create a data strategy to leverage the business value of data".

The 6 steps of a data analysis process in business

Step 1 of the data analysis process: Define a specific objective

The initial phase of any data analysis process is to define the specific objective of the analysis. That is, to establish what we want to achieve with the analysis. In the case of a business data analysis, our specific objective will be linked to a business goal and, as a consequence, to a performance indicator or KPI.

To define your objective effectively, you can formulate a hypothesis and define an evaluation strategy to test it. However, this step should always start from a crucial question:

What business objective do I want to achieve?

What business challenge am I trying to address?

While this process may seem simple, it is often more complicated than it first appears. For a data analytics process to be efficient, it is essential that the data analyst has a thorough understanding of the company's operations and business objectives.

Once the objective or problem we want to solve has been defined, the next step is to identify the data and data sources we need to achieve it. Again, this is where the business vision of the data analyst comes into play. Identifying the data sources that will provide the information to answer the question posed involves extensive knowledge of the business and its activity.

Bismart Tip: How to set the right objective?

Setting the objective of an analysis depends, in part, on our creative problem-solving skills and our level of knowledge about the field under study. However, in the case of a business data analysis, it is most effective to pay attention to established performance indicators and business metrics related to the problem we want to solve. Exploring the company's activity reports and dashboards will provide valuable information about the organisation's areas of interest.

Step 2 of the data analysis process: Data collection

Once the objective has been defined, it is time to design a plan to obtain and consolidate the necessary data. At this point it is essential to identify the specific types of data you need, which can be quantitative (numerical data such as sales figures) or qualitative (descriptive data such as customer feedback).

On the other hand, you should also consider the typology of data in terms of the data source, which can be classified as: first-party data, second-party data and third-party data.

First-party data:

First-party data is the information that you or your organisation collects directly. It typically includes transactional tracking data or information obtained from your company's customer relationship management system, whether it is a CRM or a Customer Data Platform (CDP).

Regardless of its source, first-party data is usually presented in a structured and well-organised way. Other sources of first-party data may include customer satisfaction surveys, feedback from focus groups, interviews or observational data.

Second-party data:

Second-party data is information that other organisations have directly collected. It can be understood as first-party data that has been collected for a different purpose than your analysis.

The main advantage of second-party data is that it is usually organised in a structured way. That is, it is often structured data that will make your work easier. It also tends to have a high degree of reliability. Examples of second-party data include website, app or social media activity, as well as online purchase or shipping data.

Third-party data:

Third-party data is information collected and consolidated from various sources by an external entity. Third-party data often comprises a wide range of unstructured data points. Many organisations collect data from third parties to generate industry reports or conduct market research.

A specific example of third-party data collection is provided by the consultancy Gartner, which collects and distributes data of high business value to other companies.

Step 3 of the data analysis process: Data cleaning

Once we have collected the data we need, we need to prepare it for analysis. This involves a process known as data cleaning or consolidation, which is essential to ensure that the data we are working with is of good quality.

The most common tasks in this part of the process are:

  • Eliminating significant errors, duplicated data and inconsistencies, which are inherent issues when aggregating data from different sources.
  • Getting rid of irrelevant data, i.e. extracting observations that are not relevant to the intended analysis.
  • Organising and structuring the data: performing general "cleaning" tasks, such as rectifying typographical errors or layout discrepancies, to facilitate data mapping and manipulation.
  • Fixing important gaps in the data: during the cleaning process, important missing data may be identified and should be remedied as soon as possible.

It is important to understand that this is the most time-consuming part of the process. In fact, it is estimated that a data analyst typically spends around 70-90% of their time cleaning data. If you are interested in learning more about the specific steps involved in this part of the process, you can read our post on data processing.

Bismart Tip: Resources to speed up data cleansing

Manually cleaning datasets can be a very time consuming task. Fortunately, there are several tools available to simplify this process. Open source tools such as OpenRefine are excellent options for basic data cleansing and even offer advanced scanning functions. However, free tools can have limitations when dealing with very large datasets. For more robust data cleaning, Python libraries such as Pandas and certain R packages are more suitable. Fluency in these programming languages is essential for their effective use.

Step 4 of the data analysis process: Data analysis

Once the data has been cleaned and prepared, it is time to dive into the most exciting phase of the process: data analysis.

At this point, we should bear in mind that there are different types of data analysis and that the type of data analysis we choose will depend, to a large extent, on the objective of our analysis. On the other hand, there are also multiple techniques to carry out data analysis. Some of the best known are univariate or bivariate analysis, time series analysis and regression analysis.
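
As a small, hypothetical illustration of one such technique, here is a pandas time-series sketch that aggregates daily sales into monthly totals and smooths them with a rolling average; the figures are randomly generated.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for one quarter
days = pd.date_range("2024-01-01", periods=90, freq="D")
sales = pd.Series(100 + np.random.default_rng(0).normal(0, 10, 90), index=days)

monthly_total = sales.resample("MS").sum()  # totals per month
smoothed = sales.rolling(window=7).mean()   # 7-day rolling average

print(monthly_total)
print(smoothed.tail())
```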

In a broader context, all forms of data analysis fall into one of the following four categories.

Types of data analysis

Descriptive analysis

Descriptive analysis is a type of analysis that explores past events . It is the first step that companies usually take before going into more in-depth investigations. 

Diagnostic analysis

Diagnostic analysis revolves around unravelling the "why" of something. In other words, the objective of this type of analysis is to discover the causes or reasons for an event of interest to the company.

Predictive analytics

The focus of predictive analytics is to forecast future trends based on historical data . In business, predictive analytics is becoming increasingly relevant.

Unlike the other types of analysis, predictive analytics is linked to artificial intelligence and, typically, to machine learning and deep learning . Recent advances in machine learning have significantly improved the accuracy of predictive analytics and it is now one of the most valued types of analysis by companies.

Predictive analytics enables a company's senior management to take high-value actions such as solving problems before they happen, anticipating future market trends or taking strategic actions ahead of the competition.

Prescriptive analysis

Prescriptive analysis is an evolution of the three types of analysis mentioned so far. It is a methodology that combines descriptive, diagnostic and predictive analytics to formulate recommendations for the future . In other words, it goes one step further than predictive analytics. Rather than simply explaining what will happen in the future, it offers the most appropriate courses of action based on what will happen. In business, prescriptive analytics can be very useful in determining new product projects or investment areas by aggregating information from other types of analytics.

An example of prescriptive analytics is the algorithms that guide Google's self-driving cars. These algorithms make a multitude of real-time decisions based on historical and current data, ensuring a safe and smooth journey. 

Step 5 of the data analysis process: Transforming results into reports or dashboards

Once the analysis is complete and conclusions have been drawn, the final stage of the data analysis process is to share these findings with a wider audience. In the case of a business data analysis, this means the organisation's stakeholders.

This step requires interpreting the results and presenting them in an easily understandable way so that senior management can make data-driven decisions. It is therefore essential to convey clear, concise and unambiguous ideas. Data visualisation plays a key role in achieving this and data analysts frequently use reporting tools such as Power BI to transform data into interactive reports and dashboards to support their conclusions.

The interpretation and presentation of results significantly influences the trajectory of a company. In this regard, it is essential to provide a complete, clear and concise overview that demonstrates a scientific and fact-based methodology for the conclusions drawn. On the other hand, it is also critical to be honest and transparent and to share with stakeholders any doubts or unclear conclusions you may have about the analysis and its results.

The best data visualisation and reporting tools

If you want to delve deeper into this part of the data analysis process, don't miss our post on the best business intelligence tools.

However, we can tell you in advance that Power BI has been proclaimed the leading BI and analytics platform in the market in 2023 by Gartner.

At Bismart, as a Microsoft Power BI partner, we have a large team of Power BI experts and, in addition, we also have our set of specific solutions to improve the productivity and performance of Power BI.

Recently, we have created an e-book in which we explore the keys for a company to develop an efficient self-service BI strategy with Power BI. Don't miss it!

Step 6 of the data analysis process: Transforming insights into actions and business opportunities

The final stage of a data analysis process involves turning the intelligence obtained into actions and business opportunities.

On the other hand, it is essential to be aware that a data analysis process is not a linear process, but rather a complex process full of ramifications. For example, during the data cleansing phase, you may identify patterns that raise new questions, leading you back to the first step of redefining your objectives. Similarly, an exploratory analysis may uncover a set of data that you had not previously considered. You may also discover that the results of your central analysis seem misleading or incorrect, perhaps due to inaccuracies in the data or human error earlier in the process.

Although these obstacles may seem like setbacks, it is essential not to become discouraged. Data analysis is intricate and setbacks are a natural part of the process.

In this article, we have delved into the key stages of a data analysis process, which, in brief, are as follows:

  • Define the objective: Define the business challenge we intend to address. Formulating it as a question provides a structured approach to finding a clear solution.
  • Collect the data: Develop a strategy for gathering the data needed to answer our question and identify the data sources most likely to have the information we need.
  • Clean the data: Drill down into the data, cleaning, organising and structuring it as necessary.
  • Analyse the data using one of the four main types of data analysis: descriptive, diagnostic, predictive and prescriptive.
  • Disseminate findings: Choose the most effective means to disseminate our insights in a way that is clear, concise and encourages intelligent decision-making.
  • Learn from setbacks: Recognising and learning from mistakes is part of the journey. Challenges that arise during the process are learning opportunities that can also transform our analysis process into a more effective strategy.

Before you go...

Companies with a well-defined and efficient data strategy are much more likely to obtain truly useful business intelligence.

We encourage you to explore in more depth the steps to take to consolidate an enterprise data strategy through our e-book "How to create a data strategy":

The Five Stages of The Data Analysis Process

The good news is that there’s a straightforward five-step process that can be followed to extract insights from data, identify new opportunities, and drive growth. And better yet, the ability to do so isn’t limited to data scientists or math geniuses. People across all disciplines and at all stages of their careers can develop the skills to analyze data. It’s useful whether one is looking to level up their career or move into an entirely new industry.

Data analysis follows a detailed step-by-step process. In this post, we’ll walk you through this process to help you start a potential career in data science.

Jump to section:

  • Ask The Right Questions
  • Data Collection
  • Data Cleaning
  • Analyzing The Data
  • Interpreting The Results

Step One: Define Your Goals

Before you start collecting data, you need to first understand what you want to do with it. Take some time to think about a specific business problem you want to address or consider a hypothesis that could be solved with data. From there, you’ll create a set of measurable, clear, and concise goals that will help you solve this problem.

For example, an advertiser who wants to boost their client’s sales may ask if customers are likely to purchase from them after seeing an ad. Or an HR director who wants to reduce turnover might want to know why their top employees are leaving their company.

Starting with a clear objective is an essential step in the data analysis process. By recognizing the business problem that you want to solve and setting well-defined goals, it’ll be way easier to decide on the data you need to collect and analyze.

Step Two: Data Collection

Now that you have a solid idea of what you want to accomplish, it’s time to define what type of data you need to find those answers, and where you’re going to source it. Whatever type of data you use, the end goal of this step is to make sure to have a complete, 360-degree view of the problem you want to solve. Data can be broken down into three types:

First Party Data

First-party data, also known as 1P data, is data that a company collects directly from customers. This data source improves your ability to engage with your customers. It also allows you to develop a data strategy to ensure that you are catering to your customers’ interests and needs.

  • Customer surveys
  • Purchase information
  • Customer interviews
  • In-store interactions

Second Party Data

Second-party data is first-party data given to you from a trusted partner or company. The additional benefit of this data set is that it can help you uncover more insights about your customers. This can help your company uncover budding trends and forecast future growth.

  • Social media activity
  • App activity
  • Website interactions

Third Party Data

Third-party data is any data collected by an organization or entity that doesn’t have a direct relationship with the individual the data is being collected from. This data consists of unstructured, semi-structured, or structured data points, often referred to as Big Data, which is analyzed using machine learning and predictive analytics to build reports. Common sources include:

  • Open data repositories
  • Government resources

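To make this concrete, here is a minimal sketch, assuming hypothetical CSV exports and column names, of how first-party and second-party data might be combined in Python with pandas so that every record contributes to that 360-degree view:

```python
import pandas as pd

# First-party data: assumed exports of your own purchase and survey records
purchases = pd.read_csv("purchases.csv")          # columns: customer_id, order_value, order_date
surveys = pd.read_csv("customer_surveys.csv")     # columns: customer_id, satisfaction_score

# Second-party data: assumed partner export of app activity
app_activity = pd.read_csv("partner_app_activity.csv")  # columns: customer_id, sessions_last_30d

# Combine the sources on the shared customer identifier
combined = (
    purchases
    .merge(surveys, on="customer_id", how="left")
    .merge(app_activity, on="customer_id", how="left")
)

print(combined.head())
```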

Step Three: Data Cleaning

Now that you’ve collected and combined data from multiple sources, it’s time to polish the data to ensure it’s usable, readable, and actionable.

Data cleaning converts raw data into data that is suitable for analysis. This process involves removing incorrect data and checking for incompleteness or inconsistencies. Data cleaning is a vital step in the data analysis process because the accuracy of your analysis will depend on the quality of your data.
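As an illustration only, the sketch below shows a few typical cleaning operations using pandas; the file and column names are hypothetical, and real projects will need cleaning rules tailored to their own data:

```python
import pandas as pd

df = pd.read_csv("combined_customer_data.csv")   # hypothetical raw extract

# Remove exact duplicate rows
df = df.drop_duplicates()

# Standardize inconsistent text values (stray whitespace, mixed case)
df["region"] = df["region"].str.strip().str.title()

# Handle incompleteness: drop rows missing the key identifier,
# and fill missing satisfaction scores with the median
df = df.dropna(subset=["customer_id"])
df["satisfaction_score"] = df["satisfaction_score"].fillna(df["satisfaction_score"].median())

# Check for remaining gaps before moving on to analysis
print(df.isna().sum())
```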

Step Four: Analyzing The Data

Now you’re ready for the fun stuff.

In this step, you’ll begin to make sense of your data to extract meaningful insights. There are many different data analysis techniques and processes that you can use. Let's explore the steps in a standard data analysis.

Data Analysis Steps & Techniques

1. Exploratory Analysis

Exploratory data analysis seeks to uncover insights about your data before the analysis begins. This method will save you time as it will determine if your data is appropriate for the given problem. There are five goals of exploratory data analysis:

  • Uncover and resolve data quality issues such as missing data
  • Uncover high-level insights about your data set
  • Detect anomalies in your data set
  • Understand existing patterns and correlations between variables
  • Create new variables using your business knowledge
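Here is a brief, hypothetical pandas sketch of what those five goals can look like in practice (the dataset and column names are made up):

```python
import pandas as pd

df = pd.read_csv("sales_data.csv")   # hypothetical dataset

# High-level insights and data quality issues
print(df.describe())                 # summary statistics for numeric columns
print(df.isna().sum())               # missing values per column

# Detect anomalies with a simple rule of thumb (values far from the mean)
z_scores = (df["order_value"] - df["order_value"].mean()) / df["order_value"].std()
print(df[z_scores.abs() > 3])

# Patterns and correlations between numeric variables
print(df.corr(numeric_only=True))

# Create a new variable using business knowledge
df["is_repeat_customer"] = df["previous_orders"] > 0
```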


2. Descriptive Analysis

Descriptive analysis seeks to answer the question, “What happened?”. This method identifies what is doing well and what is in need of improvement, and it lays the foundation for more advanced data analysis processes. For example, say you own a clothing store that sells products ranging from t-shirts to winter jackets. A descriptive analysis will tell you which products are your best and worst sellers.
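For the clothing-store example, a descriptive summary of best and worst sellers might look like this hypothetical pandas sketch:

```python
import pandas as pd

sales = pd.read_csv("store_sales.csv")   # hypothetical columns: product, units_sold

# Total units sold per product, sorted from best to worst seller
product_summary = (
    sales.groupby("product")["units_sold"]
    .sum()
    .sort_values(ascending=False)
)

print("Best seller:", product_summary.index[0])
print("Worst seller:", product_summary.index[-1])
```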

3. Diagnostic Analysis

Diagnostic analysis seeks to answer the question, “Why did this happen?”. This method of analysis is the most abstract and involves detecting correlations between different variables. For example, your clothing store saw a decrease in revenue for t-shirt sales. A diagnostic analysis will look at the relationship between t-shirt revenue and variables such as seasonality, the location of the t-shirts within the store, and social media engagement to determine which one has the strongest correlation. In this case, you determine that seasonality had the biggest impact, and you can make adjustments accordingly.
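A hypothetical sketch of that diagnostic check, using average temperature as a stand-in for seasonality and comparing how strongly each candidate variable correlates with t-shirt revenue:

```python
import pandas as pd

df = pd.read_csv("tshirt_weekly.csv")   # hypothetical weekly data with the columns below

candidates = ["avg_temperature", "shelf_position_score", "social_media_engagement"]

# Correlation of each candidate variable with weekly t-shirt revenue
correlations = (
    df[candidates + ["tshirt_revenue"]]
    .corr()["tshirt_revenue"]
    .drop("tshirt_revenue")
)
print(correlations.sort_values(key=abs, ascending=False))   # strongest relationship first
```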

4. Predictive Analysis

Predictive analysis seeks to answer the question, “Will this happen again?”. This method of analysis determines what is going to happen in the future based on past data gathered. Your clothing store knows that t-shirt revenue will decrease in the winter months, but by how much? Predictive analysis will use your store’s historical data to create future revenue projections. This will give you an estimation of what your t-shirt revenue will be in the winter months.
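One simple way (though certainly not the only one) to produce such a projection is to fit a linear trend to past figures; all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical t-shirt revenue for the last five Decembers
years = np.array([2019, 2020, 2021, 2022, 2023])
december_revenue = np.array([4200.0, 3900.0, 4100.0, 3800.0, 3700.0])

# Fit a straight-line trend and project it one year ahead
slope, intercept = np.polyfit(years, december_revenue, deg=1)
forecast_2024 = slope * 2024 + intercept
print(f"Projected December 2024 t-shirt revenue: ${forecast_2024:,.0f}")
```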

5. Prescriptive Analysis

Prescriptive analysis seeks to answer the question, “What should we do?”. This method of analysis determines the best course of action based on previous analyses, so that you can take action according to future trends. Your clothing store is predicted to sell 50 t-shirts in December, but you only have 40 t-shirts in your inventory. A prescriptive analysis will determine that you should order 15 more t-shirts: this will meet the predicted demand and create a buffer should the actual demand be higher.
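The reorder decision in this example reduces to a small calculation; here is a sketch using the article’s hypothetical figures:

```python
# Hypothetical figures from the example above
predicted_demand = 50      # t-shirts expected to sell in December
current_inventory = 40     # t-shirts on hand
safety_buffer = 5          # extra stock in case actual demand runs higher

# Order enough to cover predicted demand plus the buffer
order_quantity = max(predicted_demand + safety_buffer - current_inventory, 0)
print(f"Order {order_quantity} more t-shirts")   # -> Order 15 more t-shirts
```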


Step Five: Visualizing The Results

After you’ve interpreted the results and drawn meaningful insights from them, the next step is to create data visualizations. Data visualization involves using several tools. Let's explore two popular tools that most data analysts use.

Popular Tools For Data Visualization

Tableau

Tableau is arguably the most popular tool used to visualize data. It allows you to convert text or numerical information into an interactive visual dashboard. It also offers APIs for integrating machine learning models you have developed.

Microsoft Power BI

Microsoft Power BI is another great tool for creating data visualizations. This software has features such as data warehousing, data discovery, and a cloud-based interface. This allows you to easily build visual dashboards.
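Dashboards aside, a simple chart can also be produced in code. The sketch below uses Python’s matplotlib with hypothetical monthly revenue figures:

```python
import matplotlib.pyplot as plt

months = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
tshirt_revenue = [5200, 5100, 4300, 3600, 3100, 2800]   # hypothetical figures

plt.figure(figsize=(8, 4))
plt.plot(months, tshirt_revenue, marker="o")
plt.title("Monthly T-Shirt Revenue")
plt.xlabel("Month")
plt.ylabel("Revenue (USD)")
plt.tight_layout()
plt.show()
```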

If you want your findings to be implemented, you need to be able to present them to decision-makers and stakeholders in a manner that’s compelling and easy to comprehend. The best way to do this is through what’s called data storytelling, which involves turning your data into a compelling narrative. The goal of data storytelling is to propose a solution using appropriate business metrics that are directly related to your company’s key performance indicators.

Data is Everywhere

We live in a world that’s flooded with data. The ability to make sense of data isn’t limited to data scientists. With the right training, anyone can think like a data analyst and find the answers they need to tackle some of their biggest business problems.

As data continues to transform the way countless industries operate, there is an increase in demand for people with the skills to make the most of it. No matter your field—be it advertising, retail, healthcare, or beyond—mastering these five stages of data analysis will empower you to excel.



Six Steps of Data Analysis Process

Data analysis, the methodical exploration and interpretation of data, underpins decision-making in today’s dynamic landscape. As the demand for skilled Data Analysts grows, understanding the six key steps in this process becomes imperative. From defining problems to presenting insights, each step plays a vital role in transforming raw data into actionable knowledge.

In this article let’s delve into the six essential steps of data analysis, emphasizing the significance of each phase in extracting meaningful conclusions.

What is Data Analysis?

The collection, transformation, and organization of data to draw conclusions, make predictions about the future, and make informed, data-driven decisions is called Data Analysis. The professional who performs data analysis is called a Data Analyst.

There is huge demand for Data Analysts, as data is expanding rapidly nowadays. Data Analysis is used to find possible solutions to a business problem. An advantage of being a Data Analyst is that they can work in any field they love: healthcare, agriculture, IT, finance, or business. Data-driven decision-making is an important part of Data Analysis, and it makes the analysis process much easier. There are six steps in the Data Analysis process.

Steps for Data Analysis Process

  • Define the Problem or Research Question
  • Collect Data
  • Data Cleaning
  • Analyzing the Data
  • Data Visualization
  • Presenting the Data

Each step has its own process and tools to make overall conclusions based on the data. 

1. Define the Problem or Research Question

In the first step of the process, the data analyst is given a problem or business task. The analyst has to understand the task and the stakeholders’ expectations for the solution. A stakeholder is a person who has invested money and resources in a project. The analyst must be able to ask different questions in order to find the right solution to the problem, and has to find its root cause in order to fully understand it. The analyst should avoid distractions while analyzing the problem and communicate effectively with the stakeholders and other colleagues to completely understand what the underlying problem is. Questions to ask yourself in the Ask phase are:

  • What are the problems that are being mentioned by my stakeholders?
  • What are their expectations for the solutions?

2. Collect Data

The second step is to Prepare or Collect the Data. This step includes collecting data and storing it for further analysis. The analyst has to collect data for the given task from multiple sources, which may be internal or external. Internal data is the data available within the organization you work for, while external data is the data available from sources outside your organization. The data collected by an individual from their own resources is called first-party data. The data that is collected and then sold is called second-party data. Data that is collected from outside sources is called third-party data. Common sources for collecting data are interviews, surveys, feedback forms, and questionnaires. The collected data can be stored in a spreadsheet or a SQL database.

A spreadsheet is a digital worksheet that contains rows and columns, while a database contains tables with functions to manipulate the data. Spreadsheets are suited to storing up to a few tens of thousands of rows, while databases are used when there are too many rows for a spreadsheet to handle comfortably. The most common tools for storing data are MS Excel or Google Sheets in the case of spreadsheets, and databases such as Oracle or Microsoft SQL Server for larger datasets.

3. Data Cleaning  

The third step is to Clean and Process the Data. After the data is collected from multiple sources, it is time to clean it. Clean data means data that is free from misspellings, redundancies, and irrelevance. Clean data largely depends on data integrity. There might be duplicate records, or the data might not be in a consistent format; such unnecessary data is removed and the rest is standardized. SQL and Excel both provide functions to clean data. This is one of the most important steps in Data Analysis, as clean and well-formatted data helps in finding trends and solutions. The most important part of the Process phase is to check whether your data is biased or not. Bias is an act of favoring a particular group or community while ignoring the rest. Bias is a serious problem, as it can skew the overall analysis, so the data analyst must make sure that every group is represented when the data is collected.

4. Analyzing the Data

The fourth step is to Analyze. The cleaned data is used for analyzing and identifying trends, performing calculations, and combining data for better results. Common tools for performing calculations are Excel and SQL: Excel provides built-in functions and pivot tables, while in SQL you write queries (often using temporary tables) to perform calculations. Programming languages are another way of solving these problems; their packages make many analysis tasks much easier. The most widely used programming languages for data analysis are R and Python.
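As a small illustration of the pivot-table idea in code, here is a hypothetical pandas sketch that aggregates revenue by region and product category, much as an Excel pivot table would:

```python
import pandas as pd

sales = pd.read_csv("sales.csv")   # hypothetical columns: region, category, revenue

pivot = pd.pivot_table(
    sales,
    index="region",        # rows of the pivot table
    columns="category",    # columns of the pivot table
    values="revenue",
    aggfunc="sum",
)
print(pivot)
```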

5. Data Visualization

The fifth step is visualizing the data. Nothing is more compelling than a visualization. The transformed data now has to be turned into visuals such as charts and graphs. The reason for making data visualizations is that many of the people consuming the results, particularly stakeholders, are non-technical; visualizations make complex data simple to understand. Tableau and Looker are two popular tools for compelling data visualizations. Tableau is a simple drag-and-drop tool that helps in creating compelling visualizations, while Looker is a data visualization tool that connects directly to the database to create visualizations; both are widely used by data analysts. R and Python also have packages that produce beautiful data visualizations; R, for example, has a package named ggplot2 that offers a wide variety of plots. A presentation is then given based on the data findings. Sharing the insights with team members and stakeholders helps in making better, more informed decisions and leads to better outcomes.

6. Presenting the Data

Presenting the data involves transforming raw information into a format that is easily comprehensible and meaningful for various stakeholders. This process encompasses the creation of visual representations, such as charts, graphs, and tables, to effectively communicate patterns, trends, and insights gleaned from the data analysis. The goal is to facilitate a clear understanding of complex information, making it accessible to both technical and non-technical audiences. Effective data presentation involves thoughtful selection of visualization techniques based on the nature of the data and the specific message intended. It goes beyond mere display to storytelling, where the presenter interprets the findings, emphasizes key points, and guides the audience through the narrative that the data unfolds. Whether through reports, presentations, or interactive dashboards, the art of presenting data involves balancing simplicity with depth, ensuring that the audience can easily grasp the significance of the information presented and use it for informed decision-making.

In conclusion, within the data analysis process, the ability to distill complex information into clear, visual narratives empowers organizations to make informed decisions. Data-driven insights, effectively communicated, play a pivotal role in addressing business challenges and fostering continual improvement across various domains.

Frequently Asked Questions(FAQs)

1. What are the 5 methods of data analysis?

Descriptive, Inferential, Diagnostic, Predictive, and Prescriptive are five common methods used in data analysis to derive meaningful insights.

2. What are the 5 levels of data analysis?

Data collection, Data cleaning, Exploratory Data Analysis (EDA), Modeling, and Interpretation are the five levels involved in the data analysis process.

3. What are the 4 stages of data analysis?

Collection, Processing, Analysis, and Interpretation are the four key stages in the data analysis process, leading to informed decision-making and insights.

4. What are the 5 processes of data analysis?

Data collection, Data cleaning, Data analysis, Data interpretation, and Data presentation constitute the five fundamental processes in effective data analysis workflows.

5. What is the process of data analysis?

The process involves defining the problem, collecting and cleaning data, analyzing patterns, visualizing insights, and presenting findings, facilitating informed decision-making and problem resolution.


What is Data Analysis? (Types, Methods, and Tools)


  • Couchbase Product Marketing December 17, 2023

Data analysis is the process of cleaning, transforming, and interpreting data to uncover insights, patterns, and trends. It plays a crucial role in decision making, problem solving, and driving innovation across various domains. 

In addition to further exploring the role data analysis plays, this blog post will discuss common data analysis techniques, delve into the distinction between quantitative and qualitative data, explore popular data analysis tools, and discuss the steps involved in the data analysis process.

By the end, you should have a deeper understanding of data analysis and its applications, empowering you to harness the power of data to make informed decisions and gain actionable insights.

Why is Data Analysis Important?

Data analysis is important across various domains and industries. It helps with:

  • Decision Making : Data analysis provides valuable insights that support informed decision making, enabling organizations to make data-driven choices for better outcomes.
  • Problem Solving : Data analysis helps identify and solve problems by uncovering root causes, detecting anomalies, and optimizing processes for increased efficiency.
  • Performance Evaluation : Data analysis allows organizations to evaluate performance, track progress, and measure success by analyzing key performance indicators (KPIs) and other relevant metrics.
  • Gathering Insights : Data analysis uncovers valuable insights that drive innovation, enabling businesses to develop new products, services, and strategies aligned with customer needs and market demand.
  • Risk Management : Data analysis helps mitigate risks by identifying risk factors and enabling proactive measures to minimize potential negative impacts.

By leveraging data analysis, organizations can gain a competitive advantage, improve operational efficiency, and make smarter decisions that positively impact the bottom line.

Quantitative vs. Qualitative Data

In data analysis, you’ll commonly encounter two types of data: quantitative and qualitative. Understanding the differences between these two types of data is essential for selecting appropriate analysis methods and drawing meaningful insights. Here’s an overview of quantitative and qualitative data:

Quantitative Data

Quantitative data is numerical and represents quantities or measurements. It’s typically collected through surveys, experiments, and direct measurements. This type of data is characterized by its ability to be counted, measured, and subjected to mathematical calculations. Examples of quantitative data include age, height, sales figures, test scores, and the number of website users.

Quantitative data has the following characteristics:

  • Numerical : Quantitative data is expressed in numerical values that can be analyzed and manipulated mathematically.
  • Objective : Quantitative data is objective and can be measured and verified independently of individual interpretations.
  • Statistical Analysis : Quantitative data lends itself well to statistical analysis. It allows for applying various statistical techniques, such as descriptive statistics, correlation analysis, regression analysis, and hypothesis testing.
  • Generalizability : Quantitative data often aims to generalize findings to a larger population. It allows for making predictions, estimating probabilities, and drawing statistical inferences.

Qualitative Data

Qualitative data, on the other hand, is non-numerical and is collected through interviews, observations, and open-ended survey questions. It focuses on capturing rich, descriptive, and subjective information to gain insights into people’s opinions, attitudes, experiences, and behaviors. Examples of qualitative data include interview transcripts, field notes, survey responses, and customer feedback.

Qualitative data has the following characteristics:

  • Descriptive : Qualitative data provides detailed descriptions, narratives, or interpretations of phenomena, often capturing context, emotions, and nuances.
  • Subjective : Qualitative data is subjective and influenced by the individuals’ perspectives, experiences, and interpretations.
  • Interpretive Analysis : Qualitative data requires interpretive techniques, such as thematic analysis, content analysis, and discourse analysis, to uncover themes, patterns, and underlying meanings.
  • Contextual Understanding : Qualitative data emphasizes understanding the social, cultural, and contextual factors that shape individuals’ experiences and behaviors.
  • Rich Insights : Qualitative data enables researchers to gain in-depth insights into complex phenomena and explore research questions in greater depth.

In summary, quantitative data represents numerical quantities and lends itself well to statistical analysis, while qualitative data provides rich, descriptive insights into subjective experiences and requires interpretive analysis techniques. Understanding the differences between quantitative and qualitative data is crucial for selecting appropriate analysis methods and drawing meaningful conclusions in research and data analysis.

Types of Data Analysis

Different types of data analysis techniques serve different purposes. In this section, we’ll explore four types of data analysis: descriptive, diagnostic, predictive, and prescriptive, and go over how you can use them.

Descriptive Analysis

Descriptive analysis involves summarizing and describing the main characteristics of a dataset. It focuses on gaining a comprehensive understanding of the data through measures such as central tendency (mean, median, mode), dispersion (variance, standard deviation), and graphical representations (histograms, bar charts). For example, in a retail business, descriptive analysis may involve analyzing sales data to identify average monthly sales, popular products, or sales distribution across different regions.

Diagnostic Analysis

Diagnostic analysis aims to understand the causes or factors influencing specific outcomes or events. It involves investigating relationships between variables and identifying patterns or anomalies in the data. Diagnostic analysis often uses regression analysis, correlation analysis, and hypothesis testing to uncover the underlying reasons behind observed phenomena. For example, in healthcare, diagnostic analysis could help determine factors contributing to patient readmissions and identify potential improvements in the care process.

Predictive Analysis

Predictive analysis focuses on making predictions or forecasts about future outcomes based on historical data. It utilizes statistical models, machine learning algorithms, and time series analysis to identify patterns and trends in the data. By applying predictive analysis, businesses can anticipate customer behavior, market trends, or demand for products and services. For example, an e-commerce company might use predictive analysis to forecast customer churn and take proactive measures to retain customers.

Prescriptive Analysis

Prescriptive analysis takes predictive analysis a step further by providing recommendations or optimal solutions based on the predicted outcomes. It combines historical and real-time data with optimization techniques, simulation models, and decision-making algorithms to suggest the best course of action. Prescriptive analysis helps organizations make data-driven decisions and optimize their strategies. For example, a logistics company can use prescriptive analysis to determine the most efficient delivery routes, considering factors like traffic conditions, fuel costs, and customer preferences.

In summary, data analysis plays a vital role in extracting insights and enabling informed decision making. Descriptive analysis helps understand the data, diagnostic analysis uncovers the underlying causes, predictive analysis forecasts future outcomes, and prescriptive analysis provides recommendations for optimal actions. These different data analysis techniques are valuable tools for businesses and organizations across various industries.

Data Analysis Methods

In addition to the data analysis types discussed earlier, you can use various methods to analyze data effectively. These methods provide a structured approach to extract insights, detect patterns, and derive meaningful conclusions from the available data. Here are some commonly used data analysis methods:

Statistical Analysis 

Statistical analysis involves applying statistical techniques to data to uncover patterns, relationships, and trends. It includes methods such as hypothesis testing, regression analysis, analysis of variance (ANOVA), and chi-square tests. Statistical analysis helps organizations understand the significance of relationships between variables and make inferences about the population based on sample data. For example, a market research company could conduct a survey to analyze the relationship between customer satisfaction and product price. They can use regression analysis to determine whether there is a significant correlation between these variables.
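A brief sketch of that kind of check using scipy; the survey values are invented for illustration:

```python
from scipy import stats

# Hypothetical survey results: product price paid and satisfaction score (1-10)
price = [9.99, 14.99, 19.99, 24.99, 29.99, 34.99, 39.99, 44.99]
satisfaction = [8.1, 7.9, 7.6, 7.4, 7.1, 6.8, 6.9, 6.5]

# Simple linear regression: does satisfaction change significantly with price?
result = stats.linregress(price, satisfaction)
print(f"slope = {result.slope:.3f}, r = {result.rvalue:.2f}, p = {result.pvalue:.4f}")
```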

Data Mining

Data mining refers to the process of discovering patterns and relationships in large datasets using techniques such as clustering, classification, association analysis, and anomaly detection. It involves exploring data to identify hidden patterns and gain valuable insights. For example, a telecommunications company could analyze customer call records to identify calling patterns and segment customers into groups based on their calling behavior. 
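A minimal clustering sketch with scikit-learn, assuming hypothetical per-customer calling features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [average call minutes per day, share of calls made at night]
features = np.array([
    [12.0, 0.10], [15.0, 0.05], [3.0, 0.70],
    [2.5, 0.80], [14.0, 0.08], [4.0, 0.65],
])

# Scale the features, then group customers into two behavioral segments
scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)   # cluster assignment per customer
```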

Text Mining

Text mining involves analyzing unstructured data , such as customer reviews, social media posts, or emails, to extract valuable information and insights. It utilizes techniques like natural language processing (NLP), sentiment analysis, and topic modeling to analyze and understand textual data. For example, consider how a hotel chain might analyze customer reviews from various online platforms to identify common themes and sentiment patterns to improve customer satisfaction.

Time Series Analysis

Time series analysis focuses on analyzing data collected over time to identify trends, seasonality, and patterns. It involves techniques such as forecasting, decomposition, and autocorrelation analysis to make predictions and understand the underlying patterns in the data.

For example, an energy company could analyze historical electricity consumption data to forecast future demand and optimize energy generation and distribution.
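As a simple, hypothetical sketch of trend-spotting in a time series with pandas (a real forecast would typically use dedicated models):

```python
import pandas as pd

# Hypothetical monthly electricity consumption (GWh) over two years
consumption = pd.Series(
    [310, 295, 280, 260, 250, 265, 290, 300, 285, 295, 315, 330,
     320, 300, 285, 262, 255, 270, 295, 305, 290, 300, 320, 340],
    index=pd.date_range("2022-01-01", periods=24, freq="MS"),
)

# A 12-month rolling mean smooths out seasonality and reveals the underlying trend
trend = consumption.rolling(window=12).mean()
print(trend.dropna().tail())

# A naive seasonal estimate: assume next January resembles previous Januaries
print("Naive January forecast:", consumption[consumption.index.month == 1].mean())
```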

Data Visualization

Data visualization is the graphical representation of data to communicate patterns, trends, and insights visually. It uses charts, graphs, maps, and other visual elements to present data in a visually appealing and easily understandable format. For example, a sales team might use a line chart to visualize monthly sales trends and identify seasonal patterns in their sales data.

These are just a few examples of the data analysis methods you can use. Your choice should depend on the nature of the data, the research question or problem, and the desired outcome.

How to Analyze Data

Analyzing data involves following a systematic approach to extract insights and derive meaningful conclusions. Here are some steps to guide you through the process of analyzing data effectively:

Define the Objective : Clearly define the purpose and objective of your data analysis. Identify the specific question or problem you want to address through analysis.

Prepare and Explore the Data : Gather the relevant data and ensure its quality. Clean and preprocess the data by handling missing values, duplicates, and formatting issues. Explore the data using descriptive statistics and visualizations to identify patterns, outliers, and relationships.

Apply Analysis Techniques : Choose the appropriate analysis techniques based on your data and research question. Apply statistical methods, machine learning algorithms, and other analytical tools to derive insights and answer your research question.

Interpret the Results : Analyze the output of your analysis and interpret the findings in the context of your objective. Identify significant patterns, trends, and relationships in the data. Consider the implications and practical relevance of the results.

Communicate and Take Action : Communicate your findings effectively to stakeholders or intended audiences. Present the results clearly and concisely, using visualizations and reports. Use the insights from the analysis to inform decision making.

Remember, data analysis is an iterative process, and you may need to revisit and refine your analysis as you progress. These steps provide a general framework to guide you through the data analysis process and help you derive meaningful insights from your data.

Data Analysis Tools

Data analysis tools are software applications and platforms designed to facilitate the process of analyzing and interpreting data . These tools provide a range of functionalities to handle data manipulation, visualization, statistical analysis, and machine learning. Here are some commonly used data analysis tools:

Spreadsheet Software

Tools like Microsoft Excel, Google Sheets, and Apple Numbers are used for basic data analysis tasks. They offer features for data entry, manipulation, basic statistical functions, and simple visualizations.

Business Intelligence (BI) Platforms

BI platforms like Microsoft Power BI, Tableau, and Looker integrate data from multiple sources, providing comprehensive views of business performance through interactive dashboards, reports, and ad hoc queries.

Programming Languages and Libraries

Programming languages like R and Python, along with their associated libraries (e.g., NumPy, SciPy, scikit-learn), offer extensive capabilities for data analysis. They provide flexibility, customizability, and access to a wide range of statistical and machine-learning algorithms.

Cloud-Based Analytics Platforms

Cloud-based platforms like Google Cloud Platform (BigQuery, Data Studio), Microsoft Azure (Azure Analytics, Power BI), and Amazon Web Services (AWS Analytics, QuickSight) provide scalable and collaborative environments for data storage, processing, and analysis. They have a wide range of analytical capabilities for handling large datasets.

Data Mining and Machine Learning Tools

Tools like RapidMiner, KNIME, and Weka automate the process of data preprocessing, feature selection, model training, and evaluation. They’re designed to extract insights and build predictive models from complex datasets.

Text Analytics Tools

Text analytics tools, such as Natural Language Processing (NLP) libraries in Python (NLTK, spaCy) or platforms like RapidMiner Text Mining Extension, enable the analysis of unstructured text data . They help extract information, sentiment, and themes from sources like customer reviews or social media.

Choosing the right data analysis tool depends on analysis complexity, dataset size, required functionalities, and user expertise. You might need to use a combination of tools to leverage their combined strengths and address specific analysis needs. 

By understanding the power of data analysis, you can leverage it to make informed decisions, identify opportunities for improvement, and drive innovation within your organization. Whether you’re working with quantitative data for statistical analysis or qualitative data for in-depth insights, it’s important to select the right analysis techniques and tools for your objectives.

To continue learning about data analysis, review the following resources:

  • What is Big Data Analytics?
  • Operational Analytics
  • JSON Analytics + Real-Time Insights
  • Database vs. Data Warehouse: Differences, Use Cases, Examples
  • Couchbase Capella Columnar Product Blog



The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable and type of data:

  • Age: Quantitative (ratio)
  • Gender: Categorical (nominal)
  • Race or ethnicity: Categorical (nominal)
  • Baseline test scores: Quantitative (interval)
  • Final test scores: Quantitative (interval)
  • Parental income: Quantitative (ratio)
  • GPA: Quantitative (interval)


Step 2: Collect data from a sample

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more at risk of biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units or more per subgroup is necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
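For example, a power analysis for a two-group comparison can be run in Python with statsmodels; the inputs below use the conventional defaults mentioned above and a hypothetical medium expected effect size:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # hypothetical medium expected effect (Cohen's d)
    alpha=0.05,        # significance level
    power=0.80,        # desired statistical power
)
print(f"Required sample size per group: {n_per_group:.0f}")
```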

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
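These descriptive statistics are straightforward to compute; here is a short sketch using Python’s statistics module and numpy on a small, made-up set of test scores:

```python
import statistics
import numpy as np

scores = [61, 68, 70, 70, 72, 75, 79, 83]   # hypothetical test scores

# Central tendency
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))

# Variability
print("range:", max(scores) - min(scores))
print("IQR:", np.percentile(scores, 75) - np.percentile(scores, 25))
print("standard deviation:", statistics.stdev(scores))   # sample standard deviation
print("variance:", statistics.variance(scores))          # sample variance
```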

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Example: Descriptive statistics (experimental study)
Pretest and posttest scores (n = 30):

  • Mean: 68.44 (pretest), 75.25 (posttest)
  • Standard deviation: 9.43 (pretest), 9.88 (posttest)
  • Variance: 88.96 (pretest), 97.96 (posttest)
  • Range: 36.25 (pretest), 45.12 (posttest)

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Parental income and GPA (n = 653):

  • Mean: 62,100 USD (income), 3.12 (GPA)
  • Standard deviation: 15,000 USD (income), 0.45 (GPA)
  • Variance: 225,000,000 (income), 0.16 (GPA)
  • Range: 8,000–378,000 USD (income), 2.64–4.00 (GPA)

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
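As an illustration, a 95% confidence interval for a mean can be computed from the standard error and a z score of 1.96; the figures below reuse the posttest statistics from the descriptive statistics example above:

```python
import math

# Sample statistics from the posttest example above
sample_mean = 75.25
sample_sd = 9.88
n = 30

standard_error = sample_sd / math.sqrt(n)
z = 1.96                      # z score for a 95% confidence level
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```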

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
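Both kinds of tests are easy to run in Python with scipy.stats; the data below are invented, so the resulting statistics will not match the worked examples exactly:

```python
from scipy import stats

# Hypothetical pretest and posttest scores for the same participants
pretest = [62, 70, 65, 68, 74, 66, 71, 69]
posttest = [70, 76, 72, 73, 80, 71, 78, 74]

# Dependent (paired) samples t test, one-tailed: did scores improve?
t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative="greater")
print(f"paired t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")

# Hypothetical parental income and GPA for a correlational design
income = [32000, 45000, 51000, 60000, 72000, 85000, 91000, 110000]
gpa = [2.8, 3.0, 3.1, 3.2, 3.4, 3.3, 3.6, 3.7]

# Pearson's r with its significance test
r, p = stats.pearsonr(income, gpa)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```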

Step 5: Interpret your results

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
With a p value of 0.0028, below your significance threshold of 0.05, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores. Example: Effect size (correlational study) To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Student’s  t -distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval

Methodology

  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hostile attribution bias
  • Affect heuristic



logo image missing

  • > Big Data

5 Steps of Data Analysis

  • Mallika Rangaiah
  • May 04, 2021


A critical concern in research is not just a dearth of data but also the opposite: having too much of it, as is the case for many government agencies and businesses. An overwhelming volume of information generally leads to a lack of clarity and confusion.

With so much data to organize, data analysts need to focus on determining whether the data is useful, drawing precise conclusions from it, and finally using those conclusions to shape decision-making.

The right data analysis process and tools can turn an ocean of cluttered information into something straightforward to sort and comprehend.

A range of data visualization tools suit varying levels of experience in the data analysis process. These include Infogram, Databox, Datawrapper, Google Charts, ChartBlocks, and Tableau.

Steps of Data Analysis

Below are five steps that a data analyst can implement in the data analysis process.

Step 1 - Determining the objective

The initial step is, of course, to determine our objective, which can also be termed a “problem statement”.

This step is all about forming a hypothesis and working out how it can be tested. The central question to ask is which business issue we are attempting to resolve; this question, the one the whole analysis is based upon, is crucial.

For example, if senior management raises the issue of declining customers, the data analyst's focus is to get to the root of the problem by understanding the business and its goals well enough to define the issue properly.

For instance, let's assume we work at a fictional firm called Prestos Knowledge and Learning that produces custom training software for its customers. Although the firm excels at gaining fresh clients, it fails to secure repeat business from them. The question, then, is not just why it is losing customers, but also which aspects adversely affect the customer experience and how it can enhance customer retention while curtailing expenses.

Once the issue has been defined, it is essential to determine which data sources can help resolve it. For example, you may note that the firm has a smooth sales process but a weak customer experience, which is why customers fail to return. The focus then becomes which data sources can help answer this question.

While this step relies on lateral thinking, soft skills, and business knowledge, that doesn't mean it requires no tools. To keep track of key performance indicators (KPIs) and business metrics, software needs to be put to use. For instance, KPI dashboards like Databox or open-source tools such as Dashbuilder are useful for generating simple dashboards at both the start and end of the data analysis process.

Step two: Gathering the data

Once the objective has been set, the analyst needs to gather and arrange the appropriate data. This makes defining the required data a prerequisite. It can be either qualitative or quantitative, and it falls primarily into three categories: first-party, second-party, and third-party data.

1. First-party data

First-party data is the data that you, or your company, have gathered directly from your customers. It might come from the company's customer relationship management (CRM) system or from transactional tracking.

Wherever it is generated, first-party data is generally organized and structured. Other first-party sources include subscription data, social data, and data gathered from interviews, focus groups, and customer satisfaction surveys. This data is useful for predicting future patterns and gaining audience insights.

2. Second-party data

Second-party data is essentially another organization's first-party data. It might be available directly from that company or through a private marketplace, and it can come from similar sources as first-party data: website activity, customer surveys, social media activity, and so on.

This data can be used for reaching new audiences and predicting behaviors. It offers the advantage of being generally structured and dependable.

3. Third-party data

Third-party data is data that has been gathered and aggregated from multiple sources by an outside organization. It is often largely unstructured and is collected by many companies to generate industry reports and to conduct marketing analytics and research. Examples include customers' email addresses, postal addresses, phone numbers, social media handles, purchase history, and website browsing activity.

Other sources of this kind of data include open data repositories and government portals.

Once the analyst has determined which data they need and how to gather it, useful tools come into play. A data management platform (DMP) is one of the first that comes to mind: software that lets the user collect and aggregate data from a number of sources before organizing and segmenting it. Examples include Xplenty and Salesforce DMP.


The five data analysis steps: Step 1 - Determining the objective; Step 2 - Gathering the data; Step 3 - Cleaning the data; Step 4 - Interpreting the data; Step 5 - Sharing the results.

Step three: Cleaning the data

Once the data has been collected, we prepare for the analysis by cleaning and scrubbing the data to ensure its quality. The primary tasks involved in cleaning the data include:

  • Removing errors, duplicates, and outliers introduced when data is aggregated from multiple sources.
  • Removing nonessential data points and filtering out observations that are not relevant to the proposed analysis.
  • Giving the data structure by fixing layout problems and typos, making it simple to map and manipulate.
  • Filling the gaps by identifying and addressing missing data while cleaning.

It is important to ensure that the right data points are analyzed so that the results are not skewed by the wrong ones.
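As a rough illustration of these cleaning tasks, here is a minimal pandas sketch; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical export of client records (columns are assumptions for illustration)
df = pd.read_csv("client_records.csv")

# Remove exact duplicates and rows with impossible values
df = df.drop_duplicates()
df = df[df["contract_value"] > 0]

# Drop columns that are irrelevant to the analysis
df = df.drop(columns=["internal_notes"])

# Fix layout problems and typos, e.g. inconsistent casing in a category column
df["industry"] = df["industry"].str.strip().str.lower()

# Fill gaps: impute missing delivery times with the median
df["delivery_days"] = df["delivery_days"].fillna(df["delivery_days"].median())

df.info()
```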

Exploratory analysis

Along with cleaning the data, this step also involves carrying out an exploratory analysis, which helps detect initial trends and refine the analyst's hypotheses. For instance, at Prestos Knowledge and Learning, an exploratory analysis might reveal a correlation between how much clients pay and how quickly they move on to other suppliers, hinting at the quality of the customer experience. This could lead the firm to reshape its hypotheses and focus on other factors.
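A quick exploratory check of that relationship might look like the following Python sketch; the file and the column names ("contract_value", "months_until_churn") are assumptions for illustration:

```python
import pandas as pd

# Hypothetical client-level data: contract value and months until the client churned
df = pd.read_csv("client_records.csv")

# A single correlation coefficient gives a first read on the relationship
print(df["contract_value"].corr(df["months_until_churn"]))

# A scatter plot often reveals patterns that a single number hides
df.plot.scatter(x="contract_value", y="months_until_churn")
```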

Cleaning datasets by hand can be quite a hassle, but purpose-built tools come to the rescue. Open-source tools such as OpenRefine, Trifacta Wrangler, and Drake help maintain clean and consistent data, while for more rigorous scrubbing, R packages and Python libraries come to the fore.

Step four: Interpreting the data

Once the data has been cleaned, we focus on analyzing it. The approach we take depends on our aim: time series analysis, regression analysis, or univariate and bivariate analysis are all available, and choosing and applying the right one depends on what we hope to achieve. The different types of data analysis fall into four categories.

1. Descriptive analysis

This form of analysis describes what has already taken place and is normally carried out before the analyst explores the issue more deeply. For instance, Prestos Knowledge and Learning might use descriptive analytics to count the number of users accessing its product during a certain period, or to measure sales figures over the past couple of years. Even if these insights don't drive decisions on their own, compiling and presenting the data helps the firm decide how to proceed.
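A minimal sketch of such a descriptive summary in Python (the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical product usage log with one row per login
usage = pd.read_csv("product_usage.csv", parse_dates=["login_date"])

# Descriptive analysis: unique active users per month
monthly_active_users = (
    usage.groupby(usage["login_date"].dt.to_period("M"))["user_id"].nunique()
)
print(monthly_active_users.tail(12))
```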

2. Diagnostic analysis

This form of analytics focuses on understanding why a certain issue has taken place; as the name suggests, it is the diagnosis of the issue. In the Prestos Knowledge and Learning example, the firm's primary focus was on determining which factors adversely affect its customer experience, and a diagnostic analysis can address exactly that.

For example, diagnostic analytics can help the firm correlate the main issue with the aspects that could be triggering it, from delivery speed to project expenses.

3. Predictive analysis

This form of analysis enables the analyst to detect future trends and forecast growth on the basis of historical data, and it has advanced considerably as technology has evolved. For instance, insurance providers generally use past records to forecast which of their clients are most likely to have accidents, and raise the premiums of those clients accordingly.


4. Prescriptive analysis

This form of analysis allows users to make recommendations for the future. As the final step in the analytics process, it builds on all the types of analysis mentioned above, suggesting possible courses of action and highlighting their likely consequences.

CenterLight Healthcare, for example, uses prescriptive analytics to reduce uncertainty in patient scheduling and care. The analysis helps the organization find the most suitable times for check-up appointments and treatments, minimizing disruption to patients while safeguarding their health and safety.

Step five: Sharing the results

Once the analyst has concluded their analysis and derived their insights, the last step in the data analysis process is sharing those insights with the people concerned. This is more complicated than merely disclosing results; it also involves interpreting them and presenting them in an accessible way.

It is crucial that the insights are clear and explicit, which is why data analysts generally use reports, dashboards, and interactive visualizations to support their findings.

How the results are interpreted and presented has a significant impact on the course of a business. On the basis of what the analyst reports, decisions are made about restructuring, launching risky products, or shutting down a division.

This makes it essential to present all the collected evidence in a clear, compact, fact-based manner, while also highlighting any gaps or ambiguities in the data.

These are the five primary steps involved in data analysis. With businesses producing a massive range of data each day, much of it still goes untouched. Data analysis puts that data to use, helping businesses derive relevant insights and playing a powerful role in their decisions.


Data Analysis for Qualitative Research: 6 Step Guide

Data analysis for qualitative research is not intuitive. This is because qualitative data stands in opposition to traditional data analysis methodologies: while data analysis is concerned with quantities, qualitative data is by definition unquantified. But there is an easy, methodical approach that anyone can use to get reliable results when performing data analysis for qualitative research. The process consists of 6 steps that I'll break down in this article:

  • Perform interviews (if necessary)
  • Gather all documents and transcribe any non-paper records
  • Decide whether to code analytical data, analyze word frequencies, or both
  • Decide what interpretive angle you want to take: content analysis, narrative analysis, discourse analysis, framework analysis, and/or grounded theory
  • Compile your data in a spreadsheet using document saving techniques (Windows and Mac)
  • Identify trends in words, themes, metaphors, natural patterns, and more

To complete these steps, you will need:

  • Microsoft Word
  • Microsoft Excel
  • Internet access

You can get the free Intro to Data Analysis eBook to cover the fundamentals and ensure strong progression in all your data endeavors.

What is qualitative research?

Qualitative research is not the same as quantitative research. In short, qualitative research is the interpretation of non-numeric data. It usually aims at drawing conclusions that explain why a phenomenon occurs, rather than simply establishing that it occurs. Here's a great quote from a nursing magazine about quantitative vs qualitative research:

“A traditional quantitative study… uses a predetermined (and auditable) set of steps to confirm or refute [a] hypothesis. In contrast, qualitative research often takes the position that an interpretive understanding is only possible by way of uncovering or deconstructing the meanings of a phenomenon. Thus, a distinction between explaining how something operates (explanation) and why it operates in the manner that it does (interpretation) may be [an] effective way to distinguish quantitative from qualitative analytic processes involved in any particular study.” (EBN)


Step 1a: Data collection methods and techniques in qualitative research: interviews and focus groups

Step 1 is collecting the data that you will need for the analysis. If you are not performing any interviews or focus groups to gather data, then you can skip this step. It’s for people who need to go into the field and collect raw information as part of their qualitative analysis.

Since the whole point of an interview, and of qualitative analysis in general, is to understand a research question better, you should start by making sure you have a specific, refined research question. Whether you're a researcher by trade or a data analyst working on a one-time project, you must know specifically what you want to understand in order to get results.

Good research questions are specific enough to guide action but open enough to leave room for insight and growth. Examples of good research questions include:

  • Good : To what degree does living in a city impact the quality of a person’s life? (open-ended, complex)
  • Bad : Does living in a city impact the quality of a person’s life? (closed, simple)

Once you understand the research question, you need to develop a list of interview questions. These questions should likewise be open-ended and provide liberty of expression to the responder. They should support the research question in an active way without prejudicing the response. Examples of good interview questions include:

  • Good : Tell me what it’s like to live in a city versus in the country. (open, not leading)
  • Bad : Don’t you prefer the city to the country because there are more people? (closed, leading)

Some additional helpful tips include:

  • Begin each interview with a neutral question to get the person relaxed
  • Limit each question to a single idea
  • If you don’t understand, ask for clarity
  • Do not pass any judgements
  • Do not spend more than 15 minutes on an interview, lest the quality of responses drop

Focus groups

The alternative to interviews is focus groups. Focus groups are a great way for you to get an idea for how people communicate their opinions in a group setting, rather than a one-on-one setting as in interviews.

In short, focus groups are gatherings of small groups of people from representative backgrounds who receive instruction, or “facilitation,” from a focus group leader. Typically, the leader will ask questions to stimulate conversation, reformulate questions to bring the discussion back to focus, and prevent the discussion from turning sour or giving way to bad faith.

Focus group questions should be open-ended like their interview neighbors, and they should stimulate some degree of disagreement. Disagreement often leads to valuable information about differing opinions, as people tend to say what they mean if contradicted.

However, focus group leaders must be careful not to let disagreements escalate, as anger can make people lie to be hurtful or simply to win an argument. And lies are not helpful in data analysis for qualitative research.

Step 1b: Tools for qualitative data collection

When it comes to data analysis for qualitative analysis, the tools you use to collect data should align to some degree with the tools you will use to analyze the data.

As mentioned in the intro, you will be focusing on analysis techniques that only require the traditional Microsoft suite programs: Microsoft Excel and Microsoft Word . At the same time, you can source supplementary tools from various websites, like Text Analyzer and WordCounter.

In short, the tools for qualitative data collection that you need are Excel and Word , as well as web-based free tools like Text Analyzer and WordCounter . These online tools are helpful in the quantitative part of your qualitative research.

Step 2: Gather all documents & transcribe non-written docs

Once you have your interviews and/or focus group transcripts, it’s time to decide if you need other documentation. If you do, you’ll need to gather it all into one place first, then develop a strategy for how to transcribe any non-written documents.

When do you need documentation other than interviews and focus groups? Two situations usually call for it. First, if you have little funding, you may not be able to afford expensive interviews and focus groups.

Second, social science researchers typically focus on documents since their research questions are less concerned with subject-oriented data, while hard science and business researchers typically focus on interviews and focus groups because they want to know what people think, and they want to know today.

Non-written records

Other factors at play include the type of research, the field, and specific research goal. For those who need documentation and to describe non-written records, there are some steps to follow:

  • Put all hard copy source documents into a sealed binder (I use plastic paper holders with elastic seals).
  • If you are sourcing directly from printed books or journals, you will need to digitize them by scanning them and making them text-readable by the computer. To do so, turn all PDFs into Word documents using online tools such as PDF to Word Converter. This process is never foolproof, and it may be a source of error in the data collection, but it's part of the process.
  • If you are sourcing online documents, try as often as possible to get computer-readable PDF documents that you can easily copy/paste or convert. Locked PDFs are essentially a lost cause.
  • Transcribe any audio files into written documents. There are free online tools available to help with this, such as 360converter. If you run a test through the system, you'll see that the output is not 100% accurate. The best way to use this tool is as a first-draft generator. You can then correct and complete it with old-fashioned, direct transcription.

Step 3: Decide on the type of qualitative research

Before step 3 you should have collected your data, transcribed it all into written-word documents, and compiled it in one place. Now comes the interesting part. You need to decide what you want to get out of your research by choosing an analytic angle, or type of qualitative research.

The available types of qualitative research are as follows. Each takes a unique angle, and you must choose the one that gets you the information you want from the analysis. In addition, each has a different impact on the kind of data analysis for qualitative research (coding vs word frequency) that we use.

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis, and/or
  • Grounded theory

From a high level, content, narrative, and discourse analysis are actionable independent tactics, whereas framework analysis and grounded theory are ways of honing and applying the first three.

Content analysis

  • Definition: Content analysis is identifying and labelling themes of any kind within a text.
  • Focus: Identifying any kind of pattern in written text, transcribed audio, or transcribed video. This could be thematic, word repetition, or idea repetition. Most often, the patterns we find are ideas that make up an argument.
  • Goal: To simplify, standardize, and quickly reference ideas from any given text. Content analysis is a way to pull the main ideas from huge documents for comparison. In this way, it's more a means to an end.
  • Pros: The huge advantage of content analysis is that you can quickly process huge amounts of text using the simple coding and word frequency techniques we will look at below. To use a metaphor, it is to qualitative analysis documents what SparkNotes are to books.
  • Cons: The downside to content analysis is that it's quite general. If you have a very specific, narrow research question, then tracing "any and all ideas" will not be very helpful to you.
Narrative analysis

  • Definition: Narrative analysis is the reformulation and simplification of interview answers or documentation into small narrative components to identify story-like patterns.
  • Focus: Understanding the text based on its narrative components as opposed to themes or other qualities.
  • Goal: To reference the text from an angle closer to the nature of texts in order to obtain further insights.
  • Pros: Narrative analysis is very useful for getting perspective on a topic in which you're otherwise limited. It can be easy to get tunnel vision when you're digging for themes and ideas from a reason-centric perspective. Turning to a narrative approach will help you stay grounded. More importantly, it helps reveal different kinds of trends.
  • Cons: Narrative analysis adds another layer of subjectivity to the instinctive nature of qualitative research. Many see it as too dependent on the researcher to hold any critical value.
Discourse analysis

  • Definition: Discourse analysis is the textual analysis of naturally occurring speech. Any oral expression must be transcribed before undergoing legitimate discourse analysis.
  • Focus: Understanding ideas and themes through language communicated orally rather than pre-processed on paper.
  • Goal: To obtain insights from an angle outside the traditional content analysis of text.
  • Pros: Provides a considerable advantage in some areas of study for understanding how people communicate an idea, versus the idea itself. For example, discourse analysis is important in political campaigning. People rarely vote for the candidate who most closely corresponds to their beliefs, but rather for the person they like the most.
  • Cons: As with narrative analysis, discourse analysis is more subjective in nature than content analysis, which focuses on ideas and patterns. Some do not consider it rigorous enough to be considered a legitimate subset of qualitative analysis, but these people are few.

Framework analysis

  • Definition: Framework analysis is a kind of qualitative analysis that includes 5 ordered steps: coding, indexing, charting, mapping, and interpreting. In most ways, framework analysis is a synonym for qualitative analysis itself; the significant difference is the importance it places on the perspective used in the analysis.
  • Focus : Understanding patterns in themes and ideas.
  • Goal : Creating one specific framework for looking at a text.
  • Pros: Framework analysis is helpful when the researcher clearly understands what they want from the project, as it is a limiting approach. Since each of its steps has defined parameters, framework analysis is very useful for teamwork.
  • Cons : It can lead to tunnel vision.
Grounded theory

  • Definition: The use of content, narrative, and discourse analysis to examine a single case, in the hope that discoveries from that case will lead to a foundational theory used to examine other like cases.
  • Focus: A broad approach using multiple techniques in order to establish patterns.
  • Goal: To develop a foundational theory.
  • Pros: When successful, grounded theories can revolutionize entire fields of study.
  • Cons: It's very difficult to establish grounded theories, and there's an enormous amount of risk involved.

Step 4: Coding, word frequency, or both

Coding in data analysis for qualitative research is the process of writing 2-5 word codes that summarize a paragraph or more of text (not writing computer code). This allows researchers to keep track of and analyze those codes. On the other hand, word frequency is the process of counting the presence and orientation of words within a text, which makes it the quantitative element in qualitative data analysis.

Video example of coding for data analysis in qualitative research

In short, coding in the context of data analysis for qualitative research follows 2 steps (video below):

  • Reading through the text one time
  • Adding 2-5 word summaries each time a significant theme or idea appears

Let’s look at a brief example of how to code for qualitative research in this video:

Click here for a link to the source text. 1

Example of word frequency processing

Word frequency, in turn, is the process of finding a specific word or identifying the most common words, in 3 steps:

  • Decide if you want to find 1 word or identify the most common ones
  • Use Word's "Replace" function to find a word or phrase
  • Use Text Analyzer to find the most common terms

Here's another look at word frequency processing and how to do it. Let's look at the same example above, but from a quantitative perspective.

Imagine we are already familiar with melanoma and KITs, and we want to analyze the text based on these keywords. One thing we can do is look for these words using the Replace function in Word:

  • Locate the search bar
  • Click replace
  • Type in the word
  • See the total results

Here’s a brief video example:

Another option is to use an online Text Analyzer. This method won't help us find a specific word, but it will help us discover the top-performing phrases and words. All you need to do is put in a link to a target page or paste in a text. I pasted the abstract from our source text, and what turns up is as expected. Here's a picture:

[Image: Text Analyzer output for the pasted abstract]
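If you prefer to stay in code rather than a web tool, a small Python sketch can run the same kind of word-frequency count on a transcript (the file name and stop-word list here are assumptions):

```python
import re
from collections import Counter

# Hypothetical transcript file; any plain-text document works
with open("interview_transcript.txt", encoding="utf-8") as f:
    text = f.read().lower()

words = re.findall(r"[a-z']+", text)
stop_words = {"the", "and", "a", "to", "of", "in", "is", "it", "that", "i"}
counts = Counter(w for w in words if w not in stop_words)

# Top 10 most frequent terms, plus the count of one specific keyword
print(counts.most_common(10))
print("melanoma:", counts["melanoma"])
```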

Step 5: Compile your data in a spreadsheet

After you have some coded data in the Word document, you need to get it into Excel for analysis. This process requires saving the Word doc with an .htm extension, which turns it into a web page. Once you have that page, it's as simple as opening it, scrolling to the bottom, and copying/pasting the comments, or codes, into an Excel document.

You will need to wrangle the data slightly in order to make it readable in Excel. I've made a video to explain this process and placed it below.

Step 6: Identify trends & analyze!

There are literally thousands of different ways to analyze qualitative data, and in most situations, the best technique depends on the information you want to get out of the research.

Nevertheless, there are a few go-to techniques. The most important of these is occurrences. In this short video, we finish the example from above by counting the number of times our codes appear. In this way, it's very similar to word frequency (discussed above).

A few other options include:

  • Ranking each code on a set of relevant criteria and clustering
  • Pure cluster analysis
  • Causal analysis

We cover different types of analysis like this on the website, so be sure to check out other articles on the home page .

How to analyze qualitative data from an interview

To analyze qualitative data from an interview, follow the same 6 steps used for qualitative data analysis:

  • Perform the interviews
  • Transcribe the interviews onto paper
  • Decide whether to code analytical data (open, axial, selective), analyze word frequencies, or both
  • Decide what interpretive angle you want to take
  • Compile your data in a spreadsheet using document saving techniques (for Windows and Mac)
  • Identify trends in words, themes, and patterns

About the Author

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in our growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.


Qualitative Data Analysis: Step-by-Step Guide (Manual vs. Automatic)

When we conduct qualitative research, need to explain changes in metrics, or want to understand people's opinions, we always turn to qualitative data. Qualitative data is typically generated through:

  • Interview transcripts
  • Surveys with open-ended questions
  • Contact center transcripts
  • Texts and documents
  • Audio and video recordings
  • Observational notes

Compared to quantitative data, which captures structured information, qualitative data is unstructured and has more depth. It can answer our questions, help formulate hypotheses, and build understanding.

It's important to understand the differences between quantitative data & qualitative data. But unfortunately, analyzing qualitative data is difficult. While tools like Excel, Tableau and PowerBI crunch and visualize quantitative data with ease, there are a limited number of mainstream tools for analyzing qualitative data. The majority of qualitative data analysis still happens manually.

That said, there are two new trends that are changing this. First, there are advances in natural language processing (NLP) which is focused on understanding human language. Second, there is an explosion of user-friendly software designed for both researchers and businesses. Both help automate the qualitative data analysis process.

In this post we want to teach you how to conduct a successful qualitative data analysis. There are two primary qualitative data analysis methods: manual and automatic. We will teach you how to conduct the analysis manually, and also automatically, using software solutions powered by NLP. We'll guide you through the steps of a manual analysis and look at what is involved and the role technology can play in automating the process.

More businesses are switching to fully-automated analysis of qualitative customer data because it is cheaper, faster, and just as accurate. Primarily, businesses purchase subscriptions to feedback analytics platforms so that they can understand customer pain points and sentiment.


We'll take you through 5 steps to conduct a successful qualitative data analysis. Within each step, we will highlight the key differences between the manual and automated approaches. Here's an overview of the steps:

The 5 steps to doing qualitative data analysis

  • Gathering and collecting your qualitative data
  • Organizing and connecting your qualitative data
  • Coding your qualitative data
  • Analyzing the qualitative data for insights
  • Reporting on the insights derived from your analysis

What is Qualitative Data Analysis?

Qualitative data analysis is a process of gathering, structuring and interpreting qualitative data to understand what it represents.

Qualitative data is non-numerical and unstructured. Qualitative data generally refers to text, such as open-ended responses to survey questions or user interviews, but also includes audio, photos and video.

Businesses often perform qualitative data analysis on customer feedback. And within this context, qualitative data generally refers to verbatim text data collected from sources such as reviews, complaints, chat messages, support centre interactions, customer interviews, case notes or social media comments.

How is qualitative data analysis different from quantitative data analysis?

Understanding the differences between quantitative & qualitative data is important. When it comes to analyzing data, Qualitative Data Analysis serves a very different role to Quantitative Data Analysis. But what sets them apart?

Qualitative Data Analysis dives into the stories hidden in non-numerical data such as interviews, open-ended survey answers, or notes from observations. It uncovers the ‘whys’ and ‘hows’ giving a deep understanding of people’s experiences and emotions.

Quantitative Data Analysis on the other hand deals with numerical data, using statistics to measure differences, identify preferred options, and pinpoint root causes of issues.  It steps back to address questions like "how many" or "what percentage" to offer broad insights we can apply to larger groups.

In short, Qualitative Data Analysis is like a microscope,  helping us understand specific detail. Quantitative Data Analysis is like the telescope, giving us a broader perspective. Both are important, working together to decode data for different objectives.

Qualitative Data Analysis methods

Once all the data has been captured, there are a variety of analysis techniques available and the choice is determined by your specific research objectives and the kind of data you’ve gathered.  Common qualitative data analysis methods include:

Content Analysis

This is a popular approach to qualitative data analysis. Other qualitative analysis techniques may fit within the broad scope of content analysis; thematic analysis, for example, is a form of content analysis. Content analysis is used to identify the patterns that emerge from text, by grouping content into words, concepts, and themes. It is useful for quantifying the relationships between the grouped content. The Columbia School of Public Health has a detailed breakdown of content analysis.

Narrative Analysis

Narrative analysis focuses on the stories people tell and the language they use to make sense of them.  It is particularly useful in qualitative research methods where customer stories are used to get a deep understanding of customers’ perspectives on a specific issue. A narrative analysis might enable us to summarize the outcomes of a focused case study.

Discourse Analysis

Discourse analysis is used to get a thorough understanding of the political, cultural and power dynamics that exist in specific situations.  The focus of discourse analysis here is on the way people express themselves in different social contexts. Discourse analysis is commonly used by brand strategists who hope to understand why a group of people feel the way they do about a brand or product.

Thematic Analysis

Thematic analysis is used to deduce the meaning behind the words people use. This is accomplished by discovering repeating themes in text. These meaningful themes reveal key insights into data and can be quantified, particularly when paired with sentiment analysis . Often, the outcome of thematic analysis is a code frame that captures themes in terms of codes, also called categories. So the process of thematic analysis is also referred to as “coding”. A common use-case for thematic analysis in companies is analysis of customer feedback.

Grounded Theory

Grounded theory is a useful approach when little is known about a subject. Grounded theory starts by formulating a theory around a single data case. This means that the theory is “grounded”. Grounded theory analysis is based on actual data, and not entirely speculative. Then additional cases can be examined to see if they are relevant and can add to the original grounded theory.


Challenges of Qualitative Data Analysis

While Qualitative Data Analysis offers rich insights, it comes with its challenges. Each unique QDA method has its unique hurdles. Let’s take a look at the challenges researchers and analysts might face, depending on the chosen method.

  • Time and Effort (Narrative Analysis): Narrative analysis, which focuses on personal stories, demands patience. Sifting through lengthy narratives to find meaningful insights can be time-consuming and requires dedicated effort.
  • Being Objective (Grounded Theory): Grounded theory, building theories from data, faces the challenges of personal biases. Staying objective while interpreting data is crucial, ensuring conclusions are rooted in the data itself.
  • Complexity (Thematic Analysis): Thematic analysis involves identifying themes within data, a process that can be intricate. Categorizing and understanding themes can be complex, especially when each piece of data varies in context and structure. Thematic Analysis software can simplify this process.
  • Generalizing Findings (Narrative Analysis): Narrative analysis, dealing with individual stories, makes drawing broad conclusions challenging. Extending findings from a single narrative to a broader context requires careful consideration.
  • Managing Data (Thematic Analysis): Thematic analysis involves organizing and managing vast amounts of unstructured data, like interview transcripts. Managing this can be a hefty task, requiring effective data management strategies.
  • Skill Level (Grounded Theory): Grounded theory demands specific skills to build theories from the ground up. Finding or training analysts with these skills poses a challenge, requiring investment in building expertise.

Benefits of qualitative data analysis

Qualitative Data Analysis (QDA) is like a versatile toolkit, offering a tailored approach to understanding your data. The benefits it offers are as diverse as the methods. Let’s explore why choosing the right method matters.

  • Tailored Methods for Specific Needs: QDA isn't one-size-fits-all. Depending on your research objectives and the type of data at hand, different methods offer unique benefits. If you want emotive customer stories, narrative analysis paints a strong picture. When you want to explain a score, thematic analysis reveals insightful patterns.
  • Flexibility with Thematic Analysis: thematic analysis is like a chameleon in the toolkit of QDA. It adapts well to different types of data and research objectives, making it a top choice for any qualitative analysis.
  • Deeper Understanding, Better Products: QDA helps you dive into people's thoughts and feelings. This deep understanding helps you build products and services that truly match what people want, ensuring satisfied customers.
  • Finding the Unexpected: Qualitative data often reveals surprises that we miss in quantitative data. QDA offers us new ideas and perspectives, for insights we might otherwise miss.
  • Building Effective Strategies: Insights from QDA are like strategic guides. They help businesses in crafting plans that match people’s desires.
  • Creating Genuine Connections: Understanding people’s experiences lets businesses connect on a real level. This genuine connection helps build trust and loyalty, priceless for any business.

How to do Qualitative Data Analysis: 5 steps

Now we are going to show how you can do your own qualitative data analysis. We will guide you through this process step by step. As mentioned earlier, you will learn how to do qualitative data analysis manually , and also automatically using modern qualitative data and thematic analysis software.

To get the best value from the analysis and research process, it's important to be clear about the nature and scope of the question being researched. This will help you select the data collection channels that are most likely to help you answer your question.

Depending on if you are a business looking to understand customer sentiment, or an academic surveying a school, your approach to qualitative data analysis will be unique.

Once you’re clear, there’s a sequence to follow. And, though there are differences in the manual and automatic approaches, the process steps are mostly the same.

The use case for our step-by-step guide is a company looking to collect data (customer feedback data), and analyze the customer feedback - in order to improve customer experience. By analyzing the customer feedback the company derives insights about their business and their customers. You can follow these same steps regardless of the nature of your research. Let’s get started.

Step 1: Gather your qualitative data and conduct research (Conduct qualitative research)

The first step of qualitative research is to do data collection. Put simply, data collection is gathering all of your data for analysis. A common situation is when qualitative data is spread across various sources.

Classic methods of gathering qualitative data

Most companies use traditional methods for gathering qualitative data: conducting interviews with research participants, running surveys, and running focus groups. This data is typically stored in documents, CRMs, databases and knowledge bases. It’s important to examine which data is available and needs to be included in your research project, based on its scope.

Using your existing qualitative feedback

As it becomes easier for customers to engage across a range of different channels, companies are gathering increasingly large amounts of both solicited and unsolicited qualitative feedback.

Most organizations have now invested in Voice of Customer programs , support ticketing systems, chatbot and support conversations, emails and even customer Slack chats.

These new channels provide companies with new ways of getting feedback, and also allow the collection of unstructured feedback data at scale.

The great thing about this data is that it contains a wealth of valuable insights and that it's already there! When you have a new question about user behavior or your customers, you don't need to create a new research study or set up a focus group. You can find most answers in the data you already have.

Typically, this data is stored in third-party solutions or a central database, but there are ways to export it or connect to a feedback analysis solution through integrations or an API.

Utilize untapped qualitative data channels

There are many online qualitative data sources you may not have considered. For example, you can find useful qualitative data in social media channels like Twitter or Facebook. Online forums, review sites, and online communities such as Discourse or Reddit also contain valuable data about your customers, or research questions.

If you are considering performing a qualitative benchmark analysis against competitors - the internet is your best friend, and review analysis is a great place to start. Gathering feedback in competitor reviews on sites like Trustpilot, G2, Capterra, Better Business Bureau or on app stores is a great way to perform a competitor benchmark analysis.

Customer feedback analysis software often has integrations into social media and review sites, or you could use a solution like DataMiner to scrape the reviews.

G2.com reviews of the product Airtable. You could pull reviews from G2 for your analysis.

Step 2: Connect & organize all your qualitative data

Now you have all this qualitative data, but there's a problem: the data is unstructured. Before feedback can be analyzed and assigned any value, it needs to be organized in a single place. Why is this important? Consistency!

If all data is easily accessible in one place and analyzed in a consistent manner, you will have an easier time summarizing and making decisions based on this data.

The manual approach to organizing your data

The classic method of structuring qualitative data is to plot all the raw data you’ve gathered into a spreadsheet.

Typically, research and support teams would share large Excel sheets and different business units would make sense of the qualitative feedback data on their own. Each team collects and organizes the data in a way that best suits them, which means the feedback tends to be kept in separate silos.

An alternative and a more robust solution is to store feedback in a central database, like Snowflake or Amazon Redshift .

Keep in mind that when you organize your data in this way, you are often preparing it to be imported into another software. If you go the route of a database, you would need to use an API to push the feedback into a third-party software.

Computer-assisted qualitative data analysis software (CAQDAS)

Traditionally within the manual analysis approach (but not always), qualitative data is imported into CAQDAS software for coding.

In the early 2000s, CAQDAS software was popularised by developers such as ATLAS.ti, NVivo and MAXQDA and eagerly adopted by researchers to assist with the organizing and coding of data.  

The benefits of using computer-assisted qualitative data analysis software:

  • Assists in the organizing of your data
  • Opens you up to exploring different interpretations of your data analysis
  • Allows you to share your dataset easier and allows group collaboration (allows for secondary analysis)

However, you still need to code the data, uncover the themes, and do the analysis yourself. It is therefore still a manual approach.

The user interface of CAQDAS software 'NVivo'

Organizing your qualitative data in a feedback repository

Another solution to organizing your qualitative data is to upload it into a feedback repository where it can be unified with your other data , and easily searchable and taggable. There are a number of software solutions that act as a central repository for your qualitative research data. Here are a couple solutions that you could investigate:  

  • Dovetail: Dovetail is a research repository with a focus on video and audio transcriptions. You can tag your transcriptions within the platform for theme analysis. You can also upload your other qualitative data such as research reports, survey responses, support conversations, and customer interviews. Dovetail acts as a single, searchable repository. And makes it easier to collaborate with other people around your qualitative research.
  • EnjoyHQ: EnjoyHQ is another research repository with similar functionality to Dovetail. It boasts a more sophisticated search engine, but it has a higher starting subscription cost.

Organizing your qualitative data in a feedback analytics platform

If you have a lot of qualitative customer or employee feedback, from the likes of customer surveys or employee surveys, you will benefit from a feedback analytics platform. A feedback analytics platform is a software that automates the process of both sentiment analysis and thematic analysis . Companies use the integrations offered by these platforms to directly tap into their qualitative data sources (review sites, social media, survey responses, etc.). The data collected is then organized and analyzed consistently within the platform.

If you have data prepared in a spreadsheet, it can also be imported into feedback analytics platforms.

Once all this rich data has been organized within the feedback analytics platform, it is ready to be coded and themed, within the same platform. Thematic is a feedback analytics platform that offers one of the largest libraries of integrations with qualitative data sources.

Some of qualitative data integrations offered by Thematic

Step 3: Coding your qualitative data

Your feedback data is now organized in one place. Either within your spreadsheet, CAQDAS, feedback repository or within your feedback analytics platform. The next step is to code your feedback data so we can extract meaningful insights in the next step.

Coding is the process of labelling and organizing your data in such a way that you can then identify themes in the data, and the relationships between these themes.

To simplify the coding process, you will take small samples of your customer feedback data, come up with a set of codes, or categories capturing themes, and label each piece of feedback, systematically, for patterns and meaning. Then you will take a larger sample of data, revising and refining the codes for greater accuracy and consistency as you go.

If you choose to use a feedback analytics platform, much of this process will be automated and accomplished for you.

The terms to describe different categories of meaning (‘theme’, ‘code’, ‘tag’, ‘category’ etc) can be confusing as they are often used interchangeably.  For clarity, this article will use the term ‘code’.

To code means to identify key words or phrases and assign them to a category of meaning. “I really hate the customer service of this computer software company” would be coded as “poor customer service”.

How to manually code your qualitative data

  1. Decide whether you will use deductive or inductive coding. Deductive coding is when you create a list of predefined codes and then assign them to the qualitative data. Inductive coding is the opposite: codes arise directly from the data, and you label them as you go. Weigh up the pros and cons of each coding method and select the most appropriate one (a minimal sketch of keyword-based deductive coding follows this list).
  2. Read through the feedback data to get a broad sense of what it reveals. Then start assigning your first set of codes to statements and sections of text.
  3. Keep repeating step 2, adding new codes and revising the code descriptions as often as necessary. Once everything has been coded, go through it all again to be sure there are no inconsistencies and that nothing has been overlooked.
  4. Create a code frame to group your codes. The code frame is the organizational structure of all your codes. There are two commonly used types of coding frames: flat and hierarchical. A hierarchical code frame will make it easier to derive insights from your analysis.
  5. Based on the number of times a particular code occurs, you can now see the common themes in your feedback data. This is insightful! If ‘bad customer service’ is a common code, it’s time to take action.
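As an illustration only, here is a minimal sketch of keyword-based deductive coding in Python. The codes, keywords, and example feedback are hypothetical, and in practice every automatic match still needs human review.

```python
# Minimal sketch of deductive coding: predefined codes, each with example
# keywords, applied to raw feedback. Codes and keywords are hypothetical.
CODES = {
    "poor customer service": ["customer service", "support", "rude"],
    "pricing": ["price", "expensive", "cost"],
    "ease of use": ["easy to use", "intuitive", "simple"],
}

def assign_codes(feedback: str) -> list[str]:
    """Return every predefined code whose keywords appear in the feedback."""
    text = feedback.lower()
    return [code for code, keywords in CODES.items()
            if any(keyword in text for keyword in keywords)]

print(assign_codes("I really hate the customer service of this software company"))
# -> ['poor customer service']
```

Inductive coding cannot be reduced to a lookup like this, because the codes themselves are created while reading the data.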

We have a detailed guide dedicated to manually coding your qualitative data.

Example of a hierarchical coding frame in qualitative data analysis

Using software to speed up manual coding of qualitative data

An Excel spreadsheet is still a popular method for coding. But various software solutions can help speed up this process. Here are some examples.

  • CAQDAS / NVivo - CAQDAS software has built-in functionality for coding text directly within the tool. You may find its interface easier for managing codes than a spreadsheet.
  • Dovetail/EnjoyHQ - You can tag transcripts and other textual data within these solutions. As they are also repositories, you may find it simpler to keep the coding in the same platform.
  • IBM SPSS - SPSS is a statistical analysis software that may make coding easier than in a spreadsheet.
  • Ascribe - Ascribe’s ‘Coder’ is a coding management system. Its user interface will make it easier for you to manage your codes.

Automating the qualitative coding process using thematic analysis software

In solutions which speed up the manual coding process, you still have to come up with valid codes and often apply codes manually to pieces of feedback. But there are also solutions that automate both the discovery and the application of codes.

Advances in machine learning have now made it possible to read, code and structure qualitative data automatically. This type of automated coding is offered by thematic analysis software .

Automation makes it far simpler and faster to code the feedback and group it into themes. By incorporating natural language processing (NLP), the software looks across sentences and phrases to identify common themes and meaningful statements. Some automated solutions detect repeating patterns and assign codes to them; others require you to train the AI by providing examples. You could say the AI learns the meaning of the feedback on its own.

Thematic automates the coding of qualitative feedback regardless of source. There’s no need to set up themes or categories in advance. Simply upload your data and wait a few minutes. You can also manually edit the codes to further refine their accuracy. Experiments conducted indicate that Thematic’s automated coding is just as accurate as manual coding.

Paired with sentiment analysis and advanced text analytics, these automated solutions become powerful tools for deriving quality business or research insights.

You could also build your own, if you have the resources!

The key benefits of using an automated coding solution

Automated analysis can often be set up quickly, and it has the potential to uncover things that would never have been revealed if you had given the software a prescribed list of themes to look for.

Because the model applies a consistent rule to the data, it captures phrases or statements that a human eye might have missed.

Complete and consistent analysis of customer feedback enables more meaningful findings, which leads us into step 4.

Step 4: Analyze your data: Find meaningful insights

Now we are going to analyze our data to find insights. This is where we start to answer our research questions. Keep in mind that step 4 and step 5 (tell the story) overlap somewhat, because creating visualizations is part of both the analysis process and reporting.

Uncovering insights means scouring the codes that emerge from the data and drawing meaningful correlations from them. It is also about making sure each insight is distinct and has enough data to support it.

Part of the analysis is to establish how much each code relates to different demographics and customer profiles, and identify whether there’s any relationship between these data points.

Manually create sub-codes to improve the quality of insights

If your code frame only has one level, you may find that your codes are too broad to extract meaningful insights from. This is where it is valuable to create sub-codes under your primary codes, a process sometimes referred to as meta coding.

Note: If you take an inductive coding approach, you can create sub-codes as you are reading through your feedback data and coding it.

While time-consuming, this exercise will improve the quality of your analysis. Here is an example of what sub-codes could look like.

Example of sub-codes

You need to read your qualitative data carefully to create quality sub-codes, but as you can see, the depth of analysis is greatly improved. By calculating the frequency of these sub-codes, you can see which customer service problems to address first.
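To make the idea concrete, here is a minimal sketch of tallying sub-codes under a primary code; the codes and the coded feedback are hypothetical.

```python
# Minimal sketch: count how often each sub-code appears under a primary code.
from collections import Counter

# Each coded piece of feedback as (primary code, sub-code); hypothetical data.
coded_feedback = [
    ("poor customer service", "slow response time"),
    ("poor customer service", "unhelpful answers"),
    ("poor customer service", "slow response time"),
    ("pricing", "too expensive"),
    ("poor customer service", "rude staff"),
]

sub_code_counts = Counter(
    sub for primary, sub in coded_feedback if primary == "poor customer service"
)
for sub_code, count in sub_code_counts.most_common():
    print(f"{sub_code}: {count}")
# -> slow response time: 2, unhelpful answers: 1, rude staff: 1
```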

Correlate the frequency of codes to customer segments

Many businesses use customer segmentation, and you may have your own respondent segments that you can apply to your qualitative analysis. Segmentation is the practice of dividing customers or research respondents into subgroups.

Segments can be based on:

  • Demographics
  • Any other data type that you care to segment by

It is particularly useful to see the occurrence of codes within your segments. If one of your customer segments is considered unimportant to your business, but they are the cause of nearly all customer service complaints, it may be in your best interest to focus attention elsewhere. This is a useful insight!

Manually visualizing coded qualitative data

There are formulas you can use to visualize key insights in your data. The formulas we suggest below are especially useful if you are measuring a score alongside your feedback.

If you are collecting a metric alongside your qualitative data, the impact of each code on that score is a key visualization. Impact answers the question: “What’s the impact of a code on my overall score?” Using Net Promoter Score (NPS) as an example, you first need to:

  • Calculate your overall NPS (A)
  • Calculate the NPS of the subset of responses that do not contain that code (B)
  • Subtract B from A

Then you can use this simple formula to calculate code impact on NPS.

Visualizing qualitative data: Calculating the impact of a code on your score

You can then visualize this data using a bar chart.
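For those who prefer scripting to a spreadsheet, here is a minimal sketch of the same impact calculation in Python with pandas; the scores, codes, and column names are hypothetical.

```python
# Minimal sketch of "code impact on NPS": overall NPS (A) minus the NPS of
# responses that do not contain the code (B). Data is hypothetical.
import pandas as pd

def nps(scores: pd.Series) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6)."""
    return ((scores >= 9).mean() - (scores <= 6).mean()) * 100

responses = pd.DataFrame({
    "score": [10, 9, 3, 6, 8, 2, 10, 5],
    "codes": [["easy to use"], ["easy to use"], ["poor customer service"],
              ["poor customer service", "pricing"], ["pricing"],
              ["poor customer service"], [], ["pricing"]],
})

overall = nps(responses["score"])                          # A
for code in ["easy to use", "poor customer service", "pricing"]:
    without = responses[~responses["codes"].apply(lambda c: code in c)]
    impact = overall - nps(without["score"])               # A - B
    print(f"{code}: impact on NPS = {impact:+.1f}")
```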

You can download our CX toolkit - it includes a template to recreate this.

Trends over time

This analysis can help you answer questions like: “Which codes are linked to decreases or increases in my score over time?”

We need to compare two sequences of numbers: NPS over time and code frequency over time. Using Excel, calculate the correlation between the two sequences, which can be either positive (the more frequent the code, the higher the NPS; see the picture below) or negative (the more frequent the code, the lower the NPS).

Now you need to plot code frequency against the absolute value of code correlation with NPS. Here is the formula:

Analyzing qualitative data: Calculate which codes are linked to increases or decreases in my score

The visualization could look like this:

Visualizing qualitative data trends over time
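If you would rather script this step than use Excel, here is a minimal sketch of the correlation calculation; the monthly NPS values and code frequencies are hypothetical.

```python
# Minimal sketch: correlate each code's monthly frequency with monthly NPS,
# then report frequency alongside the correlation. Numbers are hypothetical.
import numpy as np

monthly_nps = np.array([32.0, 30.5, 28.0, 25.5, 27.0, 24.0])

code_frequency = {
    "slow response time": np.array([12, 15, 21, 26, 22, 30]),
    "easy to use":        np.array([40, 38, 35, 33, 36, 31]),
}

for code, freq in code_frequency.items():
    r = np.corrcoef(freq, monthly_nps)[0, 1]   # correlation with NPS over time
    print(f"{code}: correlation = {r:+.2f}, total mentions = {freq.sum()}")
    # Plotting total mentions against abs(r) highlights codes that are both
    # common and strongly linked to movements in the score.
```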

These are two examples, but there are more. For a third manual formula, and to learn why word clouds are not an insightful form of analysis, read our visualizations article.

Using a text analytics solution to automate analysis

Automated text analytics solutions enable codes and sub-codes to be pulled out of the data automatically. This makes it far faster and easier to identify what’s driving negative or positive results, pick up emerging trends, and find all manner of rich insights in the data.

Another benefit of AI-driven text analytics software is its built-in capability for sentiment analysis, which provides the emotive context behind your feedback and other qualitative text data.

Thematic’s text analytics goes further by allowing users to apply their expertise in the business context to edit or augment the AI-generated outputs.

Since the move away from manual research is generally about reducing the human element, adding human input to the technology might sound counter-intuitive. However, this is mostly to make sure important business nuances in the feedback aren’t missed during coding. The result is more accurate analysis. This is sometimes referred to as augmented intelligence.

Codes displayed by volume within Thematic. You can 'manage themes' to introduce human input.

Step 5: Report on your data: Tell the story

The last step of analyzing your qualitative data is to report on it, to tell the story. At this point, the codes are fully developed and the focus is on communicating the narrative to the audience.

A coherent outline of the qualitative research, the findings and the insights is vital for stakeholders to discuss and debate before they can devise a meaningful course of action.

Creating graphs and reporting in PowerPoint

Typically, qualitative researchers take the tried and tested approach of distilling their report into a series of charts, tables and other visuals which are woven into a narrative for presentation in PowerPoint.

Using visualization software for reporting

With data transformation and APIs, the analyzed data can be shared with data visualization software such as Power BI, Tableau, Google Studio or Looker. Power BI and Tableau are among the most popular options.

Visualizing your insights inside a feedback analytics platform

Feedback analytics platforms like Thematic incorporate visualization tools that intuitively turn key data and insights into graphs. This removes the time-consuming work of constructing charts to visually identify patterns and frees up time to focus on building a compelling narrative that highlights the insights, in bite-size chunks, for executive teams to review.

Using a feedback analytics platform with visualization tools means you don’t have to use a separate product for visualizations. You can export graphs into PowerPoint straight from the platform.

Two examples of qualitative data visualizations within Thematic

Conclusion - Manual or Automated?

There are those who remain deeply invested in the manual approach - because it’s familiar, because they’re reluctant to spend money and time learning new software, or because they’ve been burned by the overpromises of AI.  

For projects that involve small datasets, manual analysis makes sense, for example when the objective is simply to quantify the answer to a question like “Do customers prefer X concepts to Y?” If the findings are being extracted from a small set of focus groups and interviews, sometimes it’s easier to just read them.

However, as new generations come into the workplace, technology-driven solutions feel more comfortable and practical, and the merits are undeniable. This is especially true if the objective is to go deeper and understand the ‘why’ behind customers’ preference for X or Y, and even more so if time and money are considerations.

The ability to collect a free flow of qualitative feedback data at the same time as the metric means AI can cost-effectively scan, crunch, score and analyze a ton of feedback from one system in one go. And time-intensive processes like focus groups, or coding, that used to take weeks, can now be completed in a matter of hours or days.

But aside from the ever-present business case to speed things up and keep costs down, there are also powerful research imperatives for automated analysis of qualitative data: namely, accuracy and consistency.

Finding insights hidden in feedback requires consistency, especially in coding, not to mention catching all the ‘unknown unknowns’ that can skew research findings and steering clear of cognitive bias.

Some say that without manual data analysis researchers won’t get an accurate “feel” for the insights. However, the larger the data set, the harder it is to sort through and organize feedback that has been pulled from different places, and the greater the risk of drawing incorrect or incomplete conclusions.

Though the process steps for qualitative data analysis have remained pretty much unchanged since psychologist Paul Felix Lazarsfeld paved the path a hundred years ago, the impact digital technology has had on the types of qualitative feedback data and the approach to the analysis is profound.

If you want to try an automated feedback analysis solution on your own qualitative data, you can get started with Thematic.

What is qualitative data analysis?


Analyzing qualitative data is the next step after you have completed your qualitative data collection. The qualitative analysis process aims to identify themes and patterns that emerge across the data.


In simplified terms, qualitative research methods involve non-numerical data collection followed by an explanation based on the attributes of the data. For example, if you are asked to explain in qualitative terms a thermal image displayed in multiple colors, then you would explain the color differences rather than the heat's numerical value. If you have a large amount of data (e.g., of group discussions or observations of real-life situations), the next step is to transcribe and prepare the raw data for subsequent analysis.

Researchers can conduct studies fully based on qualitative methodology, or researchers can preface a quantitative research study with a qualitative study to identify issues that were not originally envisioned but are important to the study. Quantitative researchers may also collect and analyze qualitative data following their quantitative analyses to better understand the meanings behind their statistical results.

Conducting qualitative research can especially help build an understanding of how and why certain outcomes were achieved (in addition to what was achieved). For example, qualitative data analysis is often used for policy and program evaluation research since it can answer certain important questions more efficiently and effectively than quantitative approaches.


Qualitative data analysis can also answer important questions about the relevance, unintended effects, and impact of programs, such as:

  • Were expectations reasonable?
  • Did processes operate as expected?
  • Were key players able to carry out their duties?
  • Were there any unintended effects of the program?

The importance of qualitative data analysis

Qualitative approaches have the advantage of allowing for more diversity in responses and the capacity to adapt to new developments or issues during the research process itself. While qualitative analysis of data can be demanding and time-consuming to conduct, many fields of research utilize qualitative software tools that have been specifically developed to provide more succinct, cost-efficient, and timely results.


Qualitative data analysis is an important part of research and building greater understanding across fields for a number of reasons. First, cases for qualitative data analysis can be selected purposefully according to whether they typify certain characteristics or contextual locations. In other words, qualitative data permits deep immersion into a topic, phenomenon, or area of interest. Rather than seeking generalizability to the population the sample of participants represent, qualitative research aims to construct an in-depth and nuanced understanding of the research topic.

Secondly, the role or position of the researcher in qualitative analysis of data is given greater critical attention. This is because, in qualitative data analysis, the possibility of the researcher taking a ‘neutral' or transcendent position is seen as more problematic in practical and/or philosophical terms. Hence, qualitative researchers are often exhorted to reflect on their role in the research process and make this clear in the analysis.


Thirdly, while qualitative data analysis can take a wide variety of forms, it largely differs from quantitative research in the focus on language, signs, experiences, and meaning. In addition, qualitative approaches to analysis are often holistic and contextual rather than analyzing the data in a piecemeal fashion or removing the data from its context. Qualitative approaches thus allow researchers to explore inquiries from directions that could not be accessed with only numerical quantitative data.

Establishing research rigor

Systematic and transparent approaches to the analysis of qualitative data are essential for rigor . For example, many qualitative research methods require researchers to carefully code data and discern and document themes in a consistent and credible way.


Perhaps the most traditional division in the way qualitative and quantitative research have been used in the social sciences is for qualitative methods to be used for exploratory purposes (e.g., to generate new theory or propositions) or to explain puzzling quantitative results, while quantitative methods are used to test hypotheses .


How do you analyze qualitative data?

After you’ve collected relevant data, what is the best way to look at your data? As always, it will depend on your research question. For instance, if you employed an observational research method to learn about a group’s shared practices, an ethnographic approach could be appropriate to explain the various dimensions of culture. If you collected textual data to understand how people talk about something, then a discourse analysis approach might help you generate key insights about language and communication.


The qualitative data coding process involves iterative categorization and recategorization, ensuring the evolution of the analysis to best represent the data. The procedure typically concludes with the interpretation of patterns and trends identified through the coding process.

To start off, let’s look at two broad approaches to data analysis.

Deductive analysis

Deductive analysis is guided by pre-existing theories or ideas. It starts with a theoretical framework , which is then used to code the data. The researcher can thus use this theoretical framework to interpret their data and answer their research question .

The key steps include coding the data based on the predetermined concepts or categories and using the theory to guide the interpretation of patterns among the codings. Deductive analysis is particularly useful when researchers aim to verify or extend an existing theory within a new context.

Inductive analysis

Inductive analysis involves the generation of new theories or ideas based on the data. The process starts without any preconceived theories or codes, and patterns, themes, and categories emerge out of the data.


The researcher codes the data to capture any concepts or patterns that seem interesting or important to the research question . These codes are then compared and linked, leading to the formation of broader categories or themes. The main goal of inductive analysis is to allow the data to 'speak for itself' rather than imposing pre-existing expectations or ideas onto the data.

Deductive and inductive approaches can be seen as sitting on opposite poles, and all research falls somewhere within that spectrum. Most often, qualitative analysis approaches blend both deductive and inductive elements to contribute to the existing conversation around a topic while remaining open to potential unexpected findings. To help you make informed decisions about which qualitative data analysis approach fits with your research objectives, let's look at some of the common approaches for qualitative data analysis.

Content analysis

Content analysis is a research method used to identify patterns and themes within qualitative data. This approach involves systematically coding and categorizing specific aspects of the content in the data to uncover trends and patterns. An often important part of content analysis is quantifying frequencies and patterns of words or characteristics present in the data.

It is a highly flexible technique that can be adapted to various data types , including text, images, and audiovisual content . While content analysis can be exploratory in nature, it is also common to use pre-established theories and follow a more deductive approach to categorizing and quantifying the qualitative data.


Thematic analysis

Thematic analysis is a method used to identify, analyze, and report patterns or themes within the data. This approach moves beyond counting explicit words or phrases and focuses on also identifying implicit concepts and themes within the data.


Researchers conduct detailed coding of the data to ascertain repeated themes or patterns of meaning. Codes can be categorized into themes, and the researcher can analyze how the themes relate to one another. Thematic analysis is flexible in terms of the research framework, allowing for both inductive (data-driven) and deductive (theory-driven) approaches. The outcome is a rich, detailed, and complex account of the data.

Grounded theory

Grounded theory is a systematic qualitative research methodology that is used to inductively generate theory that is 'grounded' in the data itself. Analysis takes place simultaneously with data collection, and researchers iterate between data collection and analysis until a comprehensive theory is developed.

Grounded theory is characterized by simultaneous data collection and analysis, the development of theoretical codes from the data, purposeful sampling of participants, and the constant comparison of data with emerging categories and concepts. The ultimate goal is to create a theoretical explanation that fits the data and answers the research question .

Discourse analysis

Discourse analysis is a qualitative research approach that emphasizes the role of language in social contexts. It involves examining communication and language use beyond the level of the sentence, considering larger units of language such as texts or conversations.


Discourse analysts typically investigate how social meanings and understandings are constructed in different contexts, emphasizing the connection between language and power. It can be applied to texts of all kinds, including interviews , documents, case studies , and social media posts.

Phenomenological research

Phenomenological research focuses on exploring how human beings make sense of an experience and delves into the essence of this experience. It strives to understand people's perceptions, perspectives, and understandings of a particular situation or phenomenon.


It involves in-depth engagement with participants, often through interviews or conversations, to explore their lived experiences. The goal is to derive detailed descriptions of the essence of the experience and to interpret what insights or implications this may bear on our understanding of this phenomenon.


Now that we've summarized the major approaches to data analysis, let's look at the broader process of research and data analysis. Whenever you need to do research to answer a research question, be it an academic inquiry, a business problem, or a policy decision, you need to collect some data. There are many methods of collecting data: you can collect primary data yourself by conducting interviews, focus groups, or a survey, for instance. Another option is to use secondary data sources. These are data previously collected for other projects, historical records, reports, statistics – basically everything that already exists and can be relevant to your research.


The data you collect should always be a good fit for your research question . For example, if you are interested in how many people in your target population like your brand compared to others, it is no use to conduct interviews or a few focus groups . The sample will be too small to get a representative picture of the population. If your questions are about "how many….", "what is the spread…" etc., you need to conduct quantitative research . If you are interested in why people like different brands, their motives, and their experiences, then conducting qualitative research can provide you with the answers you are looking for.

Let's describe the important steps involved in conducting research.

Step 1: Planning the research

As the saying goes: "Garbage in, garbage out." Suppose you find out after you have collected data that

  • you talked to the wrong people
  • asked the wrong questions
  • a couple of focus group sessions would have yielded better results because of the group interaction, or
  • a survey including a few open-ended questions sent to a larger group of people would have been sufficient and required less effort.

Think thoroughly about sampling, the questions you will be asking, and in which form. If you conduct a focus group or an interview, you are the research instrument, and your data collection will only be as good as you are. If you have never done it before, seek some training and practice. If you have other people do it, make sure they have the skills.


Step 2: Preparing the data

When you conduct focus groups or interviews, think about how to transcribe them. Do you want to run them online or offline? If online, check out which tools can serve your needs, both in terms of functionality and cost. For any audio or video recordings , you can consider using automatic transcription software or services. Automatically generated transcripts can save you time and money, but they still need to be checked. If you don't do this yourself, make sure that you instruct the person doing it on how to prepare the data.

  • How should the final transcript be formatted for later analysis?
  • Which names and locations should be anonymized?
  • What kind of speaker IDs to use?

What about survey data? Some survey data programs will immediately provide basic descriptive-level analysis of the responses. ATLAS.ti will support you with the analysis of the open-ended questions. For this, you need to export your data as an Excel file. ATLAS.ti's survey import wizard will guide you through the process.

Other kinds of data such as images, videos, audio recordings, text, and more can be imported to ATLAS.ti. You can organize all your data into groups and write comments on each source of data to maintain a systematic organization and documentation of your data.


Step 3: Exploratory data analysis

You can run a few simple exploratory analyses to get to know your data. For instance, you can create a word list or word cloud of all your text data or compare and contrast the words in different documents. You can also let ATLAS.ti find relevant concepts for you. There are many tools available that can automatically code your text data, so you can also use these codings to explore your data and refine your coding.
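Outside of a dedicated tool, a quick word-frequency check is also easy to script yourself. Here is a minimal sketch; the transcript snippets and stop-word list are hypothetical.

```python
# Minimal sketch of an exploratory word-frequency count over transcripts.
import re
from collections import Counter

transcripts = [  # hypothetical transcript snippets
    "The support team was slow to respond but very friendly once we connected.",
    "Pricing feels high compared to other tools, though setup was easy.",
]
STOPWORDS = {"the", "and", "was", "to", "but", "we", "very", "once", "though"}

counts = Counter()
for text in transcripts:
    words = re.findall(r"[a-z']+", text.lower())
    counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)

for word, count in counts.most_common(10):   # most frequent words first
    print(f"{count:3d}  {word}")
```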


For instance, you can get a feeling for the sentiments expressed in the data. Who is more optimistic, pessimistic, or neutral in their responses? ATLAS.ti can auto-code the positive, negative, and neutral sentiments in your data. Naturally, you can also simply browse through your data and highlight relevant segments that catch your attention or attach codes to begin condensing the data.


Step 4: Build a code system

Whether you start with auto-coding or manual coding, after having generated some first codes, you need to bring some order to your code system to develop a cohesive understanding. You can build your code system by sorting codes into groups and creating categories and subcodes. As this process requires reading and re-reading your data, you will become very familiar with it. A tool like ATLAS.ti's qualitative data analysis software will support you in the process and make it easier to review your data, modify codings if necessary, change code labels, and write operational definitions to explain what each code means.


Step 5: Query your coded data and write up the analysis

Once you have coded your data, it is time to take the analysis a step further. When using software for qualitative data analysis , it is easy to compare and contrast subsets in your data, such as groups of participants or sets of themes.


For instance, you can query the various opinions of female vs. male respondents. Is there a difference between consumers from rural or urban areas or among different age groups or educational levels? Which codes occur together throughout the data set? Are there relationships between various concepts, and if so, why?

Step 6: Data visualization

Data visualization brings your data to life. It is a powerful way of seeing patterns and relationships in your data. For instance, diagrams allow you to see how your codes are distributed across documents or specific subpopulations in your data.


Exploring coded data on a canvas, moving around code labels in a virtual space, linking codes and other elements of your data set, and thinking about how they are related and why – all of these will advance your analysis and spur further insights. Visuals are also great for communicating results to others.

Step 7: Data presentation

The final step is to summarize the analysis in a written report . You can now put together the memos you have written about the various topics, select some salient quotes that illustrate your writing, and add visuals such as tables and diagrams. If you follow the steps above, you will already have all the building blocks, and you just have to put them together in a report or presentation.

When preparing a report or a presentation, keep your audience in mind. Does your audience better understand numbers than long sections of detailed interpretations? If so, add more tables, charts, and short supportive data quotes to your report or presentation. If your audience loves a good interpretation, add your full-length memos and walk your audience through your conceptual networks and illustrative data quotes.


How to do a thematic analysis


What is a thematic analysis?


Thematic analysis is a broad term that describes an approach to analyzing qualitative data. This approach can encompass diverse methods and is usually applied to a collection of texts, such as survey responses and transcriptions of interviews or focus group discussions.

A researcher performing a thematic analysis will study a set of data to pinpoint repeating patterns, or themes, in the topics and ideas that are expressed in the texts.

In analyzing qualitative data, thematic analysis focuses on concepts, opinions, and experiences, as opposed to pure statistics. This requires an approach to data that is complex and exploratory and can be anchored by different philosophical and conceptual foundations.

A six-step system was developed to help establish clarity and rigor around this process, and it is this system that is most commonly used when conducting a thematic analysis. The six steps are:

  • Familiarization
  • Generating codes
  • Generating themes
  • Reviewing themes
  • Defining and naming themes
  • Creating the report

It is important to note that even though the six steps are listed in sequence, thematic analysis is not necessarily a linear process that advances forward in a one-way, predictable fashion from step one through step six. Rather, it involves a more fluid shifting back and forth between the phases, adjusting to accommodate new insights when they arise.

And arriving at insight is a key goal of this approach. A good thematic analysis doesn’t just seek to present or summarize data. It interprets and makes a statement about it; it extracts meaning from the data.

Since thematic analysis is used to study qualitative data, it works best in cases where you’re looking to gather information about people’s views, values, opinions, experiences, and knowledge.

Some examples of research questions that thematic analysis can be used to answer are:

  • What are senior citizens’ experiences of long-term care homes?
  • How do women view social media sites as a tool for professional networking?
  • How do non-religious people perceive the role of the church in a society?
  • What are financial analysts’ ideas and opinions about cryptocurrency?

To begin answering these questions, you would need to gather data from participants who can provide relevant responses. Once you have the data, you would then analyze and interpret it.

Because you’re dealing with personal views and opinions, there is a lot of room for flexibility in terms of how you interpret the data. In this way, thematic analysis is systematic but not purely scientific.

A landmark 2006 paper by Virginia Braun and Victoria Clarke (“Using thematic analysis in psychology”) established parameters around thematic analysis, which had until then been widely used but poorly defined, setting out what it is and how to go about it in a systematic way.

Since then, their work has been updated, with the name being revised, notably, to “reflexive thematic analysis.”

One common misconception that Braun and Clarke have taken pains to correct is the idea that themes “emerge” from the data. They see this framing as problematic, since it suggests that meaning is somehow inherent to the data and that a researcher is merely an objective medium who identifies that meaning.

Instead, Braun and Clarke view analysis as an interactive process in which the researcher is an active participant in constructing meaning, rather than simply identifying it.

The six stages they presented in their paper are still the benchmark for conducting a thematic analysis. They are presented below.

1. Familiarizing

This step is where you take a broad, high-level view of your data, looking at it as a whole and taking note of your first impressions.

This typically involves reading through written survey responses and other texts, transcribing audio, and recording any patterns that you notice. It’s important to read through and revisit the data in its entirety several times during this stage so that you develop a thorough grasp of all your data.

2. Generating initial codes

After familiarizing yourself with your data, the next step is coding notable features of the data in a methodical way. This often means highlighting portions of the text and applying labels, aka codes, to them that describe the nature of their content.

In our example scenario, we’re researching the experiences of women over the age of 50 on professional networking social media sites. Interviews were conducted to gather data; the excerpt below comes from one of these interviews.

Interview snippet and the codes applied to it:

  • “It’s hard to get a handle on it. It’s so different from how things used to be done, when networking was about handshakes and business cards.” (Codes: Confusion; Comparison with old networking methods)
  • “It makes me feel like a dinosaur.” (Code: Sense of being left behind)
  • “Plus, I've been burned a few times. I'll spend time making what I think are professional connections with male peers, only for the conversation to unexpectedly turn romantic on me. It seems like a lot of men use these sites as a way to meet women, not to develop their careers. It's stressful, to be honest.” (Codes: Discomfort and unease; Unexpected experience with other users)

In the example interview snippet, portions have been highlighted and coded. The codes describe the idea or perception described in the text.

It pays to be exhaustive and thorough at this stage. Good practice involves scrutinizing the data several times, since new information and insight may become apparent upon further review that didn’t jump out at first glance. Multiple rounds of analysis also allow for the generation of more new codes.

Once the text is thoroughly reviewed, it’s time to collate the data into groups according to their code.

3. Generating themes

Now that we’ve created our codes, we can examine them, identify patterns within them, and begin generating themes.

Keep in mind that themes are more encompassing than codes. In general, you’ll be bundling multiple codes into a single theme.

To draw on the example we used above about women and networking through social media, codes could be combined into themes in the following way:

The codes and the themes they are grouped into:

  • Codes: Confusion; Discomfort and unease; Unexpected experience with other users. Theme: Negative experience
  • Codes: Comparison with old networking methods; Sense of being left behind. Theme: Perceived lack of skills

You’ll also be curating your codes and may elect to discard some on the basis that they are too broad or not directly relevant. You may also choose to redefine some of your codes as themes and integrate other codes into them. It all depends on the purpose and goal of your research.
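As an illustration only, here is a minimal sketch of rolling coded responses up into the themes from the table above; the mapping and the coded responses are hypothetical.

```python
# Minimal sketch: group codes into themes and count theme frequency.
from collections import Counter

CODE_TO_THEME = {
    "Confusion": "Negative experience",
    "Discomfort and unease": "Negative experience",
    "Unexpected experience with other users": "Negative experience",
    "Comparison with old networking methods": "Perceived lack of skills",
    "Sense of being left behind": "Perceived lack of skills",
}

coded_responses = [  # codes applied to each hypothetical response
    ["Confusion", "Comparison with old networking methods"],
    ["Sense of being left behind"],
    ["Discomfort and unease", "Unexpected experience with other users"],
]

theme_counts = Counter(
    CODE_TO_THEME[code] for codes in coded_responses for code in codes
)
print(theme_counts.most_common())
# -> [('Negative experience', 3), ('Perceived lack of skills', 2)]
```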

4. Reviewing themes

This is the stage where we check that the themes we’ve generated accurately and relevantly represent the data they are based on. Once again, it’s beneficial to take a thorough, back-and-forth approach that includes review, assessment, comparison, and inquiry. The following questions can support the review:

  • Has anything been overlooked?
  • Are the themes definitively supported by the data?
  • Is there any room for improvement?

5. Defining and naming themes

With your final list of themes in hand, the next step is to name and define them.

In defining them, we want to nail down the meaning of each theme and, importantly, how it allows us to make sense of the data.

Once you have your themes defined, you’ll need to apply a concise and straightforward name to each one.

In our example, the “perceived lack of skills” theme may be adjusted to reflect that the texts expressed uncertainty about skills rather than a definitive absence of them. In this case, a more apt name for the theme might be “questions about competence.”

6. Creating the report

To finish the process, we put our findings down in writing. As with all scholarly writing, a thematic analysis should open with an introduction section that explains the research question and approach.

This is followed by a statement about the methodology that includes how data was collected and how the thematic analysis was performed.

Each theme is addressed in detail in the results section, with attention paid to the frequency and presence of the themes in the data, as well as what they mean, and with examples from the data included as supporting evidence.

The conclusion section describes how the analysis answers the research question and summarizes the key points.

In our example, the conclusion may assert that it is common for women over the age of 50 to have negative experiences on professional networking sites, and that these are often tied to interactions with other users and a sense that using these sites requires specialized skills.

The advantages and disadvantages of thematic analysis

Thematic analysis is useful for analyzing large data sets, and it allows a lot of flexibility in terms of designing theoretical and research frameworks. Moreover, it supports the generation and interpretation of themes that are backed by data.

There are times when thematic analysis is not the best approach to take because it can be highly subjective, and, in seeking to identify broad patterns, it can overlook nuance in the data.

What’s more, researchers must be judicious about reflecting on how their own position and perspective bears on their interpretations of the data and if they are imposing meaning that is not there or failing to pick up on meaning that is.

Thematic analysis offers a flexible and recursive way to approach qualitative data that has the potential to yield valuable insights about people’s opinions, views, and lived experience. It must be applied, however, in a conscientious fashion so as not to allow subjectivity to taint or obscure the results.

Frequently asked questions about thematic analysis

The purpose of thematic analysis is to find repeating patterns, or themes, in qualitative data. Thematic analysis can encompass diverse methods and is usually applied to a collection of texts, such as survey responses and transcriptions of interviews or focus group discussions. In analyzing qualitative data, thematic analysis focuses on concepts, opinions, and experiences, as opposed to pure statistics.

A big advantage of thematic analysis is that it allows a lot of flexibility in terms of designing theoretical and research frameworks. It also supports the generation and interpretation of themes that are backed by data.

A disadvantage of thematic analysis is that it can be highly subjective and can overlook nuance in the data. Also, researchers must be aware of how their own position and perspective influences their interpretations of the data and if they are imposing meaning that is not there or failing to pick up on meaning that is.

How many themes make sense in your thematic analysis depends, of course, on your topic and the material you are working with. In general, it makes sense to have no more than 6-10 broader themes, rather than many very detailed ones. You can then identify further nuances and differences under each theme as you dive deeper into the topic.

Since thematic analysis is used to study qualitative data, it works best in cases where you’re looking to gather information about people’s views, values, opinions, experiences, and knowledge. Therefore, it makes sense to use thematic analysis for interviews.

After familiarizing yourself with your data, the next step of a thematic analysis is coding notable features of the data in a methodical way. This often means highlighting portions of the text and applying labels, aka codes, to them that describe the nature of their content.


Steps in secondary data analysis

Determine your research question – Knowing exactly what you are looking for.

Locating data – Knowing what is out there and whether you can gain access to it. A quick Internet search, possibly with the help of a librarian, will reveal a wealth of options.

Evaluating relevance of the data  – Considering things like the data’s original purpose, when it was collected, population, sampling strategy/sample, data collection protocols, operationalization of concepts, questions asked, and form/shape of the data.

Assessing credibility of the data  – Establishing the credentials of the original researchers, searching for full explication of methods including any problems encountered, determining how consistent the data is with data from other sources, and discovering whether the data has been used in any credible published research.

Analysis – This will generally involve a range of statistical processes.


How to analyse survey data: best practices, tips and tools.

Data can do beautiful things, but turning your survey responses into clear, compelling analysis isn’t always a straightforward task. We’ve collected our tips for survey analysis along with a beginner’s guide to survey data and analysis tools.

What is survey data analysis?

Survey analysis is the process of turning the raw material of your survey data into insights and answers you can use to improve things for your business. It’s an essential part of doing survey-based research .

There are a huge number of survey data analysis methods available, from simple cross-tabulation, where data from your survey responses is arranged into rows and columns to make it easier to understand, to statistical methods that tell you things you could never work out on your own, such as whether the results you’re seeing have statistical significance.


Types of survey data

Different kinds of survey questions yield data in different forms. Here’s a quick guide to a few of them. Often, survey data will belong to more than one of these categories as they frequently overlap.

Quantitative data vs. qualitative data

What’s the difference between qualitative data and quantitative data?

  • Quantitative data, aka numerical data, involves numerical values and quantities. An example of quantitative data would be the number of times a customer has visited a location, the temperature of a city or the scores achieved in an NPS survey .
  • Qualitative data is information that isn’t numerical. It may be verbal or visual, or consist of spoken audio or video. It’s more likely to be descriptive or subjective, although it doesn’t have to be. Qualitative data highlights the “why” behind the what.


Closed-ended questions

These are questions with a limited range of responses. They could be a ‘yes’ or ‘no’ question such as ‘do you live in Portland, OR?’. Closed-ended questions can also take the form of multiple-choice, ranking, or drop-down menu items. Respondents can’t qualify their choice between the options or explain why they chose which one they did.

This type of question produces structured data that is easy to sort, code and quantify since the responses will fit into a limited number of ‘buckets’. However, its simplicity means you lose out on some of the finer details that respondents could have provided.

Natural language data (open-ended questions)

Answers written in the respondent’s own words are also a form of survey data. This type of response is usually given in open field (text box) question formats. Questions might begin with ‘how,’ ‘why,’ ‘describe…’ or other conversational phrases that encourage the respondent to open up.

This type of data, known as unstructured data, is rich in information. Because of its complexity and volume, it typically requires advanced tools such as natural language processing (NLP) and sentiment analysis to extract the full value from how respondents answered.
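As a simple illustration of what sentiment analysis involves, here is a minimal sketch using NLTK's VADER analyzer; the example responses are hypothetical, and production survey platforms use far more sophisticated models.

```python
# Minimal sketch: score open-ended survey responses with VADER sentiment.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-off lexicon download
sia = SentimentIntensityAnalyzer()

responses = [  # hypothetical open-ended answers
    "The onboarding was smooth and support was fantastic.",
    "I waited two weeks for a reply. Very disappointing.",
]
for text in responses:
    scores = sia.polarity_scores(text)       # neg/neu/pos plus compound score
    print(f"{scores['compound']:+.2f}  {text}")
```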

Categorical (nominal) data

This kind of data exists in categories that have no hierarchical relationship to each other. No item is treated as being more or less, better or worse, than the others. Examples would be primary colours (red vs. blue), genders (male vs. female) or brand names (Chrysler vs. Mitsubishi).

Multiple choice questions often produce this kind of data (though not always).

Ordinal data

Unlike categorical data, ordinal data has an intrinsic rank that relates to quantity or quality, such as degrees of preference, or how strongly someone agrees or disagrees with a statement.

Likert scales and ranking scales often serve up this kind of data.


Scalar data

Like ordinal data, scalar data deals with quantity and quality on a relative basis, with some items ranking above others. What makes it different is that it uses an established scale, such as age (expressed as a number), test scores (out of 100), or time (in days, hours, minutes, etc.).

You might get this kind of data from a drop-down or sliding scale question format, among others.


The type of data you receive affects the kind of survey results analysis you’ll be doing, so it’s very important to consider the type of survey data you will end up with when you’re writing your survey questions and designing survey flows .

Steps to analyse your survey data

Here’s an overview of how you can analyse survey data, identify trends and hopefully draw meaningful conclusions from your research.

1.   Review your research questions

Research questions are the underlying questions your survey seeks to answer. Research questions are not the same as the questions in your questionnaire, although they may cover similar ground.

It’s important to review your research questions before you analyse your survey data, to make sure the analysis aligns with what you want to accomplish and find out from your data.

2.   Cross-tabulate your data

Cross-tabulation is a valuable step in sifting through your data and uncovering its meaning. When you cross-tabulate, you’re breaking out your data according to the sub-groups within your research population or your sample, and comparing the relationship between one variable and another. The table you produce will give you an overall picture of how responses vary among your subgroups.

Target the survey questions that best address your research question. For example, if you want to know how many people would be interested in buying from you in the future, cross-tabulating the data will help you see whether some groups were more likely than others to want to return. This gives you an idea of where to focus your efforts when improving your product design or your customer experience .


Cross-tabulation works best for categorical data and other types of structured data. You can cross-tabulate your data in multiple ways across different questions and sub-groups using survey analysis software. Be aware, though, that slicing and dicing your data very finely will give you a smaller sample size, which then affects the reliability of your results.
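If you are working with the raw responses in Python rather than in survey analysis software, here is a minimal cross-tabulation sketch with pandas; the columns and answer options are hypothetical.

```python
# Minimal sketch: cross-tabulate repurchase intent by age group with pandas.
import pandas as pd

responses = pd.DataFrame({  # hypothetical survey responses
    "age_group":       ["18-34", "35-54", "18-34", "55+", "35-54", "55+"],
    "would_buy_again": ["Yes",   "No",    "Yes",   "No",  "Yes",   "Yes"],
})

# Rows: age group; columns: answer; values: percentage within each age group.
crosstab = pd.crosstab(responses["age_group"],
                       responses["would_buy_again"],
                       normalize="index") * 100
print(crosstab.round(1))
```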

3.   Review and investigate your results

Put your results in context – how have things changed since the last time you researched these kinds of questions? Do your findings tie in to changes in your market or other research done within your company?

Look at how different demographics within your sample or research population have answered, and compare your findings to other data on these groups. For example, does your survey analysis tell you something about why a certain group is purchasing less, or more? Does the data tell you anything about how well your company is meeting strategic goals, such as changing brand perceptions or appealing to a younger market?

Look at quantitative measures too. Which questions were answered the most? Which ones produced the most polarised responses? Were there any questions with very skewed data? This could be a clue to issues with survey design .

4.   Use statistical analysis to check your findings

Statistics give you certainty (or as close to it as you can get) about the results of your survey. Statistical tools like t-tests, regression and ANOVA help you make sure that the results you’re seeing have statistical significance and aren’t just there by chance.

Statistical tools can also help you determine which aspects of your data are most important, and what kinds of relationships – if any – they have with one another.
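As an illustration, here is a minimal sketch of a significance check using an independent-samples t-test from SciPy; the two groups and their scores are hypothetical.

```python
# Minimal sketch: test whether two segments' satisfaction scores differ.
from scipy import stats

segment_a = [7, 8, 6, 9, 7, 8, 9, 6, 7, 8]   # hypothetical scores, segment A
segment_b = [5, 6, 7, 5, 6, 4, 6, 5, 7, 5]   # hypothetical scores, segment B

t_stat, p_value = stats.ttest_ind(segment_a, segment_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below your chosen threshold (commonly 0.05) suggests the difference
# between the segments is unlikely to be due to chance alone.
```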


Benchmarking your survey data

One of the most powerful aspects of survey data analysis is its ability to build on itself. By repeating market research surveys at different points in time, you can not only uncover insights from your results but also strengthen those insights over time.

Using consistent types of data and methods of analysis means you can use your initial results as a benchmark for future research . What’s changed year-on-year? Has your survey data followed a steady rise, performed a sudden leap or fallen incrementally? Over time, all these questions become answerable when you listen regularly and analyse your data consistently.

Maintaining your question and data types and your data analysis methods means you achieve a like-for-like measurement of results over time. And if you collect data consistently enough to see patterns and processes emerging, you can use these to make predictions about future events and outcomes.

Another benefit of data analysis over time is that you can compare your results with other people’s, provided you are using the same measurements and metrics. A classic example is NPS (Net Promoter Score) , which has become a standard measurement of customer experience that companies typically track over time.
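For instance, NPS is simple enough to compute yourself: subtract the share of detractors (scores 0-6) from the share of promoters (scores 9-10). The sketch below does this for two hypothetical, invented survey waves so the year-on-year comparison is like-for-like.

```python
# Hypothetical 0-10 "likelihood to recommend" responses from two survey waves.
wave_2023 = [9, 10, 8, 6, 7, 10, 9, 3, 8, 10]
wave_2024 = [10, 9, 9, 7, 10, 8, 9, 6, 10, 9]

def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Because the question and scale are identical in both waves, the scores
# can be benchmarked like-for-like year on year.
print(f"NPS 2023: {nps(wave_2023):+.0f}")
print(f"NPS 2024: {nps(wave_2024):+.0f}")
```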


How to present survey results

Most data isn’t very friendly to the human eye or brain in its raw form. Survey data analysis helps you turn your data into something that’s accessible, intuitive, and even interesting to a wide range of people.

1.   Make it visual

You can present data in a visual form, such as a chart or graph, or put it into a tabular form so it’s easy for people to see the relationships between variables in your crosstab analysis. Choose a graphic format that best suits your data type and clearly shows the results to the untrained eye. There are plenty of options, including linear graphs, bar graphs, Venn diagrams, word clouds and pie charts. If time and budget allows, you can create an infographic or animation.
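As a minimal example of the visual approach, the snippet below plots a hypothetical cross-tab result as a bar chart using pandas and matplotlib; the figures are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical crosstab output: share of each age group that would buy again.
share_would_buy_again = pd.Series({"18-34": 0.67, "35-54": 0.80, "55+": 0.33})

# A simple bar chart is often enough to make the relationship visible
# to a non-technical audience.
ax = share_would_buy_again.plot(kind="bar", ylabel="Share answering 'Yes'",
                                xlabel="Age group", rot=0,
                                title="Would buy from us again")
ax.set_ylim(0, 1)
plt.tight_layout()
plt.savefig("would_buy_again.png")
```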

2.   Keep language human

You can express discoveries in plain language, for example, in phrases like “customers in the USA consistently preferred potato chips to corn chips.” Adding direct quotes from your natural language data (provided respondents have consented to this) can add immediacy and illustrate your points.

3.   Tell the story of your research

Another approach is to express data using the power of storytelling, using a beginning-middle-end or situation-crisis-resolution structure to talk about how trends have emerged or challenges have been overcome. This helps people understand the context of your research and why you did it the way you did.


4.   Include your insights

As well as presenting your data in terms of numbers and proportions, always be sure to share the insights it has produced too. Insights come when you apply knowledge and ideas to the data in the survey, which means they’re often more striking and easier to grasp than the data by itself. Insights may take the form of a recommended action , or examine how two different data points are connected.


Common mistakes in analysing data and how to avoid them

1.   Being too quick to interpret survey results

It’s easy to get carried away when the data seems to show the results you were expecting or confirms a hypothesis you started with. This is why it’s so important to use statistics to make sure your survey results are statistically significant, i.e. unlikely to have arisen by chance. Remember that a skewed or coincidental result becomes more likely with a smaller sample size.

2.   Treating correlation like causation

You may have heard the phrase “correlation is not causation” before. It’s well-known for a reason: mistaking a link between two independent variables as a causal relationship between them is a common pitfall in research. Results can correlate without one having a direct effect on the other.

An example is when there is another common variable involved that isn’t measured and acts as a kind of missing link between the correlated variables. Sales of sunscreen might go up in line with the number of ice-creams sold at the beach, but it’s not because there’s something about ice-cream that makes people more vulnerable to getting sunburned. It’s because a third variable – sunshine – affects both sunscreen use and ice-cream sales.
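The sunscreen and ice-cream effect is easy to reproduce with simulated data. The sketch below (using NumPy, with invented coefficients) generates two series that are both driven by sunshine and shows that they correlate strongly even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated daily sunshine hours (the hidden third variable).
sunshine = rng.uniform(2, 12, size=365)

# Both sales series are driven by sunshine plus noise; neither causes the other.
ice_cream_sales = 20 * sunshine + rng.normal(0, 15, size=365)
sunscreen_sales = 12 * sunshine + rng.normal(0, 10, size=365)

# The two series correlate strongly despite having no causal link,
# because they share a common driver.
r = np.corrcoef(ice_cream_sales, sunscreen_sales)[0, 1]
print(f"Correlation between ice-cream and sunscreen sales: r = {r:.2f}")
```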

3.   Missing the nuances in qualitative natural language data

Human language is complex, and analysing survey data in the form of speech or text isn’t as straightforward as mapping vocabulary items to positive or negative codes. The latest AI solutions go further, uncovering meaning, emotion and intent within human language.

Trusting your rich qualitative data to an AI’s interpretation means relying on the software’s ability to understand language in the way a human would, taking into account things like context and conversational dynamics. If you’re investing in software to analyse natural language data in your surveys, make sure it’s capable of sentiment analysis that uses machine learning to get a deeper understanding of what survey respondents are trying to tell you.
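To see why naive approaches fall short, here is a deliberately simplistic lexicon-based scorer; the word lists are made up for the example. It mislabels an obviously unhappy comment as positive, which is exactly the kind of nuance that context-aware sentiment analysis is meant to catch.

```python
import re

# A toy lexicon-based scorer, illustrating why naive word-to-polarity
# mapping misses context; the word lists are invented for this example.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "confusing", "bad"}

def naive_sentiment(text):
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Sarcasm and negation defeat word counting: this clearly unhappy
# comment still scores as positive (+1).
print(naive_sentiment("great, the checkout is broken again, love waiting"))
```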


Tools for survey analysis

If you’re planning to run an ongoing data insights program (and we recommend that you do), it’s important to have tools on hand that make it easy and efficient to perform your research and extract valuable insights from the results.

It’s even better if those tools help you to share your findings with the right people, at the right time, in a format that works for them. Here are a few attributes to look for in a survey analysis software platform.

  • Easy to use (for non-experts): Look for software that demands minimal training or expertise, and you’ll save time and effort while maximising the number of people who can pitch in on your experience management program. User-friendly drag-and-drop interfaces, straightforward menus, and automated data analysis are all worth looking out for.
  • Works on any platform: Don’t restrict your team to a single place where software is located on a few terminals. Instead, choose a cloud-based platform that’s optimised for mobile, desktop, tablet and more.
  • Integrates with your existing setup: Stand-alone analysis tools create additional work you shouldn’t have to do. Why export, convert, paste and print out when you can use a software tool that plugs straight into your existing systems via API?
  • Incorporates statistical analysis: Choose a system that gives you the tools to not just process and present your data, but refine your survey results using statistical tools that generate deep insights and future predictions with just a few clicks.
  • Comes with first-class support: The best survey data tool is one that scales with you and adapts to your goals and growth. A large part of that is having an expert team on call to answer questions, propose bespoke solutions, and help you get the most out of the service you’ve paid for.


Tips from the team at Qualtrics

We’ve run more than a few survey research programs in our time, and we have some tips to share that you may not find in the average survey data analysis guide. Here are some innovative ways to help make sure your survey analysis hits the mark, grabs attention, and provokes change.

Write the headlines

The #1 way to make your research hit the mark is to start with the end in mind. Before you even write your survey questions, make sample headlines of what the survey will discover. Sample headlines are the main data takeaways from your research. Some sample headlines might be:

  • The #1 concern that travellers have with staying at our hotel is X
  • X% of visitors to our showroom want to be approached by a salesperson within the first 10 minutes
  • Diners are X% more likely to choose our new lunch menu than our old one

You may even want to sketch out mock charts that show how the data will look in your results. If you “write” the results first, those results become a guide to help you design questions that ensure you get the data you want.

Gut Data Gut

We live in a data-driven society. Marketing is a data-driven business function. But don’t be afraid to overlap qualitative research findings onto your quantitative data. Don’t hesitate to combine what you know in your gut with what you know from the data.

This is called “Gut Data Gut”. Check your gut, check your data, and check your gut. If you have personal experience with the research topic, use it! If you have qualitative research that supports the data, use it!

Your survey is one star in a constellation of information that combines to tell a story. Use every atom of information at your disposal. Just be sure to let your audience know when you are showing them findings from statistically significant research and when it comes from a different source.

Write a mock press release to encourage taking action

One of the biggest challenges of research is acting on it. This is sometimes called the “Knowing / Doing Gap”, where an organisation has a difficult time implementing truths it knows.

One way you can ignite change with your research is to write a press release dated six months into the future that proudly announces all the changes as a result of your research. Maybe it touts the three new features that were added to your product. Perhaps it introduces your new approach to technical support. Maybe it outlines the improvements to your website.

After six months, gather your team and read the press release together to see how well you executed change based on the research.

Focus your research findings

Everyone consumes information differently. Some people want to fly over your findings at 30,000 feet and others want to slog through the weeds in their rubber boots. You should package your research for these different research consumer types.

Package your survey results analysis findings in 5 ways:

  • A 1-page executive summary with key insights
  • A 1-page stat sheet that ticks off the top supporting stats
  • A shareable slide deck with data visuals that can be understood as a stand-alone or by being presented in person
  • Live dashboards with all the survey data that allow team members to filter the data and dig in as deeply as they want on a DIY basis
  • The Mock Press Release (mentioned above)


How to analyse survey data

Reporting on survey results will prove the value of your work. Learn more about statistical analysis types or jump into an analysis type below to see our favourite tools of the trade:

  • Conjoint Analysis
  • CrossTab Analysis
  • Cluster Analysis
  • Factor Analysis
  • Analysis of Variance (ANOVA)




What is qualitative data? How to understand, collect, and analyze it

A comprehensive guide to qualitative data, how it differs from quantitative data, and why it's a valuable tool for solving problems.

In this guide:

  • What is qualitative research?
  • Importance of qualitative data
  • Differences between qualitative and quantitative data
  • Characteristics of qualitative data
  • Types of qualitative data
  • Pros and cons
  • Collection methods

Everything that’s done digitally—from surfing the web to conducting a transaction—creates a data trail. And data analysts are constantly exploring and examining that trail, trying to find out ways to use data to make better decisions.

Different types of data define more and more of our interactions online—one of the most common and well-known being qualitative data, or data that can be expressed in descriptions and feelings.

This guide takes a deep look at what qualitative data is, what it can be used for, how it’s collected, and how it’s important to you. 

Key takeaways: 

Qualitative data gives insights into people's thoughts and feelings through detailed descriptions from interviews, observations, and visual materials.

The three main types of qualitative data are binary, nominal, and ordinal.

Qualitative data appears in many different settings, including research, everyday work, and statistics.

Both qualitative and quantitative research are conducted through surveys and interviews, among other methods. 

What is qualitative data?

Qualitative data is descriptive information that captures observable qualities and characteristics not quantifiable by numbers. It is collected from interviews, focus groups, observations, and documents offering insights into experiences, perceptions, and behaviors.

Qualitative data cannot be counted or measured directly because it is descriptive: it consists of the words or labels used to describe certain characteristics or traits.

This type of data answers the "why" or "how" behind the analysis. It’s often used in open-ended studies, allowing participants to show their true feelings and actions without direction.

Think of qualitative data as the type of data you’d get if you were to ask someone why they did something—what was their reasoning? 

Qualitative research not only helps to collect data, it also gives the researcher a chance to understand the trends and meanings of natural actions. 

This type of data research focuses on the qualities of users—the actions behind the numbers. Qualitative research is the descriptive and subjective research that helps bring context to quantitative data. 

It’s flexible and iterative. For example: 

The music had a light tone that filled the kitchen.

Every blue button had white lettering, while the red buttons had yellow. 

The little girl had red hair with a white hat.

Qualitative data is important in determining the frequency of traits or characteristics. 

Understanding your data can help you understand your customers, users, or visitors better. And, when you understand your audience better, you can make them happier.  First-party data , which is collected directly from your own audience, is especially valuable as it provides the most accurate and relevant insights for your specific needs.

Qualitative data helps the market researcher answer questions like what issues or problems customers are facing, what motivates them, and what improvements can be made.

Examples of qualitative data

You’ve most likely used qualitative data today. This type of data is found in your everyday work and in statistics all over the web. Here are some examples of qualitative data in descriptions, research, work, and statistics. 

Qualitative data in descriptions

Analysis of qualitative data requires descriptive context in order to support its theories and hypotheses. Here are some core examples of descriptive qualitative data:

The extremely short woman has curly hair and brilliant blue eyes.

A bright white light pierced the small dark space. 

The plump fish jumped out of crystal-clear waters. 

The fluffy brown dog jumped over the tall white fence. 

A soft cloud floated by an otherwise bright blue sky.

Qualitative data in research

Qualitative data research methods allow analysts to use contextual information to create theories and models. Open- and closed-ended questions can help you understand the reasoning behind motivations, frustrations, and actions in almost any kind of study.

Some examples of qualitative data collection in research:

What country do you work in? 

What is your most recent job title? 

How do you rank in the search engines? 

How do you rate your purchase: good, bad, or exceptional?

Qualitative data at work

Professionals in various industries use qualitative observations in their work and research. Examples of this type of data in the workforce include:

A manager gives an employee constructive criticism on their skills. "Your efforts are solid and you understand the product knowledge well, just have patience."

A judge shares the verdict with the courtroom. "The man was found not guilty and is free to go."

A sales associate collects feedback from customers. "The customer said the check-out button did not work.”

A teacher gives feedback to their student. "I gave you an A on this project because of your dedication and commitment to the cause."

A digital marketer watches a session replay to get an understanding of how users use their platform.

Qualitative data in statistics

Qualitative data can provide important statistics about any industry, any group of users, and any products. Here are some examples of qualitative data set collections in statistics:

The age, weight, and height of a group of people, used to determine clothing size charts. 

The origin, gender, and location for a census reading.

The name, title, and profession of people attending a conference to aid in follow-up emails.

Difference between qualitative and quantitative data

Qualitative and quantitative data are very different, but they bring equal value to any data analysis. Each comes with its own analysis methods, collection types and uses.

Here are the differences between qualitative and quantitative data :

Qualitative data is individualized, descriptive, and relating to emotions.

Quantitative data is countable, measurable and relating to numbers.

Qualitative data helps us understand why or how something occurred, and the reasoning behind certain behaviors.

Quantitative data helps us understand how many, how much, or how often something occurred. 

Qualitative data is subjective and personalized.

Quantitative data is fixed and universal.

Qualitative research methods are conducted through observations or in-depth interviews.

Quantitative research methods are conducted through surveys and factual measuring. 

Qualitative data is analyzed by grouping the data into classifications and topics. 

Quantitative data is analyzed using statistical analysis.

Both provide a ton of value for any data collection and are key to truly understanding trending use cases and patterns in behavior . Dig deeper into quantitative data examples .

Characteristics of qualitative data

The characteristics of qualitative data are vast. There are a few traits that stand out amongst other data that should be understood for successful data analysis. 

Descriptive: describing or classifying in an objective and nonjudgmental way.

Detailed: giving an account in words with full particulars.

Open-ended: having no determined limit or boundary.

Non-numerical: not containing numbers. 

Subjective: based on or influenced by personal feelings, tastes, or opinions.

With qualitative data samples, these traits can help you understand the meaning behind the equation—or for lack of a better term, what’s behind the results. 

As we narrow down the importance of qualitative data, you should understand that there are different data types. Data analysts often categorize qualitative data into three types:

1. Binary data

Binary data is numerically represented by a combination of zeros and ones. Binary data is the only category of data that can be directly understood and executed by a computer.

Data analysts use binary data to create statistical models that predict how often the study subject is likely to be positive or negative, up or down, right or wrong, based on a two-value (0/1) scale.

2. Nominal data

Nominal data , also referred to as “named, labeled data” or “nominal scaled data,” is any type of data used to label something without giving it a numerical value. 

Data analysts use nominal data to determine statistically significant differences between sets of qualitative data. 

For example, a multiple-choice test to profile participants’ skills in a study.

3. Ordinal data

Ordinal data is qualitative data categorized in a particular order or on a ranging scale. When researchers use ordinal data, the order of the qualitative information matters more than the difference between each category. Data analysts might use ordinal data when creating charts, while researchers might use it to classify groups, such as age, gender, or class.

For example, a Net Promoter Score (NPS) survey produces results on a 0-10 scale, which is treated as an ordered set of categories.
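A minimal sketch of how these three types might be represented in pandas, with invented field names, is shown below; ordered categories make the "order matters" property of ordinal data explicit.

```python
import pandas as pd

# Hypothetical survey fields illustrating the three qualitative data types.
df = pd.DataFrame({
    # Binary: exactly two possible values.
    "subscribed": [True, False, True, True],
    # Nominal: labels with no inherent order.
    "country": ["DE", "US", "JP", "US"],
    # Ordinal: ordered categories -- the order matters, the gaps do not.
    "satisfaction": ["low", "high", "medium", "high"],
})

df["country"] = df["country"].astype("category")
df["satisfaction"] = pd.Categorical(df["satisfaction"],
                                    categories=["low", "medium", "high"],
                                    ordered=True)

# Ordered categories support comparisons such as "medium or better".
print((df["satisfaction"] >= "medium").sum())
```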

When should you use qualitative research?

One of the important things to learn about qualitative data is when to use it. 

Qualitative data is used when you need to determine the particular trends of traits or characteristics or to form parameters for larger data sets to be observed. Qualitative data provides the means by which analysts can quantify the world around them.

You would use qualitative data to help answer questions like who your customers are, what issues or problems they’re facing, and where you need to focus your attention so you can better solve those issues.

Qualitative data is widely used to understand the language consumers speak—so apply it where necessary. 

Pros and cons of qualitative data

Qualitative data is a detailed, deep understanding of a topic through observing and interviewing a sample of people. There are both benefits and drawbacks to this type of data. 

Pros of qualitative data

Qualitative research is affordable and requires a small sample size.

Qualitative data offers a predictive element and provides specific insight into development.

Qualitative research focuses on the details of personal choice and uses these individual choices as workable data.

Qualitative research works to remove bias from its collected data by using an open-ended response process.

Qualitative data research provides useful content in any thematic analysis.

Cons of qualitative data 

Qualitative data can be time-consuming to collect and can be difficult to scale out to a larger population.

Qualitative research creates subjective information points.

Qualitative research can involve significant levels of repetition and is often difficult to replicate.

Qualitative research relies on the knowledge of the researchers.

Qualitative research does not offer statistical analysis; for that, you have to turn to quantitative data.

Qualitative data collection methods

Here are the main approaches and collection methods of qualitative studies and data: 

1. Interviews

Personal interviews are one of the most commonly used deductive data collection methods for qualitative research because of their personal approach.

The interview may be informal and unstructured, and is often conversational in nature. The interviewer or researcher collects data directly from the interviewee, one-to-one. Open-ended questions are mostly asked spontaneously, with the interviewer allowing the flow of the interview to dictate the questions and answers.

The point of the interview is to find out how the interviewee feels about the subject.

2. Focus groups

Focus groups are held in a discussion-style setting with 6 to 10 people. A moderator is assigned to monitor and guide the discussion based on focus questions.

Depending on the qualitative data that is needed, the members of the group may have something in common. For example, a researcher conducting a study on dog sled runners would need to understand dogs, sleds, and snow to have sufficient knowledge of the subject matter.

3. Data records 

Data doesn’t start with your collection; it has most likely been gathered in the past.

Using already existing reliable data and similar sources of information as the data source is a surefire way to obtain qualitative research. Much like going to a library, you can review books and other reference material to collect relevant data that can be used in the research.

For example, if you were to study the trends of dictionaries, you would want to know the history of every dictionary made, starting with the very first one.

4. Observation

Observation is a longstanding qualitative data collection method, where the researcher simply observes behaviors in a participant's natural setting. They keep a keen eye on the participants and take notes to capture innate responses and reactions without prompting.

Typically observation is an inductive approach, which is used when a researcher has very little or no idea of the research phenomenon. 

Other documentation methods, such as video recordings, audio recordings, and photo imagery, may be used to obtain qualitative data.

Further reading: Site observations through heatmaps

5. Case studies

Case studies are an intensive analysis of an individual person or community with a stress on developmental factors in relation to the environment. 

In this method, data is gathered by an in-depth analysis and is used to understand both simple and complex subjects. The goal of a case study is to see how using a product or service has positively impacted the subject, showcasing a solution to a problem or the like. 

6. Longitudinal studies

A longitudinal study is where people who share a single characteristic are studied over a period of time. 

This data collection method is performed on the same subject repeatedly over an extended period. It is an observational research method that goes on for a few years and, in some cases, decades. The goal is to find correlations of subjects with common traits.

For example, medical researchers conduct longitudinal studies to ascertain the effects of a drug or its related symptoms.

Qualitative data analysis tools

And, as with anything, you won’t be successful without the right tools. Here are a few qualitative data analysis tools to have in your toolbox:

MAXQDA —A qualitative and mixed-method data analysis software 

Fullstory —A behavioral data and analysis platform

ATLAS.ti —A powerful qualitative data tool that offers AI-based functions 

Quirkos —Qualitative data analysis software for the simple learner

Dedoose —A project management and analysis tool for collaboration and teamwork

Taguette —A free, open-source, data analysis and organization platform 

MonkeyLearn —AI-powered, qualitative text analysis, and visualization tool 

Qualtrics —Experience management software

Frequently asked questions about qualitative data

Is qualitative data subjective?

Yes, categorical data or qualitative data is information that cannot generally be proven. For instance, the statement “the chair is too small” depends on what it is used for and by whom it is being used.

Who uses qualitative data?

If you’re interested in the following, you should use qualitative data:

Understand emotional connections to your brand

Identify obstacles in any funnel, for example with session replay

Uncover confusion about your messaging

Locate product feature gaps 

Improve usability of your website, app, or experience

Observe how people talk, think, and feel about your brand

Learn how an organization selects vendors and partners

What are the steps for qualitative data?

1. Transcribe your data: Once you’ve collected all the data, you need to transcribe it. The first step in analyzing your data is arranging it systematically, which means converting all of it into a text format.

2. Organize your data: Go back to your research objectives and organize the data based on the questions asked. Arrange your research objectives in a table so they appear visually clear. Avoid working with unorganized data; you won’t obtain conclusive results from it.

3. Categorize and assign the data: Coding qualitative data means categorizing it and assigning variables, properties, and patterns. Coding is an important step in qualitative data analysis, as it lets you derive theories from relevant research findings and begin to gain in-depth insight into the data to make informed decisions (a minimal coding sketch follows this list).

4. Validate your data: Data validation is a recurring step that should be followed throughout the research process. There are two sides to validating data: the accuracy of your research methods and their reliability, which is the extent to which the methods produce accurate data consistently.

5. Conclude the data analysis: Present your data in a report that shares the method used to conduct the research, the outcomes, and the projected hypotheses of your findings in any related areas.
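Here is the minimal coding sketch referenced in step 3: a hypothetical table of coded interview excerpts, tallied with pandas to show how often each theme comes up. The codes and respondents are invented for illustration.

```python
import pandas as pd

# Hypothetical coded interview excerpts: each response has been assigned
# one or more thematic codes during the categorization step.
coded = pd.DataFrame({
    "respondent": [1, 1, 2, 3, 3, 4],
    "code": ["pricing", "onboarding", "pricing", "support",
             "onboarding", "pricing"],
})

# Theme frequency: how many distinct respondents mention each code.
theme_counts = (coded.drop_duplicates()
                      .groupby("code")["respondent"]
                      .nunique()
                      .sort_values(ascending=False))
print(theme_counts)
```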

Is qualitative data better than quantitative data?

One is not better than the other; rather, they work cohesively to create a better overall data analysis experience. Understanding the importance of both qualitative and quantitative data will produce the best possible data content analysis outcome for any study.

Further reading : Qualitative vs. quantitative data — what's the difference?


Closing the data gaps in women’s health

Now more than ever, data is at the center of engagement and decision making. Healthcare and life sciences are no exception: data is central to discussions about public health and is core to enabling continued scientific advancement. The accelerated digitalization of healthcare during the pandemic has only expanded the amount of health-related data available—and cemented the key role that it plays in care delivery, disease prediction and diagnosis, biopharma and medtech innovation, and patient outcomes.

Despite the exponential growth in data generated across the healthcare ecosystem, notable gaps remain. One such area is women’s health, in which gaps span the entire data value chain—from defining women’s health (pre–data generation) to diagnosing (data generation) to tracking at the national level (data collection) to translating data into insights at the global level through epidemiological studies (data analysis). These data disparities ultimately influence health outcomes for women globally by creating blind spots in the insights that drive research design, investment decisions, and pipeline priorities. Certain subsets of women, such as those of different backgrounds, sexual orientations, and gender identities, are more vulnerable to the gaps and negative effects of these blind spots. 1 J.H. Flaskerud and A.M. Nyamathi, “Attaining gender and ethnic diversity in health intervention research: Cultural responsiveness versus resource provision,” Advances in Nursing Science , June 2000, Volume 22, Number 4; Madina Agénor et al., “Sexual orientation identity disparities in health behaviors, outcomes, and services use among men and women in the United States: A cross-sectional study,” BMC Public Health , August 2016, Volume 16. Furthermore, insufficient availability and analysis of women-specific health data undermine advancements in disease-state understanding and limit asset discovery opportunities across medical conditions with meaningful unmet need.

This article highlights these disparities and explores options to remedy them.

Understanding the data value chain

Health-related data has numerous sources and is analyzed for different applications. This article explores examples of data gaps in women’s health at key moments across the health data value chain (Exhibit 1).

The following sections examine these examples in greater detail.

Pre–data generation: Defining women’s health

Good data sets begin with good definitions. Without clear definitions, the metrics to track and the conclusions to draw remain murky. However, at present there is no one definition of “women’s health.” 2 Blog: Health Supplement , “Women’s health is more than female anatomy and our reproductive system—it’s about unraveling centuries of inequities due to living in a patriarchal healthcare system,” blog entry by Julia Cheek and Halle Tecco, Harvard Business School, January 18, 2022.

Historically, women’s health was largely defined as “reproductive health.” 3 Henry M. Greenberg, Stephen R. Leeder, and Susan U. Raymond, “Beyond reproduction: Women’s health in today’s developing world,” International Journal of Epidemiology , June 2005, Volume 34, Number 5. More recently, academics and clinicians have used a more expansive lens, recognizing that sex is a significant factor in the development and progression of many diseases. 4 Science in the News , “Treating men and women differently: Sex differences in the basis of disease,” Nathan Huey, October 30, 2018. The National Academy on Women’s Health Medical Education defines women’s health as “devoted to facilitating the preservation and wellness of and prevention of illness in women and includes screening, diagnosis and management of conditions that are unique in women, are more common in women, are more serious in women, and have manifestations, risk factors or interventions that are different in women.” 5 Pamela Carney, “Women’s health: An evolving mosaic,” Journal of General Internal Medicine , August 2000, Volume 15, Number 8.

For the purposes of this analysis, we define “women’s health” as encompassing both female-specific conditions, including those tied to female reproduction or female biology, and general health conditions that may affect women differently, such as cardiovascular diseases, or disproportionately, such as autoimmune diseases (Exhibit 2). 6 For more on our work to provide a practicable definition of “women’s health,” see “ Unlocking opportunities in women’s healthcare ,” McKinsey, February 14, 2022. It is crucial to understand sex-driven differences, as today’s care models often ignore those differences, resulting in health outcomes that can vary by sex (often to women’s disadvantage). 7 For more on the biological and emotional differences between men and women, see Alyson McGregor, Sex Matters: How Male-Centric Medicine Endangers Women’s Health and What We Can Do About It , New York, NY: Hachette Go, May 2020.

Data generation: Documenting women’s diagnoses in claims data

Insurance claims data—particularly in the United States—provides critical insight into the nature of health conditions and how they are treated. These data sets have inherent limitations, as their primary use for physicians is for billing purposes. However, this data is high quality and widely used, and thus the analysis for this article uses diagnosis codes on claims as a proxy for overall diagnosis rates.

According to US claims data from January 2019 through August 2022, the prevalence of women’s health conditions (estimated by epidemiological data sources 8 McKinsey analysis of the Global Burden of Disease Study 2019 (GBD 2019), Institute for Health Metrics and Evaluation, 2019. ) is roughly five times that of their documented diagnoses. In other words, for every one woman diagnosed with a women’s health condition, roughly four go undiagnosed. In comparison, the difference between epidemiological prevalence and documented diagnoses for men’s health conditions narrows to roughly 1.5 times (Exhibit 3). 9 McKinsey analysis of GBD 2019 and 2023 Komodo Health, Inc.

While this disconnection between prevalence and diagnosis indicates an inconsistency in women’s health data across various sources, such sex-based differences may also reflect structural drivers, such as biases in care delivery. Implicit biases, defined as “attitudes and beliefs about race, ethnicity, age, ability, gender, or other characteristics that operate outside our conscious awareness and can be measured only indirectly,” have been found to be associated with “diagnostic uncertainty.” 10 Janice A. Sabin, “Tackling implicit bias in health care,” New England Journal of Medicine , July 2022, Volume 387. This phenomenon is corroborated by surveyed patients: 20 percent of women say a healthcare provider has ignored or dismissed their symptoms, compared with 14 percent of men. 11 Emily Paulsen, “Recognizing, addressing unintended gender bias in patient care,” Duke Health, January 14, 2020.

Furthermore, we found that the sex of the diagnosing physician appears to be correlated with the likelihood of being diagnosed with a condition. In other words, in the claims data we analyzed, women appear to be more likely to be diagnosed with a female-specific condition, including menopause, polycystic ovary syndrome (PCOS), and endometriosis, if their physician is a woman. In our analysis of claims data, women represent approximately 40 percent of primary care physicians (PCPs) but nearly 50 percent of PCPs documenting diagnoses of these key women’s health conditions (Exhibit 4).

As Caroline Criado Perez writes in her book Invisible Women: Data Bias in a World Designed for Men , “It’s not always easy to convince someone a need exists if they don’t have that need themselves.” 12 For more on the adverse effects on women caused by gender bias in big data collection, see Caroline Criado Perez, Invisible Women: Data Bias in a World Designed for Men , New York, NY: Abrams Press, 2019. Given that claims data informs life sciences investment decisions, “blind spots” in these data sets may contribute to a perception of less unmet need and less need for continued innovation.

These biases in care delivery are also likely reinforced during medical training. Out of 112 internal-medicine residency programs reviewed in a 2016 study, approximately 25 percent did not include menopause in the core curriculum, 30 percent did not include contraception, nearly 40 percent did not include PCOS, and more than 70 percent did not include infertility. 13 Sebastian Casas et al., “Program directors’ perceptions of resident education in women’s health: A national survey,” Journal of Women’s Health , August 2016, Volume 26. These educational gaps are present even in programs dedicated to women’s health: a survey of US obstetrics and gynecology residents found that fewer than two in ten receive formal training in menopause medicine, but seven in ten would like to receive it. 14 “What do ob/gyns in training learn about menopause? Not nearly enough, new study suggests,” Johns Hopkins Medicine, May 1, 2013.

The rate of underdiagnosis for women is more striking in light of data that shows that women are, on average, more likely to seek out care. A 2013 Kaiser survey found that 68 percent of men and 81 percent of women identified a clinician they see for routine care, and women were more likely than men to have seen a provider in the past two years (91 percent versus 75 percent). 15 “Spotlight: Gender differences in healthcare,” US Department of Veterans Affairs, July 2015.

Data collection: Reporting sex-disaggregated health data at the national level

Globally, the quality and quantity of women’s health data collection is uneven. The World Bank tracks country-level reporting rates of gender-specific healthcare indicators and finds significant gaps. For example, in 2020, less than 10 percent of countries reported data related to female access to contraception. This lack of visibility undermines the ability to understand drivers of maternal and child health, maternal mortality, and sexually transmitted infections (STIs) such as HIV/AIDS. 16 “Contraceptive prevalence," Gender Data Portal Database, World Bank, accessed March 23, 2023. And while data availability related to contraceptive use has improved, many countries still only provide data for married women. Less than 5 percent of countries reported 2020 data on menstrual material usage, which is a critical indicator of public health, gender equity, and human rights. 17 “Women and girls who use menstrual material," 2020, Gender Data Portal Database, World Bank, accessed March 23, 2023.

The availability of sex-specific health data also differs by country. During the COVID-19 pandemic, for instance, 76 percent of high-income countries reported COVID-19 case data by sex, compared with 37 percent of low-income countries. 18 Kent Buse et al., “Recorded but not revealed: Exploring the relationship between sex and gender, country by income level, and COVID-19,” Lancet , June 2021, Volume 9, Number 6. Sex-disaggregated data provides important insights into the biological mechanisms and socioeconomic risk factors that drive disease prevention and may translate to the development of more effective biopharma (and other) interventions by sex. Without it, the picture of global women’s health remains incomplete, particularly in lower-income countries.

Data analysis: Improving the metrics of epidemiological studies

In addition to data published by individual nations, the global burden of disease (GBD) is often used by clinicians, payers, researchers, analysts, and policy makers to understand the evolving global healthcare landscape. The GBD is the world’s most comprehensive observational epidemiological study, spanning 204 countries, 369 diseases and injuries, and 87 risk factors.

However, traditional metrics tracked in the GBD may not capture the full scale of need in women’s health. Using the GBD tool of the Institute for Health Metrics and Evaluation (IHME), we investigated the prevalence and burden of disease associated with select women’s health conditions (Exhibit 5). Of the conditions defined in this analysis, 19 Note: not an exhaustive list of all conditions that affect women. nearly 60 percent of prevalence data was attributed to female-specific diseases such as maternal health, contraception, and menopause. However, these same female-specific conditions represent less than 25 percent of disability-adjusted life years 20 Each disability-adjusted life year is the equivalent of the loss of one year of health. associated with women’s health conditions. In other words, traditional health metrics do not accurately reflect the widespread suffering associated with female-specific conditions.

A delta between prevalence and disability-adjusted life years is not surprising in itself. However, the low disability-adjusted life years associated with female-specific conditions (some of which, such as menopause, are not tracked at all in the GBD) appear to meaningfully understate the disruption and suffering associated with female-specific conditions. For example:

  • Menopause. Approximately 80 percent of women indicate that menopause interferes with their lives, and roughly one-third of these women also experience depression. 21 “Suffering in silence: The biases and data gaps of menopause,” blog entry by Female Founders Fund, October 26, 2020; Jose Alvir et al., “Depression, quality of life, work productivity, resource use, and costs among women experiencing menopause and hot flashes: A cross-sectional study,” Primary Care Companion for CNS Disorders , November 2012, Volume 14, Number 6. Furthermore, with an estimated $810 billion in healthcare spending and productivity losses, menopause places a significant economic burden on the global economy. 22 Reenita Das, “Menopause reveals itself as the next big opportunity in femtech,” Forbes , July 24, 2019.
  • Infertility. About 40 percent of women with infertility are reported to experience depression, and another 35 percent are reported to experience anxiety. These rates are an estimated 1.5 to 2.0 times higher in low- and middle-income countries than in high-income countries. 23 Sepideh Hajian et al., “The prevalence of depression symptoms among infertile women: A systematic review and meta-analysis,” Fertility Research and Practice , March 2021, Volume 7; Vida Ghasemi et al., “The prevalence of anxiety symptoms in infertile women: A systematic review and meta-analysis,” Fertility Research and Practice , April 2020, Volume 6.
  • Endometriosis. Women with endometriosis have about three times higher healthcare costs on average. 24 Machaon Bonafede et al., “Real-world evaluation of direct and indirect economic burden among endometriosis patients in the United States,” Advances in Therapy , February 2018, Volume 35, Number 3. The time to diagnosis is estimated to be more than seven years, 25 Alice Broster, “Why it takes so long to be diagnosed with endometriosis, according to an expert,” Forbes , August 27, 2020. with an even more pronounced delay for Black women. 26 O. Bougie et al., “Influence of race/ethnicity on prevalence and presentation of endometriosis: A systematic review and meta-analysis,” British Journal of Obstetrics and Gynaecology , March 2019, Volume 126, Number 9. Furthermore, approximately half of women with endometriosis reported earning less money as a result of the impact of endometriosis symptoms. 27 Stephanie Chiuve et al., “Impact of endometriosis on women’s life decisions and goal attainment: A cross-sectional survey of members of an online patient community,” BMJ Open , April 2022, Volume 12, Number 4.
  • Dysmenorrhea. Period pain can disrupt women’s lives, with about 40 percent of young women reporting negative effects on classroom performance and about 20 percent reporting school absences due to pain. This impact is more pronounced in low- and middle-income countries. 28 Mike Armour et al., “The prevalence and academic impact of dysmenorrhea in 21,573 young women: A systematic review and meta-analysis,” Journal of Women’s Health , August 2019, Volume 28, Number 8.

Steps to close the data gaps in women’s health

Data is foundational to our understanding of disease states and is a crucial catalyst for continued life-sciences innovation. A 2023 report from the Enterprise Strategy Group and Splunk found that leaders in key data-maturity metrics also excelled in product innovation; these leaders report a higher number of product launches per year and a higher share of revenue accounted for by new innovations. 29 The Economic Impact of Data and Innovation 2023, Splunk.

In women’s health, data gaps coincide with lower rates of clinical development focused on women’s health: excluding oncology, just 1 percent of biopharma pipeline assets and 2 percent of medtech novel approvals are directed at addressing women’s health conditions. 30 The rates increase to 5 percent of biopharma and 4 percent of medtech approaches when oncology is included. McKinsey analysis of PharmaProjects Database, July 2021, and Med Tech EvaluatePharma, July 2021, Evaluate Ltd. (Including oncology, the rates increase to 5 and 4 percent, respectively.) Without data to accurately document the extent and nature of conditions, there is a limited fact base to fuel innovation—in women’s health specifically and in life sciences overall. But the opportunities to close gaps are plentiful and represent exciting opportunities for industry participants with a stake in improving women’s health outcomes.

Acknowledge the importance of sex in the definition and treatment of disease. The data gap is substantial and will take many hands and substantial effort to bridge. Today’s clinical trials, diagnoses, and treatment plans are built on existing data sets, only some of which include thorough analysis of robust sex-disaggregated data. Fortunately, working to close the data gap in women’s health could bring forth opportunities throughout the data value chain for life-sciences organizations, providers, payers, academics, and investors alike. The starting point is building a widely acknowledged definition of women’s health that includes all relevant conditions—not just those related to reproductive health—and highlights the biological relevance of sex to health outcomes.

Reinforce incentives at every step of the women’s health data value chain. At the regional and national levels, clinicians, academics, and researchers could benefit from updated guidance on the impact of sex-based differences on clinical outcomes, which demographic data to collect, and mechanisms for the collection of women’s health data. Improving visibility into women’s unmet medical needs will also require new mechanisms, incentives, and infrastructure to facilitate the generation of sex-disaggregated data. Some changes, such as those enhancing the understanding of sex-based differences in clinical outcomes, could have a substantial impact by creating a stronger foundation for women’s health data collection and analysis. Incumbent health and life sciences companies could also take the lead in working with companies that have unanalyzed sex-disaggregated data.

Improve the generation and use of data in care delivery. Clinician training in sex-specific biology and in the implicit biases we know about today, with an emphasis on narrowing the gap between the prevalence of a condition and the volume of diagnoses, could help improve both health outcomes for women and critical sources of data generation. Once an expanded set of sex-disaggregated data is collected and assessed, that data can also guide future training on sex-based differences as data relates to healthcare research, clinical trials, pharmaceutical development, treatment, and more. Organizations that use clinical decision support systems to assist in clinical training may consider how a more robust set of sex-disaggregated data could improve models. The goal is to equip everyone—from researchers to providers to payers—with comprehensive training and tools built on best-in-class data practices.

Fund new ventures related to women’s health data. Investors could also seek out opportunities to fund new ventures focused on generating women’s health data and understanding the impact of sex-driven differences on health outcomes. The successful models that led to the proliferation of precision medicine in other therapeutic areas, such as oncology-focused electronic medical records companies, could guide these new ventures. Investors, entrepreneurs, and established life-sciences incumbents each have a role to play in investing in this white space.

Rethink traditional epidemiological metrics. Finally, stakeholders can consider how health metrics and studies are used, as well as the implications of those choices. Both key health outcome metrics and population-level analyses (for example, global epidemiological studies) should aim to accurately reflect the patient experience of different subpopulations. Revisiting and expanding these metrics could be a joint effort among governments, academics, clinicians, and public-health experts.

Our collective understanding of disease burden drives not only health outcomes but also investment decisions—and yet that understanding does not benefit from a comprehensive data set on women. The gaps are many, and women of different demographics feel the effects to varying degrees—but a new commitment to raising the bar in women’s health data could unlock the next generation of life-sciences innovations and care delivery for women globally. Moreover, taking care of women is taking care of communities. Everyone will benefit from a world with a comprehensive definition of women’s health, diagnosis rates more in line with condition prevalence, nationally reported sex-disaggregated data, and global epidemiological studies that consider the full breadth of women’s health experiences. The future of innovation in women’s health is only as strong as the data value chain that supports it. It’s time to close these gaps.

Delaney Burns is a consultant in McKinsey’s Paris office, Tara Grabowsky is a partner in the Philadelphia office, Emma Kemble is an associate partner in the New Jersey office, and Lucy Pérez is a senior partner in the Boston office.

The authors wish to thank Megan Greenfield, Valentina Sartori, Nicole Szlezak, Gila Tolub, and Yahui Wei for their contributions to this article.



Research Misconduct

Learn what research misconduct is, how allegations are handled, and find resources to address misconduct.

NIH aims to enable scientific discovery while assuring honesty, transparency, integrity, fair merit-based competition, and protection of intellectual capital and proprietary information. As the largest public funder of biomedical research, NIH sets the standard for innovation and scientific discovery. We promote the highest levels of scientific integrity and public accountability in the conduct of science.

The scientific research enterprise is built on a deep foundation of trust and shared values. As the National Academies wrote in their 2019 On Being a Scientist report, "this trust will endure only if the scientific community devotes itself to exemplifying and transmitting the values associated with ethical scientific conduct." Falsifying, fabricating, or plagiarizing data—all forms of research misconduct—are contrary to these values and ethical conduct.

What is Research Integrity?

Failing to uphold research integrity undermines the public’s confidence and trust in the outcomes of NIH supported research. Research integrity includes:

  • the use of honest and verifiable methods in proposing, performing, and evaluating research
  • reporting research results with particular attention to adherence to rules, regulations, and guidelines
  • following commonly accepted professional codes or norms

Shared values in scientific research

  • Honesty: conveying information truthfully and honoring commitments
  • Accuracy: reporting findings precisely and taking care to avoid errors
  • Efficiency: using resources wisely and avoiding waste
  • Objectivity: letting the facts speak for themselves and avoiding bias

How does research integrity affect you?

  • Researchers rely on trustworthy results of other researchers to make scientific progress.
  • Researchers rely on public support, whether through public investments or their voluntary participation in research, to further science.
  • The public relies on scientific progress to better the lives of everyone.
  • The public could actually be harmed by researchers who are dishonest and act without regard for integrity.

Related NIH resources:

  • About Research Misconduct
  • Reporting a Concern
  • Process for Handling Allegations of Research Misconduct
  • NIH Expectations, Policies, and Requirements
  • NIH Actions and Oversight after a Finding of Research Misconduct
  • Notices, Statements and Reports
  • Responsible Conduct of Research (RCR)
  • Resources for NIH Staff



Detection of Cliff Top Erosion Drivers through Machine Learning Algorithms between Portonovo and Trave Cliffs (Ancona, Italy)


1. Introduction

2. Materials and Methods

2.1. Fieldwork

2.2. Data Analyses and Surveys

2.3. Parameters Extraction: Cliff Top Retreat Analysis and Transect Identification

  • To obtain the baseline, the shoreline was used as the reference line with a 100 m buffer. The 2022 orthophoto served as the base map, and the shoreline was identified by the colour change between the sand and the sea.
  • Using the “Cast Transect” option, a series of transects was generated from this baseline, crossing the two delineated cliff edges and computing the linear distance between them. A spacing of 10 m between transects was set, for a total of 310 transects, and a smoothing factor of 100 was applied to avoid any crosscutting of these lines, keeping each transect as perpendicular as possible to the coastline. The “Cast Direction” option was then used to indicate the landward and seaward directions.
  • The intersections between the transects and the shoreline were created and, using the “Calculate Change Statistics” option, the Net Shoreline Movement (NSM, the total movement measured in metres) and the End Point Rate (EPR, the rate of movement in metres per year) were computed along these transects, together with the Confidence of End Point Rate (ECI, or EPRunc in newer versions of DSAS), an index that factors the positional uncertainty of the digitised lines (accuracy error) into the EPR confidence. A minimal sketch of the NSM/EPR arithmetic follows this list.
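The sketch below (plain Python, not the USGS DSAS tool itself) illustrates the arithmetic behind NSM and EPR for a single transect; the positions and years are hypothetical values chosen for the example.

```python
# Minimal sketch of the per-transect change statistics described above:
# NSM is the distance between the oldest and newest cliff-edge position
# along a transect, and EPR divides that distance by the elapsed time.
# Positions are hypothetical distances (in metres) from the baseline.

def change_statistics(pos_old_m, pos_new_m, year_old, year_new):
    """Return (NSM in metres, EPR in metres per year) for one transect."""
    nsm = pos_new_m - pos_old_m          # negative values indicate retreat
    epr = nsm / (year_new - year_old)    # rate over the analysis period
    return nsm, epr

# Example: a cliff edge that moved 10.5 m landward between 1978 and 2022.
nsm, epr = change_statistics(pos_old_m=52.0, pos_new_m=41.5,
                             year_old=1978, year_new=2022)
print(f"NSM = {nsm:.1f} m, EPR = {epr:.3f} m/yr")   # about -0.24 m/yr
```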

2.4. Machine Learning Analysis: Parameters

Drivers used in the ML analysis, with description, mapping method, and data type:

  • Cliff height (number): the height of the cliff above sea level, which can influence slope stability [ , ]. Extracted from the 2022 DSM by sampling the highest point of the active cliff.
  • Cliff slope (number): the slope of the active cliff wall, which can affect the frictional resistances. Manually computed on the extracted profile in a GIS environment, from the highest point of the active cliff down to the cliff base.
  • Aspect (number): the exposure of the cliff wall, which may change the erosion rate through differential weathering rates and different exposure to winds [ ]. Automatically computed in a GIS environment and reported in degrees clockwise from north.
  • UCS, in MPa (number): the uniaxial compressive strength measured at the cliff base, which is related to cliff retreat [ , ]. Collected during fieldwork using a pocket penetrometer and a Schmidt hammer.
  • GSI (number): classification of the rock mass that takes into account the amount and quality of the discontinuities controlling cliff erosion [ ]. Obtained during fieldwork using the most updated version of the classification for complex formations [ ].
  • Cliff top retreat (number): values of cliff top retreat computed for the period 1978–2022, the target value for the ML analysis. Computed in a GIS environment using the USGS DSAS tool.
  • Beach and talus width (number): the corridor that separates the cliff base from the sea; this parameter determines whether the cliff wall might be hit by waves [ ]. Manually measured in a GIS environment for every transect, from the cliff base to the shoreline.
  • Cliff base slope (number): the slope of the space between the sea and the cliff base, which can affect wave run-up [ ]. Manually measured in a GIS environment for every transect, from the cliff base to the shoreline.
  • Boulders at cliff base (binary: 0 absence, 1 presence): boulders at the base of the cliff can reduce the erosive power of waves and are even used as revetment [ ]. Manually added for each transect according to the 2022 orthophoto.
  • Beach retreat (GIZC) (number): beach retreat between 2008 and 2019 computed by Regione Marche in the Gestione Integrata Zone Costiere (GIZC) project. The GIZC values were assigned to each transect via a buffer along the shoreline.
  • Vegetation at cliff top (binary: 0 absence, 1 presence): trees and their roots in the upper part of the cliff can add cohesion to the soil or remove it when they are uprooted. Manually added for each transect using the 2022 orthophoto.
  • Angle between shoreline and NE storms (Bora) (number): the angle between the line perpendicular (normal) to the shoreline and the wave front [ ]; the direction of the Bora wave front was chosen according to RON data *. Manually measured in a GIS environment for every transect.
  • Angle between shoreline and SE storms (Scirocco) (number): the angle between the line perpendicular (normal) to the shoreline and the wave front [ ]; the direction of the Scirocco wave front was chosen according to RON data *. Manually measured in a GIS environment for every transect.
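For illustration only, the sketch below shows one way per-transect drivers like those above could be assembled into a feature matrix and target vector before model fitting; the column names and values are hypothetical and are not the study’s dataset.

```python
# Hypothetical per-transect feature table: each row is one transect,
# each column one driver, and the target is the 1978-2022 cliff top retreat.
import pandas as pd

transects = pd.DataFrame({
    "cliff_height_m":      [37.0, 42.0, 120.0],
    "cliff_slope_deg":     [55.0, 60.0, 75.0],
    "aspect_deg":          [40.0, 35.0, 310.0],
    "ucs_mpa":             [25.0, 25.0, 50.0],
    "gsi":                 [35, 35, 50],
    "beach_width_m":       [12.0, 8.0, 3.0],
    "boulders_at_base":    [1, 0, 0],            # binary presence/absence
    "veg_at_cliff_top":    [1, 1, 0],            # binary presence/absence
    "cliff_top_retreat_m": [4.2, 6.8, 14.5],     # target value
})

X = transects.drop(columns="cliff_top_retreat_m")  # drivers (features)
y = transects["cliff_top_retreat_m"]               # target for the ML models
print(X.shape, y.shape)
```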

2.5. Application of Machine Learning Models

2.6. Slope Stability Numerical Modeling

3.1. Fieldwork

3.2. Cliff Top Retreat Analysis

3.3. Machine Learning

3.4. Slope Stability

4. Discussion

5. Conclusions

  • Cliff top retreat calculations spanning the period from 1978 to 2022 reaffirm the findings of previous research [48], indicating notably higher NSM values in the Trave sector.
  • The Mean Decrease in Impurity (MDI) analysis, conducted using the Random Forest (RF) and XGBoost (XGB) ML algorithms, identified cliff height as the most significant parameter for cliff top erosion (a minimal sketch of this kind of importance calculation follows this list).
  • Limit Equilibrium Method (LEM) modeling confirms the correlation between the factor of safety (FS) and cliff height.
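As a rough illustration of the importance measure named in the second bullet, the sketch below fits a Random Forest regressor on synthetic transect-like data and reads its impurity-based (MDI) feature importances. This is not the study’s pipeline, and an XGBoost regressor could be slotted in the same way.

```python
# Synthetic example of MDI feature importance, not the paper's analysis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(20, 120, n),   # cliff height (m), made dominant below
    rng.uniform(30, 80, n),    # cliff slope (deg)
    rng.uniform(0, 20, n),     # beach width (m)
])
# Synthetic target: retreat driven mainly by cliff height, plus noise.
y = 0.1 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.5, n)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
for name, importance in zip(["cliff_height", "cliff_slope", "beach_width"],
                            model.feature_importances_):
    print(f"{name:>12}: {importance:.2f}")   # cliff_height should rank highest
```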

Supplementary Materials

Author Contributions

Data Availability Statement

Acknowledgments

Conflicts of Interest

  • Martínez, M.L.; Intralawan, A.; Vázquez, G.; Pérez-Maqueo, O.; Sutton, P.; Landgrave, R. The Coasts of Our World: Ecological, Economic and Social Importance. Ecol. Econ. 2007 , 63 , 254–272. [ Google Scholar ] [ CrossRef ]
  • Young, A.P.; Carilli, J.E. Global Distribution of Coastal Cliffs. Earth Surf. Process. Landforms 2019 , 44 , 1309–1316. [ Google Scholar ] [ CrossRef ]
  • Naylor, L.A.; Stephenson, W.J.; Trenhaile, A.S. Rock Coast Geomorphology: Recent Advances and Future Research Directions. Geomorphology 2010 , 114 , 3–11. [ Google Scholar ] [ CrossRef ]
  • Kennedy, D.M.; Paulik, R.; Dickson, M.E. Subaerial Weathering versus Wave Processes in Shore Platform Development: Reappraising the Old Hat Island Evidence. Earth Surf. Process. Landforms 2011 , 36 , 686–694. [ Google Scholar ] [ CrossRef ]
  • Sunamura, T. Rocky Coast Processes: With Special Reference to the Recession of Soft Rock Cliffs. Proc. Japan Acad. Ser. B Phys. Biol. Sci. 2015 , 91 , 481–500. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Poate, T.; Masselink, G.; Austin, M.J.; Dickson, M.; McCall, R. The Role of Bed Roughness in Wave Transformation Across Sloping Rock Shore Platforms. J. Geophys. Res. Earth Surf. 2018 , 123 , 91–123. [ Google Scholar ] [ CrossRef ]
  • Moses, C.; Robinson, D. Chalk Coast Dynamics: Implications for Understanding Rock Coast Evolution. Earth-Sci. Rev. 2011 , 109 , 63–73. [ Google Scholar ] [ CrossRef ]
  • Kennedy, D.M.; Milkins, J. The Formation of Beaches on Shore Platforms in Microtidal Environments. Earth Surf. Process. Landforms 2015 , 40 , 34–46. [ Google Scholar ] [ CrossRef ]
  • Sunamura, T. The Elevation of Shore Platforms: A Laboratory Approach to the Unsolved Problem. J. Geol. 1991 , 99 , 761–766. [ Google Scholar ] [ CrossRef ]
  • Robinson, A.H.W. Erosion and Accretion along Part of the Suffolk Coast of East Anglia, England. Mar. Geol. 1980 , 37 , 133–146. [ Google Scholar ] [ CrossRef ]
  • Troiani, F.; Martino, S.; Marmoni, G.M.; Menichetti, M.; Torre, D.; Iacobucci, G.; Piacentini, D. Integrated Field Surveying and Land Surface Quantitative Analysis to Assess Landslide Proneness in the Conero Promontory Rocky Coast (Italy). Appl. Sci. 2020 , 14 , 4793. [ Google Scholar ] [ CrossRef ]
  • Naylor, L.A.; Stephenson, W.J. On the Role of Discontinuities in Mediating Shore Platform Erosion. Geomorphology 2010 , 114 , 89–100. [ Google Scholar ] [ CrossRef ]
  • Donati, D.; Stead, D.; Lato, M.; Gaib, S. Spatio-Temporal Characterization of Slope Damage: Insights from the Ten Mile Slide, British Columbia, Canada. Landslides 2020 , 17 , 1037–1049. [ Google Scholar ] [ CrossRef ]
  • Marmoni, G.M.; Martino, S.; Censi, M.; Menichetti, M.; Piacentini, D.; Scarascia Mugnozza, G.; Torre, D.; Troiani, F. Transition from Rock Mass Creep to Progressive Failure for Rockslide Initiation at Mt. Conero (Italy). Geomorphology 2023 , 437 , 108750. [ Google Scholar ] [ CrossRef ]
  • Rosser, N.J.; Brain, M.J.; Petley, D.N.; Lim, M.; Norman, E.C. Coastline Retreat via Progressive Failure of Rocky Coastal Cliffs. Geology 2013 , 41 , 939–942. [ Google Scholar ] [ CrossRef ]
  • Caputo, T.; Marino, E.; Matano, F.; Somma, R.; Troise, C.; De Natale, G. Terrestrial Laser Scanning (TLS) Data for the Analysis of Coastal Tuff Cliff Retreat: Application to Coroglio Cliff, Naples, Italy. Ann. Geophys. 2018 , 61 , SE110. [ Google Scholar ] [ CrossRef ]
  • Esposito, G.; Salvini, R.; Matano, F.; Sacchi, M.; Danzi, M.; Somma, R.; Troise, C. Multitemporal Monitoring of a Coastal Landslide through SfM-Derived Point Cloud Comparison. Photogramm. Rec. 2017 , 32 , 459–479. [ Google Scholar ] [ CrossRef ]
  • Matano, F.; Pignalosa, A.; Marino, E.; Esposito, G.; Caccavale, M.; Caputo, T.; Sacchi, M.; Somma, R.; Troise, C.; De Natale, G. Laser Scanning Application for Geostructural Analysis of Tuffaceous Coastal Cliffs: The Case of Punta Epitaffio, Pozzuoli Bay, Italy. Eur. J. Remote Sens. 2015 , 48 , 615–637. [ Google Scholar ] [ CrossRef ]
  • Loiotine, L.; Andriani, G.F.; Jaboyedoff, M.; Parise, M.; Derron, M.H. Comparison of Remote Sensing Techniques for Geostructural Analysis and Cliff Monitoring in Coastal Areas of High Tourist Attraction: The Case Study of Polignano a Mare (Southern Italy). Remote Sens. 2021 , 13 , 5045. [ Google Scholar ] [ CrossRef ]
  • Francioni, M.; Coggan, J.; Eyre, M.; Stead, D. A Combined Field/Remote Sensing Approach for Characterizing Landslide Risk in Coastal Areas. Int. J. Appl. Earth Obs. Geoinf. 2018 , 67 , 79–95. [ Google Scholar ] [ CrossRef ]
  • Gómez-Pazo, A.; Pérez-Alberti, A.; Trenhaile, A. Tracking the Behavior of Rocky Coastal Cliffs in Northwestern Spain. Environ. Earth Sci. 2021 , 80 , 1–18. [ Google Scholar ] [ CrossRef ]
  • Prémaillon, M.; Dewez, T.J.B.; Regard, V.; Rosser, N.J.; Carretier, S.; Guillen, L. Conceptual Model of Fracture-Limited Sea Cliff Erosion: Erosion of the Seaward Tilted Flyschs of Socoa, Basque Country, France. Earth Surf. Process. Landforms 2021 , 46 , 2690–2709. [ Google Scholar ] [ CrossRef ]
  • Young, A.P.; Guza, R.T.; Matsumoto, H.; Merrifield, M.A.; O’Reilly, W.C.; Swirad, Z.M. Three Years of Weekly Observations of Coastal Cliff Erosion by Waves and Rainfall. Geomorphology 2021 , 375 , 107545. [ Google Scholar ] [ CrossRef ]
  • Piacentini, D.; Troiani, F.; Torre, D.; Menichetti, M. Land-Surface Quantitative Analysis to Investigate the Spatial Distribution of Gravitational Landforms along Rocky Coasts. Remote Sens. 2021 , 13 , 5012. [ Google Scholar ] [ CrossRef ]
  • Jaud, M.; Letortu, P.; Théry, C.; Grandjean, P.; Costa, S.; Maquaire, O.; Davidson, R.; Le Dantec, N. UAV Survey of a Coastal Cliff Face—Selection of the Best Imaging Angle. Measurement 2019 , 139 , 10–20. [ Google Scholar ] [ CrossRef ]
  • Bergillos, R.J.; Rodriguez-Delgado, C.; Medina, L.; Fernandez-Ruiz, J.; Rodriguez-Ortiz, J.M.; Iglesias, G. A Combined Approach to Cliff Characterization: Cliff Stability Index. Mar. Geol. 2022 , 444 , 106706. [ Google Scholar ] [ CrossRef ]
  • Lollino, P.; Pagliarulo, R.; Trizzino, R.; Santaloia, F.; Pisano, L.; Zumpano, V.; Perrotti, M.; Fazio, N.L. Multi-Scale Approach to Analyse the Evolution of Soft Rock Coastal Cliffs and Role of Controlling Factors: A Case Study in South-Eastern Italy. Geomat. Nat. Hazards Risk 2021 , 12 , 1058–1081. [ Google Scholar ] [ CrossRef ]
  • Earlie, C.S.; Masselink, G.; Russell, P.E.; Shail, R.K. Application of Airborne LiDAR to Investigate Rates of Recession in Rocky Coast Environments. J. Coast. Conserv. 2015 , 19 , 831–845. [ Google Scholar ] [ CrossRef ]
  • Torre, D.; Galve, J.P.; Reyes-Carmona, C.; Alfonso-Jorde, D.; Ballesteros, D.; Menichetti, M.; Piacentini, D.; Troiani, F.; Azañón, J.M. Geomorphological Assessment as Basic Complement of InSAR Analysis for Landslide Processes Understanding. Landslides 2024 , 21 , 1273–1292. [ Google Scholar ] [ CrossRef ]
  • Mantovani, M.; Devoto, S.; Forte, E.; Mocnik, A.; Pasuto, A.; Piacentini, D.; Soldati, M. A Multidisciplinary Approach for Rock Spreading and Block Sliding Investigation in the North-Western Coast of Malta. Landslides 2013 , 10 , 611–622. [ Google Scholar ] [ CrossRef ]
  • Di Luccio, D.; Aucelli, P.P.C.; Di Paola, G.; Pennetta, M.; Berti, M.; Budillon, G.; Florio, A.; Benassai, G. An Integrated Approach for Coastal Cliff Susceptibility: The Case Study of Procida Island (Southern Italy). Sci. Total Environ. 2023 , 855 , 158759. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Del Río, L.; Gracia, F.J. Erosion Risk Assessment of Active Coastal Cliffs in Temperate Environments. Geomorphology 2009 , 112 , 82–95. [ Google Scholar ] [ CrossRef ]
  • Anfuso, G.; Gracia, F.J.; Battocletti, G. Determination of Cliffed Coastline Sensitivity and Associated Risk for Human Structures: A Methodological Approach. J. Coast. Res. 2013 , 29 , 1292–1296. [ Google Scholar ] [ CrossRef ]
  • Tursi, M.F.; Anfuso, G.; Matano, F.; Mattei, G.; Aucelli, P.P.C. A Methodological Tool to Assess Erosion Susceptibility of High Coastal Sectors: Case Studies from Campania Region (Southern Italy). Water 2023 , 15 , 121. [ Google Scholar ] [ CrossRef ]
  • Hapke, C.; Plant, N. Predicting Coastal Cliff Erosion Using a Bayesian Probabilistic Model. Mar. Geol. 2010 , 278 , 140–149. [ Google Scholar ] [ CrossRef ]
  • Dickson, M.E.; Perry, G.L.W. Identifying the Controls on Coastal Cliff Landslides Using Machine-Learning Approaches. Environ. Model. Softw. 2016 , 76 , 117–127. [ Google Scholar ] [ CrossRef ]
  • He, L.; Coggan, J.; Francioni, M.; Eyre, M. Maximizing Impacts of Remote Sensing Surveys in Slope Stability—A Novel Method to Incorporate Discontinuities into Machine Learning Landslide Prediction. ISPRS Int. J. Geo-Inf. 2021 , 10 , 232. [ Google Scholar ] [ CrossRef ]
  • Himmelstoss, E.A.; Henderson, R.E.; Kratzmann, M.G.; Farris, A.S. Digital Shoreline Analysis System (DSAS) Version 5.0 User Guide ; Open-File Report 2018-1179; U.S. Geological Survey: Woods Hole, MA, USA, 2018. [ Google Scholar ]
  • Apostolopoulos, D.N.; Nikolakopoulos, K.G. Identifying Sandy Sites under Erosion Regime along the Prefecture of Achaia, Using Remote Sensing Techniques. J. Appl. Remote Sens. 2023 , 17 , 22206. [ Google Scholar ] [ CrossRef ]
  • Dey, M.; Jena, B.K. A Shoreline Change Detection (2012–2021) and Forecasting Using Digital Shoreline Analysis System (DSAS) Tool: A Case Study of Dahej Coast, Gulf of Khambhat, Gujarat, India. Indones. J. Geogr. 2021 , 53 , 295. [ Google Scholar ] [ CrossRef ]
  • Chrisben Sam, S.; Gurugnanam, B. Coastal Transgression and Regression from 1980 to 2020 and Shoreline Forecasting for 2030 and 2040, Using DSAS along the Southern Coastal Tip of Peninsular India. Geod. Geodyn. 2022 , 13 , 585–594. [ Google Scholar ] [ CrossRef ]
  • Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [ Google Scholar ] [ CrossRef ]
  • Breiman, L. Random Forests. Mach. Learn. 2001 , 45 , 5–32. [ Google Scholar ] [ CrossRef ]
  • Stead, D.; Coggan, J. Numerical Modeling of Rock-Slope Instability. In Landslides: Types, Mechanisms and Modeling ; Stead, D., Clague, J.J., Eds.; Cambridge University Press: Cambridge, UK, 2012; pp. 144–158. ISBN 9781107002067. [ Google Scholar ]
  • Cello, G.; Coppola, L. Modalità e Stili Deformativi Nell’area Anconetana. Stud. Geol. Camerti 1989 , XI , 37–48. [ Google Scholar ]
  • Coltorti, M.; Sarti, M. Note Illustrative Della Carta Geologica d’Italia Alla Scala 1:50.000 “Foglio 293—Osimo”. Progetto CARG: ISPRA, Servizio Geologico d’Italia. 2011. Available online: https://www.isprambiente.gov.it/Media/carg/293_OSIMO/Foglio.html (accessed on 14 July 2024).
  • Montanari, A.; Mainiero, M.; Coccioni, R.; Pignocchi, G. Catastrophic Landslide of Medieval Portonovo (Ancona, Italy). Geol. Soc. Am. Bull. 2016 , 128 , 1660–1678. [ Google Scholar ] [ CrossRef ]
  • Fullin, N.; Duo, E.; Fabbri, S.; Francioni, M.; Ghirotti, M.; Ciavola, P. Quantitative Characterization of Coastal Cliff Retreat and Landslide Processes at Portonovo—Trave Cliffs (Conero, Ancona, Italy) Using Multi-Source Remote Sensing Data. Remote Sens. 2023 , 15 , 4120. [ Google Scholar ] [ CrossRef ]
  • Hungr, O.; Leroueil, S.; Picarelli, L. The Varnes Classification of Landslide Types, an Update. Landslides 2014 , 11 , 167–194. [ Google Scholar ] [ CrossRef ]
  • Cruden, D.M.; Varnes, D.J. Landslide Types and Processes, Transportation Research Board, U.S. National Academy of Sciences, Special Report. Spec. Rep.-Natl. Res. Counc. Transp. Res. Board 1996 , 247 , 36–57. [ Google Scholar ]
  • Bisci, C.; Cantalamessa, G.; de Marco, R.; Spagnoli, F.; Tramontana, M. Caratteri Oceanografici Dell’Adriatico Centro-Settentrionale e Della Costa Marchigiana. Stud. Costieri 2021 , 30 , 7–12. [ Google Scholar ]
  • Acciarri, A.; Bisci, C.; Cantalamessa, G.; Cappucci, S.; Conti, M.; Di Pancrazio, G.; Spagnoli, F.; Valentini, E. Metrics for Short-Term Coastal Characterization, Protection and Planning Decisions of Sentina Natural Reserve, Italy. Ocean Coast. Manag. 2021 , 201 , 105472. [ Google Scholar ] [ CrossRef ]
  • Grottoli, E.; Bertoni, D.; Ciavola, P.; Pozzebon, A. Short Term Displacements of Marked Pebbles in the Swash Zone: Focus on Particle Shape and Size. Mar. Geol. 2015 , 367 , 143–158. [ Google Scholar ] [ CrossRef ]
  • Prémaillon, M.; Regard, V.; Dewez, T.J.B.; Auda, Y. GlobR2C2 (Global Recession Rates of Coastal Cliffs): A Global Relational Database to Investigate Coastal Rocky Cliff Erosion Rate Variations. Earth Surf. Dyn. 2018 , 6 , 651–668. [ Google Scholar ] [ CrossRef ]
  • Budetta, P.; Galietta, G.; Santo, A. A Methodology for the Study of the Relation between Coastal Cliff Erosion and the Mechanical Strength of Soils and Rock Masses. Eng. Geol. 2000 , 56 , 243–256. [ Google Scholar ] [ CrossRef ]
  • Barton, N. Suggested Methods for the Quantitative Description of Discontinuities in Rock Masses. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 1978 , 15 , 319–368. [ Google Scholar ]
  • Hoek, E.; Marinos, P.G.; Marinos, V.P. Characterisation and Engineering Properties of Tectonically Undisturbed but Lithologically Varied Sedimentary Rock Masses. Int. J. Rock Mech. Min. Sci. 2005 , 42 , 277–285. [ Google Scholar ] [ CrossRef ]
  • Marinos, P.V. New Proposed Gsi Classification Charts for Weak or Complex Rock Masses. Bull. Geol. Soc. Greece 2017 , 43 , 1248. [ Google Scholar ] [ CrossRef ]
  • Duo, E.; Fabbri, S.; Grottoli, E.; Ciavola, P. Uncertainty of Drone-Derived DEMs and Significance of Detected Morphodynamics in Artificially Scraped Dunes. Remote Sens. 2021 , 13 , 1823. [ Google Scholar ] [ CrossRef ]
  • Gindraux, S.; Boesch, R.; Farinotti, D. Accuracy Assessment of Digital Surface Models from Unmanned Aerial Vehicles’ Imagery on Glaciers. Remote Sens. 2017 , 9 , 186. [ Google Scholar ] [ CrossRef ]
  • Brunetta, R.; Duo, E.; Ciavola, P. Evaluating Short-Term Tidal Flat Evolution Through UAV Surveys: A Case Study in the Po Delta (Italy). Remote Sens. 2021 , 13 , 2322. [ Google Scholar ] [ CrossRef ]
  • Fabbri, S.; Grottoli, E.; Armaroli, C.; Ciavola, P. Using High-Spatial Resolution UAV-Derived Data to Evaluate Vegetation and Geomorphological Changes on a Dune Field Involved in a Restoration Endeavour. Remote Sens. 2021 , 13 , 1987. [ Google Scholar ] [ CrossRef ]
  • Talavera, L.; Benavente, J.; Del Río, L. UAS Identify and Monitor Unusual Small-Scale Rhythmic Features in the Bay of Cádiz (Spain). Remote Sens. 2021 , 13 , 1188. [ Google Scholar ] [ CrossRef ]
  • Brooks, S.M.; Spencer, T.; Boreham, S. Deriving Mechanisms and Thresholds for Cliff Retreat in Soft-Rock Cliffs under Changing Climates: Rapidly Retreating Cliffs of the Suffolk Coast, UK. Geomorphology 2012 , 153–154 , 48–60. [ Google Scholar ] [ CrossRef ]
  • Cenci, L.; Disperati, L.; Sousa, L.P.; Phillips, M.; Alves, F.L. Geomatics for Integrated Coastal Zone Management: Multitemporal Shoreline Analysis and Future Regional Perspective for the Portuguese Central Region. J. Coast. Res. 2013 , 65 , 1349–1354. [ Google Scholar ] [ CrossRef ]
  • Virdis, S.G.P.; Oggiano, G.; Disperati, L. A Geomatics Approach to Multitemporal Shoreline Analysis in Western Mediterranean: The Case of Platamona-Maritza Beach (Northwest Sardinia, Italy). J. Coast. Res. 2012 , 28 , 624–640. [ Google Scholar ] [ CrossRef ]
  • Buchanan, D.H.; Naylor, L.A.; Hurst, M.D.; Stephenson, W.J. Erosion of Rocky Shore Platforms by Block Detachment from Layered Stratigraphy. Earth Surf. Process. Landforms 2020 , 45 , 1028–1037. [ Google Scholar ] [ CrossRef ]
  • Crowell, M.; Leatherman, S.P.; Buckley, M.K. Historical Shoreline Change: Error Analysis and Mapping Accuracy. J. Coast. Res. 1991 , 7 , 839–852. [ Google Scholar ]
  • Fletcher, C.; Rooney, J.; Barbee, M.; Lim, S.; Richmond, B. Mapping Shoreline Change Using Digital Orthophotogrammetry on Maui, Hawaii. J. Coast. Res. 2003 , 38 , 106–124. [ Google Scholar ]
  • Del Río, L.; Gracia, F.J. Error Determination in the Photogrammetric Assessment of Shoreline Changes. Nat. Hazards 2013 , 65 , 2385–2397. [ Google Scholar ] [ CrossRef ]
  • Bloom, C.K.; Singeisen, C.; Stahl, T.; Howell, A.; Massey, C. Earthquake Contributions to Coastal Cliff Retreat. Earth Surf. Dyn. 2023 , 11 , 757–778. [ Google Scholar ] [ CrossRef ]
  • Trenhaile, A.S. Cliffs and Rock Coasts ; Elsevier Inc.: Amsterdam, The Netherlands, 2012; Volume 3, ISBN 9780080878850. [ Google Scholar ]
  • Wolters, G.; Müller, G. Effect of Cliff Shape on Internal Stresses and Rock Slope Stability. J. Coast. Res. 2008 , 24 , 43–50. [ Google Scholar ] [ CrossRef ]
  • Sunamura, T. Geomorphology of Rocky Coasts ; Coastal Morphology and Research; J. Wiley: Chichester, UK; New York, NY, USA, 1992; ISBN 0471917753. [ Google Scholar ]
  • Everts, C.H. Seacliff Retreat and Coarse Sediment Yields in Southern California. In Proceedings of the Coastal Sediments ’91 (American Society Civil Engineering), Seattle, WA, USA, 25–27 June 1991; pp. 1586–1598. [ Google Scholar ]
  • Stockdon, H.F.; Holman, R.A.; Howd, P.A.; Sallenger, A.H. Empirical Parameterization of Setup, Swash, and Runup. Coast. Eng. 2006 , 53 , 573–588. [ Google Scholar ] [ CrossRef ]
  • Goda, Y. Random Seas and Design of Maritime Structures ; World Scientific: Singapore, 2010; Volume 33, ISBN 978-981-4282-39-0. [ Google Scholar ]
  • Emery, K.O.; Kuhn, G.G. Sea Cliffs: Their Processes, Profiles, and Classification. GSA Bull. 1982 , 93 , 644–654. [ Google Scholar ] [ CrossRef ]
  • Ho, T.K. Random Decision Forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282. [ Google Scholar ]
  • Azar, A.T.; Elshazly, H.I.; Hassanien, A.E.; Elkorany, A.M. A Random Forest Classifier for Lymph Diseases. Comput. Methods Programs Biomed. 2014 , 113 , 465–473. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Quinlan, J.R. Chapter 2—Constructing Decision Trees ; Morgan Kaufmann: San Francisco, CA, USA, 1993; pp. 17–26. ISBN 978-0-08-050058-4. [ Google Scholar ]
  • Chen, X.; Huang, L.; Xie, D.; Zhao, Q. EGBMMDA: Extreme Gradient Boosting Machine for MiRNA-Disease Association Prediction. Cell Death Dis. 2018 , 9 , 3. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [ Google Scholar ] [ CrossRef ]
  • Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001 , 29 , 1189–1232. [ Google Scholar ] [ CrossRef ]
  • Ruder, S. An Overview of Gradient Descent Optimization Algorithms. arXiv 2016 , arXiv:1609.04747. [ Google Scholar ]
  • Guyon, I. A Scaling Law for the Validation-Set Training-Set Size Ratio ; AT&T Bell Laboratories: Murray Hill, NJ, USA, 1997; Volume 1. [ Google Scholar ]
  • Bej, S.; Davtyan, N.; Wolfien, M.; Nassar, M.; Wolkenhauer, O. LoRAS: An Oversampling Approach for Imbalanced Datasets. Mach. Learn. 2021 , 110 , 279–301. [ Google Scholar ] [ CrossRef ]
  • Liashchynskyi, P.; Liashchynskyi, P. Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS. arXiv 2019 , arXiv:1912.06059. [ Google Scholar ]
  • Morgenstern, N.U.; Price, V.E. The Analysis of the Stability of General Slip Surfaces. Geotechnique 1965 , 15 , 79–93. [ Google Scholar ] [ CrossRef ]
  • Cai, M.; Kaiser, P.K. Obtaining Modeling Parameters for Engineering Design by Rock Mass Characterization. In Proceedings of the 11th ISRM Congress, Lisbon, Portugal, 9–13 July 2007; pp. 381–384. [ Google Scholar ]
  • Brown, E. Estimating the Mechanical Properties of Rock Masses. In SHIRMS 2008, Proceedings of the First Southern Hemisphere International Rock Mechanics Symposium, Perth, Australia, 16–19 September 2008 ; Australian Centre for Geomechanics: Crawley, Australia, 2008; pp. 3–22. [ Google Scholar ] [ CrossRef ]
  • Delle Rose, M.; Parise, M. Speleogenesi e Geomorfologia Del Sistema Carsico Delle Grotte Della Poesia Nell’ambito Dell’evoluzione Quaternaria Della Costa Adriatica Salentina. Atti Mem. Comm. Grotte E. Boegan. 2005 , 40 , 153–173. [ Google Scholar ]
  • Miccadei, E.; Mascioli, F.; Ricci, F.; Piacentini, T. Geomorphology of Soft Clastic Rock Coasts in the Mid-Western Adriatic Sea (Abruzzo, Italy). Geomorphology 2019 , 324 , 72–94. [ Google Scholar ] [ CrossRef ]
  • Colantoni, P.; Mencucci, D.; Nesci, O. Coastal Processes and Cliff Recession between Gabicce and Pesaro (Northern Adriatic Sea): A Case History. Geomorphology 2004 , 62 , 257–268. [ Google Scholar ] [ CrossRef ]
  • Sunamura, T. A Relationship between Wave-Induced Cliff Erosion and Erosive Force of Waves. J. Geol. 1977 , 85 , 613–618. [ Google Scholar ] [ CrossRef ]
  • Trenhaile, A.S. Hard-Rock Coastal Modelling: Past Practice and Future Prospects in a Changing World. J. Mar. Sci. Eng. 2019 , 7 , 34. [ Google Scholar ] [ CrossRef ]
  • Trenhaile, A.S. The Effect of Holocene Changes in Relative Sea Level on the Morphology of Rocky Coasts. Geomorphology 2010 , 114 , 30–41. [ Google Scholar ] [ CrossRef ]
  • Hurst, M.D.; Rood, D.H.; Ellis, M.A.; Anderson, R.S.; Dornbusch, U. Recent Acceleration in Coastal Cliff Retreat Rates on the South Coast of Great Britain. Proc. Natl. Acad. Sci. USA 2016 , 113 , 13336–13341. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Limber, P.W.; Barnard, P.L.; Vitousek, S.; Erikson, L.H. A Model Ensemble for Projecting Multidecadal Coastal Cliff Retreat during the 21st Century. J. Geophys. Res. Earth Surf. 2018 , 123 , 1566–1589. [ Google Scholar ] [ CrossRef ]
  • Fraccaroli, M.; Mazzuchelli, G.; Bizzarri, A. Machine Learning Techniques for Extracting Relevant Features from Clinical Data for COVID-19 Mortality Prediction. In Proceedings of the 2021 IEEE Symposium on Computers and Communications (ISCC), Athens, Greece, 5–8 September 2021; pp. 1–7. [ Google Scholar ]
  • Fadja, A.N.; Fraccaroli, M.; Bizzarri, A.; Mazzuchelli, G.; Lamma, E. Neural-Symbolic Ensemble Learning for Early-Stage Prediction of Critical State of Covid-19 Patients. Med. Biol. Eng. Comput. 2022 , 60 , 3461–3474. [ Google Scholar ] [ CrossRef ]
  • Zeng, T.; Liang, Y.; Dai, Q.; Tian, J.; Chen, J.; Lei, B.; Yang, Z.; Cai, Z. Application of Machine Learning Algorithms to Screen Potential Biomarkers under Cadmium Exposure Based on Human Urine Metabolic Profiles. Chin. Chem. Lett. 2022 , 33 , 5184–5188. [ Google Scholar ] [ CrossRef ]
  • Parsa, M. A Data Augmentation Approach to XGboost-Based Mineral Potential Mapping: An Example of Carbonate-Hosted ZnPb Mineral Systems of Western Iran. J. Geochem. Explor. 2021 , 228 , 106811. [ Google Scholar ] [ CrossRef ]
  • Loggenberg, K.; Strever, A.; Greyling, B.; Poona, N. Modelling Water Stress in a Shiraz Vineyard Using Hyperspectral Imaging and Machine Learning. Remote Sens. 2018 , 10 , 202. [ Google Scholar ] [ CrossRef ]
  • Kardani, N.; Zhou, A.; Nazem, M.; Lin, X. Modelling of Municipal Solid Waste Gasification Using an Optimised Ensemble Soft Computing Model. Fuel 2021 , 289 , 119903. [ Google Scholar ] [ CrossRef ]
  • Ogawa, H.; Dickson, M.E.; Kench, P.S. Generalised Observations of Wave Characteristics on Near-Horizontal Shore Platforms: Synthesis of Six Case Studies from the North Island, New Zealand. N. Z. Geog. 2016 , 72 , 107–121. [ Google Scholar ] [ CrossRef ]
  • Booij, N.; Ris, R.C.; Holthuijsen, L.H. A Third-Generation Wave Model for Coastal Regions 1. Model Description and Validation. J. Geophys. Res. Ocean. 1999 , 104 , 7649–7666. [ Google Scholar ] [ CrossRef ]
  • Barton, N.; Shen, B.; Bar, N. Limited Heights of Vertical Cliffs and Mountain Walls Linked to Fracturing in Deep Tunnels—Q-Slope Application If Jointed Slopes. In Proceedings of the ISRM VIII Brazilian Symposium on Rock Mechanics SBMR 2018, Salvador, Brasil, 28 August–1 September 2018. [ Google Scholar ]
  • Barton, N.; Shen, B. Extension Strain and Rock Strength Limits for Deep Tunnels, Cliffs, Mountain Walls and the Highest Mountains. Rock Mech. Rock Eng. 2018 , 51 , 3945–3962. [ Google Scholar ] [ CrossRef ]
  • Quinn, J.D.; Rosser, N.J.; Murphy, W.; Lawrence, J.A. Identifying the Behavioural Characteristics of Clay Cliffs Using Intensive Monitoring and Geotechnical Numerical Modelling. Geomorphology 2010 , 120 , 107–122. [ Google Scholar ] [ CrossRef ]
  • Styles, T.D.; Coggan, J.S.; Pine, R.J. Back Analysis of the Joss Bay Chalk Cliff Failure Using Numerical Modelling. Eng. Geol. 2011 , 120 , 81–90. [ Google Scholar ] [ CrossRef ]


Lithology (transect range): γ (kN/m³), GSI, mi, D, UCS (MPa)

  • Argille Azzurre Fm. (228–265): γ 24, GSI 35, mi 7, D 0, UCS 25
  • Faulted rocks (266–276): γ 23, GSI 25, mi 4, D 0.6, UCS 20
  • Argille Azzurre Fm. (277–290): γ 24, GSI 35, mi 7, D 0, UCS 25
  • Argille Azzurre Fm. (291–310): γ 24, GSI 45, mi 7, D 0, UCS 30
  • Orizzonte del Trave: γ 25, GSI 50, mi 17, D 0, UCS 50
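The columns above (GSI, mi, disturbance factor D, and intact UCS) are the usual inputs of the generalized Hoek-Brown criterion used to derive rock-mass strength for limit equilibrium models. The excerpt does not spell out that conversion, so the following is only a sketch assuming the standard Hoek et al. (2002) relations, applied to the Argille Azzurre values as reconstructed above.

```python
# Sketch of the standard Hoek et al. (2002) relations (an assumption here,
# not a step quoted from the paper): rock-mass parameters mb, s and a
# are derived from GSI, the intact-rock constant mi and the disturbance D.
import math

def hoek_brown_params(gsi, mi, D):
    mb = mi * math.exp((gsi - 100.0) / (28.0 - 14.0 * D))
    s = math.exp((gsi - 100.0) / (9.0 - 3.0 * D))
    a = 0.5 + (math.exp(-gsi / 15.0) - math.exp(-20.0 / 3.0)) / 6.0
    return mb, s, a

# Example with the Argille Azzurre Fm. row (GSI = 35, mi = 7, D = 0).
mb, s, a = hoek_brown_params(gsi=35, mi=7, D=0)
print(f"mb = {mb:.3f}, s = {s:.2e}, a = {a:.3f}")
```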
Sector (transects): GSI, UCS (MPa)

  • Portonovo (1–79): GSI 50, UCS 35
  • Mezzavalle (80–184): GSI 0, UCS 5
  • Mezzavalle (185–215): GSI 0, UCS 1
  • Trave (216–227): GSI 45, UCS 30
  • Trave (228–265): GSI 35, UCS 25
  • Trave (266–276): GSI 25, UCS 20
  • Trave (277–290): GSI 35, UCS 25
  • Trave (291–310): GSI 45, UCS 30
Change statistics by sector for the period 1978–2022:

  • Portonovo: EPR −0.24 m/yr, ECI 0.09 m
  • Mezzavalle: EPR −0.09 m/yr, ECI 0.09 m
  • Trave: EPR −0.25 m/yr, ECI 0.09 m
Section: FS, cliff height (m)

  • Section 1: FS 2.04, height 37 m
  • Section 2: FS 1.86, height 42 m
  • Section 3: FS 1.52, height 50 m
  • Section 4: FS 1.12, height 73 m
  • Section 5: FS 1.21, height 80 m
  • Section 6: FS 0.91, height 120 m
  • Section 7: FS 1.14, height 97 m
  • Section 8: FS 0.98, height 60 m
  • Section 9: FS 2.26, height 50 m
  • Section 10: FS 4.37, height 25 m
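As a quick check (not an analysis from the paper) of the relationship between FS and cliff height summarised in the conclusions, the ten section values above can be correlated directly:

```python
# Correlating the FS and cliff-height values from the table above.
import numpy as np

height_m = np.array([37, 42, 50, 73, 80, 120, 97, 60, 50, 25])
fs       = np.array([2.04, 1.86, 1.52, 1.12, 1.21, 0.91, 1.14, 0.98, 2.26, 4.37])

r = np.corrcoef(height_m, fs)[0, 1]
print(f"Pearson r between cliff height and FS: {r:.2f}")  # roughly -0.7
```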

Fullin, N.; Fraccaroli, M.; Francioni, M.; Fabbri, S.; Ballaera, A.; Ciavola, P.; Ghirotti, M. Detection of Cliff Top Erosion Drivers through Machine Learning Algorithms between Portonovo and Trave Cliffs (Ancona, Italy). Remote Sens. 2024 , 16 , 2604. https://doi.org/10.3390/rs16142604


Theory and Practice in Language Studies

Rhetorical Move-Step Analysis of Argumentative Essays by Chinese EFL Undergraduate Students

  • Hongjian Liu, Universiti Putra Malaysia
  • Lilliati Ismail, Universiti Putra Malaysia
  • Norhakimah Khaiessa Ahmad, Universiti Putra Malaysia

Rhetorical move-step analysis, an analytical approach within discourse analysis, is commonly employed to scrutinize the rhetorical structures inherent in various community genre practices. This method has also been extensively applied in academic and professional writing, particularly in published research articles and doctoral dissertations. However, little research has investigated the rhetorical move-step structures evident in argumentative writing by Chinese undergraduate students. Therefore, this study explores the rhetorical move-step structure of argumentative essays in Chinese EFL contexts. A corpus comprising 30 argumentative essays authored by undergraduate students at a Chinese university was assembled for analysis. The move-step structure of the data was analyzed using Hyland’s (1990) analytical framework. The results indicated that most students utilized Hyland’s model in crafting their argumentative essays. Additionally, the findings revealed that the argumentative essays by Chinese undergraduates adhered to a structure consisting of five obligatory moves, three conventional moves, one optional move, and multiple obligatory, conventional, and optional steps beyond the established analytical framework. These findings’ implications extend to pedagogical practices and further research in the domain of EFL students’ academic writing.
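For readers who want to reproduce this kind of move tally, the sketch below shows one common way moves are classified as obligatory, conventional, or optional by how often they occur across a coded corpus (the 100% and 60% cut-offs follow a convention widely used in move analysis and are not necessarily the thresholds applied in this study); the essays and move labels are hypothetical.

```python
# Hypothetical move tally: classify each rhetorical move by the share of
# essays in which it occurs (100% = obligatory, >=60% = conventional,
# otherwise optional). Essay codings below are invented for illustration.
from collections import Counter

coded_essays = [
    {"Thesis", "Argument", "Conclusion"},
    {"Thesis", "Argument", "Conclusion", "Restatement"},
    {"Thesis", "Argument", "Conclusion"},
    {"Thesis", "Argument", "Marker"},
]

counts = Counter(move for essay in coded_essays for move in essay)
n_essays = len(coded_essays)
for move, count in counts.most_common():
    share = count / n_essays
    status = ("obligatory" if share == 1.0
              else "conventional" if share >= 0.6
              else "optional")
    print(f"{move:<12} {share:>4.0%}  {status}")
```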

Author Biographies

Hongjian Liu, Universiti Putra Malaysia

Faculty of Educational Studies

Lilliati Ismail, Universiti Putra Malaysia

Norhakimah Khaiessa Ahmad, Universiti Putra Malaysia

Ädel, A. (2023). Adopting a move rather than a marker approach to metadiscourse: A taxonomy for spoken student presentations. English for Specific Purposes, 69, 4–18. https://doi.org/10.1016/j.esp.2022.09.001

Alharbi, N. (2023). A rhetorical structural analysis of introductions in L2 Saudi students’ argumentative essays. Journal of Educational and Social Research, 13(4), 94. https://doi.org/10.36941/jesr-2023-0093

Alyousef, H. S. (2021). Structure of research article abstracts in political science: A genre-based study. SAGE Open, 11(3), 1–8. https://doi.org/10.1177/21582440211040797

Basturkmen, H. (2012). A genre-based investigation of discussion sections of research articles in Dentistry and disciplinary variation. Journal of English for Academic Purposes, 11(2), 134–144. https://doi.org/10.1016/j.jeap.2011.10.004

Bhatia, V. (1993). Analyzing genre: Language use in professional settings. Longman.

Biber, D., Connor, U., & Upton, T. A. (2007). Discourse on the move: Using corpus analysis to describe discourse structure. John Benjamins.

Chang, T. (2023). A move analysis of Chinese L2 student essays from the sociocognitive perspective: Genres, languages, and writing quality. Assessing Writing, 57, 100750. https://doi.org/10.1016/j.asw.2023.100750

Cheng, A. (2015). Genre analysis as a pre-instructional, instructional, and teacher development framework. Journal of English for Academic Purposes, 19, 125–136. https://doi.org/10.1016/j.jeap.2015.04.004

Cotos, E., Huffman, S., & Link, S. (2017). A move/step model for methods sections: Demonstrating rigor and credibility. English for Specific Purposes, 46, 90–106. https://doi.org/10.1016/j.esp.2017.01.001

Dong, J., & Lu, X. (2020). Promoting discipline-specific genre competence with corpus-based genre analysis activities. English for Specific Purposes, 58, 138–154. https://doi.org/10.1016/j.esp.2020.01.005

Friginal, E., & Mustafa, S. S. (2017). A comparison of U.S.-based and Iraqi English research article abstracts using corpora. Journal of English for Academic Purposes, 25, 45–57. https://doi.org/10.1016/j.jeap.2016.11.004

Hu, G. & Liu, Y. (2018). Three minute thesis presentations: A cross-disciplinary study of genre moves. Journal of English for Academic Purposes, 35, 16–30. https://doi.org/10.1016/j.jeap.2018.06.004

Hyland, K. (1990). A genre description of the argumentative essay. RELC Journal, 21(1), 66–78. https://doi.org/10.1177/003368829002100105

Hyland, K. (2004). Graduates’ gratitude: The generic structure of dissertation acknowledgements. English for Specific Purposes, 23(3), 303–324. https://doi.org/10.1016/S0889-4906(03)00051-6

Kanestion, A., & Sarjit, M. K. S. (2021). A corpus-based investigation of moves in the argument stage of argumentative essays. EPRA International Journal of Multidisciplinary Research (IJMR), 7(9), 199–204. https://doi.org/10.36713/epra8475

Kanoksilapatham, B. (2007). Rhetorical moves in biochemistry research articles. In D. Biber, U. Connor, & T. A. Upton (Eds.), Discourse on the move: Using corpus analysis to describe discourse structure (pp. 73-120). John Benjamins. https://doi.org/10.1075/scl.28.06kan

Kanoksilapatham, B. (2015). Distinguishing textual features characterizing structural variation in research articles across three engineering sub-discipline corpora. English for Specific Purposes, 37, 74–86. http://dx.doi.org/10.1016/j.esp.2014.06.008

Kessler, M. (2020). A text analysis and gatekeepers’ perspectives of a promotional genre: Understanding the rhetoric of Fulbright grant statements. English for Specific Purposes, 60, 182–192. https://doi.org/10.1016/j.esp.2020.07.003

Kessler, M. & Polio, C. (2023). Conducting genre-based research in applied linguistics: A methodological guide. Routledge.

Khany, R., & Malmir, B. (2020). A move-marker list: A study of rhetorical move-lexis linguistic realizations of research article abstracts in social and behavioral sciences. RELC Journal, 51(3), 381–396. https://doi.org/10.1177/0033688219833131

Lim, J. M. (2006). Method sections of management research articles: A pedagogically motivated qualitative study. English for Specific Purposes, 25, 282–309. https://doi.org/10.1016/j.esp.2005.07.001

Liu, D. (2015). Moves and wrap-up sentences in Chinese students’ essay conclusions. SAGE Open, 5(2), 1–9. http://dx.doi.org/10.1177/2158244015592681

Loi, C. K. (2010). Research article’s introductions in Chinese and English: A comparative genre-based study. Journal of English for Academic Purposes, 9(4), 267–279. https://doi.org/10.1016/j.jeap.2010.09.004

Neupane Bastola, M., & Ho, V. (2023). Rhetorical structure of literature review chapters in Nepalese PhD dissertations: Students’ engagement with previous scholarship. Journal of English for Academic Purposes, 65, 101271. https://doi.org/10.1016/j.jeap.2023.101271

Parkinson, J. (2017). The student laboratory report genre: A genre analysis. English for Specific Purposes, 45, 1–13. https://doi.org/10.1016/j.esp.2021.03.006

Park, S., Jeon, J., & Shim, E. (2021). Exploring request emails in English for business purposes: A move analysis. English for Specific Purposes, 63, 137–150. https://doi.org/10.1016/j.esp.2021.03.006

Rau, G., & Shih, Y. (2021). Evaluation of Cohen’s kappa and other measures of inter-rater agreement for genre analysis and other nominal data. Journal of English for Academic Purposes, 53, 101026. https://doi.org/10.1016/j.jeap.2021.101026

Schneer, D. (2014). Rethinking the argumentative essay. TESOL Journal, 5, 619–653. https://doi.org/10.1002/tesj.123

Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.

Swales, J. M. (2004). Research genres: Exploration and applications. Cambridge University Press.

Tardy, C. M., Caplan, N. A., & Johns. A. M. (2023). Genre explained: Frequently asked questions and answers about genre-based instruction. University of Michigan Press.

Toulmin, S. (2003). The uses of argument. Cambridge University Press.

Van Herck, R., Decock, S., & Fastrich, B. (2022). A unique blend of interpersonal and transactional strategies in English email responses to customer complaints in a B2C setting: A move analysis. English for Specific Purposes, 65, 30–48. https://doi.org/10.1016/j.esp.2021.08.001

Wang, Y. (2023). Demystifying academic promotional genre: A rhetorical move-step analysis of Teaching Philosophy Statements (TPSs). Journal of English for Academic Purposes, 65, 101284. https://doi.org/10.1016/j.jeap.2023.101284

Wingate, U. (2012). ‘Argument!’ Helping students understand what essay writing is about. Journal of English for Academic Purposes, 11, 145–154. https://doi.org/10.1016/j.jeap.2011.11.001

Wood, N. V. (2001). Perspectives on argument. Prentice Hall.

Yang, R., & Allison, D. (2003). Research articles in applied linguistics: Moving from results to conclusions. English for Specific Purposes, 22(4), 365–385. https://doi.org/10.1016/S0889-4906(02)00026-1

Ye, Y. (2019). Macrostructures and rhetorical moves in energy engineering research articles written by Chinese expert writers. Journal of English for Academic Purposes, 38, 48–61. https://doi.org/10.1016/j.jeap.2019.01.007

Zhang, Y., & Cui, J. (2023). The relationship between syntactic complexity and rhetorical stages in L2 learners’ texts: A comparative analysis. English for Specific Purposes, 72, 51–64. https://doi.org/10.1016/j.esp.2023.08.003


