
Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions

Review Article
Published: 18 August 2021
Volume 2, article number 420 (2021)


Iqbal H. Sarker (affiliations 1, 2), ORCID: orcid.org/0000-0003-1740-5517


Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is nowadays considered a core technology of today’s Fourth Industrial Revolution (4IR or Industry 4.0). Owing to its ability to learn from data, DL technology, which originated from artificial neural networks (ANN), has become a hot topic in computing and is widely applied in various application areas such as healthcare, visual recognition, text analytics, cybersecurity, and many more. However, building an appropriate DL model is a challenging task due to the dynamic nature and variations of real-world problems and data. Moreover, the lack of core understanding turns DL methods into black-box machines that hamper development at the standard level. This article presents a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of real-world tasks, such as supervised or unsupervised tasks. In our taxonomy, we take into account deep networks for supervised or discriminative learning, unsupervised or generative learning, as well as hybrid learning and relevant others. We also summarize real-world application areas where deep learning techniques can be used. Finally, we point out ten potential aspects for future-generation DL modeling with research directions. Overall, this article aims to draw a big picture of DL modeling that can be used as a reference guide for both academia and industry professionals.


Introduction

In the late 1980s, neural networks became a prevalent topic in the area of Machine Learning (ML) as well as Artificial Intelligence (AI), due to the invention of various efficient learning methods and network structures [ 52 ]. Multilayer perceptron networks trained by “Backpropagation”-type algorithms, self-organizing maps, and radial basis function networks were among these innovative methods [ 26 , 36 , 37 ]. Although neural networks were successfully used in many applications, interest in researching this topic later decreased. Then, in 2006, “Deep Learning” (DL) was introduced by Hinton et al. [ 41 ], based on the concept of the artificial neural network (ANN). Deep learning subsequently became a prominent topic, resulting in a rebirth of neural network research; hence, it is sometimes referred to as “new-generation neural networks”. This is because deep networks, when properly trained, have produced significant success in a variety of classification and regression challenges [ 52 ].

Nowadays, DL technology is considered one of the hot topics within machine learning, artificial intelligence, and data science and analytics, due to its ability to learn from the given data. Many corporations, including Google, Microsoft, and Nokia, study it actively, as it can provide significant results in different classification and regression problems and datasets [ 52 ]. In terms of working domain, DL is considered a subset of ML and AI, and thus DL can be seen as an AI function that mimics the human brain’s processing of data. The worldwide popularity of “Deep learning” is increasing day by day, as shown in our earlier paper [ 96 ] based on historical data collected from Google Trends [ 33 ]. Deep learning differs from standard machine learning in terms of efficiency as the volume of data increases, as discussed briefly in Section “Why Deep Learning in Today's Research and Applications?”. DL technology uses multiple layers to represent abstractions of data to build computational models. While deep learning takes a long time to train a model due to its large number of parameters, it takes a short amount of time to run during testing compared with other machine learning algorithms [ 127 ].

While today’s Fourth Industrial Revolution (4IR or Industry 4.0) typically focuses on technology-driven “automation, smart and intelligent systems”, DL technology, which originated from ANNs, has become one of the core technologies for achieving this goal [ 103 , 114 ]. A typical neural network is mainly composed of many simple, connected processing elements called neurons, each of which generates a series of real-valued activations for the target outcome. Figure 1 shows a schematic representation of the mathematical model of an artificial neuron, i.e., a processing element, highlighting the input (\(X_i\)), weight (w), bias (b), summation function (\(\sum\)), activation function (f), and corresponding output signal (y). Neural network-based DL technology is now widely applied in many fields and research areas such as healthcare, sentiment analysis, natural language processing, visual recognition, business intelligence, cybersecurity, and many more, which are summarized in the latter part of this paper.

Schematic representation of the mathematical model of an artificial neuron (processing element), highlighting input ( \(X_i\) ), weight ( w ), bias ( b ), summation function ( \(\sum\) ), activation function ( f ) and output signal ( y )
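As a minimal sketch of the neuron model in Fig. 1, the following Python code computes \(y = f(\sum_i w_i X_i + b)\). It assumes one weight \(w_i\) per input (the figure labels the weights simply as w), and the function names `neuron`, `sigmoid`, and `relu` are hypothetical choices for illustration, not an API from the paper.

```python
import math

# Illustrative sketch: function names and example values are hypothetical.


def sigmoid(z: float) -> float:
    """Logistic activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectified linear activation: max(0, z)."""
    return max(0.0, z)


def neuron(x, w, b, f=sigmoid):
    """Single artificial neuron: y = f(sum_i w_i * x_i + b)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # summation function
    return f(z)                                        # activation function


# z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], b=0.0))  # prints 0.5
# Same weighted-sum machinery, different activation function
print(neuron([1.0, 2.0], [0.5, -0.25], b=0.2, f=math.tanh))
```

Only the last step depends on the choice of `f`; the weighted sum plus bias is identical for every activation.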

Although DL models are successfully applied in the various application areas mentioned above, building an appropriate deep learning model is a challenging task due to the dynamic nature and variations of real-world problems and data. Moreover, DL models are typically considered “black-box” machines, which hampers the standard development of deep learning research and applications. Thus, for a clearer understanding, in this paper we present a structured and comprehensive view of DL techniques considering the variations in real-world problems and tasks. To achieve our goal, we briefly discuss various DL techniques and present a taxonomy that takes into account three major categories: (i) deep networks for supervised or discriminative learning, which are used to provide a discriminative function in supervised deep learning or classification applications; (ii) deep networks for unsupervised or generative learning, which are used to characterize the high-order correlation properties or features for pattern analysis or synthesis, and thus can be used as preprocessing for supervised algorithms; and (iii) deep networks for hybrid learning, which integrate both supervised and unsupervised models, and relevant others. We take these categories into account based on the nature and learning capabilities of different DL techniques and how they are used to solve problems in real-world applications [ 97 ]. Moreover, one of the key targets of this study is identifying key research issues and prospects, including effective data representation, new algorithm design, data-driven hyper-parameter learning and model optimization, integrating domain knowledge, and adapting to resource-constrained devices, which can lead to “Future Generation DL-Modeling”. Thus, the goal of this paper is to serve as a reference guide for those in academia and industry who want to research and develop data-driven smart and intelligent systems based on DL techniques.

The overall contribution of this paper is summarized as follows:

This article focuses on different aspects of deep learning modeling, i.e., the learning capabilities of DL techniques in different dimensions, such as supervised or unsupervised tasks, to function in an automated and intelligent manner, which can serve as a core technology of today’s Fourth Industrial Revolution (Industry 4.0).

We explore a variety of prominent DL techniques and present a taxonomy that takes into account the variations in deep learning tasks and how they are used for different purposes. In our taxonomy, we divide the techniques into three major categories, namely deep networks for supervised or discriminative learning, unsupervised or generative learning, and hybrid learning, along with relevant others.

We have summarized several potential real-world application areas of deep learning, to assist developers as well as researchers in broadening their perspectives on DL techniques. Different categories of DL techniques highlighted in our taxonomy can be used to solve various issues accordingly.

Finally, we point out and discuss ten potential aspects with research directions for future generation DL modeling in terms of conducting future research and system development.

This paper is organized as follows. Section “Why Deep Learning in Today's Research and Applications?” motivates why deep learning is important for building data-driven intelligent systems. In Section “Deep Learning Techniques and Applications”, we present our DL taxonomy, taking into account the variations of deep learning tasks and how they are used to solve real-world issues, briefly discuss the techniques, and summarize their potential application areas. In Section “Research Directions and Future Aspects”, we discuss various research issues of deep learning-based modeling and highlight promising topics for future research within the scope of our study. Finally, Section “Concluding Remarks” concludes this paper.

Why Deep Learning in Today’s Research and Applications?

The main focus of today’s Fourth Industrial Revolution (Industry 4.0) is typically technology-driven automation and smart, intelligent systems in various application areas, including smart healthcare, business intelligence, smart cities, cybersecurity intelligence, and many more [ 95 ]. Deep learning approaches have grown dramatically in performance across a wide range of applications, including security technologies, and are particularly effective at uncovering complex structures in high-dimensional data. Thus, DL techniques can play a key role in building intelligent data-driven systems that meet today’s needs, because of their excellent ability to learn from historical data. Consequently, DL can change the world as well as humans’ everyday lives through its automation power and learning from experience. DL technology is therefore relevant to artificial intelligence [ 103 ], machine learning [ 97 ], and data science with advanced analytics [ 95 ], which are well-known areas in computer science, particularly in today’s intelligent computing. In the following, we first discuss the position of deep learning in AI, i.e., how DL technology is related to these areas of computing.

The Position of Deep Learning in AI

Nowadays, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are three popular terms that are sometimes used interchangeably to describe systems or software that behave intelligently. In Fig. 2, we illustrate the position of deep learning compared with machine learning and artificial intelligence. According to Fig. 2, DL is a part of ML as well as a part of the broad area of AI. In general, AI incorporates human behavior and intelligence into machines or systems [ 103 ], while ML is the method of learning from data or experience [ 97 ], which automates analytical model building. DL also represents methods of learning from data, where the computation is done through multi-layer neural networks and processing. The term “Deep” in deep learning refers to the multiple levels or stages through which data are processed to build a data-driven model.

An illustration of the position of deep learning (DL), comparing with machine learning (ML) and artificial intelligence (AI)

Thus, DL can be considered one of the core technologies of AI, a frontier for artificial intelligence, which can be used for building intelligent systems and automation. More importantly, it pushes AI to a new level, termed “Smarter AI”. As DL methods are capable of learning from data, deep learning is also strongly related to “Data Science” [ 95 ]. Typically, data science represents the entire process of finding meaning or insights in data in a particular problem domain, where DL methods can play a key role in advanced analytics and intelligent decision-making [ 104 , 106 ]. Overall, we can conclude that DL technology is capable of changing the current world, particularly as a powerful computational engine, contributing to technology-driven automation and smart, intelligent systems, and thus meeting the goals of Industry 4.0.

Understanding Various Forms of Data

As DL models learn from data, an in-depth understanding and representation of data are important to build a data-driven intelligent system in a particular application area. In the real world, data can be in various forms, which typically can be represented as below for deep learning modeling:

Sequential Data Sequential data is any kind of data where the order matters, i.e., a set of sequences. The model needs to explicitly account for the sequential nature of the input data. Text streams, audio fragments, video clips, and time-series data are some examples of sequential data.

Image or 2D Data A digital image is made up of a matrix, i.e., a rectangular 2D array of numbers arranged in rows and columns. Matrix, pixels, voxels, and bit depth are the four essential characteristics or fundamental parameters of a digital image.

Tabular Data A tabular dataset consists primarily of rows and columns. Thus tabular datasets contain data in a columnar format as in a database table. Each column (field) must have a name and each column may only contain data of the defined type. Overall, it is a logical and systematic arrangement of data in the form of rows and columns that are based on data properties or features. Deep learning models can learn efficiently on tabular data and allow us to build data-driven intelligent systems.

The data forms discussed above are common in real-world application areas of deep learning. Different categories of DL techniques perform differently depending on the nature and characteristics of the data, as discussed briefly in Section “Deep Learning Techniques and Applications” with a taxonomy. However, in many real-world application areas, standard machine learning techniques, particularly logic-rule or tree-based techniques [ 93 , 101 ], can perform well, depending on the nature of the application. Figure 3 also shows a performance comparison of DL and ML modeling with respect to the amount of data. In the following, we highlight several cases where deep learning is useful for solving real-world problems, according to the main focus of this paper.

DL Properties and Dependencies

A DL model typically follows the same processing stages as machine learning modeling. In Fig. 4, we show a deep learning workflow for solving real-world problems, which consists of three processing steps: data understanding and preprocessing, DL model building and training, and validation and interpretation. However, unlike in ML modeling [ 98 , 108 ], feature extraction in a DL model is automated rather than manual. K-nearest neighbor, support vector machines, decision trees, random forests, naive Bayes, linear regression, association rules, and k-means clustering are some examples of machine learning techniques that are commonly used in various application areas [ 97 ]. On the other hand, DL models include convolutional neural networks, recurrent neural networks, autoencoders, deep belief networks, and many more, which are discussed briefly with their potential application areas in Section “Deep Learning Techniques and Applications”. In the following, we discuss the key properties and dependencies of DL techniques that need to be taken into account before starting work on DL modeling for real-world applications.

An illustration of the performance comparison between deep learning (DL) and other machine learning (ML) algorithms, where DL modeling from large amounts of data can increase the performance

Data Dependencies Deep learning typically depends on a large amount of data to build a data-driven model for a particular problem domain, because deep learning algorithms often perform poorly when the data volume is small [ 64 ]. In such circumstances, standard machine learning algorithms can perform better, particularly when explicitly specified rules are used [ 64 , 107 ].

Hardware Dependencies DL algorithms require a large number of computational operations when training a model on large datasets. Because the advantage of a GPU over a CPU grows with the amount of computation, GPUs are mostly used to perform these operations efficiently. Thus, GPU hardware is generally necessary for deep learning training to work properly, and DL relies more on high-performance machines with GPUs than standard machine learning methods do [ 19 , 127 ].

Feature Engineering Process Feature engineering is the process of extracting features (characteristics, properties, and attributes) from raw data using domain knowledge. A fundamental distinction between DL and other machine-learning techniques is the attempt to extract high-level characteristics directly from data [ 22 , 97 ]. Thus, DL decreases the time and effort required to construct a feature extractor for each problem.

Model Training and Execution time In general, training a deep learning algorithm takes a long time due to the large number of parameters in the model. For instance, DL models can take more than one week to complete a training session, whereas training ML algorithms takes relatively little time, only seconds to hours [ 107 , 127 ]. During testing, however, deep learning algorithms take very little time to run compared with certain machine learning methods [ 127 ].

Black-box Perception and Interpretability Interpretability is an important factor when comparing DL with ML. It is difficult to explain how a deep learning result was obtained, i.e., the model is a “black box”. On the other hand, machine learning algorithms, particularly rule-based machine learning techniques [ 97 ], provide explicit logic rules (IF-THEN) for making decisions that are easily interpretable by humans. For instance, in our earlier works, we presented several rule-based machine learning techniques [ 100 , 102 , 105 ], where the extracted rules are human-understandable and easy to interpret, update, or delete according to the target applications.

The most significant distinction between deep learning and regular machine learning is how well deep learning performs as data grows exponentially. A performance comparison between DL and standard ML algorithms is shown in Fig. 3, where DL modeling can increase performance with the amount of data. Thus, DL modeling is extremely useful when dealing with large amounts of data because of its capacity to process vast numbers of features to build an effective data-driven model. Developing and training DL models relies on parallelized matrix and tensor operations as well as computing gradients and optimization. Several DL libraries and resources [ 30 ], such as PyTorch [ 82 ] (with a high-level API called Lightning) and TensorFlow [ 1 ] (which also offers Keras as a high-level API), offer these core utilities, including many pre-trained models, as well as many other functions necessary for implementation and DL model building.

A typical DL workflow to solve real-world problems, which consists of three sequential stages (i) data understanding and preprocessing (ii) DL model building and training (iii) validation and interpretation

Deep Learning Techniques and Applications

In this section, we go through the various types of deep neural network techniques, which typically consider several layers of information-processing stages in hierarchical structures to learn. A typical deep neural network contains multiple hidden layers in addition to the input and output layers. Figure 5 shows the general structure of a deep neural network (\(hidden \; layer=N\), \(N \ge 2\)) compared with a shallow network (\(hidden \; layer=1\)). In this section, we also present our taxonomy of DL techniques based on how they are used to solve various problems. However, before exploring the details of DL techniques, it is useful to review the various types of learning tasks: (i) Supervised: a task-driven approach that uses labeled training data; (ii) Unsupervised: a data-driven process that analyzes unlabeled datasets; (iii) Semi-supervised: a hybridization of both supervised and unsupervised methods; and (iv) Reinforcement: an environment-driven approach, as discussed briefly in our earlier paper [ 97 ]. Thus, to present our taxonomy, we divide DL techniques broadly into three major categories: (i) deep networks for supervised or discriminative learning; (ii) deep networks for unsupervised or generative learning; and (iii) deep networks for hybrid learning combining both, and relevant others, as shown in Fig. 6. In the following, we briefly discuss each of these techniques that can be used to solve real-world problems in various application areas according to their learning capabilities.

A general architecture of (a) a shallow network with one hidden layer and (b) a deep neural network with multiple hidden layers

A taxonomy of DL techniques, broadly divided into three major categories: (i) deep networks for supervised or discriminative learning, (ii) deep networks for unsupervised or generative learning, and (iii) deep networks for hybrid learning and relevant others
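To make the distinction between the shallow and deep networks of Fig. 5 concrete, the following Python sketch builds fully connected networks from a list of layer sizes and runs a forward pass; a network with \(N \ge 2\) hidden layers is simply one with more entries in that list. The tanh hidden activation, linear output layer, uniform random initialization, layer sizes, and function names are all assumptions made for illustration.

```python
import math
import random

# Illustrative sketch: function names, layer sizes, and initialization are hypothetical.


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) per layer; weights[j][i] connects input i to unit j."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def forward(layers, x):
    """Forward pass: tanh on every hidden layer, linear output layer."""
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
             for row, b_j in zip(weights, biases)]
        is_output_layer = k == len(layers) - 1
        x = z if is_output_layer else [math.tanh(v) for v in z]
    return x


shallow = init_network([4, 8, 3])    # hidden layers = 1
deep = init_network([4, 8, 8, 3])    # hidden layers = N = 2
for name, net in [("shallow", shallow), ("deep", deep)]:
    print(name, "hidden layers:", len(net) - 1, "parameters:", count_parameters(net))
    print(forward(net, [0.1, 0.2, 0.3, 0.4]))
```

Adding one 8-unit hidden layer raises the parameter count from 67 to 139 in this example, which hints at why deeper networks need more data and computation.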

Deep Networks for Supervised or Discriminative Learning

This category of DL techniques is utilized to provide a discriminative function in supervised or classification applications. Discriminative deep architectures are typically designed to give discriminative power for pattern classification by describing the posterior distributions of classes conditioned on visible data [ 21 ]. Discriminative architectures mainly include Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN or ConvNet), Recurrent Neural Networks (RNN), along with their variants. In the following, we briefly discuss these techniques.

Multi-layer Perceptron (MLP)

Multi-layer Perceptron (MLP), a supervised learning approach [ 83 ], is a type of feedforward artificial neural network (ANN). It is also known as the foundational architecture of deep neural networks (DNN) or deep learning. A typical MLP is a fully connected network that consists of an input layer that receives input data, an output layer that makes a decision or prediction about the input signal, and one or more hidden layers between these two that are considered the network’s computational engine [ 36 , 103 ]. The output of an MLP network is determined using a variety of activation functions, also known as transfer functions, such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [ 83 , 96 ]. To train an MLP, the most extensively used algorithm is “Backpropagation” [ 36 ], a supervised learning technique that is also known as the most basic building block of a neural network. During the training process, various optimization approaches such as Stochastic Gradient Descent (SGD), Limited Memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) are applied. An MLP requires tuning of several hyperparameters, such as the number of hidden layers, neurons, and iterations, which can make solving a complicated model computationally expensive. However, through partial fit, an MLP offers the advantage of learning non-linear models in real time or online [ 83 ].
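The following Python sketch, written without any deep learning library, trains a small MLP on the XOR problem with backpropagation and plain per-example SGD. The architecture (one tanh hidden layer and a sigmoid output), the binary cross-entropy loss, the learning rate, the epoch count, and all names are assumptions for illustration; `grad_check_w1` compares one backpropagated gradient with a central finite-difference estimate as a sanity check.

```python
import math
import random

# Illustrative sketch: architecture, hyperparameters, and names are hypothetical.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """n_in -> n_hidden (tanh) -> 1 (sigmoid) network trained with backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * x_i for w, x_i in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y_hat = sigmoid(sum(w * h_j for w, h_j in zip(self.W2, h)) + self.b2)
        return h, y_hat

    def loss(self, x, y):
        """Binary cross-entropy for a single example."""
        _, y_hat = self.forward(x)
        eps = 1e-12
        return -(y * math.log(y_hat + eps) + (1.0 - y) * math.log(1.0 - y_hat + eps))

    def gradients(self, x, y):
        """Backpropagation: apply the chain rule from the output back to W1."""
        h, y_hat = self.forward(x)
        d_out = y_hat - y  # dL/dz_out for a sigmoid output with cross-entropy
        g_W2 = [d_out * h_j for h_j in h]
        # tanh'(z) = 1 - tanh(z)^2
        d_hidden = [d_out * w * (1.0 - h_j ** 2) for w, h_j in zip(self.W2, h)]
        g_W1 = [[d_j * x_i for x_i in x] for d_j in d_hidden]
        return g_W1, d_hidden, g_W2, d_out

    def sgd_step(self, x, y, lr):
        """One stochastic gradient descent update on a single example."""
        g_W1, g_b1, g_W2, g_b2 = self.gradients(x, y)
        for j, row in enumerate(self.W1):
            for i in range(len(row)):
                row[i] -= lr * g_W1[j][i]
            self.b1[j] -= lr * g_b1[j]
            self.W2[j] -= lr * g_W2[j]
        self.b2 -= lr * g_b2


def mean_loss(model, data):
    return sum(model.loss(x, y) for x, y in data) / len(data)


def grad_check_w1(model, x, y, j=0, i=0, h=1e-6):
    """|backprop gradient - central finite difference| for one hidden-layer weight."""
    analytic = model.gradients(x, y)[0][j][i]
    original = model.W1[j][i]
    model.W1[j][i] = original + h
    plus = model.loss(x, y)
    model.W1[j][i] = original - h
    minus = model.loss(x, y)
    model.W1[j][i] = original
    return abs(analytic - (plus - minus) / (2.0 * h))


XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
model = TinyMLP(n_in=2, n_hidden=4, seed=0)
before = mean_loss(model, XOR)
order_rng = random.Random(1)
for epoch in range(2000):
    samples = XOR[:]
    order_rng.shuffle(samples)  # stochastic: visit examples in a random order
    for x, y in samples:
        model.sgd_step(x, y, lr=0.5)
after = mean_loss(model, XOR)
print(f"mean loss before: {before:.4f}, after: {after:.4f}")
print([round(model.forward(x)[1], 2) for x, _ in XOR])
```

XOR is used because it is not linearly separable, so the hidden layer is genuinely needed; in practice, a library optimizer such as Adam or L-BFGS would replace the hand-written update.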

Convolutional Neural Network (CNN or ConvNet)

The Convolutional Neural Network (CNN or ConvNet) [ 65 ] is a popular discriminative deep learning architecture that learns directly from the input without the need for manual feature extraction. Figure 7 shows an example of a CNN including multiple convolution and pooling layers. The CNN thus enhances the design of traditional ANNs, such as regularized MLP networks. Each layer in a CNN takes into account optimum parameters for a meaningful output while also reducing model complexity. CNNs also use ‘dropout’ [ 30 ], which can deal with the problem of over-fitting that may occur in a traditional network.

An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers
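The convolution, pooling, and dropout operations mentioned above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: a single-channel input, stride 1, no padding ("valid" convolution), non-overlapping 2 x 2 max pooling, and the "inverted" form of dropout; the function names and the example image and kernel are hypothetical.

```python
import random

# Illustrative sketch: function names, image, and kernel values are hypothetical.


def conv2d(image, kernel):
    """'Valid' 2D convolution with stride 1, implemented as cross-correlation
    (no kernel flip), which is what most deep learning libraries compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1 / (1 - p); at inference the input passes through."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [[1, 0], [0, -1]]           # responds to changes along the diagonal
print(conv2d(image, kernel))         # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
print(max_pool2d(image))             # [[5, 6], [8, 9]]
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(0)))
```

The same small kernel is reused at every position, which is why a convolution layer has far fewer parameters than a fully connected layer over the same input.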

CNNs are specifically intended to deal with a variety of 2D shapes and are thus widely employed in visual recognition, medical image analysis, image segmentation, natural language processing, and many more [ 65 , 96 ]. The capability of automatically discovering essential features from the input without the need for human intervention makes them more powerful than a traditional network. Several variants of CNN exist, including visual geometry group (VGG) [ 38 ], AlexNet [ 62 ], Xception [ 17 ], Inception [ 116 ], ResNet [ 39 ], etc., which can be used in various application domains according to their learning capabilities.

Recurrent Neural Network (RNN) and its Variants

A Recurrent Neural Network (RNN) is another popular neural network, which processes sequential or time-series data and feeds the output from the previous step as input to the current step [ 27 , 74 ]. Like feedforward networks and CNNs, recurrent networks learn from training input; however, they are distinguished by their “memory”, which allows information from previous inputs to influence the current input and output. Unlike a typical DNN, which assumes that inputs and outputs are independent of one another, the output of an RNN depends on prior elements within the sequence. However, standard recurrent networks suffer from the vanishing gradient problem, which makes learning long data sequences challenging. In the following, we discuss several popular variants of the recurrent network that mitigate these issues and perform well in many real-world application domains.
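The recurrence described above can be sketched with the common "vanilla" (Elman-style) update \(h_t = \tanh(W_x x_t + W_h h_{t-1} + b)\). The following Python code is a minimal sketch under that assumption; the weights, sequence, and function names are hypothetical. Feeding the same elements in a different order produces a different final state, which is the "memory" property described in the text.

```python
import math

# Illustrative sketch: parameter values and function names are hypothetical.


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b) for a single time step."""
    return [
        math.tanh(
            sum(w * x_i for w, x_i in zip(W_x[j], x_t))
            + sum(w * h_k for w, h_k in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def rnn_forward(sequence, W_x, W_h, b):
    """Unroll the recurrence; the hidden state h carries information between steps."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)   # the same weights are reused at every step
        states.append(h)
    return states


# Hypothetical parameters: 1 input feature, 2 hidden units
W_x = [[0.5], [-0.3]]
W_h = [[0.8, 0.1], [0.2, 0.7]]
b = [0.0, 0.1]
seq = [[1.0], [0.0], [-1.0]]
print(rnn_forward(seq, W_x, W_h, b)[-1])
print(rnn_forward(seq[::-1], W_x, W_h, b)[-1])  # same elements, different order, different state
```

Because the gradient with respect to early inputs passes through the repeated multiplication by `W_h` and the tanh derivative, it tends to shrink over long sequences, which is the vanishing-gradient issue that the variants below address.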

Long short-term memory (LSTM) This is a popular form of RNN architecture, introduced by Hochreiter et al. [ 42 ], that uses special units to deal with the vanishing gradient problem. A memory cell in an LSTM unit can store data for long periods, and the flow of information into and out of the cell is managed by three gates. For instance, the ‘Forget Gate’ determines what information from the previous cell state will be retained and what information that is no longer useful will be removed, while the ‘Input Gate’ determines which information should enter the cell state, and the ‘Output Gate’ determines and controls the outputs. As it addresses the issues of training a recurrent network, the LSTM network is considered one of the most successful RNN architectures.
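A single LSTM time step can be sketched in Python using the widely used formulation with a forget gate: \(c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) and \(h_t = o_t \odot \tanh(c_t)\), where every gate reads the concatenation of \(h_{t-1}\) and \(x_t\). This is a minimal sketch under that assumption, not the exact formulation of [ 42 ]; the parameter layout, names, and values are hypothetical. The example sets the biases so that the forget gate is almost fully open and the input gate almost closed, showing how the cell state can carry information across steps unchanged.

```python
import math

# Illustrative sketch: parameter layout, names, and values are hypothetical.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * v_i for w, v_i in zip(row, v)) + b_j for row, b_j in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps each gate name to (W, b), where W acts on
    the concatenation [h_prev, x_t]."""
    v = h_prev + x_t                                              # list concatenation
    f = [sigmoid(z) for z in affine(*params["forget"], v)]        # forget gate
    i = [sigmoid(z) for z in affine(*params["input"], v)]         # input gate
    g = [math.tanh(z) for z in affine(*params["candidate"], v)]   # candidate cell values
    o = [sigmoid(z) for z in affine(*params["output"], v)]        # output gate
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]  # new cell state
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]                          # new hidden state
    return h, c


# One-unit cell with all weights zero, so only the biases matter:
# forget gate ~1 and input gate ~0 make the cell keep its stored value.
zeros = [[0.0, 0.0]]  # 1 hidden unit x (1 hidden + 1 input)
params = {
    "forget": (zeros, [20.0]),
    "input": (zeros, [-20.0]),
    "candidate": (zeros, [0.0]),
    "output": (zeros, [0.0]),
}
h, c = [0.0], [0.9]
for x in [[5.0], [-3.0], [1.0]]:
    h, c = lstm_step(x, h, c, params)
print(c)  # stays very close to [0.9]
```

Because the cell state is updated additively (scaled by the forget gate) rather than squashed at every step, gradients along the cell path need not vanish, which is the intuition behind the LSTM's advantage.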

Bidirectional RNN/LSTM Bidirectional RNNs connect two hidden layers that run in opposite directions to a single output, allowing them to accept information from both the past and the future. Unlike traditional recurrent networks, bidirectional RNNs are trained to predict in both the positive and negative time directions at the same time. A Bidirectional LSTM, often known as a BiLSTM, is an extension of the standard LSTM that can increase model performance on sequence classification problems [ 113 ]. It is a sequence processing model consisting of two LSTMs: one takes the input in the forward direction and the other in the backward direction. Bidirectional LSTMs in particular are a popular choice in natural language processing tasks.
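The bidirectional idea can be sketched in Python with two independent recurrent passes: one over the sequence as given and one over the reversed sequence, whose states are then re-aligned and paired with the forward states at each time step. For brevity, this sketch assumes a simple scalar-state RNN rather than an LSTM in each direction; the parameter values and function names are hypothetical.

```python
import math

# Illustrative sketch: scalar-state RNNs stand in for LSTMs; names and values are hypothetical.


def run_rnn(sequence, w_x, w_h, b):
    """Scalar-state RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h, states = 0.0, []
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states


def bidirectional_rnn(sequence, fwd_params, bwd_params):
    """Pair the forward and backward hidden states at each time step."""
    forward = run_rnn(sequence, *fwd_params)
    backward = run_rnn(sequence[::-1], *bwd_params)[::-1]  # re-align to original time order
    return list(zip(forward, backward))


seq = [1.0, 0.0, -1.0]
outputs = bidirectional_rnn(seq, fwd_params=(0.5, 0.8, 0.0), bwd_params=(-0.4, 0.6, 0.1))
for t, (h_fwd, h_bwd) in enumerate(outputs):
    print(t, round(h_fwd, 4), round(h_bwd, 4))
```

At step 0 the forward state has seen only the first element, while the backward state has already summarized the whole sequence, so the pair at each position carries both past and future context.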

Gated recurrent units (GRUs) A Gated Recurrent Unit (GRU), introduced by Cho et al. [ 16 ], is another popular variant of the recurrent network that uses gating mechanisms to control and manage the flow of information between cells in the neural network. The GRU is like an LSTM but has fewer parameters, as it has a reset gate and an update gate but lacks the output gate, as shown in Fig. 8. Thus, the key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates), whereas an LSTM has three gates (namely input, output, and forget gates). The GRU’s structure enables it to capture dependencies from large sequences of data in an adaptive manner, without discarding information from earlier parts of the sequence. Thus, the GRU is a slightly more streamlined variant that often offers comparable performance and is significantly faster to compute [ 18 ]. Although GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets [ 18 , 34 ], both variants of RNN have proven effective in practice.

Basic structure of a gated recurrent unit (GRU) cell consisting of reset and update gates
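A GRU step with the reset and update gates of Fig. 8 can be sketched in Python as below. The sketch assumes scalar inputs and state for brevity and uses the convention \(h_t = z_t h_{t-1} + (1 - z_t)\tilde{h}_t\), where the reset gate scales \(h_{t-1}\) inside the candidate; the gate layout, names, and values are hypothetical. The helper `gated_cell_params` counts weights for a cell with a given number of gated weight blocks, assuming one bias vector per block (some implementations use two bias vectors per block, which changes the count).

```python
import math

# Illustrative sketch: parameter layout, names, and values are hypothetical.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x_t, h_prev, p):
    """One GRU step with a scalar input and state.
    p maps each gate to (w, u, b): input weight, recurrent weight, bias."""
    w, u, b = p["update"]
    z = sigmoid(w * x_t + u * h_prev + b)                 # update gate
    w, u, b = p["reset"]
    r = sigmoid(w * x_t + u * h_prev + b)                 # reset gate
    w, u, b = p["candidate"]
    h_tilde = math.tanh(w * x_t + u * (r * h_prev) + b)   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde               # z near 1 keeps the old state


def gated_cell_params(n_in, n_hidden, n_weight_blocks):
    """Weights + biases of a gated cell with one bias vector per block."""
    return n_weight_blocks * (n_hidden * (n_in + n_hidden) + n_hidden)


keep = {"update": (0.0, 0.0, 20.0), "reset": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
write = {"update": (0.0, 0.0, -20.0), "reset": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
print(gru_step(3.0, 0.7, keep))    # close to the previous state 0.7
print(gru_step(3.0, 0.7, write))   # close to the candidate tanh(3.0 + 0.5 * 0.7)
# GRU: 3 blocks (update, reset, candidate); LSTM: 4 blocks (input, forget, candidate, output)
print(gated_cell_params(5, 10, 3), gated_cell_params(5, 10, 4))  # 480 640
```

The single update gate plays the combined role of the LSTM's forget and input gates, and there is no separate cell state or output gate, which is where the parameter saving comes from.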

Overall, the basic property of a recurrent network is that it has at least one feedback connection, which enables activations to loop. This allows the network to perform temporal processing and sequence learning, such as sequence recognition or reproduction, temporal association or prediction, etc. Popular application areas of recurrent networks include prediction problems, machine translation, natural language processing, text summarization, speech recognition, and many more.

Deep Networks for Generative or Unsupervised Learning

This category of DL techniques is typically used to characterize the high-order correlation properties or features for pattern analysis or synthesis, as well as the joint statistical distributions of the visible data and their associated classes [ 21 ]. The key idea of generative deep architectures is that precise supervisory information, such as target class labels, is not of concern during the learning process. As a result, the methods in this category are essentially applied for unsupervised learning, as they are typically used for feature learning or data generation and representation [ 20 , 21 ]. Thus, generative modeling can also be used as preprocessing for supervised learning tasks, which can improve the accuracy of the discriminative model. Commonly used deep neural network techniques for unsupervised or generative learning are the Generative Adversarial Network (GAN), Autoencoder (AE), Restricted Boltzmann Machine (RBM), Self-Organizing Map (SOM), and Deep Belief Network (DBN), along with their variants.

Generative Adversarial Network (GAN)

A Generative Adversarial Network (GAN), designed by Ian Goodfellow [ 32 ], is a type of neural network architecture for generative modeling that creates new plausible samples on demand. It involves automatically discovering and learning regularities or patterns in input data so that the model can be used to generate new examples that plausibly could have been drawn from the original dataset. As shown in Fig. 9, GANs are composed of two neural networks: a generator G that creates new data with properties similar to the original data, and a discriminator D that predicts the probability that a given sample was drawn from the actual data rather than produced by the generator. Thus, in GAN modeling, the generator and discriminator are trained to compete with each other. While the generator tries to fool and confuse the discriminator by creating more realistic data, the discriminator tries to distinguish the genuine data from the fake data generated by G.

Schematic structure of a standard generative adversarial network (GAN)
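The competition between G and D is usually written as the minimax game \(\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]\), as in the original GAN formulation. The Python sketch below only evaluates the two training objectives from a batch of discriminator outputs; the networks themselves are omitted, and the example probabilities and function names are hypothetical. It also shows the widely used "non-saturating" generator loss \(-\log D(G(z))\) alongside the minimax form.

```python
import math

# Illustrative sketch: discriminator outputs and function names are hypothetical.


def discriminator_loss(d_real, d_fake):
    """Negative of the batch estimate of V(D, G); minimizing it trains D to
    output values near 1 for real samples and near 0 for generated ones."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake, non_saturating=True):
    """Generator objective on D's outputs for generated samples.
    non_saturating=True uses -log D(G(z)); otherwise the minimax form log(1 - D(G(z)))."""
    if non_saturating:
        return -sum(math.log(p) for p in d_fake) / len(d_fake)
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probabilities that a sample is real)
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # log(4) when D cannot tell real from fake
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # lower: D separates the two well
print(generator_loss([0.1, 0.2]), generator_loss([0.9, 0.8]))  # lower when G fools D
```

In a training loop, the two losses are minimized alternately: one update of D on real and generated batches, then one update of G through D's output on generated samples.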

Generally, GANs are designed for unsupervised learning tasks, but they have also proven to be a good solution for semi-supervised and reinforcement learning, depending on the task [ 3 ]. GANs are also used in state-of-the-art transfer learning research to enforce the alignment of the latent feature space [ 66 ]. Inverse models, such as the Bidirectional GAN (BiGAN) [ 25 ], can also learn a mapping from data to the latent space, similar to how the standard GAN model learns a mapping from a latent space to the data distribution. The potential application areas of GAN networks, which are increasing rapidly, include healthcare, image analysis, data augmentation, video generation, voice generation, pandemics, traffic control, cybersecurity, and many more. Overall, GANs have established themselves as a comprehensive domain for independent data expansion and as a solution to problems requiring a generative approach.

Auto-Encoder (AE) and Its Variants

An auto-encoder (AE) [ 31 ] is a popular unsupervised learning technique in which neural networks are used to learn representations. Auto-encoders are typically applied to high-dimensional data, where dimensionality reduction yields a compact representation of the data. An autoencoder has three parts: an encoder, a code, and a decoder. The encoder compresses the input and generates the code, which the decoder subsequently uses to reconstruct the input. AEs have recently also been used to learn generative data models [ 69 ]. The auto-encoder is widely used in many unsupervised learning tasks, e.g., dimensionality reduction, feature extraction, efficient coding, generative modeling, denoising, and anomaly or outlier detection [ 31 , 132 ]. Principal component analysis (PCA) [ 99 ], which is also used to reduce the dimensionality of large data sets, is closely related to an AE with a single hidden layer and linear activations: when trained on squared reconstruction error, the optimal weights of such an AE span the same subspace as the leading principal components. Regularized autoencoders such as sparse, denoising, and contractive autoencoders are useful for learning representations for later classification tasks [ 119 ], while variational autoencoders can be used as generative models [ 56 ], as discussed below.
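The encoder, code, and decoder pipeline can be illustrated with a small NumPy sketch. Everything here is an assumption chosen for brevity: a tanh encoder, a linear decoder, a 2-unit code, full-batch gradient descent with hand-written gradients, and synthetic 10-D data lying near a 2-D subspace.

```python
import numpy as np


def train_autoencoder(X, code_dim=2, lr=0.005, epochs=1000, seed=0):
    """Minimal AE: tanh encoder -> code -> linear decoder, trained by full-batch
    gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, code_dim))
    b_enc = np.zeros(code_dim)
    W_dec = rng.normal(0.0, 0.1, (code_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: the encoder compresses, the decoder reconstructs.
        code = np.tanh(X @ W_enc + b_enc)
        X_hat = code @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backward pass: gradients of the loss recorded above.
        g_out = 2.0 * err / n
        g_W_dec = code.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - code ** 2)  # tanh'(a) = 1 - tanh(a)^2
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Gradient-descent updates (all gradients were computed first).
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    return (W_enc, b_enc, W_dec, b_dec), losses


def encode(X, params):
    """Map inputs to their low-dimensional codes."""
    W_enc, b_enc, _, _ = params
    return np.tanh(X @ W_enc + b_enc)


# Toy data: 10-D points close to a 2-D subspace, standardized per feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
params, losses = train_autoencoder(X)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(encode(X, params).shape)  # (200, 2): one 2-D code per input
```

In practice a deep learning framework with automatic differentiation would replace the hand-written backward pass; it is spelled out here only to show what training minimizes.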

Sparse Autoencoder (SAE) A sparse autoencoder [ 73 ] includes a sparsity penalty on the coding layer as part of its training objective. SAEs may have more hidden units than inputs, but only a small number of hidden units are permitted to be active at the same time, resulting in a sparse model. Figure 10 shows a schematic structure of a sparse autoencoder with several active units in the hidden layer. This constraint forces the model to respond to the distinctive statistical features of the training data.
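One common way to express the sparsity penalty is a KL-divergence term between a small target activation rho and each hidden unit's observed mean activation. The sketch below assumes that form, together with the values of rho and the weight beta; it is not necessarily the exact formulation used in [ 73 ].

```python
import numpy as np


def kl_sparsity_penalty(hidden: np.ndarray, rho: float = 0.05, eps: float = 1e-8) -> float:
    """Sum over hidden units of KL(rho || rho_hat_j).

    hidden: (n_samples, n_hidden) sigmoid activations in (0, 1).
    rho:    desired (small) average activation per hidden unit.
    rho_hat_j is the observed mean activation of unit j over the batch.
    """
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1.0 - eps)
    kl = rho * np.log(rho / rho_hat) + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    return float(kl.sum())


def sparse_ae_loss(X, X_hat, hidden, beta=3.0, rho=0.05):
    """Mean squared reconstruction error plus a beta-weighted sparsity penalty."""
    recon = np.mean(np.sum((X_hat - X) ** 2, axis=1))
    return float(recon + beta * kl_sparsity_penalty(hidden, rho))


rng = np.random.default_rng(0)
dense = rng.uniform(0.4, 0.6, size=(100, 8))  # every unit about half active
sparse = np.full((100, 8), 0.05)              # every unit matches the target rho
print(kl_sparsity_penalty(dense), kl_sparsity_penalty(sparse))
```

The penalty is near zero only when units are rarely active on average, which is what pushes an over-complete hidden layer toward a sparse code.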

Denoising Autoencoder (DAE) A denoising autoencoder is a variant of the basic autoencoder that attempts to improve the representation (i.e., to extract useful features) by changing the reconstruction criterion, which also reduces the risk of learning the identity function [ 31 , 119 ]. In other words, it receives a corrupted data point as input and is trained to recover the original, undistorted input as its output by minimizing the average reconstruction error over the training data, i.e., by cleaning, or denoising, the corrupted input. DAEs can therefore be considered powerful learned filters that can be used for automatic pre-processing. For example, a denoising autoencoder could be used to automatically pre-process an image, improving its quality and thereby recognition accuracy.
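The denoising criterion (corrupt the input, but score the output against the clean input) can be shown in a few lines. To keep the sketch short, we assume masking noise and replace the neural encoder and decoder with a single linear map fitted in closed form by least squares; this is a simplification for illustration, not a full DAE.

```python
import numpy as np


def mask_corrupt(X, drop_prob, rng):
    """Masking noise: each entry is set to zero independently with probability drop_prob."""
    keep = rng.random(X.shape) >= drop_prob
    return X * keep


rng = np.random.default_rng(0)
# Toy clean data whose 12 features are strongly correlated (rank-3 structure).
X_clean = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 12))
X_noisy = mask_corrupt(X_clean, drop_prob=0.3, rng=rng)

# Denoising criterion: inputs are corrupted, targets are the CLEAN data.
# A linear map W is fitted by least squares; since W = I is one candidate,
# the fit can do no worse on this data than leaving the corruption untouched.
W, *_ = np.linalg.lstsq(X_noisy, X_clean, rcond=None)
X_denoised = X_noisy @ W

err_noisy = np.mean((X_noisy - X_clean) ** 2)
err_denoised = np.mean((X_denoised - X_clean) ** 2)
print(f"MSE corrupted: {err_noisy:.3f}, MSE after denoising: {err_denoised:.3f}")
```

The correlations between features are what let the model fill in masked entries; a neural DAE exploits the same redundancy with a nonlinear encoder and decoder.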

Contractive Autoencoder (CAE) The idea behind a contractive autoencoder, proposed by Rifai et al. [ 90 ], is to make the autoencoder robust to small changes in its input. In its objective function, a CAE includes an explicit regularizer that forces the model to learn an encoding that is robust to small changes in input values. As a result, the learned representation is less sensitive to small perturbations of the training inputs. While DAEs encourage robustness of the reconstruction, as discussed above, CAEs encourage robustness of the representation.
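The explicit regularizer in a CAE is usually the squared Frobenius norm of the Jacobian of the encoder with respect to the input. The sketch below assumes a single sigmoid encoder layer, for which this norm has a closed form, and checks it against finite differences; in a full CAE it would be added to the reconstruction loss with some weight (not shown).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def contractive_penalty(x, W, b):
    """||dh/dx||_F^2 for the encoder h = sigmoid(x @ W + b).

    x: (d,), W: (d, k), b: (k,). Since dh_j/dx_i = h_j (1 - h_j) W_ij,
    ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2.
    """
    h = sigmoid(x @ W + b)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=0)))


def numerical_jacobian_norm(x, W, b, delta=1e-6):
    """Central finite-difference estimate of the same quantity, as a check."""
    J = np.zeros((W.shape[1], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = delta
        J[:, i] = (sigmoid((x + step) @ W + b) - sigmoid((x - step) @ W + b)) / (2 * delta)
    return float(np.sum(J ** 2))


rng = np.random.default_rng(0)
x, W, b = rng.normal(size=5), rng.normal(size=(5, 3)), rng.normal(size=3)
print(contractive_penalty(x, W, b), numerical_jacobian_norm(x, W, b))  # should agree closely
```

Minimizing this term flattens the encoder around training points, so small input changes produce small changes in the code.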

Variational Autoencoder (VAE) A variational autoencoder [ 55 ] has a fundamental property that distinguishes it from the classical autoencoders discussed above and makes it effective for generative modeling: unlike traditional autoencoders, which map the input onto a single latent vector, VAEs map the input data onto the parameters of a probability distribution, such as the mean and variance of a Gaussian distribution. A VAE assumes that the source data has an underlying probability distribution and then tries to discover the distribution’s parameters. Although this approach was initially designed for unsupervised learning, its use has been demonstrated in other settings such as semi-supervised learning [ 128 ] and supervised learning [ 51 ].
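Two ingredients make the "map to distribution parameters" idea trainable in the standard Gaussian VAE: the reparameterization trick for sampling the latent code, and a closed-form KL term that keeps the encoder's distribution close to a standard normal prior. The sketch below shows both in NumPy; the encoder outputs mu and log_var are hypothetical values rather than the output of a trained network.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); the randomness is moved into eps,
    so z stays a differentiable function of mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))


# Parameters a (hypothetical) encoder might output for one input.
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, np.log(0.25)])  # variances 1.0 and 0.25

rng = np.random.default_rng(0)
z = reparameterize(np.tile(mu, (100_000, 1)), np.tile(log_var, (100_000, 1)), rng)
print(z.mean(axis=0), z.var(axis=0))  # close to mu and to exp(log_var)
print(kl_to_standard_normal(mu, log_var))
```

A full VAE loss adds this KL term to a reconstruction term computed by decoding z.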

Schematic structure of a sparse autoencoder (SAE) with several active units (filled circle) in the hidden layer

Although AEs were originally used mainly for dimensionality reduction or feature learning, as mentioned above, they have recently been brought to the forefront of generative modeling, an area in which the generative adversarial network is also a popular method. AEs have been effectively employed in a variety of domains, including healthcare, computer vision, speech recognition, cybersecurity, natural language processing, and many more. Overall, auto-encoders and their variants can play a significant role in unsupervised feature learning with neural network architectures.

Kohonen Map or Self-Organizing Map (SOM)

A Self-Organizing Map (SOM) or Kohonen Map [ 59 ] is another unsupervised learning technique, which creates a low-dimensional (usually two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. SOM is also known as a neural network-based dimensionality reduction algorithm that is commonly used for clustering [ 118 ]. A SOM adapts to the topological form of a dataset by repeatedly moving its neurons closer to the data points, which makes it possible to visualize enormous datasets and find probable clusters. The first layer of a SOM is the input layer, and the second layer is the output layer, or feature map. Unlike other neural networks that use error-correction learning, such as backpropagation with gradient descent [ 36 ], SOMs employ competitive learning, which uses a neighborhood function to retain the topological features of the input space. SOM is widely utilized in a variety of applications, including pattern identification, health or medical diagnosis, anomaly detection, and virus or worm attack detection [ 60 , 87 ]. The primary benefit of a SOM is that it makes high-dimensional data easier to visualize and analyze in order to understand its patterns. Dimensionality reduction and grid clustering make it easy to observe similarities in the data. As a result, SOMs can play a vital role in developing an effective data-driven model for a particular problem domain, depending on the data characteristics.
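The competitive and cooperative steps of SOM training can be sketched as follows. The grid size, the Gaussian neighborhood, the linear decay schedules, and the three-cluster toy data are all our own assumptions; the quantization error (mean distance from each sample to its best-matching unit) is used only as a simple quality measure.

```python
import numpy as np


def train_som(data, grid_h=5, grid_w=5, iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM: Gaussian neighborhood, linearly decaying learning rate and radius."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid_h, grid_w, data.shape[1]))  # one prototype per map node
    # (row, col) position of every node; neighborhoods are measured on this grid.
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        x = data[rng.integers(len(data))]
        # Competition: the best-matching unit (BMU) is the node closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Cooperation: the BMU and its grid neighbours move toward x,
        # which is what preserves the topology of the input space.
        grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


def quantization_error(data, weights):
    """Mean distance between each sample and its BMU."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())


rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])
weights = train_som(data)
print(f"quantization error after training: {quantization_error(data, weights):.3f}")
```

Note that no error signal is backpropagated: each update only pulls prototypes toward the current sample, weighted by grid distance to the winner.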

Restricted Boltzmann Machine (RBM)

A Restricted Boltzmann Machine (RBM) [ 75 ] is also a generative stochastic neural network capable of learning a probability distribution over its inputs. Boltzmann machines typically consist of visible and hidden nodes, with each node connected to every other node; by learning how a system behaves under normal circumstances, they can help identify irregularities. RBMs are a restricted subclass of Boltzmann machines in which connections exist only between the visible and hidden layers, with no visible-visible or hidden-hidden connections [ 77 ]. This restriction allows training algorithms such as the gradient-based contrastive divergence algorithm to be more efficient than those for general Boltzmann machines [ 41 ]. RBMs have found applications in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many others. In deep learning modeling, they can be trained in either a supervised or an unsupervised manner, depending on the task. Overall, RBMs can automatically recognize patterns in data and build probabilistic or stochastic models, which are used for feature selection or extraction, as well as for forming a deep belief network.
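Because an RBM has no connections within a layer, all hidden units can be sampled in parallel given the visible units (and vice versa), which is what makes contrastive divergence cheap. The following sketch of a binary RBM trained with one Gibbs step (CD-1) illustrates this; the class name, layer sizes, learning rate, and two-pattern toy data are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BernoulliRBM:
    """Minimal binary RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # No hidden-hidden connections: units are conditionally independent given v.
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden statistics driven by the data.
        p_h0 = self.hidden_probs(v0)
        h0 = (self.rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step yields a "reconstruction" of the data.
        p_v1 = self.visible_probs(h0)
        p_h1 = self.hidden_probs(p_v1)
        n = v0.shape[0]
        # Approximate log-likelihood gradient: <v h>_data - <v h>_reconstruction.
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += lr * (v0 - p_v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)

    def reconstruction_error(self, v):
        return float(np.mean((v - self.visible_probs(self.hidden_probs(v))) ** 2))


patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 50, axis=0)
rbm = BernoulliRBM(n_visible=6, n_hidden=3)
before = rbm.reconstruction_error(data)
for _ in range(1000):
    rbm.cd1_step(data, lr=0.1)
after = rbm.reconstruction_error(data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

Reconstruction error is only a rough progress indicator; CD-1 does not directly minimize it.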

Deep Belief Network (DBN)

A Deep Belief Network (DBN) [ 40 ] is a multi-layer generative graphical model built by stacking several individual unsupervised networks, such as AEs or RBMs, where each network's hidden layer serves as the input to the next, i.e., the networks are connected sequentially. Thus, we can divide DBNs into (i) AE-DBN, also known as stacked AE, which is composed of autoencoders, and (ii) RBM-DBN, also known as stacked RBM, which is composed of the restricted Boltzmann machines discussed earlier. The goal is a fast, unsupervised training procedure for each sub-network, which for RBMs relies on contrastive divergence [ 41 ]. A DBN can capture a hierarchical representation of input data thanks to its deep structure. The primary idea behind DBN is to pre-train the feed-forward network layer by layer with unlabeled data before fine-tuning the network with labeled input. One of the most important advantages of DBN, compared with typical shallow learning networks, is that it permits the detection of deep patterns, which enables reasoning abilities and captures the deep difference between normal and erroneous data [ 89 ]. A continuous DBN is an extension of a standard DBN that accepts a continuous range of real values instead of binary data. Overall, the DBN model can play a key role in a wide range of high-dimensional data applications due to its strong feature extraction and classification capabilities, and it has become one of the significant topics in the field of neural networks.
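Greedy layer-wise pre-training of an RBM-DBN can be sketched by training one RBM, feeding its hidden probabilities to the next RBM, and repeating. The compact CD-1 trainer, layer sizes, and random binary data below are assumptions for illustration, and the supervised fine-tuning stage is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_rbm(v_data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one RBM with CD-1 on (binary or probability-valued) inputs."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (v_data.shape[1], n_hidden))
    b_v, b_h = np.zeros(v_data.shape[1]), np.zeros(n_hidden)
    n = v_data.shape[0]
    for _ in range(epochs):
        p_h0 = sigmoid(v_data @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(p_v1 @ W + b_h)
        W += lr * (v_data.T @ p_h0 - p_v1.T @ p_h1) / n
        b_v += lr * (v_data - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h


def pretrain_dbn(data, layer_sizes, seed=0):
    """Greedy layer-wise pre-training: each RBM models the hidden activity of the one below."""
    layers, x = [], data
    for i, n_hidden in enumerate(layer_sizes):
        W, _, b_h = train_rbm(x, n_hidden, seed=seed + i)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # this layer's hidden probabilities feed the next RBM
    return layers, x


rng = np.random.default_rng(42)
data = (rng.random((100, 16)) < 0.3).astype(float)
layers, top = pretrain_dbn(data, layer_sizes=[8, 4])
print([W.shape for W, _ in layers], top.shape)
```

The returned weights would typically initialize a feed-forward network that is then fine-tuned on labeled data.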

In summary, the generative learning techniques discussed above typically allow us to generate a new representation of data through exploratory analysis. As a result, these deep generative networks can be used for preprocessing in supervised or discriminative learning tasks, where unsupervised representation learning can improve classifier generalization and, in turn, model accuracy.

Deep Networks for Hybrid Learning and Other Approaches

In addition to the deep learning categories discussed above, hybrid deep networks and several other approaches, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), are popular. These are discussed below.

Hybrid Deep Neural Networks

Generative models are adaptable and can learn from both labeled and unlabeled data. Discriminative models, on the other hand, cannot learn from unlabeled data but often outperform their generative counterparts in supervised tasks. A framework that trains deep generative and discriminative models simultaneously can enjoy the benefits of both, which motivates hybrid networks.

Hybrid deep learning models are typically composed of multiple (two or more) basic deep learning models, where each basic model is one of the discriminative or generative deep learning models discussed earlier. Based on how different basic generative or discriminative models are integrated, the following three categories of hybrid deep learning models might be useful for solving real-world problems:

Hybrid \(Model\_1\) : An integration of different generative or discriminative models to extract more meaningful and robust features. Examples could be CNN+LSTM, AE+GAN, and so on.

Hybrid \(Model\_2\) : An integration of generative model followed by a discriminative model. Examples could be DBN+MLP, GAN+CNN, AE+CNN, and so on.

Hybrid \(Model\_3\) : An integration of generative or discriminative model followed by a non-deep learning classifier. Examples could be AE+SVM, CNN+SVM, and so on.
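A Hybrid Model_3 style pipeline (unsupervised feature extractor followed by a non-deep classifier) can be sketched without any deep learning library. To keep it dependency-free we make two loud substitutions: the AE is replaced by its linear counterpart, computed via SVD (the PCA subspace that a linear single-hidden-layer AE can converge to), and the SVM is replaced by a nearest-centroid classifier. The data are synthetic.

```python
import numpy as np


def fit_linear_encoder(X, code_dim):
    """Stage 1 (unsupervised): leading principal directions via SVD,
    standing in for a trained linear autoencoder's encoder."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:code_dim].T


def encode(X, mean, components):
    return (X - mean) @ components


def fit_nearest_centroid(Z, y):
    """Stage 2 (supervised, non-deep): one centroid per class in code space."""
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])


def predict(Z, classes, centroids):
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 20)), rng.normal(2.0, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)
mean, comps = fit_linear_encoder(X, code_dim=2)
Z = encode(X, mean, comps)
classes, centroids = fit_nearest_centroid(Z, y)
acc = np.mean(predict(Z, classes, centroids) == y)
print(f"training accuracy: {acc:.2f}")
```

The point of the design is the separation of stages: the feature extractor never sees labels, and the classifier only ever sees the low-dimensional codes.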

Thus, in a broad sense, hybrid models can be either classification-focused or non-classification, depending on the target use. However, most hybrid learning studies in deep learning focus on classification or other supervised learning tasks, as summarized in Table 1 . In these studies, unsupervised generative models with meaningful representations are employed to enhance the discriminative models. Generative models with useful representations can provide more informative, low-dimensional features for discrimination, and they can also help improve the quality and quantity of the training data, providing additional information for classification.

Deep Transfer Learning (DTL)

Transfer learning is a technique for effectively reusing the knowledge of a previously trained model to solve a new task with minimal training or fine-tuning. Compared with typical machine learning techniques [ 97 ], DL requires a large amount of training data. As a result, the need for a substantial volume of labeled data is a significant barrier to addressing some essential domain-specific tasks, particularly in the medical sector, where creating large-scale, high-quality annotated medical or health datasets is both difficult and costly. Furthermore, standard DL models demand substantial computational resources, such as a GPU-enabled server, although researchers are working hard to reduce this requirement. Deep Transfer Learning (DTL), a DL-based transfer learning method, can help address these issues. Figure 11 shows a general structure of the transfer learning process, where knowledge from the pre-trained model is transferred into a new DL model. DTL is especially popular in deep learning today because it allows deep neural networks to be trained with relatively little data [ 126 ].

A general structure of transfer learning process, where knowledge from pre-trained model is transferred into new DL model

Transfer learning is a two-stage approach for training a DL model: a pre-training step, followed by a fine-tuning step in which the model is trained on the target task. Since deep neural networks have gained popularity in a variety of fields, a large number of DTL methods have been presented, making it important to categorize and summarize them. Based on the techniques used in the literature, DTL can be classified into four categories [ 117 ]: (i) instance-based deep transfer learning, which reuses instances from the source domain with appropriate weights; (ii) mapping-based deep transfer learning, which maps instances from the two domains into a new data space with better similarity; (iii) network-based deep transfer learning, which reuses part of a network pre-trained in the source domain; and (iv) adversarial-based deep transfer learning, which uses adversarial techniques to find transferable features that are suitable for both domains. Due to its high effectiveness and practicality, adversarial-based deep transfer learning has grown rapidly in popularity in recent years. Transfer learning can also be classified into inductive, transductive, and unsupervised transfer learning, depending on the relationship between the source and target domains and tasks [ 81 ]. While most current research focuses on supervised learning, how deep neural networks can transfer knowledge in unsupervised or semi-supervised learning may attract further interest in the future. DTL techniques are useful in a variety of fields, including natural language processing, sentiment classification, visual recognition, speech recognition, spam filtering, and others.
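The two-stage process, combined with the network-based category (reusing part of a pre-trained network), can be sketched as follows: a one-hidden-layer network is trained on a large synthetic source task, its hidden layer is frozen, and only a new output head is trained on a small, related target task. The network size, learning rates, and both toy tasks are our own assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-12):
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def pretrain(X, y, n_hidden=16, lr=0.2, epochs=500, seed=0):
    """Stage 1: train a one-hidden-layer network on the (large) source task."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ w2 + b2)
        g = (p - y) / n                      # d(mean BCE)/d(logit)
        gH = np.outer(g, w2) * (1 - H ** 2)  # back through tanh
        W1 -= lr * X.T @ gH
        b1 -= lr * gH.sum(axis=0)
        w2 -= lr * H.T @ g
        b2 -= lr * g.sum()
    return W1, b1  # only the transferable hidden layer is kept


def fine_tune_head(X, y, W1, b1, lr=0.1, epochs=300):
    """Stage 2: keep (W1, b1) frozen and train only a new logistic output head."""
    H = np.tanh(X @ W1 + b1)  # frozen features, computed once
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(H @ w + b)
        w -= lr * H.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


rng = np.random.default_rng(0)
X_src = rng.normal(size=(1000, 5))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(float)
X_tgt = rng.normal(size=(40, 5))  # small labeled target set
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 0.5).astype(float)

W1, b1 = pretrain(X_src, y_src)
W1_before = W1.copy()
w, b = fine_tune_head(X_tgt, y_tgt, W1, b1)
loss_after = bce(sigmoid(np.tanh(X_tgt @ W1 + b1) @ w + b), y_tgt)
print(f"target loss: {np.log(2):.3f} (untrained head) -> {loss_after:.3f}")
```

Freezing the transferred layer means only 17 parameters are learned from the 40 target examples; unfreezing it for a few low-learning-rate steps would be the usual next step in full fine-tuning.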

Deep Reinforcement Learning (DRL)

Reinforcement learning takes a different approach to sequential decision-making problems than the other approaches discussed so far. Reinforcement learning is usually introduced through the concepts of an environment and an agent. The agent can perform a series of actions in the environment, each of which affects the environment’s state and can result in rewards (feedback): “positive” for good sequences of actions that lead to a “good” state, and “negative” for bad sequences of actions that lead to a “bad” state. The purpose of reinforcement learning is to learn, through interaction with the environment, a good strategy for selecting actions, typically referred to as a policy.

Schematic structure of deep reinforcement learning (DRL) highlighting a deep neural network

Deep reinforcement learning (DRL or deep RL) [ 9 ] integrates neural networks with a reinforcement learning architecture to allow agents to learn appropriate actions in a virtual environment, as shown in Fig. 12 . In reinforcement learning, model-based RL learns a transition model that enables modeling of the environment without interacting with it directly, whereas model-free RL methods learn directly from interactions with the environment. Q-learning is a popular model-free RL technique for determining the best action-selection policy for any (finite) Markov Decision Process (MDP) [ 86 , 97 ]. An MDP is a mathematical framework for modeling decisions in terms of states, actions, and rewards [ 86 ]. In addition, Deep Q-Networks, Double DQN, Bi-directional Learning, Monte Carlo Control, etc. are used in this area [ 50 , 97 ]. DRL methods incorporate DL models, e.g., Deep Neural Networks (DNN), based on the MDP principle [ 71 ], as policy and/or value function approximators. A CNN, for example, can be used as a component of an RL agent to learn directly from raw, high-dimensional visual inputs. In the real world, DRL-based solutions can be used in several application areas, including robotics, video games, natural language processing, computer vision, and others.
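The core Q-learning update, which DRL methods such as Deep Q-Networks approximate with a neural network, can be illustrated in its tabular form. The five-state corridor MDP, the reward scheme, and the hyper-parameters below are hypothetical choices made only for this sketch.

```python
import numpy as np

N_STATES, GOAL = 5, 4  # states 0..4 on a line; state 4 is terminal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy MDP: move one cell; reward 1 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == LEFT else min(state + 1, GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy, breaking ties at random.
            if rng.random() < epsilon:
                a = int(rng.integers(2))
            else:
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s2, r, done = step(s, a)
            # Model-free update: bootstrap from the greedy value of the next state.
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
            if done:
                break
    return Q


Q = q_learning()
print(np.round(Q, 3))
print("greedy policy (1 = RIGHT):", Q[:GOAL].argmax(axis=1))
```

The agent never sees the transition function `step` directly; it only learns from sampled transitions, which is what makes Q-learning model-free. In deep RL the table `Q` is replaced by a network mapping states (for example, raw images) to action values.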

Several potential real-world application areas of deep learning

Deep Learning Application Summary

During the past few years, deep learning has been successfully applied to numerous problems in many application areas, including natural language processing, sentiment analysis, cybersecurity, business, virtual assistants, visual recognition, healthcare, robotics, and many more. In Fig. 13 , we summarize several potential real-world application areas of deep learning. Various deep learning techniques from the taxonomy presented in Fig. 6 , including the discriminative learning, generative learning, and hybrid models discussed earlier, are employed in these application areas. In Table 1 , we also summarize various deep learning tasks and the techniques used to solve them in several real-world application areas. Overall, Fig. 13 and Table 1 suggest that the future prospects of deep learning modeling in real-world application areas are broad, with much scope for further work. In the next section, we summarize the research issues in deep learning modeling and point out potential aspects for future generation DL modeling.

Research Directions and Future Aspects

While existing methods have established a solid foundation for deep learning systems and research, this section outlines the following ten potential future research directions based on our study.

Automation in Data Annotation According to the existing literature, discussed in Section 3 , most deep learning models are trained on publicly available, annotated datasets. However, to build a system for a new problem domain or a recent data-driven system, raw data must be collected from relevant sources. Data annotation, e.g., the categorization, tagging, or labeling of large amounts of raw data, is therefore important for building discriminative deep learning models or supervised tasks, and it is challenging. A technique capable of automatic and dynamic data annotation, rather than manual annotation or hiring annotators, particularly for large datasets, could be more effective for supervised learning while also minimizing human effort. Therefore, a more in-depth investigation of data collection and annotation methods, or the design of unsupervised learning-based solutions, could be one of the primary research directions in deep learning modeling.

Data Preparation for Ensuring Data Quality As discussed throughout the paper, the quality and availability of training data strongly affect deep learning algorithms and, consequently, the resulting model for a particular problem domain. Deep learning models may therefore become worthless or yield decreased accuracy if the training data are bad, e.g., sparse, non-representative, of poor quality, ambiguous, noisy, imbalanced, inconsistent, insufficient in quantity, or full of irrelevant features. Such data issues can lead to poor processing and inaccurate findings, which is a major problem when discovering insights from data. Deep learning models thus also need to adapt to such emerging data issues in order to capture approximate information from observations. Therefore, effective data pre-processing techniques need to be designed according to the nature of the data problem and its characteristics to handle these emerging challenges, which could be another research direction in the area.

Black-box Perception and Proper DL/ML Algorithm Selection In general, it is difficult to explain how a deep learning result is obtained, or how a particular model reaches its final decisions. Although DL models achieve significant performance when learning from large datasets, as discussed in Section 2 , this “black-box” perception of DL modeling typically reflects weak statistical interpretability, which could be a major issue in the area. On the other hand, ML algorithms, particularly rule-based machine learning techniques, provide explicit logic rules (IF-THEN) for making decisions that are easier to interpret, update, or delete according to the target applications [ 97 , 100 , 105 ]. If the wrong learning algorithm is chosen, unanticipated results may occur, wasting effort and reducing the model’s efficacy and accuracy. Thus, selecting an appropriate model for the target application, taking into account performance, complexity, model accuracy, and applicability, is challenging, and in-depth analysis is needed for better understanding and decision making.

Deep Networks for Supervised or Discriminative Learning According to our taxonomy of deep learning techniques, as shown in Fig. 6 , discriminative architectures mainly include MLP, CNN, and RNN, along with their variants, which are applied widely in various application domains. However, designing new discriminative techniques or variants that take into account model optimization, accuracy, and applicability, according to the target real-world application and the nature of the data, could be a novel contribution and can also be considered a major future aspect in the area of supervised or discriminative learning.

Deep Networks for Unsupervised or Generative Learning As discussed in Section 3 , unsupervised learning or generative deep learning modeling is one of the major tasks in the area, as it allows us to characterize the high-order correlation properties or features in data, or to generate a new representation of data through exploratory analysis. Moreover, unlike supervised learning [ 97 ], it does not require labeled data, because it derives insights directly from the data and supports data-driven decision making. Consequently, it can be used as preprocessing for supervised learning or discriminative modeling, as well as for semi-supervised learning tasks, which can improve learning accuracy and model efficiency. According to our taxonomy of deep learning techniques, as shown in Fig. 6 , generative techniques mainly include GAN, AE, SOM, RBM, DBN, and their variants. Thus, designing new techniques or variants for effective data modeling or representation according to the target real-world application could be a novel contribution and can also be considered a major future aspect in the area of unsupervised or generative learning.

Hybrid/Ensemble Modeling and Uncertainty Handling According to our taxonomy of DL techniques, as shown in Fig. 6 , hybrid modeling is considered another major category of deep learning tasks. As hybrid modeling enjoys the benefits of both generative and discriminative learning, an effective hybridization can outperform other approaches in terms of performance as well as uncertainty handling in high-risk applications. In Section 3 , we summarized various types of hybridization, e.g., AE+CNN/SVM. Since a group of neural networks can be trained with distinct parameters or with separate sub-sampled training datasets, hybridization or ensembles of such techniques, i.e., DL with DL/ML, can play a key role in the area. Thus, designing effective blended discriminative and generative models, rather than naive combinations, could be an important research opportunity for solving various real-world issues, including semi-supervised learning tasks and model uncertainty.

Dynamism in Selecting Threshold/Hyper-parameter Values and Network Structures with Computational Efficiency In general, the relationship among performance, model complexity, and computational requirements is a key issue in deep learning modeling and applications. Combining algorithmic advances that improve accuracy with computational efficiency, i.e., achieving maximum throughput while consuming the least amount of resources, without significant information loss, could lead to a breakthrough in the effectiveness of deep learning modeling in future real-world applications. Incremental approaches or recency-based learning [ 100 ] might be effective in several cases, depending on the nature of the target applications. Moreover, assuming network structures with a static number of nodes and layers, fixed hyper-parameter values or threshold settings, or selecting them through trial and error, may not be effective in many cases, since suitable values can change as the data change. Thus, a data-driven approach that selects them dynamically could be more effective for building deep learning models, in terms of both performance and real-world applicability. Such data-driven automation can lead to future generation deep learning modeling with additional intelligence, which could be a significant future aspect in the area as well as an important research direction.

Lightweight Deep Learning Modeling for Next-Generation Smart Devices and Applications In recent years, the Internet of Things (IoT), consisting of billions of intelligent and communicating things, and mobile communications technologies have become popular for detecting and gathering human and environmental information (e.g., geo-information, weather data, bio-data, human behaviors, and so on) for a variety of intelligent services and applications. Every day, these ubiquitous smart things or devices generate large amounts of data, requiring rapid data processing on a variety of smart mobile devices [ 72 ]. Deep learning technologies can be incorporated to discover underlying properties and to effectively handle such large amounts of sensor data for a variety of IoT applications, including health monitoring and disease analysis, smart cities, traffic flow prediction and monitoring, smart transportation, manufacturing inspection, fault assessment, smart industry or Industry 4.0, and many more. Although the deep learning techniques discussed in Section 3 are considered powerful tools for processing big data, their high computational cost and considerable memory overhead make lightweight modeling important for resource-constrained devices. Techniques such as optimization, simplification, compression, pruning, generalization, and important feature extraction might therefore be helpful in several cases. Constructing lightweight deep learning techniques based on a baseline network architecture, to adapt DL models to next-generation mobile, IoT, or resource-constrained devices and applications, could thus be considered a significant future aspect in the area.
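Of the lightweight techniques listed above, pruning is perhaps the simplest to illustrate. The sketch below applies unstructured magnitude pruning to a single weight matrix; the layer is a random stand-in rather than a trained network, and the 90% sparsity level is an arbitrary assumption. Real deployments usually fine-tune after pruning and store the result in a sparse format to realize memory savings.

```python
import numpy as np


def magnitude_prune(W: np.ndarray, sparsity: float):
    """Zero out (approximately) the fraction `sparsity` of weights with the smallest |w|.

    Returns the pruned copy and the boolean mask of surviving weights.
    """
    k = int(round(sparsity * W.size))  # number of weights to remove
    if k == 0:
        return W.copy(), np.ones(W.shape, dtype=bool)
    threshold = np.sort(np.abs(W), axis=None)[k - 1]  # k-th smallest magnitude
    mask = np.abs(W) > threshold                      # ties at the threshold are removed too
    return W * mask, mask


rng = np.random.default_rng(0)
W = rng.normal(size=(100, 100))  # stand-in for one trained dense layer
x = rng.normal(size=(1, 100))
pruned, mask = magnitude_prune(W, sparsity=0.9)
rel_err = np.linalg.norm(x @ pruned - x @ W) / np.linalg.norm(x @ W)
print(f"kept {mask.mean():.0%} of weights, relative output change {rel_err:.2f}")
```

On a random layer the output change is large; trained networks often tolerate much more pruning because many of their weights contribute little, which is the premise behind this family of techniques.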

Incorporating Domain Knowledge into Deep Learning Modeling Domain knowledge, as opposed to general or domain-independent knowledge, is knowledge of a specific, specialized topic or field. For instance, in natural language processing, the properties of the English language typically differ from those of other languages such as Bengali, Arabic, or French. Integrating domain-based constraints into a deep learning model could therefore produce better results for such a particular purpose. For instance, a task-specific feature extractor that considers domain knowledge in smart manufacturing for fault diagnosis can resolve issues in traditional deep-learning-based methods [ 28 ]. Similarly, domain knowledge in medical image analysis [ 58 ], financial sentiment analysis [ 49 ], and cybersecurity analytics [ 94 , 103 ], as well as conceptual data models that include semantic information (i.e., information that is meaningful for a system, rather than merely correlational) [ 45 , 121 , 131 ], can play a vital role in the area. Transfer learning could be an effective way to get started on a new challenge with domain knowledge. Moreover, contextual information such as spatial, temporal, social, and environmental contexts [ 92 , 104 , 108 ] can also play an important role in combining context-aware computing with domain knowledge, both for smart decision making and for building adaptive and intelligent context-aware systems. Therefore, understanding domain knowledge and effectively incorporating it into deep learning models could be another research direction.

Designing General Deep Learning Framework for Target Application Domains One promising research direction for deep learning-based solutions is to develop a general framework that can handle data diversity, dimensions, stimulation types, etc. Such a general framework would require two key capabilities: an attention mechanism that focuses on the most valuable parts of the input signals, and the ability to capture latent features, which enables the framework to capture distinctive and informative features. Attention models have been a popular research topic because of their intuition, versatility, and interpretability, and they have been employed in various application areas such as computer vision, natural language processing, text or image classification, sentiment analysis, recommender systems, and user profiling [ 13 , 80 ]. An attention mechanism can be implemented using learning algorithms such as reinforcement learning, which can find the most useful part of the input through policy search [ 133 , 134 ]. Similarly, a CNN can be integrated with suitable attention mechanisms to form a general classification framework, where the CNN serves as a feature learning tool that captures features at various levels and ranges. Thus, designing a general deep learning framework that considers attention as well as latent features for target application domains could be another area to contribute to.
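To show what "focusing on the most valuable parts of the input" means computationally, here is a minimal NumPy sketch of scaled dot-product attention, one widely used form of attention. It is our choice for illustration; it is not the reinforcement-learning-based attention of [ 133 , 134 ], and the query, key, and value arrays are toy values.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; the softmax weights decide how much of
    each value (input part) flows into the output."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
K = 5.0 * np.eye(4)                   # four toy "input parts" (keys)
V = rng.normal(size=(4, 3))           # the information carried by each part
Q = np.array([[0.0, 0.0, 5.0, 0.0]])  # a query that matches the third key
out, weights = scaled_dot_product_attention(Q, K, V)
print(np.round(weights, 4))           # almost all weight on index 2
```

The attention weights are also what gives such models part of their interpretability: they can be inspected to see which inputs drove an output.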

To summarize, deep learning is a fairly open topic to which academics can contribute by developing new methods or improving existing methods to handle the above-mentioned concerns and tackle real-world problems in a variety of application areas. This can also help researchers conduct a thorough analysis of an application’s hidden and unexpected challenges to produce more reliable and realistic outcomes. Overall, we can conclude that addressing the above-mentioned issues and proposing effective and efficient techniques could lead to “Future Generation DL” modeling as well as more intelligent and automated applications.

Concluding Remarks

In this article, we have presented a structured and comprehensive view of deep learning technology, which is considered a core part of artificial intelligence as well as data science. It starts with a history of artificial neural networks and moves to recent deep learning techniques and breakthroughs in different applications. Then, the key algorithms in this area, as well as deep neural network modeling in various dimensions are explored. For this, we have also presented a taxonomy considering the variations of deep learning tasks and how they are used for different purposes. In our comprehensive study, we have taken into account not only the deep networks for supervised or discriminative learning but also the deep networks for unsupervised or generative learning, and hybrid learning that can be used to solve a variety of real-world issues according to the nature of problems.

Deep learning, unlike traditional machine learning and data mining algorithms, can produce extremely high-level data representations from enormous amounts of raw data. As a result, it has provided an excellent solution to a variety of real-world problems. A successful deep learning technique requires data-driven modeling appropriate to the characteristics of the raw data. The sophisticated learning algorithms then need to be trained on the collected data and on knowledge related to the target application before the system can assist with intelligent decision-making. Deep learning has been shown to be useful in a wide range of applications and research areas, such as healthcare, sentiment analysis, visual recognition, business intelligence, cybersecurity, and many more, as summarized in the paper.

Finally, we have summarized and discussed the challenges faced, the potential research directions, and future aspects in the area. Although deep learning is considered a black-box solution for many applications due to its poor reasoning and interpretability, addressing the identified challenges and future aspects could lead to future generation deep learning modeling and smarter systems. This can also help researchers carry out in-depth analysis to produce more reliable and realistic outcomes. Overall, we believe that our study on neural networks and deep learning-based advanced analytics points in a promising direction and can be used as a reference guide for future research and implementations in relevant application domains by both academic and industry professionals.


Author information

Authors and affiliations

Iqbal H. Sarker: Swinburne University of Technology, Melbourne, VIC 3122, Australia; Chittagong University of Engineering & Technology, Chittagong 4349, Bangladesh

Corresponding author

Correspondence to Iqbal H. Sarker.

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K. N. and M. Shivakumar.


About this article

Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN COMPUT. SCI. 2, 420 (2021). https://doi.org/10.1007/s42979-021-00815-1


Received: 29 May 2021

Accepted: 07 August 2021

Published: 18 August 2021

DOI: https://doi.org/10.1007/s42979-021-00815-1


Keywords

Deep learning
Artificial neural network
Artificial intelligence
Discriminative learning
Generative learning
Hybrid learning
Intelligent systems

How to Read Research Papers: A Pragmatic Approach for ML Practitioners

Is it necessary for data scientists or machine-learning experts to read research papers?

The short answer is yes. And don’t worry if you lack a formal academic background or have only obtained an undergraduate degree in the field of machine learning.

Reading academic research papers may be intimidating for individuals without an extensive educational background. However, a lack of academic reading experience should not prevent data scientists from taking advantage of a valuable source of information and knowledge for machine learning and AI development.

This article provides a hands-on tutorial for data scientists of any skill level on reading research papers published in academic venues such as NeurIPS, JMLR, and ICML.

Before diving into how to read a research paper, the first phases cover selecting a relevant topic and finding suitable papers.

Step 1: Identify a topic

The domain of machine learning and data science is home to a plethora of subject areas that may be studied. But this does not imply that tackling every topic within machine learning is the best option.

Although breadth is advisable for entry-level practitioners, I suspect that for long-term machine learning career prospects, both practitioners’ and industry’s interest often shift toward specialization.

Identifying a niche topic to work on may be difficult, but it is worthwhile. A useful rule of thumb is to select an ML field in which you either want to obtain a professional position or already have experience.

Deep learning is one of my interests, and I’m a computer vision engineer who professionally uses deep learning models in apps to solve computer vision problems. As a result, I’m interested in topics like pose estimation, action classification, and gesture identification.

Based on roles, the following are examples of ML/DS occupations and related themes to consider.

For this article, I’ll select the topic Pose Estimation to explore and choose associated research papers to study.

Step 2: Finding research papers

One of the best tools for finding machine learning research papers, datasets, code, and other related materials is PapersWithCode.

We use the search engine on the PapersWithCode website to get relevant research papers and content for our chosen topic, “Pose Estimation.” The following image shows you how it’s done.

The search results page contains a short explanation of the searched topic, followed by a table of associated datasets, models, papers, and code. Without going into too much detail, the area of interest for this use case is the “Greatest papers with code” section, which contains papers related to the task or topic. For the purpose of this article, I’ll select DensePose: Dense Human Pose Estimation In The Wild.

Step 3: First pass (gaining context and understanding)

At this point, we’ve selected a research paper to study and are prepared to extract any valuable learnings and findings from its content.

It’s only natural that your first impulse is to start taking notes and reading the document from beginning to end, perhaps resting in between. However, first establishing context for the paper’s content is a more practical way to read it. The title, abstract, and conclusion are the three key parts of any research paper for gaining this initial understanding.

The goal of the first pass of your chosen paper is to achieve the following:

Ensure that the paper is relevant.
Obtain a sense of the paper’s context by learning about its contents, methods, and findings.
Recognize the author’s goals, methodology, and accomplishments.

The title is the first point of information sharing between the authors and the reader. Research paper titles are therefore usually direct and written to leave no ambiguity.

The title is often the most telling element of a paper, since it signals the study’s relevance to your work and gives a brief impression of the paper’s content.

In this situation, the title is “DensePose: Dense Human Pose Estimation in the Wild.” This gives a broad overview of the work and implies that it addresses accurate human pose estimation in realistic, unconstrained scenes (“in the wild”) rather than in controlled settings.

The abstract gives a summarized version of the paper. It’s a short section, typically no more than a few hundred words, that provides an overview of the article’s content and the researchers’ objectives, methods, and techniques.

When reading an abstract of a machine-learning research paper, you’ll typically come across mentions of datasets, methods, algorithms, and other terms. Keywords relevant to the article’s content provide context. It may be helpful to take notes and keep track of all keywords at this point.

For the paper “DensePose: Dense Human Pose Estimation In The Wild,” I identified the following keywords in the abstract: pose estimation, COCO dataset, CNN, region-based models, real-time.

It’s not uncommon to experience fatigue when reading a paper from top to bottom on your first pass, especially for data scientists and practitioners with no prior advanced academic experience. Although extracting information from the later sections of a paper might seem tedious after a long study session, the conclusion section is often short. Hence, reading the conclusion section in the first pass is recommended.

The conclusion section briefly summarizes the authors’ contributions and accomplishments, along with the work’s limitations and directions for future development.

Before reading the main content of a research paper, read the conclusion section to see whether the researchers’ contributions, problem domain, and outcomes match your needs.

This brief first pass gives you a sufficient overview of the research paper’s scope and objectives, as well as context for its content. You can then extract more detailed information by going through the paper again with closer attention.

Step 4: Second pass (content familiarization)

Content familiarization builds on the initial steps of the systematic approach to reading research papers presented in this article. This pass focuses on the introduction section and the figures within the paper.

As mentioned previously, there is no need to plunge straight into the core of the research paper: becoming familiar with its content first makes the examination in later passes easier and more comprehensive.

Introduction

Introductory sections of research papers are written to provide an overview of the objective of the research. They describe the problem domain, research scope, prior research efforts, and methodologies.

It’s normal to find references to past research in this area that uses similar or different methods. These citations convey the scope and breadth of the problem domain and broaden the area the reader can explore. For cited papers you want to follow up on, a first pass as outlined in Step 3 is probably sufficient at this point.

Another benefit of the introduction section is that it presents the background knowledge required to approach and understand the rest of the paper.

Graphs, diagrams, figures

Illustrative materials within a research paper help readers understand the problem definition and the methods presented. Tables are commonly used to report the quantitative performance of novel techniques compared with similar approaches.

Image showing the comparison of DensePose with other single-person pose estimation solutions.

Generally, the visual representation of data and performance helps you develop an intuitive understanding of the paper’s context. In the DensePose paper mentioned earlier, illustrations depict the performance of the authors’ approach to pose estimation and give an overall understanding of the steps involved in generating and annotating data samples.

In the realm of deep learning, it’s common to find topological illustrations depicting the structure of artificial neural networks. Again, these help any reader build an intuitive understanding. Through illustrations and figures, readers can interpret the information themselves and gain a fuller perspective without preconceived notions about what the outcomes should be.

Figure: The cross-cascading architecture of DensePose.

Step 5: Third pass (deep reading)

The third pass is similar to the second, though it covers a greater portion of the text. The most important thing about this pass is to skip any complex mathematics or technique formulations that may be difficult for you. You can also skip over any words and definitions that you don’t understand or aren’t familiar with. Note these unfamiliar terms, algorithms, or techniques so you can return to them later.


During this pass, your primary objective is to gain a broad understanding of what’s covered in the paper. Read the paper again from the abstract to the conclusion, but be sure to take breaks between sections. It’s also recommended to keep a notepad in which you record key insights and takeaways alongside the unfamiliar terms and concepts.

The Pomodoro Technique is an effective method of managing the time allocated to deep reading or study. Put simply, it involves dividing the day into blocks of work, each followed by a short break.

What works for me is the 50/15 split, that is, 50 minutes studying and 15 minutes allocated to breaks. I tend to execute this split twice consecutively before taking a more extended break of 30 minutes. If you are unfamiliar with this time management technique, adopt a relatively easy division such as 25/5 and adjust the time split according to your focus and time capacity.
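The 50/15 routine described above can be written down as a concrete session plan. The sketch below uses a hypothetical helper, `build_schedule`, which is not a tool the author describes. The text does not say whether the 30-minute break replaces the last 15-minute break or follows it, so the sketch assumes it replaces the short break after the final work block of each set.

```python
from typing import List, Optional, Tuple

Block = Tuple[str, int]  # (activity, minutes)


def build_schedule(work: int = 50, short_break: int = 15, blocks_per_set: int = 2,
                   long_break: int = 30, sets: int = 1) -> List[Block]:
    """Return (activity, minutes) pairs for a Pomodoro-style reading session.

    Hypothetical helper. Assumption: the long break replaces the short break
    after the last work block of each set.
    """
    schedule: List[Block] = []
    for _ in range(sets):
        for block in range(blocks_per_set):
            schedule.append(("read", work))
            # A short break follows every work block except the last one in the set.
            if block < blocks_per_set - 1:
                schedule.append(("short break", short_break))
        schedule.append(("long break", long_break))
    return schedule


def total_minutes(schedule: List[Block], activity: Optional[str] = None) -> int:
    """Sum the minutes of all blocks, or only of blocks matching `activity`."""
    return sum(minutes for name, minutes in schedule if activity is None or name == activity)


if __name__ == "__main__":
    plan = build_schedule()  # 50/15, twice, then a 30-minute break
    for activity, minutes in plan:
        print(f"{activity:<12}{minutes:>3} min")
    print("reading time:", total_minutes(plan, "read"), "min")
```

For the gentler 25/5 split suggested for newcomers, call `build_schedule(work=25, short_break=5)`. The text gives no long-break length for that split, so the 30-minute default is also an assumption here.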

Step 6: Fourth pass (final pass)

The final pass is typically the most demanding of your mental and learning abilities, as it involves working through the unfamiliar terms, concepts, and algorithms noted in the previous pass. This pass focuses on using external material to understand those unfamiliar aspects of the paper.

In-depth studies of unfamiliar subjects have no fixed length, and at times the effort spans days or weeks. The critical factor for a successful final pass is locating the appropriate sources for further exploration.

Unfortunately, no single source on the Internet provides all the information you require. Still, multiple sources, when used together and appropriately, can fill knowledge gaps. Below are a few of these resources.

The Machine Learning Subreddit
The Deep Learning Subreddit
PapersWithCode
Top conferences such as NIPS, ICML, and ICLR
ResearchGate
Apple Machine Learning Research

The reference section of a research paper lists the techniques and algorithms that the current paper draws inspiration from or builds upon, which makes it a useful source during your deep reading sessions.

Step 7: Summary (optional)

In almost a decade of academic and professional work on technology-related subjects and roles, the most effective method I’ve found for retaining new information in long-term memory is recapitulating the topics I’ve explored. By rewriting new information in my own words, either written or typed, I’m able to reinforce the presented ideas in an understandable and memorable manner.


To take it one step further, you can publish your learning efforts and notes on blogging platforms and social media. Explaining a freshly explored concept to a broad audience, assuming readers aren’t familiar with the topic, requires understanding it in detail.

Undoubtedly, reading research papers for novice Data Scientists and ML practitioners can be daunting and challenging; even seasoned practitioners find it difficult to digest the content of research papers in a single pass successfully.

The Data Science profession is very practical and hands-on, yet its practitioners also need an academic mindset, all the more because the Data Science domain is closely associated with AI, which is still a developing field.

To summarize, here are all of the steps you should follow to read a research paper:

Identify a topic.
Find associated research papers.
Read the title, abstract, and conclusion to gain a rough understanding of the research effort’s aims and achievements.
Familiarize yourself with the content by diving deeper into the introduction, including the figures and graphs presented in the paper.
Use a deep reading session to digest the main content of the paper as you go through it from top to bottom.
Explore unfamiliar terms, concepts, and methods using external resources.
Summarize essential takeaways, definitions, and algorithms in your own words.

Thanks for reading!


Key Papers in Deep RL

What follows is a list of papers in deep RL that are worth reading. This is far from comprehensive, but should provide a useful starting point for someone looking to do research in the field.

Table of Contents

1. Model-Free RL
2. Exploration
3. Transfer and Multitask RL
4. Hierarchy
5. Memory
6. Model-Based RL
7. Meta-RL
8. Scaling RL
9. RL in the Real World
10. Safety
11. Imitation Learning and Inverse Reinforcement Learning
12. Reproducibility, Analysis, and Critique
13. Bonus: Classic Papers in RL Theory or Review

1. Model-Free RL

A. Deep Q-Learning
B. Policy Gradients
C. Deterministic Policy Gradients
D. Distributional RL
E. Policy Gradients with Action-Dependent Baselines
F. Path-Consistency Learning
G. Other Directions for Combining Policy-Learning and Q-Learning
H. Evolutionary Algorithms

2. Exploration

A. Intrinsic Motivation
B. Unsupervised RL

3. Transfer and Multitask RL

4. Hierarchy

5. Memory

6. Model-Based RL

A. Model is Learned
B. Model is Given

7. Meta-RL

8. Scaling RL

9. RL in the Real World

10. Safety

11. Imitation Learning and Inverse Reinforcement Learning

12. Reproducibility, Analysis, and Critique

13. Bonus: Classic Papers in RL Theory or Review


The most cited deep learning papers

terryum/awesome-deep-learning-papers

Awesome - Most Cited Deep Learning Papers

[Notice] This list is not being maintained anymore because of the overwhelming amount of deep learning papers published every day since 2017.

A curated list of the most cited deep learning papers (2012-2016)

We believe that there are classic deep learning papers worth reading regardless of their application domain. Rather than providing an overwhelming number of papers, we would like to provide a curated list of awesome deep learning papers that are considered must-reads in certain research domains.

Before this list, other awesome deep learning lists existed, for example, Deep Vision and Awesome Recurrent Neural Networks. Also, after this list came out, another awesome list for deep learning beginners, called Deep Learning Papers Reading Roadmap, was created and has been loved by many deep learning researchers.

Although the Roadmap list includes lots of important deep learning papers, it feels overwhelming for me to read them all. As I mentioned in the introduction, I believe that seminal works can give us lessons regardless of their application domain. Thus, I would like to introduce the top 100 deep learning papers here as a good starting point for an overview of deep learning research.

To get news on newly released papers every day, follow my Twitter or Facebook page!

Awesome list criteria

A list of the top 100 deep learning papers published from 2012 to 2016 is suggested.
If a paper is added to the list, another paper (usually from the "More Papers from 2016" section) should be removed to keep the list at 100 papers. (Thus, removing papers is as important a contribution as adding them.)
Papers that are important but not included in the list will be listed in the More than Top 100 section.
Please refer to the New Papers and Old Papers sections for papers published in the last 6 months or before 2012.

(Citation criteria)

< 6 months : New Papers (by discussion)
2016 : +60 citations or "More Papers from 2016"
2015 : +200 citations
2014 : +400 citations
2013 : +600 citations
2012 : +800 citations
Before 2012 : Old Papers (by discussion)

Please note that we prefer seminal deep learning papers that can be applied to various research areas rather than application papers. For that reason, some papers that meet the criteria may not be accepted while others can be. It depends on the impact of the paper, its applicability to other research, the scarcity of the research domain, and so on.
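The numeric part of the citation criteria can be expressed as a simple lookup table. This is a minimal sketch with a hypothetical function name, not code from the repository. It assumes "+N citations" means N or more, and it returns `None` for the cases the list decides by discussion. As the note above says, a `True` result is only a first filter, because impact and applicability also decide whether a paper is accepted.

```python
from typing import Dict, Optional

# Minimum citation counts per publication year, taken from the criteria above.
# Assumption: "+N citations" is read as "N or more citations".
CITATION_THRESHOLDS: Dict[int, int] = {2016: 60, 2015: 200, 2014: 400, 2013: 600, 2012: 800}


def meets_citation_criteria(year: int, citations: int) -> Optional[bool]:
    """Hypothetical first-pass filter for the Top-100 list.

    Returns True/False for papers from 2012-2016, and None for papers outside
    that range (New Papers and Old Papers are decided by discussion).
    A 2016 paper below the threshold may still go to "More Papers from 2016".
    """
    threshold = CITATION_THRESHOLDS.get(year)
    if threshold is None:
        return None
    return citations >= threshold


if __name__ == "__main__":
    for year, cites in [(2014, 450), (2015, 150), (2010, 5000)]:
        print(year, cites, meets_citation_criteria(year, cites))
```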

We need your contributions!

If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit and submit a pull request. (Please read the contributing guide for further instructions, though just letting me know the title of papers can also be a big contribution to us.)

(Update) You can download all top-100 papers with this and collect all authors' names with this. Also, a bib file for all top-100 papers is available. Thanks, doodhwala, Sven, and grepinsight!

Can anyone contribute the code for obtaining the statistics of the authors of Top-100 papers?
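One possible starting point for such author statistics is the hypothetical sketch below. It is not the repository's own tooling. It parses entries written in this list's "Title (year), Authors [pdf]" style and tallies the names with `collections.Counter`. Several assumptions apply. Most entries print only the first author followed by "et al.", so the sketch counts listed names rather than full author lists. Entries without a "(year)," field are skipped. Names are not normalized, so "A. Oord" and "A. van den Oord" are counted as different people.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed entry shape: "Title (2015), K. He et al. [pdf]".
# The year is in parentheses followed by a comma, and the author field runs
# up to the first "[" (for example "[pdf]") or the end of the line.
ENTRY = re.compile(r"\(\d{4}\),\s*(?P<authors>[^\[]+)")
ET_AL = re.compile(r"\bet\.?\s*al\.?")  # handles both "et al." and "et. al."
SEPARATOR = re.compile(r",|\band\b")


def _clean(name: str) -> str:
    # Drop surrounding spaces and a trailing period such as "J. Schmidhuber."
    return name.strip().rstrip(".").strip()


def listed_authors(entry: str) -> List[str]:
    """Return the author names printed in one list entry (hypothetical helper)."""
    match = ENTRY.search(entry)
    if match is None:
        return []  # no "(year)," field, so the entry is skipped
    field = ET_AL.sub("", match.group("authors"))
    names = (_clean(part) for part in SEPARATOR.split(field))
    return [name for name in names if name]


def author_counts(entries: Iterable[str]) -> Counter:
    """Count how often each listed name appears across the entries."""
    counts: Counter = Counter()
    for entry in entries:
        counts.update(listed_authors(entry))
    return counts


if __name__ == "__main__":
    sample = [
        "Deep residual learning for image recognition (2016), K. He et al. [pdf]",
        "Identity Mappings in Deep Residual Networks (2016), K. He et al. [pdf]",
        "Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba [pdf]",
        "Auto-encoding variational Bayes (2013), D. Kingma and M. Welling [pdf]",
    ]
    print(author_counts(sample).most_common(3))
```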

Understanding / Generalization / Transfer
Optimization / Training Techniques
Unsupervised / Generative Models
Convolutional Network Models
Image Segmentation / Object Detection
Image / Video / Etc
Natural Language Processing / RNNs
Speech / Other Domain
Reinforcement Learning / Robotics
More Papers from 2016

(More than Top 100)

New Papers : Less than 6 months
Old Papers : Before 2012
HW / SW / Dataset : Technical reports
Book / Survey / Review
Video Lectures / Tutorials / Blogs
Appendix: More than Top 100 : More papers not in the list

Understanding / Generalization / Transfer

Distilling the knowledge in a neural network (2015), G. Hinton et al. [pdf]
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. [pdf]
How transferable are features in deep neural networks? (2014), J. Yosinski et al. [pdf]
CNN features off-the-Shelf: An astounding baseline for recognition (2014), A. Razavian et al. [pdf]
Learning and transferring mid-Level image representations using convolutional neural networks (2014), M. Oquab et al. [pdf]
Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus [pdf]
Decaf: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. [pdf]

Optimization / Training Techniques

Training very deep networks (2015), R. Srivastava et al. [pdf]
Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy [pdf]
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (2015), K. He et al. [pdf]
Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. [pdf]
Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba [pdf]
Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. [pdf]
Random search for hyper-parameter optimization (2012), J. Bergstra and Y. Bengio [pdf]

Unsupervised / Generative Models

Pixel recurrent neural networks (2016), A. Oord et al. [pdf]
Improved techniques for training GANs (2016), T. Salimans et al. [pdf]
Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. [pdf]
DRAW: A recurrent neural network for image generation (2015), K. Gregor et al. [pdf]
Generative adversarial nets (2014), I. Goodfellow et al. [pdf]
Auto-encoding variational Bayes (2013), D. Kingma and M. Welling [pdf]
Building high-level features using large scale unsupervised learning (2013), Q. Le et al. [pdf]

Convolutional Neural Network Models

Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. [pdf]
Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. [pdf]
Identity Mappings in Deep Residual Networks (2016), K. He et al. [pdf]
Deep residual learning for image recognition (2016), K. He et al. [pdf]
Spatial transformer networks (2015), M. Jaderberg et al. [pdf]
Going deeper with convolutions (2015), C. Szegedy et al. [pdf]
Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman [pdf]
Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. [pdf]
OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. [pdf]
Maxout networks (2013), I. Goodfellow et al. [pdf]
Network in network (2013), M. Lin et al. [pdf]
ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. [pdf]

Image Segmentation / Object Detection

You only look once: Unified, real-time object detection (2016), J. Redmon et al. [pdf]
Fully convolutional networks for semantic segmentation (2015), J. Long et al. [pdf]
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. [pdf]
Fast R-CNN (2015), R. Girshick [pdf]
Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. [pdf]
Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. [pdf]
Semantic image segmentation with deep convolutional nets and fully connected CRFs, L. Chen et al. [pdf]
Learning hierarchical features for scene labeling (2013), C. Farabet et al. [pdf]

Image / Video / Etc

Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. [pdf]
A neural algorithm of artistic style (2015), L. Gatys et al. [pdf]
Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei [pdf]
Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. [pdf]
Show and tell: A neural image caption generator (2015), O. Vinyals et al. [pdf]
Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. [pdf]
VQA: Visual question answering (2015), S. Antol et al. [pdf]
DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. [pdf]
Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. [pdf]
Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. [pdf]
3D convolutional neural networks for human action recognition (2013), S. Ji et al. [pdf]

Natural Language Processing / RNNs

Neural Architectures for Named Entity Recognition (2016), G. Lample et al. [pdf]
Exploring the limits of language modeling (2016), R. Jozefowicz et al. [pdf]
Teaching machines to read and comprehend (2015), K. Hermann et al. [pdf]
Effective approaches to attention-based neural machine translation (2015), M. Luong et al. [pdf]
Conditional random fields as recurrent neural networks (2015), S. Zheng and S. Jayasumana. [pdf]
Memory networks (2014), J. Weston et al. [pdf]
Neural turing machines (2014), A. Graves et al. [pdf]
Neural machine translation by jointly learning to align and translate (2014), D. Bahdanau et al. [pdf]
Sequence to sequence learning with neural networks (2014), I. Sutskever et al. [pdf]
Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014), K. Cho et al. [pdf]
A convolutional neural network for modeling sentences (2014), N. Kalchbrenner et al. [pdf]
Convolutional neural networks for sentence classification (2014), Y. Kim [pdf]
Glove: Global vectors for word representation (2014), J. Pennington et al. [pdf]
Distributed representations of sentences and documents (2014), Q. Le and T. Mikolov [pdf]
Distributed representations of words and phrases and their compositionality (2013), T. Mikolov et al. [pdf]
Efficient estimation of word representations in vector space (2013), T. Mikolov et al. [pdf]
Recursive deep models for semantic compositionality over a sentiment treebank (2013), R. Socher et al. [pdf]
Generating sequences with recurrent neural networks (2013), A. Graves. [pdf]

Speech / Other Domain

End-to-end attention-based large vocabulary speech recognition (2016), D. Bahdanau et al. [pdf]
Deep speech 2: End-to-end speech recognition in English and Mandarin (2015), D. Amodei et al. [pdf]
Speech recognition with deep recurrent neural networks (2013), A. Graves [pdf]
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012), G. Hinton et al. [pdf]
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012), G. Dahl et al. [pdf]
Acoustic modeling using deep belief networks (2012), A. Mohamed et al. [pdf]

Reinforcement Learning / Robotics

End-to-end training of deep visuomotor policies (2016), S. Levine et al. [pdf]
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection (2016), S. Levine et al. [pdf]
Asynchronous methods for deep reinforcement learning (2016), V. Mnih et al. [pdf]
Deep Reinforcement Learning with Double Q-Learning (2016), H. Hasselt et al. [pdf]
Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al. [pdf]
Continuous control with deep reinforcement learning (2015), T. Lillicrap et al. [pdf]
Human-level control through deep reinforcement learning (2015), V. Mnih et al. [pdf]
Deep learning for detecting robotic grasps (2015), I. Lenz et al. [pdf]
Playing atari with deep reinforcement learning (2013), V. Mnih et al. [pdf]

More Papers from 2016

Layer Normalization (2016), J. Ba et al. [pdf]
Learning to learn by gradient descent by gradient descent (2016), M. Andrychowicz et al. [pdf]
Domain-adversarial training of neural networks (2016), Y. Ganin et al. [pdf]
WaveNet: A Generative Model for Raw Audio (2016), A. Oord et al. [pdf] [web]
Colorful image colorization (2016), R. Zhang et al. [pdf]
Generative visual manipulation on the natural image manifold (2016), J. Zhu et al. [pdf]
Texture networks: Feed-forward synthesis of textures and stylized images (2016), D Ulyanov et al. [pdf]
SSD: Single shot multibox detector (2016), W. Liu et al. [pdf]
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size (2016), F. Iandola et al. [pdf]
EIE: Efficient inference engine on compressed deep neural network (2016), S. Han et al. [pdf]
Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1 (2016), M. Courbariaux et al. [pdf]
Dynamic memory networks for visual and textual question answering (2016), C. Xiong et al. [pdf]
Stacked attention networks for image question answering (2016), Z. Yang et al. [pdf]
Hybrid computing using a neural network with dynamic external memory (2016), A. Graves et al. [pdf]
Google's neural machine translation system: Bridging the gap between human and machine translation (2016), Y. Wu et al. [pdf]

Newly published papers (< 6 months) which are worth reading

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017), Andrew G. Howard et al. [pdf]
Convolutional Sequence to Sequence Learning (2017), Jonas Gehring et al. [pdf]
A Knowledge-Grounded Neural Conversation Model (2017), Marjan Ghazvininejad et al. [pdf]
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (2017), Priya Goyal et al. [pdf]
TACOTRON: Towards end-to-end speech synthesis (2017), Y. Wang et al. [pdf]
Deep Photo Style Transfer (2017), F. Luan et al. [pdf]
Evolution Strategies as a Scalable Alternative to Reinforcement Learning (2017), T. Salimans et al. [pdf]
Deformable Convolutional Networks (2017), J. Dai et al. [pdf]
Mask R-CNN (2017), K. He et al. [pdf]
Learning to discover cross-domain relations with generative adversarial networks (2017), T. Kim et al. [pdf]
Deep voice: Real-time neural text-to-speech (2017), S. Arik et al. [pdf]
PixelNet: Representation of the pixels, by the pixels, and for the pixels (2017), A. Bansal et al. [pdf]
Batch renormalization: Towards reducing minibatch dependence in batch-normalized models (2017), S. Ioffe. [pdf]
Wasserstein GAN (2017), M. Arjovsky et al. [pdf]
Understanding deep learning requires rethinking generalization (2017), C. Zhang et al. [pdf]
Least squares generative adversarial networks (2016), X. Mao et al. [pdf]

Classic papers published before 2012

An analysis of single-layer networks in unsupervised feature learning (2011), A. Coates et al. [pdf]
Deep sparse rectifier neural networks (2011), X. Glorot et al. [pdf]
Natural language processing (almost) from scratch (2011), R. Collobert et al. [pdf]
Recurrent neural network based language model (2010), T. Mikolov et al. [pdf]
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion (2010), P. Vincent et al. [pdf]
Learning mid-level features for recognition (2010), Y. Boureau [pdf]
A practical guide to training restricted boltzmann machines (2010), G. Hinton [pdf]
Understanding the difficulty of training deep feedforward neural networks (2010), X. Glorot and Y. Bengio [pdf]
Why does unsupervised pre-training help deep learning (2010), D. Erhan et al. [pdf]
Learning deep architectures for AI (2009), Y. Bengio. [pdf]
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations (2009), H. Lee et al. [pdf]
Greedy layer-wise training of deep networks (2007), Y. Bengio et al. [pdf]
Reducing the dimensionality of data with neural networks, G. Hinton and R. Salakhutdinov. [pdf]
A fast learning algorithm for deep belief nets (2006), G. Hinton et al. [pdf]
Gradient-based learning applied to document recognition (1998), Y. LeCun et al. [pdf]
Long short-term memory (1997), S. Hochreiter and J. Schmidhuber. [pdf]

HW / SW / Dataset

SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016), Rajpurkar et al. [pdf]
OpenAI gym (2016), G. Brockman et al. [pdf]
TensorFlow: Large-scale machine learning on heterogeneous distributed systems (2016), M. Abadi et al. [pdf]
Theano: A Python framework for fast computation of mathematical expressions, R. Al-Rfou et al.
Torch7: A matlab-like environment for machine learning, R. Collobert et al. [pdf]
MatConvNet: Convolutional neural networks for matlab (2015), A. Vedaldi and K. Lenc [pdf]
Imagenet large scale visual recognition challenge (2015), O. Russakovsky et al. [pdf]
Caffe: Convolutional architecture for fast feature embedding (2014), Y. Jia et al. [pdf]

Book / Survey / Review

On the Origin of Deep Learning (2017), H. Wang and Bhiksha Raj. [pdf]
Deep Reinforcement Learning: An Overview (2017), Y. Li [pdf]
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (2017), G. Neubig [pdf]
Neural Network and Deep Learning (Book, Jan 2017), Michael Nielsen. [html]
Deep learning (Book, 2016), Goodfellow et al. [html]
LSTM: A search space odyssey (2016), K. Greff et al. [pdf]
Tutorial on Variational Autoencoders (2016), C. Doersch. [pdf]
Deep learning (2015), Y. LeCun, Y. Bengio and G. Hinton [pdf]
Deep learning in neural networks: An overview (2015), J. Schmidhuber [pdf]
Representation learning: A review and new perspectives (2013), Y. Bengio et al. [pdf]

Video Lectures / Tutorials / Blogs

(Lectures)

CS231n, Convolutional Neural Networks for Visual Recognition, Stanford University [web]
CS224d, Deep Learning for Natural Language Processing, Stanford University [web]
Oxford Deep NLP 2017, Deep Learning for Natural Language Processing, University of Oxford [web]

(Tutorials)

NIPS 2016 Tutorials, Long Beach [web]
ICML 2016 Tutorials, New York City [web]
ICLR 2016 Videos, San Juan [web]
Deep Learning Summer School 2016, Montreal [web]
Bay Area Deep Learning School 2016, Stanford [web]

(Blogs)

OpenAI [web]
Distill [web]
Andrej Karpathy Blog [web]
Colah's Blog [web]
WildML [web]
FastML [web]
TheMorningPaper [web]

Appendix: More than Top 100

A character-level decoder without explicit segmentation for neural machine translation (2016), J. Chung et al. [pdf]
Dermatologist-level classification of skin cancer with deep neural networks (2017), A. Esteva et al. [html]
Weakly supervised object localization with multi-fold multiple instance learning (2017), R. Gokberk et al. [pdf]
Brain tumor segmentation with deep neural networks (2017), M. Havaei et al. [pdf]
Professor Forcing: A New Algorithm for Training Recurrent Networks (2016), A. Lamb et al. [pdf]
Adversarially learned inference (2016), V. Dumoulin et al. [web] [pdf]
Understanding convolutional neural networks (2016), J. Koushik [pdf]
Taking the human out of the loop: A review of bayesian optimization (2016), B. Shahriari et al. [pdf]
Adaptive computation time for recurrent neural networks (2016), A. Graves [pdf]
Densely connected convolutional networks (2016), G. Huang et al. [pdf]
Region-based convolutional networks for accurate object detection and segmentation (2016), R. Girshick et al.
Continuous deep q-learning with model-based acceleration (2016), S. Gu et al. [pdf]
A thorough examination of the cnn/daily mail reading comprehension task (2016), D. Chen et al. [pdf]
Achieving open vocabulary neural machine translation with hybrid word-character models, M. Luong and C. Manning. [pdf]
Very Deep Convolutional Networks for Natural Language Processing (2016), A. Conneau et al. [pdf]
Bag of tricks for efficient text classification (2016), A. Joulin et al. [pdf]
Efficient piecewise training of deep structured models for semantic segmentation (2016), G. Lin et al. [pdf]
Learning to compose neural networks for question answering (2016), J. Andreas et al. [pdf]
Perceptual losses for real-time style transfer and super-resolution (2016), J. Johnson et al. [pdf]
Reading text in the wild with convolutional neural networks (2016), M. Jaderberg et al. [pdf]
What makes for effective detection proposals? (2016), J. Hosang et al. [pdf]
Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks (2016), S. Bell et al. [pdf]
Instance-aware semantic segmentation via multi-task network cascades (2016), J. Dai et al. [pdf]
Conditional image generation with pixelcnn decoders (2016), A. van den Oord et al. [pdf]
Deep networks with stochastic depth (2016), G. Huang et al. [pdf]
Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics (2016), Yee Whye Teh et al. [pdf]
Ask your neurons: A neural-based approach to answering questions about images (2015), M. Malinowski et al. [pdf]
Exploring models and data for image question answering (2015), M. Ren et al. [pdf]
Are you talking to a machine? dataset and methods for multilingual image question (2015), H. Gao et al. [pdf]
Mind's eye: A recurrent visual representation for image caption generation (2015), X. Chen and C. Zitnick. [pdf]
From captions to visual concepts and back (2015), H. Fang et al. [pdf]
Towards AI-complete question answering: A set of prerequisite toy tasks (2015), J. Weston et al. [pdf]
Ask me anything: Dynamic memory networks for natural language processing (2015), A. Kumar et al. [pdf]
Unsupervised learning of video representations using LSTMs (2015), N. Srivastava et al. [pdf]
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding (2015), S. Han et al. [pdf]
Improved semantic representations from tree-structured long short-term memory networks (2015), K. Tai et al. [pdf]
Character-aware neural language models (2015), Y. Kim et al. [pdf]
Grammar as a foreign language (2015), O. Vinyals et al. [pdf]
Trust Region Policy Optimization (2015), J. Schulman et al. [pdf]
Beyond short snippets: Deep networks for video classification (2015) [pdf]
Learning Deconvolution Network for Semantic Segmentation (2015), H. Noh et al. [pdf]
Learning spatiotemporal features with 3d convolutional networks (2015), D. Tran et al. [pdf]
Understanding neural networks through deep visualization (2015), J. Yosinski et al. [pdf]
An Empirical Exploration of Recurrent Network Architectures (2015), R. Jozefowicz et al. [pdf]
Deep generative image models using a laplacian pyramid of adversarial networks (2015), E. Denton et al. [pdf]
Gated Feedback Recurrent Neural Networks (2015), J. Chung et al. [pdf]
Fast and accurate deep network learning by exponential linear units (ELUs) (2015), D. Clevert et al. [pdf]
Pointer networks (2015), O. Vinyals et al. [pdf]
Visualizing and Understanding Recurrent Networks (2015), A. Karpathy et al. [pdf]
Attention-based models for speech recognition (2015), J. Chorowski et al. [pdf]
End-to-end memory networks (2015), S. Sukhbaatar et al. [pdf]
Describing videos by exploiting temporal structure (2015), L. Yao et al. [pdf]
A neural conversational model (2015), O. Vinyals and Q. Le. [pdf]
Improving distributional similarity with lessons learned from word embeddings, O. Levy et al. [pdf] (https://www.transacl.org/ojs/index.php/tacl/article/download/570/124)
Transition-Based Dependency Parsing with Stack Long Short-Term Memory (2015), C. Dyer et al. [pdf]
Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs (2015), M. Ballesteros et al. [pdf]
Finding function in form: Compositional character models for open vocabulary word representation (2015), W. Ling et al. [pdf]
DeepPose: Human pose estimation via deep neural networks (2014), A. Toshev and C. Szegedy [pdf]
Learning a Deep Convolutional Network for Image Super-Resolution (2014), C. Dong et al. [pdf]
Recurrent models of visual attention (2014), V. Mnih et al. [pdf]
Empirical evaluation of gated recurrent neural networks on sequence modeling (2014), J. Chung et al. [pdf]
Addressing the rare word problem in neural machine translation (2014), M. Luong et al. [pdf]
On the properties of neural machine translation: Encoder-decoder approaches (2014), K. Cho et al.
Recurrent neural network regularization (2014), W. Zaremba et al. [pdf]
Intriguing properties of neural networks (2014), C. Szegedy et al. [pdf]
Towards end-to-end speech recognition with recurrent neural networks (2014), A. Graves and N. Jaitly [pdf]
Scalable object detection using deep neural networks (2014), D. Erhan et al. [pdf]
On the importance of initialization and momentum in deep learning (2013), I. Sutskever et al. [pdf]
Regularization of neural networks using dropconnect (2013), L. Wan et al. [pdf]
Learning Hierarchical Features for Scene Labeling (2013), C. Farabet et al. [pdf]
Linguistic Regularities in Continuous Space Word Representations (2013), T. Mikolov et al. [pdf]
Large scale distributed deep networks (2012), J. Dean et al. [pdf]
A Fast and Accurate Dependency Parser using Neural Networks (2014), D. Chen and C. Manning [pdf]

Acknowledgement

Thank you for all your contributions. Please make sure to read the contributing guide before you make a pull request.

To the extent possible under law, Terry T. Um has waived all copyright and related or neighboring rights to this work.


How to read Machine Learning and Deep Learning Research papers

Tips on preparing a literature survey of a field and on how to read ML/DL research papers, including the three-pass method for reading them.

Jul 31, 2021 • Sai Amrit Patnaik • 23 min read


Introduction


Knowing how to read a research paper is probably the most important skill to master for anyone doing research, or for anyone who simply wants to stay up to date with the latest advances in a field. When someone starts out in a domain, the first advice they usually get is to look for the relevant literature and read papers to develop an understanding of the domain. Papers are the most reliable and up-to-date source of information about a particular domain. A research paper is the result of days of brainstorming ideas and of structured, systematic experimentation to express an approach.

But why is reading papers considered such an important skill? Why is reading papers necessary at all? Let's first look at some motivation for why reading papers is important for keeping up with the latest advances.

This article is the summary of a talk that I delivered for the Introductory Paper Reading Session, generously supported by Weights and Biases. The recorded version can be found here and the slides can be found here.

The field of deep learning has grown very rapidly in recent years. One way to quantify the growth of a field is the number of papers that come out every day. Here is an illustration from one of the studies by arXiv, the platform where almost all papers, whether published at a venue or not, are posted.

From the figure we can see that the average number of papers has grown fivefold, from about 300 papers per month in 2017 to around 1,500 papers per month in 2019. The figure is probably close to or above 2,000 papers per month in 2021. That is a huge number of papers coming out every day, and it shows how dynamic the field currently is: the number of papers, new ideas and experiments keeps growing rapidly.

Let’s look at another figure, from another study by arXiv:

From this figure, the number of papers in Computer Science has grown along a stepped, exponential-looking curve: around 36k papers now come out each year, of which around 24k are in ML and DL. Both figures also show that DL (in green) and CV (in yellow) have been among the dominant areas by share of papers every year since the early 2000s, and that CV has grown and opened up considerably since 2012, probably because that is when prominent work on image classification with deep networks showed a significant jump in performance. These studies show how fast the field of Computer Science is growing and, within it, how quickly the sub-areas related to Machine Learning and Deep Learning are evolving.

I hope these figures give a good idea of how fast the field has been evolving, and it is likely to keep evolving even faster in the future. But in such a fast-evolving field, how can we keep up with the pace and develop expertise?

Quoting Dr. Jennifer Raff: “To form a truly educated opinion on a scientific subject, you need to become familiar with current research in that field. And to be able to distinguish between good and bad interpretations of research, you have to be willing and able to read the primary research literature for yourself.”

To have a better grasp and understanding of the field: There may be many video lectures and books on a particular field, but at the rate the field is growing, no book or video lecture can incorporate the latest information as soon as it is published. Research papers therefore provide the most up-to-date and reliable information in the field.
To be able to contribute novel ideas to the field: When we start working in a field, the first thing we are advised to do is an extensive literature survey, going through the latest papers published in the field to date. Reading papers gives us a very good understanding of the directions of work in the field and of how the people actively working in it think. Only then can we start coming up with our own ideas to experiment with.
To develop confidence in the field: Once we learn about the latest work and develop a good understanding through an extensive literature survey, we gain the confidence to perform more experiments and explore the field more deeply.
Most condensed and authentic source of the latest knowledge in the field: A research paper comes out of days, months, or sometimes even years of brainstorming ideas, performing extensive experiments and validating the expected outcomes. A paper is where the authors express these experiments and ideas in their most condensed form. New state-of-the-art work enters the field through research papers: they are how work that pushes the limits of knowledge in a field comes out.

Motivated enough?

Now that we have seen why we should read research papers, let's look at how to do a literature survey of a domain.

Let's do it!

Literature survey of a domain

The basic steps to perform a literature survey of a field are the following:

Assemble a collection of resources in the form of research papers, Medium articles, blog posts, videos, GitHub repositories, etc.
Conduct a deep dive to separate the relevant material from the irrelevant.
Take structured notes that summarise the key discoveries, findings and techniques within each paper.

We shall take pose estimation as an example domain and go through each step.

First, we collect all the resources available in the field, in our case pose estimation, in the form of blog posts, GitHub repositories, Medium articles and research papers. The important question here is: where can we find relevant resources in the field?

Following are sources where we can find the latest papers and resources:

Twitter: We can follow top researchers, groups and labs actively working and publishing in our field and stay updated on what they are currently working on.
ML subreddit
arXiv: The platform where almost all papers are uploaded, whether or not they have been accepted at a conference.
Arxiv Sanity Preserver: Created by Andrej Karpathy; it uses ML techniques to suggest relevant papers based on previous searches and interests.
Papers With Code: Links each paper's abstract page on arXiv to open-source implementations, the datasets used and other analysis and metadata, such as the current state-of-the-art method and a performance comparison of previous methods in the field.
Top ML and DL conferences (CVPR, ICCV, NeurIPS, ICML, ICLR, etc.): The proceedings of these conferences are a great place to look for the latest accepted work in the areas each conference covers.
Google Search
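For the arXiv source, the public arXiv API can be queried directly instead of through the website. The sketch below is a minimal example under a few assumptions: it uses the `export.arxiv.org/api/query` endpoint with its `search_query`, `start`, `max_results`, `sortBy` and `sortOrder` parameters, it searches all fields for an exact phrase, and the helper names `build_arxiv_query` and `parse_arxiv_feed` are our own. It keeps only the title, link and date of each entry, which is enough to fill the first column of a survey table.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace of arXiv's Atom feed


def build_arxiv_query(phrase: str, max_results: int = 20) -> str:
    """Build a URL that searches all fields for an exact phrase, newest first."""
    params = {
        "search_query": f'all:"{phrase}"',
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, abstract-page URL and first-version date from each entry."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", "", ATOM)
        papers.append({
            "title": " ".join(title.split()),  # titles may contain line breaks
            "url": entry.findtext("atom:id", "", ATOM).strip(),
            "published": entry.findtext("atom:published", "", ATOM)[:10],
        })
    return papers


# Fetching requires network access, e.g.:
# import urllib.request
# with urllib.request.urlopen(build_arxiv_query("pose estimation")) as resp:
#     for p in parse_arxiv_feed(resp.read().decode("utf-8")):
#         print(p["published"], p["title"])
```

The parser works on any Atom string, so it can be tested offline; only the commented lines touch the network. arXiv asks API users to space out repeated requests, so add a delay if you run many queries in a loop.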

Once we have listed all the papers we wish to look at and all the resources we could find, relevant or not, we can prepare a table in the format shown in figure 3 and list all the collected resources in its first column.

With all the resources listed in a table like the one shown in figure 3, the next step is to keep the relevant resources and reject the unnecessary ones that are not directly related to what we want to work on or to our research objectives. The following steps do that:

For each listed resource or research paper, read about 10% of it (a first-pass reading, which we discuss later). If it is not related to our research objective, reject it.

If the resource is related to our objective and is relevant and important to us, read the whole paper in a complete pass. If we find other relevant papers among its references, mark them in the original paper, add them to the list, and repeat the same process for each new paper or resource.
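The triage procedure above can also be kept as data rather than as a spreadsheet. The following is a minimal sketch, assuming our own field names (`percent_read`, `relevant`, `found_in`) for the columns of a survey table; the 10% first pass and the rule of adding references found during a full read come from the steps above, and the resource titles are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    title: str
    kind: str                        # e.g. "paper", "blog post", "repository"
    percent_read: int = 0            # rough share of the resource read so far
    relevant: Optional[bool] = None  # None until the first pass is done
    found_in: str = ""               # resource whose references led us here


def first_pass(res: Resource, relevant: bool) -> None:
    """Record a ~10% skim and the keep/reject decision it led to."""
    res.percent_read = max(res.percent_read, 10)
    res.relevant = relevant


def full_read(res: Resource, survey: List[Resource], new_refs: List[str]) -> None:
    """Mark a relevant resource as fully read and queue unseen references."""
    if not res.relevant:
        raise ValueError(f"{res.title!r} was not marked relevant")
    res.percent_read = 100
    known = {r.title for r in survey}
    for ref in new_refs:
        if ref not in known:  # avoid listing the same paper twice
            survey.append(Resource(ref, "paper", found_in=res.title))
            known.add(ref)


def print_table(survey: List[Resource]) -> None:
    """Print the survey as a plain-text table."""
    print(f"{'Resource':<30} {'Read':>5}  Relevant")
    for r in survey:
        status = {True: "yes", False: "no", None: "?"}[r.relevant]
        print(f"{r.title:<30} {r.percent_read:>4}%  {status}")


# Usage with placeholder titles:
survey = [Resource("Paper A", "paper"), Resource("Blog post B", "blog post")]
first_pass(survey[0], relevant=True)
first_pass(survey[1], relevant=False)
full_read(survey[0], survey, new_refs=["Paper C", "Paper A"])
print_table(survey)
```

Newly discovered references enter the table with `relevant=None`, so they show up as still needing their own first pass.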

After this, the final table might look like this:

Notice that the 2nd, 4th and 6th resources were important and relevant, so we read them in detail. The other ones were either not very important or not relevant at all, so we read only as much of each as was necessary and left the rest.

Such a table is really useful when we come back to it after some months or years to recall what we have read, or which papers we have already looked at and rejected. It saves a lot of time otherwise spent iterating over unnecessary resources and helps us dedicate our time to the useful ones.

Once we have decided which papers to read, the next step is taking notes, and how to go about it depends on the individual. I personally use an annotation tool to annotate different sections of the paper in the way I am most comfortable with. I prepare flow charts for the entire flow of the paper, write explanatory notes on it, and summarise each paper to the best of my understanding in a GitHub repository. Here I would like to give a shoutout to Akshay Uppal, who generously shared his blog post with his annotated version of the MLP Mixer paper for the Weights and Biases paper reading group. I also wish to share one of my literature survey repositories from when I started working in the field of face spoofing.

Tip: Use whatever approach makes you comfortable with the content, and organise your notes on GitHub, Notion, Google Docs, etc.

The majority of papers follow, more or less, the same convention of organization:

Title: Hopefully catchy! Includes additional info about the authors and their institutions.
Abstract: High level summary of the entire work of the paper.
Introduction: Background info on the field and related research leading up to this paper.
Related works: Describe the already existing literature on the particular domain.
Methods: Highly detailed section on the study that was conducted, how it was set up, any instruments used, and finally, the process and workflow.
Results: Authors talk about the data that was created or collected; this section should read as an unbiased account of what occurred.
Discussions: Here is where authors interpret the results, and convince the readers of their findings and hypothesis.
References: Any other work that was cited in the body of the text will show up here.
Appendix: More figures, additional treatments on related math, or extra items of interest can find their way in an appendix.

Finally, we come to the most awaited section of the blog post!

Now that we know the different sections of a paper, to understand how to read a paper we need to understand how an author writes one. The author's intention in writing a paper is to get it accepted at a conference. At conferences, reviewers read all the submissions and make a decision based on the work and on the scope and expectations of the conference. Let's take a quick, very high-level look at how the review process works.

Warning: Reading a paper sequentially one section after another is not a good option.

Most top conferences have two submission deadlines: first, the abstract submission deadline; second, the actual paper submission deadline. So why exactly are there two deadlines? Having a separate abstract deadline, even before the paper deadline, implies that the abstract is an important part of the paper. But why is the abstract important?

Note: When considering submitting to a conference, always note that there are two deadlines: one for the abstract submission and one for the full paper submission.

Every year, a lot of papers get submitted to each conference. Top conferences receive thousands of submissions, and it is not feasible to read through all of them in full, no matter how many reviewers the conference has. So to make the review process easier and quicker, there are guidelines on how the different sections of a paper must be written, and reviewers read papers following the same pattern.

The first level of review is always abstract filtering. The abstract is supposed to summarise the entire work briefly, stating the problem statement and the solution clearly and concisely. If the abstract does not satisfy these criteria, the paper gets rejected at this filtering stage, so the abstract must clearly explain the gist of the work. For the same reason, when we read a paper, the abstract is where we find its gist stated clearly and briefly, which is why we read it first to get an overall idea of the entire work. Authors also spend a lot of effort on one figure that gives a visual illustration of the entire approach or a complete flow chart of the work. This figure also contains the gist of the paper's method: the authors try to pack as much information about their work as possible into that single figure.

Note: The abstract is one of the most important sections of a paper: it explains the gist of the entire paper in brief, while the most important figure summarises the method adopted.

Reviewers then read the introduction, which should explain the problem statement in detail, along with the main proposal and the contributions of the paper. Immediately after that, once they know what the paper is assuming, they read the conclusion, which states the outcome of the work and whether the assumptions and expectations presented in the introduction were satisfied.

Note: The introduction is supposed to explain the problem statement in detail and the major contributions of the paper; this is where we learn the author's intent. The conclusion states whether the assumptions and propositions given in the introduction were validated through experiments and proofs.

Once it is clear that the propositions were validated successfully, the method section is read in detail to see what approach was taken to achieve the goal. The discussion section explains, through the experiments, why exactly the proposed method works. This is basically how a reviewer reads a paper, and readers like us should take the same approach.

Three-pass approach to reading a research paper

Research papers are read in three passes. The content covered in each pass follows the review procedure discussed in the last section. The three passes are:

First Pass: Read the title, the abstract and the summarising figure. You should be able to answer the five C’s (Category, Context, Correctness, Contribution, Clarity).
Second Pass: Read the introduction, the conclusion and the remaining figures, and skim the rest of the sections (ignoring details such as mathematical derivations, proofs, etc.).
Third Pass: Read the entire paper with the intention of reimplementing it.
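The tips in this post tie the depth of reading to how relevant a paper is: papers that fail the first pass stop there, a second pass suits papers that are interesting but not directly related to your goal, and a third pass is reserved for papers central to your objective. The sketch below encodes that rule; the `Relevance` labels and the `reading_plan` helper are our own hypothetical names, not part of any tool.

```python
from enum import Enum
from typing import List


class Relevance(Enum):
    UNRELATED = 1    # rejected after the first pass
    INTERESTING = 2  # interesting, but not directly tied to our goal
    CORE = 3         # directly related to our research objective


PASS_FOCUS = {
    1: "Title, abstract and summarising figure; answer the five C's",
    2: "Introduction, conclusion and remaining figures; skim the rest",
    3: "Entire paper, with the intention of reimplementing it",
}


def reading_plan(relevance: Relevance) -> List[str]:
    """Return the passes worth doing for a paper of the given relevance."""
    depth = relevance.value  # 1, 2 or 3 passes
    return [f"Pass {n}: {PASS_FOCUS[n]}" for n in range(1, depth + 1)]


for level in Relevance:
    print(level.name, "->", len(reading_plan(level)), "pass(es)")
```

In practice the relevance level is only known after each pass, so the plan is re-evaluated at every step rather than fixed up front.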

Let's go through each pass in detail.

First Pass

The main intention of the first pass is to understand the overall gist of the paper and get a bird's-eye view of it: to get into the author's intent regarding the problem statement and the thought process behind the solution. This pass focuses on the abstract and the summarising figure, from which we extract the best possible information about the problem the paper addresses, the solution and the method. The first pass covers the following points:

Read through the title, the abstract and the summarising figure.
Skip all other details of the paper.
Glance at the paper and understand its overall structure.
Category: What category of paper is it: an architecture paper, a new training strategy, a new loss function, a review paper, etc.?
Context: Which previous works and areas does it relate to? E.g. the DenseNet paper is an architecture paper, and it falls into the context of ResNet-style network architectures.
Correctness: How correct and valid is the problem statement the paper addresses, and how correct does the proposed solution sound? Honestly, this can't be fully judged from the first pass alone, since a complete understanding of correctness requires the conclusion section, but judge it as best you can with your current knowledge.
Contribution: What exactly does the paper contribute to the community? E.g. the ResNet paper contributed the residual block and the skip-connection architecture.
Clarity: How clearly does the abstract explain the problem statement and the approach to it?
Based on our understanding from the first pass, we decide whether to continue with the paper into further passes for a detailed study, or stop.
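Since the post recommends keeping notes on GitHub, Notion or Google Docs, the first-pass answers can be written out as a small Markdown note. This is a minimal sketch: `first_pass_note` and `unanswered` are hypothetical helper names, and the example answers for the ResNet paper follow the examples in the list above, with the remaining C's deliberately left open.

```python
from typing import Dict, List

FIVE_CS = {
    "Category": "What kind of paper is it?",
    "Context": "Which previous works and areas does it relate to?",
    "Correctness": "Are the problem statement and proposed solution valid?",
    "Contribution": "What exactly does it contribute?",
    "Clarity": "How clearly does the abstract explain problem and approach?",
}


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return the C's that still have no answer, in the canonical order."""
    return [c for c in FIVE_CS if not answers.get(c, "").strip()]


def first_pass_note(title: str, answers: Dict[str, str]) -> str:
    """Render the five C's as a Markdown note; blank answers become TODO."""
    lines = [f"# {title}", "", "## First pass: the five C's", ""]
    for c, question in FIVE_CS.items():
        answer = answers.get(c, "").strip() or "TODO"
        lines.append(f"- **{c}** ({question}): {answer}")
    return "\n".join(lines)


answers = {
    "Category": "Architecture paper",
    "Contribution": "Residual block and skip-connection architecture",
}
print(first_pass_note("Deep Residual Learning for Image Recognition", answers))
print("Still open:", ", ".join(unanswered(answers)))
```

The TODO markers make it easy to see which C's still need the conclusion or a second pass before deciding whether to continue.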

When discussing literature surveys, I mentioned reading 10% of each resource to figure out whether it is relevant to us. That 10% basically means doing a first pass over each resource.

Note: After the first pass, we understand the gist of the paper and get into the intent and thinking of the author.

Second Pass

With the overall gist from the first pass, we head on to the second pass. Its main intention is to understand the paper in a little more detail: understanding the problem statement thoroughly, checking whether the paper validates the propositions it set out to prove, understanding the method in detail, and understanding the experiments well through the discussion section. In the second pass we do the following:

Read the introduction, conclusion and other figures more in depth.
Skip the literature survey, mathematical derivations, proofs and anything else that seems complicated and needs extended study of the references or other resources.
Understand the other figures in the paper properly and develop intuition about the tables, charts and analysis presented. These figures contain a lot of latent information and explain a lot more, so it is important to extract the maximum understanding from them.
Discuss the gist of the paper and its main contents with a friend or colleague.
Mark relevant references that may need to be revisited later.
Decide whether to go forward or stop based on this pass.

After the second pass, we have a good understanding of the paper's method, its experiments and the conclusions drawn from them. Depending on that understanding, we go on to the next pass.

Tip: A second pass is suitable for papers you are interested in but that are not from your field or not directly related to your research goal.

Third Pass

After gaining a more in-depth understanding of the paper in the second pass, we go on to the final and most detailed pass. This pass is only for the papers that matter most for our research and are directly related to the objective we are working on. The key points of a third pass are:

Read with the intention of reimplementing the paper.
Consider every minor assumption and detail, and make a note of it.
Recreate the exact work in the paper and compare it with the original results.
Identify, question and challenge every assumption in the paper.
Make a flow chart of the entire process, considering each step.
Try to work through the mathematical derivations from scratch.
Look at the code implementation of the various components if an open-source implementation is available; otherwise, try to implement it yourself.
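For the step of recreating the work and comparing it with the original, it helps to compare reported and reproduced numbers explicitly rather than by eye. A minimal sketch, assuming metrics are stored as name-to-value dictionaries and that a fixed absolute tolerance (0.5 here, an arbitrary choice) separates "matches" from "investigate"; `compare_results` is a hypothetical helper, and a sensible tolerance depends on the metric and on the run-to-run variance you observe.

```python
from typing import Dict


def compare_results(reported: Dict[str, float], reproduced: Dict[str, float],
                    tolerance: float = 0.5) -> Dict[str, str]:
    """Compare each metric reported in the paper with our reproduction of it.

    `tolerance` is an absolute gap in the metric's own units; it is an
    assumption to tune, not a standard value.
    """
    report = {}
    for metric, paper_value in reported.items():
        if metric not in reproduced:
            report[metric] = "not reproduced yet"
            continue
        gap = reproduced[metric] - paper_value
        status = "matches" if abs(gap) <= tolerance else "investigate"
        report[metric] = (f"{status} (paper {paper_value:.2f}, "
                          f"ours {reproduced[metric]:.2f}, gap {gap:+.2f})")
    return report


# Placeholder numbers for illustration only, not results from any paper.
for name, verdict in compare_results({"top1": 76.0, "top5": 93.0},
                                     {"top1": 75.7}).items():
    print(name, verdict)
```

An "investigate" verdict is a prompt to re-check the minor assumptions and hyperparameters noted during this pass, not proof that either side is wrong.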

After a third pass, we should know the paper inside out, including every minor assumption and detail, with a clear understanding of the implementation and a good understanding of the hyperparameters of each experiment performed and presented in the paper. After all three passes we can claim a clear understanding of the research paper.

Important questions to answer

To validate our understanding of the paper, there are a few generic questions we can try to answer about it. If we can answer them, we have more or less understood the paper well enough to use it in our own research, according to our requirements and objectives.

Answer to this can be found in brief in the abstract and in detail in the Introduction.
Self-assessment of the problem statement.
Answer to this can be found in the Introduction.
Answer to this can be found in the contributions part of the Introduction and also in the Methods section.
Answer to this is the entire Methods section and the Discussion section.
Answer to this is the entire Conclusion section.
A paper often has many key elements that it puts together to solve its problem statement. Your problem statement may be just a subset of the paper's problem set or vice versa, or a particular element of the paper may solve a problem you are interested in while the others do not. So it is important to figure out which part of the paper is useful to you.
Some sections of the paper may seem complicated, or you may need to look at some earlier references to understand the work completely. You might also find papers among the citations that are useful to your research. So figure out the necessary references and refer to them.

Being able to answer all these questions to the best of our understanding and ability validates our level of understanding of the paper. We can attempt these questions right after the second pass to check our understanding, then answer them again after a third pass to judge whether our understanding has improved or whether another pass with deeper exploration is needed.

Tip: Nothing teaches better than implementing the entire thing from scratch, experimenting, and comparing the results with the original results. Even if an open-source implementation is available, experimenting with the code, making your own tweaks and running different hyperparameters can improve your understanding a lot.
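Running different hyperparameters on an implementation, as the tip suggests, can be organised as a simple grid sweep. The sketch below assumes you already have a function that trains the model and returns a validation score; `train_and_eval` is a hypothetical stand-in for that code, and the toy lambda in the usage lines only exists so the example runs without a real model.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple


def grid(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand {name: [values, ...]} into every combination of values."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]


def sweep(train_and_eval: Callable[..., float],
          space: Dict[str, List[Any]]) -> List[Tuple[float, Dict[str, Any]]]:
    """Run train_and_eval(**config) for every config; best score first."""
    results = [(train_and_eval(**cfg), cfg) for cfg in grid(space)]
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Toy scoring function standing in for a real training run.
toy = lambda lr, batch_size: -abs(lr - 0.01) - batch_size / 1000
for score, cfg in sweep(toy, {"lr": [0.1, 0.01], "batch_size": [32, 64]}):
    print(f"{score:.3f}", cfg)
```

A full grid grows multiplicatively with each hyperparameter, so it suits the handful of settings a paper reports; random search is a common alternative for larger spaces.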

I'll finish with an important note: reading papers is a skill learnt through consistency over a long period of time. It is not a sprint but a marathon, and it demands a lot of patience and consistency.

I hope I have been able to justify the title of this blog post and explain in detail how to do a literature survey of a domain and how to read an ML/DL research paper. In case I missed anything or you have any other comments, reach out to me @SaiAmritPatnaik

Thank you !

Andrew Ng’s lecture in CS230 on how to read research papers
S. Keshav’s paper on how to read research papers
Slides of the talk
Blog Post 1 on reading Papers
Blog Post 2 on reading Papers


KAN: Kolmogorov-Arnold Networks

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation.

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

prometheus-eval/prometheus-eval • 2 May 2024

Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs.

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.

Improving Diffusion Models for Virtual Try-on

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

VILA: On Pre-training for Visual Language Models

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

2471023025/ralm_survey • 30 Apr 2024

Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge.

Spectrally Pruned Gaussian Fields with Neural Compensation

runyiyang/sundae • 1 May 2024

However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory.

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Introduction to Deep Learning and its related case studies


Research Papers for Beginners

I am going through the Deep Learning Specialization. I have finished the first two courses and will finish the rest as well.
I am proficient in coding.
I am familiar w/ the basics of calculus, linear algebra and probability having done electrical engineering courses in college and computer science courses in grad school.
I am taking courses in the Math for ML and Data Science Specialization to refresh my math concepts. I have finished the first two courses (though they seemed way too basic).

As I am doing these courses, I wanted to get in the habit of reading 1-2 research papers every week. However, I do not want to start w/ papers that are way too specialized, deal w/ a narrow field, are experimental, have advanced math, or are poorly written for the general audience.

I prefer well-written seminal/foundational/mainstream/popular papers (even if they are old) that deal w/ the general concepts and the most popular/well-established algorithms. I prefer papers with some math and that have links to any code/notebooks.

PLEASE can you recommend me a reading list. A list of 10 papers should suffice. I will REALLY appreciate it!

The number of research papers that meet your criteria is extremely small. No one writes research papers on basic concepts.

I think you may have misunderstood my point - sorry if I haven’t explained myself well.

By “basic”, I mean the initial/foundational papers that say came up w/ the concept of Gradient Descent or Neural Networks. As opposed to say some advanced and specialized tuning methods of a minor hyper-parameter. For example, if I had to learn about MapReduce I would read the popular/foundational white paper that came out of the Google team authored by Jeff Dean.

In any case, for a beginner in ML/AI who is interested in Deep Learning/Neural Nets which papers would you recommend they read first after they have done some basic coursework? There must be some selection criteria, no?

The FAQ for the Deep Learning Specialization has a list of reading materials. You can find it here:

Hi @Nandan1 ,

I don’t recall we have such a list of papers as you requested, but I do think you can filter papers with the idea of how you found the paper for MapReduce.

I googled “most cited machine learning papers”, and found this, for example. Scrolling through the list therein, technique-oriented and DLS-covered papers include Adam, BatchNorm, Dropout, ResNet, and so on.

They are well cited, covered by DLS lectures, and popular so that you can easily find other explanations online.

You can decide whether they are right for your level, taking into account other explanations available to you.

If you need more papers, you may google for more similar lists with different keywords (many people like to share their lists on their Github besides in articles), simply scan through the DLS’s tables of content and sort a list of skills in which you are interested and find their papers directly, or use Google Scholar.

In fact, Andrew might have cited some papers in those videos that you are interested in.

If you are interested in a particular subfield, add “literature review” to your keywords.

If you want to help future learners who may have the same request as yours, I encourage you to share your list to the DLS Resources category.

Enjoy your research process!

Thank you Raymond for such a detailed and thoughtful answer!

Hey Nandan, I'm sorry for going off topic, but can you share your learning experience with mathematics, especially calculus? Because I don't know how to understand it xD


7 Best Research Papers To Read To Get Started With Deep Learning
This is especially the case for a beginner who is just trying to get engrossed in the world of deep learning. It might be hard to figure out which research papers are the best starting point for developing new projects and gaining an intuitive understanding of the subject. ... Research Paper: Deep Residual Learning for Image Recognition ...
23 Deep Learning Papers To Get You Started
This paper provides 11 handy tips/lessons, equally applicable to machine learning and deep learning. Learning = Representation + Evaluation + Optimization: Representation is choosing the right ...
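The decomposition Learning = Representation + Evaluation + Optimization can be made concrete with a toy learner. The sketch below is an illustration only, not code from the cited paper; it assumes a one-variable linear model as the representation, mean squared error as the evaluation, and plain gradient descent as the optimization. The data, learning rate, and step count are arbitrary choices for the example.

```python
# Toy illustration of Learning = Representation + Evaluation + Optimization.
# Assumptions (not from the source): a 1-D linear model, mean squared error,
# and plain batch gradient descent.


def predict(w, b, x):
    """Representation: the hypothesis space is all lines y = w*x + b."""
    return w * x + b


def mse(w, b, xs, ys):
    """Evaluation: mean squared error of the model on the data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.05, steps=2000):
    """Optimization: gradient descent on the MSE with respect to w and b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Synthetic, noise-free data generated from y = 2x + 1 (made up for the example).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    w, b = fit(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, mse={mse(w, b, xs, ys):.6f}")
```

Swapping any one of the three parts (for example, a neural network instead of a line, or cross-entropy instead of MSE) leaves the other two untouched, which is the point of the decomposition.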
Learn To Implement Papers: Beginner's Guide
Usually, older papers describe simpler concepts, which is a big plus for you as a beginner. Paper structure: what to skip, what to read. A typical Deep Learning paper has the following structure: Abstract; Introduction; Related Work; Approach in Details; Experiments; Conclusion; References.
Deep Learning Papers Reading Roadmap
The following papers will give you an in-depth understanding of the Deep Learning method, of Deep Learning in different application areas, and of its frontiers. I suggest choosing among the following papers based on your interests and research direction. #2 Deep Learning Method
Learn To Implement Papers: Beginner's Guide
Old highly cited papers usually explain very fundamental concepts that became a basis for more recent research. You know the fundamentals — you'll better understand recent papers as well. For Deep Learning, papers before 2016 are considered to be already old. Highly cited papers are reproducible.
How to learn deep learning by reading papers
But if you want to read and understand papers, you need to understand the references. Additionally, this amazing deep learning papers roadmap repository contains chronologically ordered papers from more foundational and generic to more specific. You can use this too to guide your learning. 2. Organizing paper reading.
Deep Learning: A Comprehensive Overview on Techniques ...
Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI) is nowadays considered as a core technology of today's Fourth Industrial Revolution (4IR or Industry 4.0). Due to its learning capabilities from data, DL technology originated from artificial neural network (ANN), has become a hot topic in the context of computing, and is widely applied in various ...
The Principles of Deep Learning Theory
The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Daniel A. Roberts and Sho Yaida, based on research in collaboration with Boris Hanin. arXiv:2106.10165v2 [cs.LG], 24 Aug 2021.
Deep Learning Research and How to Get Immersed
The publication was founded to communicate research in a more transparent and visual way, with interactive widgets, code snippets, and animations embedded into the paper. Awesome Deep Learning Papers is a bit outdated (the last update was made two years ago) but it does list the most cited papers from 2012-2016, sorted by discipline, such as ...
10 Must-Read Research Papers for Deep Learning Developers
The authors showed that their model achieved state-of-the-art performance on several benchmark datasets for object detection and instance segmentation. In summary, these 10 research papers have ...
How to Read Research Papers: A Pragmatic Approach for ML Practitioners
Step 2: Finding research papers. One of the best tools for finding machine learning research papers, datasets, code, and other related materials is PapersWithCode. We use the search engine on the PapersWithCode website to get relevant research papers and content for our chosen topic, "Pose Estimation".
Key Papers in Deep RL
This is far from comprehensive, but should provide a useful starting point for someone looking to do research in the field. Table of Contents. Key Papers in Deep RL. 1. Model-Free RL. 2. Exploration. 3.
Awesome
Before this list, there existed other awesome deep learning lists, for example, Deep Vision and Awesome Recurrent Neural Networks. Also, after this list came out, another awesome list for deep learning beginners, called Deep Learning Papers Reading Roadmap, was created and has been loved by many deep learning researchers. Although the Roadmap List includes lots of important deep learning papers, it ...
How to read Machine Learning and Deep Learning Research papers
Dynamically expanding field of Deep Learning. Why read research papers. Literature survey of a domain. Step 1: Assembling all available resources. Step 2: Filtering out relevant and irrelevant resources. Step 3: Taking systematic notes. Organization of a paper. How to read a research paper. 3-pass approach to reading a research paper.
The Complete Beginner's Guide to Deep Learning: Artificial Neural
At a very basic level, deep learning is a machine learning technique. It teaches a computer to filter inputs through layers to learn how to predict and classify information. Observations can be in the form of images, text, or sound. The inspiration for deep learning is the way that the human brain filters information.
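The idea of filtering inputs through layers can be sketched as the forward pass of a tiny multi-layer perceptron. This is a minimal sketch, assuming a fully connected architecture with a ReLU hidden layer and a softmax output, which the snippet does not specify. The weights are hypothetical and hand-picked, not trained; in a real network they would be learned from data.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity: keeps positive signals and zeroes out negative ones."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turns output scores into class probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Pass x through hidden layers (dense + ReLU), then a dense output layer with softmax."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    out_w, out_b = layers[-1]
    return softmax(dense(h, out_w, out_b))


if __name__ == "__main__":
    # Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 2 classes.
    layers = [
        ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer
        ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # output layer
    ]
    probs = forward([3.0, 1.0], layers)
    print("class probabilities:", [round(p, 4) for p in probs])
```

Each layer transforms the previous layer's output, so stacking more `(weights, biases)` pairs in `layers` makes the network "deeper" without changing `forward`.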
Deep learning for healthcare: review, opportunities and challenges
Deep learning framework. Machine learning is a general-purpose method of artificial intelligence that can learn relationships from the data without the need to define them a priori. The major appeal is the ability to derive predictive models without a need for strong assumptions about the underlying mechanisms, which are usually unknown or insufficiently defined.
Learn To Implement Papers: Beginner's Guide
Step-by-step instructions on how to understand Deep Learning papers and implement the described approaches. Being able to implement the latest scientific papers is an extremely competitive skill ...
The Complete Beginners Guide to Deep Learning
Essentially, deep learning is a part of the machine learning family that's based on learning data representations (rather than task-specific algorithms). Deep learning is actually closely related to a class of theories about brain development proposed by cognitive neuroscientists in the early '90s. Just like in the brain (or, more ...
The latest in Machine Learning
Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. Papers With Code highlights trending Machine Learning research and the code to implement it.
Introduction to Deep Learning and its related case studies
This paper first introduces deep learning, the next step in machine learning, and then reviews the literature related to deep learning. The reviewed papers use various deep learning approaches such as the Autoencoder (AE), convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN). Offshore wind farms are the subject ...
Research Papers for Beginners
Hello, I am going through the Deep Learning Specialization. I have finished the first two courses and will finish the rest as well. I am proficient in coding. I am familiar w/ the basics of calculus, linear algebra and probability having done electrical engineering courses in college and computer science courses in grad school. I am taking courses in the Math for ML and Data Science ...
Deep Learning For Beginners. If you work in the tech sector or have
Deep Learning For Beginners. ... so get the basics sorted and start experimenting with neural networks if you want to go deeper into deep learning. Also, look for research papers on neural networks (architecture, applications, etc.) and read them! I am learning myself, so if you come across any interesting learning material, please do share it ...
AI Papers to Read in 2022
Reason 1: This is a very practical paper. Nearly all of the changes to ResNet can be extended to other models. Section 2.6, in particular, is very actionable and can give you results today. Reason 2: There is quite a hype over Transformers. However, there is more to these papers than Attention.

Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions


Introduction

Why Deep Learning in Today’s Research and Applications?

The Position of Deep Learning in AI

Understanding Various Forms of Data

DL Properties and Dependencies

Deep Learning Techniques and Applications

Deep Networks for Supervised or Discriminative Learning

Multi-layer Perceptron (MLP)

Convolutional Neural Network (CNN or ConvNet)

Recurrent Neural Network (RNN) and its Variants

Deep Networks for Generative or Unsupervised Learning

Generative Adversarial Network (GAN)

Auto-Encoder (AE) and Its Variants

Kohonen Map or Self-Organizing Map (SOM)

Restricted Boltzmann Machine (RBM)

Deep Belief Network (DBN)

Deep Networks for Hybrid Learning and Other Approaches

Hybrid Deep Neural Networks

Deep Transfer Learning (DTL)

Deep Reinforcement Learning (DRL)

Deep Learning Application Summary

Research Directions and Future Aspects

Concluding Remarks


How to Read Research Papers: A Pragmatic Approach for ML Practitioners

Step 1: Identify a topic

Step 2: Finding research papers

Step 3: First pass (gaining context and understanding)

Step 4: Second pass (content familiarization)

Introduction

Graph, diagrams, figures

Step 5: Third pass (deep reading)

Step 6: Fourth pass (final pass)

Step 7: Summary (optional)

Related resources


Key Papers in Deep RL

1. Model-Free RL


terryum/awesome-deep-learning-papers

Awesome list criteria

Understanding / Generalization / Transfer

Image / Video / Etc

Book / Survey / Review

Convolutional Neural Network Models

Image: Segmentation / Object Detection

HW / SW / Dataset

Appendix: More than Top 100

Acknowledgement


How to read Machine Learning and Deep Learning Research papers

Introduction

Literature survey of a domain

3-pass approach to reading a research paper


StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

Improving Diffusion Models for Virtual Try-on