## Coursera NLP Module 1 Week 4 Notes

Sep 4, 2020 - 12:09 • Marcos Leal

## Machine Translation: An Overview

## Transforming word vector

Given a matrix X of English word embeddings, a transformation matrix R, and a matrix Y of the desired French word embeddings, the transformation is

- \[XR \approx Y\]
- We initialize the weights R randomly and in a loop execute the following steps
- \[Loss = || XR - Y||_F\]
- \[g = \frac{d}{dR} Loss\]
- \[R = R - \alpha g\]

The Frobenius norm squares each element of the matrix, sums the squares, and takes the square root of the sum.

- \[||A||_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}\]

To simplify we can take the norm squared, thus:

- \[||A||^2_F = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\]
- \[g = \frac{d}{dR} Loss = \frac{2}{m} (X^T (XR-Y))\]
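A quick numerical sanity check of these formulas, as a sketch with small toy matrices (the shapes here are illustrative only):

```python
import numpy as np

# Toy data: 4 "English" embeddings (rows) of dimension 3, and targets Y.
np.random.seed(0)
X = np.random.rand(4, 3)
Y = np.random.rand(4, 3)
R = np.random.rand(3, 3)

m = X.shape[0]
diff = X @ R - Y

# Squared Frobenius norm divided by m, computed two equivalent ways.
loss_manual = np.sum(diff ** 2) / m
loss_norm = np.linalg.norm(diff, 'fro') ** 2 / m
assert np.isclose(loss_manual, loss_norm)

# Gradient of the loss with respect to R; same shape as R.
grad = (2.0 / m) * (X.T @ diff)
print(grad.shape)  # (3, 3)
```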

## Hash tables and hash functions

## Locality sensitive hashing

## Ungraded Lab: Rotation Matrices in R2

Notebook / HTML

## Ungraded Lab: Hash Tables and Multiplanes

Programming assignment: word translation.

## Assignment 4 - Naive Machine Translation and LSH


You will now implement your first machine translation system and then you will see how locality sensitive hashing works. Let’s get started by importing the required functions!

If you are running this notebook in your local computer, don’t forget to download the twitter samples and stopwords from nltk.

## Important Note on Submission to the AutoGrader #

Before submitting your assignment to the AutoGrader, please verify the following:

You have not added any extra print statement(s) in the assignment.

You have not added any extra code cell(s) in the assignment.

You have not changed any of the function parameters.

You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead.

You are not changing the assignment code where it is not required, like creating extra variables.

If you do any of the above, you will get something like a Grader not found (or similarly unexpected) error upon submitting your assignment. Before asking for help or debugging the errors in your assignment, check for these first. If this is the case and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these instructions.

## This assignment covers the following topics: #

1. The word embeddings data for English and French words

1.1 Generate embedding and transform matrices

2. Translations

2.1 Translation as linear transformation of embeddings

2.2 Testing the translation

3. LSH and document search

3.1 Getting the document embeddings

3.2 Looking up the tweets

3.3 Finding the most similar tweets with LSH

3.4 Getting the hash number for a vector

3.5 Creating a hash table

Exercise 10

3.6 Creating all hash tables

Exercise 11

## 1. The word embeddings data for English and French words #

Write a program that translates English to French.

The full dataset for English embeddings is about 3.64 gigabytes, and the French embeddings are about 629 megabytes. To prevent the Coursera workspace from crashing, we’ve extracted a subset of the embeddings for the words that you’ll use in this assignment.

## The subset of data #

To do the assignment on the Coursera workspace, we’ll use the subset of word embeddings.

## Look at the data #

en_embeddings_subset: the key is an English word, and the value is a 300 dimensional array, which is the embedding for that word.

fr_embeddings_subset: the key is a French word, and the value is a 300 dimensional array, which is the embedding for that word.

## Load two dictionaries mapping the English to French words #

A training dictionary

and a testing dictionary.

## Looking at the English French dictionary #

en_fr_train is a dictionary where the key is the English word and the value is the French translation of that English word.

en_fr_test is similar to en_fr_train , but is a test set. We won’t look at it until we get to testing.

## 1.1 Generate embedding and transform matrices #

Exercise 01: Translating the English dictionary to French using embeddings #.

You will now implement a function get_matrices , which takes the loaded data and returns matrices X and Y .

en_fr : English to French dictionary

en_embeddings : English to embeddings dictionary

fr_embeddings : French to embeddings dictionary

Matrix X and matrix Y , where each row in X is the word embedding for an English word, and the same row in Y is the word embedding for the French version of that English word.

Use the en_fr dictionary to ensure that the ith row in the X matrix corresponds to the ith row in the Y matrix.

Instructions : Complete the function get_matrices() :

Iterate over English words in en_fr dictionary.

Check if the word has both an English and a French embedding.

- Sets are useful data structures that can be used to check if an item is a member of a group.
- You can get the words that have embeddings by using the `keys()` method.
- Collect the vectors for `X` and `Y` in lists, keeping the pairs in the same order. You can use np.vstack() to merge them into a numpy matrix.
- numpy.vstack stacks the items in a list as rows in a matrix.

Now we will use the function get_matrices() to obtain the matrices X_train and Y_train of English and French word embeddings in the corresponding vector spaces.
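A minimal sketch of get_matrices() along the lines of the hints above (the argument order mirrors the assignment's, and the toy 2-dimensional embeddings are made up for illustration):

```python
import numpy as np

def get_matrices(en_fr, french_vecs, english_vecs):
    """Stack paired English/French embeddings into matrices X and Y."""
    X_l, Y_l = [], []
    english_set = set(english_vecs.keys())
    french_set = set(french_vecs.keys())

    for en_word, fr_word in en_fr.items():
        # Keep the pair only if both words have embeddings.
        if en_word in english_set and fr_word in french_set:
            X_l.append(english_vecs[en_word])
            Y_l.append(french_vecs[fr_word])

    # Row i of X corresponds to row i of Y.
    X = np.vstack(X_l)
    Y = np.vstack(Y_l)
    return X, Y

# Tiny toy example with 2-dimensional "embeddings".
en_emb = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
fr_emb = {"chat": np.array([0.9, 0.1])}  # "chien" has no embedding here
X, Y = get_matrices({"cat": "chat", "dog": "chien"}, fr_emb, en_emb)
print(X.shape, Y.shape)  # (1, 2) (1, 2)
```

Only the "cat"/"chat" pair survives, because "chien" has no embedding in the toy dictionary.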

## 2. Translations #

Write a program that translates English words to French words using word embeddings and vector space models.

## 2.1 Translation as linear transformation of embeddings #

Given dictionaries of English and French word embeddings you will create a transformation matrix R

Given an English word embedding, \(\mathbf{e}\) , you can multiply it by \(\mathbf{R}\) to get a new word embedding \(\mathbf{f} = \mathbf{eR}\) .

Both \(\mathbf{e}\) and \(\mathbf{f}\) are row vectors .

You can then compute the nearest neighbors to \(\mathbf{f}\) in the French embeddings and recommend the word that is most similar to the transformed word embedding.

## Describing translation as the minimization problem #

Find a matrix R that minimizes the following equation.

## Frobenius norm #

The Frobenius norm of a matrix \(A\) (assuming it is of dimension \(m,n\) ) is defined as the square root of the sum of the absolute squares of its elements:

## Actual loss function #

In the real world applications, the Frobenius norm loss:

is often replaced by its squared value divided by \(m\) :

where \(m\) is the number of examples (rows in \(\mathbf{X}\) ).

The same R is found when using this loss function versus the original Frobenius norm.

The reason for taking the square is that it’s easier to compute the gradient of the squared Frobenius.

The reason for dividing by \(m\) is that we’re more interested in the average loss per embedding than the loss for the entire training set.

The loss for all training set increases with more words (training examples), so taking the average helps us to track the average loss regardless of the size of the training set.

## [Optional] Detailed explanation why we use norm squared instead of the norm: #

Exercise 02: Implementing the translation mechanism described in this section #.

Step 1: Computing the loss #.

The loss function will be the squared Frobenius norm of the difference between the matrix and its approximation, divided by the number of training examples \(m\) .

Its formula is:

- \[L(X, Y, R)=\frac{1}{m}\sum_{i=1}^{m} \sum_{j=1}^{n}\left( a_{i j} \right)^{2}\]

where \(a_{i j}\) is value in \(i\) th row and \(j\) th column of the matrix \(\mathbf{XR}-\mathbf{Y}\) .

## Instructions: complete the compute_loss() function #

Compute the approximation of Y by matrix multiplying X and R

Compute difference XR - Y

Compute the squared Frobenius norm of the difference and divide it by \(m\) .

- Useful functions: Numpy dot , Numpy sum , Numpy square , Numpy norm
- Be careful about which operation is elementwise and which operation is a matrix multiplication.
- Try to use matrix operations instead of the numpy norm function. If you choose to use the norm function, take care of its extra arguments and remember that the loss is the norm squared, not the norm itself.
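Putting those steps together, compute_loss() might look like this (a sketch, not the assignment's reference solution):

```python
import numpy as np

def compute_loss(X, Y, R):
    """Squared Frobenius norm of (XR - Y), averaged over the m rows."""
    m = X.shape[0]
    diff = np.dot(X, R) - Y           # approximation error, shape (m, n)
    return np.sum(np.square(diff)) / m

# Sanity check: with R = identity and X = Y, the loss is zero.
X = np.eye(3)
print(compute_loss(X, X, np.eye(3)))  # 0.0
```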

Expected output:

## Exercise 03 #

Step 2: Computing the gradient of the loss with respect to the transform matrix R #.

Calculate the gradient of the loss with respect to transform matrix R .

The gradient is a matrix that encodes how much a small change in R affects the loss function.

The negative of the gradient gives us the direction in which we should change R to decrease the loss.

\(m\) is the number of training examples (number of rows in \(X\) ).

The formula for the gradient of the loss function \(L(X,Y,R)\) is:

- \[g = \frac{d}{dR} L(X, Y, R) = \frac{2}{m} \left( X^{T} (XR - Y) \right)\]

Instructions : Complete the compute_gradient function below.

- Transposing in numpy
- Finding out the dimensions of matrices in numpy
- Remember to use numpy.dot for matrix multiplication
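A sketch of compute_gradient() based on the formula above (the toy shapes below are for illustration only):

```python
import numpy as np

def compute_gradient(X, Y, R):
    """Gradient of the average squared-Frobenius loss with respect to R."""
    m = X.shape[0]
    # d/dR (1/m)||XR - Y||_F^2 = (2/m) X^T (XR - Y)
    return (2.0 / m) * np.dot(X.T, np.dot(X, R) - Y)

# The gradient has the same shape as R.
X = np.ones((4, 3))
Y = np.ones((4, 2))
R = np.zeros((3, 2))
g = compute_gradient(X, Y, R)
print(g.shape)  # (3, 2)
```

With R = 0 here, XR - Y = -1 everywhere, so every gradient entry is (2/4) * (-4) = -2.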

## Step 3: Finding the optimal R with gradient descent algorithm #

Gradient descent #.

Gradient descent is an iterative algorithm which is used in searching for the optimum of the function.

Earlier, we mentioned that the gradient of the loss with respect to the matrix encodes how much a tiny change in some coordinate of that matrix affects the loss function.

Gradient descent uses that information to iteratively change matrix R until we reach a point where the loss is minimized.

## Training with a fixed number of iterations #

Most of the time we iterate for a fixed number of training steps rather than iterating until the loss falls below a threshold.

## OPTIONAL: explanation for fixed number of iterations #

- You cannot rely on training loss getting low -- what you really want is the validation loss to go down, or validation accuracy to go up. And indeed - in some cases people train until validation accuracy reaches a threshold, or -- commonly known as "early stopping" -- until the validation accuracy starts to go down, which is a sign of over-fitting.
- Why not always do "early stopping"? Well, mostly because well-regularized models on larger data-sets never stop improving. Especially in NLP, you can often continue training for months and the model will continue getting slightly and slightly better. This is also the reason why it's hard to just stop at a threshold -- unless there's an external customer setting the threshold, why stop, where do you put the threshold?
- Stopping after a certain number of steps has the advantage that you know how long your training will take - so you can keep some sanity and not train for months. You can then try to get the best performance within this time budget. Another advantage is that you can fix your learning rate schedule -- e.g., lower the learning rate at 10% before finish, and then again more at 1% before finishing. Such learning rate schedules help a lot, but are harder to do if you don't know how long you're training.

Pseudocode:

Calculate gradient \(g\) of the loss with respect to the matrix \(R\) .

Update \(R\) with the formula:

- \[R_{\text{new}}= R_{\text{old}}-\alpha g\]

Where \(\alpha\) is the learning rate, which is a scalar.

## Learning rate #

The learning rate or “step size” \(\alpha\) is a coefficient which decides how much we want to change \(R\) in each step.

If we change \(R\) too much, we could skip the optimum by taking too large of a step.

If we make only small changes to \(R\) , we will need many steps to reach the optimum.

Learning rate \(\alpha\) is used to control those changes.

Values of \(\alpha\) are chosen depending on the problem, and we’ll use learning_rate \(=0.0003\) as the default value for our algorithm.

## Exercise 04 #

Instructions: implement align_embeddings() #.

- Use the 'compute_gradient()' function to get the gradient in each step
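The training loop in align_embeddings() can be sketched as follows (the seed, step count, learning rate, and the toy mapping Y = 2X are illustrative, not the assignment's values):

```python
import numpy as np

def compute_gradient(X, Y, R):
    m = X.shape[0]
    return (2.0 / m) * np.dot(X.T, np.dot(X, R) - Y)

def align_embeddings(X, Y, train_steps=100, learning_rate=0.0003):
    """Find R minimizing the average squared Frobenius loss by gradient descent."""
    np.random.seed(129)
    R = np.random.rand(X.shape[1], X.shape[1])  # random initialization
    for _ in range(train_steps):
        g = compute_gradient(X, Y, R)
        R = R - learning_rate * g  # update rule: R_new = R_old - alpha * g
    return R

# Toy run: learn to map X onto Y = 2X; R should approach 2 * identity.
np.random.seed(3)
X = np.random.rand(50, 2)
R = align_embeddings(X, 2 * X, train_steps=2000, learning_rate=0.1)
print(np.round(R, 2))
```

Because the loss is a convex quadratic in R, gradient descent converges to the exact minimizer here, which is 2I.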

Expected Output:

## Calculate transformation matrix R #

Using the training set, find the transformation matrix \(\mathbf{R}\) by calling the function align_embeddings() .

NOTE: The code cell below will take a few minutes to fully execute (~3 mins)

## Expected Output #

## 2.2 Testing the translation #

k-Nearest neighbors algorithm

k-NN is a method which takes a vector as input and finds the other vectors in the dataset that are closest to it.

The ‘k’ is the number of “nearest neighbors” to find (e.g. k=2 finds the closest two neighbors).

## Searching for the translation embedding #

Since we’re approximating the translation function from English to French embeddings by a linear transformation matrix \(\mathbf{R}\) , most of the time we won’t get the exact embedding of a French word when we transform embedding \(\mathbf{e}\) of some particular English word into the French embedding space.

This is where \(k\) -NN becomes really useful! By using \(1\) -NN with \(\mathbf{eR}\) as input, we can search for an embedding \(\mathbf{f}\) (as a row) in the matrix \(\mathbf{Y}\) which is the closest to the transformed vector \(\mathbf{eR}\)

## Cosine similarity #

Cosine similarity between vectors \(u\) and \(v\) is calculated as the cosine of the angle between them. The formula is

- \[\cos(u,v) = \frac{u \cdot v}{\|u\|\|v\|}\]

\(\cos(u,v)\) = \(1\) when \(u\) and \(v\) lie on the same line and have the same direction.

\(\cos(u,v)\) is \(-1\) when they have exactly opposite directions.

\(\cos(u,v)\) is \(0\) when the vectors are orthogonal (perpendicular) to each other.

## Note: Distance and similarity are pretty much opposite things. #

We can obtain a distance metric from cosine similarity, but cosine similarity itself can't be used directly as a distance metric.

When the cosine similarity increases (towards \(1\) ), the “distance” between the two vectors decreases (towards \(0\) ).

We can define the cosine distance between \(u\) and \(v\) as

- \[d_{\text{cos}}(u,v)=1-\cos(u,v)\]
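These properties are easy to verify with a small sketch, assuming the standard dot-product formula for cosine similarity:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| ||v||)"""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
print(cosine_similarity(u, np.array([2.0, 0.0])))   # 1.0  (same direction)
print(cosine_similarity(u, np.array([0.0, 3.0])))   # 0.0  (orthogonal)
print(cosine_similarity(u, np.array([-1.0, 0.0])))  # -1.0 (opposite direction)

# Cosine distance: d(u, v) = 1 - cos(u, v), so distance 0 means identical direction.
```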

Exercise 05 : Complete the function nearest_neighbor()

Inputs:

- A vector v , whose nearest neighbors we want to find
- A set of possible nearest-neighbor candidates
- k , the number of nearest neighbors to find

The distance metric should be based on cosine similarity.

The cosine_similarity function is already implemented and imported for you. Its arguments are two vectors and it returns the cosine of the angle between them.

Iterate over rows in candidates , and save the result of similarities between current row and vector v in a python list. Take care that similarities are in the same order as row vectors of candidates .

Now you can use numpy argsort to sort the indices for the rows of candidates .

- numpy.argsort sorts values from most negative to most positive (smallest to largest)
- The candidates that are nearest to 'v' should have the highest cosine similarity
- To reverse the order of the result of numpy.argsort to get the element with highest cosine similarity as the first element of the array you can use tmp[::-1]. This reverses the order of an array. Then, you can extract the first k elements.
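Following those hints, a sketch of nearest_neighbor() (with an inline cosine_similarity helper so the example is self-contained):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbor(v, candidates, k=1):
    """Return the indices of the k rows of `candidates` most similar to v."""
    similarity_l = [cosine_similarity(v, row) for row in candidates]
    sorted_ids = np.argsort(similarity_l)  # smallest -> largest similarity
    return sorted_ids[::-1][:k]            # reverse: largest first, keep top k

v = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(nearest_neighbor(v, candidates, k=2))  # [0 1]
```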

Expected Output :

[[2 0 1]
 [1 0 5]
 [9 9 9]]

## Test your translation and compute its accuracy #

Exercise 06 : Complete the function test_vocabulary which takes in English embedding matrix \(X\) , French embedding matrix \(Y\) and the \(R\) matrix and returns the accuracy of translations from \(X\) to \(Y\) by \(R\) .

Iterate over the transformed English word embeddings and check whether the closest French word vector belongs to the French word that is the actual translation.

Obtain an index of the closest French embedding by using nearest_neighbor (with argument k=1 ), and compare it to the index of the English embedding you have just transformed.

Keep track of the number of times you get the correct translation.

Calculate accuracy as

- \[\text{accuracy}=\frac{\#(\text{correct predictions})}{\#(\text{total predictions})}\]

Let's see how your translation mechanism works on unseen data:

You managed to translate words from one language to another without ever seeing them, with almost 56% accuracy, using some basic linear algebra and learning a mapping of words from one language to another!

## 3. LSH and document search #

In this part of the assignment, you will implement a more efficient version of k-nearest neighbors using locality sensitive hashing. You will then apply this to document search.

Process the tweets and represent each tweet as a vector (represent a document with a vector embedding).

Use locality sensitive hashing and k nearest neighbors to find tweets that are similar to a given tweet.

## 3.1 Getting the document embeddings #

Bag-of-words (BOW) document models #.

Text documents are sequences of words.

The ordering of words makes a difference. For example, sentences “Apple pie is better than pepperoni pizza.” and “Pepperoni pizza is better than apple pie” have opposite meanings due to the word ordering.

However, for some applications, ignoring the order of words can allow us to train an efficient and still effective model.

This approach is called Bag-of-words document model.

## Document embeddings #

A document embedding is created by summing up the embeddings of all words in the document.

If we don’t know the embedding of some word, we can ignore that word.

Exercise 07 : Complete the get_document_embedding() function.

The function get_document_embedding() encodes an entire document as a single “document” embedding.

It takes in a document (as a string) and a dictionary, en_embeddings

It processes the document, and looks up the corresponding embedding of each word.

It then sums them up and returns the sum of all word vectors of that processed tweet.

- You can handle missing words easier by using the `get()` method of the python dictionary instead of the bracket notation (i.e. "[ ]"). See more about it here
- The default value for missing word should be the zero vector. Numpy will broadcast simple 0 scalar into a vector of zeros during the summation.
- Alternatively, skip the addition if a word is not in the dictionary.
- You can use your `process_tweet()` function which allows you to process the tweet. The function just takes in a tweet and returns a list of words.
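A sketch of get_document_embedding() following these hints. The real assignment uses process_tweet() and 300-dimensional embeddings; here a plain lowercase-and-split stand-in and made-up 2-dimensional toy vectors are used:

```python
import numpy as np

# Toy 2-d embeddings; the assignment uses 300-d vectors.
en_embeddings = {"happy": np.array([1.0, 2.0]), "sad": np.array([-1.0, 0.0])}

def get_document_embedding(tweet, en_embeddings):
    """Sum the embeddings of the tweet's words; missing words contribute zero."""
    doc_embedding = np.zeros(2)
    # Stand-in for process_tweet(): real preprocessing also strips handles,
    # URLs, and stopwords, and stems the words.
    processed = tweet.lower().split()
    for word in processed:
        # .get() with a 0 default: numpy broadcasts the scalar 0 in the sum.
        doc_embedding = doc_embedding + en_embeddings.get(word, 0)
    return doc_embedding

print(get_document_embedding("happy happy sad unknownword", en_embeddings))  # [1. 4.]
```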

Expected output :

## Exercise 08 #

Store all document vectors into a dictionary #.

Now, let’s store all the tweet embeddings into a dictionary. Implement get_document_vecs()

## 3.2 Looking up the tweets #

Now you have a matrix of dimension (m,d) where m is the number of tweets (10,000) and d is the dimension of the embeddings (300). You will now input a tweet, and use cosine similarity to see which tweet in our corpus is similar to your tweet.

## 3.3 Finding the most similar tweets with LSH #

You will now implement locality sensitive hashing (LSH) to identify the most similar tweet.

Instead of looking at all 10,000 vectors, you can just search a subset to find its nearest neighbors.

Let’s say your data points are plotted like this:

You can divide the vector space into regions and search within one region for nearest neighbors of a given vector.

## Choosing the number of planes #

Each plane divides the space to \(2\) parts.

So \(n\) planes divide the space into \(2^{n}\) hash buckets.

We want to organize the 10,000 document vectors into buckets so that every bucket has about 16 vectors.

For that we need \(\frac{10000}{16}=625\) buckets.

We're interested in the number of planes \(n\) such that \(2^{n} \geq 625\) . Now, we can calculate \(n=\log_{2}625 = 9.29 \approx 10\) .
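The bucket arithmetic above, written out in code:

```python
import numpy as np

n_vecs = 10000          # number of document vectors
vecs_per_bucket = 16    # desired bucket size

n_buckets = n_vecs / vecs_per_bucket          # 10000 / 16 = 625.0
n_planes = int(np.ceil(np.log2(n_buckets)))   # log2(625) ~ 9.29 -> 10
print(n_buckets, n_planes)  # 625.0 10
```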

## 3.4 Getting the hash number for a vector #

For each vector, we need to get a unique number associated to that vector in order to assign it to a “hash bucket”.

## Hyperplanes in vector spaces #

In \(3\) -dimensional vector space, the hyperplane is a regular plane. In \(2\) dimensional vector space, the hyperplane is a line.

Generally, a hyperplane is a subspace whose dimension is \(1\) lower than that of the original vector space.

A hyperplane is uniquely defined by its normal vector.

Normal vector \(n\) of the plane \(\pi\) is the vector to which all vectors in the plane \(\pi\) are orthogonal (perpendicular in \(3\) dimensional case).

## Using Hyperplanes to split the vector space #

We can use a hyperplane to split the vector space into \(2\) parts.

All vectors whose dot product with a plane’s normal vector is positive are on one side of the plane.

All vectors whose dot product with the plane’s normal vector is negative are on the other side of the plane.

## Encoding hash buckets #

For a vector, we can take its dot product with all the planes, then encode this information to assign the vector to a single hash bucket.

When the vector points to the opposite side of the hyperplane from the normal vector, encode it as 0.

Otherwise, if the vector is on the same side as the normal vector, encode it as 1.

If you calculate the dot product with each plane in the same order for every vector, you’ve encoded each vector’s unique hash ID as a binary number, like [0, 1, 1, … 0].

## Exercise 09: Implementing hash buckets #

We've initialized the hash table hashes for you. It is a list of N_UNIVERSES matrices, each describing its own hash table. Each matrix has N_DIMS rows and N_PLANES columns. Every column of the matrix is an N_DIMS -dimensional normal vector for one of the N_PLANES hyperplanes used for creating the buckets of that particular hash table.

Exercise : Your task is to complete the function hash_value_of_vector which places vector v in the correct hash bucket.

First multiply your vector v by the corresponding plane matrix. This will give you a vector of dimension \((1,\text{N_planes})\) .

You will then convert every element in that vector to 0 or 1.

You create a hash vector by doing the following: if the element is negative, it becomes a 0, otherwise you change it to a 1.

You then compute the unique number for the vector by iterating over N_PLANES

Then you multiply \(2^i\) times the corresponding bit (0 or 1).

You will then store that sum in the variable hash_value .

Instructions: Create a hash for the vector in the function below. Use this formula:

- \[\text{hash} = \sum_{i=0}^{N_{\text{planes}}-1} 2^{i} \times h_{i}\]

where \(h_{i}\) is the bit (0 or 1) for the \(i\) th plane.
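A sketch of hash_value_of_vector() implementing these steps (the toy planes matrix below is illustrative; in the assignment, planes has shape (N_DIMS, N_PLANES)):

```python
import numpy as np

def hash_value_of_vector(v, planes):
    """Hash a (1, n_dims) row vector v using an (n_dims, n_planes) plane matrix."""
    dot_product = np.dot(v, planes)      # shape (1, n_planes)
    sign_of_dot = np.sign(dot_product)   # -1, 0, or +1 per plane
    h = (sign_of_dot >= 0).squeeze()     # True -> same side as the normal
    hash_value = 0
    n_planes = planes.shape[1]
    for i in range(n_planes):
        hash_value += 2 ** i * int(h[i])  # binary code -> single integer
    return int(hash_value)

# Toy example: 2-d vectors, 2 planes whose normals are the axis vectors.
planes = np.array([[1.0, 0.0], [0.0, 1.0]])  # columns are plane normals
print(hash_value_of_vector(np.array([[3.0, -2.0]]), planes))  # 1
print(hash_value_of_vector(np.array([[1.0, 1.0]]), planes))   # 3
```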

## Create the sets of planes #

Create multiple (25) sets of planes (the planes that divide up the region).

You can think of these as 25 separate ways of dividing up the vector space with a different set of planes.

Each element of this list contains a matrix with 300 rows (the word vectors have 300 dimensions) and 10 columns (there are 10 planes in each “universe”).

- numpy.squeeze() removes unused dimensions from an array; for instance, it converts a (10,1) 2D array into a (10,) 1D array

## 3.5 Creating a hash table #

Exercise 10 #.

Given that you have a unique number for each vector (or tweet), you now want to create a hash table so that, given a hash_id, you can quickly look up the corresponding vectors. This reduces your search time by a significant amount.

We have given you the make_hash_table function, which maps the tweet vectors to a bucket and stores the vector there. It returns the hash_table and the id_table . The id_table allows you to know which vector in a certain bucket corresponds to which tweet.

- a dictionary comprehension, similar to a list comprehension, looks like this: `{i:0 for i in range(10)}`, where the key is 'i' and the value is zero for all key-value pairs.
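A sketch of what make_hash_table does (the function is provided in the assignment; this version is a simplified reimplementation for illustration):

```python
import numpy as np

def hash_value_of_vector(v, planes):
    h = (np.dot(v, planes) >= 0).squeeze()
    return int(sum(2 ** i * int(b) for i, b in enumerate(np.atleast_1d(h))))

def make_hash_table(vecs, planes):
    """Bucket every vector (and its index) by its hash value."""
    num_buckets = 2 ** planes.shape[1]
    # Dictionary comprehensions: one empty list per possible bucket.
    hash_table = {i: [] for i in range(num_buckets)}
    id_table = {i: [] for i in range(num_buckets)}
    for i, v in enumerate(vecs):
        h = hash_value_of_vector(v, planes)
        hash_table[h].append(v)  # the vector itself
        id_table[h].append(i)    # which tweet it came from
    return hash_table, id_table

planes = np.array([[1.0, 0.0], [0.0, 1.0]])
vecs = [np.array([1.0, 1.0]), np.array([-1.0, 1.0])]
hash_table, id_table = make_hash_table(vecs, planes)
print(id_table[3], id_table[2])  # [0] [1]
```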

## Expected output #

## 3.6 Creating all hash tables #

You can now hash your vectors and store them in a hash table that would allow you to quickly look up and search for similar vectors. Run the cell below to create the hashes. By doing so, you end up having several tables which have all the vectors. Given a vector, you then identify the buckets in all the tables. You can then iterate over the buckets and consider much fewer vectors. The more buckets you use, the more accurate your lookup will be, but also the longer it will take.

## Approximate K-NN #

Exercise 11 #.

Implement approximate K nearest neighbors using locality sensitive hashing, to search for documents that are similar to a given document at the index doc_id .

doc_id is the index into the document list all_tweets .

v is the document vector for the tweet in all_tweets at index doc_id .

planes_l is the list of planes (the global variable created earlier).

k is the number of nearest neighbors to search for.

num_universes_to_use : to save time, we can use fewer than the total number of available universes. By default, it’s set to N_UNIVERSES , which is \(25\) for this assignment.

hash_tables : list with hash tables for each universe.

id_tables : list with id tables for each universe.

The approximate_knn function finds a subset of candidate vectors that are in the same “hash bucket” as the input vector ‘v’. Then it performs the usual k-nearest neighbors search on this subset (instead of searching through all 10,000 tweets).

- There are many dictionaries used in this function. Try to print out planes_l, hash_tables, id_tables to understand how they are structured, what the keys represent, and what the values contain.
- To remove an item from a list, use `.remove()`
- To append to a list, use `.append()`
- To add to a set, use `.add()`
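A compact sketch of approximate_knn() with the helpers re-defined inline (a one-universe toy setup stands in for the assignment's 25 universes):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbor(v, candidates, k=1):
    sims = [cosine_similarity(v, row) for row in candidates]
    return np.argsort(sims)[::-1][:k]

def hash_value_of_vector(v, planes):
    h = np.squeeze(np.dot(v, planes) >= 0)
    return int(sum(2 ** i * int(b) for i, b in enumerate(np.atleast_1d(h))))

def approximate_knn(doc_id, v, planes_l, hash_tables, id_tables,
                    k=1, num_universes_to_use=None):
    """Search only the buckets v falls into, instead of the full corpus."""
    if num_universes_to_use is None:
        num_universes_to_use = len(planes_l)
    vecs_to_consider, ids_to_consider = [], []
    ids_seen = set()
    for universe_id in range(num_universes_to_use):
        planes = planes_l[universe_id]
        hash_value = hash_value_of_vector(v, planes)
        # All vectors (and tweet ids) that share v's bucket in this universe.
        for vec, new_id in zip(hash_tables[universe_id][hash_value],
                               id_tables[universe_id][hash_value]):
            if new_id == doc_id or new_id in ids_seen:
                continue  # skip the query itself and duplicates across universes
            vecs_to_consider.append(vec)
            ids_to_consider.append(new_id)
            ids_seen.add(new_id)
    # Ordinary k-NN, but only over the candidate subset.
    nearest_ids = nearest_neighbor(v, np.array(vecs_to_consider), k=k)
    return [ids_to_consider[i] for i in nearest_ids]

# Toy setup: one universe, 2-d vectors, 2 planes.
planes_l = [np.array([[1.0, 0.0], [0.0, 1.0]])]
docs = [np.array([1.0, 1.0]), np.array([2.0, 1.0]), np.array([-1.0, 1.0])]
hash_tables = [{h: [] for h in range(4)}]
id_tables = [{h: [] for h in range(4)}]
for i, d in enumerate(docs):
    h = hash_value_of_vector(d, planes_l[0])
    hash_tables[0][h].append(d)
    id_tables[0][h].append(i)

print(approximate_knn(0, docs[0], planes_l, hash_tables, id_tables, k=1))  # [1]
```

Document 2 hashes into a different bucket, so it is never even compared against the query; only the bucket-mate (document 1) is considered.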

## 4 Conclusion #

Congratulations - Now you can look up vectors that are similar to the encoding of your tweet using LSH!

## Deep-Learning-Specialization

Notes on the Sequence Models course from the Coursera Deep Learning Specialization.

This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others.

- Understand how to build and train Recurrent Neural Networks (RNNs), and commonly-used variants such as GRUs and LSTMs.
- Be able to apply sequence models to natural language problems, including text synthesis.
- Be able to apply sequence models to audio applications, including speech recognition and music synthesis.

## Week 1: Sequence Models

Learn about recurrent neural networks. This type of model has been proven to perform extremely well on temporal data. It has several variants including LSTMs, GRUs and Bidirectional RNNs, which you are going to learn about in this section.
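The recurrence these models share can be sketched in a few lines of NumPy. This is a vanilla RNN cell only (no gates, as in GRUs/LSTMs); all names and dimensions here are illustrative, not taken from the course code:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    """One vanilla RNN step: h_t = tanh(Wxh @ x_t + Whh @ h_prev + bh)."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
Wxh = rng.normal(size=(hidden_dim, input_dim))
Whh = rng.normal(size=(hidden_dim, hidden_dim))
bh = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):   # unroll over 5 time steps
    h = rnn_step(x_t, h, Wxh, Whh, bh)

print(h.shape)  # (3,)
```

The same hidden state `h` is threaded through every time step, which is what lets the network carry information across a sequence.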

## Assignment of Week 1

- Quiz 1: Recurrent Neural Networks
- Programming Assignment: Building a recurrent neural network - step by step
- Programming Assignment: Dinosaur Island - Character-Level Language Modeling
- Programming Assignment: Jazz improvisation with LSTM

## Week 2: Natural Language Processing & Word Embeddings

Natural language processing with deep learning is an important combination. Using word vector representations and embedding layers you can train recurrent neural networks with outstanding performances in a wide variety of industries. Examples of applications are sentiment analysis, named entity recognition and machine translation.

## Assignment of Week 2

- Quiz 2: Natural Language Processing & Word Embeddings
- Programming Assignment: Operations on word vectors - Debiasing
- Programming Assignment: Emojify

## Week 3: Sequence models & Attention mechanism

Sequence models can be augmented using an attention mechanism. This algorithm will help your model understand where it should focus its attention given a sequence of inputs. This week, you will also learn about speech recognition and how to deal with audio data.
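At its core, an attention mechanism computes a weighted average of encoder states, with weights derived from how well each input position matches the current query. A minimal scaled dot-product sketch in NumPy (the course's attention model is more elaborate; shapes here are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # (n_queries, n_keys), rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 8))   # 2 decoder queries
K = rng.normal(size=(5, 8))   # 5 encoder keys
V = rng.normal(size=(5, 8))   # 5 encoder values
out, w = attention(Q, K, V)
print(out.shape)  # (2, 8)
```

Each row of `w` shows where one query "focuses its attention" over the five input positions.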

## Assignment of Week 3

- Quiz 3: Sequence models & Attention mechanism
- Programming Assignment: Neural Machine Translation with Attention
- Programming Assignment: Trigger word detection

## Course Certificate

- Use logistic regression, naïve Bayes, and word vectors to implement sentiment analysis, complete analogies, and translate words, and use locality sensitive hashing for approximate nearest neighbors.
- Use dynamic programming, hidden Markov models, and word embeddings to autocorrect misspelled words, autocomplete partial sentences, and identify part-of-speech tags for words.
- Use dense and recurrent neural networks, LSTMs, GRUs, and Siamese networks in TensorFlow and Trax to perform advanced sentiment analysis, text generation, named entity recognition, and to identify duplicate questions.
- Use encoder-decoder, causal, and self-attention to perform advanced machine translation of complete sentences, text summarization, question-answering and to build chatbots. Models covered include T5, BERT, transformer, reformer, and more!
- Natural Language Processing (NLP) uses algorithms to understand and manipulate human language. This technology is one of the most broadly applied areas of machine learning. As AI continues to expand, so will the demand for professionals skilled at building models that analyze speech and language, uncover contextual patterns, and produce insights from text and audio.
- By the end of this specialization, you will be ready to design NLP applications that perform question-answering, sentiment analysis, language translation and text summarization, and even build chatbots. These and other NLP applications are going to be at the forefront of the coming transformation to an AI-powered future.
- This Specialization is designed and taught by two experts in NLP, machine learning, and deep learning. Younes Bensouda Mourri is an Instructor of AI at Stanford University who also helped build the Deep Learning Specialization. Łukasz Kaiser is a Staff Research Scientist at Google Brain and the co-author of Tensorflow, the Tensor2Tensor and Trax libraries, and the Transformer paper.
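The locality sensitive hashing mentioned in the first bullet can be sketched with random hyperplanes: vectors that fall on the same side of every plane land in the same bucket, so approximate nearest-neighbor search only scans one bucket instead of the whole vocabulary. A toy NumPy sketch (plane count and dimensions are arbitrary choices for the example):

```python
import numpy as np

def hash_vector(v, planes):
    """One LSH bucket id: one bit per hyperplane, set by the sign of the dot product."""
    bits = (planes @ v >= 0).astype(int)
    return int("".join(map(str, bits)), 2)

rng = np.random.default_rng(3)
planes = rng.normal(size=(4, 10))   # 4 random hyperplanes -> 2**4 = 16 buckets
v = rng.normal(size=10)
bucket = hash_vector(v, planes)
print(0 <= bucket < 16)  # True
```

Nearby vectors usually share a bucket; using several independent sets of planes trades memory for a lower chance of missing a true neighbor.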

## Word Embeddings (Papers with Code)

1094 papers with code • 0 benchmarks • 52 datasets

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.

Techniques for learning word embeddings can include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification.
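A tiny illustration of why such vectors are useful: similar words get similar vectors, which we can measure with cosine similarity. The 3-d vectors below are made up for the example (real embeddings from Word2Vec or GloVe typically have 50–300 dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "embeddings" (values invented for illustration).
emb = {
    "king":  np.array([0.8, 0.65, 0.1]),
    "queen": np.array([0.75, 0.7, 0.12]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(emb["king"], emb["queen"]) >
      cosine_similarity(emb["king"], emb["apple"]))  # True
```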

( Image credit: Dynamic Word Embedding for Evolving Semantic Discovery )


## Most implemented papers

## Enriching Word Vectors with Subword Information

facebookresearch/fastText • TACL 2017

A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
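The n-gram extraction can be sketched as follows, with `<` and `>` as boundary markers in the fastText style; the word's vector would then be the sum of the vectors of these n-grams (function name and parameters are illustrative):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word padded with boundary markers, fastText-style."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams

grams = char_ngrams("where", 3, 4)
print(grams[:4])  # ['<wh', 'whe', 'her', 'ere']
```

Because unseen words still share n-grams with seen words, this scheme produces usable vectors even for out-of-vocabulary words.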

## FastText.zip: Compressing text classification models

facebookresearch/fastText • 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

## Universal Sentence Encoder

For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.

## Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features.

## Word Translation Without Parallel Data

We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.
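The linear-mapping idea behind cross-lingual methods like this matches the $XR \approx Y$ setup from the notes above, with gradient $g = \frac{2}{m} X^T(XR - Y)$. A minimal gradient-descent sketch on synthetic data (dimensions, learning rate, and step count are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 100, 5
X = rng.normal(size=(m, d))      # "English" embeddings, one row per word
R_true = rng.normal(size=(d, d))
Y = X @ R_true                   # "French" embeddings: an exact linear image of X

R = rng.normal(size=(d, d))      # random initialization
alpha = 0.05
for _ in range(1000):
    grad = (2 / m) * X.T @ (X @ R - Y)   # d/dR of ||XR - Y||_F^2 / m
    R -= alpha * grad

loss = np.linalg.norm(X @ R - Y) ** 2 / m
print(f"final loss: {loss:.2e}")
```

On real embeddings the fit is only approximate, which is why the transformed vector $xR$ is followed by a nearest-neighbor lookup on the target side.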

## Named Entity Recognition with Bidirectional LSTM-CNNs

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance.

## Evaluation of sentence embeddings in downstream and linguistic probing tasks

Despite the fast developmental pace of new sentence embedding methods, it is still challenging to find comprehensive evaluations of these different techniques.

## Topic Modeling in Embedding Spaces

To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings.

## Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck.

## Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

uzairakbar/info-retrieval • NeurIPS 2016

Geometrically, gender bias is first shown to be captured by a direction in the word embedding.
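The neutralize step of that debiasing approach removes a vector's component along the bias direction, i.e. projects it onto the direction's orthogonal complement. A sketch with toy 3-d vectors (in practice the direction might be estimated from pairs such as she − he):

```python
import numpy as np

def neutralize(v, direction):
    """Remove the component of v along a bias direction."""
    d = direction / np.linalg.norm(direction)
    return v - (v @ d) * d

g = np.array([1.0, 0.0, 0.0])           # illustrative "gender" direction
programmer = np.array([0.4, 0.3, 0.5])  # illustrative word vector
debiased = neutralize(programmer, g)
print(np.isclose(debiased @ g, 0.0))  # True
```

After neutralizing, the vector carries no information along the chosen direction while its other components are untouched.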


## CS224N: Natural Language Processing with Deep Learning

Stanford / Winter 2024.

Note: In the 2023–24 academic year, CS224N will be taught in both Winter and Spring 2024.

Natural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share information. In recent years, deep learning approaches have obtained very high performance on many NLP tasks. In this course, students gain a thorough introduction to cutting-edge neural networks for NLP.

## Instructors

## Course Manager

## Teaching Assistants

- Lectures: are on Tuesday/Thursday 4:30 PM - 5:50 PM Pacific Time in NVIDIA Auditorium . In-person lectures will start with the first lecture. The lectures will also be livestreamed on Canvas via Panopto.
- Publicly available lecture videos and versions of the course: Complete videos for the CS224N course are available (free!) on the CS224N 2023 YouTube playlist and the CS224N 2021 YouTube channel. Anyone is welcome to enroll in XCS224N: Natural Language Processing with Deep Learning, the Stanford Artificial Intelligence Professional Program version of this course, throughout the year (medium fee, community TAs and certificate). Stanford students enroll normally in CS224N, and others can also enroll in CS224N via Stanford Online in the (northern hemisphere) autumn to take the course in the winter (high cost, limited enrollment, gives Stanford credit). The lecture slides and assignments are updated online each year as the course progresses. We are happy for anyone to use these resources, and we are happy to get acknowledgements.
- Office hours : Hybrid format with remote (over Zoom) or in person options. Information here .
- Contact : Students should ask all course-related questions in the Ed forum, where you will also find announcements. You will find the course Ed on the course Canvas page or in the header link above. For external enquiries, emergencies, or personal matters that you don't wish to put in a private Ed post, you can email us at [email protected] . Please send all emails to this mailing list - do not email the instructors directly.

## What is this course about?

Natural language processing (NLP) or computational linguistics is one of the most important technologies of the information age. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, virtual agents, medical reports, politics, etc. In the last decade, deep learning (or neural network) approaches have obtained very high performance across many different NLP tasks, using single end-to-end neural models that do not require traditional, task-specific feature engineering. In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models, using the Pytorch framework.

“Take it. CS221 taught me algorithms. CS229 taught me math. CS224N taught me how to write machine learning models.” – A CS224N student on Carta

## Previous offerings

Below you can find archived websites and student project reports from previous years. Disclaimer: assignments are subject to change - please do not assume that assignments will be unchanged from last year!

## Prerequisites

All class assignments will be in Python (using NumPy and PyTorch ). If you need to remind yourself of Python, or you're not very familiar with NumPy, you can come to the Python review session in week 1 (listed in the schedule ). If you have a lot of programming experience but in a different language (e.g. C/C++/Matlab/Java/Javascript), you will probably be fine.

You should be comfortable taking (multivariable) derivatives and understanding matrix/vector notation and operations.

You should know the basics of probabilities, gaussian distributions, mean, standard deviation, etc.

We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. If you already have basic machine learning and/or deep learning knowledge, the course will be easier; however it is possible to take CS224n without it. There are many introductions to ML, in webpage, book, and video form. One approachable introduction is Hal Daumé’s in-progress A Course in Machine Learning . Reading the first 5 chapters of that book would be good background. Knowing the first 7 chapters would be even better!
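That loop — formulate a cost, take its derivative, step downhill — looks like this for a simple quadratic cost (purely illustrative; course assignments apply it to neural network losses):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a cost function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```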

## Reference Texts

The following texts are useful, but none are required. All of them can be read free online.

- Dan Jurafsky and James H. Martin. Speech and Language Processing (2024 pre-release)
- Jacob Eisenstein. Natural Language Processing
- Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning
- Delip Rao and Brian McMahan. Natural Language Processing with PyTorch (requires Stanford login).
- Lewis Tunstall, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers

If you have no background in neural networks but would like to take the course anyway, you might well find one of these books helpful to give you more background:

- Michael A. Nielsen. Neural Networks and Deep Learning
- Eugene Charniak. Introduction to Deep Learning

## Assignments (54%)

There are five weekly assignments, which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts. In office hours, TAs may look at students’ code for assignments 1, 2 and 3 but not for assignments 4 and 5.

- Assignment 1 (6%): Introduction to word vectors
- Assignment 2 (12%): Derivatives and implementation of word2vec algorithm
- Assignment 3 (12%): Dependency parsing and neural network foundations
- Assignment 4 (12%): Neural Machine Translation with sequence-to-sequence, attention, and subwords
- Assignment 5 (12%): Self-supervised learning and fine-tuning with Transformers
- Deadlines : All assignments are due on either a Tuesday or a Thursday before class (i.e. before 4:30pm). All deadlines are listed in the schedule .
- Submission : Assignments are submitted via Gradescope . You will be able to access the course Gradescope page on Canvas. If you need to sign up for a Gradescope account, please use your @stanford.edu email address. Further instructions are given in each assignment handout. Do not email us your assignments .
- Late start : If the result gives you a higher grade, we will not use your assignment 1 score, and we will give you an assignment grade based on counting each of assignments 2–5 at 13.5%.
- Collaboration : Study groups are allowed, but students must understand and complete their own assignments, and hand in one assignment per student. If you worked in a group, please put the names of the members of your study group at the top of your assignment. Please ask if you have any questions about the collaboration policy.
- Honor Code : We expect students to not look at solutions or implementations online. Like all other classes at Stanford, we take the student Honor Code seriously. We sometimes use automated methods to detect overly similar assignment solutions.

## Final Project (43%)

The Final Project offers you the chance to apply your newly acquired skills towards an in-depth application. Students have two options: the Default Final Project (in which students tackle a predefined task, namely implementing a minimalist version of BERT) or a Custom Final Project (in which students choose their own project involving human language and deep learning). Examples of both can be seen on last year's website . Note: TAs may not look at students' code for either the default or custom final projects.

## Important information

- Project proposal (5%)
- Project milestone (5%)
- Project poster (3%)
- Project report (30%)
- Deadlines : The project proposal, milestone and report are all due at 4:30pm. All deadlines are listed in the schedule .
- Default Final Project : In this project, students implement parts of the BERT architecture and use it to tackle 3 downstream tasks. Similar to previous years, the code is in PyTorch.
- Project advice [ lecture slides ] [ custom project tips ]: The Practical Tips for Final Projects lecture provides guidance for choosing and planning your project. To get project advice from staff members, first look at each staff member's areas of expertise on the office hours page . This should help you find a staff member who is knowledgeable about your project area.
- Ethics-related questions : For guidance on projects dealing with ethical questions, or ethical questions that arise during your project, please contact Benji Xie ( [email protected] ) or Regina Wang ( [email protected] ).
- Project ideas from Stanford researchers : We have collected a list of project ideas from members of the Stanford AI Lab — these are a great opportunity to work on an interesting research problem with an external mentor. If you want to do these, get started early!

## Practicalities

- Team size : Students may do final projects solo, or in teams of up to 3 people. We strongly recommend you do the final project in a team. Larger teams are expected to do correspondingly larger projects, and you should only form a 3-person team if you are planning to do an ambitious project where every team member will have a significant contribution.
- Contribution : In the final report we ask for a statement of what each team member contributed to the project. Team members will typically get the same grade, but we may differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contribution.
- External collaborators : You can work on a project that has external (non CS224n student) collaborators, but you must make it clear in your final report which parts of the project were your work.
- Sharing projects : You can share a single project between CS224n and another class, but we expect the project to be accordingly bigger, and you must declare that you are sharing the project in your project proposal.
- Mentors : Every custom project team has a mentor, who gives feedback and advice during the project. Default project teams do not have mentors. A project may have an external (i.e., not course staff) mentor; otherwise, we will assign a CS224n staff mentor to custom project teams after project proposals.
- Computing resources : All teams will receive credits to use Google Cloud Platform, thanks to a kind donation by Google!
- You can use any deep learning framework you like (PyTorch, TensorFlow, etc.)
- More generally, you may use any existing code, libraries, etc. and consult any papers, books, online references, etc. for your project. However, you must cite your sources in your writeup and clearly indicate which parts of the project are your contribution and which parts were implemented by others.
- Under no circumstances may you look at another CS224n group’s code, or incorporate their code into your project.

## Participation (3%)

We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 3%:

- Attending guest speakers' lectures :
- In the second half of the class, we have four invited speakers. Our guest speakers make a significant effort to come lecture for us, so (both to show our appreciation and to continue attracting interesting speakers) we do not want them lecturing to a largely empty room. As such, we encourage students to attend these virtual lectures live, and participate in Q&A.
- All students get 0.375% per speaker (1.5% total) for either attending the guest lecture in person or writing a reaction paragraph if they watched the talk remotely; details will be provided. Students do not need to attend lecture live to write these reaction paragraphs; they may watch asynchronously.
- Completing feedback surveys : We will send out two feedback surveys (mid-quarter and end-of-quarter) to help us understand how the course is going, and how we can improve. Each of the two surveys are worth 0.5%.
- Ed participation : The top ~20 contributors to Ed will get 3%; others will get credit in proportion to the participation of the ~20th person.
- Karma point : Any other act that improves the class, like helping out another student in office hours, which a CS224n TA or instructor notices and deems worthy: 1%
- Each student has 6 late days to use. A late day extends the deadline 24 hours. You can use up to 3 late days per assignment (including all five assignments, project proposal, project milestone and project final report).
- Final project teams can share late days between members. For example, a group of three people must have at least six late days between them to extend the deadline by two days. If any late days are being shared, this must be clearly marked at the beginning of the report, and we will release a form on Ed that teams should fill out.
- Once you have used all 6 late days, the penalty is 1% off the final course grade for each additional late day.

## Regrade Requests

If you feel you deserved a better grade on an assignment, you may submit a regrade request on Gradescope within 3 days after the grades are released. Your request should briefly summarize why you feel the original grade was unfair. Your TA will reevaluate your assignment as soon as possible, and then issue a decision. If you are still not happy, you can ask for your assignment to be regraded by an instructor.

## Credit/No credit enrollment

If you take the class credit/no credit then you are graded in the same way as those registered for a letter grade. The only difference is that, providing you reach a C- standard in your work, it will simply be graded as CR.

## All students welcome

We are committed to doing what we can to work for equity and to create an inclusive learning environment that actively values the diversity of backgrounds, identities, and experiences of everyone in CS224N. We also know that we will sometimes make missteps. If you notice some way that we could do better, we hope that you will let someone in the course staff know about it.

## Well-Being and Mental Health

If you are experiencing personal, academic, or relationship problems and would like to talk to someone with training and experience, reach out to the Counseling and Psychological Services (CAPS) on campus. CAPS is the university’s counseling center dedicated to student mental health and wellbeing. Phone assessment appointments can be made at CAPS by calling 650-723-3785, or by accessing the VadenPatient portal through the Vaden website.

## Auditing the course

In general we are happy to have auditors if they are a member of the Stanford community (registered student, official visitor, staff, or faculty). If you want to actually master the material of the class, we very strongly recommend that auditors do all the assignments. However, due to high enrollment, we cannot grade the work of any students who are not officially enrolled in the class.

## Students with Documented Disabilities

We assume that all of us learn in different ways, and that the organization of the course must accommodate each student differently. We are committed to ensuring the full participation of all enrolled students in this class. If you need an academic accommodation based on a disability, you should initiate the request with the Office of Accessible Education (OAE) . The OAE will evaluate the request, recommend accommodations, and prepare a letter for faculty. Students should contact the OAE as soon as possible and at any rate in advance of assignment deadlines, since timely notice is needed to coordinate accommodations. Students should also send your accommodation letter to either the staff mailing list ( [email protected] ) or make a private post on Ed, as soon as possible.

OAE accommodations for group projects: OAE accommodations will not be extended to collaborative assignments.

## AI Tools Policy

Students are required to independently submit their solutions for CS224N homework assignments. Collaboration with generative AI tools such as Co-Pilot and ChatGPT is allowed, treating them as collaborators in the problem-solving process. However, the direct solicitation of answers or copying solutions, whether from peers or external sources, is strictly prohibited.

Employing AI tools to substantially complete assignments or exams will be considered a violation of the Honor Code. For additional details, please refer to the Generative AI Policy Guidance here .

## Sexual violence

Academic accommodations are available for students who have experienced or are recovering from sexual violence. If you would like to talk to a confidential resource, you can schedule a meeting with the Confidential Support Team or call their 24/7 hotline at: 650-725-9955. Counseling and Psychological Services also offers confidential counseling services. Non-confidential resources include the Title IX Office, for investigation and accommodations, and the SARA Office, for healing programs. Students can also speak directly with the teaching staff to arrange accommodations. Note that university employees – including professors and TAs – are required to report what they know about incidents of sexual or relationship violence, stalking and sexual harassment to the Title IX Office. Students can learn more at https://vaden.stanford.edu/sexual-assault .

Updated lecture slides will be posted here shortly before each lecture. (All CS 224N slides originally by Prof. Chris Manning, unless otherwise specified.) Other links contain last year's slides, which are mostly similar.

Lecture notes will be uploaded a few days after most lectures. The notes (which cover approximately the first half of the course content) give supplementary detail beyond the lectures.

