Domain Based Common words

week 3

The idea of our project is to find the domain based common words based on the high dimensional represention of the words. In this week I’ve run the continous bag of words CBOW bundle that exist in the ECL to a sample dataset to extract the vector represention of the words in the corpus. I was able to extract the vocabulary from the corpus, extract the high dimensional representation for each word on corpus and find some information about the corpus(vocabulary size, number of occurrence
for each term).

source: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s