Domain-Based Common Words

Week 4

This week I applied the CBOW (continuous bag-of-words) text-vector method to the PubMed data sets. I represented each unique token in the corpus as a 100-dimensional vector, which let me find the center of all the word vectors in that 100-dimensional space. I then used Euclidean distance to measure how far every unique word in the corpus is from that center. Unfortunately, I got stuck at this point because the results were not what I expected. So I wrote Python code to compare the ECL results against Python. The Python version gave me what I needed, so we concluded that the problem was in the ECL implementation of CBOW. I met with Kevin and Roger many times to track down the problem, and we were able to fix it.

source: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
