Domain Based Common Words

Week 2

This week I started preparing the dataset that I will use in my internship. I collected and downloaded the PubMed dataset from https://catalog.data.gov/dataset/pubmed. It contains more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books.

To use this dataset as a source of features for the classification methods, I picked some journals for testing the domain-based methodology. The dataset also needs to be cleaned, so I wrote ECL code to do that: I read the data into ECL and converted the documents to ECL records.
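
To give an idea of what this kind of preparation step looks like, here is a minimal sketch in Python rather than ECL. It is not my actual ECL code. It assumes the citations are in the standard PubMed XML layout (`PubmedArticle`, `PMID`, `Journal/Title`, `ArticleTitle`, `AbstractText`); the `Citation` record, the `clean_text` and `parse_citations` helpers, and the cleaning rules (lowercasing, keeping letters only) are hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Citation:
    """One cleaned document (hypothetical record layout)."""
    pmid: str
    journal: str
    title: str
    abstract: str


def clean_text(text):
    """Lowercase, replace anything that is not a-z or whitespace, collapse spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def element_text(element):
    """Return all text inside an element, including nested markup such as <i>."""
    return "".join(element.itertext()) if element is not None else ""


def parse_citations(xml_string, journals=None):
    """Convert PubMed XML into Citation records.

    If `journals` is given, only citations from those journal titles are kept.
    """
    root = ET.fromstring(xml_string)
    records = []
    for article in root.iter("PubmedArticle"):
        journal = article.findtext(".//Journal/Title", default="")
        if journals is not None and journal not in journals:
            continue
        title = element_text(article.find(".//ArticleTitle"))
        # An abstract may be split into several AbstractText sections.
        abstract = " ".join(element_text(el) for el in article.iter("AbstractText"))
        records.append(
            Citation(
                pmid=article.findtext(".//PMID", default=""),
                journal=journal,
                title=clean_text(title),
                abstract=clean_text(abstract),
            )
        )
    return records
```

Calling `parse_citations(xml_string, journals={"Some Journal"})` would keep only the citations from the selected journals, which mirrors the idea of picking a few journals to test the methodology on.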
