Domain Based Common words

Week 2

This week I started preparing the dataset that I will use in my internship. I collected and downloaded the PubMed datasets. PubMed has more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books.

To use this dataset as a source of features for the classification methods, I picked some journals for testing the domain-based methodology. It is also important to clean the dataset, so I wrote ECL code to do that. I read the data in ECL and converted the documents to records.
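My actual cleaning code is written in ECL and is not reproduced here. As a rough illustration of the idea only, the following Python sketch shows one way a cleaning step and a document-to-record conversion could look. The names `DocRecord`, `clean_text`, and `to_records`, the record fields, and the specific cleaning rules (lowercasing, dropping punctuation, collapsing whitespace) are all assumptions made for this example, not the rules used in the ECL job.

```python
import re
from dataclasses import dataclass


@dataclass
class DocRecord:
    """Hypothetical record layout for one cleaned document."""
    doc_id: int   # sequential id assigned during conversion (assumption)
    journal: str  # journal the document was taken from
    text: str     # cleaned document text


def clean_text(raw: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces,
    and collapse runs of whitespace into single spaces."""
    lowered = raw.lower()
    alnum_only = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return " ".join(alnum_only.split())


def to_records(docs):
    """Convert (journal, raw_text) pairs into DocRecord objects,
    skipping documents that are empty after cleaning."""
    records = []
    for journal, raw in docs:
        cleaned = clean_text(raw)
        if cleaned:
            records.append(DocRecord(len(records) + 1, journal, cleaned))
    return records
```

Calling `to_records` on a list of `(journal, raw_text)` pairs returns one record per non-empty document, which could then feed a later feature-extraction step.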
