eTBLAST
eTBLAST is a free text-similarity search engine that currently offers access to the MEDLINE database, the National Institutes of Health (NIH) CRISP database, the Institute of Physics (IOP) database, Wikipedia, arXiv, the NASA technical reports database, Virginia Tech class descriptions and a variety of databases of clinical interest. Additional text-based databases are added on an ongoing basis. eTBLAST searches citation databases[1][2] and databases containing full text,[3] such as PubMed. The eTBLAST server compares a user's natural-text query to target databases using a hybrid search algorithm: a low-sensitivity, weighted keyword-based first pass followed by a novel sentence-alignment-based second pass. eTBLAST is a free web-based service of The Innovation Laboratory at the Virginia Bioinformatics Institute.
As a text similarity engine, eTBLAST made possible a large study of duplicate publications and potential plagiarism in the biomedical literature. Thousands of randomly sampled MEDLINE abstracts were submitted to eTBLAST, and those with the highest similarity to other records were studied and entered into an online database. The study is ongoing, and the database matures as entries are manually inspected and classified. This work revealed several trends, including an increasing rate of duplication in the biomedical literature, as reported in the journals Bioinformatics,[4][5] Anaesthesia and Intensive Care,[6] Clinical Chemistry,[7] Urologic Oncology,[8] Nature,[9] and Science.[10]
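The two-pass idea described above can be illustrated with a minimal sketch. This is not eTBLAST's actual implementation: the function names, the inverse-document-frequency keyword weighting, and the use of Python's `difflib.SequenceMatcher` as a stand-in for sentence alignment are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (illustrative only)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def keyword_pass(query, docs, top_n=3):
    """First pass (assumed weighting): score each document by the summed
    inverse-document-frequency weight of the query words it contains, and
    return the indices of the top_n documents with a non-zero score."""
    n_docs = len(docs)
    doc_tokens = [set(tokenize(d)) for d in docs]
    doc_freq = Counter(word for tokens in doc_tokens for word in tokens)
    query_words = set(tokenize(query))
    scores = []
    for idx, tokens in enumerate(doc_tokens):
        score = sum(
            math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0
            for w in query_words & tokens
        )
        scores.append((score, idx))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scores[:top_n] if score > 0]


def alignment_pass(query, doc):
    """Second pass (stand-in for sentence alignment): for each query
    sentence, take the best word-level match ratio against any document
    sentence, then average those best ratios."""
    q_sents = [tokenize(s) for s in split_sentences(query)]
    d_sents = [tokenize(s) for s in split_sentences(doc)]
    if not q_sents or not d_sents:
        return 0.0
    best = [
        max(SequenceMatcher(None, q, d).ratio() for d in d_sents)
        for q in q_sents
    ]
    return sum(best) / len(best)


def hybrid_search(query, docs, top_n=3):
    """Run the cheap keyword pass, then re-rank survivors by alignment.
    Returns (alignment_score, document_index) pairs, best first."""
    candidates = keyword_pass(query, docs, top_n)
    ranked = [(alignment_pass(query, docs[i]), i) for i in candidates]
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked
```

The design point the sketch tries to capture is that the inexpensive keyword pass narrows a large database to a handful of candidates, so the costlier sentence-level comparison runs only on those candidates.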
Interface
Because eTBLAST is a text-similarity engine rather than a simple keyword-based search tool, it is claimed that the user need not identify and manipulate query keywords and Boolean operators, as must be done for other search engines.
eTBLAST aims to help the user rapidly find references, evaluate novelty, find experts and journals in a given topical area[11] and track the popularity of the topic as defined by the user's query. In addition to the information in individual 'hits', the result set as a whole also carries information. eTBLAST can also infer possible hypotheses from inspection of implicit keywords found within the most similar 'hits'. A similarity matrix and a heat map are also displayed for the most similar 'hits'.
A typical 120-word query returns results in less than 10 seconds when compared against MEDLINE, which as of 8/1/2011 contained over 20 million records.
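As a rough illustration of what a similarity matrix over the top 'hits' might look like, the following sketch computes pairwise similarities and renders them as a character-based heat map. The Jaccard word-overlap measure, the function names, and the shading characters are assumptions chosen for brevity; the source does not specify how eTBLAST computes or draws its matrix.

```python
import re


def word_set(text):
    """Return the set of lowercase alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(texts):
    """Symmetric matrix of pairwise Jaccard similarities between texts."""
    sets = [word_set(t) for t in texts]
    return [[jaccard(a, b) for b in sets] for a in sets]


def text_heat_map(matrix, shades=" .:*#"):
    """Render a matrix of values in [0, 1] as rows of shading characters,
    with denser characters standing for higher similarity."""
    rows = []
    for row in matrix:
        rows.append(
            "".join(
                shades[min(int(v * len(shades)), len(shades) - 1)]
                for v in row
            )
        )
    return "\n".join(rows)


# Hypothetical 'hits' used only to exercise the functions.
hits = [
    "duplicate publication in medline",
    "duplicate publication in biomedical journals",
    "galaxy survey",
]
print(text_heat_map(similarity_matrix(hits)))
```

In such a display, blocks of dense cells away from the diagonal would point to groups of hits that resemble one another as well as the query.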
See also
- BLAST (Basic Local Alignment Search Tool)
- Natural language processing
- Medical literature retrieval
References
- Lewis, J; Ossowski, S; Hicks, J; Errami, M; Garner, HR (2006). "Text similarity: An alternative way to search MEDLINE". Bioinformatics 22 (18): 2298–304. doi:10.1093/bioinformatics/btl388. PMID 16926219.
- Pertsemlidis, A; Garner, HR (2004). "Text comparison based on dynamic programming". IEEE Engineering in Medicine and Biology Magazine 23 (6): 66–71. doi:10.1109/MEMB.2004.1378640. PMID 15688594.
- Sun, Z; Errami, M; Long, T; Renard, C; Choradia, N; Garner, H (2010). Curioso, Walter H, ed. "Systematic Characterizations of Text Similarity in Full Text Biomedical Publications". PLoS ONE 5 (9): e12704. doi:10.1371/journal.pone.0012704. PMC 2939881. PMID 20856807.
- Errami, M; Hicks, JM; Fisher, W; Trusty, D; Wren, JD; Long, TC; Garner, HR (2007). "Déjà vu: A study of duplicate citations in Medline". Bioinformatics 24 (2): 243–9. doi:10.1093/bioinformatics/btm574. PMID 18056062.
- Errami, M; Sun, Z; George, AC; Long, TC; Skinner, MA; Wren, JD; Garner, HR (2010). "Identifying duplicate content using statistically improbable phrases". Bioinformatics 26 (11): 1453–7. doi:10.1093/bioinformatics/btq146. PMC 2872002. PMID 20472545.
- Loadsman, JA; Garner, HR; Drummond, GB (2008). "Towards the elimination of duplication in Anaesthesia and Intensive Care". Anaesthesia and Intensive Care 36 (5): 643–5. PMID 18853580.
- George, AC; Long, TC; Garner, HR (2010). "Quaere Verum". Clinical Chemistry 56 (4): 673–4. doi:10.1373/clinchem.2009.130468. PMID 20093558.
- Garner, HR (2011). "Combating unethical publications with plagiarism detection services". Urologic Oncology 29: 95–9. doi:10.1016/j.urolonc.2010.09.016. PMC 3035174. PMID 21194644.
- Errami, M; Garner, H (2008). "A tale of two citations". Nature 451 (7177): 397–9. doi:10.1038/451397a. PMID 18216832.
- Long, TC; Errami, M; George, AC; Sun, Z; Garner, HR (2009). "Responding to Possible Plagiarism". Science 323 (5919): 1293–4. doi:10.1126/science.1167408. PMID 19265004.
- Errami, M; Wren, JD; Hicks, JM; Garner, HR (2007). "ETBLAST: A web server to identify expert reviewers, appropriate journals and similar publications". Nucleic Acids Research 35 (Web Server issue): W12–5. doi:10.1093/nar/gkm221. PMC 1933238. PMID 17452348.
External links
- eTBLAST
- "NetWatch". Science 304 (5673): 935. 2004. doi:10.1126/science.304.5673.935b.