Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

There are different ways to define lexical similarity, and the results vary accordingly. For example, Ethnologue's method of calculation consists of comparing a standardized set of wordlists and counting those forms that show similarity in both form and meaning. Using this method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.
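The counting step described above can be sketched in Python. This is a minimal illustration, not Ethnologue's actual procedure: the function names, the toy wordlists, and the use of a string-similarity threshold in place of a linguist's judgment of "similar in form" are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def forms_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    """Hypothetical stand-in for a linguist's judgment that two forms are similar.

    Uses difflib's ratio on spellings; the 0.6 threshold is an arbitrary assumption.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def lexical_similarity(wordlist_a: dict, wordlist_b: dict, judge=forms_similar) -> float:
    """Fraction of shared meanings whose forms are judged similar (0.0 to 1.0).

    Each wordlist maps a meaning (the standardized list entry) to a word form.
    Only meanings present in both lists are compared.
    """
    shared_meanings = wordlist_a.keys() & wordlist_b.keys()
    if not shared_meanings:
        raise ValueError("the wordlists have no meanings in common")
    matches = sum(judge(wordlist_a[m], wordlist_b[m]) for m in shared_meanings)
    return matches / len(shared_meanings)


# Toy wordlists (illustrative only, far shorter than a real standardized list).
english = {"water": "water", "hand": "hand", "dog": "dog", "night": "night"}
german = {"water": "wasser", "hand": "hand", "dog": "hund", "night": "nacht"}

print(lexical_similarity(english, german))
```

Because the judgment function is passed in as a parameter, a stricter or looser criterion can be swapped in without changing the counting logic, which mirrors the point that results vary with the definition used.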

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects.^[1]

Lexical similarity is only one indicator of the mutual intelligibility of two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. The choice of wordlist also affects the result. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas their similarity is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, which can be asymmetric, lexical similarity is always symmetric.

Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue.^[2]

Lang. code	Language 1 ↓	Lexical similarity coefficients
		Catalan	English	French	German	Italian	Portuguese	Romanian	Romansh	Russian	Sardinian	Spanish
cat	Catalan	1	-	-	-	0.87	0.85	0.73	0.76	-	0.75	0.85
eng	English	-	1	0.27	0.60	-	-	-	-	0.24	-	-
fra	French	-	0.27	1	0.29	0.89	0.75	0.75	0.78	-	0.80	0.75
deu	German	-	0.60	0.29	1	-	-	-	-	-	-	-
ita	Italian	0.87	-	0.89	-	1	-	0.77	0.78	-	0.85	0.82
por	Portuguese	0.85	-	0.75	-	-	1	0.72	0.74	-	-	0.89
ron	Romanian	0.73	-	0.75	-	0.77	0.72	1	0.72	-	0.74	0.71
roh	Romansh	0.76	-	0.78	-	0.78	0.74	0.72	1	-	0.74	0.74
rus	Russian	-	0.24	-	-	-	-	-	-	1	-	-
srd	Sardinian	0.75	-	0.80	-	0.85	-	0.74	0.74	-	1	0.76
spa	Spanish	0.85	-	0.75	-	0.82	0.89	0.71	0.74	-	0.76	1
		Catalan	English	French	German	Italian	Portuguese	Romanian	Romansh	Russian	Sardinian	Spanish
Language 2 →		cat	eng	fra	deu	ita	por	ron	roh	rus	srd	spa

Notes:

Language codes follow the ISO 639-3 standard.
Ethnologue does not specify for which Sardinian variety the lexical similarity was calculated.
"-" denotes that comparison data are not available.
In the case of English-French lexical similarity, at least two other studies^[3]^[4] estimate the share of English words directly inherited from French at 28.3% and 41% respectively, with a further 28.24% and 15% respectively derived from Latin. Counting both groups together gives 56.54% and 56%, which would put English-French lexical similarity at around 0.56, with correspondingly lower English-German lexical similarity. Another study estimates the share of English words with an Italic origin at 51%, consistent with the two previous analyses.^[5]
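Because lexical similarity is symmetric, a table like the one above only needs each pair stored once. The following Python sketch shows one way to do that; the names `SIMILARITY` and `similarity` are hypothetical, and only a handful of the table's values are copied in for illustration.

```python
# A few values copied from the table above, keyed by unordered pairs of
# ISO 639-3 codes so that each pair is stored only once.
SIMILARITY = {
    frozenset({"eng", "deu"}): 0.60,
    frozenset({"eng", "fra"}): 0.27,
    frozenset({"eng", "rus"}): 0.24,
    frozenset({"fra", "deu"}): 0.29,
    frozenset({"fra", "ita"}): 0.89,
    frozenset({"por", "spa"}): 0.89,
}


def similarity(code_a: str, code_b: str):
    """Return the stored coefficient for a language pair, in either order.

    Returns 1.0 for a language compared with itself and None where the
    table has no data (shown as "-" in the table).
    """
    if code_a == code_b:
        return 1.0
    return SIMILARITY.get(frozenset({code_a, code_b}))


print(similarity("deu", "eng"), similarity("eng", "deu"))
print(similarity("eng", "ita"))
```

Using an unordered key makes the symmetry structural: there is no way for the two directions of a pair to disagree.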

References

Ethnologue.com (lexical similarity values available at some of the individual language entries)
Definition of lexical similarity at Ethnologue.com
Rensch, Calvin R. 1992. "Calculating lexical similarity." In Eugene H. Casad (ed.), Windows on bilingualism, 13-15. (Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110). Dallas: Summer Institute of Linguistics and the University of Texas at Arlington.

Notes

↑ http://www.ethnologue.com/ethno_docs/introduction.asp
↑ See, for instance, lexical similarity data for French, German, English
↑ Finkenstaedt, Thomas; Dieter Wolff (1973). Ordered profusion; studies in dictionaries and the English lexicon. C. Winter. ISBN 3-533-02253-6.
↑ Joseph M. Williams, Origins of the English Language. Amazon.com. Retrieved 2010-04-21.
↑ Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge University Press. p. 477. ISBN 0-521-80498-1.

External links

Most similar languages

This article is issued from Wikipedia - version of the Sunday, April 12, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
