Caverphone

The Caverphone phonetic matching algorithm^[1]^[2] was created by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002 and revised in 2004. It was created to assist in data matching between late 19th-century and early 20th-century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended to apply to those names that could not easily be matched between electoral rolls, after the exact matches had been removed from the pool of potential matches. It is optimised for accents present in the study area (the southern part of the city of Dunedin, New Zealand).

Procedure

The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.
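Because the rules run one after another, their order matters. The following minimal sketch illustrates this with two of the replacements from the list below (ci with si, and c with k). The helper name `apply_rules` and the sample name are assumptions made for illustration only.

```python
def apply_rules(text, rules):
    """Apply each (old, new) replacement in order to the whole string."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


# Same two rules, in the listed order and in the opposite order.
in_order = [("ci", "si"), ("c", "k")]
swapped = [("c", "k"), ("ci", "si")]

print(apply_rules("lucinda", in_order))  # lusinda
print(apply_rules("lucinda", swapped))   # lukinda
```

In the swapped order, the general rule for c has already consumed the letter before the more specific rule for ci can match, so the soft c is lost.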

The algorithm is as follows:

Convert to lowercase
Remove anything that is not a letter a-z
If the name starts with...
1. cough, replace it by cou2f
2. rough, replace it by rou2f
3. tough, replace it by tou2f
4. enough, replace it by enou2f
5. gn, replace it by 2n
If the name ends with
1. mb, replace it by m2
Replace
1. cq with 2q
2. ci with si
3. ce with se
4. cy with sy
5. tch with 2ch
6. c with k
7. q with k
8. x with k
9. v with f
10. dg with 2g
11. tio with sio
12. tia with sia
13. d with t
14. ph with fh
15. b with p
16. sh with s2
17. z with s
18. any initial vowel with an A
19. all other vowels with a 3
20. 3gh3 with 3kh3
21. gh with 22
22. g with k
23. groups of the letter s with an S
24. groups of the letter t with a T
25. groups of the letter p with a P
26. groups of the letter k with a K
27. groups of the letter f with an F
28. groups of the letter m with an M
29. groups of the letter n with an N
30. w3 with W3
31. wy with Wy
32. wh3 with Wh3
33. why with Why
34. w with 2
35. any initial h with an A
36. all other occurrences of h with a 2
37. r3 with R3
38. ry with Ry
39. r with 2
40. l3 with L3
41. ly with Ly
42. l with 2
43. j with y
44. y3 with Y3
45. y with 2
Remove all
1. 2s
2. 3s
Put six 1s on the end
Take the first six characters as the code
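
The steps above can be sketched in Python as an ordered table of regular-expression replacements. This is a minimal sketch that transcribes the rule list as given here and has not been checked against the Caversham Project's own implementation. The names `caverphone`, `START_RULES`, `END_RULES` and `REPLACE_RULES` are assumptions, as is reading "groups of the letter" as "one or more consecutive occurrences".

```python
import re

# Rules that apply only at the start of the name.
START_RULES = [
    (r"^cough", "cou2f"), (r"^rough", "rou2f"), (r"^tough", "tou2f"),
    (r"^enough", "enou2f"), (r"^gn", "2n"),
]

# Rule that applies only at the end of the name.
END_RULES = [(r"mb$", "m2")]

# General replacements, in the order listed above (rules 1 to 45).
REPLACE_RULES = [
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
    ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
    ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
    ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"),
    (r"^[aeiou]", "A"), (r"[aeiou]", "3"),
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
    ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
    ("j", "y"), ("y3", "Y3"), ("y", "2"),
]


def caverphone(name: str) -> str:
    """Return the six-character code for a name, following the listed rules."""
    # Convert to lowercase and drop everything that is not a letter a-z.
    text = re.sub(r"[^a-z]", "", name.lower())
    # Apply every rule in sequence; each one sees the output of the previous.
    for pattern, replacement in START_RULES + END_RULES + REPLACE_RULES:
        text = re.sub(pattern, replacement, text)
    # Remove the placeholder digits 2 and 3.
    text = re.sub(r"[23]", "", text)
    # Pad with six 1s and keep the first six characters.
    return (text + "111111")[:6]


print(caverphone("Lee"))       # L11111
print(caverphone("Thompson"))  # TMPSN1
```

Keeping the rules in plain lists makes the ordering explicit and easy to compare line by line with the numbered list.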

Examples

Lee -> lee
lee -> l33
l33 -> L33
L33 -> L
L -> L111111
L111111 -> L11111

Thompson -> thompson
thompson -> th3mps3n
th3mps3n -> th3mpS3n
th3mpS3n -> Th3mpS3n
Th3mpS3n -> Th3mPS3n
Th3mPS3n -> Th3MPS3n
Th3MPS3n -> Th3MPS3N
Th3MPS3N -> T23MPS3N
T23MPS3N -> TMPSN
TMPSN -> TMPSN111111
TMPSN111111 -> TMPSN1
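
Each line in the examples records the name only when a step actually changes it. The sketch below reproduces the "Lee" trace that way. The function name `trace` is hypothetical, and `lee_rules` is an assumed subset containing only the rules that fire for this name, not the full rule list.

```python
import re


def trace(name, rules):
    """Return the successive forms of a name, one entry per step that changes it."""
    steps = [name]

    def record(text):
        # Keep a step only if it differs from the previous form.
        if text != steps[-1]:
            steps.append(text)

    record(name.lower())                       # convert to lowercase
    for pattern, replacement in rules:
        record(re.sub(pattern, replacement, steps[-1]))
    record(steps[-1] + "111111")               # put six 1s on the end
    record(steps[-1][:6])                      # take the first six characters
    return steps


# Assumed subset: vowels to 3, l3 to L3, then removal of all 2s and 3s.
lee_rules = [(r"[aeiou]", "3"), ("l3", "L3"), (r"[23]", "")]

print(" -> ".join(trace("Lee", lee_rules)))
# Lee -> lee -> l33 -> L33 -> L -> L111111 -> L11111
```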

References

1. Milette, Greg; Stroud, Adam (2012-05-18). Professional Android Sensor Programming. John Wiley & Sons. pp. 421–. ISBN 9781118240458. Retrieved 19 February 2013.
2. Phua, Clifton; Lee, Vincent; Smith, Kate (2006). "The Personal Name Problem And a Recommended Data Mining Solution". Encyclopedia of Data Warehousing and Mining. CiteSeerX: 10.1.1.127.5111.

External links

Caversham Project http://caversham.otago.ac.nz/
Original (2002) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp060902.pdf
Revised (2004) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp150804.pdf
Implementations:
- C# implementation (revised algorithm): http://sounditout.codeplex.com/
- Java implementation in the Apache Commons Codec project
- PHP implementation: https://github.com/kiphughes/caverphone
- Python implementation of the Caverphone algorithm (version 2.0), AdvaS Advanced Search project

This article is issued from Wikipedia - version of the Wednesday, October 21, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
