Captricity
Operating system | All |
---|---|
Type | Optical character recognition (OCR), intelligent character recognition (ICR), handwriting recognition, redaction |
Captricity is both a data capture software program and the company that sells it. The software combines machine learning with human verification to perform OCR data capture from hand-filled forms.
Background
Captricity came out of the Code for America incubator program and is used by government agencies, health clinics, global health practitioners, and researchers such as NYU's Center for Technology and Economic Development.
Captricity was founded in 2011 by Kuang Chen and former Harvey Danger musician Jeff J. Lin. The idea for Captricity came from Chen’s PhD dissertation at UC Berkeley. His research focused on data-centric approaches to increase the efficiency of low-resource organizations, so they could better serve disadvantaged clients.
Company
Captricity is headquartered in downtown Oakland, CA,[1] and, according to its LinkedIn profile, has 51-200 employees.[2]
Technology
Captricity relies on crowdsourcing, parceling out OCR verification tasks to human operators.[3] Captricity claims that its technology achieves 99.81% accuracy.[4] Its machine learning components combine OCR, ICR and optical mark recognition (OMR).
Captricity captures handwritten information from forms. The captured data then populates a searchable spreadsheet, such as a .csv file that can be opened in Excel. Captricity does not support unstructured data.
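The flow described above, a machine reading checked by human operators and then written to a spreadsheet, can be illustrated with a minimal Python sketch. It is not Captricity's implementation: the majority-vote rule, the function names `resolve_field` and `forms_to_csv`, and the input layout are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def resolve_field(machine_guess, human_answers):
    """Pick a final value for one field (hypothetical rule).

    If human operators answered, take the most common answer
    (ties go to the answer seen first); otherwise keep the machine reading.
    """
    if not human_answers:
        return machine_guess
    value, _count = Counter(human_answers).most_common(1)[0]
    return value


def forms_to_csv(forms, columns):
    """Write one CSV row per form, one column per structured field.

    Each form maps a column name to a (machine_guess, human_answers) pair.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for form in forms:
        writer.writerow({col: resolve_field(*form[col]) for col in columns})
    return buffer.getvalue()


forms = [{"name": ("J0hn", ["John", "John", "Jon"]), "age": ("42", [])}]
print(forms_to_csv(forms, ["name", "age"]))
```

Because every row has the same fixed columns, the sketch only fits structured forms, which is consistent with the statement that unstructured data is not supported.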
Privacy
To maintain the privacy of the information in the forms, each form is “shredded” into distinct fields, and each field is verified by one or more different people.[5] Captricity claims that since no one person can see more than one field from a document, privacy is maintained. For example, a worker may see a stream of 4-digit numbers without knowing that they are the last four digits of a collection of US Social Security numbers. Captricity uses Amazon Mechanical Turk to perform this human verification step.[6]
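The shredding idea can be sketched in a few lines of Python. This is an illustrative assumption about how such an assignment could work, not Captricity's actual logic: the functions `shred` and `assign`, the rotating assignment rule, and the worker names are hypothetical.

```python
from collections import defaultdict


def shred(forms):
    """Split each form into independent (form_id, field_name, value) snippets."""
    return [
        (form_id, name, value)
        for form_id, fields in forms.items()
        for name, value in fields.items()
    ]


def assign(snippets, workers):
    """Give each snippet to a worker who has not yet seen that form.

    Workers receive only the bare value, without the form ID or field name.
    """
    seen = defaultdict(set)     # worker -> form IDs already shown to them
    queues = defaultdict(list)  # worker -> values to verify
    for i, (form_id, _name, value) in enumerate(snippets):
        # Rotate the starting worker so the load is spread out.
        for offset in range(len(workers)):
            worker = workers[(i + offset) % len(workers)]
            if form_id not in seen[worker]:
                seen[worker].add(form_id)
                queues[worker].append(value)
                break
        else:
            raise ValueError("not enough workers to keep fields of one form apart")
    return dict(queues)


forms = {
    "f1": {"name": "Ann", "ssn_last4": "1234"},
    "f2": {"name": "Bob", "ssn_last4": "5678"},
}
print(assign(shred(forms), ["w1", "w2"]))
```

In this example one worker ends up with only names and the other with only 4-digit numbers, so neither can link a name to a number. The sketch needs at least as many workers as the largest number of fields on any single form.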
Data redaction
Captricity performs redaction in addition to OCR. Any field or collection of fields can be “blacked out” in the document template,[7] and any information contained in those fields will not be read by the system. For example, if a courthouse wants to release its records to the public but keep the arresting officer’s name private, the field containing that name can be redacted.
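Template-based redaction of this kind can be sketched as follows. The representation is an assumption made for illustration, not Captricity's format: the page is a grid of grayscale pixel values, the template maps hypothetical field names to `(top, left, bottom, right)` boxes, and `redact` and `readable_fields` are made-up helper names.

```python
def redact(image, template, redacted):
    """Return a copy of the page with every redacted field's box set to black (0)."""
    result = [row[:] for row in image]  # leave the original page untouched
    for name in redacted:
        top, left, bottom, right = template[name]
        for y in range(top, bottom):
            for x in range(left, right):
                result[y][x] = 0
    return result


def readable_fields(template, redacted):
    """List the fields that would still be passed on for reading."""
    return [name for name in template if name not in redacted]


# A tiny 2x4 all-white "page" with two fields defined in the template.
page = [[255] * 4 for _ in range(2)]
template = {"officer_name": (0, 0, 1, 2), "arrest_date": (1, 2, 2, 4)}

public_page = redact(page, template, {"officer_name"})
print(public_page)
print(readable_fields(template, {"officer_name"}))
```

Because the redaction is defined on the template rather than on individual documents, the same set of boxes applies to every form filled out on that template.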
Captricity and Non-profits
Non-profit and academic researchers often conduct survey research to monitor and evaluate their programs or projects. The Center for Effective Global Action (CEGA), which is affiliated with UC Berkeley, announced a partnership with Captricity in August 2012.[8] Captricity donates digitization services to non-profits via its Data for Communities program, and offers discounts to non-profit organizations such as CEGA members.
References
- "Captricity Oakland, CA".
- "LinkedIn Page for Captricity". LinkedIn. Retrieved 2015.
- Howard, Alex. "A startup takes on 'the paper problem' with crowdsourcing and machine learning". strata.oreilly. O'Reilly. Retrieved 5 October 2012.
- Chen, Kuang. "Shreddr: pipelined paper digitization for low-resource organizations" (PDF). University of California - Berkeley. Retrieved 2011.
- Little, G. (2011). "Human ocr: Insights from a complex human computation process".
- Hardy, Quentin. "How Big Data Gets Real". The New York Times. Retrieved 4 June 2012.
- Safire, William. "Redact This". The New York Times. Retrieved 9 September 2007.
- "CEGA partners". UC Berkeley. Retrieved August 2012.
Further reading
- How Big Data Gets Real (The New York Times, 4 June 2012)