Expert’s Profiling Specialists

The goal was to retrieve person names, organizations, and other metadata from millions of case PDF files, which arrive either in an Amazon S3 bucket or by email.
We met this challenge with several tools and frameworks, combined in a three-step process.
Tools: Amazon S3 web service, Apache Tika (Tika.jar), and Solr NER.
Step 1. We used Amazon S3 to retrieve the PDF files from the S3 bucket.
Step 2. We used Tika to extract the plain text from each PDF file.
Step 3. We integrated Solr NER to find individuals' names in snippets of each plain-text file, then stored the extracted metadata in a MySQL database.
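The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: S3 access is assumed to use boto3, text extraction is assumed to call the Tika app jar from the command line with its `--text` option, a naive regular expression stands in for Solr NER (it is not how Solr's NER works), and the standard-library `sqlite3` module stands in for MySQL. `TIKA_JAR`, the `case_person` table, and every function name are hypothetical.

```python
import re
import sqlite3
import subprocess
from typing import Iterable

# Hypothetical path to the Tika app jar; adjust for your installation.
TIKA_JAR = "tika-app.jar"


def download_from_s3(bucket: str, key: str, dest: str) -> None:
    """Step 1: copy one PDF from the S3 bucket to a local file (assumes boto3)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("s3").download_file(bucket, key, dest)


def tika_command(pdf_path: str, jar: str = TIKA_JAR) -> list[str]:
    """Build the command that asks the Tika app to print a PDF's plain text."""
    return ["java", "-jar", jar, "--text", pdf_path]


def pdf_to_text(pdf_path: str) -> str:
    """Step 2: run Tika and capture its plain-text output (needs Java and the jar)."""
    result = subprocess.run(
        tika_command(pdf_path), capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in for Solr NER: an honorific followed by two or more capitalized words.
# Hypothetical and deliberately naive; it only gives the pipeline something to call.
NAME_PATTERN = re.compile(
    r"\b(?:Judge|Dr\.|Mr\.|Ms\.|Mrs\.)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)"
)


def extract_person_names(text: str) -> list[str]:
    """Step 3 (stand-in): return unique names in order of first appearance."""
    seen: dict[str, None] = {}
    for match in NAME_PATTERN.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def store_names(conn: sqlite3.Connection, case_file: str, names: Iterable[str]) -> None:
    """Persist the extracted names; sqlite3 stands in for the MySQL database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS case_person (case_file TEXT, person_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO case_person (case_file, person_name) VALUES (?, ?)",
        [(case_file, name) for name in names],
    )
    conn.commit()


def process_case(conn: sqlite3.Connection, bucket: str, key: str, local_path: str) -> list[str]:
    """Run all three steps for one case file and return the names found."""
    download_from_s3(bucket, key, local_path)
    names = extract_person_names(pdf_to_text(local_path))
    store_names(conn, key, names)
    return names
```

Keeping each step in its own function makes it straightforward to replace the regex stand-in with a call to the real NER service, or `sqlite3` with a MySQL driver, without touching the rest of the pipeline.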
We developed a search-based application to support identification research. It offers multiple search criteria, for example court name, judge, case type, and discipline. Users can also update cases, for example by adding attorneys and related documents.
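Searching on several optional criteria usually comes down to building a query that combines only the criteria the user filled in. The sketch below shows one way to do that in Python; the `cases` table, its column names, and the function names are hypothetical, and `sqlite3` is used only so the example runs without a MySQL server.

```python
import sqlite3
from typing import Optional

# Hypothetical schema holding the search criteria mentioned above.
CASES_SCHEMA = """
CREATE TABLE IF NOT EXISTS cases (
    case_id INTEGER PRIMARY KEY,
    court_name TEXT,
    judge TEXT,
    case_type TEXT,
    discipline TEXT
)
"""

# Only these columns may be searched; anything else is rejected.
SEARCHABLE_COLUMNS = ("court_name", "judge", "case_type", "discipline")


def build_case_query(**criteria: Optional[str]) -> tuple[str, list[str]]:
    """Build a parameterized query that ANDs together every criterion supplied."""
    clauses: list[str] = []
    params: list[str] = []
    for column, value in criteria.items():
        if column not in SEARCHABLE_COLUMNS:
            raise ValueError(f"unsupported search field: {column}")
        if value is not None:  # skip criteria the user left empty
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT case_id FROM cases"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY case_id", params


def search_cases(conn: sqlite3.Connection, **criteria: Optional[str]) -> list[int]:
    """Return the IDs of all cases that match every supplied criterion."""
    sql, params = build_case_query(**criteria)
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `search_cases(conn, judge="Smith", case_type="Civil")` returns the IDs of civil cases heard by that judge. Whitelisting column names and passing values as parameters keeps user input out of the SQL text. Common MySQL drivers for Python use `%s` rather than `?` as the placeholder, so the query strings would need that adjustment.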
Our client has a research team that studies the cases contained in the PDF files. The application stores all aspects of each case in the database, so researchers can find any case by searching on the stored metadata. This greatly reduces the time an attorney or other legal professional spends researching a particular case.