Luis Beltrán
• Microsoft MVP (AI, Developer Technologies, Azure)
• Researcher @ Tomás Bata University in Zlín
• Lecturer @ Tecnológico Nacional de México en
Celaya
@darkicebeam
luis@luisbeltran.mx
luisbeltran.mx
Invoices
Credit application forms
Tax statements
Banking forms
Medical forms
Authorization forms
Construction forms
…
And other form types
Introduction
Machine Learning
• Data Science + Statistics
• It allows a computer to learn without being explicitly programmed.
• ML Models:
• Regression
• Classification
• Clustering
• Anomaly Detection
Vision
It allows your apps to
understand images and
videos, detecting faces
and emotions
Voice
Listen and identify users
through their voice,
develop apps that talk to
them, understand their
intention, filter noise.
Decision
Make smart decisions in
seconds by identifying
unwanted content, anomalies
and also create personalized
experiences
Language
It processes text and identifies
what users want.
Democratizing Artificial Intelligence
Azure Cognitive Services
microsoft.com/cognitive
Form Recognizer Service
Data extraction in any business process that takes forms
and needs to generate structured data
No labeling required and only a handful of sample
documents are needed to train a model (5 forms)
1. Credit Card Application
Company
Applicant
IBANs
2. Handwritten data
Name
Job
Contact details
3. Invoice
Tables
Amount
Name
Address
Form Recognizer
Benefits
• Save a significant amount of time on repetitive document reading and data
entry.
• Eliminate manual data entry errors.
• Automate workflows from start to finish by adding intelligent decision-making.
• Find and direct anomalous documents to a user for review
• Reduce time and costs, allowing employees to focus on more important tasks.
Pricing
Instance | Document type | Price (USD)
Free: web / container | All | $0: 0 – 500 pages free per month
S0: web / container | Custom | $50 per 1000 pages
S0: web / container | Pre-built: Document, Layout, Receipt, Invoice, ID, W-2, Card (Insurance, Vaccine, Business) | $10 per 1000 pages
Learn more at…
Form Recognizer Documentation
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/
Microsoft Learn: Introduction to Form Recognizer
https://docs.microsoft.com/en-us/learn/modules/intro-to-form-recognizer/
Form Recognizer Studio
https://formrecognizer.appliedai.azure.com/studio
Thank you for your attention!
Luis Beltrán
https://about.me/luis-beltran
Carla Mamani
https://linktr.ee/carlychavez1
Editor's notes
One of the most common problems companies face is that a great deal of valuable information is never processed or has to be transcribed manually. Examples include invoices and receipts, which record data such as the items purchased and the amounts, as well as credit application forms and tax forms. These documents often have a fixed format or a highly regular structure in which only the information they contain changes.
This information often needs to be transcribed: someone is dedicated to capturing the data and entering it into a system, and transcription errors can occur. How can this be solved? One option is to use machine learning.
Machine learning is a field of study that combines data science with statistics to give computers the ability to "learn" without being explicitly programmed. This makes it possible to improve outcomes with minimal human intervention.
ML produces results using regression, anomaly detection, clustering, and classification models. What business question are you trying to answer? That is the key to determining the type of algorithm or method you will apply.
So now you might ask: do I need to be an expert in machine learning, Python, neural networks, and other techniques for analyzing huge data sets in order to extract data from documents with high accuracy, precision, and confidence? Is this task within reach of data scientists only?
Thanks to the cloud cognitive services offered by Microsoft Azure, you don't need to be an expert or data scientist to inject artificial intelligence into your applications.
Cognitive Services comprise APIs, SDKs, and services that help developers build intelligent applications without working with artificial intelligence directly and without data science skills or knowledge. In short, Azure Cognitive Services lets developers easily add cognitive features to their applications, so that the applications can see, listen, speak, understand, and even begin to reason. The Azure Cognitive Services offering can be divided into four main pillars: vision, speech, language, and decision.
There is also another part of Cognitive Services called Azure Applied AI Services which allows businesses to get AI solutions up and running in a matter of days, not months.
It enables them to solve common usage scenarios with AI for specific tasks in order to deliver tangible value to organizations quickly, accelerating development and maximizing data security and privacy.
Applied AI Services combine Azure Cognitive Services, task-specific AI, and business logic to provide you with key AI services. Form Recognizer is part of this offering.
Form Recognizer is a cognitive service that uses machine learning, deep learning, and optical character recognition to automatically read information from images and PDFs. It identifies and extracts data from your documents and organizes it for you. For example, it can detect sales amounts or people's names located in a specific part of the document, such as a table or the top of the page.
The service offers two options for generating a custom machine learning model: providing sample documents with labels or without them.
Document recognition can use custom or pre-built models.
The Form Recognizer Layout API can extract text and table structures, including the row and column numbers associated with each piece of text and its bounding box coordinates.
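To illustrate how row and column numbers let you rebuild a table, here is a minimal sketch. The dictionary below is a hypothetical, simplified stand-in for one table in a Layout response: the key names (rowCount, columnCount, cells, rowIndex, columnIndex, content) are assumed for illustration, the values are made up, and bounding box data is omitted. Check the actual response of your API version before relying on these names.

```python
from typing import Any

# Hypothetical, simplified table from a Layout result (assumed shape, made-up values).
sample_table: dict[str, Any] = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Amount"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Consulting"},
        {"rowIndex": 1, "columnIndex": 1, "content": "120.00"},
    ],
}


def table_to_grid(table: dict[str, Any]) -> list[list[str]]:
    """Place each cell's text at its (row, column) position in a 2D list."""
    # Start with an empty grid so cells the service did not return stay blank.
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


for row in table_to_grid(sample_table):
    print(" | ".join(row))
```

Once the cells are in a grid, the table can be written to a spreadsheet or database row by row.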
These notes cover four pre-built Form Recognizer models: invoices, sales receipts, ID documents, and business cards.
Invoice model
The pre-built invoice model extracts data from invoices in various formats and returns structured data. This model extracts key information such as invoice ID, customer and supplier details, shipping and billing information, price totals, and tax amounts.
In addition, this model is designed to analyze and return all text and tables into structured data to automate the billing process.
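As a rough sketch of how an application might consume the structured invoice data, the snippet below separates fields the model is confident about from fields a person should review. The field names, the content/confidence keys, the values, and the 0.8 threshold are all assumptions chosen for illustration, not a verbatim service response.

```python
# Hypothetical, simplified invoice fields (assumed names, made-up values).
sample_fields = {
    "InvoiceId": {"content": "INV-100", "confidence": 0.97},
    "VendorName": {"content": "Contoso", "confidence": 0.93},
    "InvoiceTotal": {"content": "$110.00", "confidence": 0.58},
}


def split_by_confidence(fields: dict, threshold: float = 0.8) -> tuple[dict, dict]:
    """Return (accepted, needs_review) dictionaries of field name -> text."""
    accepted: dict[str, str] = {}
    needs_review: dict[str, str] = {}
    for name, field in fields.items():
        # Fields below the threshold are routed to a person instead of being trusted.
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["content"]
    return accepted, needs_review


accepted, needs_review = split_by_confidence(sample_fields)
print("accepted:", accepted)
print("needs review:", needs_review)
```

The threshold is a business decision: a lower value automates more documents, a higher value sends more of them to manual review.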
Receipt model
This model analyzes English-language sales receipts from restaurants, retail stores, gas stations, and more, from Australia, Canada, Great Britain, India, and the United States. It extracts the information you need, such as the time and date of the transaction, merchant information, and total and tax amounts. Data can be extracted from different types of receipts, both scanned copies and photos taken with a phone.
ID Model
This model extracts information from passports issued around the world and from U.S. driver's licenses, such as document number, name, country of residence, and expiration date, and returns it in a structured list.
Business card model
This model extracts key information, such as names and contact numbers, and compiles it into an organized JSON response.
Custom models
Form Recognizer offers custom data extraction 'models' that can be tailored to your specific forms to extract text, key/value pairs, and table data.
Custom models are created by loading five or more sample forms and labeling them to specify the data you are interested in. Form Recognizer then "trains" a custom model that can extract data tailored specifically to your forms. After you train a custom model, you can test and retrain it to reliably extract data from more forms based on your needs.
Form Recognizer can be integrated into applications through SDKs available in many languages, command-line shells, and a REST API, allowing you to enhance existing workflows with data extracted from unstructured digital or paper-based documents.
For example, you can build a mobile application that sends a web request containing an image of your document, has it processed in the cloud, and receives the extracted data in response.
Organizations often receive various types of forms, from which it can be difficult to extract data without painful manual data entry. By extracting data digitally and combining it with existing operational systems and data storage services, organizations can gain insights and deliver value to their customers and business users.
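A minimal sketch of such a web request, built with the Python standard library, could look like the following. The endpoint, the key, and the image bytes are placeholders, and the URL pattern and API version are assumptions based on the v3.0 REST API, so verify them against the Form Recognizer documentation for your resource. The request is only constructed here, not sent.

```python
import urllib.request


def build_analyze_request(
    endpoint: str,
    key: str,
    model_id: str,
    image_bytes: bytes,
    api_version: str = "2022-08-31",  # assumed API version, check the docs
) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a document image."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={api_version}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


# Placeholder values: replace with your own resource endpoint, key, and image.
request = build_analyze_request(
    "https://example.cognitiveservices.azure.com",
    "<your-key>",
    "prebuilt-receipt",
    b"\xff\xd8",  # stand-in for real image bytes
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would start the analysis. The service works asynchronously, so the application then polls a result URL returned in the response headers until the extracted data is ready.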
Many transactional use cases require manual intervention, but data entry can be tedious and lead to errors. Form Recognizer can be integrated into existing applications and provides a standard JSON response, allowing developers to verify manual data entry against what Form Recognizer detects. Standard responses allow simple logic to check whether the manually entered values and the values detected by Form Recognizer differ, for example, if a decimal point was lost during manual data entry. This can help reduce errors and increase accountability in companies. For example, banks can ensure that customers don't get loans they shouldn't get, and that they are not denied because of small human errors, such as a misplaced decimal point.
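The "simple logic" mentioned above might look like the following sketch, which compares manually typed values with detected ones and reports the fields that differ. The field names and values are hypothetical, and the normalization rules (dropping "$" and thousands separators, ignoring letter case) are assumptions that a real application would adapt to its own data.

```python
from decimal import Decimal, InvalidOperation


def _normalize(value: str):
    """Turn amounts into Decimal; fall back to case-insensitive text."""
    cleaned = value.strip().replace(",", "").lstrip("$")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return cleaned.casefold()


def find_mismatches(manual: dict[str, str], extracted: dict[str, str]) -> dict:
    """Return {field: (typed, detected)} for fields whose values disagree."""
    mismatches = {}
    for name, typed in manual.items():
        detected = extracted.get(name)
        if detected is None:
            continue  # nothing was detected for this field, so nothing to compare
        if _normalize(typed) != _normalize(detected):
            mismatches[name] = (typed, detected)
    return mismatches


# Hypothetical example: the clerk lost the decimal point in the total.
manual_entry = {"InvoiceTotal": "11000", "VendorName": "contoso"}
detected_values = {"InvoiceTotal": "$110.00", "VendorName": "Contoso"}
print(find_mismatches(manual_entry, detected_values))
```

Here only InvoiceTotal is flagged, because 11000 and 110.00 are different amounts, while the vendor name differs only in capitalization.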
Here you can see the pay-as-you-go pricing for Form Recognizer. You can create a free instance for tests and small scenarios, or set up a standard instance for real-world production solutions.
Here are some links to learn more if you are interested.
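As a back-of-the-envelope aid, the prices on the pricing slide can be turned into a small estimator. This is only a sketch: it assumes billing is linear in the number of pages, which may not match how the service rounds or meters usage, and the function and constant names are made up for illustration.

```python
from decimal import Decimal

# Prices per 1,000 pages in USD, copied from the pricing slide.
PRICE_PER_1000 = {"custom": Decimal("50"), "prebuilt": Decimal("10")}
FREE_TIER_PAGES = 500  # the free instance covers up to 500 pages per month


def estimate_monthly_cost(pages: int, model_kind: str, free_instance: bool = False) -> Decimal:
    """Estimate the monthly cost in USD, assuming linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if free_instance:
        if pages > FREE_TIER_PAGES:
            raise ValueError("the free instance covers at most 500 pages per month")
        return Decimal("0")
    return PRICE_PER_1000[model_kind] * pages / 1000


print(estimate_monthly_cost(20_000, "custom"))
print(estimate_monthly_cost(20_000, "prebuilt"))
```

Under these assumptions, 20,000 pages per month would cost about $1,000 with a custom model and about $200 with a pre-built model.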