Ldml - public
1. Lightweight Data Markup Language
and Information Transfer
Sayandeep Khan
Drakoon Aerospace
Invention Report
Public Release
March 13 2012
2. Contents
→The notion of Language
⬔ What is missing
→A language with an inter-sentence relation
⬔ The notion of Sprache
⬔ The statement relations
⬔ Combinatorial Description
→Application of Sprache: the Design of LDML
⬔ Basics
⬔ Translation : Description guided action
⬔ Application : Machine guided investigation
3. The Notion of Language
Alphabet: A set of characters (basic symbols that cannot
be decomposed), written ∑
String: Any finite-length sequence of elements of ∑. The
set of all strings is written ∑*
Grammar: A quadruple (V, T, G, S), where V is the total
vocabulary, T is a set of terminal symbols, and S is a set of
start symbols, with S, T ⊂ V. G is a set of rules, each of
the form
σ → τ, where σ, τ ∊ (V∪T)* and τ ≠ ϕ
Language: The set {w ∊ T* : S generates w} is the language
generated by the grammar, written L(G)
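As a minimal sketch of these definitions, the following Python fragment encodes a toy grammar as the quadruple (V, T, G, S) and enumerates the terminal strings it generates up to a length bound. The toy grammar, the names `Grammar` and `generate`, and the length bound are illustrative assumptions and do not come from the report.

```python
from collections import deque
from typing import NamedTuple

class Grammar(NamedTuple):
    V: frozenset  # total vocabulary
    T: frozenset  # terminal symbols, T ⊂ V
    G: tuple      # rules (sigma, tau), each side a tuple of symbols
    S: frozenset  # start symbols, S ⊂ V

def generate(grammar, max_len):
    """Return the terminal strings of at most max_len symbols derivable from S.

    Assumption: no rule shortens a form, so over-long forms can be discarded.
    """
    seen, language = set(), set()
    queue = deque((start,) for start in grammar.S)
    while queue:
        form = queue.popleft()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        if all(symbol in grammar.T for symbol in form):
            language.add(" ".join(form))
        # Apply every rule at every position where its left side matches.
        for sigma, tau in grammar.G:
            n = len(sigma)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == sigma:
                    queue.append(form[:i] + tau + form[i + n:])
    return language

# Hypothetical toy grammar: STMT -> NOUN "sinks in" NOUN
toy = Grammar(
    V=frozenset({"STMT", "NOUN", "iron", "water", "air", "sinks in"}),
    T=frozenset({"iron", "water", "air", "sinks in"}),
    G=(
        (("STMT",), ("NOUN", "sinks in", "NOUN")),
        (("NOUN",), ("iron",)),
        (("NOUN",), ("water",)),
        (("NOUN",), ("air",)),
    ),
    S=frozenset({"STMT"}),
)
language = generate(toy, max_len=3)
```

The toy grammar generates "iron sinks in water" and "water sinks in iron" alike: a grammar constrains the form of a statement, not how statements relate to each other.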
4. What is missing?
⬔ A language is essentially a set of terminal statements
(strings of terminal symbols).
⬔ The generation of the terminal statements is governed
by the grammar.
⬔ However, no strict relation between terminal
statements is defined.
⬔ In science, any two statements are strictly related: with
the help of one, the other can be deduced.
5. Example
⬔ Statements in the English language (each a terminal statement):
» Iron is heavier than water
» Iron sinks in water
» Water is denser than air
With zero assistance from physics (which defines terms like
"sinking" and "denser" and assigns logical relations), these
sentences cannot be linked together.
⬔ Using knowledge of physics, the axiom of transitivity may
be applied:
Iron sinks in water AND water is denser than air
⇒ Iron sinks in water AND water sinks in air (from the definition of "denser")
⇒ Iron sinks in air (transitivity)
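The two deduction steps can be sketched in Python. Statements are encoded as pairs of a hypothetical relation `sinks_in`, the definition step reads "x is denser than y" as "x sinks in y", and a transitive closure supplies the final step. The encoding and the function names are assumptions made for illustration.

```python
def apply_definition(denser_than):
    """Definition step: 'x is denser than y' is read as 'x sinks in y'."""
    return {(x, y) for x, y in denser_than}

def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

sinks_in = {("iron", "water")}     # Iron sinks in water
denser_than = {("water", "air")}   # Water is denser than air

facts = transitive_closure(sinks_in | apply_definition(denser_than))
```

After the closure, `facts` also contains the pair `("iron", "air")`, which corresponds to "Iron sinks in air". Neither step is available from the sentences alone: both the definition and the transitivity rule have to be supplied from outside.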
6. Remarks
⬔ Notice that the English language alone cannot justify
the two deduction steps shown in the example.
⬔ Hence the English language alone cannot relate the
statements by an order relation such as
{statement one, statement two} > {statement three}
⬔ We therefore propose a language that has such an order
relation defined on it, giving the pair {language, order
relation}. We call this pair a Sprache, written
§(G,k) := {L(G), k}, where k is the set of order relations.
7. Notion of Sprache
⬔ The Sprache is built upon a language, with an
order relation introduced on it.
⬔ Assume the following holds:
∀ α, β ∊ L(G), ∃ ≻ | A ≻ β, where α ∊ A ⊆ L(G) and α ⊁ ϕ
⬔ Define k as the union of all such relations:
k := ⋃ ≻
⬔ Then the Sprache is defined as:
§(G,k) := {L(G), k}
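One possible reading of this definition can be sketched as a small Python data structure: a Sprache is a pair of a statement set L(G) and a set k of order relations, each stored as a pair (A, β) meaning A ≻ β. The names `Sprache`, `make_sprache`, and `precedes`, and the choice to store each relation as a pair, are assumptions for illustration only.

```python
from typing import NamedTuple

class Sprache(NamedTuple):
    language: frozenset  # L(G): the statements
    k: frozenset         # order relations, each stored as (A, beta)

def make_sprache(language, relations):
    """Build the pair {L(G), k}, checking that relations only use statements of L(G)."""
    language = frozenset(language)
    k = set()
    for premises, conclusion in relations:
        A = frozenset(premises)
        if not A:
            raise ValueError("premise set A must not be empty")
        if not (A <= language and conclusion in language):
            raise ValueError("relation mentions a statement outside L(G)")
        k.add((A, conclusion))
    return Sprache(language, frozenset(k))

def precedes(sprache, premises, conclusion):
    """True if A ≻ β is one of the order relations in k."""
    return (frozenset(premises), conclusion) in sprache.k

statements = {
    "Iron sinks in water",
    "Water is denser than air",
    "Iron sinks in air",
}
sprache = make_sprache(
    statements,
    [({"Iron sinks in water", "Water is denser than air"}, "Iron sinks in air")],
)
```

Here `precedes(sprache, {"Iron sinks in water", "Water is denser than air"}, "Iron sinks in air")` holds, while the single premise "Iron sinks in water" alone does not precede the conclusion.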
8. The statement relations
⬔ α and β are commutatively related: written α , β
⬔ α and β are non-commutatively related: written α > β
⬔ α is defined as β: written α : β
⬔ α is equivalent to β: written α = β
⬔ α is negative to β: written α ~ β
⬔ α maps to β: written α # β
⬔ α and β are related via an unknown relation: written α ? β
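The relation symbols listed above can be collected in an enumeration and used to split a written statement into its two sides. This is a minimal sketch: the names `Relation` and `parse_statement` are hypothetical, and the parser assumes that operand names contain none of the relation symbols.

```python
from enum import Enum

class Relation(Enum):
    COMMUTATIVE = ","
    NON_COMMUTATIVE = ">"
    DEFINED_AS = ":"
    EQUIVALENT = "="
    NEGATIVE = "~"
    MAPS_TO = "#"
    UNKNOWN = "?"

# Lookup table from symbol to relation.
SYMBOLS = {relation.value: relation for relation in Relation}

def parse_statement(text):
    """Split 'alpha <symbol> beta' at the first relation symbol found."""
    for position, char in enumerate(text):
        if char in SYMBOLS:
            left = text[:position].strip()
            right = text[position + 1:].strip()
            return left, SYMBOLS[char], right
    raise ValueError(f"no relation symbol in {text!r}")
```

For instance, `parse_statement("iron > water")` yields the operands `"iron"` and `"water"` joined by `Relation.NON_COMMUTATIVE`.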
10. Application of Sprache
⬔ Suppose we want to describe the properties of an
object O. Suppose properties A and B are conjectured to
be intrinsic to O, but not observed. We write: O > (A,B)
⬔ Suppose properties C and D of object O are
observed. We write: O > (C,D)
⬔ Suppose property E of object O is measured to
be F. We write: O > (E:F)
⬔ The notion of Sprache, with a finite set
of relations, can thus relate the properties of O, generating a
complete scientific description.
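The three cases above can be sketched as a small Python helper that emits statements in the slide's notation. The function `describe` and its parameter names are hypothetical. As on the slide, conjectured and observed properties share the same written form.

```python
def describe(obj, conjectured=(), observed=(), measured=None):
    """Return Sprache-style statements about obj in the notation used above."""
    statements = []
    if conjectured:
        statements.append(f"{obj} > ({','.join(conjectured)})")
    if observed:
        statements.append(f"{obj} > ({','.join(observed)})")
    # A measured property is written as 'property:value'.
    for prop, value in (measured or {}).items():
        statements.append(f"{obj} > ({prop}:{value})")
    return statements

description = describe(
    "O",
    conjectured=["A", "B"],
    observed=["C", "D"],
    measured={"E": "F"},
)
```

With these inputs, `description` holds the three statements `O > (A,B)`, `O > (C,D)`, and `O > (E:F)`.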
11. Conclusion
⬔ Using the notion of Sprache, the description of data
about any object can be reduced to a set of strictly
related statements. Missing relations indicate a lack of
knowledge and are worth investigating.
⬔ The notion of Sprache can highlight where
knowledge is missing, so a scientist examining the
object can immediately focus on the missing knowledge.
⬔ Next: the combinatorial model of the application of
Sprache, a Sprache prototype developed by BDA, the
LDML, the LDML grammar, and applications of LDML
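The claim that missing relations point to missing knowledge can be illustrated with a short Python sketch. Given a set of statements and the pairs already known to be related, it lists the pairs that no relation connects. The function `missing_relations` and the treatment of relations as unordered pairs are simplifying assumptions, not the LDML design.

```python
from itertools import combinations

def missing_relations(statements, known):
    """Return the unordered pairs of statements that no known relation connects."""
    related = {frozenset(pair) for pair in known}
    return [
        (a, b)
        for a, b in combinations(sorted(statements), 2)
        if frozenset((a, b)) not in related
    ]

statements = {
    "Iron is heavier than water",
    "Iron sinks in water",
    "Water is denser than air",
}
known = [("Iron sinks in water", "Water is denser than air")]

gaps = missing_relations(statements, known)
```

Here `gaps` lists the two pairs involving "Iron is heavier than water", which are the places where a relation is still missing.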