RFID Data Management
Kamlesh Laddhad (05329014)
Karthik B.(05329021)
Guide: Prof. Bernard Menezes

Outline
• Introduction to RFID Technology.
• Issues with RFID Technology.
• RFID Data Characteristics.
• Data Warehousing.
– Expressive Temporal Model: Dynamic Relationship ER Model
– RFID - Cuboids.
– Use of Bitmap Datatype.
• Data Cleaning.
– Extensible Sensor stream Processing (ESP)
– Statistical sMoothing for Unreliable RFid data (SMURF)
• Future Plans.

Introduction
• Radio Frequency Identification:
– It is an Automatic Identification and Data Capture Technology.
– Fast
– No contact or line of sight.
– Uses radio-frequency waves to transfer data
• Components
– Tag: small, low-cost device that can hold a limited amount of data.
• Associated with objects, such as pallets, cases, and even individual items.
– Reader: Recognizes the presence of a tag and reads the information stored on it.
• Unique electronic product code (EPC) associated with a tag.
• By placing RFID tag readers at various locations, one can track the
movement of objects through supply chain networks.

Applications and Adoptions
• Supply Chain Management: real-time inventory
tracking.
– US Department Of Defense: shipments to armed forces
• Retail: Active shelves monitor product availability
– Wal-Mart, Albertsons: major retail stores
• Access control: toll collection, transportation.
– Airline luggage management:
• British Airways: 20 million bags a year
• Implemented to reduce lost/misplaced luggage
• Anti-counterfeiting and security:
– Food and Drug Administration: To reduce counterfeit in
pharmaceutical supply chain

Prospects for RFID Research
• The physics of building tags and readers
– Tags have few gates: apart from basic operation, they have very little computing power.
– Radio-frequency communication has issues operating in certain physical media.
• Privacy and safety issues:
– Complex encryption schemes are not possible on RFID tags.
– Counterfeiting by means of either illegitimate readers or spoofed tags is
possible.
– Reader-tag communication is wireless: third parties can eavesdrop on signals.
• Software architecture to collect, filter, organize, and answer online
queries:
– The number of tags is proportional to the number of items being serviced/tracked.
– The number of readers is proportional to the number of strategic locations/areas to be traced.
• Each reader picks up tag signals on a continuous basis.
• Data generated by RFID systems is enormous:
• E.g. Wal-Mart is expected to generate 7 terabytes of RFID data per day.
• Our focus: the third stream (software architecture).

Data Management Challenges
• Data Explosion : Example
– A retailer with 3,000 stores, selling 10,000 items a
day per store.
– Each item moves 10 times on average before being
sold
• Movement recorded as (EPC, location, second)
– Data volume: 300 million tuples per day.
– Example OLAP Query: “Average time for items to
move from warehouse to checkout counter in March
2006?”.
• Costly to answer if there are billions of tuples for March
2006.
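The arithmetic behind the data-explosion example can be checked with a short sketch. The constants are the figures from the slide; the function names and the 31-day month are illustrative assumptions.

```python
# Back-of-the-envelope data volume for the retailer example.
STORES = 3_000
ITEMS_PER_STORE_PER_DAY = 10_000
MOVES_PER_ITEM = 10  # each move yields one (EPC, location, second) tuple


def tuples_per_day(stores=STORES, items=ITEMS_PER_STORE_PER_DAY, moves=MOVES_PER_ITEM):
    """Number of movement tuples generated per day across all stores."""
    return stores * items * moves


def tuples_for_days(days):
    """Tuples accumulated over a number of days at the same daily rate."""
    return tuples_per_day() * days


daily = tuples_per_day()
march = tuples_for_days(31)  # assumption: March has 31 days of steady traffic
print(f"{daily:,} tuples/day, {march:,} tuples in March")
```

At this rate a single month already holds several billion tuples, which is why an OLAP query over raw readings is costly.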

Data Characteristics
• Temporal and history oriented
– Applications dynamically generate observations (readings).
– Object locations and containment relationships among objects change over time.
– Need: Expressive data model.
• Inaccurate data and implicit semantics
– False positive: Non-existing tag incorrectly read.
– False Negative: Reader missed a tag which was in its vicinity.
– Noisy data & duplicate readings (redundancy): Same tag read more than
once.
– Need: Automated data filtering and transformation.
• Streaming and large volume
– Objects stay in place for long durations: readers record them
periodically, so large amounts of data keep being generated.
– We need to preserve this data for tracking and monitoring.
– Need: Scalable storage scheme, compression techniques to reduce data.
• Data Granularity
– Data collection granularity needs to be decided
– Differs across applications.

Warehousing Helps!!
• Lossless compression
– Remove redundancy: (r1,l1,t1) (r1,l1,t2) ... (r1,l1,t10) => (r1,l1,t1,t10)
– Group objects that move and stay together.
• Data cleaning: Multi-reading, missed-reading, error-reading, bulky movement.
• Data mining: Find trends, outliers, frequent, sequential, flow patterns.
• Multi-dimensional summary: product, location, time, …
– Store manager: Check item movements from the backroom to different shelves
in his store
– Region manager: Collapse intra-store movements and look at distribution
centers, warehouses, and stores
• Query Processing
– Support for OLAP: roll-up, drill-down, slice, and dice
– Path query: New to RFID-Warehouses, about the structure of paths
• Which products that go through quality control have shorter paths?
• What locations are common to the paths of a set of defective auto-parts?
• Identify containers at a port that have deviated from their historic paths

Dynamic Relationship ER Model
• Proposed by Wang and Liu from Siemens.
• RFID entities are static and are not altered.
• RFID relationships: dynamic and change all the
time.
• Two types of dynamic relationships added:
– Event-based dynamic relationship. A timestamp
attribute added to represent the occurring timestamp
of the event.
– State-based dynamic relationship. tstart and tend
attributes added to represent the lifespan of a state.

• Static entity tables
– SENSOR (sensor_epc, name, description)
– OBJECT (object_epc, name, description)
– LOCATION (location_id, name, owner)
– TRANSACTION (transaction_id, transaction_type)
• Dynamic relationship tables
– OBSERVATION (sensor_epc, value, timestamp)
– CONTAINMENT (epc, parent_epc, tstart, tend)
– OBJECTLOCATION (epc, location_id, tstart, tend)
– SENSORLOCATION (sensor_epc, location_id, position, tstart, tend)
– TRANSACTIONITEM (transaction_id, epc, timestamp)

Monitoring.
• Missing RFID Object Detection:
– Find when and where the object with EPC = 'MEPC'
was lost.
• select location_id, tstart, tend from objectlocation
where epc='MEPC' and tstart = ( select max(o.tstart) from
objectlocation o where o.epc='MEPC' )
– Check if there are missing objects at current location C,
knowing that all objects were complete at previous
location L at time T.
• select l.epc from objectlocation l where l.location_id =
'L' and l.tstart <= 'T' and l.tend >= 'T' and l.epc not
in ( select c.epc from objectlocation c where
c.location_id = 'C' )
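The two monitoring queries can be tried out against an in-memory SQLite database. This is only a sketch: the table contents are made up, timestamps are plain integers for simplicity, and the wrapper functions `last_seen` and `missing_at` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation (epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [
        ("E1", "L", 10, 20), ("E1", "C", 25, 40),
        ("E2", "L", 10, 20),  # E2 was seen at L but never arrived at C
        ("E3", "L", 12, 22), ("E3", "C", 26, 40),
    ],
)


def last_seen(epc):
    """Where and when was the object last observed (its latest stay)?"""
    return conn.execute(
        """SELECT location_id, tstart, tend FROM objectlocation
           WHERE epc = ? AND tstart =
                 (SELECT MAX(o.tstart) FROM objectlocation o WHERE o.epc = ?)""",
        (epc, epc),
    ).fetchall()


def missing_at(current, previous, t):
    """Objects present at `previous` at time t that never show up at `current`."""
    rows = conn.execute(
        """SELECT l.epc FROM objectlocation l
           WHERE l.location_id = ? AND l.tstart <= ? AND l.tend >= ?
             AND l.epc NOT IN
                 (SELECT c.epc FROM objectlocation c WHERE c.location_id = ?)
           ORDER BY l.epc""",
        (previous, t, t, current),
    )
    return [r[0] for r in rows]


print(last_seen("E2"))
print(missing_at("C", "L", 15))
```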

Tracking
• RFID Object Moving Time Inquiry:
– Time it takes to supply ‘OEPC’ from location S to
location E?
• select (e.tstart-s.tstart) as supplying_time from
objectlocation e, objectlocation s where e.epc =
'OEPC' and s.epc='OEPC' and s.location_id ='S' and
e.location_id='E'
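The moving-time inquiry is a self-join on the object-location table. The sketch below runs it in SQLite with invented sample rows and integer timestamps; `supplying_time` is a hypothetical wrapper, and the intermediate location 'W' is only there to show that it does not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation (epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [
        ("OEPC", "S", 100, 130),
        ("OEPC", "W", 140, 200),  # intermediate stop, ignored by the query
        ("OEPC", "E", 260, 300),
    ],
)


def supplying_time(epc, start_loc, end_loc):
    """Elapsed time between arriving at start_loc and arriving at end_loc."""
    row = conn.execute(
        """SELECT e.tstart - s.tstart AS supplying_time
           FROM objectlocation e, objectlocation s
           WHERE e.epc = ? AND s.epc = ?
             AND s.location_id = ? AND e.location_id = ?""",
        (epc, epc, start_loc, end_loc),
    ).fetchone()
    return None if row is None else row[0]


print(supplying_time("OEPC", "S", "E"))
```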

Compression Idea
• Bulky object movements
– Objects often move and stay together through the supply chain.
– If 1000 packs of product P stay together at the distribution center,
register a single record.
– (GID, distribution center, time_in, time_out).
– GID is a generalized identifier that represents the 1000 packs that stayed
together at the distribution center
• Analysis usually takes place at a much higher level of abstraction
than the one present in raw RFID data
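[Figure: goods move through the supply chain in progressively smaller groups. Factory → Dist. Center 1, Dist. Center 2, … in 10 pallets (1000 cases); → store 1, store 2, … in 20 cases (1000 packs); → shelf 1, shelf 2, … in 10 packs (12 sodas)]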

RFID Cuboids
• Fact Table: (EPC, location, time_in, time_out).
• In supply chain: Items travel through a series of locations.
• Query: what is the average time that product P stays at store in
Location A?
• Traditional cubes miss the path structure of the data
• Stay Table: (GIDs, location, time_in, time_out: measures)
– Records information on items that stay together at a given location
– Alternative of recording transitions: queries become difficult to answer, many
intersections needed
• Map Table: (GID, <GID1,..,GIDn>)
– Links together stages that belong to the same path. Provides additional
compression and query processing efficiency
– High-level GID points to lower-level GIDs
– Alternative of saving complete EPC lists: high I/O cost to retrieve long lists,
costly query processing
• Information Table: (EPC list, attribute 1,...,attribute n)
– Records path-independent attributes of the items, e.g., color,
manufacturer, price..
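A rough sketch of how stay and map tables might be derived from item paths is shown below. It assumes each item's path is a list of (location, time_in, time_out) stages and that items sharing an identical path prefix share a GID; the GID naming (`g1`, `g2`, ...) and the function `build_cuboid` are illustrative assumptions, not the construction from the cited paper.

```python
from collections import defaultdict


def build_cuboid(paths):
    """paths: {epc: [(location, time_in, time_out), ...]}.
    Returns (stay, gid_map, leaf_epcs)."""
    gid_of = {}                    # path prefix -> GID
    stay = {}                      # GID -> [location, time_in, time_out, item count]
    gid_map = defaultdict(set)     # higher-level GID -> lower-level GIDs
    leaf_epcs = defaultdict(list)  # last GID on a path -> EPCs ending there
    for epc in sorted(paths):
        parent, prefix = None, ()
        for stage in paths[epc]:
            prefix += (stage,)
            if prefix not in gid_of:
                gid_of[prefix] = f"g{len(gid_of) + 1}"
                stay[gid_of[prefix]] = [*stage, 0]
            gid = gid_of[prefix]
            stay[gid][3] += 1      # one more item in this group
            if parent is not None:
                gid_map[parent].add(gid)
            parent = gid
        if parent is not None:
            leaf_epcs[parent].append(epc)
    return stay, dict(gid_map), dict(leaf_epcs)


paths = {
    "e1": [("factory", 0, 5), ("dc1", 6, 10), ("store1", 12, 20)],
    "e2": [("factory", 0, 5), ("dc1", 6, 10), ("store1", 12, 20)],
    "e3": [("factory", 0, 5), ("dc1", 6, 10), ("store2", 13, 25)],
}
stay, gid_map, leaf_epcs = build_cuboid(paths)
print(len(stay), "stay records instead of", sum(len(p) for p in paths.values()))
```

EPC lists are kept only at the last group of each path, so the higher-level records stay small.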

EPC Overview
• Electronic product code
– Standard naming scheme, proposed by the Auto-ID Center.
– An EPC uniquely identifies an item.
– Format: <Header, Manager_No., Object Class, Serial No.>
• Header: Identifies the length, type, structure, version and generation
of EPC.
• Manager Number: Identifies an organizational entity.
• Object Class: Identifies a “class”, or type of thing.
• Serial Number: Specific instance of the Object Class being tagged.
– We will refer to
• <Header, Manager No, Object Class>: Prefix
• <Serial No.>: Suffix

Use of Bitmap Datatype
• Observation: Items move together.
– Groups of items in the same proximity - e.g. on a shelf, on a
shipment
– Groups of items with same property - e.g. Same product
• Use a bitmap type for modeling a collection of EPCs
that can occur in item tracking applications.
– Instead of storing a tuple per item store a tuple for all the
items having same prefix.
– New extra fields instead of epc:
• <Len, Suffix_length, Prefix, suffix_start, Suffix_end, bitmap>

Example: Product Inventory
• With EPC collections
Store_id  Prod_id  Time  Item_collection
s1        p1       t1    epc11, epc12, epc13, …
s1        p2       t2    epc21, epc22, epc23, …
…         …        …     …
• With epc_bitmaps
Store_id  Prod_id  Time  Item_bmap
s1        p1       t1    bmap1
s1        p2       t2    bmap2
…         …        …     …

Use of Bitmap Datatype
• EPC layout (64 bits):
Header  EPC_Manager  Object_Class  Serial_Number
2 bits  21 bits      17 bits       24 bits
• EPCs in the collection: 0x4AA890001F62C160 … 0x4AA890001FA0B38E
• Resulting bitmap representation:
Len  Suff_len  Prefix        Suff_start  Suff_end  bitmap
64   24        0x4AA890001F  0x62C160    0xA0B38E  101001…00010
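The mapping from a set of EPCs to the bitmap record can be sketched with integer arithmetic. The first and last EPC are the ones from the slide; the middle EPC is invented for illustration. The function names and the bit order (bit 0 stands for Suff_start) are assumptions, since the slide does not say how the bits are laid out.

```python
SUFFIX_BITS = 24  # Serial_Number width
TOTAL_BITS = 64   # 2 + 21 + 17 + 24


def split_epc(epc):
    """Split a 64-bit EPC into (prefix, suffix)."""
    return epc >> SUFFIX_BITS, epc & ((1 << SUFFIX_BITS) - 1)


def epcs_to_bitmap_record(epcs):
    """Hypothetical analogue of epc2Bmap for EPCs that share one prefix."""
    prefixes = {split_epc(e)[0] for e in epcs}
    if len(prefixes) != 1:
        raise ValueError("all EPCs must share exactly one prefix")
    suffixes = sorted(split_epc(e)[1] for e in epcs)
    start, end = suffixes[0], suffixes[-1]
    bitmap = 0
    for s in suffixes:
        bitmap |= 1 << (s - start)  # bit k set <=> suffix (start + k) is present
    return {
        "len": TOTAL_BITS, "suffix_len": SUFFIX_BITS, "prefix": prefixes.pop(),
        "suffix_start": start, "suffix_end": end, "bitmap": bitmap,
    }


rec = epcs_to_bitmap_record([
    0x4AA890001F62C160,
    0x4AA890001F62C162,  # made-up EPC, not from the slide
    0x4AA890001FA0B38E,
])
print(hex(rec["prefix"]), hex(rec["suffix_start"]), hex(rec["suffix_end"]))
```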

Bitmap Operations
• To use this datatype in SQL, we need
operations on bitmaps.
• Conversion and counting operations: epc2Bmap,
bmap2Epc and bmap2Count
• Pairwise Logical Operations: bmapAnd, bmapOr,
bmapMinus, and bmapXor
• Maintenance Operations: bmapInsert and bmapDelete
• Membership Testing Operation: bmapExists
• Comparison Operation: bmapEqual
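To make the semantics of these operations concrete, here is a toy model that uses Python integers as bitmaps. It assumes both operands share the same prefix and suffix start, so bit k means the same item in each. These functions only mirror the operation names on the slide; they are not the Oracle implementation.

```python
def bmap_and(a, b):
    return a & b            # items present in both collections


def bmap_or(a, b):
    return a | b            # items present in either collection


def bmap_minus(a, b):
    return a & ~b           # items in a that are not in b


def bmap_xor(a, b):
    return a ^ b            # items in exactly one collection


def bmap_count(a):
    return bin(a).count("1")  # number of items in the collection


def bmap_exists(a, offset):
    return (a >> offset) & 1 == 1  # membership test for one item


def bmap_insert(a, offset):
    return a | (1 << offset)


def bmap_delete(a, offset):
    return a & ~(1 << offset)


def bmap_equal(a, b):
    return a == b


shelf_t1 = 0b0101  # items 0 and 2 on the shelf at time t1
shelf_t2 = 0b1101  # items 0, 2 and 3 on the shelf at time t2
added = bmap_minus(shelf_t2, shelf_t1)
print(bin(added), bmap_count(added))
```

The last lines show the "items added between t1 and t2" pattern: subtract the earlier bitmap from the later one.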

Use of these operations in SQL
• Items added to a given shelf between time t1 and t2.
– SELECT bmap2Epc(bmapMinus(s2.item_bmap,
s1.item_bmap)) FROM Shelf_Inventory s1, Shelf_Inventory
s2 WHERE s1.shelf_id = <sid1> AND s1.shelf_id =
s2.shelf_id AND s1.time = <t1> AND s2.time = <t2>;
• Book store categorizes books in various categories.
– Following query determines the shelves where the books with
property ’Adventure’ and ’Romance’, are currently present in
the store.
– SELECT s.shelf_id FROM Shelf_Inventory s WHERE
bmap2Count(bmapAnd(s.item_bmap, (SELECT
bmapAnd(p.Adventure, p.Romance) FROM
Property_Inventory p))) > 0 AND s.time = <current_date>;

Road Ahead
• Extension to bitmap proposal:
– Bitmap datatype is more appropriate for initial bulk-load & batch updates.
– It performs badly for incremental updates.
– A ‘hybrid Scheme’ for incremental Updates:
• Maintain periodic inventory checkpoints using bitmaps.
• For changes occurring between checkpoints, Maintain a traditional item-level
table.
• Answer queries by merging the latest checkpoint bitmap with the
corresponding duration’s item-level data.
• The epc_suffix in the collection may not be contiguous
– The bitmap will be sparse: lots of zeros.
– Compress this using some encoding scheme
• Good for initial bulk loading and batch updates
• May reduce efficiency of bitmap operations.
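The hybrid scheme for incremental updates can be sketched as a checkpoint bitmap plus a list of item-level changes that is replayed at query time. The change-record format (time, offset, op) and the function name are assumptions made for this illustration.

```python
def current_inventory(checkpoint_bitmap, changes):
    """Merge the latest checkpoint bitmap with item-level changes recorded
    since that checkpoint. Each change is (time, offset, op) where op is
    "add" or "remove" and offset is the item's bit position."""
    bitmap = checkpoint_bitmap
    for _, offset, op in sorted(changes):  # replay in time order
        if op == "add":
            bitmap |= 1 << offset
        elif op == "remove":
            bitmap &= ~(1 << offset)
        else:
            raise ValueError(f"unknown operation: {op}")
    return bitmap


checkpoint = 0b0011                       # items 0 and 1 at the last checkpoint
delta = [(2, 0, "remove"), (1, 3, "add")]  # item-level table since then
now = current_inventory(checkpoint, delta)
print(bin(now))
```

The bitmap itself is only rewritten at the next checkpoint, so single-item updates never touch it.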

Open Problems
• Efficient methods for data mining problems
– Trend analysis
– Outlier detection
– Path clustering
• We will try exploring data mining applications to
RFID data.

Issues in Data Cleaning
• Lack of Completeness
– RFID readers capture only 60-70% of all tags that are in the
vicinity
– Smoothing of data is done to rectify the loss of intermediate
messages
• Temporal nature of data (tag dynamics)
– RFID tags are in motion, which makes them more
difficult to handle
– Motion of a tag causes readings to be dropped
• RFID data streams are fast and very large in
volume
– Hence filtering is important before sending them to the database

Current Strategies
• Temporal Granule:
– Based on the fact that tag data do not differ much
over a small time period
– Data can be clubbed on a small time frame
• Spatial Granule:
– Similarly, data from physically close readers are also
homogeneous

Stages of ESP
• Point: operates over a single value in a sensor
stream, filtered by a predicate in the WHERE
clause
• Smooth: granularity defined by applications to
correct for missed readings temporally (over one
input only); uses aggregate function over the
input.
• Merge: granularity specified by the application
to correct for missed readings spatially; grouped
by the specified spatial granule.

Stages of ESP (contd.)
• Arbitrate: deals with
conflicts between different
spatial granules; grouped by
spatial granule first and then
uses HAVING construct to
determine those conflicts
• Virtualize: used for
combining data streams from
different sources, could also
be different devices; join
construct is used to combine
the different data streams
and then filtered using some
predicate
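A procedural sketch of three of the stages (Point, Smooth, Arbitrate) is given below. ESP itself expresses the stages as declarative queries; this version only mimics their effect on a list of (time, spatial granule, tag) readings. The tie-breaking rule and all names are assumptions.

```python
from collections import Counter


def point(readings, predicate):
    """Point: keep only the readings that satisfy a per-value predicate."""
    return [r for r in readings if predicate(r)]


def smooth(readings, now, window):
    """Smooth: count readings per (granule, tag) in the temporal granule
    (now - window, now], so a few missed readings do not drop the tag."""
    counts = Counter()
    for t, granule, tag in readings:
        if now - window < t <= now:
            counts[(granule, tag)] += 1
    return counts


def arbitrate(counts):
    """Arbitrate: assign each tag to the spatial granule that read it most
    often (ties go to the granule that sorts first)."""
    best = {}
    for (granule, tag), n in sorted(counts.items()):
        if tag not in best or n > best[tag][1]:
            best[tag] = (granule, n)
    return {tag: granule for tag, (granule, _) in best.items()}


readings = [
    (1, "shelf0", "A"), (2, "shelf0", "A"), (3, "shelf1", "A"),
    (3, "shelf1", "B"), (4, "shelf0", "A"), (5, "shelf1", "B"),
    (5, "shelf0", "ghost"),  # spurious tag ID, dropped by the Point stage
]
known_tags = {"A", "B"}
cleaned = point(readings, lambda r: r[2] in known_tags)
counts = smooth(cleaned, now=5, window=5)
print(arbitrate(counts))
```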

Smooth stage
• False Positives: (erroneous readings) reporting objects
that are not actually present
• False Negatives: (missed readings) not reporting objects
that actually are present
False positives and False Negatives [Jeff06]

Tag List
• The reader has an internal table called the Tag List.
• An epoch is the smallest unit of interaction between the reader
and the middleware.
• Every epoch consists of certain number of Interrogation cycles
• Interrogation Cycle is one run of the reader protocol to
determine all tags
• At every epoch the reader sends the tag list to the middleware.
Tag ID     Responses  Timestamp
12341234   6          t1
12347890   1          t2

SMURF – Per tag Cleaning
• SMURF uses statistical methods to reduce the false
negatives and false positives in the RFID
stream.
• The goal is twofold: first, to determine the
statistical window size; second, to ensure that
tag transitions are detected.
• To determine the window size, we fit a
probability distribution to the sample size $|S_i|$.
• To detect the transition of a tag out of the
reader's vicinity, we define a 98% confidence interval
within that probability distribution on the
sample size $|S_i|$.

SMURF – Per tag Cleaning (contd.)
• Using the tag list, the per-epoch sampling
probability $p_{i,t}$ is determined:
$p_{i,t}$ = (number of times tag i was read in epoch t) /
(interrogation cycles per epoch)
• We average this over the sample $S_i$ to get
the average read rate $p_i^{avg}$ for tag i.
• If the same probability $p_i^{avg}$ is assumed for each
epoch throughout the window, then each epoch is like a
Bernoulli trial in which success means the tag was observed.
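The read-rate computation can be sketched directly from the tag list's response counts. The number of interrogation cycles per epoch (10 here) and the per-epoch response counts are made-up values, and the helper names are hypothetical.

```python
def per_epoch_probability(responses, cycles_per_epoch):
    """p_{i,t}: fraction of interrogation cycles in one epoch that read the tag."""
    return responses / cycles_per_epoch


def average_read_rate(responses_per_epoch, cycles_per_epoch):
    """p_i^avg: mean of p_{i,t} over the epochs in which the tag was observed
    (the sample S_i). Returns 0.0 if the tag was never observed."""
    observed = [r for r in responses_per_epoch if r > 0]
    if not observed:
        return 0.0
    total = sum(per_epoch_probability(r, cycles_per_epoch) for r in observed)
    return total / len(observed)


CYCLES_PER_EPOCH = 10          # assumption, not given on the slide
responses = [6, 0, 8, 4, 0]    # responses for one tag over a 5-epoch window
p_avg = average_read_rate(responses, CYCLES_PER_EPOCH)
sample_size = sum(1 for r in responses if r > 0)  # |S_i|
print(round(p_avg, 3), sample_size)
```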

SMURF – Per tag Cleaning (contd.)
• So $|S_i|$ is a binomial random variable for a sample $S_i$,
with mean $\mu = w_i \, p_i^{avg}$ and variance $\sigma^2 = w_i \, p_i^{avg} \, (1 - p_i^{avg})$,
where $w_i$ is the window size in epochs.
• Using this, we can express a lower limit on the window size.
• If the current window size is less than the calculated one,
the window is enlarged accordingly.
• Similarly, using the Central Limit Theorem, a transition
is detected when $\left| \, |S_i| - \mu \, \right| > 2\sigma$.
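Both window tests can be sketched in a few lines. The transition test follows the 2σ condition with the binomial mean and variance. The window bound $w_i \ge \lceil \ln(1/\delta) / p_i^{avg} \rceil$ is an assumption here: it is the completeness bound as recalled from the cited SMURF paper, with $\delta$ the tolerated probability of missing a present tag, and neither the bound nor $\delta$ appears in these slides.

```python
import math


def required_window(p_avg, delta=0.05):
    """Smallest window (in epochs) such that a tag with read rate p_avg is
    missed in every epoch with probability at most delta (assumed bound)."""
    return math.ceil(math.log(1 / delta) / p_avg)


def transition_detected(observed, window, p_avg):
    """True if the observed sample size |S_i| deviates from its binomial mean
    by more than two standard deviations."""
    mean = window * p_avg
    sigma = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(observed - mean) > 2 * sigma


print(required_window(0.6), required_window(0.1))
print(transition_detected(observed=2, window=20, p_avg=0.6))
print(transition_detected(observed=11, window=20, p_avg=0.6))
```

A low read rate calls for a long window to avoid false negatives, while the 2σ test pulls the window back when the tag has probably left.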

Normal Sliding window….
• Epoch based mid-point sliding window
• Emits a reading with an epoch value corresponding to
the middle of the window

Ensuring Completeness
• In the first window, $p_i^{avg}$ demands a larger window
• Thus the window size is increased

Transition Detection
• In the first window the number of readings decreases
significantly (and statistically)
• Thus a transition is likely to have occurred; so window
is halved
[Franklin06]

SMURF – Multi-tag aggregate Cleaning
• Similar to per-tag cleaning, the window for multi-tag cleaning is
determined by a bound of the same form, where $p^{avg}$ is the average
per-epoch sampling probability over all observed tags.
• To detect a transition in the population count, we estimate the
population count over two windows, $[t - w_i, t]$ and $[t - w_i/2, t]$, with
true populations $N_w$ and $N_{w'}$.
• A transition is flagged when the difference between the two
estimates exceeds the limit:
$|\hat{N}_w - \hat{N}_{w'}| > 2(\sigma_w + \sigma_{w'})$

SMURF – Multi-tag aggregate Cleaning (contd.)
• To estimate the population count, we use
π-estimators. The estimated population count is given
by: $\hat{N}_w = \sum_{i \in S_w} 1/\pi_i$
• Similarly, by π-estimators and assuming independence
across different tags, the variance of the estimate is
estimated as: $\hat{\sigma}_w^2 = \sum_{i \in S_w} (1 - \pi_i)/\pi_i^2$
• Here $S_w$ is the set of tags observed in the window and $\pi_i$ is
the probability of reading tag i at least once
during the whole window, given by $\pi_i = 1 - (1 - p_i^{avg})^{w}$
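A small sketch of the π-estimator for the population count follows. It uses the standard Horvitz-Thompson form for a count, $\hat{N} = \sum 1/\pi_i$, with variance estimate $\sum (1 - \pi_i)/\pi_i^2$ under independence across tags. Function names and the sample numbers are illustrative assumptions.

```python
import math


def inclusion_probability(p_avg, window):
    """pi_i: probability that a tag is read at least once in the window."""
    return 1 - (1 - p_avg) ** window


def estimate_population(p_avgs, window):
    """Return (N_hat, var_hat) for the tags observed in the window.
    p_avgs holds the average read rate of each observed tag (each > 0)."""
    pis = [inclusion_probability(p, window) for p in p_avgs]
    n_hat = sum(1 / pi for pi in pis)
    var_hat = sum((1 - pi) / pi ** 2 for pi in pis)
    return n_hat, var_hat


def count_transition(est_full, est_half):
    """Compare estimates from the full window and its recent half using the
    2 * (sigma_w + sigma_w') limit."""
    (n_w, var_w), (n_h, var_h) = est_full, est_half
    return abs(n_w - n_h) > 2 * (math.sqrt(var_w) + math.sqrt(var_h))


n_hat, var_hat = estimate_population([0.5, 0.5], window=1)
print(n_hat, var_hat)
print(count_transition((10.0, 1.0), (4.0, 1.0)))
```

Two tags that are each seen with probability 0.5 stand in for roughly four tags in the true population, which is how the estimator compensates for missed readings.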

The Road ahead…
• RFID applications do not tolerate delays in
data delivery
• Data resides either in the cache or in the database; querying
the database increases processing time, while data in the
cache cannot be queried with SQL-like queries
• Anomaly detection is also an
important part of object tracking
• Issues like untraceability, forward security, and database
desynchronization are still not completely resolved.
• One more serious problem with RFID is counterfeiting
• In the next stage we expect to look into some of these
issues

References
• Xiaolei Li, Hector Gonzalez, Jiawei Han and
Diego Klabjan. Warehousing and analyzing
massive RFID data sets. ICDE, 2006.
• Fusheng Wang and Peiya Liu. Temporal
management of RFID data. VLDB, 2005.
• Timothy Chorma, Ying Hu, Seema Sundara and
Jagannathan Srinivasan. Supporting RFID-based
item tracking applications in Oracle DBMS using
a bitmap datatype. VLDB, 2005.

References (contd.)
• Minos Garofalakis, Shawn R. Jeffery and Michael J.
Franklin. Adaptive cleaning for RFID data streams.
VLDB, 2006.
• Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Gustavo
Alonso and Jennifer Widom. Declarative support for
sensor data cleaning. Pervasive, 2006.
• Sridhar Ramachandran, Sudarshan S. Chawathe, Venkat
Krishnamurthy and Sanjay E. Sarma. Managing RFID
data. VLDB, 2004.
