The Over-the-Network Problem
M.Zhanikeev -- maratishe@gmail.com -- A New Practical Design for Browsable Over-the-Network Indexing -- http://bit.do/140428 -- 2/18
Over-the-Network Problem
[Figure: two setups side by side. Traditional: Data, Indexer, Index, then Network, then Client. Stringex: Data, Indexer, Index (Read, Write), The Stringex Client.]
Everything is Over-the-Network
• ... in clouds
• ... inside data centers
• ... in home networks
When running over-the-network, the biggest problem is that there is a hard physical limit to throughput.
The "Best" Tools Today
The Closest Tools
1. Lucene, running locally only
2. Google Data APIs, which allow for shared control
◦ not really indexing, though
3. ... that's pretty much it!
Target Applications
[Figure: The Stringex Client, with Data, Indexer, and Index.]
• server-less applications (read: fully distributed)
• large-scale crowdsourcing connected via cloud storage
• distributed storage -- the same problem
• ...
The Stringex Problem
• a very straightforward optimization problem

\begin{align}
\text{minimize}\quad & w_1 R_{OUT} + w_2 R_{IN} \tag{1} \\
\text{subject to}\quad & \tag{2} \\
& 0 < R_{IN} \le R_{OUT} \le C, \tag{3} \\
& S_{LOCAL} \le M \le S_{REMOTE}, \tag{4} \\
& N_{LOCAL} \le N_{REMOTE} \le N_{USER} \tag{5}
\end{align}

• R is rate, throughput, etc.
• S is storage size, can be local and remote
• C and M are constants, set by the user
• N is the number of files over which the index is split
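The formulation above can be read as a filter-then-minimize step over candidate operating points. The following is a minimal sketch under stated assumptions, not the Stringex implementation: the `Plan` class, its field names, and the helper functions are hypothetical, and w1, w2 are assumed to be non-negative weights on the two rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One candidate operating point (hypothetical container, not from the slides)."""
    r_in: float      # R_IN
    r_out: float     # R_OUT
    s_local: float   # S_LOCAL
    s_remote: float  # S_REMOTE
    n_local: int     # N_LOCAL
    n_remote: int    # N_REMOTE


def feasible(p, c, m, n_user):
    """Check constraints (3)-(5) for one plan."""
    return (0 < p.r_in <= p.r_out <= c
            and p.s_local <= m <= p.s_remote
            and p.n_local <= p.n_remote <= n_user)


def cost(p, w1, w2):
    """Objective (1): w1 * R_OUT + w2 * R_IN."""
    return w1 * p.r_out + w2 * p.r_in


def best_plan(plans, c, m, n_user, w1=1.0, w2=1.0):
    """Return the cheapest feasible plan, or None if no plan is feasible."""
    ok = [p for p in plans if feasible(p, c, m, n_user)]
    return min(ok, key=lambda p: cost(p, w1, w2)) if ok else None
```

Enumerating a handful of candidates like this is only meant to make the constraints concrete; how candidates are generated in practice is outside what the slides state.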
Naive Stringex Client
Practical Assumptions
• JSON input, only the top level is indexed, anything deeper is stringified
• several efficiency tricks
1. split the index into relatively small files
2. distribute smoothly using random hashing
3. update parts on timeout -- accumulate multiple intensive updates
4. create special maps which allow for browsing
• JSON aggregations in files: one line is base64( JSON string )
◦ if the bzip2 algorithm is within reach, you can have base64( bzip2( JSON string ) )
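The line format described above, base64( bzip2( JSON string ) ) with only top-level keys kept as-is, might look roughly like the sketch below. The function names are hypothetical, and treating "stringified" as `json.dumps` of any nested value is an assumption.

```python
import base64
import bz2
import json


def flatten_top_level(doc):
    """Keep scalar top-level values; stringify nested values as JSON text (assumption)."""
    return {
        k: v if isinstance(v, (str, int, float, bool)) or v is None
        else json.dumps(v, sort_keys=True)
        for k, v in doc.items()
    }


def encode_line(doc, compress=True):
    """One file line: base64( bzip2( JSON string ) ), or base64( JSON string )."""
    raw = json.dumps(doc, sort_keys=True).encode("utf-8")
    if compress:
        raw = bz2.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_line(line, compress=True):
    """Inverse of encode_line; `compress` must match the value used when encoding."""
    raw = base64.b64decode(line)
    if compress:
        raw = bz2.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Because base64 output contains no newline characters, each encoded document fits on exactly one line, which is what makes line-oriented aggregation files workable.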
Naive Client: Data Structure
INPUT JSON { name : value1, age : value2, … }

Files

Index
  Key: name
    name.imap
      {
        bk : {
          ik : start,end ,
          … next ik
        },
        … next bk
      }
    name.vmap
      {
        value : bk ,
        … next value
      }
    name.bk1
    name.bk2
    …
  Key: age -- same layout
  …

Data
  Docs
    docs.imap
      {
        bk : {
          docid : start,end ,
          … next docid
        },
        … next bk
      }
    docs.bk1
    docs.bk2
    …
    No .vmap
• meta is separate from data
• smart maps let you read/write sections of files
◦ specifically for the chunk* API in Dropbox
• filenames are the first 2-3 symbols of MD5 hashes
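The two ideas above, MD5-prefix filenames and maps that point at sections of files, can be sketched as follows. This is an illustration under assumptions, not the Stringex code: the function names are hypothetical, the hash is assumed to be taken over the indexed value, `end` is assumed to be exclusive, and buckets are held in memory rather than in remote files.

```python
import hashlib


def bucket_name(key, value, width=2):
    """Bucket file name: key plus the first `width` hex symbols of the MD5 of the value."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return f"{key}.{digest[:width]}"


def append_entry(imap, buckets, bk, ik, payload):
    """Append payload bytes to bucket `bk` and record its (start, end) under `ik`."""
    buf = buckets.setdefault(bk, bytearray())
    start = len(buf)
    buf.extend(payload)
    imap.setdefault(bk, {})[ik] = (start, len(buf))


def read_entry(imap, buckets, bk, ik):
    """Read only the section of the bucket that the map points at."""
    start, end = imap[bk][ik]
    return bytes(buckets[bk][start:end])
```

With a remote store, the slice in `read_entry` would become a ranged read of the bucket file, so that only the (start, end) section crosses the network rather than the whole file.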
Naive Client: Sync Engine Design
[Figure: sync engine design. Labels: Stringex Index, The Stringex Client, Sync Engine, Optimization, Local Cache, Check, Use (steps 1, 2).]
Evaluation
Stringex vs Lucene
[Figure: Stringex vs Lucene. X axis: Index Size (log), 3.15 to 6.65. Y axis: Throughput (log of bytes/doc), 2.55 to 3.25. Series: Lucene, Stringex.]
Wrapup
• https://github.com/maratishe/stringex has a JS client
• I also have a PHP client for command-line Stringex
• Stringex is better than Lucene for browsing because items cluster naturally
◦ I use it for small browsable summaries of datasets
◦ ... and context-based browsable datasets
• many other uses are possible
That’s all, thank you ...

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

A New Practical Design for Browsable Over-the-Network Indexing

  • 1.
  • 2. The Over-the-Network Problem
  • 3. Over-the-Network Problem
      [Diagram: the "Traditional" setup and "The Stringex" setup. Each shows Data, Indexer, Index, and Client; the remaining labels are "Network" and "Read, Write".]
  • 4. Everything is Over-the-Network
      • ... in clouds
      • ... inside data centers
      • ... in home networks
      When running over-the-network, the biggest problem is that there is a hard physical limit to throughput.
  • 5. The "Best" Tools Today
  • 6. The Closest Tools
      1. Lucene, running locally only
      2. Google Data APIs, which allow for shared control
          ◦ not really indexing, though
      3. ... that's pretty much it!
  • 7. Target Applications
  • 8. Target Applications
      [Diagram: "The Stringex" setup with Data, Indexer, Index, and Client.]
      • server-less applications (read: fully distributed)
      • large-scale crowdsourcing connected via cloud storage
      • distributed storage -- the same problem
      • ...
  • 9. The Stringex Problem
  • 10. The Stringex Problem
      • a very straightforward optimization problem

      $$
      \begin{aligned}
      \text{minimize}\quad & w_1 R_{\mathrm{OUT}} + w_2 R_{\mathrm{IN}} && (1)\\
      \text{subject to}\quad & && (2)\\
      & 0 < R_{\mathrm{IN}} \le R_{\mathrm{OUT}} \le C, && (3)\\
      & S_{\mathrm{LOCAL}} \le M \le S_{\mathrm{REMOTE}}, && (4)\\
      & N_{\mathrm{LOCAL}} \le N_{\mathrm{REMOTE}} \le N_{\mathrm{USER}} && (5)
      \end{aligned}
      $$

      • R is rate, throughput, etc.
      • S is storage size, can be local and remote
      • C and M are constants, set by user
      • N is the number of files over which the index is split
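The constraints on slide 10 can be checked mechanically. The following is only a sketch of that check, not the author's optimizer: the `Plan` class, the function names, and the choice to pass `n_user` as a plain argument are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """One candidate configuration (hypothetical container, not from the slides)."""
    r_in: float      # R_IN
    r_out: float     # R_OUT
    s_local: float   # S_LOCAL
    s_remote: float  # S_REMOTE
    n_local: int     # N_LOCAL
    n_remote: int    # N_REMOTE


def feasible(p: Plan, C: float, M: float, n_user: int) -> bool:
    """True when the plan satisfies constraints (3), (4), and (5)."""
    return (
        0 < p.r_in <= p.r_out <= C
        and p.s_local <= M <= p.s_remote
        and p.n_local <= p.n_remote <= n_user
    )


def cost(p: Plan, w1: float, w2: float) -> float:
    """Objective (1): w1 * R_OUT + w2 * R_IN."""
    return w1 * p.r_out + w2 * p.r_in
```

Under these assumptions, one would call `feasible` on each candidate and keep the feasible candidate with the lowest `cost`.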
  • 11. Naive Stringex Client
  • 12. Practical Assumptions
      • JSON input; only the top level is indexed, anything nested is stringified
      • several efficiency tricks
          1. split the index into relatively small files
          2. distribute smoothly using random hashing
          3. update parts on timeout, to accumulate multiple intensive updates
          4. create special maps which allow for browsing
      • JSON aggregations in files: one line is base64(JSON string)
          ◦ if the bzip2 algorithm is within reach, you can have base64(bzip2(JSON string))
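The line format from slide 12 can be sketched with the Python standard library. This is an illustration rather than the Stringex client code: the function names are hypothetical, and using `json.dumps` for the "stringified" nested values is an assumption about what stringification means here.

```python
import base64
import bz2
import json


def flatten_top_level(doc: dict) -> dict:
    """Keep top-level keys; turn any nested list or dict into a JSON string (assumed meaning of 'stringified')."""
    flat = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            flat[key] = json.dumps(value, sort_keys=True)
        else:
            flat[key] = value
    return flat


def encode_line(doc: dict, compress: bool = True) -> str:
    """Encode one document as base64(JSON string), or base64(bzip2(JSON string)) when compress is True."""
    raw = json.dumps(doc, sort_keys=True).encode("utf-8")
    if compress:
        raw = bz2.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_line(line: str, compressed: bool = True) -> dict:
    """Inverse of encode_line."""
    raw = base64.b64decode(line)
    if compressed:
        raw = bz2.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Because base64 output contains no newline characters, each encoded document fits on one line of an aggregation file, which is what makes the one-line-per-document layout workable.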
  • 13. Naive Client: Data Structure
      INPUT JSON: { name : value1, age : value2, … }
      Files (diagram labels: "Index" for the per-key files, "Data" for the docs files):
          Key: name
              name.imap   { bk : { ik : start,end , … next ik }, … next bk }
              name.vmap   { value : bk , … next value }
              name.bk1, name.bk2, …
          Key: age (diagram label: "Same")
          …
          Docs (diagram labels: "Same", "No .vmap")
              docs.imap   { bk : { docid : start,end , … next docid }, … next bk }
              docs.bk1, docs.bk2, …
      • meta is separate from data
      • smart maps let the client read/write sections of files
          ◦ specifically for the chunk* API in Dropbox
      • filenames are the head 2-3 symbols of MD5 hashes
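A small in-memory sketch may help make the maps on slide 13 concrete. Everything named here is an assumption: the slides do not say which string is hashed for the filename (the indexed value is used below), `ik` is treated as an opaque item key, and a `bytes` object stands in for a remote block file.

```python
import hashlib


def bucket_name(key: str, value: str, head: int = 2) -> str:
    """Block filename built from the first `head` hex symbols of the MD5 of the value (assumed hashing input)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return f"{key}.{digest[:head]}"


def append_item(blocks: dict, imap: dict, vmap: dict, key: str, value: str, ik: str, line: str) -> None:
    """Append one encoded line to its block and record its byte range in the maps."""
    bk = bucket_name(key, value)
    data = blocks.get(bk, b"")
    payload = line.encode("utf-8") + b"\n"
    start, end = len(data), len(data) + len(payload)
    blocks[bk] = data + payload
    imap.setdefault(bk, {})[ik] = (start, end)  # imap: bk -> ik -> (start, end)
    vmap[value] = bk                            # vmap: value -> bk


def read_item(blocks: dict, imap: dict, bk: str, ik: str) -> str:
    """Read only the section of the block that holds the requested item."""
    start, end = imap[bk][ik]
    return blocks[bk][start:end].decode("utf-8").rstrip("\n")
```

With a real storage backend, the slice in `read_item` would become a ranged read, which is the reason for storing `start,end` pairs in the `.imap` files.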
  • 14. Naive Client: Sync Engine Design
      [Diagram labels: "Stringex Index", "The Stringex Client", "Sync Engine", "Optimization", "Local Cache", "Check", "Use", with steps numbered 1 and 2.]
  • 15. Evaluation
  • 16. Stringex vs Lucene
      [Plot: throughput (log of bytes/doc) against index size (log), with one series each for Lucene and Stringex.]
  • 17. Wrapup
      • https://github.com/maratishe/stringex has a JS client
      • I also have a PHP client for command-line Stringex
      • Stringex is better than Lucene for browsing because items cluster naturally
          ◦ I use it for small browsable summaries of datasets
          ◦ ... and context-based browsable datasets
      • many other uses are possible
  • 18. That’s all, thank you ...