Eugene Kurpilyansky, "Indexing on top of Cassandra". Talk at Cassandra conf 2013
1. Indexing Cassandra data in SQL-storage
Indexing Cassandra data in SQL-storage
Kurpilyansky Eugene
SKB Kontur
December 9th, 2013
2. Indexing Cassandra data in SQL-storage
What do we want?
Suppose we want to store objects of different types in
Cassandra.
Any object has a primary string key.
Cassandra is well suited for use as a key-value storage.
But we usually want to search among all objects of the same type
by some criterion.
Search results must be consistent and reflect the current
state of the database.
How can we implement a storage which satisfies these
requirements?
7. Indexing Cassandra data in SQL-storage
Using native Cassandra indexes
We can use native Cassandra indexes.
Advantages
There is no need to maintain an additional storage.
Disadvantages
Every custom query may require a new CF (column family) structure
for effective searching.
SQL indexes are more efficient than Cassandra's indexes.
SQL databases offer many complex index types (e.g. full-text search
indexes).
12. Indexing Cassandra data in SQL-storage
Using synchronization with SQL-storage
Main idea
Run an IndexService application which synchronizes data in
SQL-storage with data in Cassandra (constantly,
in a background thread).
To perform a search, we send a query to IndexService,
which returns the search result after finishing the SQL-storage
synchronization process.
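A minimal Python sketch of this contract follows. It assumes that the background synchronizer publishes the tick up to which SQL-storage is known to be consistent, and that a search waits until that value reaches the moment the search was issued. SQLite stands in for SQL-storage. The `users` table with a `school` column is an invented example schema, and `IndexStateSketch`, `upsert`, `mark_synchronized` and `search` are hypothetical names.

```python
import sqlite3
import threading


class IndexStateSketch:
    """Hypothetical IndexService core: a search blocks until the background
    synchronizer has caught up with the moment the search was issued."""

    def __init__(self) -> None:
        # Local SQL-storage stand-in; the schema is an invented example.
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, school TEXT)")
        self._synced_up_to = 0  # ticks up to which SQL-storage reflects Cassandra
        self._cond = threading.Condition()  # guards both the DB and _synced_up_to

    def upsert(self, obj_id: str, school: str) -> None:
        # Called by the background synchronizer when it applies an event.
        with self._cond:
            self._db.execute(
                "INSERT OR REPLACE INTO users (id, school) VALUES (?, ?)",
                (obj_id, school),
            )

    def mark_synchronized(self, ticks: int) -> None:
        # Called by the background synchronizer after a finished pass.
        with self._cond:
            self._synced_up_to = max(self._synced_up_to, ticks)
            self._cond.notify_all()

    def search(self, school: str, requested_at: int, timeout: float = 5.0) -> list:
        with self._cond:
            # Wait until every write made before the request is visible in SQL.
            if not self._cond.wait_for(lambda: self._synced_up_to >= requested_at, timeout):
                raise TimeoutError("SQL-storage has not caught up yet")
            rows = self._db.execute(
                "SELECT id FROM users WHERE school = ? ORDER BY id", (school,)
            ).fetchall()
        return [row[0] for row in rows]
```

If `mark_synchronized` has already been called with the request's tick or a later one, the search returns immediately. Otherwise the search waits for the next finished pass and gives up with `TimeoutError` after `timeout` seconds.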
14. Indexing Cassandra data in SQL-storage
Using synchronization with SQL-storage
Implementation of EventLog
Create an event log
One event per write request or delete request.
The event log is sorted by event time.
16. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of EventLog
Event
string EventId;
long Timestamp;
string ObjectId;
interface IEventLog
void AddEvent(Event event);
IEnumerable<Event> GetEvents(long fromTicks);
New implementation of IObjectStorage
Before writing or deleting an object, call
IEventLog.AddEvent.
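A minimal Python sketch of these pieces follows. An in-memory list replaces the Cassandra column family, and the caller supplies a clock that returns ticks. `InMemoryEventLog` and `LoggingObjectStorage` are hypothetical names. The slides do not show the IObjectStorage methods, so `write`, `delete` and `read` are hypothetical as well. Storing the event's timestamp inside the object is also an assumption, chosen to match the `Timestamp` fields in the ProcessEvents examples.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int  # ticks
    object_id: str


class IEventLog(Protocol):
    def add_event(self, event: Event) -> None: ...
    def get_events(self, from_ticks: int) -> Iterable[Event]: ...


class InMemoryEventLog:
    """Hypothetical stand-in for the Cassandra-backed event log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add_event(self, event: Event) -> None:
        self._events.append(event)

    def get_events(self, from_ticks: int) -> List[Event]:
        # Events strictly after from_ticks, ordered by time, then by id.
        newer = (e for e in self._events if e.timestamp > from_ticks)
        return sorted(newer, key=lambda e: (e.timestamp, e.event_id))


class LoggingObjectStorage:
    """Hypothetical IObjectStorage: the event is logged BEFORE the object changes."""

    def __init__(self, log: IEventLog, clock: Callable[[], int]) -> None:
        self._log = log
        self._clock = clock
        self._objects: Dict[str, dict] = {}

    def write(self, object_id: str, obj: dict) -> None:
        ts = self._clock()
        self._log.add_event(Event(uuid.uuid4().hex, ts, object_id))
        # Assumption: the stored object carries its event's timestamp.
        self._objects[object_id] = {**obj, "Timestamp": ts}

    def delete(self, object_id: str) -> None:
        self._log.add_event(Event(uuid.uuid4().hex, self._clock(), object_id))
        self._objects.pop(object_id, None)

    def read(self, object_id: str) -> Optional[dict]:
        return self._objects.get(object_id)
```

The order of the two steps is what makes the design work. Suppose the process dies between them. The log then holds an event whose object never changed, and the synchronizer can handle that case. With the reverse order, the object could change with no event recorded, and the synchronizer would never see the change.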
19. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of EventLog
EventLog.AddEvent(Event event)
Create column:
ColumnName = event.Timestamp + ':' + event.EventId
ColumnValue = event
EventLog.GetEvents(long fromTicks)
Execute get_slice on one row, starting from the column for fromTicks (exclusive).
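One way to model this single-row layout in Python follows. It assumes that column names are compared as plain strings, as with a UTF8Type or BytesType comparator. Under that assumption the timestamp must be zero-padded to a fixed width. Otherwise "999:…" would sort after "1000:…". The width of 19 digits is an assumption that fits any .NET `DateTime.Ticks` value. `SingleRowEventLog` is a hypothetical stand-in for one Cassandra row, and `bisect` plays the role of get_slice.

```python
import bisect
from typing import List, Tuple

TICKS_WIDTH = 19  # zero-pad so lexical order of names equals numeric order


def column_name(timestamp: int, event_id: str) -> str:
    # ColumnName = event.Timestamp + ':' + event.EventId, made fixed-width.
    return f"{timestamp:0{TICKS_WIDTH}d}:{event_id}"


class SingleRowEventLog:
    """One 'row' kept as parallel sorted lists of column names and values,
    mimicking Cassandra's sorted columns under a string comparator."""

    def __init__(self) -> None:
        self._names: List[str] = []
        self._values: List[Tuple[int, str, str]] = []

    def add_event(self, timestamp: int, event_id: str, object_id: str) -> None:
        name = column_name(timestamp, event_id)
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return  # the same column written twice is idempotent
        self._names.insert(i, name)
        self._values.insert(i, (timestamp, event_id, object_id))

    def get_events(self, from_ticks: int) -> List[Tuple[int, str, str]]:
        # Exclusive start: ';' sorts right after ':', so every column of
        # from_ticks sorts before this key and every later tick sorts after it.
        start = f"{from_ticks:0{TICKS_WIDTH}d};"
        i = bisect.bisect_right(self._names, start)
        return self._values[i:]
```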
21. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of EventLog
We should split the whole event log into rows using
PartitionInterval to limit the size of rows.
PartitionInterval is some constant period of time (e.g.
one hour, or six minutes).
EventLog.AddEvent(Event event)
Create column:
RowKey = event.Timestamp / PartitionInterval.Ticks
ColumnName = event.Timestamp + ':' + event.EventId
ColumnValue = event
EventLog.GetEvents(long fromTicks)
Execute get_slice on one or more rows, starting from the
column for fromTicks (exclusive).
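A sketch of the partitioned variant follows, again in Python, with a dict of lists standing in for Cassandra rows. The six-minute interval is one of the slide's examples, and the 100 ns tick is the .NET convention. The extra `now_ticks` parameter is an assumption, because the reader must know which row is the last one to visit. `PartitionedEventLog` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TICKS_PER_SECOND = 10_000_000  # .NET ticks are 100 ns each
PARTITION_INTERVAL_TICKS = 6 * 60 * TICKS_PER_SECOND  # "six minutes"


def row_key(timestamp: int) -> int:
    # RowKey = event.Timestamp / PartitionInterval.Ticks (integer division)
    return timestamp // PARTITION_INTERVAL_TICKS


class PartitionedEventLog:
    def __init__(self) -> None:
        # row key -> sorted (timestamp, event_id, object_id); sorting these
        # tuples matches the order of "Timestamp:EventId" column names.
        self._rows: Dict[int, List[Tuple[int, str, str]]] = defaultdict(list)

    def add_event(self, timestamp: int, event_id: str, object_id: str) -> None:
        row = self._rows[row_key(timestamp)]
        row.append((timestamp, event_id, object_id))
        row.sort()

    def get_events(self, from_ticks: int, now_ticks: int) -> List[Tuple[int, str, str]]:
        result: List[Tuple[int, str, str]] = []
        # Visit every row from the one holding from_ticks up to the current one;
        # .get() avoids creating empty rows for partitions with no events.
        for key in range(row_key(from_ticks), row_key(now_ticks) + 1):
            result.extend(e for e in self._rows.get(key, []) if e[0] > from_ticks)
        return result
```

A shorter interval keeps each row small, but a long catch-up then has to read more rows. The interval trades row size against the number of get_slice calls per read.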
22. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
IndexService
It has a local SQL-storage (one storage per service replica).
There is one SQL table per object type.
There is one specific SQL table for storing the time of the last
synchronization for each object type.
There is one background thread per object type, which
reads the event log and updates SQL-storage.
To execute an incoming SQL query, we can use data from
SQL-storage and a small range of events.
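A sketch of the SQL-storage layout described above follows, using an in-memory SQLite database as a stand-in for the local SQL database. The table and column names (`sync_state`, `id`, `ts`, `data`) are assumptions, and the per-type background threads are left out.

```python
import sqlite3
from typing import Iterable


def create_index_storage(object_types: Iterable[str]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # The one specific table: last synchronization time per object type.
    db.execute(
        "CREATE TABLE sync_state ("
        "object_type TEXT PRIMARY KEY, last_sync_ticks INTEGER NOT NULL)"
    )
    for object_type in object_types:
        # One table per object type. Names come from trusted configuration,
        # never from user input, since identifiers cannot be bound as parameters.
        db.execute(
            f"CREATE TABLE {object_type} (id TEXT PRIMARY KEY, ts INTEGER NOT NULL, data TEXT)"
        )
        db.execute("INSERT INTO sync_state VALUES (?, 0)", (object_type,))
    db.commit()
    return db


def get_last_sync(db: sqlite3.Connection, object_type: str) -> int:
    row = db.execute(
        "SELECT last_sync_ticks FROM sync_state WHERE object_type = ?", (object_type,)
    ).fetchone()
    return row[0]


def set_last_sync(db: sqlite3.Connection, object_type: str, ticks: int) -> None:
    db.execute(
        "UPDATE sync_state SET last_sync_ticks = ? WHERE object_type = ?",
        (ticks, object_type),
    )
    db.commit()
```

Keeping the last synchronization time in the same local database as the data means both survive a restart of the replica together.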
27. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Periodic synchronization action
Set startSynchronizationTime = NowTicks.
Find all events which should be processed.
Process these events: update SQL-storage and keep
unprocessed events (they should be processed on the next
iteration).
Update the time of the last synchronization to
startSynchronizationTime in SQL-storage.
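The periodic action can be sketched as one Python function, with its dependencies passed in as callables. All names are hypothetical. Events are assumed to be hashable, so that duplicates from a re-read window can be dropped. `lookback_ticks` stands for the overlap that the talk motivates later with writeTimeout. A value of 0 reproduces the steps exactly as listed.

```python
from typing import Callable, Hashable, List, Sequence


def synchronization_step(
    now_ticks: Callable[[], int],
    get_last_sync: Callable[[], int],
    set_last_sync: Callable[[int], None],
    get_events: Callable[[int], Sequence[Hashable]],
    process_events: Callable[[List[Hashable]], List[Hashable]],
    pending: Sequence[Hashable],
    lookback_ticks: int = 0,
) -> List[Hashable]:
    """One pass of the periodic action; returns the events left for the next pass."""
    start = now_ticks()  # 1. startSynchronizationTime = NowTicks
    from_ticks = get_last_sync() - lookback_ticks
    # 2. Leftovers from the previous pass plus new events, duplicates dropped.
    candidates = list(dict.fromkeys([*pending, *get_events(from_ticks)]))
    # 3. Update SQL-storage; keep whatever could not be processed yet.
    unprocessed = process_events(candidates)
    # 4. Record the time at which this pass STARTED, not when it ended.
    set_last_sync(start)
    return unprocessed
```

The function saves startSynchronizationTime rather than the time at which the pass finished. The next pass therefore reads from `start`, so events stamped while this pass was running are not skipped.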
31. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
ProcessEvents(Event[] events)
This function brings the values of related objects in SQL-storage up to date.
Remember that we update an object after creating its event.
So we cannot process some of the events at the moment, because
the corresponding object isn't updated yet.
33. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Event[] ProcessEvents(Event[] events)
This function brings the values of related objects in SQL-storage up to date
and returns the events which have not been processed.
How should this function be implemented?
For every event we should analyze the corresponding objects from both
Cassandra and SQL-storage.
40. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Example 1
event = {Timestamp: 2008}
cassObj = {Timestamp: 2008, School: 'USU'}
sqlObj = {Timestamp: 2005, School: 'AESC USU'}
Write cassObj to SQL-storage and mark the event as processed.
Example 2
event = {Timestamp: 2012}
cassObj = {Timestamp: 2008, School: 'USU'}
sqlObj = {Timestamp: 2005, School: 'AESC USU'}
The timestamp of the event is greater than the timestamp of cassObj.
Probably, we need to wait until the object is updated.
Write cassObj to SQL-storage and mark the event as unprocessed.
43. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Example 3
event = {Timestamp: 1997}
cassObj is missing
sqlObj is missing
Probably, this event corresponds to the creation of the object.
Mark the event as unprocessed.
46. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Example 4
event = {Timestamp: 2017}
cassObj is missing
sqlObj = {Timestamp: 2012, School: 'UFU'}
Two cases are possible:
1. This event corresponds to the deletion of the object.
2. This event corresponds to the creation of the object. sqlObj is
not missing because there were two operations in a row: delete
and create.
Delete sqlObj from SQL-storage and mark the event as unprocessed.
47. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Event[] ProcessEvents(Event[] events)
Read the objects referenced by these events from Cassandra and
SQL-storage (some of them can be missing).
For each (event, cassObj, sqlObj) do
    If cassObj is not missing
        Save cassObj in SQL-storage.
        If event.Timestamp = cassObj.Timestamp
            then mark event as processed;
            else mark event as unprocessed.
    else (i.e. cassObj is missing)
        Delete sqlObj from SQL-storage if it's not missing.
        Mark event as unprocessed.
Return the events which have been marked as unprocessed.
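A direct Python transcription of this pseudocode follows. Plain dicts keyed by object id stand in for Cassandra and SQL-storage, and every stored object is assumed to carry a `Timestamp` field, as in the examples above. The slides do not say how long an event may stay unprocessed. Deletion events (Example 4) never become processed under these rules, so a real implementation presumably needs some expiry, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int
    object_id: str


def process_events(events: List[Event],
                   cassandra: Dict[str, dict],
                   sql: Dict[str, dict]) -> List[Event]:
    """Bring `sql` in line with `cassandra` for the objects named in `events`
    and return the events that must be retried on the next iteration."""
    unprocessed: List[Event] = []
    for event in events:
        cass_obj: Optional[dict] = cassandra.get(event.object_id)
        if cass_obj is not None:
            sql[event.object_id] = dict(cass_obj)  # save cassObj in SQL-storage
            if event.timestamp != cass_obj["Timestamp"]:
                # The object does not reflect this event (yet): retry later.
                unprocessed.append(event)
        else:
            sql.pop(event.object_id, None)  # delete sqlObj if it is present
            unprocessed.append(event)
    return unprocessed
```

Running the four examples through it reproduces the slides' decisions. Example 1 is processed. Examples 2, 3 and 4 stay unprocessed, and SQL-storage ends up holding the Cassandra copies, with the Example 4 object deleted.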
49. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
What events should we pass as arguments to the ProcessEvents
function?
Of course, all unprocessed events from the previous iteration.
Also all new events, i.e. IEventLog.GetEvents(fromTicks).
What is fromTicks?
fromTicks = lastSynchronizationTime?
No. Unfortunately, any operation with Cassandra can take a long
time to execute. This time is limited by
writeTimeout = attemptsCount · connectionTimeout.
An event is stamped before its write reaches Cassandra, so an event
older than lastSynchronizationTime may appear in the log only after
the previous pass has read the log. We should step back by
writeTimeout; otherwise we can lose some events.
fromTicks = lastSynchronizationTime - writeTimeout
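The arithmetic can be checked with a small Python helper. It assumes .NET-style ticks of 100 ns each, consistent with the C#-like signatures in the talk. It converts with integer arithmetic to avoid floating-point rounding. The function names are hypothetical.

```python
from datetime import timedelta

TICKS_PER_SECOND = 10_000_000  # .NET DateTime ticks are 100 ns each


def to_ticks(span: timedelta) -> int:
    # Exact integer conversion (timedelta.total_seconds() would return a float).
    whole_seconds = span.days * 86_400 + span.seconds
    return whole_seconds * TICKS_PER_SECOND + span.microseconds * 10


def from_ticks_for_next_pass(last_sync_ticks: int,
                             attempts_count: int,
                             connection_timeout: timedelta) -> int:
    # writeTimeout = attemptsCount * connectionTimeout bounds how late a write can land.
    write_timeout = attempts_count * to_ticks(connection_timeout)
    return last_sync_ticks - write_timeout
```

For example, 3 attempts with a 2-second connection timeout give writeTimeout = 6 s = 60,000,000 ticks, so each pass re-reads the six seconds of the log before lastSynchronizationTime. Events in that window may reach ProcessEvents twice. This is harmless, because saving cassObj again leaves SQL-storage in the same state.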
56. Indexing Cassandra data in SQL-storage
Synchronizing SQL-storage with Cassandra
Implementation of IndexService
Executing search request
57. Indexing Cassandra data in SQL-storage
Advantages.
Scalability.
Availability.
Fault tolerance.
Sharding.
59. Indexing Cassandra data in SQL-storage
Questions
Thank you for your attention. Any questions?