9. Smarter Faster Cheaper CDR Processing
6 Billion CDRs per day, dedups over 15 days, processing latency from 12 hours to a few seconds
6 machines (using ½ processor capacity)
InfoSphere Streams
xDR Hub
Key Requirements:
Price/Performance and Scaling
10. Surveilance and Physical Security: TerraEchos (Business
Partner)
Use scenario
– State-of-the-art covert surveillance system
based on Streams platform
– Acoustic signals from buried fiber optic cables
are monitored, analyzed and reported in real time
for necessary action
– Currently designed to scale up to 1600 streams
of raw binary data
Requirement
– Real-time processing of multi-modal signals
(acoustics. video, etc)
– Easy to expand, dynamic
– 3.5M data elements per second
Winner 2010 IBM CTO Innovation Award
Streams key messages: Velocity – low latency – microsecond response times Variety – all kinds of data, all kinds of analytics Volume – millions of events per second Agility – respond quickly to changes; fast development of Streaming apps
That one sentence is the shortest statement I can think of that captures the essence of Streams Platform : Streams is not a solution or application, nor is it a limited-purpose tool. Instead, it is a platform. This means that it does not do anything useful when you first install it. It needs to be programmed before it can do anything. In that sense, it is much like a DBMS, or an operating system. It comes with the tools, language, and building blocks that let you build programs for it, and with a runtime environment that lets you run those programs. The bad news is that you have to program it (you can’t just pick a few options from a menu); the good news is that once you learn how to do that, it can do anything you can imagine (or, more accurately, anything you can find algorithms and code for). Real-time : The Streams programs you create do their processing and analysis in as close to real-time as it is possible to get on a standard IT platform. In this case, real-time means very low latency, where latency is the delay from the time a packet of data arrives to the time the result is available. A key factor here is that Streams does everything in memory; it has no concept of mass storage (disk). Analytics : Because Streams is fast, scalable, and programmable, the kinds of analysis you can apply ranges from the simple to the extremely sophisticated. You are not limited to simple averages or if-then-else rules. BIG data : Actually, make that infinite data. For purposes of program and algorithm design, streaming data has no beginning and no end and is therefore by definition infinite in volume. In practical terms, this means that Streams can process any kind of data feed, including those that would be much too slow or expensive to capture and store in their entirety. We’ve added Agility to Big Data’s V 3 : A few features of the programming language and runtime make it easy to break an application into modules and add new modules and modify existing ones without interrupting the overall solution.
The blue items are included in the Streams product (Mining in Microseconds, Statistics, Text Analysis) The red items have been built for Streams (Acoustic, Image & Video, Geospatial, Predictive, and Advanced Mathematics) and have been used in various projects and engagements, but they have not been made a part of the product. Contact development if you think any of them could help you in an opportunity. More later in this presentation on the toolkits available in the Streams product. A few notes: The “simple” in “Simple & Advanced Text” refers to basic functions for regular expression matching (and replacement) in string values; these are built into the Standard Toolkit, meaning they’re basically part of the language. “Advanced” refers to real text analysis using the same System T code used in BigInsights. This is new in Fix Pack 3 (November 2011). Statistics included with Streams range from simple aggregate functions (average, sum, etc.) to the more advanced metrics included in the Financial Services Toolkit. For true time series analysis, we have an advanced toolkit in development; this is covered under Advanced Mathematical Models in this slide.
Streams targets applications where processing goals are best achieved by on-the-fly mining and processing of streaming data. That is, where ingest volume and rate exceed storage capacity Requires data filtering, aggregating, and other processing to reduce the volume so it can be handled by a data storage and management system at the back end. Even if the goal is not analysis on the fly, Streams is needed simply to prepare the data for analysis later. Example: extremely noisy and voluminous data feeds in radio astronomy ingest rate plus processing requirements exceed ability of single application on a single processor to process the data InfoSphere Streams is specifically designed with parallelism and deployment across computing clusters in mind. result is needed in (near) real-time Latency refers to the delay, or elapsed time, between the appearance of a data item at the input and the availability of the result at the output Analysis of stored data is not a viable option when delays measured in milliseconds add up to lives or money lost. Data must be processed in memory, as it comes in, before it can be committed to persistent storage. [continued on next page]
The integration from the previous slide is depicted here again, in a slightly different and simpler way. The key point is that the Big Data approach, and in particular Streams/RTAP (Real-Time Analytic Processing), is (almost) always in addition to, not instead of, traditional warehousing and IT. All three approaches are not necessarily present in every project, but there are always bidirectional connections between them.
Key Points Representative and impressive use cases A way to augment and improve an existing use case and traditional technology - algorithmic trading, risk, fraud detection, security New opportunities enabled by the software - Galway Bay, Neonatal intensive care, Radio astronomy, traffic management Virtually every field needs this: Streams is suitable to build solutions for a variety of domains Stock market – Impact of weather on commodity pricing or stock prices – demo built to show impact of hurricane on oil assets in the Gulf of Mexico, Sample Equity and Options trading apps included with Streams Natural systems – A university with access to US Weather satellites and unmanned aerial vehicles wants to build a real-time wildfire analysis and monitoring system that can detect smoke from wildfires, and task satellites and UAV’s to monitor fires and direct firefighters. Several organizations are looking to better understand watersheds and oceans to predict weather patters and manage fishing stocks Transportation -- Real time fleet management is one application Another is integrating vehicle traffic information with subway, train and taxi information to improve transit operation. Pilot done for Stockholm with KTH University Manufacturing – IBM Burlington used Streams in a pilot to automate analysis of chip testing to improve yield Health & Life sciences – by correlating information across neonatal monitors, detection of systemic infection can be discovered 6 to 24 hours earlier than experienced nurses. Now being extended for brain trauma in Neurologic ICU Telephony – call detail record processing transform telephone records in ASN.1 format to standard ASCII and puts unto databases for billing. It also summarizes to a dashboard. Other apps set up a social network of who is talking to Whom, and who is likely to leave your company. Geomapping allows a telco to provide location based services. e-Science – Space Weather prediction can help alleviate damage from solar flares to satellite and electric grid. Detection of transient events and imaging remote universe from an array of radio telescopes – a worldwide $2 billion USD multiyear effort. Synchrotron research enables re-use of data across many research organizations. Fraud prevention – Detecting multi-party fraud and finding fraud faster can result in less fraud Law Enforcement – monitoring cameras to detect faces and then do facial recognition can be used to find criminals; combining many data types can improve analysis and provide better situational awareness Other – smart grid to talk to many meters, text analytics to understand content, Who’s talking to Whom – voice analysis to build up social network, Weather summarization to optimize commodities purchase and FGPA acceleration to improve number crunching (mathematical analysis) in Streams)
This is about all we can say about the original use case that led to the government-IBM partnership that produced Distillery / System S / Streams. Government continues to be one of two major industry focus areas, along with telco.
This is still very much a research project. ICU = Intensive Care Unit. In this case (neonatal), for premature babies. UOIT = University of Ontario Institute of Technology (Toronto) Important observations: The instrumentation is there: dozens of sensors for each infant. But the instruments work in isolation and only produce visual signals, auditory signals in case of anomalies, and sometimes a paper record on demand. Streams is a highly usable platform for finally capturing all that data and combining it in useful ways, in real time. Since this kind of combined analysis has not been done before, Streams first needs to enable the research effort of figuring out which patterns are relevant and predictive, before any of this can be standardized and put into an approved product and practice. Both of these factors are present in many other organizations and industries.
In collaboration with KTH university in Stockholm, we built a system to take GPS data from taxis and trucks and, in real time, derive information about traffic speed and density for each road segment, as well as travel time between specific locations. It turns out that processing GPS data is not just a matter of gathering up the reported locations. You have to filter out bad data and “snap” the location to the nearest road segment that goes in the right direction—because there is uncertainty in the location and vehicles are assumed to be on roads, not in the middle of a building or a field. This requires an interesting combination of keeping static map data (the road segments) in memory and dividing up the map into tiles to take advantage of multiple processing nodes. There are some performance numbers in the slide, but the important point is that we were able to use the Streams platform to build and incorporate fairly sophisticated spatial operators and achieve the desired performance by dividing up the area in to regions and distributing the load—an example of real-time map-reduce (the algorithmic principle, not the Hadoop operating system).
This slide is here as a placeholder: it is often fun and useful to show the recorded demo of this video processing app. This was developed by the researcher in the picture on Streams 1.2 using user-defined operators build around the OpenCV open-source image processing library. It is not part of any customer engagement (although the library is). The reason this is useful is that for those who are unfamiliar with the concept of streaming data, it is easy to grasp that a video is nothing but a stream of still images (we call it “streaming video” on the web, for good reason). So by showing how each image (video frame) is processed through a series of steps—from color-to-greyscale conversion through noise filtering and thresholding (to create a one-bit, black-and-white image) to the edge-detection step that creates the contours—we can get across Streams’s fundamental computing paradigm: that of discrete packets of data flowing from operator to operator in a graph consisting of such operators connected by streams. If you play the video (make it loop while you talk), notice that Streams does all that to each frame, 24 or 30 of them per second, and when the processed and original frame come back together, no delay is noticeable. You can get it here: http://cattail.boulder.ibm.com/cattail/#view=gilesjam@us.ibm.com/files/842A08609BCA3DDA8AB3367D093F23B6
This slide and the following one are old but serve their purpose well. As with the video processing demo, the aim is for people to get familiar with the concept of streaming data processing as the flow of discrete packets of data from one operator to the next in a flow graph; each operator does something with the data that comes in and produces new data as a result. Seeing those little blocks move from bubble to bubble in the animated slide, relentlessly and never ending, conveys the point rather well, so let the audience stare at this for a while, especially once you reveal what goes on behind the grey panel. At this point, however, all you’re showing is that a Streams application (here shown as a black box—or rather, a grey one) continuously ingests data from multiple sources (often many different types of data arriving at very different rates), and continuously sends analysis results and processed data to disk files, databases, real-time displays, or any other computational or IT processes (depicted as the proverbial cloud).
Click-by-click (3 and 4 are timed—no click needed): The overall application is composed of many individual operators (or, maybe, Processing Elements consisting of one or more operators each); those are the bubbles in this graph. The data packets (sometimes called messages; we call them tuples) generated by one bubble travel to the next downstream bubble, and so on. Sometimes the output from one bubble becomes input to multiple bubbles; conversely, sometimes multiple output streams merge as input to one bubble. But the tuples keep flowing. Having all these separate processes working together gives us flexibility and scalability. If we need more power than a single computer can provide, we can simply distribute the bubbles over multiple boxes (hosts). Some (compute-intensive) bubbles get their very own box; other bubbles want to be in the same box, because they must exchange a lot of data and going across the network would slow that down. The Streams runtime services can help distribute the load. In the picture, the streams flowing within a host (interprocess communication) turn read, indicating that they are faster than the blue ones that go across the network between hosts. Finally, we label some of the boxes with some of the processing stages typically found in many Streams solutions: Filter and sample to reduce the amount of data flowing through the system; transform, changing the data format, computing new quantities, and dropping unneeded attributes; correlate to join or otherwise combine data from multiple sources; classify based on models generated from historical or training data; annotate to enrich the data by looking up static information based on keys present in the data.
The programming model is to define a data flow graph that defines the connections among data sources (inputs), operators, and data sinks (outputs). Operators, and the streams that connect them, are the building blocks of the logical program. The deployment model is to group, or “fuse”, operators into units called Processing Elements , or PE s. PEs are separate executables that form the building blocks of the distributed application. A PE contains one or more operators A program consisting of many operators may be fused into a single PE; at the other extreme, each operator may run in its own PE Operators fused into a single PE are tightly coupled; data transfer is local and fast Communication between PEs uses the network protocol and is therefore slower; but these more loosely coupled components can be flexibly distributed over available processing nodes and are more easily reusable At the physical level, a Streams job (a running application) consists of multiple intercommunicating PEs. This lets you place resource-hungry analytics on appropriately sized nodes Reuse of generic components is a side-benefit, but choosing the optimum level of component granularity and performance is currently more art than science Streams provides the infrastructure to support the decomposition of applications into scalable chunks (PEs), and the deployment and operation of these PEs across stream-connected processing nodes. Currently, the product only supports nodes with x86 architecture, running a Linux (Red Hat) operating system.
A stream processing application is modeled as a graph (also known as a network): a set of vertices (or nodes ) connected by edges . In Streams, the vertices are called operators and the edges are called, well, streams . In mathematics, there are different types of graph; ours is directed , which means that the edges have a direction from one vertex to another but not back; they are arrows. Also, our graph may be cyclic , which means that loops are allowed. There are restrictions on this (only certain types of backward connections are allowed) but this is good enough for now. The term job simply refers to a running application. It’s looking at it from the runtime execution perspective rather than the programming perspective.
Another view of the same thing, to get a clear model in everyone’s mind. The Streams Instance is the runtime environment that makes it possible to run Streams applications: analogous to an application server or a database server. The instance is a collection of Linux components and services, managing one or more hosts for running applications. We’ve already seen the Processing Element as well: the fundamental unit of execution, akin to a shared library. Each PE encapsulates one or more operators; at deployment time, each PE is loaded by its own executable program, the PE Container . When a Streams application runs in the Instance, it is called a Job . The job, in turn, manifests itself as a collection of one or more PEs. When you monitor an instance (using one of a variety of tools, including a command-line program, a browser-based management console, and an interactive view of the flow graph in Streams Studio), you can get information about jobs and PEs, submit and cancel jobs, terminate and restart PEs, view PE-specific logs, and so on.
There is a separate deck with competitive information about Complex Event Processing and vendors in that space. This is just one summary slide. CEP as a product category has been known in the industry a bit longer than stream processing. To this day, there are no obvious competitors that can approach Streams on the true stream-processing end, so we look here at CEP. Both use very similar terms to describe what they do. “ Real-time” or near-real-time refers to latency but also throughput, and to the in-memory nature of processing. The question is, how quick do you have to be to call yourself “real time”? Likewise, how low is “ultra-low”? CEP literature also talks about stream processing, but they call it “event stream processing”. In practice this can cover whatever you want, but in concept it is different from processing a stream of low-level, discrete data packets, as opposed to higher-level, “complex” events made up of a number of business process steps. How we look at the data coming in is fundamentally different (even if it’s the same data). Our streams are continuous flows of discrete data packets; theirs are separate, individual events. That colors how we then deal with the data in our environment. (Note that “continuous” here is not used in the mathematical sense of a function without steps, holes, or spikes, but in the everyday-language sense of “always flowing, never ceasing”. What arrives on these flows are discrete packets of data, which suggests the opposite of a different meaning of continuous.) The other distinctions made here are of degree in the areas of complexity of analysis, performance, and data variety.
There is a lot of new stuff in Streams 3.0. Aside from numerous improvements under the covers, the ones that stand out are Drag & drop editing: this opens up a whole new way of introducing Streams and giving prospective customers and partners a hands-on experience without getting lost in the details of the SPL syntax. Data visualization in the Streams Console: there are few easy-to-use tools that can graphically display dynamically changing data, and while this capability does not have the sophistication of Cognos RTM and other dashboarding tools, this is perfect for development, demos, and even POCs. Several new analytics and integration toolkits that greatly enhance the spectrum of problem domains and industry areas that can be served.
Streams Processing Language is unique to IBM InfoSphere Streams. This is what gives Streams the power and flexibility that other systems lack. Of course, the downside is that there is yet another language to learn. But in practice, this is not the burden it may seem, because the greatest step in the learning curve is getting comfortable with a new computing paradigm, the distributed nature of typical deployments, and so on. Once you’ve taken that step, the specific syntax of the language is a minor addition—especially since the IDE helps complete your stream and operator declarations with all the brackets and braces in the right place. Type-generic operators : The built-in operators work on any combination of the data types supported by Streams Declarative operators : To configure the operators, you declare them, give them names, define input and output stream(s), and assign parameters User-defined operators : These can take different forms, including one that makes them behave exactly like all the built-in operators (type-generic) Procedural constructs : Additional language elements that control repetition, loops, and other aspects of the streams connecting the operators The following slides cover all of this in greater detail.
Reference slide. The Relational operators Aggregate, Sort, and Join are named after analogous operations in relational databases, but in stream processing there is a twist: they can only operate on windows of tuples, never the “entire” stream. Functor is a general-purpose attribute transformation and filtering operator with many of the capabilities of the SQL SELECT statement. Punctor is unique to Streams: it exists solely to add window markers to the stream, thus separating groups of tuples into disjoint windows that can be used by downstream operators for their window-based operations. Note that Import and Export are not “real” operators in that they don’t generate any code. Instead, they add metadata for the Streams Application Manager service to use in making stream connections between applications. For convenience, they are made to look syntactically like regular operators. And since they’re not about communicating with the outside world, they’re not really “adapters” either.
Reference slide. Note that Import and Export are not “real” operators in that they don’t generate any code. Instead, they add metadata for the Streams Application Manager service to use in making stream connections between applications. For convenience, they are made to look syntactically like regular operators. And since they’re not about communicating with the outside world, they’re not really “adapters” either.
Reference slide. Many of these operators only make sense in a streaming-data system. The Custom operator is like a blank slate. It can perform any kind of custom logic in response to the arrival of a tuple on one of its input ports, including writing to the console, sending out tuples and punctuation markers, and calling system functions, by using the simple but powerful SPL expression language. The Gate operator can help even out tuple flow and avoid overwhelming a downstream operator by only sending tuples on after receiving acknowledgment of previous tuples. The Switch operator can temporarily block a stream altogether, for example while a downstream operator is initializing or refreshing lookup information. Operators that are new in 3.0 are highlighted in blue .
Reference slide. Some parallel-flow operators are for performance optimization and load balancing (e.g., ThreadedSplit); others support High Availability scenarios (DeDuplicate). Yet others support parallel processing by applying independent algorithmic steps in parallel to the same data (Barrier, Union). DynamicFilter is a very powerful operator for maintaining a list of filter criteria that may change over time, such as key words for a simple text-based filter. Parse, Format, Compress, and Decompress expose some of the capabilities that are built in to the standard adapters for files and TCP and UDP sockets; this allows you to apply the same features to data coming from or going to other types of connectors. Operators that are new in 3.0 are highlighted in blue .
IBM IOD 2011 10/18/12 Prensenter name here.ppt 10/18/12 09:05 XML processing is still somewhat complicated, but it is now possible to read and write XML data and do something intelligent with it using built-in capabilities. The standard toolkit adapters can use the txt format parameter to read and write XML literals (double-quoted strings containing complete documents, including line breaks). They can also be read as strings (format: line) or blobs (format: block) and later converted to tuples or the xml data type. The xml data type validates syntax when a value is assigned; if a schema is specified ( xml<" Schema-URI "> , it also checks validity against that schema. XML data can be parsed and turned into structured-tuple data by the XMLParse operator, supported by XPath expressions. Or it can be left as xml-type values and used on the fly in XQuery functions.
Streams is extensible at heart; extensibility is not an afterthought. New operators are developed all the time, by IBM itself as well as by users. With call-level APIs and code-generation facilities available to all users, user-developed toolkits have all the same power and access to system capabilities as IBM-developed ones. The only limitation is that it is not possible to create new “built-in” types. For example, the xml data type introduced in 3.0 had to be added to the system by IBM; it could not have been added by a third party.
InetSource is good for REST interfaces (HTTP GET and POST requests) as well as the other types listed. For files, the difference with FileSource is that it periodically reads, either just the new data since the last read, or the entire file again. More operators will probably be added to the Internet toolkit in future, such as for using javascript-based data visualization, http sources that require authentication and other capabilities not currently supported, etc. Check the Streams Exchange on developerWorks for an early view of such operators. Regarding the supported databases in the ODBC* operators, there are some nuances. Any database that has a driver for UnixODBC (2.3 or later) can be supported. For the specific databases listed, there are restrictions: on RHEL 6, Oracle is not supported, and on IBM Power7 systems running RHEL 6, solidDB, Netezza, and Oracle are not supported. SolidDBEnrich is there in addition to ODBCEnrich, because it is much more efficient to use the proprietary SolidDB API than to go through ODBC and full SQL. There are restrictions, but for stream enrichment you usually don’t need all the flexibility of SQL. Additional target-specific operators have been and will continue to be added to this toolkit; operators for high-speed writing to partitioned DB2 databases and to Netezza already exist.
IBM InfoSphere Data Explorer can index and consume results from Streams. The DataExplorerPush operator uses the BigSearch API. The Hadoop Distributed File System requires a separate set of source and sink adapters, because it is a proprietary file system that is not POSIX-compatible. The HDFS* operators all have counterparts of the same name in the Standard Toolkit, and operate in similar ways. HDFSSplit is there to improve throughput by allowing you to parallelize the HDFS I/O operations: in iterative fashion, send a specific amount of data to each of several HDFSSink operators, followed by a window punctuation marker to indicate that it is time to write that batch. This fits the parallel and batch-oriented nature of Hadoop. Note that there is no need for a similar set of adapters for BigInsights when configured with GPFS, because GPFS is POSIX-compliant and the standard-toolkit adapters work just fine. The HDFS* operators support Hadoop versions 0.20.2 and 1.0.0 (on Power, 0.20 only). They use both Hadoop and JNI (Java Native Interface).
The IBM InfoSphere DataStage Integration Toolkit is supported by DataStage version 9.1. Streams can enhance a DataStage ETL job by performing deeper real-time analytic processing (RTAP) on the data as it is being loaded. In a slightly different scenario, DataStage can perform additional enrichment and transformation on the output of a Streams RTAP application, before storing the end results in a warehouse.
NOTE: This toolkit is not supported on POWER7 systems. The XMSSource and XMSSink operators use the standard XMS architecture and APIs (CreateProducer, CreateConsumer, CreateConnection).
The Text Toolkit does not require Hadoop or BigInsights, but in Text Analytics it is almost inevitable that the analytical model, as expressed in the AQL program, is developed and tested against a repository of text documents, usually in something like BigInsights. An example of text analytics is sentiment analysis based on a Twitter feed (the documents are tweets, in that case).
IBM IOD 2011 10/18/12 Prensenter name here.ppt 10/18/12 09:05 The pattern parameter contains a string with a regular expression. The regular expression works on the granularity of input tuples. For example, rise+ matches one or more tuples that satisfy the rise predicate. The partitionBy parameter specifies one or more attributes of the input tuple as a key, and the pattern applies to tuples for each key separately and independently. The predicates parameter defines the symbols used in the regular expression—in this example, rise, drop, and deep. Each predicate is just a boolean expression written in SPL. Finally, the output clause assigns attributes of the output tuple when a match is detected. Both the predicates and the output assignments use aggregations (First, Last, Count, Max). Aggregations are computed from all the tuples in the match so far.
Note that this is a toolkit that provides no operators; only types and functions. Geospatial and spatiotemporal (including the time dimension) computing form a rich domain that requires specific expertise, such as knowledge of cartography (maps and map projections), geospatial geometry (shapes and locations and their representations), set theory (spatiotemporal relationships), interoperability standards and conventions, as well as the specifics of the application (location intelligence and location-based services for businesses, security and surveillance, geographic information systems, traffic patterns and route finding, etc.). This toolkit provides the beginnings of a set of capabilities that lets Streams provide the real-time component for emerging location-based applications.
A time series is a sequence of numerical data that represents the value of an object or multiple objects over time. For example, you can have time series that contain: the monthly electricity usage that is collected from smart meters; daily stock prices and trading volumes; ECG recordings; seismographs; or network performance records. Time series have a temporal order and this temporal order is the foundation of all time series analysis algorithms. A time series can be regular (sampled at regular intervals) or irregular (sampled at arbitrary times, for example when triggered by events). In addition, a time series can be univariate (tracking a single, scalar variable) or vector (containing a collection of scalar variables); it can even be expanding , if the number of variables increases over time. This toolkit contains 25 operators; too many to list them all here.
The Streams Data Mining Toolkit adds several operators that can analyze and score streaming data according to PMML-standard models developed separately—for example, with SPSS—based on historical data. The PMML support and scoring code is ported directly from the IBM InfoSphere Warehouse, ensuring consistency of results. Currently, the Streams Mining operators have no built-in provision for learning and letting the model evolve; they simply apply the model to each incoming tuple in turn, with no reference to any data that passed through before. They can, however, respond to model changes. As the warehouse-based analysis results in an updated model, simply overwrite the model file; the operator detects the update, reads in the new model, and applies it from then on to newly arriving data.
The IBM SPSS Analytics Toolkit for InfoSphere Streams enables the use of SPSS Modeler predictive analytics and SPSS Collaboration and Deployment Services model management features in Streams applications. The Analytics Toolkit consists of three Streams operators to implement Modeler analytics in Streams applications The SPSS Scoring operator enables the scoring against predictive models designed in SPSS Modeler in the flow of Streams applications. It works by passing an in-memory request to SPSS Modeler Solution Publisher which conducts scoring via a Modeler scoring stream and returns results in-memory back to the SPSS Scoring operator. The SPSS Repository operator, meanwhile, enables the use of SPSS Collaboration and Deployment Services model management functionality. Collaboration and Deployment Services is used to manage the lifecycle of SPSS predictive analytics assets. It provides automated evaluation of model effectiveness and automated refreshing of models when warranted. The SPSS Repository operator monitors the Collaboration and Deployment Services repository. When it receives notification that a scoring model is refreshed, it retrieves the updated model from the repository, prepares it and passes it to the SPSS Publish operator. The SPSS Publish operator in turn generates the executable images required to update the scoring model used in the Streams application. Upon publication of the new images, the new scoring model is automatically swapped into place without disruption to the processing underway in the Streams application.
The Financial Information eXchange (FIX) protocol is a messaging standard to support real-time electronic exchange of security (option and stock) transactions. The Fix protocol is documented at http://www.fixprotocol.org/ WebSphere Front Office for Financial Markets contains feed handlers for a large number of financial-market feeds. For a detailed description of the WebSphere Front Office suite, how to configure and use feed handlers and clients, refer to the WFO documentation at http://publib.boulder.ibm.com/infocenter/imds/v3r0/index.jsp. IBM WebSphere MQ Low Latency Messaging is a high-throughput, low-latency messaging transport. It is optimized for the very high-volume, low-latency requirements typical of financial markets firms and other industries where speed of data delivery is paramount. In addition to the mentioned adapters, there are some simulation operators, which generate simulated feeds—continuous streams of cyclically repeating data: EquityMarketTradefeed, EquityMarketQuoteFeed, and OrderExecutor. These operators support the included sample applications and are not intended for production use.
URLs: Passport Advantage : http://www.ibm.com/software/howtobuy/passportadvantage/index.html PartnerWorld : https://www.ibm.com/partnerworld/mem/valuepack/ Academic Initiative : https://www.ibm.com/developerworks/university/software/get_software.html XL (internal) Software Downloads : http://w3.ibm.com/software/xl/download/ticket.do Trials and Demos : http://www14.software.ibm.com/webapp/download/search.jsp?pn=InfoSphere+Streams NOTE: some of these links don’t open properly from PowerPoint Copy and paste into your browser to get around such problems. Passport Advantage notes: Passport Advantage covers nearly all IBM Software from Software Group (SWG), including WebSphere, IM, Lotus, Tivoli and Rational The more you buy (including maintenance), the greater the discounts The discounts apply to all the software you buy (including maintenance) Maintenance, which includes telephone support for usage and defects is included in new license prices Maintenance is 20% of the new license price
Everyone should have been introduced to the Information Management Acceleration Zone (IMAZ). Streams is under Big Data and Warehousing The Sales Kit is a bit stale, so favor the IMAZ. URLs: IM Acceleration Zone : https://w3-connections.ibm.com/wikis/home?lang=en_US#/wiki/Info%20Mgmt%20Client%20Technical%20Professional%20Resources%20Wiki/page/Streams Media Library : http://w3.tap.ibm.com/medialibrary/search?displayMediaSet=0&qt=infosphere%20streams&narrow_filter=all&m_oB=3&m_oD=1 Software Sellers Workplace : http://w3-103.ibm.com/software/xl/portal/content?synKey=Q836166N45949G23#overview PartnerWorld : https://www.ibm.com/partnerworld/wps/servlet/mem/ContentHandler/Q836166N45949G23/lc=en_US developerWorks : http://www.ibm.com/developerworks/wikis/display/streams/Home The Streams Exchange is a place for Streams application developers to share ideas, source code, design patterns, experience reports, rules of thumb, best practices, etc.