A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv) @PAPIs Connect — São Paulo 2017

News recommendation is particularly challenging: a large volume of new content is produced every day, and its value to users decays quickly. This demands models and infrastructure that can handle those nuances and serve a newly trained model about 100 times per day. This presentation gives a detailed overview of how the R&D team of Hearst's TV division combines Google BigQuery, a Kubernetes cluster, and TensorFlow to build a hybrid recommendation system that blends model-based matrix factorization, content recency, and content semantics through NLP.

A TensorFlow Recommending System for News
Fabricio Vargas Matos

Manhattan, NY
TV Stations
Local and National News

Article’s page: recommendations
for continuous scroll section
Recommended articles

Agenda
1. Recency and cold-start problem
2. Data acquisition
3. Matrix factorization
4. TensorFlow implementation
5. Hybrid Model: NLP and feature engineering
6. Hybrid Model: Hybrid matrix factorization
7. Conclusions

Cold-start problem
[Figure: 2×2 grid of existent/new items against existent/new users]

Cold-start solution
[Figure: 2×2 grid of existent/new items against existent/new users, annotated with two approaches]
• Not personalized: curated by editors + highly viewed
• Hybrid Matrix Factorization

Data Acquisition
• Sources: Google Analytics, Google BigQuery, CMS
• Page views with user’s time on page
• Contents: content corpus with title, body, timestamp, metadata (sections, tags, etc.)
• TFRecord/CSV files
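
As a rough illustration of the export step, the sketch below aggregates page-view events into one time-on-page value per user and article, then writes them as CSV text. The field names, the sample events, and the choice to sum repeated visits are assumptions made for this example; the slides do not specify the schema or the aggregation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical page-view events: (user_id, article_id, seconds_on_page).
# Field names and values are illustrative, not taken from the talk.
page_views = [
    ("u1", "a1", 30.0),
    ("u1", "a1", 45.0),
    ("u2", "a1", 120.0),
    ("u2", "a2", 10.0),
]


def aggregate_time_on_page(events):
    """Sum time on page per (user, article) pair (an assumed aggregation)."""
    totals = defaultdict(float)
    for user_id, article_id, seconds in events:
        totals[(user_id, article_id)] += seconds
    return dict(totals)


def to_csv(totals):
    """Serialize the aggregated interactions as CSV text, sorted for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["user_id", "article_id", "time_on_page"])
    for (user_id, article_id), seconds in sorted(totals.items()):
        writer.writerow([user_id, article_id, seconds])
    return buffer.getvalue()


interactions = aggregate_time_on_page(page_views)
csv_text = to_csv(interactions)
```

Each CSV row is then one observed cell of the "Users x Items" matrix.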

"Users x Items" Sparsity
Dataset               Sparsity
MovieLens (movies)    98.61%
Netflix (movies)      98.82%
TV Stations (news)    99.94%
Yahoo! KDD (music)    99.96%
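
Sparsity here is the share of user/item cells with no observed interaction. A minimal sketch of the calculation follows; the counts are made up to produce a value at the scale of the news dataset and are not the talk's actual figures.

```python
def sparsity(num_interactions, num_users, num_items):
    """Fraction of cells in the users x items matrix that are empty."""
    return 1.0 - num_interactions / (num_users * num_items)


# Hypothetical counts: 600 observed cells in a 1000 x 1000 matrix.
example_sparsity = sparsity(600, 1000, 1000)
print(f"{example_sparsity:.2%}")
```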

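Latent Factors Model
[Figure: the Users × Items matrix R is approximated by the product of a user latent-factor matrix U and an item latent-factor matrix V, with user bias and item bias terms; i indexes users, j indexes items]
R[i,j] ≈ U[i] x V[j]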
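
The latent-factor model can be sketched in plain NumPy as below. This is not the talk's TensorFlow implementation: the additive use of the user and item biases, the SGD update rule, the hyperparameters, and the toy ratings (time on page in minutes) are all assumptions chosen to keep the example small.

```python
import numpy as np


def train_mf(ratings, num_users, num_items, num_factors=2,
             lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit R[i, j] ~ U[i] . V[j] + user_bias[i] + item_bias[j] with SGD."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((num_users, num_factors))
    V = 0.1 * rng.standard_normal((num_items, num_factors))
    user_bias = np.zeros(num_users)
    item_bias = np.zeros(num_items)
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - (U[i] @ V[j] + user_bias[i] + item_bias[j])
            u_old = U[i].copy()  # keep the pre-update value for V's gradient
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
            user_bias[i] += lr * (err - reg * user_bias[i])
            item_bias[j] += lr * (err - reg * item_bias[j])
    return U, V, user_bias, item_bias


def predict(model, i, j):
    """Predicted value for user i and item j."""
    U, V, user_bias, item_bias = model
    return U[i] @ V[j] + user_bias[i] + item_bias[j]


# Hypothetical observed cells: (user index, item index, minutes on page).
ratings = [(0, 0, 2.0), (0, 1, 0.5), (1, 0, 1.5),
           (1, 2, 3.0), (2, 1, 1.0), (2, 2, 2.5)]
model = train_mf(ratings, num_users=3, num_items=3)

observed = np.array([r for _, _, r in ratings])
baseline_rmse = float(np.sqrt(np.mean((observed - observed.mean()) ** 2)))
trained_rmse = float(np.sqrt(np.mean(
    [(r - predict(model, i, j)) ** 2 for i, j, r in ratings])))
```

Only the observed cells contribute to the updates, which is what makes the approach workable on a matrix that is more than 99.9% empty.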

Initial Results
• Training time ≈ 15 min (Kubernetes cluster)
• TimeOnPage Prediction Error (RMSE) ≈ 125 sec
• Qualitative recommendation tests with chosen ‘personas’ revealed poor personalization
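
For reference, RMSE is the square root of the mean squared difference between predicted and observed time on page. The sketch below uses invented values, picked so the result lands on 125 seconds.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions and observations, in seconds on page.
example_rmse = rmse([100.0, 200.0], [225.0, 75.0])
```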

Natural Language Processing
• Concatenate content data (title, body, sections, tags, …)
• Remove stop words, symbols and HTML tags
• Train word2vec Neural Network
• Combine all word-vectors of each article into one (doc2vec)
Input: CMS articles. Output: doc2vec contents.
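
The cleaning and combining steps above might look like the following sketch. Training word2vec is skipped here: the word vectors are a tiny hand-written table, and averaging them into one document vector is an assumption, since the slide does not say how the word vectors are combined. The stop-word list and the sample article are also illustrative.

```python
import re

import numpy as np

STOP_WORDS = {"the", "a", "of", "and", "in", "to"}  # tiny illustrative list


def clean_tokens(title, body, tags):
    """Concatenate fields, strip HTML tags and symbols, drop stop words."""
    text = " ".join([title, body, " ".join(tags)])
    text = re.sub(r"<[^>]+>", " ", text)           # remove HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep ASCII letters only
    return [token for token in text.split() if token not in STOP_WORDS]


def document_vector(tokens, word_vectors, dim):
    """Average the vectors of known words (assumed combination rule)."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Hypothetical 2-dimensional word vectors standing in for a trained word2vec model.
word_vectors = {
    "team": np.array([1.0, 0.0]),
    "wins": np.array([0.0, 1.0]),
    "cup": np.array([1.0, 1.0]),
}

tokens = clean_tokens("Team wins the <b>cup</b>", "", ["sports"])
doc_vec = document_vector(tokens, word_vectors, dim=2)
```

Words missing from the vector table, such as "sports" here, are simply ignored when averaging.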

[Figure labels: Entertainment, National News, Health, Sports, Local News]

Features Engineering
• NLP (doc2vec)
• Items clustering (k-means)
• Embed items: similarity to each cluster centroid
• Embed users: viewed contents combined
Input: CMS articles. Output: k-dimension items/users embeddings (Google Cloud Storage).
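
A compact sketch of these feature-engineering steps, using NumPy only. Several choices are assumptions not stated on the slide: cosine similarity as the similarity measure, the mean of viewed item embeddings as the way a user's contents are "combined", a naive k-means initialization, and the toy document vectors. The talk later mentions 40 clusters; this example uses 2.

```python
import numpy as np


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = points[:k].copy()
    for _ in range(iterations):
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids


def similarity_to_centroids(doc_vectors, centroids):
    """Cosine similarity of every document to every centroid (assumed measure).

    Assumes no vector has zero norm.
    """
    docs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    cents = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return docs @ cents.T


def user_embedding(item_embeddings, viewed_item_ids):
    """Combine a user's viewed items by averaging (assumed combination rule)."""
    return item_embeddings[viewed_item_ids].mean(axis=0)


# Hypothetical document vectors for four articles.
doc_vectors = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]])
centroids = kmeans(doc_vectors, k=2)
item_embeddings = similarity_to_centroids(doc_vectors, centroids)  # items x k
user_vec = user_embedding(item_embeddings, [0, 2])                 # length k
```

Because users and items end up in the same k-dimensional space, a user who has only read articles near one centroid gets an embedding that points toward that cluster.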

Items Parallel coordinates: 40 features/clusters

Who are they?
Magenta contents (health) with high values for feature #1 (economy)?

Content/User Embeddings
+
Matrix Factorization

Matrix Factorization
[Figure: the Users × Items matrix R approximated by the product of latent-factor matrices U and V, with user bias and item bias terms; i indexes users, j indexes items]
R[i,j] ≈ U[i] x V[j]

Hybrid Matrix Factorization
• R ≈ U* x V*
where:
• U* = U [Users x KClusters] x A [KClusters x Latent_factors]
• V* = B [Latent_factors x KClusters] x V [KClusters x Items]
*Only A and B are variables to be trained. U and V are constants.
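
The hybrid factorization can be sketched as follows: the precomputed embeddings U and V stay fixed, and gradient descent updates only A and B. This is a NumPy stand-in for the TensorFlow model, under assumptions not given on the slide: a masked squared-error loss, full-batch gradient descent, no bias terms, and toy matrices with 3 users, 4 items and 2 clusters.

```python
import numpy as np


def init_params(num_clusters, num_factors, seed=0):
    """Small random start for the trainable matrices A and B."""
    rng = np.random.default_rng(seed)
    A = 0.3 * rng.standard_normal((num_clusters, num_factors))
    B = 0.3 * rng.standard_normal((num_factors, num_clusters))
    return A, B


def masked_rmse(R, mask, U, V, A, B):
    """RMSE of (U A)(B V) against R over observed cells only."""
    error = mask * ((U @ A) @ (B @ V) - R)
    return float(np.sqrt((error ** 2).sum() / mask.sum()))


def train_hybrid(R, mask, U, V, A, B, lr=0.05, epochs=2000):
    """Gradient descent on A and B; U and V are constants."""
    A, B = A.copy(), B.copy()
    n_obs = mask.sum()
    for _ in range(epochs):
        U_star = U @ A                         # users x latent factors
        V_star = B @ V                         # latent factors x items
        error = mask * (U_star @ V_star - R)   # zero on unobserved cells
        grad_A = U.T @ (error @ V_star.T) / n_obs
        grad_B = (U_star.T @ error) @ V.T / n_obs
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B


# Hypothetical fixed embeddings: users x clusters and clusters x items.
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
V = np.array([[1.0, 0.0, 0.8, 0.2], [0.0, 1.0, 0.2, 0.8]])
# Hypothetical time-on-page matrix (minutes) and observation mask.
R = np.array([[2.0, 0.5, 1.7, 0.8],
              [0.5, 2.0, 0.8, 1.7],
              [1.25, 1.25, 1.25, 1.25]])
mask = np.ones_like(R)
mask[0, 3] = mask[1, 2] = mask[2, 0] = 0.0

A0, B0 = init_params(num_clusters=2, num_factors=2)
A, B = train_hybrid(R, mask, U, V, A0, B0)
initial_rmse = masked_rmse(R, mask, U, V, A0, B0)
final_rmse = masked_rmse(R, mask, U, V, A, B)
```

One consequence of this design is that the number of trainable parameters depends only on the number of clusters and latent factors, not on the number of users or items, so a new article can be scored as soon as its cluster embedding is available.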

Results
• Training time ≈ 20 min (Kubernetes cluster)
• TimeOnPage Prediction Error (RMSE) ≈ 100 sec (20% better)
• Qualitative recommendation tests with chosen ‘personas’ revealed very good personalization
• R&D Project - Not yet publicly available

Let’s talk online
fabriciovargasmatos@
Fabricio Vargas Matos

Weitere ähnliche Inhalte

Was ist angesagt?

Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...

Ian Foster

Presented at ACS Boston 2015 at a Session on the growing impact of Open Science chaired by Andy Lang and Tony Williams dedicated to the work, memory and legacy of JC Bradley and the work we carry forward! One important goal of OpenTox is to support the development of an Open Standards-based predictive toxicology framework that provides a unified access to toxicological data and models. OpenTox supports the development of tools for the integration of data, for the generation and validation of in silico models for toxic effects, libraries for the development and integration of modelling algorithms, and scientifically sound validation and reporting routines. The OpenTox Application Programming Interface (API) is an important open standards development for software development purposes. It provides a specification against which development of global interoperable toxicology resources by the broader community can be carried out. The use of OpenTox API-compliant web services to communicate instructions between linked resources with URI addresses supports the use of a wide variety of commands to carry out operations such as data integration, algorithm use, model building and validation. The OpenTox Framework currently includes, with its APIs, services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, reporting, investigations, studies, assays, and authentication and authorisation, which may be combined into multiple applications satisfying a variety of different user needs. As OpenTox creates a semantic web for toxicology, it should be an ideal framework for incorporating toxicology data, ontology and modelling developments, thus supporting both a mechanistic framework for toxicology and best practices in statistical analysis and computational modelling. 
In this presentation I will review the recent OpenTox-based development of applications including the ToxBank data infrastructure supporting integrated analysis across biochemical, functional and omics datasets supporting the safety assessment goals of the SEURAT-1 program which aims to develop alternatives to animal testing. Finally, I will provide an overview of the working group activities of the newly formed OpenTox Association which aim to progress the development of open source, data, standards and tools in this area.

OpenTox - an open community and framework supporting predictive toxicology an...

Barry Hardy

What to do About FAIR… In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data, and address opportunities and real concerns. · Why making data Findable, Actionable, Interoperable and Reusable is important. Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.

Making Data FAIR (Findable, Accessible, Interoperable, Reusable)

Tom Plasterer

Open science, open-source, and open data: Collaboration as an emergent property?

Hilmar Lapp

What is a Data Commons and Why Should You Care?

Robert Grossman

Elephant in the Room: Scaling Storage for the HathiTrust Research Center

Robert H. McDonald

Access methods for analysing sensitive data (amased)

Jisc

Metadata Quality Assurance

Péter Király

BioPharma and the broader research community is faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge-generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.

Dataset Catalogs as a Foundation for FAIR* Data

Tom Plasterer

NeXO Web Poster for ISMB 2014 BioVis SIG

Keiichiro Ono

FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your and the world’s data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen. This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.

FAIR Data Knowledge Graphs

Tom Plasterer

Open Science Globally: Some Developments/Dr Simon Hodson

African Open Science Platform

With online publication and social media taking the main role in dissemination of news, and with the decline of traditional printed media, it has become necessary to devise ways to automatically extract meaningful information from the plethora of sources available and to make that information readily available to interested parties. In this paper we present a method of automated analysis of the underlying structure of online newspapers based on Q-analysis and modularity. We show how the combination of the two strategies allows for the identification of well defined news clusters that are free of noise (unrelated stories) and provide automated clustering of information on trending topics on news published online.

Identifying news clusters using Q-analysis and Modularity

David Sousa-Rodrigues

Business Rule Learning with Interactive Selection of Association Rules - Rule...

Stanislav Vojíř

SemanticWebApp

Adela Beres

Was ist angesagt? (15)

Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...

OpenTox - an open community and framework supporting predictive toxicology an...

Making Data FAIR (Findable, Accessible, Interoperable, Reusable)

Open science, open-source, and open data: Collaboration as an emergent property?

What is a Data Commons and Why Should You Care?

Elephant in the Room: Scaling Storage for the HathiTrust Research Center

Access methods for analysing sensitive data (amased)

Metadata Quality Assurance

Dataset Catalogs as a Foundation for FAIR* Data

NeXO Web Poster for ISMB 2014 BioVis SIG

FAIR Data Knowledge Graphs

Open Science Globally: Some Developments/Dr Simon Hodson

Identifying news clusters using Q-analysis and Modularity

Business Rule Learning with Interactive Selection of Association Rules - Rule...

SemanticWebApp

Ähnlich wie A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv) @PAPIs Connect — São Paulo 2017

Jon Bratseth (VP Architect) @ Verizon Media: The big data world has mature technologies for offline analysis and learning from data, but have lacked options for making data-driven decisions in real time. When it is sufficient to consider a single data point model servers such as TensorFlow serving can be used but in many cases you want to consider many data points to make decisions. This is a difficult engineering problem combining state, distributed algorithms and low latency, but solving it often makes it possible to create far superior solutions when applying machine learning. This talk will explain why this is a hard problem, show the advantages of solving it, and introduce the open source Vespa.ai platform which is used to implement such solutions in some of the largest scale problems in the world including the world's third largest ad serving system.

Big data serving: Processing and inference at scale in real time

Itai Yaffe

A general overview of the APACHE SAMOA platform for mining big data streams using machine learning algorithms running on distributed stream processing platforms such as Apache STORM, Apache Flink, Apache Samza and Apache Apex. Results are shown from experimentation with VHT, the Vertical Hoeffding Tree proposed in "VHT: Vertical Hoeffding Tree." N. Kourtellis, G. De Francisci Morales, A. Bifet, A. Mordupo. IEEE BigData 2016. Presentation in APACHE BIG DATA Europe 2015

SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)

Nicolas Kourtellis

Sistemas de Recomendação sem Enrolação

Gabriel Moreira

Evaluating Classification Algorithms Applied To Data Streams Esteban Donato

Esteban Donato

The global data sphere, consisting of machine data and human data, is growing exponentially reaching the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence – a newer variant of Machine Learning – bypasses the need to understand a system when modelling it; however, this convenience comes with extremely high energy consumption. The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles: 1. Findability of Information – when truth is hard to find, fake news rule 2. Von Neumann Gap – when processors cannot process faster, then we need more of them (energy) 3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time 4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control The current approach to overcoming these limitations is to use larger and larger data sets on more and more processing nodes for training. AI algorithms should be optimized for efficiency rather than precision. In this case, statistical modelling should be disqualified as a brute force approach for language applications. When replacing statistical modelling and arithmetic, set theory and geometry seem to be a much better choice as it allows the direct processing of words instead of their occurrence counts, which is exactly what the human brain does with language – using only 7 Watts!

AI-SDV 2021: Francisco Webber - Efficiency is the New Precision

Dr. Haxel Consult

Offline and stream processing of big data sets can be done with tools such as Hadoop, Spark, and Storm, but what if you need to process big data at the time a user is making a request? Vespa (http://www.vespa.ai) allows you to search, organize and evaluate machine-learned models from e.g TensorFlow over large, evolving data sets with latencies in the tens of milliseconds. Vespa is behind the recommendation, ad targeting, and search at Yahoo where it handles billions of daily queries over billions of documents.

Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath

Yahoo Developer Network

Filtering From the Firehose: Real Time Social Media Streaming

Cloud Elements

Basic Sentiment Analysis using Hive

Qubole

eScience: A Transformed Scientific Method

Duncan Hull

Offline and stream processing of big data sets can be done with tools such as Hadoop, Spark, and Storm, but what if you need to process big data at the time a user is making a request? This presentation introduces Vespa (http://vespa.ai) – the open source big data serving engine. Vespa allows you to search, organize, and evaluate machine-learned models from e.g TensorFlow over large, evolving data sets with latencies in the tens of milliseconds. Vespa is behind the recommendation, ad targeting, and search at Yahoo where it handles billions of daily queries over billions of documents and was recently open sourced at http://vespa.ai.

Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...

Yahoo Developer Network

tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...

David Peyruc

Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...

AIST

Knowledge graphs and graph-based data in general are becoming increasingly important for addressing various data management challenges in industries such as financial services, life sciences, healthcare or energy. At the core of this challenge is the comprehensive management of graph-based data, ranging from taxonomy to ontology management to the administration of comprehensive data graphs along with a defined governance framework. Various data sources are integrated and linked (semi) automatically using NLP and machine learning algorithms. Tools for securing high data quality and consistency are an integral part of such a platform. PoolParty 7.0 can now handle a full range of enterprise data management tasks. Based on agile data integration, machine learning and text mining, or ontology-based data analysis, applications are developed that allow knowledge workers, marketers, analysts or researchers a comprehensive and in-depth view of previously unlinked data assets. At the heart of the new release is the PoolParty GraphEditor, which complements the Taxonomy, Thesaurus, and Ontology Manager components that have been around for some time. All in all, data engineers and subject matter experts can now administrate and analyze enterprise-wide and heterogeneous data stocks with comfortable means, or link them with the help of artificial intelligence.

Leveraging Knowledge Graphs in your Enterprise Knowledge Management System

Semantic Web Company

Final Next Generation Content Management

Scott Abel

Microsoft Dryad

Colin Clark

Ontopia / Liferay integration

Matthias Fischer

For this upcoming meetup Juan Valencia, Principal Engineer at ShareThis, will be presenting on their real-world use of Apache Cassandra for high throughput and mission critical applications. This meetup will cover how to set up your projects successfully by having a good data model, running Cassandra, and using the Hector Java client. We will have a Q&A session at the end of Juan's presentation, to ensure everyone's questions are answered. Hope you can make it! What You Will Learn at this Meetup: • Real-World Use Case on ShareThis + Apache Cassandra • Data Modeling with Apache Cassandra • Using the Java Hector Client Library with Cassandra Abstract Juan Valencia, Principal Engineer at ShareThis, will be presenting on the use of Cassandra for high throughput applications. ShareThis has been running on Cassandra since version 0.6 and currently runs 4 Cassandra clusters, powering batch analytics, real-time analytics, a counter service, and a data lookup service.

Real-World Cassandra at ShareThis

Juan Valencia

Learn Big Data & Hadoop

Edureka!

Data Lake, Business Intelligence, Enterprise Data Warehouse, Big Data Pipeline, Online Machine Learning, Lambda Architecture, Streaming, Spark, Kafka, Storm, Flink, Hadoop, Mesos and SMACK stack are some of the things you hear about when you want to dive into building a data pipeline. The Big Data Landscape cannot fit on a single screen as seen in the presentation. This is in addition to all the Big Data & Machine Learning offerings AWS has been introducing over the past few years which address many pain points highlighted by the various communities and help you get up and running faster. The objective of this talk is to provide the audience with a framework which helps them define their pipeline problems, isolate components and pick the right tools for the right job. We will talk about: 1. A consistent definition of BIG in big data 2. The lineage of fundamental tools in the ecosystem 3. First principles of a big data pipeline based on the lambda (not lambda functions) and kappa architectures 4. Distinguishing between big data and online machine learning pipelines 5. Technology choices based on first principles, open source solutions and AWS offerings 6. Demo: Serverless, Managed Big Data Pipeline and real-time dashboard on AWS (orchestrated via Terraform) Presented at: https://www.meetup.com/Vancouver-Amazon-Web-Services-User-Group/events/245946651/

Big Data & Machine Learning Pipelines: A Tale of Lambdas, Kappas and Pancakes

Osama Khan

Defensa.V11

promanas

Ähnlich wie A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv) @PAPIs Connect — São Paulo 2017 (20)

Big data serving: Processing and inference at scale in real time

SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)

Sistemas de Recomendação sem Enrolação

Evaluating Classification Algorithms Applied To Data Streams Esteban Donato

AI-SDV 2021: Francisco Webber - Efficiency is the New Precision

Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath

Filtering From the Firehose: Real Time Social Media Streaming

Basic Sentiment Analysis using Hive

eScience: A Transformed Scientific Method

Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...

tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...

Konstantin Vorontsov - BigARTM: Open Source Library for Regularized Multimoda...

Leveraging Knowledge Graphs in your Enterprise Knowledge Management System

Final Next Generation Content Management

Microsoft Dryad

Ontopia / Liferay integration

Real-World Cassandra at ShareThis

Learn Big Data & Hadoop

Big Data & Machine Learning Pipelines: A Tale of Lambdas, Kappas and Pancakes

Defensa.V11

Mehr von PAPIs.io

The daily job of a Data Scientist ranges from a variety of tasks: improving models performance or dealing with framework structure implementations. Machine learning as a service, a hot topic in the field, implies thinking about architecture to allow constant improvements in performance for our products. This presentation shows one architecture design using RESTful resources, document oriented databases and pre-trained pipelines to achieve real-time predictions of time series with high availability, scalability and freedom to Data Scientists work directly on improving the accuracy rate of our products. We fine tunned to work on time series forecasting which is a very challenging field that still needs better solutions in terms of innovative modeling. During the presentation will be shown how these decisions keep our Data Scientists focused on working with real data and thinking about improvements that can reach a large volume of time series instead of singular and localized actions.

Shortening the time from analysis to deployment with ml as-a-service — Luiz A...

PAPIs.io

Feature engineering is one of the most important, yet elusive, skills to master if you want to be a good data scientist. Machine learning competitions are hardly ever won with strong modeling techniques alone -- it is the combination of creative feature engineering and powerful modeling techniques that makes the difference. This tutorial will give the audience practical tips and tricks to improve the performance of machine learning algorithms. We will broadly look at feature engineering for applied machine learning, touching on subjects like: categorical vs. numerical variables, data cleaning, feature extraction, transformations, and imputation.

Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017

PAPIs.io

For online businesses, recommender systems are paramount. There is an increasing need to take into account all the user information to tailor the best product offer, tailored to each new user. Part of that information is the content that the user actually sees: the visuals of the products. When it comes to products like luxury hotels, pictures of the room, the building or even the nearby beach can significantly impact users’ decision. In this talk, we will describe how we improved an online vacation retailer recommender system by using the information in images. We’ll explain how to leverage open data and pre-trained deep learning models to derive information on user taste. We will use a transfer learning approach that enables companies to use state of the art machine learning methods without needing deep learning expertise.

Extracting information from images using deep learning and transfer learning ...

PAPIs.io

Graphs are used to map relations on unstructured data. Companies’ data are most from database and mined using traditional data mining approach. However, model relational data as a graph can reveal useful insights and discovery relation among data that is ignored by traditional data mining techniques. In this work we used graphs to map physician relations using claim data as a proxy and this approach reveal interesting insights from health insurance company.

Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...

PAPIs.io

Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...

PAPIs.io

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...

PAPIs.io

In times of huge amounts of heterogeneous data available, processing and extracting knowledge requires more and more efforts on building complex software architectures. In this context, Apache Spark provides a powerful and efficient approach for large-scale data processing. This talk will briefly introduce a powerful machine learning library (MLlib) along with a general overview of the Spark framework, describing how to launch applications within a cluster. In this way, a demo will show how to simulate a Spark cluster in a local machine using images available on a Docker Hub public repository. In the end, another demo will show how to save time using unit tests for validating jobs before running them in a cluster.

Building machine learning applications locally with Spark — Joel Pinho Lucas ...

PAPIs.io

Battery life is critical for smart devices, but optimizing it requires cooperation from the entire software ecosystem. Wasteful software affects user perception about devices’ battery quality. Therefore, a large team within a producer of those smart devices is focused on identifying and correcting energy consumption bugs. Since the software ecosystem grows fast, that team faces a lot of suspect issues, from which only a small fraction turns out to be genuine. Our project aims to streamline energy-related bug processing in devices of the company and its partners, by automatically identifying anomalous behaviors related to battery drain using data mining and machine learning.

Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...

PAPIs.io

Machine learning as a service (MLaS) is imperative to the success of many companies as many internal teams and organizations need to gain business intelligence from big data. Building a scalable MLaS in a very challenging problem. In this paper, we present the scalable MLaS we built for a company that operates globally. We focus on several scalability challenges and our technical solutions. Video at https://www.youtube.com/watch?v=MpnszJ_3Ong Couldn't attend PAPIs '16? Get access to the other presentations' slides and videos at https://gumroad.com/products/fehon/

Scaling machine learning as a service at Uber — Li Erran Li at #papis2016

PAPIs.io

This talk will offer answers to the following questions: What is data-driven decision making? What is AI? What is Business Intelligence? Why are these concepts important? What are the biggest challenges and opportunities? Daniel is the CEO of Satalia that provides AI inspired solutions to solve industries hardest problems. He’s the co-founder of the ASI that transitions scientists into data scientists. Daniel has a MSci and EngD in AI from UCL, and is Director of UCL’s Business Analytics MSc; applying AI to solve business/social problems. Daniel has many Advisory and Executive positions, holds an international Kauffman Global Entrepreneur Scholarship and actively promotes innovation across the globe.

Real-world applications of AI - Daniel Hulme @ PAPIs Connect

PAPIs.io

Possibly the most important lesson we have learned after 60 years of AI research is that what seemed to be very difficult to achieve, such as accurate medical diagnosis to playing chess at the level of a Grand Master, turned out to be relatively easy whereas what seemed easy, such as visual object recognition or deep language understanding, turned out to be extremely difficult. In my talk I will try to explain the reasons for this apparent contradiction by briefly reviewing the past and present of AI and projecting it into the near future. Ramon Lopez de Mantaras is Research Professor of the Spanish National Research Council (CSIC) and Director of the Artificial Intelligence Research Institute of the CSIC. Technical Engineer EE (Electrical Engineering) from the Technical Engineering School of Mondragón (Spain) in 1973. Master of Sciences in Automatic Control from the University of Toulouse III (France) in 1974, Ph.D. in Physics from the University of Toulouse III (France), in 1977, with a thesis in Robotics (done at LAAS, CNRS). Master of Science in Engineering (ComputerScience) from the University of California at Berkeley (USA) in 1979. Ph.D. in Computer Science, from the Technical University of Catalonia, Barcelona (Spain) in 1981.

Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...

PAPIs.io

Everybody uses price promotions in retail. However, individual pricing is seldom used, particularly in offline retail. Marketing literature has been advocating the use of individual price discrimination for decades. Furthermore, product recommendations, ever-present in e-commerce, are also not often found in offline retail. We show the machine learning driven system behind a new promotion channel that enables retailers and manufacturers alike to target individual customers in offline retail. Lessons learned, technologies used, and machine learning approaches driving our system will be shown. Daniel Guhl has a background in economics & marketing, and got interested in data modeling during his Ph.D.. Currently, he is working as a data scientist at a Berlin based Start-up and is pursuing a postdoc at Humboldt University. He enjoys learning everyday and focuses on solving real world problems.

Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...

PAPIs.io

Deep Learning (DL) is becoming a big tsunami in the Machine Learning community. This talk aims at introducing DL, its motivation and main techniques. However, part of this talk is also devoted to demystify DL. What are the main advantages but also the main drawbacks of DL?. And what are the key issues that the practitioners have to consider? Roberto Paredes is an Associate Professor at Departamento de Sistemas Informáticos y Computación DSIC of the Universidad Poliécnica de Valencia UPV. He belongs to the Pattern Recognition and Human Language Technologies Research Centre PRHLT. Roberto Paredes is the Director of the PRHLT and the President of the Spanish AERFAI Association. His main research interests are around the statistical learning, machine learning and more recently neural networks and deep learning.

Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect

PAPIs.io

The best services have one thing in common: a superb customer experience. Banking services are no exception to this rule, and indeed the quest for an effortless, well informed, and personalized customer experience is one of the main goals of today's innovation in digital banking services. According to what Maslow has described in his "pyramid of needs", customers are seeking a more intimate and meaningful experience where banking services can actively assist the customer in performing and managing their financial life. Predictive APIs have a fundamental role in all this, as they enable a new set of customer journeys such as automatic categorization of transactions, detecting and alerting recurrent payments, pre-approving credit requests or provide better tools to fight fraud without limiting legitimate customer transactions. In this talk, I will focus on how to provide better banking services by using predictive APIs. I will describe the path on how to get there and the challenges of implementing predictive APIs in a strictly audited and regulated domain such as banking. Finally, I will briefly introduce a number of data science techniques to implement those customer journeys and describe how big/fast data engineering can be used to realize predictive data pipelines. Natalino is currently Enterprise Data Architect at ING in the Netherlands, where leads the strategy, definition, design and implementation of big/fast data solutions for data-driven applications, for personalized marketing, predictive analytics, and fraud/security management. All-round Software Architect, Data Technologist, Innovator, with 15+ years experience in research, development and management of distributed architectures and scalable services and applications.

Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect

PAPIs.io

Fintech startups are taking business away from traditional institutions like banks, exchanges, and brokerages. One of the reasons that these startups are able to compete with $30B+ behemoths like Credit Suisse and Goldman Sachs is their advanced decision-making capabilities. By leveraging new data sources and better predictive analytics, companies like Ferratum Bank can make more accurate decisions in a fraction of the time. This talk will cover: types of decisions you can automate; challenges in building predictive financial apps; and first-hand, real-world examples. Greg Lamp is the co-founder and CTO of Yhat. In this role, Greg leads development of Yhat's core products and infrastructure and is the principal architect of the company's cloud and on-premise enterprise software applications. Greg was previously a product manager at OnDeck, a fintech startup in New York, and before that an analyst at comScore. Greg is a graduate of the University of Virginia.

Microdecision making in financial services - Greg Lamp @ PAPIs Connect

PAPIs.io

What is the future we want to create, and what can we do, starting today, to actively shape that future with general AI? This talk outlines a vision for the future of humankind once AI reaches human or superhuman levels, and leads the audience through the steps one research group is taking to get there. From the economics of smart robots and job replacement, to bionic humans exploring the universe through space travel, the talk offers a window into the work of 30 researchers focused on AI development and safety, and explains what attendees can do themselves to help make that future happen. JoEllen is the AI Safety Ambassador and Head of PR for GoodAI, a Prague-based general AI research and development company. A high school teacher by trade, she has bachelor’s degrees in English and Philosophy from Seattle University, a master’s degree in Transatlantic Studies from Charles University in Prague, and is the recipient of a Fulbright grant. JoEllen is particularly interested in how AI will affect international government and political relations.

Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...

PAPIs.io

Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. In this talk we'll show how to use an AWS Spark cluster to train a model quickly from a laptop at very little cost (around 10€). Vincent Van Steenbergen is a freelance (big) data engineer who works on a range of international projects, implementing systems able to handle terabytes of data, usually involving Spark, Scala, Kafka, Hadoop and Cassandra. His main interest right now is applying these techniques to solve machine learning problems. Vincent was previously a technical architect at Property.Works, a real estate startup in London, and before that an R&D engineer at IDAaaS.

Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...

PAPIs.io

Shopping, or as the people on the other side of the counter call it, retail, has become the number one breeding ground for predictive applications in the enterprise. What started as simple recommendation engines has evolved into a complex and powerful ecosystem of predictive applications that affect core processes such as pricing, replenishment and staff planning. In this talk, Ulrich Kerzel will share impact and experiences from building and operating predictive applications for large retailers, and explain why the future of retail is as much a science as an art. Dr. Ulrich Kerzel is a senior data scientist at Blue Yonder and a renowned scientist with research experience at the University of Cambridge and CERN. Ulrich Kerzel earned his PhD under Professor Dr Feindt at the US Fermi National Laboratory and at that time made a considerable contribution to the core technology of NeuroBayes. After his PhD, he went to the University of Cambridge, where he was a Senior Research Fellow at Magdalene College. His research work focused on complex statistical analyses to understand the origin of matter and antimatter using data from the LHCb experiment at the Large Hadron Collider at CERN, the world’s biggest research institute for particle physics. He continued this work as a Research Fellow at CERN before he came to Blue Yonder as a senior data scientist.

How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect

PAPIs.io

We live in a world of data, of big data. A big portion of this data has been generated by humans, and particularly through their mobile phones. In fact, there are almost as many mobile phones in the world as humans. The mobile phone is the piece of technology with the highest level of adoption in human history. We carry them with us all through the day (and night, in many cases), leaving digital traces of our physical interactions. Mobile phones have become large-scale sensors of human activity and also the most personal devices. In my talk, I will present some of the work that we are doing at Telefonica Research in the area of human behavior understanding from data captured with mobile phones, and particularly our work in the area of Big Data for Social Good. I will highlight opportunities but also challenges that we would need to address in order to truly leverage this opportunity. Nuria Oliver is a computer scientist and Scientific Director at Telefónica. She holds a Ph.D. from the Media Lab at MIT. She is one of the most cited female computer scientists in Spain, with her research having been cited by more than 8900 publications. She is well known for her work in computational models of human behavior, human-computer interaction, intelligent user interfaces, mobile computing and big data for social good.

The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...

PAPIs.io

ML services are quickly becoming a commodity, and they will be taken for granted by developers and computer users alike in the near future. The building blocks for ML as a ubiquitous service are already in place, almost always in the form of remote APIs that provide a first level of abstraction over ML problem-solving and, especially, obviate scalability and resource allocation issues. But that's not enough: those building blocks still leak implementation details inessential to the application developer who needs to provide domain-specific solutions. We need to ascend a couple of rungs in the abstraction ladder and provide domain-specific languages to describe ML solutions without nitty-gritty details unrelated to the problem at hand, offering non-experts the possibility of automating their ML solutions. In this talk, we'll discuss our experience designing and developing BigML's data wrangling and ML workflow DSLs, Flatline and WhizzML, and how they generalize to similar ML services and APIs. Jose A. Ortega Ruiz is part of the founding team of BigML, a little startup trying to apply machine learning and other AI techniques to big data, and make them accessible to non-specialists. He was hacking for Oblong from 2008 to early 2011. Before that, he worked for Google (from July 2007). From June 2005 to May 2007, he worked on embedded software development for the scientific payload of LISA Pathfinder. He was a theoretical physicist in a previous life, and wrote a Ph.D. thesis on gravitational wave detectors. He also got a bachelor’s degree in computer science. Between 2003 and 2005, he taught courses on programming and computer networks at the Universitat Autonoma of Barcelona, where he was part of the mobile agents research group.

Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...

PAPIs.io

More from PAPIs.io (20)