This is the story of the ten-year journey of a high-load data ingestion pipeline that handles 500 million posts daily, from choreography-based design to fully centralized orchestration, and the valuable lessons learned along the way.
14. TPL Dataflow
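// Requires: using System; using System.IO; using System.Net; using System.Threading.Tasks.Dataflow;
var fetch = new TransformBlock<string, (string url, byte[] image)>(async url =>
{
    using (var web = new WebClient())
        return (url, await web.DownloadDataTaskAsync(url));
});
var save = new ActionBlock<(string url, byte[] image)>(async data =>
{
    (string url, byte[] image) = data;
    // filePath is not defined on the slide; deriving it from the URL is an assumption
    var filePath = Path.GetFileName(new Uri(url).LocalPath);
    await File.WriteAllBytesAsync(filePath, image);
});
fetch.LinkTo(save); // links the output of the TransformBlock to the ActionBlock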
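The two linked blocks still need input and a way to signal that the work is done. The following is a minimal sketch of that lifecycle, not the pipeline from the talk: the class name `PipelineDemo`, the method `RunAsync`, and the fake download step (which avoids network access) are assumptions made for illustration.

```csharp
using System.Collections.Concurrent;
using System.Text;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    // Hypothetical helper: runs a fetch -> save pipeline over the given URLs
    // and returns one "url:byteCount" entry per processed item.
    public static async Task<ConcurrentBag<string>> RunAsync(string[] urls)
    {
        var saved = new ConcurrentBag<string>();

        var fetch = new TransformBlock<string, (string url, byte[] image)>(async url =>
        {
            await Task.Yield(); // placeholder for real asynchronous I/O
            return (url, Encoding.UTF8.GetBytes(url)); // fake payload, no download
        });

        var save = new ActionBlock<(string url, byte[] image)>(data =>
        {
            saved.Add($"{data.url}:{data.image.Length}");
        });

        // PropagateCompletion lets completion flow from fetch to save.
        fetch.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in urls)
            fetch.Post(url);

        fetch.Complete();      // signal that no more input will arrive
        await save.Completion; // wait until every item has been handled
        return saved;
    }
}
```

Without `PropagateCompletion = true`, awaiting `save.Completion` would never finish, because `LinkTo` does not propagate completion by default.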
16. evolution conundrum
extend
o validate
o normalize
o detect sentiment
o detect language
o match
o delay
o send to client
vs
split (Microservices, anyone?)
o validate
o normalize
o detect sentiment
o detect language
o match
o delay
o send to client
17. disintegration reasons
o service functionality
o code volatility
o scalability & throughput
o fault tolerance
o data security
Architecture, The Hard Parts by Neal Ford (https://www.youtube.com/@thoughtworks)
36. reintegration reasons
o database transactions
o structural (data) dependencies
o workflow and choreography
Architecture, The Hard Parts by Neal Ford (https://www.youtube.com/@thoughtworks)
45. complexity distribution
o validate
o normalize
o detect sentiment
o detect language
…
o match
o delay
o send to client
o detect objects
o detect OCR
o send to client
o match keywords
o send to client
o extract
o highlight
o send to client
46. big ball of …
o validate
o normalize
o detect sentiment
o detect language
…
o match
o delay
o send to client
o detect objects
o detect OCR
o send to client
o match keywords
o send to client
o extract
o highlight
o send to client
[isFresh]
[hasImage]
[noOCR]
[highPriority]
[isFromChannel]
[hasText]
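One way to keep guards like these from scattering across services is to evaluate them in a single place, which is the idea behind centralized orchestration. The sketch below is illustrative only: the `Post` record, the `Orchestrator.Plan` method, and the mapping of each guard to particular steps are assumptions, since the slide does not say which guard controls which step.

```csharp
using System.Collections.Generic;

// Hypothetical post model carrying a few of the guard flags from the slide.
public sealed record Post(bool IsFresh, bool HasImage, bool HasText);

public static class Orchestrator
{
    // Builds the list of steps for one post. All routing decisions live here,
    // so adding a guard does not require touching the individual steps.
    public static List<string> Plan(Post post)
    {
        var steps = new List<string> { "validate", "normalize" };

        if (post.HasText) // assumed mapping: text analysis only for posts with text
        {
            steps.Add("detect language");
            steps.Add("detect sentiment");
        }

        if (post.HasImage) // assumed mapping: image analysis only for posts with images
            steps.Add("detect objects");

        if (post.IsFresh) // assumed mapping: only fresh posts are matched
            steps.Add("match");

        steps.Add("send to client");
        return steps;
    }
}
```

A call such as `Orchestrator.Plan(new Post(IsFresh: true, HasImage: false, HasText: true))` yields the step names in execution order, and a real orchestrator would dispatch each name to the corresponding worker.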
56. Confirmation Bias
When forming analogies, this bias may lead individuals to selectively focus on similarities between two things while ignoring differences, even if the analogy is not valid.
the end