This document discusses how big data and networked knowledge are changing how information is represented and explored. It outlines some of the complexities of knowledge objects that have multiple representations and can be connected in various ways. Some methods discussed for exploring this complex data ecosystem include manual browsing, automated spidering, collection plugins, conventional big data techniques, and using linked data. Integration challenges are addressed when data comes from different silos in various structures and formats. Keeping up with standards, learning visualization skills, and understanding data from different user perspectives are recommended for working with complex networked knowledge.
15. Complex Knowledge
Objects
- Have multiple representations & ways of being consumed
- Can be a link in a chain, node in a graph, or ecosystem of
knowledge.
- Allow different perspectives or ways to ask questions of.
(none of these are true of physical books)
16. Complex Knowledge
Objects
Data Examples:
•
•
•
•
•
JSON, XML, etc. esp. from a URL that allows it to be updated
Visualizations, sonifications, etc. (mind-maps, interactives)
Geospatial data, layered, constrained by area
APIs
Linked Data, integrated with many other resources
23. Methods of Exploration
In this diverse ecosystem, there is no one way of exploring a
topic.
- Manual Browsing
- Automated Spidering (e.g. Berkman / Media Cloud)
- Collection / Trawling (e.g. Browser Plugins)
- Conventional Big Data (e.g. Hadoop, Map/Reduce)
- Using Linked Data to branch out through related concepts
- Algorithmic Data Processing (e.g. Topic Modeling)
24.
25.
26.
27.
28.
29.
30. Problems of Completeness
- When do you know that you have enough information?
- What kind of compromises are made when information is
more massive than anyone can consume?
31. Problems of Integration
When data comes from many different silos, in many
different structures and formats, how do you bring all of this
knowledge together?
- One solution is RDF, which provides a common data
generic data model. Collaborative ontology development
can allow communities to work together.
- Open standards.
- Build tools and services that provide easy access to the
underlying data.
32. How To Get A Grip
- Keep abreast of W3C developments and other standards
bodies.
- Don’t focus too much on single technologies. They will
shift quickly.
- Learn at least one data visualization technology.
- Remember to frame questions of data in more than one
way.
- Ask your own questions of the data yourself. Understand
it from the point of the user.
Hinweis der Redaktion
We have crossed technological thresholds in the past, where the scale of operational parameters exceeded human conceptualization. 10-60rpms.
1000-6000rpms
40,000rpms
We are crossing a similar threshold now with big data. Mathematical complexity in 3 dimensions is conceivable.
Anything beyond that is basically impossible to visualize or conceptualize, even though the Euclidean geometry scales and continues to work for things like the Vector Space Model for representing high-dimensionality text for data mining.
The scale of network complexity has crossed this threshold as well. (2007)
Linked Open Data Cloud, 2010. Has scaled enormously since.
This all leaves us in an environment where the knowledge we need to educate ourselves about a given topic are not static, or found in a stack of books, but live, connected, spanning the borders of conventional containers.
We now have to contend with new forms of knowledge, new ways of discovering it, and new challenges to making use of it.Amazon deleted copies of Farrentheit 451 off of users’ Kindles. We are no longer in a world where knowledge is a simple, static asset. If authors can change their writing, the government can change their data.
Unsworth’s Scholarly Primitives are worth considering when planning for high-level functionality in a world where we rely on abstractions in the form of tools and algorithms to represent things in a lower-dimensionality. What are the tasks we need to provide access to on top of great scale.
Knowledge is being represented in vastly more complex structures than it used to be. Previously, books, tables, encyclopedias, and simple web pages were the standard. Now we have a heterogeneous mix of knowledge representation. These go beyond the page or document. They have the following properties.
They range from the concrete to the abstract.
Enigma.io, connecting disparate but linkable gov’t data sets.
WikiData as a complex knowledge object / ecosystem.
Mind maps (even in planning for this talk) are complex knowledge objects.
Browsing History Data; Civic Media; Catherine D’Ignazio doing work to learn about where you pay attention to and where you don’t. Browsing trackers are not independent, but networked.http://skyeome.net/wordpress/?paged=2
Linked Data Wanderer: Archie Bunker -> Singing -> List of Sovereign States -> Republican Party -> Imigrants to Cuba -> Human Voice -> Homeward Bound (the album)
Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
Highly linked.Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
For an Open Access collection to be useful to a wide population, it needs to be exposed for mining.I’m building an API to allow anyone to mine this information, instead of being limited to the representation of it on a web page or in a PDF. @sandsfish
If you primarily work with raw data, or are a librarian, learn even the most basic methods of visualization. This will give you a vocabulary with which to interact with data owners or patrons, and provide a better way of conceptualizing what questions are possible with data.Shifting your perspective from geographic to temporal, or from a single answer to a range of possibilities will help expand the knowledge you can acquire about a topic. This is one of the benefits of knowledge being complex.