Rediscover the hidden Facebook semantic search engine, reimagine open-source intelligence
1. Rediscovering the hidden Facebook semantic
search engine and reimagining open-source
intelligence
unchained Graph Search
akos.bardoczi.ch@ieee.org
2. Some important things about OSINT
• In most cases, the most efficient technique is
little known or hidden, but not (too) difficult to
use!
• The misunderstood deep web: the popular claim
about the size of the deep web is based on
16-year-old research…
• The misunderstood deep web #2: it usually
cannot provide up-to-date, relevant information
3. About Facebook Graph Search
• Announced by Facebook in January 2013
• Facebook almost immediately killed the Advanced
search box due to privacy concerns – but the
search options remain available
• The decline of semantic search – major machine
learning failures, for example because users
submitted too few complex search queries
• After 4 years, still available only in US English
4. Example
• „Budapest University of Technology and
Economics students who are Budapest, Hungary
residents and like Shakira” (sic!) – this will
generate a simple keyword search without
relevant results
• The URL
https://www.facebook.com/search/106146106082559/students/106502519386806/residents/5027904559/likers/intersect
will generate a smart, semantic search query
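To make the structure of the example URL visible, here is a minimal sketch that splits its path into (entity_id, keyword) pairs. The function name `split_graph_search_url` is hypothetical, and the pairing rule (every ID is followed by one keyword) is an assumption read off this single example, not a documented Facebook format.

```python
from urllib.parse import urlparse


def split_graph_search_url(url):
    """Split a Graph Search style URL into (entity_id, keyword) pairs.

    Assumption: after an optional leading 'search' segment, the path
    alternates entity_id/keyword and may end with 'intersect'.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0] == "search":
        segments = segments[1:]
    has_intersect = bool(segments) and segments[-1] == "intersect"
    if has_intersect:
        segments = segments[:-1]
    # Pair every even-positioned segment (ID) with the segment after it (keyword).
    pairs = list(zip(segments[0::2], segments[1::2]))
    return pairs, has_intersect


url = ("https://www.facebook.com/search/106146106082559/students/"
       "106502519386806/residents/5027904559/likers/intersect")
pairs, has_intersect = split_graph_search_url(url)
for entity_id, keyword in pairs:
    print(entity_id, keyword)
```

Read in order, the three pairs presumably correspond to the university (students), the city (residents), and the artist page (likers) from the natural-language query.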
5. The universal scheme of GS’s URI structure
• https://facebook.com/(search)/(str)[n]/string_term[n]/entity_id[m]/keyword[m]/(intersect)
• the „search” segment is optional in some
cases
• the „str” [optional] indicates simple keyword
terms and must be placed before other terms
• the „intersect” is mandatory in complex queries
6. More about the scheme of GS’s URI
• the „entity_id” identifies an entity, e.g. a
name, a place, a religious view, or a spoken
language (see below); you can find it in the
client-side code
• There is no limit on query complexity or
length
• The „keyword” indicates the role of the entity,
which matters: e.g. a university can appear as a
physical location, as a school, or as a workplace
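The scheme can be sketched as a small URL builder. This is an illustration under stated assumptions, not a verified client: the helper `build_graph_search_url` is hypothetical, the „str” part is left out, and "complex query" is taken to mean "more than one entity_id/keyword clause", which is only my reading of the rule that „intersect” is mandatory in complex queries.

```python
def build_graph_search_url(clauses, base="https://www.facebook.com"):
    """Assemble a URL following the entity_id/keyword scheme from the slides.

    clauses: list of (entity_id, keyword) tuples, in the order they
             should appear in the path.
    Assumption: 'intersect' is appended whenever there is more than one clause.
    """
    if not clauses:
        raise ValueError("at least one (entity_id, keyword) clause is required")
    segments = ["search"]
    for entity_id, keyword in clauses:
        segments.append(str(entity_id))
        segments.append(keyword)
    if len(clauses) > 1:
        segments.append("intersect")
    return base + "/" + "/".join(segments)


# The three clauses of the example from slide 4.
example = build_graph_search_url([
    ("106146106082559", "students"),
    ("106502519386806", "residents"),
    ("5027904559", "likers"),
])
print(example)

# A single clause needs no 'intersect' under the assumption above.
single = build_graph_search_url([("5027904559", "likers")])
print(single)
```

Keeping the clauses as an ordered list rather than a dictionary is deliberate: the same entity can appear with different keywords, and the order of the clauses may matter.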
7. Some important things
• The queries work only with the US English
Facebook interface, but:
• the URI may contain any characters after the
„str” part, e.g. Москва or مطار دبي الدولي
• The „word order of the sentence” matters in
most cases
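Non-ASCII terms after „str” are normally percent-encoded as UTF-8 by the browser when the request is sent. A minimal sketch of that encoding with Python's standard library follows; the helper name `str_segment` is hypothetical, and the "str/<term>" layout is taken from the scheme on slide 5 rather than from any verified endpoint.

```python
from urllib.parse import quote, unquote


def str_segment(term):
    """Return the 'str/<term>' path part with the term percent-encoded as UTF-8."""
    # safe="" also encodes '/', so a term can never add extra path segments.
    return "str/" + quote(term, safe="")


encoded = str_segment("Москва")
print(encoded)           # percent-encoded form sent over the wire
print(unquote(encoded))  # decodes back to the readable form
```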
8. What can you search with Graph Search?
• Basically almost anything!
• in theory, you can find any content, and any
relation between entities and content, that your
permissions allow you to view
10. The scope of search in practice
• as I mentioned, any content: texts in status
updates, comments, image descriptions, images,
geotags, likes, and other reactions on public
pages, on event pages, and on users’ timelines
(even the items hidden from the timeline!)
• Full contents of open groups, full content of
closed and secret groups as a group member
• Basically anything except items specifically
deleted by the user
11. The scope of search in practice 0x200.
• keep in mind the audience selectors – and
bypass them
• your scope will grow exponentially with more
friends and after joining more groups – note: the
avg. distance between two randomly chosen
users is 3.5 and users have 300 contacts on avg.,
but the limit on the number of friends is 5000
and you can join 5000 different groups
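A back-of-the-envelope sketch of how the numbers above compound. It computes a naive ceiling only: it assumes every contact brings 300 entirely new contacts, which overstates real reach because friend lists overlap heavily. The function name `naive_reach` is hypothetical.

```python
AVG_CONTACTS = 300   # average number of contacts per user (from the slide)
MAX_FRIENDS = 5000   # friend limit per account (from the slide)


def naive_reach(direct_contacts, hops, avg_contacts=AVG_CONTACTS):
    """Upper bound on accounts exactly `hops` steps away, ignoring overlap."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return direct_contacts * avg_contacts ** (hops - 1)


# Accounts at 1, 2 and 3 hops for an average user and for a maxed-out friend list.
average_user = [naive_reach(AVG_CONTACTS, h) for h in (1, 2, 3)]
maxed_out = [naive_reach(MAX_FRIENDS, h) for h in (1, 2, 3)]
print(average_user)
print(maxed_out)
```

Even as a loose ceiling, the two lists show why the second hop matters more than the first: going from 300 to 5000 direct contacts multiplies the friends-of-friends bound from 90,000 to 1,500,000.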
12. The scope of search in practice 0x300.
• You will need a professionally molded, realistic
character for a Facebook user depending on
your research interest
• A professionally molded character [actor] is not
a simple fake profile – and I think this is the
most difficult part – see also OPSEC
13. OPSEC considerations @ sophisticated
research
• In practice, you cannot run a search from a
completely clean slate – e.g. the order of results
depends on everything, including your previous
searches
• don’t try to use widely used anonymizer
techniques, for example Tor – Facebook will
detect it!
• the best practices are similar to the best
practices in a forensics lab and in HUMINT
14. OPSEC considerations @ sophisticated
research 0x200.
• You will need a spare, non-virtual SIM card never used
before
• Depending on the sensitivity of research, you may need a
photoshopped government-issued ID – don’t worry,
nowadays researchers can generate realistic faces [difficult
&& not my business]
• Facebook reserves the right to deactivate or temporarily
suspend accounts; let’s minimize this risk
15. OPSEC considerations @ sophisticated
research 0x300.
• It is recommended to use a virtual machine with
default browser settings – see also: browser
fingerprinting
• Once again – do not use Tor! – instead use a reliable
VPN provider, and keep in mind that your IP address
is associated with an approximate location, which
affects the order of the search results you receive
16. OPSEC considerations @ sophisticated
research 0x400.
• Facebook tracks user behaviour – e.g. statistical
information about keystroke speed, including what
you deleted from a text field, and the distribution of
different operations; in short, your every click
• Of course, never mix your actor’s behavior and your
own – e.g. don’t send a friend request to someone
you know personally
17. Tailor your actor’s character and behavior to
the specific research field
• more complicated than you think
• an ideal actor resembles a secret agent who is
familiar with the language, the culture, and the
language-culture (!!) of the given context – e.g.
counterterrorism, social psychology research, or
cyber-threat intelligence
• in some cases you simply don’t need an actor,
you can search via your own account