Graph thinking, derived from a branch of mathematics called Graph Theory, is all the rage in Silicon Valley. For instance, in August, Gartner identified knowledge graphs and graph analytics as two of 29 emerging technologies CIOs should experiment with in the next year. (See: Gartner Hype Cycle for Emerging Technologies.)
Some of the most successful software companies of the past two decades are built on graphs:
- Facebook is a social graph.
- LinkedIn is a professional graph.
- Google Search is built on PageRank, an algorithm that uses graph theory to decide the relevance of search results.
Graph thinking is also deeply relevant to engineering information, technical documents, and finding answers to questions across the engineering lifecycle. In this first installment of a two-part series, we will provide a very brief (and mathematically painless) introduction to graph thinking. In the second installment, we will show how SWISS applies graph thinking to engineering documentation.
What is Graph Thinking?
The mathematics of graph theory can be complex, but the basics are quite intuitive. A graph consists of a set of nodes (also called vertices) that are interconnected by links (also called edges). Almost any network you can think of (social, cell, railroad, and so on) can be described as a graph and mathematically modeled as one.
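To make the definition concrete, here is a minimal sketch of a graph in Python. The station names and the `build_adjacency` helper are made up for illustration; the only idea taken from the text is that a graph is a set of nodes plus the links between them.

```python
# A tiny, made-up railroad network: stations are nodes, tracks are edges.
edges = [
    ("Avon", "Bristol"),
    ("Bristol", "Cardiff"),
    ("Cardiff", "Avon"),
    ("Cardiff", "Dover"),
]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of nodes it is linked to."""
    adjacency = {}
    for a, b in edge_list:
        # A track can be travelled in both directions, so record both ends.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


graph = build_adjacency(edges)
nodes = sorted(graph)

print("Nodes:", nodes)
print("Cardiff links to:", sorted(graph["Cardiff"]))
```

Storing each node's neighbors (an "adjacency list") is one common way to hold a graph in memory; it is a convenient choice for a sketch, not the only one.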
You are very familiar with graphs such as:
- The Internet! Think of web pages as nodes (each of which may contain useful information) and the clickable links as edges (each of which points to another related information source). All the pages on the web form a webgraph. Some web pages stand alone, and some are highly interconnected. In graph theory, the webgraph is known as a “directed” graph because links between web pages are “asymmetric”: a page on web site A may link to a page on web site B, but web site B does not have to link back to site A.
- Facebook, as mentioned above, is a social network or social graph. Each of the billion-plus people who have joined Facebook becomes a node in the Facebook graph. The friend connections we make are the links between us. Facebook has taken this a step further by turning each of our “Likes” into a link to another node in the graph (like a favorite band, restaurant, or group), thereby creating a more complex but more useful view of who we are (great for advertisers). Unlike the webgraph, the Facebook friend graph is “undirected”: every friend link is symmetric. That is, we cannot link to a friend without that friend accepting our request and establishing the link between us. LinkedIn, the professional network, operates on the same principle.
- Twitter has both symmetric and asymmetric links. Friends can choose to follow each other, but celebrities can have many followers and need not follow them back!
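The difference between directed and undirected links in the examples above can be sketched in a few lines of Python. The account names and helper functions here are hypothetical; a directed link is simply stored as an ordered pair (follower, followed).

```python
# Hypothetical "follows" relationships, stored as (follower, followed) pairs.
follows = {
    ("ana", "ben"),
    ("ben", "ana"),        # ana and ben follow each other (symmetric)
    ("ana", "celebrity"),
    ("ben", "celebrity"),  # the celebrity follows nobody back (asymmetric)
}


def is_mutual(links, a, b):
    """True if a links to b and b links back to a."""
    return (a, b) in links and (b, a) in links


def is_undirected(links):
    """True if every link in the set has a matching reverse link."""
    return all((b, a) in links for a, b in links)


def follower_count(links, person):
    """Count incoming links to a person."""
    return sum(1 for _, followed in links if followed == person)


# A friendship-style graph: adding every reverse link makes it symmetric.
friendships = follows | {(b, a) for a, b in follows}

print(is_mutual(follows, "ana", "ben"))
print(is_mutual(follows, "ana", "celebrity"))
print(is_undirected(follows), is_undirected(friendships))
print(follower_count(follows, "celebrity"))
```

In this sketch the `follows` set behaves like the Twitter example (some links are returned, some are not), while `friendships` behaves like the Facebook example, where every link exists in both directions.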
Characteristics of Graphs
Once you start thinking about networks in graph terms, you start to see that not all nodes are created equal. (Just visualize a drawing of a celebrity’s connections on Twitter compared to yours.) Here are some additional ways that nodes in a graph may differ:
- Some webpages have more incoming links and other web pages have more outgoing links. (Google uses this notion, and others, to rank the importance of web pages.)
- Some webpages, or people in Facebook, are connected to many more pages or other people (this is called “degree centrality”).
- Some people in Facebook are only a few steps away from everyone else, and some people are out on the edges of the social graph (this is called “closeness centrality”).
- Some people in the social graph are more important to go through to get to other people than others (this is called “betweenness centrality”).
- Some networks and graphs are very “dense”, with every node connected to every other one, and some are very “sparse”, with any one node connecting only to a small percentage of the nodes on the network (e.g., Facebook).
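The measures in the list above can be computed by hand on a small graph. The following is a rough sketch using common textbook definitions, assuming an undirected, connected graph; the friend names and function names are invented for illustration, and the betweenness score is left unnormalized.

```python
from collections import deque
from itertools import combinations

# Hypothetical friendships; every link is listed in both directions.
friends = {
    "ana": {"ben", "cat", "dan"},
    "ben": {"ana", "cat"},
    "cat": {"ana", "ben"},
    "dan": {"ana", "eve"},
    "eve": {"dan"},
}


def shortest_paths(graph, start):
    """Breadth-first search: distance to, and number of shortest paths to, each node."""
    dist = {start: 0}
    count = {start: 1}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[current] + 1:
                count[neighbor] += count[current]
    return dist, count


def degree_centrality(graph, node):
    """Share of the other nodes that this node is directly linked to."""
    return len(graph[node]) / (len(graph) - 1)


def closeness_centrality(graph, node):
    """Inverse of the average distance to every other node (connected graph assumed)."""
    dist, _ = shortest_paths(graph, node)
    return (len(graph) - 1) / sum(dist.values())


def betweenness_centrality(graph, node):
    """Sum, over pairs of other nodes, of the fraction of shortest paths through node."""
    dist_v, count_v = shortest_paths(graph, node)
    others = [n for n in graph if n != node]
    total = 0.0
    for s, t in combinations(others, 2):
        dist_s, count_s = shortest_paths(graph, s)
        if dist_s[node] + dist_v[t] == dist_s[t]:
            total += count_s[node] * count_v[t] / count_s[t]
    return total


def density(graph):
    """Actual links divided by the number of links a fully connected graph would have."""
    n = len(graph)
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)


for person in sorted(friends):
    print(
        person,
        round(degree_centrality(friends, person), 2),
        round(closeness_centrality(friends, person), 2),
        betweenness_centrality(friends, person),
    )
print("density:", density(friends))
```

In this toy graph "ana" scores highest on all three centrality measures, while "eve" sits at the edge of the network. For real networks, a graph library would normally be used instead of hand-written functions like these.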
Practical Uses of Graph Theory
Now let’s look at a few of the ways that organizations can use graph data to improve intelligence, performance, forecasting, and more.
- Perhaps the best-known graph that you may not think of as a graph is Google. Google grew out of Sergey Brin and Larry Page’s work on an algorithm that examined not just the web pages (nodes) on the Internet, but also the links (edges) between them. In their model, more links to a page indicate greater importance of that page. They called this PageRank, and it is still the underpinning of Google’s search engine today.
- PageRank was actually preceded by the research of Eugene Garfield, who created the “journal impact factor”: the average number of citations received in a given year by the articles a journal published in the preceding two years. It was used to quantify the reach of a particular journal in the scientific community based on citations of its publications. Brin and Page credited Garfield in their own research, and Garfield has been called the “Grandfather of Google”.
- Netflix uses data about a user’s watch history, their geographic location, and even their preferences on social media to predict the shows that they will want to watch, and even to predict what new shows Netflix should produce or include in their lineup.
- Pandora developed the Music Genome Project, which examines dozens of data points (“genes”) among songs to find similar songs that its users will like. Its graph comprises ~450 different characteristics such as lyrical mood, language style, melody, tempo, gender of lead vocalist, level of distortion, and more.
- Retailers use graph models to assess the best locations to place new stores. Each existing store and each possible new store is a node and they are connected to other nodes including competitors, street intersections, parking lots, median area income, and more.
- NASA’s Lessons Learned database contains tens of thousands of small pieces of knowledge gained through projects and missions. But searching that database by keyword returns far too many results to be useful (consider the number of results from searching common words like “thermal”, “material”, and “fuel”). NASA created a graph database that connects individual lessons based on subject, keywords, sentiment, frequency of use, and other variables. Now a basic keyword search of the graph yields results based on patterns and relationships.
- Thousands of pieces of legislation go through Congress every year, but only about 4% actually pass both chambers and get written into law. Predicting which ones will survive is valuable to many companies and organizations, and graph theory makes the task easier. Algorithms map dozens of variables related to each bill, including the semantic language, authors, co-sponsors, party affiliation, past voting of members, and more. The result is a software tool that helps people forecast the outcome of proposed legislation.
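Returning to the first example in this list, the core idea behind PageRank (pages that receive more links, especially from important pages, are themselves more important) can be sketched in a few lines. This is a simplified illustration, not Google's production algorithm; the four-page web, the damping value of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
# A made-up four-page web: each page maps to the pages it links to.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but no page links to it
}


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared evenly by all pages.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank


ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ends up with the highest score because every other page links to it, and "orphan" ends up with the lowest because nothing links to it.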
Hopefully you can imagine many more rich mathematical and visual models to describe graphs. (If you are interested in learning more about graph thinking I recommend this short primer. For a great book about networks and graph theory, also consider The Power of Networks.)
In the next post, we will demonstrate how SWISS applies graph thinking to standards and other engineering documents to create a more comprehensive and contextual view of engineering information. Until then, start noticing all of the networks or graphs around you.
You Inspire Us!
We are always eager to hear your thoughts and ideas about engineering information, digital models, knowledge graphs, and SWISS. Please reply with your comments, add them online, or contact us directly to share your feedback.