Enabling Self-Service Access With Large Graphs
Graph databases often contain a massive amount of rich information, but for most users accessing this data can be a challenge. Graph tooling is often designed for highly trained specialists, and the graph query language requires a significant investment of time to master. This means that the value of the graph is limited to a handful of users unless a simpler means of accessing the graph data is made available.
As graph databases grow in size, additional logistical challenges come into play that can also impact broader access. In this blog post we discuss ways in which organizations can leverage Process Tempo to work around these issues and offer true self-service access to data in the graph - despite the size of the graph itself.
For this discussion let's use a hypothetical situation: the organization has just invested heavily in developing a graph database that contains vital customer-related information. This graph helps the organization to better understand their customers, their buying habits, and their journey from prospect to customer. It is now the organization's desire to make this data accessible in a controlled and secure manner to as many internal analysts as possible.
Several barriers stand in the way of increasing this adoption:
- Know-How
- The Size of the Graph
- Security Constraints
- Existing Tooling
- Bad Data
Let's step through each of these barriers a little more.
First Barrier: The Know-How
Graph databases store data differently, and this difference is not what most users are familiar with. The graph structure of nodes and edges, compared to rows and columns, will take some training and adjustment. Understanding how the graph query language works requires a skillset that most analysts do not have, nor have the appetite to learn. This makes self-service access to the graph impossible without some other mechanism in place.
Second Barrier: The Size of the Graph
When the graph database is very large, self-service becomes a scary proposition. The organization will need to implement a means to control user queries so that they do not negatively impact performance or consume an inordinate amount of compute resources. When operating these graphs in the cloud, compute demand can quickly increase cloud hosting costs unless the organization has a means to control what users can do and access.
Third Barrier: Security Constraints
Are parts of the graph off-limits? How does the organization restrict access to portions of the graph? In our hypothetical scenario a customer knowledge graph may contain personally identifiable information (PII). This data needs to be tightly secured. This can be very tricky to manage in a single database environment which can force the organization to deploy multiple databases at additional cost and complexity. Is there a better approach to handle this?
Fourth Barrier: Existing Tooling
Organizations may have standardized on data visualization solutions such as Power BI or Tableau. These tools do not naturally connect to the graph or expose graph results in their natural form. Instead, they require a user to first write a query against the graph database, which means the tools are usable only by power users who have been trained on the graph query language. Given that not all users will be trained on the graph query language, this greatly limits who can do this. These experts become the bottleneck and will be required each time a user requires access to specific data in the graph.
Fifth Barrier: Bad Data
The motto "Garbage in, garbage out" applies to graph databases just like any other data source. User confidence in the graph can plummet if it is continuously affected by bad data. The organization will need to take steps to minimize this and should seek out collaborative approaches to solving this problem. Data quality should not be the sole responsibility of IT.
What is Self-Service?
Before we talk about a potential solution to these problems, we should be clear on the definition of "self-service". Software vendors often use this term, and it should be clear which personas they are trying to offer self-service access to.
If self-service is designed for technical users it means these assumptions are in play:
- The user understands which data assets they should be using for their given need
- The user has the appropriate login credentials to access this data
- The user understands the underlying data and the database structure
- The user is licensed and trained on the tool required to access the data
Translated, this is self-service access for properly trained and credentialed employees. When you add the complexity of a graph database and a graph query language the pool of users that can operate in this environment becomes even smaller.
To broaden the potential reach a less technical approach is required. This is very important if the goal is to create as much adoption as possible. Let's imagine what self-service means to the typical business analyst:
- They go to one place for data
- The data is easy to understand
- They do not have to worry about writing queries or accessing the wrong data
- They do not have to wait around for a license or permission to download something
- They do not need extensive training
- They can contribute their own data or knowledge
- There are minimal barriers for them to get started immediately
Applying this level of self-service to an organization's graph database can equate to a tremendous boost to employee productivity and greatly enhance the investment put into the graph itself. Often, the greater the adoption, the greater the return on investment.
Enabling Self-Service with Process Tempo
Process Tempo enables self-service because it is designed to minimize the complexity of the underlying graph database. The design of the Process Tempo interface is also meant for less technical users and emphasizes ease of use over technical complexity.
How Process Tempo can help overcome barriers to graph adoption:
Overcoming The First Barrier: The Know-How
Process Tempo provides a drag-and-drop environment that allows users to access and explore data within the graph without having to understand the underlying query language. Process Tempo makes the graph easier to understand and navigate by providing visual methods to explore the nodes and relationships within the data.
Process Tempo is a no-code environment designed such that non-technical users can access information in the graph and use this information to solve problems.
Overcoming The Second Barrier: The Size of The Graph
Process Tempo offers administrators and power users methods to define sub-graphs on behalf of their users. Sub-graphs partition the graph in order to minimize the chances of large, computationally heavy queries impacting the performance of the database and available resources.
The idea is to create perspectives for users based on the answers they seek. For example, one user may want to start with a Product perspective, while another may prefer a Supplier or Supply Chain perspective. Each perspective is attached to a pre-defined part of the graph, which keeps the user within that part of the graph and thereby constrains the amount of data that the user can access.
In this example, the user has selected the node labels and relationships they want available to those who can access this dashboard. They can also add filters at this time to further control what is shared.
Now when users access this part of the graph from within a dashboard, they are restricted to just that part of the graph - yet they are free to add additional searches or filters in order to explore the data within this part of the graph.
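The perspective idea can be illustrated with a minimal Python sketch. This is not Process Tempo's implementation or API; the `Perspective` class, the `subgraph` function, and the sample node labels and relationship names are all hypothetical, and the graph is a small in-memory stand-in for a real graph database. It only shows how an allow-list of node labels and relationship types narrows what a viewer can reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perspective:
    """Hypothetical perspective: an allow-list of labels and relationships."""
    name: str
    labels: frozenset
    relationships: frozenset


def subgraph(nodes, edges, perspective):
    """Return only the nodes and edges that the perspective allows."""
    kept_nodes = {
        node_id: node
        for node_id, node in nodes.items()
        if node["label"] in perspective.labels
    }
    # Keep an edge only if its type is allowed and both endpoints survived.
    kept_edges = [
        (src, rel, dst)
        for (src, rel, dst) in edges
        if rel in perspective.relationships
        and src in kept_nodes
        and dst in kept_nodes
    ]
    return kept_nodes, kept_edges


# Illustrative data only.
nodes = {
    "c1": {"label": "Customer", "name": "Acme"},
    "p1": {"label": "Product", "name": "Widget"},
    "s1": {"label": "Supplier", "name": "Parts Co"},
}
edges = [("c1", "BOUGHT", "p1"), ("s1", "SUPPLIES", "p1")]

product_view = Perspective(
    name="Product",
    labels=frozenset({"Customer", "Product"}),
    relationships=frozenset({"BOUGHT"}),
)
view_nodes, view_edges = subgraph(nodes, edges, product_view)
```

In this sketch the Supplier node and the SUPPLIES relationship never reach the viewer, so any further search or filter the viewer adds runs only against the smaller sub-graph.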
Overcoming The Third Barrier: Security Constraints
Just as the graph can be constrained from a performance perspective, the same approach can be used to constrain access from a security perspective. Designers can be selective as to which parts of the graph they make available to users who can access a specific dashboard.
In this example, the designer adds a filter to the graph that restricts the viewer to data related to the country Finland.
Furthermore, dashboards are connected to Process Tempo Workspaces. A workspace is a collection of content that users have to be granted access to. The combination of features in the Workspace and in each dashboard offers a very fine level of control over data access - while still giving users the freedom to use this data how they wish.
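As a rough illustration of how these two layers combine, the following Python sketch checks workspace membership first and then applies a dashboard-level filter. It is an assumption-laden sketch, not Process Tempo's API: the `visible_rows` function, the dictionary shapes, and the workspace and user names are all hypothetical.

```python
def visible_rows(user, dashboard, workspaces, rows):
    """Return the rows a user may see through a dashboard (hypothetical model)."""
    # Layer 1: the user must have been granted access to the workspace.
    members = workspaces.get(dashboard["workspace"], set())
    if user not in members:
        return []
    # Layer 2: the designer's filter restricts what the dashboard exposes.
    required = dashboard["filter"]
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in required.items())
    ]


# Illustrative data only.
workspaces = {"nordic-sales": {"anna"}}
dashboard = {"workspace": "nordic-sales", "filter": {"country": "Finland"}}
rows = [
    {"customer": "A", "country": "Finland"},
    {"customer": "B", "country": "Sweden"},
]

anna_rows = visible_rows("anna", dashboard, workspaces, rows)
bob_rows = visible_rows("bob", dashboard, workspaces, rows)
```

Here a workspace member sees only the Finland row, and a user outside the workspace sees nothing, regardless of the filter.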
It is worth mentioning that when faced with a data access challenge users will find ways to procure the data they need and often process it using unsupported solutions. Examples are when users retrieve data and analyze it in spreadsheets. Spreadsheets represent a security risk that can make it easier for hackers to steal sensitive data. A self-service approach supported by IT can mitigate this risk.
Overcoming The Fourth Barrier: Existing Tooling
The fourth barrier that will need to be overcome is the ability to empower users with the right set of tools - tools that work well in a graph environment. Most data access solutions were designed to handle tabular data from a relational database. They assume that the user is well-trained and familiar with working with data and can write the appropriate graph query. These tools require an expert knowledge of the underlying query language.
Graph database vendors often provide a number of tools or modules that attempt to make it easier to visualize the data in the graph. However, these approaches are not designed for non-technical users. They are also often not designed for large-scale, self-service scenarios.
Overcoming The Fifth Barrier: Data Quality
The best approach to improving data quality is a crowd-sourced one. The easier it is for subject matter experts to access data in the graph, the faster data quality problems are spotted, and the faster they are spotted, the faster they can be remediated. Process Tempo supports improved data quality in two ways:
- It offers true self-service access, which makes it easier for a larger group of people to spot problems faster
- It offers a drag-and-drop form builder that allows dashboard designers to capture critical information while users are looking at the dashboard. The information captured in these forms can be written back to the graph or the source system, allowing users to fix problems on the fly when they see them.
Summary
Graph databases provide truly incredible ways to house and access information. Unfortunately, they bring additional complexity that impacts their adoption. Organizations that wish to leverage this technology will need to overcome several technical barriers in order to increase this adoption and thereby increase the return on investment. This is especially true if the size of the graph database is fairly large.
Process Tempo makes it easier for organizations that have made this investment to grow the use of the graph across different teams, departments and personas. It achieves this by offering a no-code, drag-and-drop environment that allows users to access and explore this data without having to learn the underlying graph query language.
To learn more, visit us at www.processtempo.com
Or, contact us at info@processtempo.com