In April I went on a pilgrimage to Enterprise Data World to encourage my colleagues in the Data world, who are typically focused on issues of Data Governance, Data Quality, and Data Standards, to think more about data from the perspective of the line-of-business end user who is actually trying to accomplish something with their precious data! To echo a line from the great keynote by Scott Berkun, data doesn't actually do anything by itself, it's what we do with the data that matters!
I'm going to recap a couple of sessions below, and then share my key takeaways. There were so many sessions, it was almost too hard to pick what to focus on!
Recap of sessions
I started out sitting in on a great talk by Cathy Normand from ExxonMobil on how to enable metadata strategies. Interestingly, she articulated a strong "you need to market to your users" philosophy. Too often in the IT world we take the "build it and they will come" philosophy, something I am seeing with a current customer project ;-). She talked about using gamification, going out to meet the end users in their natural habitat, and evangelizing the need to provide good quality metadata as key:
It's a lot of work, but it's also always changing, and that's interesting!
I also attended a session from the FIBO group, who had their own sub-track at the conference on Knowledge Graphs and AI. FIBO stands for Financial Industry Business Ontology. I had an interesting conversation with a fellow attendee about it, and in some ways it represented the best and worst of Semantic thinking. FIBO is very powerful in modeling relationships, but only as long as the relationships being modeled actually reflect the real world! And the real world, especially in some of the less traditional finance industries, seems to be much messier than these structured semantic worlds. It reaffirmed my thinking that semantic structures and query languages like SPARQL are great for a very narrow, well-defined domain, but the reality is most of us work in much broader, multi-domain worlds, where highly structured relationships like FIBO's break down.
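To illustrate what I mean by a narrow, well-defined domain: when the relationships really are crisp, querying an ontology with SPARQL is a joy. Here's a toy sketch in Python with rdflib, using a made-up mini-ontology (these are not actual FIBO classes):

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

# Hypothetical mini-ontology in the spirit of FIBO (not actual FIBO classes).
EX = Namespace("http://example.org/finance#")

g = Graph()
g.add((EX.acme_bank, RDF.type, EX.Bank))
g.add((EX.acme_bank, EX.issues, EX.loan_42))
g.add((EX.loan_42, RDF.type, EX.Loan))
g.add((EX.loan_42, EX.principal, Literal(250000)))

# SPARQL is very pleasant when the relationships are this crisp.
results = g.query("""
    PREFIX ex: <http://example.org/finance#>
    SELECT ?bank ?loan ?amount
    WHERE {
        ?bank a ex:Bank ;
              ex:issues ?loan .
        ?loan ex:principal ?amount .
    }
""")
for bank, loan, amount in results:
    print(bank, loan, amount)
```

The trouble starts when the real-world counterpart of that loan doesn't fit the Loan class cleanly; the power of the model depends entirely on the world cooperating.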
I then got to sit in on a talk that finally introduced semantic blockchain… I knew someone would bring up that buzzword ;-).
I dropped in on a talk that purported to demonstrate how easy it is to build a UI using transformation languages like XQuery. Having burned my fingers building a complete, robust website using XQL, I listened to see if the speaker would change my mind, but he didn't. XML transformation languages are not good choices for building robust applications (hint hint MarkLogic, TopQuadrant, etc.).
There was another interesting talk about doing entity extraction on forms, and how they built specific analyzers to look for "wet ink" style signatures, etc. They laid out some nice pros and cons of building it outside of your data platform. It was also cool to see the shout-out to Tesseract OCR.
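I haven't seen the speakers' code, but the OCR step underneath is easy to try yourself. A minimal sketch with pytesseract (the form image filename is hypothetical, and their "wet ink" signature analyzers were custom work layered on top of this):

```python
from PIL import Image
import pytesseract

# Assumes the Tesseract binary is installed and on your PATH,
# and that 'scanned_form.png' is a hypothetical scanned form image.
text = pytesseract.image_to_string(Image.open("scanned_form.png"))

# Downstream, you'd run entity extraction over this raw text;
# detecting wet-ink signatures needs image analysis beyond plain OCR.
print(text)
```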
I really enjoyed Scott Berkun's keynote talk on creativity and ideas.

I like pyramids, so does Scott Berkun
His keynote felt more topical than most that I've seen, and I thought he posited some very interesting ideas, specifically that he feels the phrase "the data says" should be verboten! I like that he pushed the message that the data is always interpreted by people. When we set up a relevance project, we typically push to identify some pretty specific, hard KPIs to measure the impact of the work we are doing, for example "Revenue per Search" or "Requests for Data Set per Search", as well as the oldie but goodie "NDCG" metric. I wonder if KPIs that are generated from data are a case of falling into the "the data says" trap…?
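For anyone who hasn't met NDCG: it rewards putting the most relevant results at the top of a ranking, normalized so a perfect ordering scores 1.0. A minimal sketch, with made-up relevance grades:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over ranked relevance grades."""
    return sum((2**rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (descending-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Hypothetical graded judgments (0-3) for the top 5 results of one query:
print(ndcg([3, 2, 3, 0, 1]))  # ~0.96: close to the ideal ordering
```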
My Takeaways
- One of OSC's tag lines was Data → Information → Wisdom, which mapped nicely onto the pyramid of creativity that Scott mentioned.
- We've seen that as you move up the pyramid, you need more and cleaner data sets to get closer to that, dare I say it, Cognitive AI search platform. I don't think we'll ever really go deep in Data Governance, beyond recognizing that it's a good thing to encourage good data sets ;-)
- In learning more about data quality, I'm seeing some tools and ideas to help us better understand the shape of the data, and to incorporate the quality of the data as a signal into our relevancy algorithms. I can see us applying boost factors based on the quality of a data set when making matches, or deciding which datasets to use based on an analysis of data quality (see the sketch after this list).
- Since my last visit in 2014, Enterprise Data World has become more relevant to search folks, and I'm not sure if the topics have changed over time, or if our focus on data and machine learning has led us closer to the conference's core topics!
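To make the boost-factor idea from the list above concrete, here's a minimal sketch in Python; all of the dataset names, quality scores, and weights are entirely hypothetical:

```python
# Hypothetical per-dataset quality scores (0.0-1.0), e.g. derived from
# profiling completeness, freshness, and duplicate rates:
DATASET_QUALITY = {"crm_contacts": 0.92, "legacy_orders": 0.40}

def boosted_score(base_relevance, dataset, quality_weight=0.5):
    """Blend a text-relevance score with a data-quality signal.

    quality_weight controls how much a low-quality dataset drags
    a match down; the blend itself is purely illustrative.
    """
    quality = DATASET_QUALITY.get(dataset, 0.5)  # neutral default
    return base_relevance * (1.0 + quality_weight * (quality - 0.5))

print(boosted_score(10.0, "crm_contacts"))   # boosted to 12.1
print(boosted_score(10.0, "legacy_orders"))  # demoted to 9.5
```

This is just a linear blend; in a real engine you'd more likely express it as a function score or feed quality in as a learning-to-rank feature.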
Lastly, I also want to point to a Twitter thread that @HealthcareWen shared. She captured some great nuggets of information from a number of sessions that I didn't get to attend.

Comics! She spent the week drawing amazing comics as speakers talked!
One More Thing
I'm working with a client who has ~60 data sources in various systems, and we're working with them to start answering questions they never could previously, by bringing together data sets that had never been joined before. They have a very intuitive sense of "what the right decisions are" but, correctly, are looking to check those intuitions against actual data to help them refine their decision making. Plus, to find startling new insights by spotting less obvious patterns in the data. A key challenge is that there is no "roadmap" to the datasets. They are doing some good governance things, like establishing a data steward for each data set, but there are no cross-data-set links to figure out how two data sets would work together. I have a mental picture of a traditional highway map, where the thick Interstate lines map the very strong connections between data sets, i.e. good primary keys, clean data sets, complete records on either side, and then lesser lines for highways, byways, and local roads reflect the less probable connections between the data sets.
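If I were to sketch that highway map as data, it might look like a weighted graph; all of the dataset names, weights, and road classes below are hypothetical:

```python
import networkx as nx

# A hypothetical "roadmap" of dataset connections, weighted by how
# strong the join is (clean keys, complete records = interstate).
roads = nx.Graph()
roads.add_edge("customers", "orders", weight=0.95, road="interstate")      # clean primary key
roads.add_edge("orders", "shipments", weight=0.70, road="highway")         # mostly joinable
roads.add_edge("customers", "support_tickets", weight=0.30, road="local")  # fuzzy email match

# "Which datasets can I reliably reach from customers?"
for neighbor in roads.neighbors("customers"):
    edge = roads["customers"][neighbor]
    print(neighbor, edge["road"], edge["weight"])
```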
I'm thinking this is a graph of knowledge, but how to get that knowledge? I've been involved in too many projects that said "data lake" or "data warehouse" and spent all the budget moving the data around, and not enough on understanding it. So how to flip that paradigm? One of the topics at the conference was the idea of Data Virtualization, a new term to me. Here the idea is that instead of directly accessing the data, I work with a virtual data set. That virtual layer abstracts me from the underlying format, and lets me keep my 60+ data sets in their original platforms, whether that is an RDBMS, a CSV flat file, a big data solution, or even an API or website… It lets me start playing with the data using a common language like SQL. Denodo is the vendor at EDW who was talking about this approach, and from a "let's get started" perspective, there was a lot I liked. Yes, it has limitations, including that I am potentially querying production systems… I'm intrigued to see if I can put on my Data Cartographer hat and start building that roadmap to the data, for other business-oriented folks to leverage to ask questions they didn't think they could before!
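To get a quick feel for the query-in-place pattern (using DuckDB purely as a stand-in here, this is not Denodo's product or API, and the file is hypothetical), you can already run SQL over a flat file with no load step:

```python
import duckdb

# Query a (hypothetical) CSV in place, with no ETL step. A real
# virtualization layer extends this same idea across RDBMSs, big
# data platforms, and APIs behind one common SQL dialect.
rows = duckdb.sql("""
    SELECT region, count(*) AS n
    FROM read_csv_auto('sales_extract.csv')
    GROUP BY region
    ORDER BY n DESC
""").fetchall()
print(rows)
```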