Challenge / Goal
Cities and their citizens produce large amounts of diverse data. The diversity of data sources is in fact one of the biggest challenges in big data processing. Semantic data integration offers distinct advantages over more traditional approaches. ETL (Extract, Transform, Load), for instance, creates bottlenecks for data access and does not scale as well as technologies that consume data directly, without moving it around.
Under the GrowSmarter project, Barcelona's big data integration solution aims to develop a semantic model (an ontology) that reflects and connects three domains of interest: mobility, energy, and integrated infrastructures. Users can browse and query this ontology. The goal is a solution that is easier to evolve, maintain, and port to new cities with different data and usage patterns.
This solution consists of three components:
- City ontology, together with a browse and query tool: The city ontology captures the meaning (i.e. the semantics) of all the urban concepts (entities and relationships) that describe the domains of interest and the connections between them. The browse and query tool supports keyword-based search for concepts, navigation outward from these anchor concepts, and graphical construction of queries.
- Semi-automatic mapping tool: This tool aligns the semantic model with the specific data model of the city data platform and will be available via the web. Multiple users can collaborate to produce valid alignments.
- Semantic access layer (SAL): The SAL is the access point for applications that pose semantic queries to access data on the city platform. Applications that access data from different cities can work without modification, provided that an SAL exists for each of those cities mapping the city ontology to the actual city schema. The SAL acts on behalf of the applications (with their security and privacy credentials, as defined by Cellnex) to fetch the required data via a REST API, and it computes the results of the few most common query operations (such as join) itself. It calls the mapping tool to determine which resources in the city platform schema correspond to the semantic concepts contained in a query.
Data integration solutions have traditionally relied on a data warehouse approach. Although this approach builds on well-established, efficient technology and solid formal foundations, several characteristics of urban data make it a poor fit for this type of integration. First, data and schemas evolve; second, data is incomplete, and no assumptions should be made about data that does not exist; third, an increasing number of heterogeneous data sources and formats need to be integrated efficiently and, as far as possible, automatically; fourth, data is usually available to be queried but cannot be moved and stored at the target.
Semantic technologies excel in exactly these scenarios. They are a natural fit for the Open World paradigm, under which missing data is treated as unknown rather than false, and they also evolve gracefully and support semi-automatic mapping techniques for populating and accessing large volumes of data.
As a result, new data can be integrated faster, new semantic relationships can be inferred, and users can query the data without having to learn a query language or understand the entire data model at once.
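The semi-automatic mapping tool proposes alignments between the ontology and the city platform's schema, which users then validate. The source does not say how candidate alignments are generated, so the Python sketch below swaps in a simple, commonly used technique: ranking platform fields by name similarity to each ontology property with the standard library's `difflib.SequenceMatcher`. The property and field names, the helpers `normalize`, `propose_alignments` and `split_for_review`, and both thresholds are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Candidates = List[Tuple[str, float]]  # (platform field, similarity score)


def normalize(name: str) -> str:
    """Drop any namespace prefix, lower-case, and strip separators:
    'city:stationId' and 'station_id' both become 'stationid'."""
    local = name.split(":")[-1]
    return "".join(ch for ch in local.lower() if ch.isalnum())


def propose_alignments(ontology_props: List[str], schema_fields: List[str],
                       threshold: float = 0.6) -> Dict[str, Candidates]:
    """Rank schema fields for each ontology property by name similarity,
    keeping only candidates at or above the threshold (best first)."""
    proposals: Dict[str, Candidates] = {}
    for prop in ontology_props:
        scored = []
        for field in schema_fields:
            score = SequenceMatcher(None, normalize(prop), normalize(field)).ratio()
            if score >= threshold:
                scored.append((field, round(score, 2)))
        proposals[prop] = sorted(scored, key=lambda c: c[1], reverse=True)
    return proposals


def split_for_review(proposals: Dict[str, Candidates],
                     auto_accept: float = 0.95) -> Tuple[Dict[str, str], Dict[str, Candidates]]:
    """Accept near-certain matches automatically and route ambiguous ones to
    human reviewers. Properties with no candidate at all stay unmapped."""
    accepted: Dict[str, str] = {}
    needs_review: Dict[str, Candidates] = {}
    for prop, candidates in proposals.items():
        if candidates and candidates[0][1] >= auto_accept:
            accepted[prop] = candidates[0][0]
        elif candidates:
            needs_review[prop] = candidates
    return accepted, needs_review


# Hypothetical ontology properties and city-platform field names.
props = ["city:stationId", "city:districtCode", "city:installationDate"]
fields = ["station_id", "district_code", "installed_at", "power_kw"]

accepted, needs_review = split_for_review(propose_alignments(props, fields))
print("accepted:", accepted)
print("needs review:", needs_review)
```

Exact name matches are accepted automatically, while an ambiguous property such as `city:installationDate` (whose best candidate, `installed_at`, scores about 0.74) is left for human reviewers. That review step is where collaborative validation by multiple users would fit in.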
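The following Python sketch illustrates the SAL's role as described: an application phrases its query purely in ontology terms, and the SAL looks up which platform resources and fields correspond to those concepts, fetches the data, and evaluates a join itself. Everything concrete here is hypothetical: the mapping format (`ConceptMapping`), the concept and property names (`city:ChargingStation`, `city:inDistrict`, and so on), the resource paths, and the sample records. The authenticated REST call is replaced by a plain dictionary lookup, and the mapping is passed in directly instead of being obtained from the mapping tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class ConceptMapping:
    """Hypothetical alignment of one ontology concept with a city's platform."""
    resource: str           # platform resource holding the instances (e.g. a REST path)
    fields: Dict[str, str]  # ontology property -> platform field name


class SemanticAccessLayer:
    """Answers ontology-level queries against one city's data platform."""

    def __init__(self, mapping: Dict[str, ConceptMapping],
                 fetch: Callable[[str], List[Record]]):
        self.mapping = mapping  # in the described system, this comes from the mapping tool
        self.fetch = fetch      # stand-in for the authenticated REST API call

    def select(self, concept: str) -> List[Record]:
        """Fetch all instances of a concept, renaming platform fields to ontology properties."""
        m = self.mapping[concept]
        return [{prop: row.get(field) for prop, field in m.fields.items()}
                for row in self.fetch(m.resource)]

    def join(self, left: str, right: str, left_prop: str, right_prop: str) -> List[Record]:
        """Equi-join via a hash join: index the right side, then probe it with the left side."""
        index: Dict[object, List[Record]] = {}
        for rrow in self.select(right):
            index.setdefault(rrow[right_prop], []).append(rrow)
        return [{**lrow, **rrow}
                for lrow in self.select(left)
                for rrow in index.get(lrow[left_prop], [])]


# Two hypothetical cities exposing the same concepts under different schemas.
CITY_A = {
    "city:ChargingStation": ConceptMapping(
        "/energy/chargers", {"city:stationId": "id", "city:inDistrict": "district"}),
    "city:District": ConceptMapping(
        "/districts", {"city:districtId": "code", "city:name": "label"}),
}
CITY_B = {
    "city:ChargingStation": ConceptMapping(
        "/v2/ev_points", {"city:stationId": "point_ref", "city:inDistrict": "zone"}),
    "city:District": ConceptMapping(
        "/v2/zones", {"city:districtId": "zone_id", "city:name": "zone_name"}),
}
DATA_A = {
    "/energy/chargers": [{"id": "a1", "district": "D1"}, {"id": "a2", "district": "D2"}],
    "/districts": [{"code": "D1", "label": "District 1"}, {"code": "D2", "label": "District 2"}],
}
DATA_B = {
    "/v2/ev_points": [{"point_ref": "b1", "zone": "Z9"}],
    "/v2/zones": [{"zone_id": "Z9", "zone_name": "Zone 9"}],
}


def stations_with_district_names(sal: SemanticAccessLayer) -> List[Tuple[object, object]]:
    """An 'application' written only in ontology terms, with no city-specific names."""
    rows = sal.join("city:ChargingStation", "city:District", "city:inDistrict", "city:districtId")
    return [(row["city:stationId"], row["city:name"]) for row in rows]


print(stations_with_district_names(SemanticAccessLayer(CITY_A, lambda path: DATA_A[path])))
# [('a1', 'District 1'), ('a2', 'District 2')]
print(stations_with_district_names(SemanticAccessLayer(CITY_B, lambda path: DATA_B[path])))
# [('b1', 'Zone 9')]
```

The same `stations_with_district_names` function runs unchanged against both cities; only the mapping differs, which is the portability property the SAL is meant to provide. A hash join is used here simply because it is a standard way to evaluate an equi-join in memory; the source does not specify how the SAL evaluates joins.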