The Center for Refining Information (CRI) is a spin-off company founded by the University of Warsaw in 2019.
CRI commercializes the innovative achievements of the Faculty of Journalism, Information and Bibliology in the analysis of large information resources (Big Data).
Research carried out since 2010 in the Department of Information Technology of the Media has led to the concept and practical implementation of a new system for searching, collecting and analyzing large text collections, called Refining Information (RI). The effectiveness of this method has been confirmed by, among others, the results of research on electoral preferences, on social expectations, and on identifying innovation processes and determining the dynamics of their change. The scope of CRI services includes assessing the current state of the cases studied and forecasting social, political and economic processes. The services of CRI's researchers are also part of its market offer.
CRI's strength is its staff of over a dozen people: IT specialists, analysts, economists and bibliologists. The unique, transparent ICT tools created by CRI experts enable reliable, efficient identification and collection of source data as well as their precise quantitative analysis: refining information enriched with AI elements, especially machine learning.
We are technologically independent; our system is scalable and redundant, with sufficient computing power. Our technical facilities meet the basic requirements of ICT security (our staff hold state certificates as ICT security inspector and IT security administrator).
One of the commercial projects implemented by CRI was the development of a unique system for monitoring technological trends (MTT). The system can (1) automatically collect and standardize source materials from selected online and offline sources. Another MTT function is (2) the identification, using AI (machine learning) tools, of events and phenomena that describe technological trends. Once identified, the trends undergo (3) a quantitative assessment of the dynamics of their change, together with a forecast. This task is based on analyses of large volumes of data covering global and Polish trends in technology. These evaluations have been verified in several projects carried out for the National Center for Research and Development. The current state and dynamics of technological innovation were examined in fields such as energy, aviation, cloud computing and photonic material technologies.
The results of these studies are used to support R&D and innovation programs with reliable and, above all, up-to-date data collected by CRI itself on a global scale, which make it possible to identify and forecast technological trends. The data obtained can be used to (4) identify long-term trends and exceptional events concerning market expectations, competitors' activities, and the directions of research, development and innovation, and thus to describe objectively what is currently happening on the market, how trends have taken shape in Poland and worldwide, and whether the areas of innovation identified today will remain valid in the near future. The tools created for refining information have also proved useful in research in the social, political and technical sciences, as confirmed by documented applications (5) of refining information using sentiment analysis, including the quantitative assessment of the course, and the accurate prediction, of the parliamentary and presidential elections in Poland in 2010/2011 (http://studiamedioznawcze.pl/Numery/2013_2_53/gogolek-en.pdf; http://studiamedioznawcze.pl/Numery/2013_3_54/gogolek-en.pdf) and 2014/2015 (http://studiamedioznawcze.pl/Numery/2016_4_67/gogolek-en.pdf).
1. Identification and collection of source materials.
Acquiring the information in question. Search and resource identification strategies. Distribution of sources by category. Methodology for identifying thematic sources of information.
2. Exploration of the identified large, variable and diverse data sets. Data collection procedure, data formats, categorization. Classification of source documents.
3. Analytical tools. Classification of source materials.
4. Classification method. Classification by source category. Data classification. Training set. Machine learning. Processing of input data. Attribute analysis. Division into training and test sets. Model training. Model verification. Category mapping.
5. Analytical tools. Identification of technological trends. The operating algorithm. Classification: Poland vs. world. Information Osmosis. Identification of attributes. Data cleansing. Tokenization. Quantitative analysis. TF-IDF statistics. Bigram analysis. Cluster points.
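The tokenization, TF-IDF statistics and bigram analysis listed in step 5 can be illustrated with a minimal standard-library sketch. It is not CRI's implementation: the function names and the toy corpus are hypothetical, and it uses the textbook TF-IDF variant (tf as relative term frequency, idf = log(N / df)), which may differ from the weighting actually used in the RI system.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: score} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(N / number of documents that contain the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores.append(
            {t: (c / length) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return scores


def bigram_counts(docs: list[list[str]]) -> Counter:
    """Count adjacent token pairs across all documents (pairs never cross documents)."""
    pairs = Counter()
    for doc in docs:
        pairs.update(zip(doc, doc[1:]))
    return pairs


# Hypothetical toy corpus, for illustration only.
corpus = [
    "Cloud computing market grows",
    "Photonic materials and cloud computing",
    "Aviation fuel prices",
]
tokens = [tokenize(doc) for doc in corpus]
scores = tf_idf(tokens)
top_terms = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)
print(top_terms)  # terms unique to document 0 ("market", "grows") rank above "cloud", "computing"
print(bigram_counts(tokens).most_common(1))  # [(('cloud', 'computing'), 2)]
```

Terms that occur in every document get an idf of zero, which is why a common pair such as "cloud computing" is better surfaced by the bigram count than by its individual TF-IDF scores.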
The examples below show how Refining Information technology can be applied to analyze current international events:
“Islam” vs. “ISIS”
Occurrences of the terms “ISIS” and “Islam” were analyzed during the study period in sources covering the area of particular interest to Al Jazeera.
Figure 1: The area of particular interest of the Al Jazeera TV network
The two terms are clearly linked through the focus on the Islamic State, and the relationship between them is interesting. The stereotypical interpretation that dominated the narrative in European countries, especially in relation to refugees, identified “Islam” and its followers with “ISIS.” In Poland, for example, some media outlets claimed that Islam is a threat to citizens' security and linked it to the terrorism perpetrated by the Islamic State. However, the statistics do not support this claim. Between 2017-04-17 and 2017-07-03, there was a clear rise in the occurrence of the term “ISIS” on websites related to Al Jazeera. Notably, until May 29, 2017, the two terms were used almost equally often: their weekly frequencies in the news channels were nearly the same (Table 1).
Table 1: Occurrence of the terms “ISIS” and “Islam” from 2017-04-17 to 2017-07-03 on Al Jazeera websites
Term | 2017-04-17 | 2017-04-24 | 2017-05-01 | 2017-05-08 | 2017-05-15 | 2017-05-22 | 2017-05-29 | 2017-06-05 | 2017-06-12 | 2017-06-19 | 2017-06-26 | 2017-07-03 |
ISIS | 32 | 20 | 28 | 24 | 15 | 19 | 40 | 47 | 106 | 98 | 102 | 100 |
Islam | 8 | 28 | 32 | 28 | 15 | 24 | 44 | 8 | 26 | 15 | 21 | 11 |
(Source: own study)
A significant change in the trend occurred at the beginning of June 2017: the frequency of “Islam” decreased and that of “ISIS” increased sharply (Figure 2). Up to and including May 29, “Islam” appeared on average 25.6 times a week and “ISIS” 25.4 times; from June 5 to July 3, “Islam” appeared on average 16.2 times a week and “ISIS” 90.6 times. In these last five weeks of the study period, “Islam” never exceeded 26 weekly occurrences, while “ISIS” occurred on average more than five and a half times as often.
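The weekly averages can be recomputed directly from Table 1. This small check splits the series at the week of 2017-06-05, leaving seven weeks before the change and five weeks from it, so the later averages are the five-week totals (453 for “ISIS” and 81 for “Islam”) divided by five.

```python
# Weekly counts copied from Table 1 (each label is the first day of the week).
weeks = ["2017-04-17", "2017-04-24", "2017-05-01", "2017-05-08", "2017-05-15", "2017-05-22",
         "2017-05-29", "2017-06-05", "2017-06-12", "2017-06-19", "2017-06-26", "2017-07-03"]
isis = [32, 20, 28, 24, 15, 19, 40, 47, 106, 98, 102, 100]
islam = [8, 28, 32, 28, 15, 24, 44, 8, 26, 15, 21, 11]


def mean(values: list[int]) -> float:
    return sum(values) / len(values)


split = weeks.index("2017-06-05")  # 7 weeks before the turning point, 5 weeks from it
before = {"ISIS": mean(isis[:split]), "Islam": mean(islam[:split])}
after = {"ISIS": mean(isis[split:]), "Islam": mean(islam[split:])}

print({k: round(v, 1) for k, v in before.items()})  # {'ISIS': 25.4, 'Islam': 25.6}
print({k: round(v, 1) for k, v in after.items()})   # {'ISIS': 90.6, 'Islam': 16.2}
print(round(after["ISIS"] / after["Islam"], 2))     # ratio of the two later averages
```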
Figure 2: Changes in trends of occurrence of terms “ISIS” and “Islam” from 2017-04-17 to 2017-07-03 (Source: our study)
Finally, the relationship between the two terms was measured by the correlation between their weekly numbers of occurrences over the study period (Table 2). The correlation was negative, meaning that a more frequent occurrence of “ISIS” tended to be accompanied by a less frequent use of “Islam.” However, the relationship was weak: with r = −0.226407 over 12 weekly observations, the coefficient is well below the two-tailed 5% critical value of about 0.58 for n = 12, so it is not statistically significant, which is consistent with the two terms being used independently of each other. Thus, hypothesis H1 was considered to have been verified.
Table 2: Correlation between the occurrence of the terms “ISIS” and “Islam” from 2017-04-17 to 2017-07-03
Variable | ISIS | Islam |
ISIS | 1.0000 | -0.2264 |
Islam | -0.2264 | 1.0000 |
(Source: own study)
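The coefficient in Table 2 can be reproduced from the weekly counts in Table 1 with the standard Pearson formula. The sketch below also computes the usual test statistic t = r·√(n − 2)/√(1 − r²) for the null hypothesis of zero correlation; with n = 12 (10 degrees of freedom) the two-tailed 5% critical value of t is about 2.228. This is an illustrative check, not the authors' code.

```python
import math

# Weekly occurrences from Table 1 (2017-04-17 ... 2017-07-03).
isis = [32, 20, 28, 24, 15, 19, 40, 47, 106, 98, 102, 100]
islam = [8, 28, 32, 28, 15, 24, 44, 8, 26, 15, 21, 11]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(isis, islam)
n = len(isis)
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
print(f"r = {r:.6f}, t = {t:.2f}")
```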
“Saud” and “ISIS” vs. Oil Prices
Using correlation as a measure of dependence, we selected, from the terms used in the analyzed information channel, those that were statistically significantly related to the price of crude oil. The terms “Saud” and “ISIS” were identified as significant (Table 3). The collected data were aggregated into weekly intervals, which form the cross-channels of the information stream.
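The weekly aggregation into cross-channels can be sketched as follows, assuming weeks start on Monday (the week labels in Tables 1 and 3, such as 2017-04-17, fall on Mondays). The function names and the daily counts in the usage example are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing `day` (date.weekday() is 0 for Monday)."""
    return day - timedelta(days=day.weekday())


def aggregate_weekly(daily_counts: dict[date, int]) -> dict[date, int]:
    """Sum daily term counts into weekly cross-channels keyed by the week's Monday."""
    weekly: dict[date, int] = defaultdict(int)
    for day, count in daily_counts.items():
        weekly[week_start(day)] += count
    return dict(sorted(weekly.items()))


# Hypothetical daily counts of one term, for illustration only.
daily = {
    date(2017, 4, 17): 3,  # Monday
    date(2017, 4, 23): 2,  # Sunday, same week
    date(2017, 4, 24): 5,  # next Monday
}
print(aggregate_weekly(daily))
# {datetime.date(2017, 4, 17): 5, datetime.date(2017, 4, 24): 5}
```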
Table 3: Occurrences of the terms “Saud” and “ISIS” and the price of crude oil in the studied period
Cross-channel (week) | Saud | ISIS | Oil (USD) |
2017-04-17 | 6 | 32 | 55.52 |
2017-04-24 | 8 | 20 | 52.16 |
2017-05-01 | 8 | 28 | 51.44 |
2017-05-08 | 15 | 24 | 50.27 |
2017-05-15 | 5 | 15 | 51.80 |
2017-05-22 | 93 | 19 | 53.78 |
2017-05-29 | 48 | 40 | 52.58 |
2017-06-05 | 40 | 47 | 49.95 |
2017-06-12 | 210 | 106 | 48.26 |
2017-06-19 | 91 | 98 | 45.32 |
2017-06-26 | 255 | 102 | 46.13 |
2017-07-03 | 165 | 100 | 49.59 |
(Source: https://www.bankier.pl/inwestowanie/profile/quote.html?symbol=ROPA, accessed 2017-09-22; own study)
The two terms were strongly positively correlated with each other: the correlation coefficient between “ISIS” and “Saud” was 0.86 (Table 4). By contrast, oil prices were negatively correlated with the occurrence of both “ISIS” (−0.79) and “Saud” (−0.64).
Table 4: Correlations between the occurrences of “Saud” and “ISIS” and oil prices
Variable | Saud | ISIS | Oil |
Saud | 1.0000 | 0.8593 | -0.6422 |
ISIS | 0.8593 | 1.0000 | -0.7950 |
Oil | -0.6422 | -0.7950 | 1.0000 |
(Source: own study)
These results suggested that the frequency of occurrence of these terms could be used to forecast oil prices. Multiple regression analysis was used to predict prices, with the occurrences of the terms “Saud” and “ISIS” as independent variables and the price of oil as the dependent variable. The regression equation took the following linear form:
OILc = a1*Sc + a2*Ic + a0
where
OILc – price of crude oil in cross-channel c
Sc – number of occurrences of the term “Saud” in cross-channel c
Ic – number of occurrences of the term “ISIS” in cross-channel c
a1, a2, a0 – regression coefficients
Estimating the equation yielded the coefficients listed in Table 5.
Table 5: Coefficients of the regression equation
Coefficient | Value |
a1 | 0.00536 |
a2 | −0.07441 |
a0 | 54.05774 |
(Our own calculations)
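The coefficients can be reproduced from the Table 3 data by ordinary least squares. The document only says “multiple regression”, so plain OLS with an intercept is an assumption here; the sketch solves the two-predictor normal equations in centered form with the standard library only, and the function name is hypothetical.

```python
# Weekly cross-channels from Table 3 (2017-04-17 ... 2017-07-03).
saud = [6, 8, 8, 15, 5, 93, 48, 40, 210, 91, 255, 165]
isis = [32, 20, 28, 24, 15, 19, 40, 47, 106, 98, 102, 100]
oil = [55.52, 52.16, 51.44, 50.27, 51.80, 53.78, 52.58, 49.95, 48.26, 45.32, 46.13, 49.59]


def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = a1*x1 + a2*x2 + a0 via the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] @ [a1, a2] = [s1y, s2y] by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    a0 = my - a1 * m1 - a2 * m2  # the fitted plane passes through the means
    return a1, a2, a0


a1, a2, a0 = ols_two_predictors(saud, isis, oil)
print(f"a1 = {a1:.5f}, a2 = {a2:.5f}, a0 = {a0:.5f}")
```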
Substituting the obtained coefficients and the observed values of the independent variables (“Saud,” “ISIS”) into the equation yielded the fitted oil prices, which are compared with the actual quotes (“OIL”) in Figure 3.
Figure 3: Oil price quotes and forecasted oil prices in USD (multiple regression) (our own calculations)
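The results can be considered satisfactory: in 11 of the 12 weeks, the theoretically calculated oil price was within ±5% of the actual price; only in the first week (2017-04-17) was it about 7% lower. This suggests that the numbers of occurrences of the terms “Saud” and “ISIS” can be used to estimate oil prices, although the fit was obtained in-sample, on the same 12 weeks used to estimate the coefficients. Thus, hypothesis H2 can be considered to have been positively verified.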
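The accuracy of the fit can be checked week by week by substituting the Table 3 counts into OILc = a1*Sc + a2*Ic + a0 with the coefficients reported above. Note that this is an in-sample comparison on the same 12 weeks used to estimate the coefficients, not an out-of-sample forecast.

```python
# Weekly cross-channels from Table 3 and the published coefficients a1, a2, a0.
weeks = ["2017-04-17", "2017-04-24", "2017-05-01", "2017-05-08", "2017-05-15", "2017-05-22",
         "2017-05-29", "2017-06-05", "2017-06-12", "2017-06-19", "2017-06-26", "2017-07-03"]
saud = [6, 8, 8, 15, 5, 93, 48, 40, 210, 91, 255, 165]
isis = [32, 20, 28, 24, 15, 19, 40, 47, 106, 98, 102, 100]
oil = [55.52, 52.16, 51.44, 50.27, 51.80, 53.78, 52.58, 49.95, 48.26, 45.32, 46.13, 49.59]
A1, A2, A0 = 0.00536, -0.07441, 54.05774

errors = []
for week, s, i, actual in zip(weeks, saud, isis, oil):
    fitted = A1 * s + A2 * i + A0           # OILc = a1*Sc + a2*Ic + a0
    pct = 100 * (fitted - actual) / actual  # signed relative error in percent
    errors.append(pct)
    print(f"{week}: fitted {fitted:6.2f} USD, actual {actual:6.2f} USD, error {pct:+5.1f}%")

within = sum(1 for e in errors if abs(e) <= 5)
print(f"{within} of {len(errors)} weeks within ±5%")
```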
Terrorism vs. Acts of Terror
In the last stage of the analysis, occurrences of terms related to terrorism were counted and daily statistics were generated. These data were compared with the number of victims (killed and injured) of terrorist acts that took place during the studied period (Table 6).
The terrorist events covered in the study are listed in Table 7.
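Daily statistics like those in Table 6 can be produced by matching a set of term stems against dated articles. The sketch below is a minimal illustration under stated assumptions: the stem list, the (date, text) input format, the function name and the sample headlines are all hypothetical, since the document does not list the exact terms it counted.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical stems; the document does not publish its list of terrorism-related terms.
TERM_STEMS = ("terror", "extremis")


def count_terms_per_day(articles, stems=TERM_STEMS):
    """Count stem matches per publication day.

    `articles` is an iterable of (date, text) pairs (an assumed input format).
    Each word starting with one of the stems counts once, case-insensitively.
    """
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, stems)) + r")\w*", re.IGNORECASE)
    per_day = Counter()
    for day, text in articles:
        per_day[day] += len(pattern.findall(text))  # days without matches are kept with 0
    return per_day


# Hypothetical sample input, for illustration only.
articles = [
    (date(2017, 5, 22), "Terrorist attack raises terrorism fears"),
    (date(2017, 5, 22), "Police investigate extremist networks"),
    (date(2017, 5, 23), "Oil prices steady"),
]
print(count_terms_per_day(articles))
```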
Table 6: Occurrences of terms related to terrorism and the number of victims of terrorist acts in the studied period
Date | Terrorism | Victims | Date | Terrorism | Victims | Date | Terrorism | Victims |
2017-04-13 | 1 | 0 | 2017-06-05 | 19 | 0 | 2017-07-24 | 10 | 0 | ||
2017-04-15 | 1 | 0 | 2017-06-06 | 33 | 0 | 2017-07-25 | 14 | 0 | ||
2017-04-17 | 2 | 0 | 2017-06-07 | 33 | 62 | 2017-07-26 | 14 | 0 | ||
2017-04-19 | 1 | 0 | 2017-06-08 | 28 | 0 | 2017-07-27 | 6 | 0 | ||
2017-04-20 | 4 | 5 | 2017-06-09 | 27 | 0 | 2017-07-28 | 6 | 0 | ||
2017-04-21 | 19 | 0 | 2017-06-10 | 24 | 0 | 2017-07-29 | 9 | 0 | ||
2017-04-22 | 3 | 0 | 2017-06-11 | 32 | 0 | 2017-07-30 | 9 | 0 | ||
2017-04-23 | 4 | 0 | 2017-06-12 | 36 | 0 | 2017-07-31 | 8 | 0 | ||
2017-04-24 | 3 | 0 | 2017-06-13 | 28 | 0 | 2017-08-01 | 10 | 0 | ||
2017-04-25 | 1 | 0 | 2017-06-14 | 16 | 0 | 2017-08-02 | 9 | 0 | ||
2017-04-26 | 1 | 0 | 2017-06-15 | 24 | 0 | 2017-08-03 | 6 | 0 | ||
2017-04-27 | 5 | 0 | 2017-06-16 | 18 | 0 | 2017-08-04 | 2 | 0 | ||
2017-04-29 | 6 | 0 | 2017-06-17 | 14 | 0 | 2017-08-05 | 3 | 0 | ||
2017-04-30 | 1 | 0 | 2017-06-18 | 15 | 0 | 2017-08-06 | 3 | 0 | ||
2017-05-01 | 3 | 0 | 2017-06-19 | 21 | 0 | 2017-08-07 | 6 | 0 | ||
2017-05-02 | 3 | 0 | 2017-06-20 | 25 | 0 | 2017-08-08 | 8 | 0 | ||
2017-05-03 | 5 | 0 | 2017-06-21 | 24 | 0 | 2017-08-09 | 2 | 0 | ||
2017-05-04 | 5 | 0 | 2017-06-22 | 29 | 0 | 2017-08-11 | 1 | 0 | ||
2017-05-05 | 2 | 0 | 2017-06-23 | 44 | 0 | 2017-08-12 | 7 | 0 | ||
2017-05-06 | 4 | 0 | 2017-06-24 | 18 | 0 | 2017-08-13 | 11 | 0 | ||
2017-05-07 | 4 | 0 | 2017-06-25 | 12 | 0 | 2017-08-14 | 8 | 0 | ||
2017-05-08 | 4 | 0 | 2017-06-26 | 16 | 0 | 2017-08-15 | 6 | 0 | ||
2017-05-09 | 7 | 0 | 2017-06-27 | 17 | 0 | 2017-08-16 | 9 | 0 | ||
2017-05-10 | 5 | 0 | 2017-06-28 | 20 | 0 | 2017-08-17 | 9 | 144 | ||
2017-05-11 | 3 | 0 | 2017-06-29 | 19 | 0 | 2017-08-18 | 11 | 17 | ||
2017-05-12 | 3 | 0 | 2017-06-30 | 18 | 0 | 2017-08-19 | 11 | 0 | ||
2017-05-13 | 5 | 0 | 2017-07-01 | 14 | 0 | 2017-08-20 | 14 | 0 | ||
2017-05-14 | 2 | 0 | 2017-07-02 | 8 | 0 | 2017-08-21 | 4 | 0 | ||
2017-05-15 | 1 | 0 | 2017-07-03 | 26 | 0 | 2017-08-22 | 9 | 0 | ||
2017-05-16 | 10 | 0 | 2017-07-04 | 18 | 0 | 2017-08-23 | 4 | 0 | ||
2017-05-17 | 8 | 0 | 2017-07-05 | 9 | 0 | 2017-08-24 | 7 | 0 | ||
2017-05-18 | 13 | 0 | 2017-07-06 | 16 | 0 | 2017-08-25 | 7 | 0 | ||
2017-05-19 | 7 | 0 | 2017-07-07 | 20 | 0 | 2017-08-26 | 5 | 0 | ||
2017-05-20 | 4 | 0 | 2017-07-08 | 12 | 0 | 2017-08-27 | 4 | 0 | ||
2017-05-21 | 17 | 0 | 2017-07-09 | 6 | 0 | 2017-08-28 | 4 | 0 | ||
2017-05-22 | 24 | 142 | 2017-07-10 | 20 | 0 | 2017-08-29 | 4 | 0 | ||
2017-05-23 | 6 | 0 | 2017-07-11 | 28 | 0 | 2017-08-30 | 10 | 0 | ||
2017-05-24 | 10 | 0 | 2017-07-12 | 35 | 0 | 2017-08-31 | 9 | 0 | ||
2017-05-25 | 12 | 0 | 2017-07-13 | 45 | 0 | 2017-09-01 | 5 | 0 | ||
2017-05-26 | 9 | 0 | 2017-07-14 | 19 | 0 | 2017-09-02 | 6 | 0 | ||
2017-05-27 | 11 | 0 | 2017-07-15 | 15 | 0 | 2017-09-03 | 3 | 0 | ||
2017-05-28 | 7 | 0 | 2017-07-16 | 40 | 0 | 2017-09-04 | 2 | 0 | ||
2017-05-29 | 3 | 0 | 2017-07-17 | 55 | 0 | 2017-09-05 | 4 | 0 | ||
2017-05-30 | 3 | 0 | 2017-07-18 | 54 | 0 | 2017-09-06 | 8 | 0 | ||
2017-05-31 | 3 | 0 | 2017-07-19 | 52 | 0 | 2017-09-07 | 5 | 0 | ||
2017-06-01 | 5 | 0 | 2017-07-20 | 35 | 0 | 2017-09-08 | 4 | 0 | ||
2017-06-02 | 7 | 0 | 2017-07-21 | 39 | 0 | 2017-09-09 | 6 | 0 | ||
2017-06-03 | 6 | 59 | 2017-07-22 | 11 | 0 | |||||
2017-06-04 | 13 | 0 | 2017-07-23 | 11 | 0 |
Table 7: Terrorist attacks and the number of victims (killed and injured) during the studied period
Date | Victims | Event¹ |
2017-04-20 | 5 | The attack in Paris: a terrorist attack that took place on 20 April 2017 on the Champs-Élysées. |
2017-05-22 | 142 | Manchester bombing: a terrorist attack on 22 May 2017 at the Manchester Arena. The explosion in front of the building after a concert by Ariana Grande at 22:33 local time (BST) killed 23 people (including the perpetrator) and injured approximately 119. |
2017-06-03 | 59 | The London attack took place on 3 June 2017 on London Bridge and continued near Borough Market. Eleven people died in the attack, including three perpetrators. |
2017-06-07 | 62 | Two terrorist attacks in the Iranian capital, Tehran, on 7 June 2017. The first took place in the Iranian Parliament, the second at the mausoleum of Ayatollah Khomeini. 23 people were killed in the attacks (including 5 perpetrators), and 39 were injured. The Islamic State claimed both. If confirmed, this would have been its first terrorist attack in Iran. |
2017-08-17 | 144 | Several hours after an explosion in Alcanar, a van drove into the crowd on La Rambla in Barcelona, killing 13 people and injuring more than 120. The police and Spanish authorities confirmed that this was a terrorist attack, which was claimed by the Islamic State. |
2017-08-18 | 7 | A second terrorist attack took place in Spain on the Friday morning following the Barcelona attack. The attackers drove a car into a group of people. The vehicle overturned, and the attackers got out and attacked a random passer-by with a knife, wounding him in the face. The police found knives and an axe in the car. The attackers were wearing dummy explosive belts. |
2017-08-18 | 10 | The attack in Turku, Finland, on 18 August 2017, which was likely a terrorist attack. |
Figure 4 shows a worrying pattern: terrorist attacks were accompanied by a decrease in the occurrence of terrorism-related terms in the analyzed information stream (data collected from sources related to Al Jazeera). Thus, hypothesis H3 was considered to have been verified.
Figure 4: Occurrences of terms related to terrorism and the number of victims of terrorist acts
¹ Descriptions of the terrorist acts are based on information from the Polish Press Agency.