
A visual story behind words on the web

A visual story behind words on the web (Rajeev Dixit - Hackernoon - Medium)
The steps and data visualizations that can be used to analyze and make sense of the data available online.

Every day, a large amount of content is generated on the internet. There are numerous blog posts, social media posts, reviews, ratings, comments, websites, images and videos about people, products, companies, regions, countries etc. This large volume of content, generated, targeted and consumed every day, affects public perception. In turn, it can strongly influence entities or events in either a positive or a negative way. Given this sensitivity, there is a strong need to understand underlying patterns, trends and attribution in near real-time and to use this information for corrective action. For example, a news analytics service needs to detect bias and fake news, and to analyze sentiment and public perception of various entities and topics of interest. A marketing agency needs to understand the sources of media bias and how to manage perception.

Multiple opportunities are available for data capture, extraction, and analytics.

Meta-data extraction from unstructured content

Below are various ways to extract meta-data from the captured content and associate it with its source.

  • Extract keywords and associated entities from a content source (text, pictures and videos)

  • Extract named entities (people, products, companies and locations) and values (emails, telephone numbers, currencies, percentages, embedded URLs) in the source

  • Extract named entities in a document, then disambiguate and cross-link them

  • Extract meta-data like author name, publish date, embedded RSS feeds, type of device used to create data, image resolution, IP addresses, location etc.

  • Extract entities, locations and any other useful data from image and video analysis

  • Detect the language and, if needed, translate it

  • Create a summary of an article

  • Extract topics from an article

  • Classify and cluster content according to pre-defined taxonomies

  • Detect the sentiment of a text in terms of polarity (positive or negative opinion) and subjectivity (personal opinion versus statements supported by facts and figures from legitimate sources)

  • Analyze sentiments towards entities found in the text or other media

  • Analyze sentiments towards each aspect of an entity. For example, the aspects of a hotel include its staff, location and nearby places.
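
As a rough illustration of the simplest item in the list above, value extraction, here is a minimal Python sketch that pulls emails, embedded URLs and percentages out of a piece of text with regular expressions. The patterns, the function name `extract_values` and the sample text are assumptions made for this example, not part of any particular tool. The patterns are deliberately simple, and real-world content would likely need more robust handling or a dedicated NLP library.

```python
import re
from typing import Dict, List

# Illustrative patterns only: they cover common cases, not every valid format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?\s?%")


def extract_values(text: str) -> Dict[str, List[str]]:
    """Return the emails, URLs and percentages found in `text` (hypothetical helper)."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
        "percentages": PERCENT_RE.findall(text),
    }


# Made-up sample text, used only to exercise the patterns.
sample = (
    "Contact press@example.com for details. "
    "Sales rose 12.5% last year, see https://example.com/report for more."
)

for kind, values in extract_values(sample).items():
    print(kind, values)
```

The same dictionary-of-lists shape could hold the output of the other extraction steps (named entities, topics, sentiment scores), so that each content source ends up with one meta-data record.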

Once extracted from its original unstructured content, this meta-data can then be aggregated.

The collected meta-data on each source can be aggregated and analyzed on different dimensions based on duration, sources, clusters, entities, keywords, authors, sentiments, topics, aspects etc. This aggregation can run daily or as often as needed to reflect real-time information. Such aggregations help in discovering patterns and trends which can then be visualized easily and drilled down to narrow details.
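
To make the idea of aggregating on different dimensions more concrete, the following Python sketch groups per-document meta-data records by any chosen combination of dimensions and computes a document count and an average sentiment polarity for each group. The record fields (`date`, `source`, `entity`, `polarity`), the sample values and the `aggregate` function are all hypothetical and chosen only for illustration. A production pipeline would more likely use a database or a data-frame library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical meta-data records, one per analyzed document.
# Polarity is assumed to range from -1 (negative) to 1 (positive).
records = [
    {"date": "2024-01-01", "source": "blog", "entity": "Acme", "polarity": 0.5},
    {"date": "2024-01-01", "source": "news", "entity": "Acme", "polarity": -0.25},
    {"date": "2024-01-02", "source": "blog", "entity": "Acme", "polarity": 0.75},
    {"date": "2024-01-02", "source": "news", "entity": "Globex", "polarity": -0.5},
]


def aggregate(records, dimensions):
    """Group records by the given dimensions; return count and mean polarity per group."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        groups[key].append(record["polarity"])
    return {
        key: {"count": len(values), "mean_polarity": mean(values)}
        for key, values in groups.items()
    }


# The same records viewed along two different sets of dimensions.
by_entity_and_day = aggregate(records, ["entity", "date"])
by_source = aggregate(records, ["source"])

for key, stats in by_entity_and_day.items():
    print(key, stats)
```

Re-running such an aggregation on a schedule, daily or more often, yields the time series that a dashboard can plot and drill into.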

Finally, dashboards can be used to bring this information together. Dashboards offer an interactive way to visualize public perception and the impact of ongoing stories. This forms “A true visual story behind words on the web.”

 
