Unlocking Insights from Financial News: Building a Structured Newsfeed with Python, SpaCy, and Streamlit
In today's fast-paced financial world, staying on top of the latest news and trends is crucial for making informed investment decisions. However, the sheer volume of unstructured news data available can be overwhelming and time-consuming to process manually. This is where Natural Language Processing (NLP) techniques, such as Named Entity Recognition (NER) and Named Entity Linking (NEL), come into play. By leveraging these powerful tools, we can extract valuable insights from financial news articles and create structured newsfeeds that provide actionable information to investors and analysts.
In this blog post, we'll walk through the process of building a structured financial newsfeed using Python, SpaCy, and Streamlit. We'll harness the power of NLP to identify relevant entities, such as company names, from news headlines and link them to their corresponding stock symbols. By the end of this guide, you'll have a fully functional web application that displays a curated newsfeed along with real-time stock market data.
The Toolbox: Python, SpaCy, and Streamlit
Before we dive into the implementation details, let's take a moment to introduce the key tools and technologies we'll be using:
- Python: Python is a versatile and beginner-friendly programming language widely used in data science and NLP projects. Its extensive ecosystem of libraries and frameworks makes it an ideal choice for our structured financial newsfeed.
- SpaCy: SpaCy is a powerful open-source NLP library that offers state-of-the-art performance and accuracy. It provides a suite of pre-trained language models and a streamlined API for tasks like tokenization, part-of-speech tagging, dependency parsing, and named entity recognition.
- Streamlit: Streamlit is a Python library that simplifies the process of building interactive web applications. With just a few lines of code, we can create an intuitive user interface for our financial newsfeed, complete with data visualizations and user input functionality.
- RSS Feeds: We'll be using RSS feeds as our primary source of financial news data. RSS (Really Simple Syndication) is a standardized format for delivering regularly updated content, such as news articles, blog posts, or podcasts.
- Pandas: Pandas is a popular data manipulation library in Python. We'll leverage its powerful data structures and functions to process and analyze the extracted financial data.
- yfinance: yfinance is a Python library that provides a convenient interface for retrieving stock market data from Yahoo Finance. We'll use it to fetch the latest stock prices and other relevant information for the companies mentioned in our newsfeed.
With our toolbox ready, let's embark on the journey of building our structured financial newsfeed!
Step 1: Setting Up the Development Environment
To get started, we need to set up our development environment. Follow these steps to ensure you have all the necessary dependencies installed:
- Install Python: Make sure you have Python installed on your system. You can download the latest version from the official Python website (https://www.python.org).
- Create a virtual environment: It's a good practice to create a virtual environment for each Python project to keep the dependencies isolated. Open your terminal, navigate to the project directory, and run the following commands:

python -m venv myenv
source myenv/bin/activate

- Install the required libraries: With the virtual environment activated, install the necessary libraries by running the following command:

pip install spacy pandas streamlit yfinance beautifulsoup4 requests

- Download the SpaCy language model: SpaCy requires a pre-trained language model for NLP tasks. Download the English model by running:

python -m spacy download en_core_web_sm
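Before moving on, it can save debugging time to confirm everything actually installed. Here's a small sanity-check sketch (my addition, not part of the original setup steps) that reports any missing packages; note that the pip name beautifulsoup4 imports as bs4:

```python
import importlib

# Map pip package names to their import names (they differ for beautifulsoup4).
required = {
    "spacy": "spacy",
    "pandas": "pandas",
    "streamlit": "streamlit",
    "yfinance": "yfinance",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}

missing = []
for pip_name, module_name in required.items():
    try:
        importlib.import_module(module_name)
    except ImportError:
        missing.append(pip_name)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

If anything is listed as missing, re-run the pip command above inside the activated virtual environment.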
With our development environment set up, we're ready to start coding!
Step 2: Extracting Financial News Headlines
The first step in building our structured financial newsfeed is to extract the news headlines from reliable sources. We'll be using RSS feeds from reputable financial news websites to ensure the quality and relevance of the data.
For this example, we‘ll use the RSS feed from the Economic Times (https://economictimes.indiatimes.com/markets/stocks/rssfeeds/2146842.cms). Feel free to explore and add other RSS feeds that align with your investment interests.
To extract the headlines from the RSS feed, we'll use the requests library to send a GET request and the BeautifulSoup library to parse the XML response:
import requests
from bs4 import BeautifulSoup

def get_headlines(rss_url):
    response = requests.get(rss_url)
    soup = BeautifulSoup(response.content, features='xml')
    headlines = soup.find_all('title')
    return [headline.text for headline in headlines]

rss_url = 'https://economictimes.indiatimes.com/markets/stocks/rssfeeds/2146842.cms'
headlines = get_headlines(rss_url)
The get_headlines function takes the RSS feed URL as input, sends a GET request, and parses the XML response using BeautifulSoup. It then extracts the text content of all the <title> tags and returns them as a list of headlines.
With the headlines extracted, we're ready to move on to the next step: performing named entity recognition.
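One subtlety worth knowing about RSS: the document also carries a channel-level <title> tag holding the feed's own name, which a blanket search for <title> tags will pick up alongside the article headlines. A stdlib-only sketch (using xml.etree.ElementTree instead of BeautifulSoup, purely for illustration, with a made-up sample feed) that scopes the search to <item> elements:

```python
import xml.etree.ElementTree as ET

# A minimal RSS sample. The channel-level <title> ("Markets - Economic Times")
# is the feed's name, not a headline, so we only want item-level titles.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Markets - Economic Times</title>
  <item><title>Company A shares surge 5%</title></item>
  <item><title>Company B posts record profit</title></item>
</channel></rss>"""

def get_item_headlines(xml_text):
    root = ET.fromstring(xml_text)
    # Iterate only over <item> elements and read each one's <title> text.
    return [item.findtext("title") for item in root.iter("item")]

print(get_item_headlines(sample_rss))
# ['Company A shares surge 5%', 'Company B posts record profit']
```

The same scoping idea works with BeautifulSoup by calling find_all('item') first and reading each item's title.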
Step 3: Performing Named Entity Recognition (NER)
Named Entity Recognition is an NLP technique that identifies and classifies named entities, such as person names, organizations, locations, and dates, in unstructured text. In our case, we're interested in extracting company names from the financial news headlines.
SpaCy provides a powerful and efficient NER model out of the box. Let's load the pre-trained English model and apply it to our headlines:
import spacy

nlp = spacy.load('en_core_web_sm')

def extract_entities(headlines):
    entities = []
    for headline in headlines:
        doc = nlp(headline)
        for entity in doc.ents:
            if entity.label_ == 'ORG':
                entities.append(entity.text)
    return entities

entities = extract_entities(headlines)
In the extract_entities function, we iterate over each headline and process it using the SpaCy NLP pipeline. The nlp object is created by loading the pre-trained 'en_core_web_sm' model. We then iterate over the recognized entities in each headline and filter for entities with the label 'ORG', which represents organizations or companies. The extracted company names are appended to the entities list.
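Since the same company often appears in several headlines, the entities list can contain duplicates, which would trigger redundant lookups in the linking step. A small order-preserving deduplication helper (my addition, not in the original post) takes care of this:

```python
def dedupe_entities(entities):
    """Remove duplicate company names while preserving first-seen order."""
    seen = set()
    unique = []
    for name in entities:
        key = name.strip().lower()  # normalize case and whitespace for comparison
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique

print(dedupe_entities(["Infosys", "TCS", "infosys ", "HDFC Bank"]))
# ['Infosys', 'TCS', 'HDFC Bank']
```

Running entities through this helper before the next step keeps the stock-data table free of repeated rows.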
Step 4: Named Entity Linking (NEL)
Now that we have extracted the company names from the headlines, the next step is to link them to their corresponding stock symbols. This process is known as Named Entity Linking (NEL) and involves mapping the extracted entities to entries in a knowledge base.
For our financial newsfeed, we'll use a CSV file containing the list of Nifty500 companies and their stock symbols as our knowledge base. You can download the CSV file from the official National Stock Exchange of India website (https://www1.nseindia.com/products/content/equities/indices/nifty_500.htm).
Here's how we can perform NEL using Pandas:
import pandas as pd
import yfinance as yf

def get_stock_data(entities):
    nifty500_df = pd.read_csv('nifty500.csv')
    stock_data = []
    for entity in entities:
        # regex=False treats the entity as a literal substring, so names
        # containing regex metacharacters (e.g. 'L&T') don't raise errors
        match = nifty500_df[nifty500_df['Company Name'].str.contains(entity, case=False, regex=False)]
        if not match.empty:
            symbol = match['Symbol'].values[0]
            stock_info = yf.Ticker(symbol + '.NS').info
            # .get() returns None instead of raising KeyError when Yahoo
            # Finance omits a field for a particular ticker
            stock_data.append({
                'Company': match['Company Name'].values[0],
                'Symbol': symbol,
                'CurrentPrice': stock_info.get('currentPrice'),
                'DayHigh': stock_info.get('dayHigh'),
                'DayLow': stock_info.get('dayLow'),
                'ForwardPE': stock_info.get('forwardPE'),
                'DividendYield': stock_info.get('dividendYield')
            })
    return pd.DataFrame(stock_data)

stock_data = get_stock_data(entities)
In the get_stock_data function, we load the Nifty500 companies CSV file into a Pandas DataFrame. We then iterate over the extracted company names (entities) and search for a match in the 'Company Name' column of the DataFrame. If a match is found, we extract the corresponding stock symbol.
Using the yfinance library, we fetch the latest stock information for each symbol by appending '.NS' to indicate the National Stock Exchange of India. We create a dictionary containing the company name, stock symbol, current price, day high, day low, forward P/E ratio, and dividend yield. The dictionary is appended to the stock_data list.
Finally, we convert the stock_data list into a Pandas DataFrame, which will serve as the structured data for our financial newsfeed.
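One caveat about this matching step: str.contains does literal substring matching, so an entity like "Infosys Ltd." extracted from a headline won't match "Infosys Limited" in the CSV. A hypothetical name normalizer (my addition; the suffix list is an assumption you should extend for your own data) that strips punctuation and common legal suffixes before comparing can make the linking noticeably more robust:

```python
import re

# Common corporate suffixes to ignore when comparing names (an assumption;
# extend this list to cover your own data).
SUFFIXES = re.compile(r"\b(ltd\.?|limited|inc\.?|corp\.?|co\.?)\b", re.IGNORECASE)

def normalize_name(name):
    """Lowercase a company name and drop punctuation and legal suffixes."""
    name = SUFFIXES.sub("", name)
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(name.split())

print(normalize_name("Infosys Ltd."))       # 'infosys'
print(normalize_name("HDFC Bank Limited"))  # 'hdfc bank'
```

Normalizing both the extracted entity and the 'Company Name' column with this helper before comparing makes the substring search tolerant of suffix and punctuation differences.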
Step 5: Building the Streamlit Web Application
With the structured financial data ready, it's time to build an interactive web application using Streamlit. Streamlit allows us to create a user-friendly interface with just a few lines of Python code.
Create a new Python file named app.py and add the following code:
import streamlit as st
import pandas as pd

# headlines and stock_data come from the earlier steps; include that
# extraction and linking code in app.py as well so these names are defined

# Add a title and description to the app
st.title('Structured Financial Newsfeed')
st.write('Stay up-to-date with the latest financial news and stock data.')

# Display the extracted headlines
st.subheader('Latest Financial News')
st.write(headlines)

# Display the structured stock data
st.subheader('Stock Data')
st.write(stock_data)
In this code, we import the necessary libraries and use Streamlit's st module to add a title and description to our app. We then display the extracted headlines under the 'Latest Financial News' subheader using st.write().
Next, we display the structured stock data DataFrame under the 'Stock Data' subheader. Streamlit automatically renders the DataFrame in a user-friendly tabular format.
To run the Streamlit app, open your terminal, navigate to the project directory, and run the following command:
streamlit run app.py
This will start the Streamlit server and open the app in your default web browser. You should see the latest financial news headlines and the corresponding stock data displayed in a clean and interactive interface.
Conclusion
In this blog post, we explored how to build a structured financial newsfeed using Python, SpaCy, and Streamlit. By leveraging the power of NLP techniques like Named Entity Recognition and Named Entity Linking, we were able to extract valuable insights from unstructured news data and create a curated newsfeed with real-time stock information.
The possibilities for extending and customizing this project are endless. You can incorporate additional data sources, implement more advanced NLP models, or integrate trading algorithms to generate buy and sell signals based on the extracted insights.
Remember, the key to success in the financial markets is staying informed and making data-driven decisions. By harnessing the power of NLP and building structured newsfeeds, you can gain a competitive edge and stay ahead of the curve.
I hope this blog post has provided you with a solid foundation for working with Python, SpaCy, and Streamlit in the context of financial news analysis. Feel free to explore further, experiment with different ideas, and adapt the code to suit your specific needs.
Happy coding and happy investing!