Inspiration

I have always enjoyed data aggregation, data processing, and stock trading, but quality data has always been expensive, hard to get, and scattered across many sources. Products like Bloomberg cost thousands of dollars a month, and since I don't need Bloomberg for my job, I wanted to build something that gave me the large majority of the data I need to begin building my own trading algorithms and eventually backtest them.

What it does

The program uses the URL structure of the sources' web APIs to request data on stocks. It then scrapes the returned pages, aggregates and cleans the data, and writes it to a data folder as .csv files. I can then run iterators (such as Open, High, Low, Close, RSI, and IncomeStatement) on the data to fetch values for specific days and assets (e.g. NASDAQ: AAPL).
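As a rough illustration of the clean, write, and iterate steps, here is a minimal sketch. It is not StockPipe's actual code: the column names in `FIELDS` and the helpers `clean_rows`, `write_csv`, and `iter_column` are assumptions made up for this example, and the scraping step is left out entirely.

```python
import csv
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

# Assumed column layout for a daily price file (hypothetical).
FIELDS = ["Date", "Open", "High", "Low", "Close"]


def clean_rows(raw_rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop rows with missing or non-numeric prices, keep one row per date, sort by date."""
    by_date: Dict[str, Dict[str, str]] = {}
    for row in raw_rows:
        try:
            cleaned = {"Date": row["Date"].strip()}
            for name in FIELDS[1:]:
                cleaned[name] = str(float(row[name]))
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # skip rows such as {"Close": "null"}
        by_date[cleaned["Date"]] = cleaned
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return [by_date[d] for d in sorted(by_date)]


def write_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write cleaned rows to an open text file as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def iter_column(src: TextIO, column: str) -> Iterator[Tuple[str, float]]:
    """Yield (date, value) pairs for one column of a stored CSV."""
    for row in csv.DictReader(src):
        yield row["Date"], float(row[column])
```

With a file opened from the data folder, `iter_column(f, "Close")` would yield one `(date, close)` pair per trading day.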

How I built it

I built it using Selenium and ChromeDriver to automate the dynamic web scraping. I used plain Python requests to collect the crumbs needed for downloads. I built the pipeline elements as standard Python iterators on top of the .csv data folder.
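To show what building pipeline elements as plain Python iterators can look like, here is a small sketch of two generators that stack on top of any `(date, value)` stream. The names `moving_average` and `between` are hypothetical, and a simple moving average stands in for the project's real indicators such as RSI.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Point = Tuple[str, float]  # (ISO date string, value)


def moving_average(source: Iterable[Point], window: int) -> Iterator[Point]:
    """Yield the mean of the last `window` values once enough data has been seen."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[float] = deque(maxlen=window)
    for date, value in source:
        recent.append(value)
        if len(recent) == window:
            yield date, sum(recent) / window


def between(source: Iterable[Point], start: str, end: str) -> Iterator[Point]:
    """Keep only points whose date falls in [start, end] (ISO date strings)."""
    for date, value in source:
        if start <= date <= end:
            yield date, value
```

Because each generator consumes and produces the same shape, they compose freely, for example `between(moving_average(closes, 20), "2020-01-01", "2020-12-31")`.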

Challenges I ran into

  1. Figuring out how the website URL APIs worked and how to reduce the amount of Selenium needed to get the data.
  2. Figuring out how to handle proxy failures.
  3. Building the iterators so that more complex iterators can be built on top of them, while also making them usable by an asset.
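For the proxy failures in item 2, one common approach is to try each proxy in turn and move on when a request fails. The sketch below assumes that approach; it is not necessarily how StockPipe handles it. `fetch_with_failover`, `AllProxiesFailed`, and `requests_fetch` are hypothetical names, and the fetch function is passed in so the failover logic can be exercised without a network.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Fetch = Callable[[str, Optional[str]], str]  # (url, proxy) -> response body


class AllProxiesFailed(Exception):
    """Raised when every proxy in the list has failed for a URL."""


def fetch_with_failover(url: str, proxies: Iterable[Optional[str]], fetch: Fetch) -> str:
    """Try each proxy in order and return the first successful response body."""
    errors: List[Tuple[Optional[str], Exception]] = []
    for proxy in proxies:
        try:
            return fetch(url, proxy)
        except OSError as exc:  # requests' exceptions derive from OSError
            errors.append((proxy, exc))
    raise AllProxiesFailed(f"{len(errors)} proxies failed for {url}")


def requests_fetch(url: str, proxy: Optional[str]) -> str:
    """One possible fetch function built on the requests library."""
    import requests  # imported lazily so the failover logic has no hard dependency

    proxy_map = {"http": proxy, "https": proxy} if proxy else None
    response = requests.get(url, proxies=proxy_map, timeout=10)
    response.raise_for_status()
    return response.text
```

Passing `None` as the last entry in the proxy list would fall back to a direct connection once every proxy has failed.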

Accomplishments that I'm proud of

Being able to aggregate large amounts of financial data at no expense to myself in order to further my own learning.

What I learned

How to use UserAgent, Selenium, and requests, and how to think about designing a modular data pipeline.

What's next for StockPipe

Fix the iterators so they can fetch new data when it becomes available. For data already downloaded to the data folder, change fetches to get only new data instead of querying the sources for everything. Start building the other parts of the pipeline (managing collections of assets, etc.). Clean up functions and better manage my own imports of subpackages and files.
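The incremental fetch could work roughly like the sketch below: read the newest date already stored, request only what comes after it, and merge without duplicating dates. This is one possible design rather than something that exists in StockPipe today. The `Date` column and the helpers `last_stored_date`, `next_fetch_start`, and `append_new_rows` are assumptions for illustration.

```python
import csv
from datetime import date, timedelta
from typing import Dict, List, Optional, TextIO


def last_stored_date(src: TextIO) -> Optional[date]:
    """Return the newest date in a stored CSV, or None if it has no rows."""
    latest: Optional[date] = None
    for row in csv.DictReader(src):
        current = date.fromisoformat(row["Date"])
        if latest is None or current > latest:
            latest = current
    return latest


def next_fetch_start(src: TextIO, default_start: date) -> date:
    """Pick the first day that still needs to be requested from the source."""
    latest = last_stored_date(src)
    return default_start if latest is None else latest + timedelta(days=1)


def append_new_rows(existing: List[Dict[str, str]], fetched: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge fetched rows into existing ones, skipping dates already stored."""
    seen = {row["Date"] for row in existing}
    return existing + [row for row in fetched if row["Date"] not in seen]
```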
