Python, a programming language conceived in the late 1980s by Guido van Rossum, has grown enormously in recent years thanks to its ease of use, extensive libraries, and elegant syntax.
Role of Python Today
Apart from its wide use in web and software development, one reason Python is so popular today is machine learning, where machines are trained to learn from historical data and act accordingly on new data. Python is therefore used across domains such as medicine (to learn and predict diseases), marketing (to understand and predict user behavior) and now trading (to analyze financial data and build strategies on it).
In this article, we will learn how to get started with Python for trading. The topics include:
- Why use Python for Trading?
- Most used libraries in Python for trading
- How to import data from various sources into Python?
- Plotting the graphs in Python
- Backtesting in Python
- Resources and books for reference
Why use Python for Trading?
Before we look at why Python is a good choice for coding trading strategies, let us first consider the factors that matter most while trading: identifying trends in price movements, the speed of trade execution, and the use of various strategies to generate profitable trades. Doing these manually leaves a lot of room for human error. This is where algorithmic trading helps: it automates these processes and also lets you test your strategy on historical data, which is called backtesting. And this is why Python is used: to code your trading strategies, to predict future price movements, and to backtest your strategies on historical data.
Components of Python
Now that you understand the advantages of using Python, let’s look at the components we will install and use before getting started.
- Anaconda – Anaconda is a distribution of Python, which means that it bundles the tools and libraries required to run our Python code. Downloading and installing libraries and tools individually can be tedious, which is why we install Anaconda: it ships with most of the commonly used Python packages, which can be imported directly in the IDE.
- Spyder IDE – An IDE, or Integrated Development Environment, is a software platform where we can write and execute code. It typically consists of a code editor to write code, a compiler or interpreter to convert the code into a machine-readable form, and a debugger to identify bugs or errors. Spyder can be used to manage multiple Python projects.
- Jupyter Notebook – Jupyter is an open-source application that lets us write and run code in a more interactive format. It is well suited to testing small chunks of code, whereas Spyder is better suited to bigger projects.
- Conda – Conda is a package management system which can be used to install, run and update libraries. Spyder IDE and Jupyter Notebook are part of the Anaconda distribution, so they need not be installed separately.
Libraries in Python
Libraries are collections of reusable modules or functions that can be used directly in our code to perform a task without having to write the code for it ourselves. As mentioned earlier, Python has a huge collection of libraries for purposes such as computing, machine learning and visualization.
Here, however, we will talk about the libraries most relevant to coding trading strategies. We will need to import financial data, perform numerical analysis, build trading strategies, plot graphs and backtest on data.
For all these tasks, here are a few of the most widely used libraries:
- NumPy – NumPy, short for Numerical Python, is mostly used for numerical computing on arrays of data. An array is a structure that holds a group of elements, and NumPy's functions let us perform different operations on it.
- Pandas – Pandas is mostly used through its DataFrame, a tabular, spreadsheet-like structure where data is stored in rows and columns. Pandas can import data from Excel and CSV files directly into Python code and perform data analysis and manipulation on the tabular data.
- Matplotlib – Matplotlib is used to plot 2D graphs such as bar charts, scatter plots and histograms. It also provides functions to customize the graph to our requirements.
- Zipline – Zipline is a Python library for trading applications that was built to power the Quantopian service. It is an event-driven system that supports both backtesting and live trading.
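To make the roles of NumPy and pandas concrete, here is a minimal sketch. The closing prices are invented for illustration and are not real market data, and the variable names are our own. It computes simple daily returns once with a NumPy array and once with a pandas DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, invented purely for illustration.
closes = np.array([100.0, 102.0, 101.0, 105.0, 104.0])

# NumPy operates on the whole array at once: simple daily returns.
returns = closes[1:] / closes[:-1] - 1

# pandas wraps the same numbers in a labelled table (a DataFrame)
# indexed by business days.
dates = pd.date_range("2017-01-02", periods=len(closes), freq="B")
prices = pd.DataFrame({"Close": closes}, index=dates)
prices["Return"] = prices["Close"].pct_change()

print(returns.round(4))
print(prices)
```

The first row of the 'Return' column is empty (NaN) because there is no previous day to compare against. The remaining rows match the NumPy result.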
How to import data to Python?
This is one of the most important questions to answer before getting started with Python, because without data there is nothing to work with. Financial data is available on websites such as Yahoo Finance and Google Finance. It is called time series data because it is indexed by time (the timescale can be monthly, weekly, daily, 5-minute, 1-minute, etc.). Apart from that, we can load data from CSV files, a plain-text format for tabular values that spreadsheet programs such as Excel can export. We will now learn how to import both time series data and data from CSV files through the examples given below.
Here’s an example of how to import time series data directly from Yahoo Finance:
Output
Now let’s understand this code.
Line 1 As mentioned earlier, pandas is a library that can be used to import data, so we load it first using 'import pandas as pd'. The 'as pd' part means that we can refer to the pandas library as pd in our code; it is just a short alias for pandas.
Line 2 This line is added for the latest version of pandas-datareader to work.
Line 3 Here we import the 'data' module from the 'pandas_datareader' package. Its functions return financial data as a DataFrame, that is, in a tabular format.
Line 4 Here we import the time series data of Apple, denoted by the ticker AAPL, for one year from 1st January 2017 to 1st January 2018 from Yahoo Finance using the function data.get_data_yahoo('symbol of the stock', 'start date', 'end date'), and store the result in df.
Line 5 df.head(n) is used to display the first 'n' rows of the data we imported. When the argument is left blank, the default value of 5 is used.
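The steps described above can be sketched roughly as follows. This is only a sketch under some assumptions: the helper name load_yahoo_prices is our own, the pandas-datareader package must be installed, and the version-specific compatibility line (Line 2) is left out because it depends on the installed version. Yahoo has changed its endpoints several times, so the download may fail with some pandas-datareader releases.

```python
def load_yahoo_prices(symbol, start, end):
    """Download daily prices for `symbol` from Yahoo Finance as a DataFrame.

    Hypothetical helper for illustration; it simply wraps
    pandas_datareader.data.get_data_yahoo.
    """
    # Imported inside the function so that defining it does not require
    # pandas-datareader to be installed.
    from pandas_datareader import data

    return data.get_data_yahoo(symbol, start, end)


# Usage (needs network access and a working Yahoo endpoint):
# df = load_yahoo_prices("AAPL", "2017-01-01", "2018-01-01")
# print(df.head())
```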
Now, let us see another example where we can import data from an existing CSV file:
import pandas as pd
file_1 = pd.read_csv('C:/Data/filename.csv')
Here, we import 'filename.csv' and store it in 'file_1' using 'pd.read_csv()', within which we have specified the path of filename.csv.
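For time series data it is often convenient to have the dates as the index of the DataFrame. The sketch below assumes a CSV with a 'Date' column. The contents are made up and are read from an in-memory string so that the example runs without a file on disk, but a file path can be passed to pd.read_csv in exactly the same way.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """Date,Open,Close
2017-01-03,115.80,116.15
2017-01-04,115.85,116.02
2017-01-05,115.92,116.61
"""

# read_csv accepts a file path or any file-like object.
# parse_dates converts the 'Date' column to timestamps and
# index_col makes it the index of the DataFrame.
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(prices.head())
```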
How to plot graphs in Python?
Line charts, bar graphs and candlestick charts are heavily used to represent trading data, so data visualization is worth exploring when getting started with Python for trading. The data we load into our code can be converted into graphical representations such as line graphs and scatter plots. Just as we used the 'pandas' library earlier to import data, we will use 'matplotlib.pyplot' to plot data in 2D. The examples below show this in more detail.
Creating a line graph in Python:
import matplotlib.pyplot as plt
#here 'df' signifies the financial data which we imported earlier from Yahoo Finance
plt.plot(df['Adj Close'])
#We can give a title to our graph which in this case is 'Plotting Data'
plt.title('Plotting data')
#Used to label x-axis
plt.xlabel('Time')
#Used to label y-axis
plt.ylabel('Price')
plt.show()
Output:
This displays the Line Graph for the Adj Close column in Data.
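If the Yahoo Finance data is not at hand, the same kind of line graph can be tried on stand-in data. The sketch below rests on a few assumptions: the 'Adj Close' series is a seeded random walk rather than market data, the non-interactive 'Agg' backend is selected so that no window is needed, and the graph is saved to a file named line_graph.png, a name chosen only for this example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, not a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df['Adj Close']: a seeded random walk, not market data.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=250, freq="B")
adj_close = pd.Series(
    100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates, name="Adj Close"
)

fig, ax = plt.subplots()
ax.plot(adj_close.index, adj_close.values)
ax.set_title("Plotting data")
ax.set_xlabel("Time")
ax.set_ylabel("Price")
fig.savefig("line_graph.png")
```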
Creating a scatter plot in Python:
import matplotlib.pyplot as plt
import numpy as np
n = 1000
#Here np.random.rand creates two NumPy arrays x and y each of which have 1000 random values between 0 and 1.
x = np.random.rand(n)
y = np.random.rand(n)
plt.scatter(x, y, marker='o')
plt.show()
Output:
Creating a Histogram in Python:
import matplotlib.pyplot as plt
#Values for Histogram
frame = [21,54,32,67,89,42,65,78,54,54,21,67,42,78,21,54,54,89,78,54,42,78,21,67]
#This will create the Histogram with the given parameters where histogram type is 'bar' and the width of each bar will be 0.6
plt.hist(frame, bins=20, histtype='bar', rwidth=0.6)
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.title('Histogram')
plt.show()
Output:
Backtesting in Python
Once your strategy is in place, you need to verify how well it works before using it for trading. For this, we can conduct backtesting, a process in which the strategy is run on historical data to see how accurate its predictions would have been. A few of the Python libraries for backtesting are as follows:
- PyAlgoTrade – This library can be used for event-driven backtesting, in which the strategy reacts to events (such as the arrival of a new price bar) one at a time. The data can come from sources such as Yahoo Finance and Google Finance or be imported from CSV files. It also allows paper trading, in which you test your strategies in a simulated environment without risking real money.
- Pybacktest – This library can be used for vectorized backtesting, where operations are performed on the entire DataFrame at once, unlike event-driven backtesting, where data is processed one event at a time. Pybacktest can be used along with pandas to specify trading strategies.
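To illustrate the difference between the two styles, here is a minimal sketch written in plain pandas. It does not use the PyAlgoTrade or Pybacktest APIs. The strategy (a moving-average crossover), the window lengths and the synthetic price series are all assumptions chosen for illustration, and transaction costs and slippage are ignored. The same strategy is evaluated twice: once with whole-column operations and once bar by bar in a loop.

```python
import numpy as np
import pandas as pd

# Synthetic prices: a seeded random walk, not real market data.
rng = np.random.default_rng(42)
dates = pd.date_range("2017-01-02", periods=300, freq="B")
close = pd.Series(
    100 * np.exp(rng.normal(0.0003, 0.01, len(dates)).cumsum()), index=dates
)

FAST, SLOW = 10, 30  # illustrative window lengths

# --- Vectorized: operate on whole columns at once ---
fast_ma = close.rolling(FAST).mean()
slow_ma = close.rolling(SLOW).mean()
signal = (fast_ma > slow_ma).astype(float)  # 1 = long, 0 = flat
position = signal.shift(1).fillna(0.0)      # act on the next bar: no look-ahead
daily_ret = close.pct_change().fillna(0.0)
equity_vec = (1 + position * daily_ret).cumprod()

# --- Event-driven: process one bar at a time ---
equity = 1.0
holding = 0.0  # position decided on the previous bar
equity_path = []
for i in range(len(close)):
    if i > 0:
        bar_ret = close.iloc[i] / close.iloc[i - 1] - 1
        equity *= 1 + holding * bar_ret
    equity_path.append(equity)
    # Decide the position for the NEXT bar using data up to this bar only.
    if i + 1 >= SLOW:
        window = close.iloc[: i + 1]
        fast_now = window.iloc[-FAST:].mean()
        slow_now = window.iloc[-SLOW:].mean()
        holding = 1.0 if fast_now > slow_now else 0.0
equity_evt = pd.Series(equity_path, index=close.index)

print(f"Vectorized final equity:   {equity_vec.iloc[-1]:.4f}")
print(f"Event-driven final equity: {equity_evt.iloc[-1]:.4f}")
```

Both versions should produce the same equity curve. The vectorized form is shorter and faster, while the loop is closer to how orders would be handled one event at a time in live trading.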
Books and References
- A Byte of Python
- A Beginner’s Python Tutorial
- Python Programming for the Absolute Beginner, 3rd Edition
- Python for Data Analysis, By Wes McKinney
Conclusion
Python is widely used in machine learning and now in trading. In this article, we have covered what is required to get started with Python for trading. Learning it lets you code your own trading strategies and test them. Its extensive libraries and modules smooth the process of creating machine learning algorithms without the need to write large amounts of code. In the next blog, we will see how to plot different indicators such as moving averages, RSI, MACD and Bollinger Bands in Python.
Different execution strategies in Python are covered in the Executive Programme in Algorithmic Trading (EPAT) course conducted by QuantInsti.