VectorBT Pro - MultiAsset Data Acquisition
In this tutorial, we will walk through acquiring M1 (1-minute) data for several forex currency pairs from Dukascopy (a free data provider). The acquired data will be saved to an HDF5 (.h5) file for use in a VectorBT Pro backtesting project. We will use a Node.js package called dukascopy-node to download M1 historical data for the following currency pairs.
You can find the installation instructions and other details for this Node package here: https://github.com/Leo4815162342/dukascopy-node
Multi-Asset Market Data Acquisition with Dukascopy
npx dukascopy-node -i gbpaud -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpaud -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurgbp -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurgbp -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpjpy -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpjpy -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdjpy -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdjpy -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdcad -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdcad -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i audusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i audusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
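The sixteen commands above differ only in the instrument and the price side, so they can also be generated with a small shell loop. This is a sketch that echoes each command for review; drop the `echo` to actually run the downloads:

```shell
# Generate the download commands for every pair/side combination.
# Remove "echo" to execute the downloads instead of printing them.
for pair in gbpaud eurgbp gbpjpy usdjpy usdcad eurusd audusd gbpusd; do
  for side in ask bid; do
    echo npx dukascopy-node -i "$pair" -p "$side" -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
  done
done
```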
The acquired bid and ask files need to be averaged to produce normalized 1-minute OHLC data, which is then saved into an HDF5 (.h5) file. The code for this process is as follows:
import pandas as pd

def read_bid_ask_data(ask_file: str, bid_file: str, set_time_index: bool = False) -> pd.DataFrame:
    """Reads and combines the bid & ask csv files of Dukascopy historical market data into a single OHLCV dataframe."""
    df_ask = pd.read_csv(ask_file)
    df_bid = pd.read_csv(bid_file)
    ## df_bid is the left frame, so it must receive the '_bid' suffix
    merged_df = pd.merge(df_bid, df_ask, on='timestamp', suffixes=('_bid', '_ask'))
    merged_df['open'] = (merged_df['open_ask'] + merged_df['open_bid']) / 2.0
    merged_df['close'] = (merged_df['close_ask'] + merged_df['close_bid']) / 2.0
    merged_df['high'] = merged_df[['high_ask', 'high_bid']].max(axis=1)
    merged_df['low'] = merged_df[['low_ask', 'low_bid']].min(axis=1)
    merged_df['volume'] = merged_df['volume_bid'] + merged_df['volume_ask']
    merged_df = merged_df[merged_df["volume"] > 0.0].reset_index(drop=True)
    ## dukascopy-node writes the timestamp as Unix epoch milliseconds
    merged_df['time'] = pd.to_datetime(merged_df['timestamp'], unit='ms')
    merged_df.drop(columns=["timestamp"], inplace=True)
    final_cols = ['time', 'open', 'high', 'low', 'close', 'volume', 'volume_bid', 'volume_ask']
    if set_time_index:
        merged_df = merged_df.set_index("time")
        return merged_df[final_cols[1:]]
    return merged_df[final_cols]
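The bid/ask combination can be sanity-checked on two synthetic bars in the column layout dukascopy-node emits (timestamp in ms, then open/high/low/close/volume). Note that opens and closes are mid-prices, while highs take the maximum and lows the minimum across the two sides:

```python
import pandas as pd

# Two synthetic one-minute bars per side (timestamps are epoch milliseconds).
df_ask = pd.DataFrame({"timestamp": [0, 60_000],
                       "open": [1.2002, 1.2004], "high": [1.2006, 1.2008],
                       "low": [1.2000, 1.2002], "close": [1.2004, 1.2006],
                       "volume": [10.0, 12.0]})
df_bid = pd.DataFrame({"timestamp": [0, 60_000],
                       "open": [1.2000, 1.2002], "high": [1.2004, 1.2006],
                       "low": [1.1998, 1.2000], "close": [1.2002, 1.2004],
                       "volume": [11.0, 9.0]})

merged = pd.merge(df_bid, df_ask, on="timestamp", suffixes=("_bid", "_ask"))
merged["open"] = (merged["open_bid"] + merged["open_ask"]) / 2.0    # mid open
merged["close"] = (merged["close_bid"] + merged["close_ask"]) / 2.0  # mid close
merged["high"] = merged[["high_bid", "high_ask"]].max(axis=1)        # widest high
merged["low"] = merged[["low_bid", "low_ask"]].min(axis=1)           # widest low
merged["volume"] = merged["volume_bid"] + merged["volume_ask"]       # total volume

print(merged[["open", "high", "low", "close", "volume"]].round(5))
```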
## Specify FileNames of Bid / Ask data downloaded from DukaScopy
bid_ask_files = {
"GBPUSD" : {"Bid": "gbpusd-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "gbpusd-m1-ask-2019-01-01-2023-01-13.csv"},
"EURUSD" : {"Bid": "eurusd-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "eurusd-m1-ask-2019-01-01-2023-01-13.csv"},
"AUDUSD" : {"Bid": "audusd-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "audusd-m1-ask-2019-01-01-2023-01-13.csv"},
"USDCAD" : {"Bid": "usdcad-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "usdcad-m1-ask-2019-01-01-2023-01-13.csv"},
"USDJPY" : {"Bid": "usdjpy-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "usdjpy-m1-ask-2019-01-01-2023-01-13.csv"},
"GBPJPY" : {"Bid": "gbpjpy-m1-bid-2019-01-01-2023-01-13.csv",
"Ask": "gbpjpy-m1-ask-2019-01-01-2023-01-13.csv"},
"EURGBP" : {"Bid": "eurgbp-m1-bid-2019-01-01-2023-01-16.csv",
"Ask": "eurgbp-m1-ask-2019-01-01-2023-01-16.csv"},
"GBPAUD" : {"Bid": "gbpaud-m1-bid-2019-01-01-2023-01-16.csv",
"Ask": "gbpaud-m1-ask-2019-01-01-2023-01-16.csv"}
}
## Write everything into one single HDF5 file indexed by keys for the various symbols
source_folder_path = "/Users/John.Doe/Documents/Dukascopy_Historical_Data/"
output_file_path = "/Users/John.Doe/Documents/qqblog_vbt_pro_tutorials/data/MultiAsset_OHLCV_3Y_m1.h5"
for symbol in bid_ask_files.keys():
    print(f'\n{symbol}')
    ask_csv_file = source_folder_path + bid_ask_files[symbol]["Ask"]
    bid_csv_file = source_folder_path + bid_ask_files[symbol]["Bid"]
    print("ASK File PATH:", ask_csv_file, '\nBID File PATH:', bid_csv_file)
    df = read_bid_ask_data(ask_csv_file, bid_csv_file, set_time_index=True)
    df.to_hdf(output_file_path, key=symbol)
Output
GBPUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpusd-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpusd-m1-bid-2019-01-01-2023-01-13.csv
EURUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurusd-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurusd-m1-bid-2019-01-01-2023-01-13.csv
AUDUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/audusd-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/audusd-m1-bid-2019-01-01-2023-01-13.csv
USDCAD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdcad-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdcad-m1-bid-2019-01-01-2023-01-13.csv
USDJPY
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdjpy-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdjpy-m1-bid-2019-01-01-2023-01-13.csv
GBPJPY
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpjpy-m1-ask-2019-01-01-2023-01-13.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpjpy-m1-bid-2019-01-01-2023-01-13.csv
EURGBP
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurgbp-m1-ask-2019-01-01-2023-01-16.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurgbp-m1-bid-2019-01-01-2023-01-16.csv
GBPAUD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpaud-m1-ask-2019-01-01-2023-01-16.csv
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpaud-m1-bid-2019-01-01-2023-01-16.csv
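Once written, each symbol can be loaded back individually by its key. Here is a minimal round-trip sketch with a tiny made-up frame (the file name `demo_multiasset.h5` is hypothetical; `to_hdf`/`read_hdf` require the PyTables package, installable via `pip install tables`):

```python
import pandas as pd

# Write one symbol's frame under its own key, mirroring the loop above.
df = pd.DataFrame(
    {"open": [1.2, 1.3], "close": [1.25, 1.35]},
    index=pd.date_range("2019-01-01", periods=2, freq="1min", name="time"),
)
df.to_hdf("demo_multiasset.h5", key="GBPUSD")

# Read it back by key; the time index is preserved.
restored = pd.read_hdf("demo_multiasset.h5", key="GBPUSD")
print(restored.equals(df))
```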
Binance Crypto Data
For crypto fans, VectorBT Pro provides a wrapper to fetch data directly from Binance:
import vectorbtpro as vbt

## Acquire multi-asset 1m crypto data from Binance
data = vbt.BinanceData.fetch(
    ["BTCUSDT", "ETHUSDT", "BNBUSDT", "XRPUSDT", "ADAUSDT"],
    start="2019-01-01 UTC",
    end="2022-12-01 UTC",
    timeframe="1m"
)

## Save acquired data locally for persistence
data.to_hdf("/Users/john.doe/Documents/vbtpro_tuts_private/data/Binance_MultiAsset_OHLCV_3Y_m1.h5")