### Display First Few Rows of Stock Data (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Shows the first few rows of the DataFrame to get a quick look at the data structure and content. This is a common first step in data exploration.

```python
df.head()
```

--------------------------------

### Split and Normalize Data for Model Training

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Prepares the data for model training by selecting the feature and target columns, normalizing the features with StandardScaler, and then splitting the data into training and validation sets. The split holds out 10% of the rows for validation and fixes the random state for reproducibility. Because the scaler is fitted on all rows before the split, the validation rows influence the scaling statistics; fitting it on the training split only would avoid that.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

features = df[['open-close', 'low-high', 'is_quarter_end']]
target = df['target']

scaler = StandardScaler()
features = scaler.fit_transform(features)

X_train, X_valid, Y_train, Y_valid = train_test_split(
    features, target, test_size=0.1, random_state=2022)
print(X_train.shape, X_valid.shape)
```

--------------------------------

### Initialize Correlation Heatmap Figure (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Sets up a 10x10 inch figure for the heatmap that visualizes correlations between features. The heatmap itself is drawn in a separate step.

```python
plt.figure(figsize=(10, 10))

# As our concern is with the highly
```

--------------------------------

### Importing Libraries for Machine Learning in Python

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Imports the Python libraries needed for data manipulation, visualization, and machine learning tasks.
Includes pandas for DataFrames, NumPy for numerical operations, Matplotlib and Seaborn for plotting, scikit-learn for ML algorithms, and XGBoost for gradient boosting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn import metrics

import warnings
warnings.filterwarnings('ignore')
```

--------------------------------

### Plot Distribution of Stock Features (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Generates distribution plots for the 'Open', 'High', 'Low', 'Close', and 'Volume' features. This helps reveal the shape of each distribution, including skewness or multi-modality.

```python
features = ['Open', 'High', 'Low', 'Close', 'Volume']

plt.subplots(figsize=(20,10))

for i, col in enumerate(features):
    plt.subplot(2,3,i+1)
    # distplot is deprecated since seaborn 0.11;
    # sb.histplot(df[col], kde=True) is the suggested replacement.
    sb.distplot(df[col])
plt.show()
```

--------------------------------

### Plot Pie Chart of Target Variable Distribution (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Visualizes the distribution of the 'target' variable using a pie chart. This helps assess whether the two classes are balanced, which matters for model training.

```python
# value_counts() orders by frequency, so sort by class label
# to keep each wedge paired with the right label.
counts = df['target'].value_counts().sort_index()
plt.pie(counts.values,
        labels=counts.index.tolist(), autopct='%1.1f%%')
plt.show()
```

--------------------------------

### Describing Dataset Statistics in Python

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Generates descriptive statistics for the numerical columns in the Tesla stock price DataFrame.
This includes the count, mean, standard deviation, minimum, maximum, and quartile values of each column.

```python
df.describe()
```

--------------------------------

### Displaying Dataset Information in Python

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Prints a concise summary of the Tesla stock price DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. This is useful for checking data types and spotting missing values.

```python
df.info()
```

--------------------------------

### Loading Tesla Stock Dataset in Python

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Loads the Tesla stock price dataset from a CSV file into a pandas DataFrame, then displays the first five rows for an initial look at the data, which includes OHLC prices and volume.

```python
df = pd.read_csv('/content/Tesla.csv')
df.head()
```

--------------------------------

### Create New Features and Target Variable (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Calculates two new features, the 'open-close' difference and the 'low-high' difference. It also creates a 'target' variable that is 1 when the next day's closing price is higher than the current day's and 0 otherwise.

```python
df['open-close'] = df['Open'] - df['Close']
df['low-high'] = df['Low'] - df['High']
# The last row has no next day (shift(-1) yields NaN), so its target is 0.
df['target'] = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
```

--------------------------------

### Plot Confusion Matrix for Validation Data

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Visualizes the confusion matrix for the first trained model (Logistic Regression) on the validation dataset.
This breaks the predictions down into true positives, true negatives, false positives, and false negatives.

```python
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

ConfusionMatrixDisplay.from_estimator(models[0], X_valid, Y_valid)
plt.show()
```

--------------------------------

### Visualize Feature Correlation Heatmap

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Generates a heatmap of the correlations between the features in the dataset. The correlation matrix is thresholded at 0.9, so the plot shows only whether each pair of features is highly correlated. This helps in understanding feature relationships before model development.

```python
import seaborn as sb
import matplotlib.pyplot as plt

sb.heatmap(df.drop('Date', axis=1).corr() > 0.9, annot=True, cbar=False)
plt.show()
```

--------------------------------

### Check for Matching 'Close' and 'Adj Close' Rows (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Counts the rows where 'Close' equals 'Adj Close'. If the row count matches the full DataFrame, the two columns are identical and one of them is redundant.

```python
df[df['Close'] == df['Adj Close']].shape
```

--------------------------------

### Create 'is_quarter_end' Feature (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Adds a binary feature 'is_quarter_end' that is 1 if the month is a multiple of 3 (indicating a quarter end) and 0 otherwise. This feature can capture seasonal effects related to quarterly reporting.
```python
df['is_quarter_end'] = np.where(df['month']%3==0,1,0)
df.head()
```

--------------------------------

### Plot Tesla Closing Stock Price Over Time (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Visualizes the closing price of Tesla stock over time to identify trends. This plot helps in understanding the stock's historical performance.

```python
plt.figure(figsize=(15,5))
plt.plot(df['Close'])
plt.title('Tesla Close price.', fontsize=15)
plt.ylabel('Price in dollars.')
plt.show()
```

--------------------------------

### Plot Yearly Average Stock Prices (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Generates bar plots showing the average 'Open', 'High', 'Low', and 'Close' prices for each year. This visualization helps in observing long-term trends in stock prices.

```python
data_grouped = df.drop('Date', axis=1).groupby('year').mean()
plt.subplots(figsize=(20,10))

for i, col in enumerate(['Open', 'High', 'Low', 'Close']):
    plt.subplot(2,2,i+1)
    data_grouped[col].plot.bar()
plt.show()

# This code is modified by Susobhan Akhuli
```

--------------------------------

### Train and Evaluate Machine Learning Models

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Trains three different machine learning models (Logistic Regression, SVC, XGBClassifier) on the prepared data and evaluates their performance using the ROC-AUC score on both training and validation sets. This helps in comparing model effectiveness and identifying potential overfitting.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn import metrics

models = [LogisticRegression(), SVC(
    kernel='poly', probability=True), XGBClassifier()]

for model in models:
    model.fit(X_train, Y_train)

    print(f'{model} : ')
    # The scores printed below are ROC-AUC values, not accuracy.
    print('Training ROC-AUC : ', metrics.roc_auc_score(
        Y_train, model.predict_proba(X_train)[:,1]))
    print('Validation ROC-AUC : ', metrics.roc_auc_score(
        Y_valid, model.predict_proba(X_valid)[:,1]))
    print()
```

--------------------------------

### Plot Boxplots of Stock Features (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Generates boxplots for the 'Open', 'High', 'Low', 'Close', and 'Volume' features to visualize their distributions and detect outliers, that is, data points that deviate significantly from the rest.

```python
# Same column list as in the distribution plots.
features = ['Open', 'High', 'Low', 'Close', 'Volume']

plt.subplots(figsize=(20,10))

for i, col in enumerate(features):
    plt.subplot(2,3,i+1)
    sb.boxplot(df[col])
plt.show()
```

--------------------------------

### Group Data by Quarter End and Calculate Mean (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Groups the DataFrame by the 'is_quarter_end' feature and calculates the mean of the other columns. This compares average prices and volume in quarter-end months against the remaining months.

```python
df.drop('Date', axis=1).groupby('is_quarter_end').mean()
```

--------------------------------

### Displaying Dataset Shape in Python

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Returns the number of rows and columns in the Tesla stock price DataFrame, giving a quick overview of how many records and features are available.
```python
df.shape
```

--------------------------------

### Check for Null Values in DataFrame (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Counts the null values in each column of the DataFrame. The result shows which columns, if any, need cleaning before further preprocessing.

```python
df.isnull().sum()
```

--------------------------------

### Extract Day, Month, Year from Date Column (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Derives 'day', 'month', and 'year' features from the 'Date' column by splitting the date string on '/'. The indexing assumes the dates are stored in month/day/year order. This is a common feature engineering step for time-series data.

```python
splitted = df['Date'].str.split('/', expand=True)

# Dates are month/day/year, so index 0 is the month and index 1 the day.
df['day'] = splitted[1].astype('int')
df['month'] = splitted[0].astype('int')
df['year'] = splitted[2].astype('int')

df.head()
```

--------------------------------

### Drop Redundant 'Adj Close' Column (Python)

Source: https://www.geeksforgeeks.org/machine-learning/stock-price-prediction-using-machine-learning-in-python/index

Removes the 'Adj Close' column from the DataFrame because it was found to be redundant with the 'Close' column. This simplifies the dataset for further analysis.

```python
df = df.drop(['Adj Close'], axis=1)
```
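
--------------------------------

### Sketch: Date Parts and Redundancy Check on a Sample Frame (Python)

A minimal sketch of the date-extraction and column-dropping steps above, run on a tiny stand-in frame so it works without the CSV. The `sample` DataFrame and its values are hypothetical, not taken from the Tesla dataset. It swaps the string split for `pd.to_datetime` with an explicit month/day/year format, which is one possible alternative rather than the source's method, and it drops 'Adj Close' only after confirming that every row matches 'Close'.

```python
import pandas as pd

# Hypothetical stand-in for the Tesla CSV; the prices are made up.
sample = pd.DataFrame({
    'Date': ['6/29/2010', '6/30/2010', '7/1/2010'],
    'Close': [10.0, 11.0, 12.0],
    'Adj Close': [10.0, 11.0, 12.0],
})

# Parse the month/day/year strings once instead of splitting on '/'.
parsed = pd.to_datetime(sample['Date'], format='%m/%d/%Y')
sample['day'] = parsed.dt.day
sample['month'] = parsed.dt.month
sample['year'] = parsed.dt.year

# Drop 'Adj Close' only if it duplicates 'Close' on every row.
if (sample['Close'] == sample['Adj Close']).all():
    sample = sample.drop(columns=['Adj Close'])

print(sample)
```

Parsing with an explicit format raises an error on a malformed date, whereas the string split only fails at the integer conversion, and only if a part is not numeric.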