Pose Recognition using OpenCV and MediaPipe

Introduction

What is Pose Recognition?

Human pose recognition is a computer vision task that involves identifying landmarks on the human body. The goal is to locate these landmarks and estimate a person's pose in a frame or sequence of frames. Pose recognition has many practical applications today. For instance, a self-driving car must understand human behavior by detecting whether a person is standing still on the sidewalk or running toward the vehicle; in the former case the car continues its trajectory as normal, while in the latter it may slow down or stop. Security monitoring relies on pose estimation to detect unusual activity in CCTV footage, and healthcare employs it to monitor rehabilitation exercises. The list goes on: pose recognition improves many technologies and touches people's daily lives.

OpenCV and MediaPipe

OpenCV is an open-source computer vision library (written in C++, with widely used Python bindings). It provides image and video processing tools such as filtering, object detection, and tracking; for example, it can follow a blue object across frames (see the sketch below).
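
As a quick illustration (not part of this tutorial's pipeline), here is a minimal sketch of color-based tracking with OpenCV: it thresholds each webcam frame in HSV space and draws a box around the largest blue region. The HSV range used here is an assumption and may need tuning for your camera and lighting.

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Isolate blue pixels in HSV space (range is an assumption; tune as needed)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([100, 150, 50]), np.array([130, 255, 255]))

    # Box the largest blue region, if any
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('Blue Object Tracking', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()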

MediaPipe is an open-source framework developed by Google which offers cross-platform, customizable Machine Learning solutions for live and streaming media. It provides several pre-trained models that can recognize face and body landmarks and track objects.

MediaPipe's Limitations

MediaPipe is a powerful and versatile tool for real-time body pose estimation and tracking, but like any technology, it has its limitations. One of the most significant limitations of MediaPipe Pose is its susceptibility to poor lighting and occlusion. When the lighting is too dim or body parts are obstructed by clothing or objects, MediaPipe Pose may struggle to accurately detect and track the pose.

This limitation arises because MediaPipe Pose relies on detecting and tracking individual body landmarks to estimate the pose. If one or more landmarks are not visible or are obscured, the model may fail to detect the pose accurately. This is especially problematic when the body is not fully visible or is moving in complex ways that make it difficult to track all the landmarks.

Despite these limitations, there are ways to mitigate the impact of poor lighting and occlusion. For example, additional sensors such as depth cameras or infrared sensors can provide extra information to the model and improve its accuracy in low-light conditions. Additionally, we can work around the limitations by trading off accuracy for practicality, as Tutorial 2 demonstrates.

Tutorial 1: Installation and Pose Classification Examples

This tutorial presents how OpenCV and MediaPipe can be used to recognize different poses using body and face landmarks.


Steps to Follow

  1. Setting up OpenCV and MediaPipe
  2. Collecting Data, Training, and Evaluating the ML model
  3. Classifying the poses

1. Setting up OpenCV and MediaPipe

First, we will need to install the dependencies:

!pip install mediapipe opencv-python pandas scikit-learn

(For Apple Silicon Macs (M1/M2), replace mediapipe with mediapipe-silicon.)

The Python libraries pandas and scikit-learn will be used later to train and evaluate the ML model.

import mediapipe as mp
import cv2
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

MediaPipe’s drawing utilities (mp_drawing) draw the landmarks on the frame, while the Holistic solution (mp_holistic) provides the landmark detections needed to estimate the pose.

import csv
import os
import numpy as np

The csv module is used to write rows of data to a CSV file.

The os module is used to work with files and paths.

NumPy is used to work with arrays and matrices.

2. Collecting Data, Training, and Evaluating the ML model

Collecting Data

# results comes from running the detection loop (Code 1 in the Appendix) at
# least once, so the pose and face landmark counts are available here
num_coords = len(results.pose_landmarks.landmark)+len(results.face_landmarks.landmark)
landmarks = ['class']
for val in range(1, num_coords+1):
    landmarks += ['x{}'.format(val), 'y{}'.format(val), 'z{}'.format(val), 'v{}'.format(val)]
with open('coords.csv', mode='w', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(landmarks)

These blocks of code create a new CSV file, “coords.csv”, which has 2005 columns:

  • 1 column for the class name.
  • 501*4 columns for the 501 landmarks of body (pose) and face.
    • We multiply by 4 for the x, y, z coordinates and the visibility value (a number between 0 and 1 indicating how likely the landmark is to be visible in the frame).

For those who are interested in knowing where those 501 landmarks come from, here is a breakdown:

  • 468 come from MediaPipe Face Mesh (Face Landmarks).


  • 33 come from MediaPipe Pose (Pose Landmarks).


“coords.csv” will store all the collected landmarks and will be used as input to train the ML model.
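
As a quick sanity check on those numbers (purely illustrative), the column count works out as follows:

# Column count check (numbers taken from the breakdown above)
num_pose_landmarks = 33     # MediaPipe Pose
num_face_landmarks = 468    # MediaPipe Face Mesh
num_coords = num_pose_landmarks + num_face_landmarks   # 501 landmarks
num_columns = 1 + num_coords * 4                        # 1 class column + (x, y, z, v) per landmark
print(num_coords, num_columns)                          # 501 2005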

Repeat n times

n represents the number of classes (poses) your model will be able to recognize; run the data-collection cells below once per class, changing class_name each time.

class_name = "Wakanda Forever" # Insert custom name here

This represents the class you will be collecting data points for.

cap = cv2.VideoCapture(0)
# Initiate holistic model
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()

In the block above, we capture frames from our video feed using OpenCV. We set both the detection and tracking confidence thresholds of the Holistic solution to 0.5, so MediaPipe only considers a landmark detection or tracking successful when it is at least 50% confident.

# Recolor Feed
image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
image.flags.writeable = False        

# Make Detections
results = holistic.process(image)

# Recolor image back to BGR for rendering
image.flags.writeable = True   
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

We then use OpenCV to convert the frame from BGR (Blue, Green, Red) to RGB (Red, Green, Blue), since OpenCV works in the BGR color space while MediaPipe expects RGB. We also set the image’s writeable flag to False so the frame can be passed by reference, which improves performance when MediaPipe’s Holistic model processes it. We store the output in results. Once done, we reverse those steps (writeable back to True, RGB back to BGR) so OpenCV can render the frame again.

# 1. Draw face landmarks
mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                         mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                         mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
                         )

# 2. Right hand
mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                         mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                         mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                         )

# 3. Left Hand
mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                         mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                         mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                         )

# 4. Pose Detections
mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                         mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                         mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                         )

The next step is to draw the face, hands, and pose (here, pose refers to the main body landmarks, like the shoulders and hips). This is done using mp_drawing, which we defined earlier. It lets us see the landmark connections (tessellation) drawn over our body on the live feed.

# Extract Pose landmarks
pose = results.pose_landmarks.landmark
pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())

# Extract Face landmarks
face = results.face_landmarks.landmark
face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())

# Concatenate rows
row = pose_row+face_row

# Append class name 
row.insert(0, class_name)

# Export to CSV
with open('coords.csv', mode='a', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(row) 

The final step is to extract those same landmarks, flatten them into a single row, and append that row to the “coords.csv” file.

📝 To close the window, press “q”.

📝 The longer you leave the window open, the more data you collect. Training will take longer, but the results will be more accurate. For this project, about 15 seconds of capture gives acceptable results.

📝 Check Code 1 (Appendix) for the full code.

Training the ML model

import pandas as pd
from sklearn.model_selection import train_test_split

train_test_split is used to split the data into a training set and a testing set.

df = pd.read_csv('coords.csv')
X = df.drop('class', axis=1)
y = df['class']

Here X represents all the features from “coords.csv” excluding the class name, while y represents the class name.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

This line randomly splits X and y into a training set (70%) and a testing set (30%); setting random_state makes the split reproducible.

from sklearn.pipeline import make_pipeline 
from sklearn.preprocessing import StandardScaler 

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

These are the classification model dependencies.

  • The make_pipeline function chains preprocessing and a classifier into a single ML pipeline.
  • The StandardScaler class normalizes the features.
  • The last two imports provide four different classification algorithms.

We will pick the best out of the four algorithms at the end to classify the poses.

pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rc':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

Here, we are creating four different pipelines (another way to look at it is we are creating four different Machine Learning models).

fit_models = {}
for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model

We train our four different models with our training data.

Evaluating the ML model

from sklearn.metrics import accuracy_score
import pickle

The accuracy_score function is used to evaluate the accuracy of a classifier.

The pickle library is used to save ML models to disk.

for algo, model in fit_models.items():
    yhat = model.predict(X_test)
    print(algo, accuracy_score(y_test, yhat))

We loop over the four ML models and predict the classes of our testing set X_test. We then print each model's accuracy score, a number in [0, 1] where 1 means the predictions are 100% accurate (y_test contains the actual classes, while yhat contains the predicted classes).
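
If you prefer to pick the best model programmatically rather than reading the printout, a small sketch (assuming fit_models, X_test, and y_test from the cells above) could look like this:

# Select the algorithm whose pipeline scores highest on the test set
best_algo = max(fit_models,
                key=lambda algo: accuracy_score(y_test, fit_models[algo].predict(X_test)))
print('Best model:', best_algo)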

with open('body_language.pkl', 'wb') as f:
    pickle.dump(fit_models['rf'], f)

We use the pickle library to save our best model. In my case, rf has the highest accuracy score, and I save it here to the file “body_language.pkl”. If your other models have higher accuracy scores, then change 'rf' to 'lr', 'rc', or 'gb'.

with open('body_language.pkl', 'rb') as f:
    model = pickle.load(f)

Although you don’t have to save the model if you run all the code above in one go, saving it is good practice: you avoid wasting compute retraining the model every time when the data hasn’t changed.
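
One way to put that into practice (a sketch, assuming the training cells above have been run at least once) is to load the saved model when the file exists and only fall back to the freshly trained one otherwise:

import os
import pickle

# Reuse the saved model if it exists; otherwise keep the freshly trained one
# and save it for next time (fit_models comes from the training step above)
if os.path.exists('body_language.pkl'):
    with open('body_language.pkl', 'rb') as f:
        model = pickle.load(f)
else:
    model = fit_models['rf']  # or whichever algorithm scored best for you
    with open('body_language.pkl', 'wb') as f:
        pickle.dump(model, f)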

3. Classifying the poses

# Make Detections
X = pd.DataFrame([row])
body_language_class = model.predict(X)[0]
body_language_prob = model.predict_proba(X)[0]
print(body_language_class, body_language_prob)

In order to detect the pose:

  • The first line creates a pandas DataFrame from row, which contains all of our pose and face coordinates (x, y, z, visibility).
  • The second line uses the ML model stored in model to predict the class of the data in X.
  • The third line uses the same model to predict the probability of each class for that data.
  • The fourth line prints the predicted class and the probability of each class.

# Grab ear coords
coords = tuple(np.multiply(
                np.array(
                    (results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].x, 
                     results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].y))
            , [640,480]).astype(int))

cv2.rectangle(image, 
              (coords[0], coords[1]+5), 
              (coords[0]+len(body_language_class)*20, coords[1]-30), 
              colors[np.argmax(body_language_prob)], -1)
              
cv2.putText(image, body_language_class, coords, 
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

image = prob_viz(body_language_prob,["Happy","Sad","Wakanda Forever"],image,colors)

This block grabs the left-ear coordinates and scales them to the frame size (640x480 here). We then draw a filled rectangle at that position, colored according to the most probable class, and overlay the predicted class name next to your ear.

The prob_viz function is described below and represents another fun way to display the classes!

colors = [(245,117,16), (117,245,16), (16,117,245)]
def prob_viz(res, actions, input_frame, colors):
    output_frame = input_frame.copy()
    for num, prob in enumerate(res):
        cv2.rectangle(output_frame, (0,60+num*40), (int(prob*300), 90+num*40), colors[num], -1)
        cv2.putText(output_frame, actions[num] + ' ' + str(res[num]), (0, 85+num*40), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2, cv2.LINE_AA)
        
    return output_frame

The function outputs all three classes (in my case “Happy”, “Sad”, and “Wakanda Forever”) to the video feed with their respective probabilities in an interactive way (the rectangle widens and shrinks depending on the probability of each class). Feel free to modify the colors :)

📝 Check Code 2 (Appendix) for the full code.

This is the final result:

[Animated demo of the final classification result]

Tutorial 2: MediaPipe Application in a Game

In this example, we try to determine an aiming angle so that we can mimic aiming in a shooting game. This example also highlights MediaPipe's limitations and a workaround.

Intuitively, to obtain the angle, we will use pose landmarks 12, 11, and 13 (right shoulder, left shoulder, and left elbow), which together define the angle between the arm and the chest.

Note that each landmark provides (x, y, z) coordinates. The x and y values are normalized to the frame (x increases to the right, y increases downward), while z estimates depth, with smaller values meaning the landmark is closer to the camera.

We wrote a calculate_angle function, which takes two coordinate values for each of the three points and computes the angle at the middle point. To mirror the real-life aiming situation, we examine the xz plane.

def calculate_angle(a,b,c):
    a = np.array(a) # first point
    b = np.array(b) # mid point (vertex of the angle)
    c = np.array(c) # end point

    radians = np.arctan2(c[1]-b[1], c[0]-b[0]) - np.arctan2(a[1]-b[1], a[0]-b[0])
    angle = radians*180.0//np.pi   # floor division truncates to whole degrees

    return angle
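
Only the final nose/wrist version appears in the Aiming Angle Code in the Appendix, so the wiring below is my own sketch of how this first attempt would feed landmarks 12, 11, and 13 into calculate_angle in the xz plane (using mp_pose = mp.solutions.pose, as in the Appendix):

# Inside the detection loop, after results = pose.process(image):
landmarks = results.pose_landmarks.landmark

# (x, z) pairs so that calculate_angle works in the xz plane
r_shoulder = [landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].x,
              landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].z]
l_shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x,
              landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].z]
l_elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x,
           landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].z]

# Angle at the left shoulder (landmark 11), between landmarks 12 and 13
angle = calculate_angle(r_shoulder, l_shoulder, l_elbow)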

Below is the result of this implementation:

[Animated demo of the shoulder/elbow angle attempt]

As you can see, the measurement is not accurate. This is due to the inaccuracy of the z-values: MediaPipe does not provide a precise depth value for each landmark, so it cannot reliably tell how far you are from the camera.

Also note that I was wearing long sleeves, which makes the landmarks glitch. This points to another problem: if the landmarks are blocked, MediaPipe has a hard time estimating the pose.

So, what is the workaround for these two problems? We go against our intuition.

This time, we will use the nose (landmark 0) and a wrist landmark (the Appendix code uses the right wrist, landmark 16). To obtain an angle, we need a third point: an imaginary point 'x' that extends straight down from the nose landmark. We then examine the 2D angle formed by the wrist, the nose, and x, which means we no longer need to know how far from or close to the screen we are.
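
Condensed from the Aiming Angle Code in the Appendix, the key construction looks roughly like this (the 300/1080 vertical offset is the value used there):

# Normalized (x, y) of the nose and wrist (the Appendix uses the right wrist)
nose = [landmarks[mp_pose.PoseLandmark.NOSE.value].x,
        landmarks[mp_pose.PoseLandmark.NOSE.value].y]
wrist = [landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].x,
         landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].y]

# Imaginary point straight below the nose (same x, larger y in image coordinates)
mid = [nose[0], wrist[1] + 300/1080.0]

# Purely 2D aiming angle: no depth (z) values involved
angle = calculate_angle(nose, mid, wrist)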

[Animated demo of the 2D nose/wrist angle workaround]

Conclusion:

Hopefully, this tutorial has conveyed the significance of pose recognition in today's fast-paced, sensor-driven world. Using OpenCV and Google's MediaPipe, we demonstrated how to classify different poses and output their probabilities. If this topic interests you, there are several other areas to explore, including movement classification (classifying poses over consecutive frames) or even counting people in a room. Like any other technology, MediaPipe has its limitations, but as the last example demonstrated, workarounds are available, even if they sometimes go against our intuition.




Appendix:

Code 1:

cap = cv2.VideoCapture(0)
# Initiate holistic model
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()
        
        # Recolor Feed
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False        
        
        # Make Detections
        results = holistic.process(image)
        # print(results.face_landmarks)
        
        # face_landmarks, pose_landmarks, left_hand_landmarks, right_hand_landmarks
        
        # Recolor image back to BGR for rendering
        image.flags.writeable = True   
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # 1. Draw face landmarks
        mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                                 mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                                 mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
                                 )
        
        # 2. Right hand
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                                 )

        # 3. Left Hand
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                                 )

        # 4. Pose Detections
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                                 )
        # Export coordinates
        try:
            # Extract Pose landmarks
            pose = results.pose_landmarks.landmark
            pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())
            
            # Extract Face landmarks
            face = results.face_landmarks.landmark
            face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())
            
            # Concatenate rows
            row = pose_row+face_row
            
            # Append class name 
            row.insert(0, class_name)
            
            # Export to CSV
            with open('coords.csv', mode='a', newline='') as f:
                csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                csv_writer.writerow(row) 
            
        except:
            pass
                        
        cv2.imshow('Raw Webcam Feed', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

Code 2:

cap = cv2.VideoCapture(0)
# Initiate holistic model
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()
        
        # Recolor Feed
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False        
        
        # Make Detections
        results = holistic.process(image)
        
        # face_landmarks, pose_landmarks, left_hand_landmarks, right_hand_landmarks
        
        # Recolor image back to BGR for rendering
        image.flags.writeable = True   
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # 1. Draw face landmarks
        mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                                 mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                                 mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
                                 )
        
        # 2. Right hand
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                                 )

        # 3. Left Hand
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                                 )

        # 4. Pose Detections
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                                 )
        # Export coordinates
        try:
            # Extract Pose landmarks
            pose = results.pose_landmarks.landmark
            pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())
            
            # Extract Face landmarks
            face = results.face_landmarks.landmark
            face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())
            
            # Concatenate rows
            row = pose_row+face_row

            # Make Detections
            X = pd.DataFrame([row])
            body_language_class = model.predict(X)[0]
            body_language_prob = model.predict_proba(X)[0]
            print(body_language_class, body_language_prob)
            
            # Grab ear coords
            coords = tuple(np.multiply(
                            np.array(
                                (results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].x, 
                                 results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].y))
                        , [640,480]).astype(int))
            
            cv2.rectangle(image, 
                          (coords[0], coords[1]+5), 
                          (coords[0]+len(body_language_class)*20, coords[1]-30), 
                          colors[np.argmax(body_language_prob)], -1)
            cv2.putText(image, body_language_class, coords, 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
            
            image = prob_viz(body_language_prob,["Happy","Sad","Wakanda Forever"],image,colors)
            
        except:
            pass
                        
        cv2.imshow('Raw Webcam Feed', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

Aiming Angle Code:

import mediapipe as mp
import cv2 as cv
import numpy as np

mp_drawing = mp.solutions.drawing_utils # give drawing utilities
mp_pose = mp.solutions.pose # import pose estimation model

def calculate_angle(a,b,c):
    a = np.array(a) # first
    b = np.array(b) # mid
    c = np.array(c) # end

    radians = np.arctan2(c[1]-b[1], c[0]-b[0]) - np.arctan2(a[1]-b[1], a[0]-b[0])
    angle = radians*180.0//np.pi   # floor division truncates to whole degrees
    
    return angle

# Video Feed
cap = cv.VideoCapture(0)
# Setup mediapipe instance
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ret, frame = cap.read()
        
        # recolor image to RGB
        image = cv.cvtColor(frame, cv.COLOR_BGR2RGB)

        # flip image horizontally
        image = cv.flip(image, 1)

        # set flag
        image.flags.writeable = False

        # make detection
        results = pose.process(image)

        # recolor image back to BGR
        image.flags.writeable = True
        image = cv.cvtColor(image, cv.COLOR_RGB2BGR)

        # extract landmarks
        try:
            landmarks = results.pose_landmarks.landmark
            
            # get coordinates
            nose = [landmarks[mp_pose.PoseLandmark.NOSE.value].x, landmarks[mp_pose.PoseLandmark.NOSE.value].y]
            right_wrist = [landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].x, landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].y]
            # first = [960/1920.0,10/1080.0]
            mid = [nose[0], right_wrist[1] + 300/1080.0]
            mid_frame = [960, mid[1] * 1080]
            print(nose[1])

            # calculate angles
            angle = calculate_angle(nose, mid, right_wrist)
            print(angle)

            # visualize angle
            cv.putText(image, "x", tuple(np.multiply(mid, [1920, 1080]).astype(int)), cv.FONT_HERSHEY_SIMPLEX, 3, (255,255,0), 3, cv.LINE_AA)
            cv.putText(image, "o", tuple(np.multiply(nose, [1920, 1080]).astype(int)), cv.FONT_HERSHEY_SIMPLEX, 3, (255,255,0), 3, cv.LINE_AA)
            cv.line(image, tuple(np.multiply(nose, [1920, 1080]).astype(int)), tuple(np.multiply(mid, [1920, 1080]).astype(int)), (255,255,0), 3)
            cv.line(image, tuple(np.multiply(right_wrist, [1920, 1080]).astype(int)), tuple(np.multiply(mid, [1920, 1080]).astype(int)), (255,255,0), 3)
            cv.putText(image, str(angle), (960,540), cv.FONT_HERSHEY_SIMPLEX, 2, (255,255,0), 2, cv.LINE_AA)
            
        except:
            pass

        cv.imshow('Mediapipe Feed', image) #(1920x1080)
        # print(cv.getWindowImageRect("Mediapipe Feed")) # to get the dimension of webcam

        if cv.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv.destroyAllWindows()