
EAGLE VISION
How it Works

Inspired by the Hawk-Eye system from tennis, Eagle-Vision is a project that combines aspects of computer vision, machine learning, and trajectory algorithms.
First, Eagle-Vision takes in videos of a tennis ball being thrown, dropped, and bounced, among other projectile movements. Using computer vision and color filtering, Eagle-Vision looks for the tennis ball and tracks its location in 2D space.
Then, Eagle-Vision learns from this training data, using several different algorithms to develop patterns describing how tennis balls move in space given particular motion data, incorporating elements like position, speed, and trajectory angle.
Finally, Eagle-Vision uses this knowledge to predict how a tennis ball will move in space, given only a small segment of movement data. From only the first few seconds of a tennis ball moving, Eagle-Vision projects a path for the trajectory of the ball for the remainder of the video.

Data Processing
Filmed the Data Set
In a neutral background setting and with a stable camera, a video containing over 400 sample tennis ball tosses was filmed. These tosses included the ball falling, bouncing off a wall, and bouncing off a floor.
Developed Color Filtering (In-Class Tool #1)
We developed basic color filtering on still images. The initial image was a ball on a table. It was initially challenging to filter the ball since the colors of other objects were similar to it. We decided to keep the background a constant, neutral color to reduce interference from other objects. In our new test data, the ball moves against a neutral, white background, which allows the algorithm to find the ball more easily using color filtering.
Developed Edge Detection (In-Class Tool #2)
We developed basic edge detection on still images and determined that it was not good enough. The algorithm finds the edge of a ball in an image. We tried it with a ball at three perspectives: close up, mid-range, and far away. The close-up ball had the best result, with only two circles that were reasonably close to the edge. The mid-range ball produced a large number of circles that were not close to the edge. Finally, the far-away ball was not detected at all; instead, extraneous circles throughout the image were detected.
Since our project involves detecting a ball far away, we decided that edge detection is not a feasible method of tracking the ball. We will instead use color detection, since it produced a much better result for all perspectives of the ball.
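For reference, the idea behind this kind of edge detection can be sketched with a simple gradient-magnitude detector. This is only a simplified stand-in for the circle-finding routine we actually used, and the threshold fraction is an illustrative assumption:

```python
import numpy as np

def edge_map(gray, thresh=0.25):
    """Mark pixels whose intensity gradient is large (candidate edges).

    A simplified stand-in for a circle-finding edge detector; `thresh` is
    a fraction of the maximum gradient magnitude (assumed value).
    """
    gy, gx = np.gradient(gray.astype(float))  # finite-difference gradients
    mag = np.hypot(gx, gy)                    # gradient magnitude per pixel
    return mag > thresh * mag.max()           # boolean edge mask
```

On a synthetic image of a bright disk, the detected edge pixels cluster around the disk's boundary, which is why this works well for a large, close-up ball but degrades for small, distant ones.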









In the first column, there are three images: ball3.jpg (close-up shot of a tennis ball), ball2.jpg (mid-range shot of a tennis ball), and Testlive.jpg (far shot of a tennis ball). Each image underwent two processes: edge detection and center detection.
The second column represents edge detection, which finds the edge of the ball. For this process:
- Ball3.jpg was fairly close to the ball
- Ball2.jpg had many wrong circles and was not close
- Testlive.jpg failed to detect the ball at all and detected many unwanted circles
The third column represents center detection, which finds the x-y coordinate of the ball in pixels. For this process:
- Ball3.jpg had a very good prediction of the center
- Ball2.jpg had its center off the ball
- Testlive.jpg was not in the center, but the margin of error was reasonable
We found that:
- A ball that is close up to the camera is successful with both edge detection and center detection
- A far-away ball is good with center detection but bad with edge detection
- A mid-range ball is poor in both categories
Applied Filter to Frames
The next step was to take the color filtering process and apply it to individual frames from our video to ensure it worked with our desired sample set.
One challenge along the way was that the green threshold used for the sample images, which were very bright and saturated, would not work for the video, due to its lower brightness and saturation.
We attempted several threshold functions that computed a green threshold from the average, maximum, or percentile of the green values in a given frame, but these led to frames with no tennis ball falsely detecting parts of the wall that were marginally greener than other pixels.
We decided to hard-code a new, lower green threshold that worked well with the video. This is not an ideal solution, as different lighting conditions would require different thresholds, so further work might involve a threshold function computed over the whole video; however, for this video we settled on a hard-coded green threshold. The figure is shown below.

This plot shows how the color detection was applied to our video data. The first plot shows a single frame of the video, along with the locations of its pixels. The second plot shows the results of the color filtering, with each bright dot indicating a valid green pixel above the threshold. This is significantly more accurate than the edge detection method. The average location of these pixels is used as the location data for that frame. This process is then repeated over an entire portion of the video.
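The per-frame filter-and-average step can be sketched as follows. This is a NumPy illustration only; the threshold value and the green-dominance margins are assumed values, not our actual calibration:

```python
import numpy as np

GREEN_THRESHOLD = 180  # hard-coded for one lighting setup (assumed value)

def ball_location(frame_rgb, thresh=GREEN_THRESHOLD):
    """Return the (x, y) pixel location of the ball in one frame, or None.

    A pixel counts as 'ball' when its green channel exceeds the threshold
    and dominates red and blue (so plain white walls are excluded).
    """
    r = frame_rgb[..., 0].astype(int)
    g = frame_rgb[..., 1].astype(int)
    b = frame_rgb[..., 2].astype(int)
    mask = (g > thresh) & (g > r + 20) & (g > b + 20)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                  # no ball detected in this frame
    return xs.mean(), ys.mean()      # average location of valid green pixels
```

The dominance check (`g > r + 20`, `g > b + 20`) is one way to keep a bright white wall, where all three channels are high, from registering as green.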
Applied Filter to Video
Afterwards, we applied the color filtering process to a sequence of frames within a given time span (12:30 to 14:00, for example). Most of the initial results were good, correctly detecting the location of the tennis ball even in poor-quality frames, but some bad data cluttered the location data.
Some parts of the wall, outside the throwing area, registered as green enough to be classified as a ball, so we filtered out any data outside the ball-throwing area in pixels (X position of less than 190 pixels or greater than 825). These numbers are hard-coded for our video and will need to be adjusted for new video footage.
Connected Location Data
After the data was cleaned up to include only valid tennis ball locations, the final step was to connect the data sequentially to form a proper sequence. These location sequences are what we use to train Eagle-Vision. The figure below shows a sample location sequence.

This plot shows the location of a tennis ball in flight over a two-second period. The ball is thrown from the top left (indicated by the red starting point) and ends after being caught in the middle of the hallway (indicated by the green ending point). Every marker represents a cataloged data point, showing a successful detection of the ball using color filtering. The connections between markers represent the ball's path. This plot demonstrates that we are able to find and track successive tennis ball locations using image filtering from a video, building the training data for the machine learning portion of Eagle-Vision.
When compared with the original clip, it can be seen that this tracking method is successful.
Trial Generation
The process of trial generation turned raw video location data into individual thrown projectile instances for use in the cosine similarity algorithm as well as individual input trials to be predicted by the algorithms. Trial generation was a two step process.
The first step was linking location data: sequential location data was linked together for a trial. This stage also removed bad data, such as falsely detected tennis balls outside of the expected range and tennis ball data caused by human interference, like catching the ball.
The second step was splitting the location data into individual trials. Trials were separated from the video data depending on whether a tennis ball was detected in a frame. Once ten or more consecutive frames with a tennis ball were detected, those ten frames and all subsequent frames were stored in the trial. Once ten consecutive frames without a tennis ball were detected, those ten frames were removed from the trial data and the trial was complete and stored. This algorithm repeated for every frame of video data.
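The ten-frame rule above can be sketched as follows. This is a simplified Python version under our reading of the rule; the exact bookkeeping in the real implementation may differ:

```python
def split_trials(locations, run_len=10):
    """Split a per-frame location stream into trials.

    `locations` has one entry per frame: an (x, y) tuple when the ball was
    detected, or None when it was not. A trial opens after `run_len`
    consecutive detections (those frames are included) and closes after
    `run_len` consecutive misses.
    """
    trials, current, hits = [], [], []
    misses, in_trial = 0, False
    for loc in locations:
        if loc is not None:
            misses = 0
            if in_trial:
                current.append(loc)
            else:
                hits.append(loc)
                if len(hits) >= run_len:      # ball seen long enough: open trial
                    in_trial, current, hits = True, hits, []
        else:
            hits = []
            if in_trial:
                misses += 1
                if misses >= run_len:          # ball gone long enough: close trial
                    trials.append(current)
                    in_trial, current, misses = False, [], 0
    if in_trial and current:                   # flush a trial still open at the end
        trials.append(current)
    return trials
```

Frames without a detection carry no location data, so a trial naturally contains only ball positions; the miss counter just decides when the trial is considered over.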
Moving Average Filter (In-Class Tool #3)
One DSP in-class tool we found was the moving average system for smoothing data. A moving average smooths out otherwise jittery data by replacing each data point with an average of elements from before and after a given moment. We saw this as potentially beneficial since the image processing stage can sometimes generate jittery results for an otherwise smooth trajectory. We found, however, that with stricter filtering of the video data and error handling, the moving average was not needed for the input data. Additionally, since the moving average changes the location of data points, it would impact the positional accuracy of our prediction, so we did not apply it to our output data either. Although we did not end up using this filter in the project, we found it a DSP tool worth considering for future applications.
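For reference, the moving average we considered (but did not use) looks like this; the window size is arbitrary:

```python
import numpy as np

def moving_average(x, window=5):
    """Centered moving average: each point becomes the mean of `window`
    neighbors around it. Note that edge samples average over a partial
    window, which is one source of the positional distortion mentioned above."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")
```

On a straight-line (constant-velocity) segment the interior points are unchanged, but around bounces, where the trajectory bends sharply, the filter pulls points away from their true locations.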

Trajectory Prediction
We developed three different trajectory prediction algorithms for Eagle-Vision: cosine similarity, a physics-based model, and Kalman filtering. The cosine similarity approach requires a training data set, while the physics-based model and Kalman filter do not.
Cosine Similarity (In-Class Tool #4)
The cosine similarity prediction algorithm, based on a homework problem from class, measures the degree of similarity between two vectors in an inner product space. Cosine similarity is often used for document and text classification, but we are using it as a method of prediction.
For the training set, the x and y components of the ball positions are stored in separate matrices, with each row representing a trial. For the test trial, the x and y components of the ball position are stored in separate vectors. The similarity is then calculated between the test x and y vectors and each row of the training x and y matrices, respectively. The closest match for each component is extracted to form the prediction, so the predicted x values and y values can come from completely separate trials, generating completely new trajectories.
We predicted x values and y values independently since the x and y movements of a trajectory are assumed to be independent in kinematics.
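The per-component matching described above can be sketched as follows. This is a minimal NumPy illustration, not our exact implementation; trimming each training row to the test length before comparing is an assumption:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) between two vectors in an inner product space."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predict(train_x, train_y, test_x, test_y):
    """Pick the most similar training row for x and for y independently.

    train_x / train_y: 2D arrays, one trial per row.
    test_x / test_y: the first few frames of the trial to predict.
    Returns the best-matching x row and y row, which may come from
    different trials, yielding a completely new trajectory.
    """
    n = len(test_x)
    sims_x = [cosine_similarity(row[:n], test_x) for row in train_x]
    sims_y = [cosine_similarity(row[:n], test_y) for row in train_y]
    return train_x[int(np.argmax(sims_x))], train_y[int(np.argmax(sims_y))]
```

Matching x and y separately mirrors the kinematic independence assumption: the best horizontal match and the best vertical match need not come from the same training throw.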
The results of one of our predictions using cosine similarity is shown below.

Physics Based Model (Out-of-Class Tool #1)
The physics-based algorithm uses introductory-level physics along with some statistical analysis to predict the movement of the ball. This algorithm does not use the trials dataset; it needs only the input starting data (as long as the data is filmed in the same environment). The model is based on the principle that for each new frame, the new position of the ball is the current position plus the velocity in the x and y directions.
We assume there is no acceleration in the x direction and that the y acceleration is constant due to gravity, except when there is contact with the wall or floor. All of the hard-coded constants, such as the acceleration, the positions of the wall and floor, and the post-contact velocity scaling factor, were determined using statistical analysis and are specific to our trials. They will need to be adjusted for different datasets.
The physics algorithm starts by taking in a few data points (our experiment used 10) of an already-in-motion tennis ball throw. From this initial segment, the average x velocity, final y velocity, and final location of the ball are determined. Then, the algorithm adds the x and y velocities to the location, adjusts the y velocity according to gravity, checks for contact with the wall or floor, and repeats.
If contact with the wall or floor is detected, the x or y velocity is flipped, depending on whether contact was made with the wall or floor respectively, and the velocity is scaled down. The algorithm repeats for as many frames as the user requests.
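The frame-by-frame update can be sketched like this. All constants here are illustrative assumptions, not our fitted values, and the sketch assumes image coordinates with y increasing downward (so gravity is positive):

```python
import numpy as np

# Assumed constants -- our real values were fit to our specific video.
GRAVITY = 0.5        # pixels/frame^2, downward (+y in image coordinates)
WALL_X = 825         # wall position in pixels
FLOOR_Y = 700        # floor position in pixels
RESTITUTION = 0.75   # velocity scaling factor after a bounce

def predict_physics(points, n_frames):
    """Extrapolate a trajectory from observed (x, y) pixel points.

    Average x velocity and final y velocity are estimated from the input
    points, then the ball is stepped forward frame by frame, flipping and
    damping the velocity on wall or floor contact.
    """
    pts = np.asarray(points, float)
    vx = np.mean(np.diff(pts[:, 0]))   # average x velocity over the input
    vy = pts[-1, 1] - pts[-2, 1]       # final y velocity
    x, y = pts[-1]
    path = []
    for _ in range(n_frames):
        x, y = x + vx, y + vy
        vy += GRAVITY                  # constant downward acceleration
        if x >= WALL_X:                # wall contact: flip and damp x
            vx = -RESTITUTION * vx
            x = WALL_X
        if y >= FLOOR_Y:               # floor contact: flip and damp y
            vy = -RESTITUTION * vy
            y = FLOOR_Y
        path.append((x, y))
    return path
```

For a ball moving horizontally at 5 pixels/frame, the predicted x advances 5 pixels each frame while y curves downward under the assumed gravity term.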
The results of one of our predictions using the physics based algorithm is shown below.

Kalman Filtering (Out-of-Class Tool #2)
Kalman filtering is an algorithm that takes in a series of measurements along with noise/error to make a prediction about the next state of the system. This is ideal for systems that are continuously changing.
First, an initial measurement with a margin of error is taken. The Kalman filter will also output the exact same value, as it cannot compare to any previous data. Then, a second measurement is taken that is slightly different from the initial measurement. The Kalman filter can then output a weighted value that considers both the initial and second measurements. As the measurements increase, the Kalman filter’s predicted output will be closer and closer to the actual value. If we stop inputting measurements after the filter is sufficiently trained, the Kalman filter can predict what the next value will be based on the previous data.
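The predict/update cycle described above can be sketched as a minimal 1-D constant-velocity Kalman filter. This is a simplified illustration, not the MathWorks-based tracker we actually used; the noise covariances `q` and `r` are assumed values:

```python
import numpy as np

def kalman_track(measurements, n_predict, q=1e-4, r=0.25):
    """1-D constant-velocity Kalman filter sketch.

    Runs predict/update over `measurements`, then keeps predicting for
    `n_predict` more steps with no measurements, as described above.
    Returns the filtered/predicted position at every step.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])                # we only measure position
    Q = q * np.eye(2)                         # process noise covariance
    R = np.array([[r]])                       # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])  # initial state estimate
    P = np.eye(2)                             # initial state covariance
    out = []
    for k in range(len(measurements) + n_predict):
        x, P = F @ x, F @ P @ F.T + Q         # predict step
        if k < len(measurements):             # update only while measuring
            innovation = measurements[k] - (H @ x)[0, 0]
            S = H @ P @ H.T + R               # innovation covariance
            K = P @ H.T / S[0, 0]             # Kalman gain
            x = x + K * innovation            # blend prediction and measurement
            P = (np.eye(2) - K @ H) @ P
        out.append(x[0, 0])
    return out
```

After enough updates the filter's velocity estimate converges, so once the measurements stop, the predict step alone continues the trajectory, which is exactly how the ball's path is extrapolated without a training set.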
The algorithm that we used was based on one that tracks a ball rolling across a table. It records the location of the ball for the first few frames using a blob analyzer to identify the ball. The Kalman filter then generates a prediction of the actual location of the ball for those first few frames based on the blob analyzer's recording. After the first few frames, the blob analyzer stops recording the location of the ball, but the Kalman filter continues to generate predictions of the ball's location. Since the Kalman filter generates its prediction from the initial input data, we can predict the ball's trajectory without a training set.

We based our Kalman filtering algorithm on MathWorks resources.
The results of one of our predictions using the Kalman filter algorithm is shown below.

Final Results & Conclusions
Below is a graph of our final results: a plot of our actual trajectory compared with predictions made by our cosine similarity, physics-based, and Kalman filtering algorithms.

Analysis of Results
Cosine Similarity
- The predicted trajectory is heavily dependent on the initial conditions of the trial, because the initial conditions determine the shape of the entire trajectory (and consequently all of the position data)
- Prediction accuracy improves with the size of the training set
- The prediction cannot be adjusted iteratively as the trial goes on
Physics Based Model
- The prediction modeled the true data set fairly well, especially the beginning of the prediction and the overall expected shape of the path
- The physics-based model has the advantage of not needing training data
Kalman Filter
- The Kalman filter can predict the trajectory of the ball without needing a large training data set
- However, the Kalman filter prediction is subpar compared to the cosine similarity and physics-based model predictions
- One hypothesis for why it is subpar is that we kept the same state model matrix as the original tracker
- Since the path of our object follows a projectile trajectory instead of a straight path, the values in the state model matrix might need to be changed to fit our situation
What We Learned About the Different Methods
Cosine Similarity
- Good for predicting what the trajectory will generally look like, since it is based on matching test data to a large set of training data
- Not good for obtaining a refined trajectory
Physics Based Model
- Required a lot of hard-coded data gathered from experimental trials, so this algorithm is less flexible and adaptable to different situations
- The introductory physics model leaves room for more advanced modeling, especially when handling collisions and changes in x velocity. Most prediction errors occurred after bounces, and a more robust physics model should address that
Kalman Filter
- Works based on two assumptions:
1) The noise is Gaussian
2) The system is linear
- This is because a Gaussian input to a linear system produces a Gaussian output; a non-Gaussian distribution is not fully characterized by its mean and variance
- If we had a nonlinear function, the input Gaussian would result in an output that is non-Gaussian
- Makes heavy use of matrices to predict and update
- Blends the probability distributions of the current input measurements and what is predicted based on the previous state
- The state model matrix has a large influence on how the Kalman filter prediction turns out; we experimented with adjusting its values, and the predicted trajectory varied greatly
- There was no documentation on what each value in the state model matrix represented, so we were unable to make adjustments that resulted in any improvements

Further Applications
1) Develop an algorithm that can detect any colored ball
Our current system is designed to detect only a green ball, due to the color filtering method we used to track it. We could adapt this algorithm to detect other specific colors by adjusting the filter thresholds to different RGB values. Beyond that, it would be interesting to detect a ball of any color, as long as it is distinct from the background color.
2) Develop an algorithm that can identify and track multiple balls at once
Our current system can only track one ball at a time. A next step would be to detect more than one ball in the same video. One way to do this would be to set our algorithm to detect two different-colored balls with hard-coded RGB values.