Let's Troubleshoot Together
The objective of this section is to work as a team on a pre-built scenario, put into practice some of the troubleshooting techniques and tools you have learned, and benefit from the real power of the Aviatrix platform.
The scenario can be found below:
A few notes on the scenario are here:
- This is an ingress scenario, i.e. we have a WordPress application that we want to publish to the world.
- The application consists of three components: NGINX Proxy, Web Server and Database.
- The ALB, the NGINX proxy and the Web Server reside in the us-east-1 region (N. Virginia).
- The Database resides in the us-east-2 region (Ohio).
- The setup exists in AWS but the same logic and methodology can be followed in other CSPs.
- A Public Subnet Filtering Gateway (PSF) is used at the edge of the Ingress VPC to protect against traffic from malicious IP addresses.
- The external user communicates with the public IP address of the ALB.
- The ALB terminates the incoming connection and opens a new one towards its target. Recall that the ALB performs SNAT and that its target is the NLB, so the traffic flow is now from the ALB's private IP to the NLB's private IP.
- FireNet is also part of the scenario, so that traffic between any of the VPCs can be subject to inspection.
The password and the POD number will be provided when the breakout rooms are created.
- username: student / password will be provided
- FW_username: admin / FW_Password will be provided
- Controller URL: https://ctrl.pod#.aviatrixlab.com/#/dashboard <-- replace the first # only with the pod number
- CoPilot URL: https://cplt.pod#.aviatrixlab.com
- AWS console URL: https://pod#-aviatrix.signin.aws.amazon.com/console
- Time to complete is 90 minutes.
- One volunteer from each team should share their screen and navigate through the Controller, CoPilot and AWS dashboards.
- Feel free to name your teams.
- The WordPress application, which is critical to your business, is broken! You are going to work together to identify all the issues that are causing it to break.
- Once the 90 minutes are up, the instructor will close the breakout rooms and each team will discuss their findings and learnings for about 10 minutes.
- Once this is done, the instructor will highlight some of the key values that the Aviatrix Platform brings into troubleshooting and operating the setup.
- If you manage to get the application running, clone the post found here and add the following details: team name, team members, team POD number and the proof (the DNS name of the ALB).
Challenge#1 Customer Connection to the ALB
Go to the EC2 section of the AWS console and locate the Load Balancers section in the left navigation panel. Click the load balancer with the following symbolic name:
Hint: the ALB is located in us-east-1 region!
Then locate the DNS name of this load balancer in the Details section.
Next, add new inbound rules to the Security Group assigned to the centralized-ingress-lb. Allow HTTP traffic (port 80) from the public IP addresses of all the team members' machines.
Keep the existing inbound rules as they will be used by the instructors to figure out if the Application is healthy.
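If you prefer to prepare the rule before entering it in the console, the following is a minimal sketch of what the inbound rule looks like in the EC2 API's `IpPermissions` shape. The helper name `http_ingress_permission`, the member names and the IP addresses (taken from documentation ranges) are all hypothetical; substitute your team's real public IPs. The sketch only builds the data structure and does not call AWS.

```python
import ipaddress


def http_ingress_permission(member_ips, port=80):
    """Build one EC2-style IpPermissions entry allowing TCP `port`
    from each team member's public IPv4 address (as a /32)."""
    ranges = []
    for name, ip in sorted(member_ips.items()):
        # ip_address() raises ValueError on malformed input.
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            raise ValueError(f"{name}: expected an IPv4 address, got {ip}")
        ranges.append({"CidrIp": f"{addr}/32", "Description": f"HTTP from {name}"})
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": ranges}


# Hypothetical team members and addresses (documentation ranges, not lab values).
team = {"alice": "203.0.113.10", "bob": "198.51.100.7"}
permission = http_ingress_permission(team)
print(permission)
```

Assuming you use boto3, a dictionary like this could be passed as `IpPermissions=[permission]` to `authorize_security_group_ingress` together with the Security Group's ID. Adding rules this way leaves the existing inbound rules in place.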
- Perform an HTTP request to the ALB's DNS name from your browser. The application is down.
- Try locating this flow in CoPilot.
We need your support to answer three questions:
1. Which public IP addresses does the ALB's DNS name resolve to?
2. Why does CoPilot indicate that the flow is being sent to private IP addresses? What do these IP addresses correspond to?
3. Which gateway(s) in the topology does this flow pass through? Why is this flow only seen by this gateway?
Hint: ALB always performs SNAT
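For the first question, one way to list the addresses behind a DNS name is a short lookup script. This is a minimal sketch using only the Python standard library; the function name `resolve_ipv4` is hypothetical, and the resolver is passed in as a parameter only so the function can be exercised without network access.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = resolver(hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})


# Usage (replace the placeholder with the DNS name from the ALB's Details section):
#   print(resolve_ipv4("<your-alb-dns-name>"))
```

The addresses returned can then be compared with the destination addresses that CoPilot shows for the flow.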
Challenge#2 Ingress to Proxy
We continue our investigation to bring up our application. The new flow we are looking at is between the ALB and NLB.
1. Check which gateways see this flow (Use the FlowIQ search and look at the Flow Exporter/Aviatrix Gateways)
2. Check the routing tables of the ingress spoke & web spoke
Hint: Check the routes the transit receives from the spokes, and look at the Customize Spoke Advertised VPC CIDR setting.
Bonus: Try to trace the direction in which the packet is heading.
By the end of this challenge you need to ensure that the flow goes beyond the ingress spoke to the transit and to the Proxy Spoke.
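When reading the routing tables, it can help to remember that the most specific matching route wins. The sketch below illustrates longest-prefix matching over a small table; the function name, the CIDRs and the next-hop labels are hypothetical illustrations, not the values used in the lab.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return (cidr, next_hop) for the most specific route covering
    `destination`, or None when no route matches."""
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, next_hop in routes.items():
        net = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix length.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else (str(best[0]), best[1])


# Hypothetical spoke routing table: a broad RFC 1918 route towards the
# gateway and a more specific local VPC route.
example_routes = {"10.0.0.0/8": "spoke-gateway", "10.1.0.0/16": "local"}
print(longest_prefix_match(example_routes, "10.1.2.3"))   # matches the /16
print(longest_prefix_match(example_routes, "10.2.0.5"))   # falls back to the /8
print(longest_prefix_match(example_routes, "192.0.2.1"))  # no matching route
```

If a destination returns no match, or matches a route pointing somewhere unexpected, that is a candidate explanation for a flow that never reaches the next gateway.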
Challenge#3 Proxy VM to Web VM
You have received some indications that the problem is between the NGINX Proxy VM and the Web VM. Use CoPilot to figure out the IP addresses of both virtual machines.
1. Can you identify where the transit is forwarding this traffic? (Egress interface on the transit GW for this flow)
2. Feel free to get a packet capture of the packets on the specific interface identified
3. Is the Firewall returning the traffic back?
By the end of this challenge you need to ensure that traffic is flowing from the Transit Gateway to the Firewall and back.
Note: Eth2 is the Interface that connects the Transit to the PAN VM-series FW.
Challenge#4 Slow Performing Application
You received a call just now stating that the application is not performing well. The suspicion is that the NGFW with its advanced inspection capabilities is introducing a delay between the NGINX Proxy VM and the Web VM.
1. Exempt this specific traffic (NGINX Proxy to Web VM) from firewall inspection so that you can check the performance without the Firewall.
2. You are not allowed to remove the Web Spoke from the FireNet inspected list, as many other applications might be impacted.
3. You need to prove through CoPilot that the specific traffic (NGINX Proxy to Web VM) is not being forwarded to the FW.
Hint: Network List Excluded from Inspection.
Challenge#5 Web to DB
The DB administrator mentioned that the DB is receiving requests from an IP range in the 100.64.0.0/10 space.
1. Where is this range coming from?
2. Why isn't the response being received on the gateway?
3. Without removing the configuration, you need to get the application into a functioning state.
4. The workload's routing table in the CSP must still be able to reach 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 for egress traffic from the DB, in addition to the new range 100.64.0.0/10.
Hint: Check NAT configuration on the Database GW & Customize Spoke VPC Routing Table
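To reason about the address ranges in this challenge, the sketch below classifies a source IP as RFC 1918 private space or as part of 100.64.0.0/10 (the shared address space defined in RFC 6598), and reports which required ranges a routing table does not yet cover. The function names and the sample routing table are hypothetical; the four required ranges are the ones listed in the challenge.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
SHARED = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space
REQUIRED = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10"]


def classify(ip):
    """Label an IPv4 address as 'shared-100.64/10', 'rfc1918' or 'other'."""
    addr = ipaddress.ip_address(ip)
    if addr in SHARED:
        return "shared-100.64/10"
    if any(addr in net for net in RFC1918):
        return "rfc1918"
    return "other"


def uncovered(required, route_table):
    """Return the required CIDRs that no route in `route_table` covers."""
    routes = [ipaddress.ip_network(r) for r in route_table]
    return [
        cidr for cidr in required
        if not any(ipaddress.ip_network(cidr).subnet_of(route) for route in routes)
    ]


# Hypothetical workload routing table containing only the RFC 1918 routes.
print(classify("100.64.1.5"))
print(uncovered(REQUIRED, ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]))
```

A range reported as uncovered means the workload has no return path for it, which is one thing to check when the response never reaches the gateway.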