Current State:
The DevOps team started the cloud project, built an AWS Transit Gateway (TGW), and attached VPCs as needed. The setup works fine for them as long as nothing goes wrong.
Now the production environment is moving to AWS, and we are running into basic network infrastructure design challenges. The DevOps team did great work in the application space, but network infrastructure is not their forte, so several basic infrastructure needs were not addressed in the design. The Architecture Board and Compliance teams are withholding sign-off on the project until the following issues are resolved.
Key Issues:
Networking:
a. There are currently 15 VPCs, but this will expand to hundreds, including VPCs owned by other lines of business (LoBs) that cannot be re-IP'd
b. From on-prem to cloud, 100 routes are sufficient for now, but from TGW to on-prem we cannot summarize everything into 20 CIDRs
c. We need the ability to send hundreds of routes from AWS to on-prem. Some VPCs will also require secondary CIDRs.
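The summarization problem in points b and c can be illustrated with a minimal sketch. It uses Python's standard `ipaddress` module to collapse a list of VPC CIDRs into the fewest prefixes that cover exactly the same address space, then checks the result against the budget of 20 CIDRs from point b. The CIDR values, the helper names `summarize` and `fits_budget`, and the addressing patterns are all hypothetical and chosen only for illustration; they are not taken from the actual environment.

```python
import ipaddress

# Route budget toward on-prem, taken from point b above.
ROUTE_BUDGET = 20


def summarize(cidrs):
    """Collapse CIDRs into the smallest set of prefixes covering exactly
    the same addresses (no unowned address space is advertised)."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return list(ipaddress.collapse_addresses(networks))


def fits_budget(cidrs, budget=ROUTE_BUDGET):
    """Return True if the summarized routes fit within the route budget."""
    return len(summarize(cidrs)) <= budget


# Hypothetical case 1: 16 VPCs carved from one contiguous, aligned block.
contiguous = [f"10.0.{i}.0/24" for i in range(16)]

# Hypothetical case 2: 100 VPCs from different LoBs, each in a different
# /16, which cannot be re-IP'd and therefore cannot be merged.
scattered = [f"10.{i}.{(i * 7) % 256}.0/24" for i in range(1, 101)]

for name, cidrs in (("contiguous", contiguous), ("scattered", scattered)):
    routes = summarize(cidrs)
    print(f"{name}: {len(cidrs)} VPC CIDRs -> {len(routes)} routes, "
          f"fits budget of {ROUTE_BUDGET}: {fits_budget(cidrs)}")
```

In this sketch the contiguous set collapses to a single /20, while the scattered set stays at 100 prefixes and exceeds the budget. Any wider summary would advertise address space the VPCs do not own, which is why addressing that cannot be changed drives the route count up.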
Security:
a. All traffic from on-prem to cloud must be encrypted, in addition to any application-level encryption
b. Not all applications encrypt their traffic, and adding encryption to every application would be very costly and would delay the project
c. Security needs a next-generation firewall (NGFW) in the cloud for intra-cloud (east-west) traffic
d. Remove the firewalls from the on-prem termination point: they have limited throughput and are more costly in licensing, racking and stacking, and the other operational costs of hosting on-prem. The customer wants to move to a consumption-based model and save on cost.
Operations:
a. The Ops team is not well versed in cloud and needs a quick and easy way to take over build and support
b. We need the ability to look at flows, take packet captures, and do all normal network troubleshooting
c. VPC Flow Logs become expensive very quickly
Solution:
The Aviatrix Transit Architecture addresses all of the challenges mentioned above. To keep this discussion focused, I will mention only the details relevant to the problem statement.
Networking:
a. No limits on the number of learned or advertised routes
b. Traffic engineering capabilities with BGP and manual overrides
Security:
a. Encryption from on-prem at line rate (10 Gbps in this case)
b. Service insertion in the cloud at full scale without IPsec, BGP, SNAT, etc.
c. Support for up to 20 active/active (A/A) firewalls, which enables a consumption-based model and saves on cost
d. Security domains allow easy segmentation and air-gapping between VPCs
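The segmentation idea in point d can be sketched as a small policy model. This is a toy illustration only, not the Aviatrix API: the class `SegmentationPolicy`, its methods, and the VPC and domain names are all hypothetical. It assumes a model in which VPCs in the same security domain can reach each other and VPCs in different domains are air-gapped unless a connection between the two domains is explicitly configured.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentationPolicy:
    """Toy model of security domains (hypothetical, not a vendor API)."""
    vpc_domain: dict = field(default_factory=dict)   # VPC name -> domain name
    connections: set = field(default_factory=set)    # frozenset({dom_a, dom_b})

    def attach(self, vpc, domain):
        """Place a VPC into a security domain."""
        self.vpc_domain[vpc] = domain

    def connect(self, domain_a, domain_b):
        """Allow traffic between two domains (symmetric)."""
        self.connections.add(frozenset((domain_a, domain_b)))

    def allowed(self, src_vpc, dst_vpc):
        """Same domain is allowed; different domains need a connection."""
        src = self.vpc_domain[src_vpc]
        dst = self.vpc_domain[dst_vpc]
        return src == dst or frozenset((src, dst)) in self.connections


# Hypothetical layout: prod and dev are air-gapped; only prod may reach
# shared services.
policy = SegmentationPolicy()
policy.attach("prod-app", "prod")
policy.attach("prod-db", "prod")
policy.attach("dev-app", "dev")
policy.attach("shared-svc", "shared")
policy.connect("prod", "shared")

for src, dst in (("prod-app", "prod-db"), ("prod-app", "shared-svc"),
                 ("dev-app", "prod-db"), ("dev-app", "shared-svc")):
    print(f"{src} -> {dst}: {'allowed' if policy.allowed(src, dst) else 'blocked'}")
```

The point of the default-deny shape is that adding a new VPC only requires assigning it to a domain; no per-VPC rules have to be written to keep it isolated from the others.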
Operations:
a. Simple configuration through an easy-to-use UI, plus automation via Terraform or REST API
b. Visibility into all flows, with analytics such as FlowIQ, geolocation, session analytics, etc.
c. Ability to quickly take packet captures and run tools such as ping, telnet, traceroute, tracepath, etc.