Field Office | Video 06 | Templates

Marty the OT Guy walks you through how to diagnose the following network problems, and how to do it in the lab: 00:00 Introduction 03:17 1. Network retransmission 11:07 2. Throughput rates 16:29 3. Down network links over time Thanks, Marty!

Transcript

0:02 [Music]
0:12 Marty the OT Guy here
0:15 um back today to talk about some things
0:16 you can do with Nozomi Networks
0:19 products that can help operational
0:21 technology teams diagnose network
0:23 problems
0:25 we'll start off start off with a funny
0:27 story
0:29 um we were dealing with a remote site
0:32 here in New Zealand right down in the
0:35 South Island wonderful South Island for
0:36 anyone who's been there Lord of the
0:39 Rings and all those things and we had a
0:42 wind farm uh the primary comms
0:46 connection was via satellite
0:49 and over time
0:51 um we found out the hard way of course
0:53 we didn't we didn't find out while this
0:55 was happening but over time the
0:58 satellite Communications were dropping
1:00 off and it was progressively getting
1:02 worse and worse and we couldn't really
1:04 understand what was going on
1:06 um and it's a long way to get to this
1:08 place it's very remote
1:09 um we're flying from one end of the
1:11 country to the other we then have a oh I
1:15 think it was nearly two hour drive
1:17 um 4x4 off-roading and to get to the
1:20 site and
1:22 yeah we couldn't work out what was going
1:24 on with um with these comms issues and
1:26 what we really needed was a way to
1:28 diagnose the problem
1:31 um and of course as you do you work out
1:33 how to do it when you've got the tools
1:35 and after the fact but so what would
1:37 happen was the um satellite
1:39 Communications would misbehave and it
1:41 would it would go in and out or it'd be
1:43 shaky for short periods of time and
1:45 things like that and progressively over
1:47 time it got worse and worse
1:48 so
1:50 when we did the site analysis and we
1:52 finally got to have a look what we found
1:55 out was some fencing was broken and some
1:58 local cows had managed to work out that
2:02 a really good way to get a back massage
2:04 was rubbing up against the um the
2:08 transmission arm and LNB on a 1.2 meter
2:12 satellite dish so whenever the cow
2:14 needed a back scratch he would wander up
2:17 sidle up against the satellite dish and
2:20 get jiggy and have a good rock around
2:22 um and then as he went back to the back
2:25 to the herd and told some of the other
2:26 guys
2:27 um progressively they they came along
2:29 and they they're all rubbing up against
2:31 the satellite dish and
2:33 in the in the first instances we we
2:35 could only assume that while the cow was
2:37 rubbing
2:38 um of course he's rocking the whole
2:40 satellite dish which is knocking the
2:41 alignment out and then progressively as
2:43 they used the actual dish itself it
2:46 slowly rotated the disc and knocked it
2:48 out
2:49 um the dish rather and knocked it out of
2:50 alignment so yeah we worked this out one
2:53 day when we were actually there on site
2:55 and the cow walked up and did it right
2:57 in front of us that was our that was our
2:59 diagnostic tool but over time it would
3:02 have been so much easier if we could
3:04 have worked it out on the fly so yeah
3:07 today's session let's share three ideas
3:09 three ideas about
3:12 um how you can diagnose network problems
3:14 using the Nozomi Networks tools
3:16 [Music]
3:22 okay so the first thing I want to talk
3:24 about is monitoring uh network
3:27 retransmission so when you've got a
3:31 Nozomi Networks appliance a Guardian
3:32 appliance in your network and you are
3:35 monitoring links and sessions and
3:37 throughput and all the good things that
3:39 comes with that you have the ability to
3:41 look for re-transmissions how do we pick
3:43 up re-transmissions because it's on
3:45 the network you're going to see it if
3:47 you if you took a packet capture you're
3:49 going to see re-transmission data inside
3:52 that packet capture
3:54 so we can set up alerts
3:57 um and things called assertions an
3:58 assertion is sort of like a question
4:00 where you say
4:02 um if A occurs in B time
4:07 um then create an alert or you can just
4:10 have an assertion that just if A occurs
4:12 in B time
4:14 just assert let us know it doesn't have
4:16 to alert
4:17 um so that's really how you do it you
4:19 would set up a monitoring query that
4:23 would monitor the network link that
4:25 you're looking for and when the
4:26 re-transmission rate exceeds a set point
4:29 exceeds a given set point
4:31 um yeah you can create an alert from
4:33 that so that's kind of cool and very
4:35 useful for monitoring
4:37 um important
4:38 um you know critical links primary links
4:40 or something like that and it reminds me
4:42 of another another story and we did this
4:44 with a customer recently where during um
4:47 during a trial session during a proof of
4:49 concept the um the customer commented
4:52 that the communications to this one
4:54 particular remote site
4:57 um were always a problem and we looked
5:00 at it and there we were able to help
5:02 them diagnose really quickly just by
5:04 picking up that re-transmission rate
5:07 um and showing how bad it actually was
5:10 we're able to diagnose really quickly
5:12 that the microwave link they had in
5:14 place so microwave dish one and microwave
5:16 dish two the microwave link they
5:18 were using and the dishes were slightly
5:20 out of alignment so they they sent a
5:23 technician out to site who
5:26 rotated the dish slightly got it back in
5:28 alignment and all of a sudden all the
5:30 re-transmission went away
5:31 um
5:32 so yeah the the proof in the pudding
5:35 there was it was really easy for them to
5:37 be able to to
5:40 to not go
5:42 um you know we've got this problem I
5:44 don't know what to do with it it was
5:45 we've got this problem we've proven it
5:47 we can fix it it's a really fast
5:49 turnaround on that one so that was cool
5:51 that was a good one
5:59 so let's look at the equipment we're
6:01 using to demonstrate to do this
6:03 experiment in the lab
6:05 so I've got some cheap CCTV cameras uh
6:08 connected to my lab Network they go
6:10 through a variety of infrastructure and
6:12 end up at a network video recorder and
6:15 along the way my Nozomi Networks
6:17 Guardian appliance is monitoring the
6:19 traffic
6:21 taking a look at the query you can see
6:24 here screenshot from the Guardian
6:26 appliance I've entered the query you can
6:28 see the results there's some fairly high
6:30 re-transmission rates in there this is
6:32 deliberate it's the way my lab is set up
6:34 purposely for this experiment but let's
6:37 dive deeper into how this query works
6:40 and we've presented it here in what I
6:43 like to call Vantage format and the
6:45 difference between our Vantage SaaS
6:48 platform and the Nozomi Networks
6:51 Guardian appliance
6:53 um it's the way that the the queries are
6:56 presented visually so in the Guardian
6:59 appliance you see it all as one big long
7:01 string whereas Vantage format which is
7:04 that's my term breaks it down into
7:08 separate lines which makes it a little
7:09 easier to understand so let's take a
7:11 look at what's happening on the first
7:12 line here we're referencing the links
7:14 table
7:15 the vertical line at the end the pipe
7:18 then sends that on to the next command
7:19 so second line we're saying where
7:22 TCP retransmission dot percent is greater
7:25 than 10.
7:26 that's all it's self-explanatory it's
7:29 really really simple so we're pulling a
7:31 data set from the links table of all
7:35 records where re-transmission percentage
7:38 is greater than 10. we pipe that through
7:40 to the third line and we do a select
7:42 command because I want to display this
7:43 data in a meaningful format and I've
7:45 deliberately used some features here
7:48 just to make it a bit easier to read you
7:51 don't have to do it this way so starting
7:54 from the left we select from then you
7:57 see the right arrow which is a hyphen
7:59 sign and a greater than and we're taking
8:01 lowercase from to From with the capital F
8:04 the right hand side of that expression
8:07 becomes the new name for the column on
8:10 the report
8:11 so reading through it we select from we
8:13 select to we select TCP retransmission
8:17 percent and we've renamed those to From
8:19 and To with a capital F and T and
8:23 Retrans percent as a more readable
8:25 result and you can see in the screenshot
8:27 below that's our report that's come out
8:30 nice and simple
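The filter-then-rename logic just described can be sketched in Python. This is only an illustration of the idea, not Nozomi Networks query syntax: the sample rows, the field name `tcp_retransmission_percent`, and the function `high_retransmission` are all hypothetical.

```python
# Hypothetical rows standing in for the links table; field names are illustrative only.
LINKS = [
    {"from": "cam-1", "to": "nvr", "tcp_retransmission_percent": 14.2},
    {"from": "cam-2", "to": "nvr", "tcp_retransmission_percent": 3.1},
    {"from": "cam-3", "to": "nvr", "tcp_retransmission_percent": 22.8},
]


def high_retransmission(links, threshold=10.0):
    """Keep links above the threshold, then rename the columns for the report."""
    return [
        {
            "From": link["from"],
            "To": link["to"],
            "Retrans %": link["tcp_retransmission_percent"],
        }
        for link in links
        if link["tcp_retransmission_percent"] > threshold
    ]


if __name__ == "__main__":
    for row in high_retransmission(LINKS):
        print(row)
```

The threshold of 10 mirrors the value used in the lab; in practice it would be tuned to whatever retransmission rate is normal for the link being watched.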
8:32 so let's look at this in a slightly
8:35 different way what if we don't want a
8:37 table as an output let's look at a
8:39 network graph query why can't we create
8:41 a network graph so this is a similar
8:43 this is using similar inputs to get
8:48 um a graph output it could be easier to
8:50 read it might make life simpler for the
8:52 um for the users or for the engineers
8:54 that need to need to work on this
8:58 so let's break down the query again here
9:01 in Vantage format this time it's a bit
9:04 more complicated because to use the
9:06 network graph feature we have to be
9:09 referencing the nodes table so we have
9:11 to do a join the first line we select
9:14 the nodes table we pipe that to a join
9:16 command which is the second line the
9:18 second line is joining the nodes table
9:21 and the links table using the IP
9:26 column from the nodes table and the two
9:29 column from the links table as the
9:32 common call it a key if you like like a
9:35 database key so the first two lines were
9:38 saying take the nodes table take the
9:40 links table and I want a data set that
9:44 my my resulting data set needs to be all
9:47 of the entries where IP and to have the
9:51 same IP address in them we then pipe
9:54 that into a where command
9:55 so we're saying where joined link IP to
9:59 that is a new column that's created
10:02 um as part of the the resulting data set
10:05 joined link IP to TCP retransmission
10:08 percent greater than five this time I
10:11 went for five percent obviously you
10:13 change that to suit the threshold that's
10:17 um that's relevant in your environment
10:19 finally we pipe that through to the
10:21 graph command
10:23 um in order to use this graph command we
10:25 have we ideally we set three um
10:29 we set three parameters so the first one
10:31 we're setting is node label which is IP
10:34 address the second one we're setting is
10:37 the node perspective and we're saying
10:39 use the roles so I want to see each
10:42 individual device I want to know is it a
10:45 producer is it a consumer a server the
10:48 the role it plays within the network and
10:50 the third parameter we're setting is
10:52 link perspective TCP retransmitted bytes
10:56 so that means that the arrows showing
11:00 the links will change color depending on
11:02 the level of retransmission
11:05 present on each leg
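The join, filter and graph steps above can be sketched in Python. This is an assumption-laden illustration rather than the appliance's own query language: the sample nodes and links, the field names, and the functions `join_nodes_links` and `graph_edges` are hypothetical.

```python
# Hypothetical sample data; addresses, roles and field names are illustrative only.
NODES = [
    {"ip": "10.0.0.10", "roles": "producer"},
    {"ip": "10.0.0.20", "roles": "consumer"},
]
LINKS = [
    {"from": "10.0.0.10", "to": "10.0.0.20",
     "tcp_retransmission_percent": 7.5, "tcp_retransmitted_bytes": 48_000},
    {"from": "10.0.0.20", "to": "10.0.0.10",
     "tcp_retransmission_percent": 1.0, "tcp_retransmitted_bytes": 900},
    # No node has this destination address, so the join drops this link.
    {"from": "10.0.0.10", "to": "10.0.0.99",
     "tcp_retransmission_percent": 12.0, "tcp_retransmitted_bytes": 70_000},
]


def join_nodes_links(nodes, links):
    """Inner join: pair each link with the node whose ip equals the link's 'to'."""
    nodes_by_ip = {node["ip"]: node for node in nodes}
    return [
        {"node": nodes_by_ip[link["to"]], "link": link}
        for link in links
        if link["to"] in nodes_by_ip
    ]


def graph_edges(nodes, links, threshold=5.0):
    """Return one edge per joined link whose retransmission percent exceeds the threshold."""
    edges = []
    for row in join_nodes_links(nodes, links):
        link = row["link"]
        if link["tcp_retransmission_percent"] > threshold:
            edges.append({
                "from": link["from"],
                "to": link["to"],
                "to_role": row["node"]["roles"],            # node perspective
                "weight": link["tcp_retransmitted_bytes"],  # link perspective
            })
    return edges


if __name__ == "__main__":
    for edge in graph_edges(NODES, LINKS):
        print(edge)
```

A graph renderer would then color each edge by its `weight`, which corresponds to the link arrows changing color with the level of retransmission.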
11:12 the second one we're going to move on to
11:15 is about Network transmission rates
11:18 throughput rates so we had a customer
11:21 um recently reasonably recently who
11:24 needed to be able to monitor the average
11:26 throughput over time for given
11:29 transmission links given communication
11:32 links so in in their case they they had
11:35 radio connections between sites and
11:39 they needed to know that the
11:42 communication throughput stayed within
11:45 certain limits over a seven day period
11:48 or something like that they had to
11:49 report on that as part of their
11:52 performance metrics
11:54 so they asked us to design and Implement
11:56 a feature which allowed them to do that
11:58 so you can now we're now able to look at
12:01 Network throughput for any given link
12:04 and alert or assert based on a high or
12:08 low level so for instance if you've got
12:10 a radio link from point A to point B
12:14 and you know that it runs at a constant
12:17 50 megabits per second or fairly
12:20 constant 50 megabits per second you can
12:23 put some alerting figures let's for
12:25 argument's sake let's say at 40 and at
12:28 60. so we have 40 and 60 as our lower
12:31 and upper limits we have 50 as our as
12:34 our normal operation level and if the
12:37 network traffic Peaks outside of 60 or
12:41 drops below 40 for a specified period of
12:45 time we can raise an alert based on that
12:47 why is that important well you could
12:51 have if you're losing network throughput if
12:54 it's dropping off it may be indicative
12:56 of devices that are failing in the field
12:59 you might have something that's given up
13:01 and it's starting to drop away or the
13:03 throughput drops away because the device
13:05 has failed it may not be a critical
13:07 device
13:09 um and in perhaps you don't detect it or
13:14 some other yep some other part of the
13:15 the network that that you can't
13:18 necessarily pick up through a critical
13:20 failure if you have a spike in traffic
13:23 that could indicate that someone's added
13:25 something new to your network
13:27 some behavior is changing the throughput
13:30 maybe it's re-transmissions again maybe
13:32 there's a whole lot of messages going
13:34 backwards and forwards just for that and
13:36 it's taking up more and more Network
13:38 throughput
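The 40 and 60 megabit limits around a normal 50 can be sketched as a simple band check. This is a minimal Python illustration, assuming throughput is sampled at regular intervals and that "a specified period of time" means a run of consecutive out-of-band samples; the function `out_of_band_alert` and its parameters are hypothetical.

```python
def out_of_band_alert(samples_mbps, low=40.0, high=60.0, min_consecutive=3):
    """Return True if throughput stays outside [low, high] for
    at least min_consecutive samples in a row."""
    run = 0
    for value in samples_mbps:
        if value < low or value > high:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # back inside the band, so the run is broken
    return False


if __name__ == "__main__":
    steady = [50, 51, 49, 52, 50]
    fading = [50, 38, 37, 36, 51]
    print(out_of_band_alert(steady))
    print(out_of_band_alert(fading))
```

Requiring several consecutive samples avoids alerting on a single momentary dip or spike, which is the point of tying the limits to a period of time.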
13:39 [Music]
13:44 okay lab session two
13:47 for this experiment we're using a PLC an
13:50 HMI and SCADA workstation and an
13:52 engineering workstation within my lab
13:54 environment my Nozomi Networks Guardian
13:56 is monitoring the traffic
14:01 so let's take a look at this query
14:03 this one's a little different from the
14:04 last one we used
14:06 um because I want the output of the
14:09 query to be an assertion
14:11 uh an assertion think of an assertion as
14:14 a true false yes no on off red green
14:18 binary output so in this case we've got
14:22 it showing up um it's giving us a green
14:25 line a green box there to say that this
14:28 assertion is is okay
14:30 and what we're doing here is we're
14:32 looking at traffic between two looking
14:34 at link traffic between two nodes and
14:37 saying hey if if it's not transmitting
14:39 enough traffic or if it's transmitting
14:41 too much traffic
14:44 um across a 15 minute period Then can
14:46 you can you please let me know about
14:48 that
14:49 let's break the query down
14:52 so start with the links table pipe that
14:56 through to the first where command on
14:57 line two where we're selecting the one
15:00 end of the of the link which is in this
15:03 case it's dot 50.35 which is the SCADA
15:05 workstation
15:07 pipe that to the third line we're
15:09 selecting the to end the other end of
15:11 the link which is dot 50.60 that's my
15:14 small PLC
15:16 pipe that through to the fourth line
15:19 we're looking at transferred last 15
15:21 minute bytes how much traffic went
15:24 through over the last 15 minutes and
15:26 we're saying here if it was less than a
15:28 megabyte then I want to know about it
15:30 and pipe that through the fourth line
15:32 for fourth line being the same condition
15:36 however we're saying if it's greater
15:39 than 100 Meg this time so really what
15:41 I'm doing is I'm saying is my PLC
15:43 transmitting any traffic at all uh and
15:46 because this is just for a lab example
15:48 I've deliberately used really wide
15:51 um parameters there you could narrow
15:54 that down you could say I've got a link
15:56 uh between this
15:58 station and that station and I know that
16:02 the traffic should be 500 kilobits plus
16:05 or minus 50 kilobits and you could
16:08 change the parameters in here and really
16:10 narrow that up
16:12 and what we're saying here that final
16:14 part on line one two three four five
16:15 right at the very end
16:17 um we pipe that through and say assert
16:20 empty so we're saying if the traffic is
16:25 inside the parameters we want I want a
16:27 green box
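The assertion described above amounts to "collect the rows that break the rule, and show green only if there are none". A minimal Python sketch follows, assuming the under-1-megabyte and over-100-megabyte conditions are combined as either/or; the host labels, the field name `last_15m_bytes`, and the functions `violations` and `assert_empty` are hypothetical stand-ins.

```python
MB = 1_000_000

# Hypothetical link rows; labels replace the lab addresses mentioned in the video.
LINKS = [
    {"from": "scada-ws", "to": "plc", "last_15m_bytes": 4_500_000},
    {"from": "eng-ws", "to": "plc", "last_15m_bytes": 200_000},
]


def violations(links, src, dst, low_bytes=1 * MB, high_bytes=100 * MB):
    """Rows for the src->dst link whose 15-minute volume is outside the band."""
    return [
        link
        for link in links
        if link["from"] == src
        and link["to"] == dst
        and (link["last_15m_bytes"] < low_bytes or link["last_15m_bytes"] > high_bytes)
    ]


def assert_empty(rows):
    """Binary result: green when nothing violates the rule, red otherwise."""
    return "green" if not rows else "red"


if __name__ == "__main__":
    print(assert_empty(violations(LINKS, "scada-ws", "plc")))
    print(assert_empty(violations(LINKS, "eng-ws", "plc")))
```

Narrowing the band, as suggested in the video, is just a matter of passing tighter `low_bytes` and `high_bytes` values.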
16:34 have you ever had those times in your
16:36 career when you submit some compliance
16:39 reporting or some trends for analysis
16:43 and someone comes back to you and says
16:46 can you please explain this Gap in the
16:48 trend for let's say two days there's a
16:51 gap in the trend and you went I'm sorry
16:53 what Gap in the trend or you maybe
16:55 didn't know that that data was missing
16:57 you found out the hard way
17:00 how do we solve that what can we do
17:02 about that well we are able the Nozomi
17:06 Networks product is able to monitor when
17:10 a device last communicated so it's sort
17:14 of like throughput
17:15 um but instead of looking at volume
17:16 we're now looking at time so we can set
17:19 up alerts and assertions so that if a
17:22 device doesn't communicate for a given
17:24 period of time we trigger an alert let's
17:27 say you have a 12-hour compliance
17:30 reporting requirement you you need to
17:33 maintain your reporting and trending and
17:36 you're not allowed to have any outages
17:37 greater than 12 hours it'd be a really
17:40 good idea to detect a a loss of
17:44 communication a link drop or yeah or
17:48 last last time that a device was
17:50 communicating
17:52 um maybe at eight hours maybe you pick
17:54 it up at four hours and that gives you
17:57 plenty of time to do diagnostics and
17:59 hopefully fix the problem before
18:01 it becomes a compliance issue
18:10 okay lab session three same as before
18:14 we're using the PLC the SCADA HMI
18:17 workstation and the engineering
18:19 workstation with the Nozomi Networks
18:20 Guardian monitoring the traffic so this
18:23 query again we're using we're using an
18:25 assertion but this one is misbehaving
18:28 this time you can see it's got a red
18:29 line so it's a bit different
18:31 let's break it down again starting with
18:34 the links table
18:35 well with this time I'm excluding some
18:38 traffic I I don't want uh traffic the
18:42 0.0.0.0
18:45 um address in my Nozomi Networks
18:48 Guardian configuration
18:51 um is used as a catch-all for all of my
18:53 public IP addresses or anything that's
18:56 not within my lab so we're saying we
18:59 don't I'm not interested in links that
19:01 are trying to reach the internet or
19:02 trying to get to my home network or in
19:04 any way related to my management or
19:06 anything like that
19:09 um let's look at that third line where
19:11 to exclude
19:13 224.0 okay so I want to exclude 224.0
19:17 addresses which you know they're a
19:18 management address I don't want to know
19:20 about it
19:21 next line down where to exclude FF I
19:24 don't want IPv6 in this response uh in
19:26 the result rather from this query
19:29 and then
19:30 next line down so we're now on one two
19:32 three four five where to is not equal
19:36 to my public catch-all again so I don't
19:38 want traffic going to or from there
19:42 now we get into the interesting stuff so
19:45 the next line down we're saying where
19:47 the last activity happened greater than
19:49 two hours ago so if I know that I have a
19:52 PLC that should communicate roughly
19:54 every hour
19:56 um then I can set that to be greater
19:58 than two hours we haven't had a response
20:01 something might be wrong
20:02 we come down again and we say where the
20:04 transferred average packet has dropped
20:07 below
20:08 um below a thousand so now we're saying
20:11 it's been greater than two hours and our
20:12 average traffic is much much lower than
20:14 we expect
20:16 we have we maybe have some problems here
20:18 so I put in a a select line in there
20:22 this is sort of a a demonstration of how
20:25 not to make a query in a way we've got
20:27 select in there we're saying hey give
20:29 me from to transferred average packet
20:33 bytes but then we're piping through to
20:35 an assertion which means we're not
20:37 seeing the output of the select command
20:40 um so yeah that's a deliberate mistake
20:42 that's showing you how to well you can
20:45 believe it's deliberate or not I don't
20:46 mind but it's a deliberate mistake to
20:48 show you uh what would happen or how it
20:50 may not result the way you expect the
20:53 query still works properly
20:56 but what we don't get is the table the
21:00 data table coming out from the select
21:03 command
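The silent-link check in this lab can be sketched in Python. It is a rough illustration under stated assumptions: exclusions are treated as prefix matches on the destination, the catch-all is excluded on either end, and the sample rows, field names and the function `stale_links` are hypothetical.

```python
from datetime import datetime, timedelta

CATCH_ALL = "0.0.0.0"                # stands for anything outside the lab
EXCLUDED_PREFIXES = ("224.0", "ff")  # the 224.0 and IPv6 ff destinations excluded in the lab


def stale_links(links, now, max_silence=timedelta(hours=2), min_avg_packet_bytes=1000):
    """Links that have been silent too long and carry unusually small traffic."""
    result = []
    for link in links:
        if CATCH_ALL in (link["from"], link["to"]):
            continue
        if link["to"].startswith(EXCLUDED_PREFIXES):
            continue
        silent_for = now - link["last_activity"]
        if silent_for > max_silence and link["avg_packet_bytes"] < min_avg_packet_bytes:
            result.append(link)
    return result


NOW = datetime(2024, 1, 1, 12, 0)  # fixed clock so the example is repeatable
LINKS = [
    {"from": "plc", "to": "scada-ws",
     "last_activity": NOW - timedelta(hours=3), "avg_packet_bytes": 400},
    {"from": "hmi", "to": "plc",
     "last_activity": NOW - timedelta(minutes=10), "avg_packet_bytes": 1200},
    {"from": "eng-ws", "to": "0.0.0.0",
     "last_activity": NOW - timedelta(hours=5), "avg_packet_bytes": 100},
    {"from": "eng-ws", "to": "224.0.0.251",
     "last_activity": NOW - timedelta(hours=5), "avg_packet_bytes": 100},
]

if __name__ == "__main__":
    stale = stale_links(LINKS, NOW)
    print("red" if stale else "green", [link["from"] for link in stale])
```

As with the query in the video, the result feeds a yes/no assertion, so any column selection done before that step would not be visible in the output.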
21:05 okay so this next Lab session we're
21:07 using the same configuration as before
21:10 but what we're doing different is we're
21:12 going to use the link State rather than
21:14 the links table itself so it's a
21:17 slightly different query
21:19 and let's take a look at it
21:21 the first thing we do is we start off
21:23 with the link events table
21:26 we pipe that through to our first where
21:28 command so we're saying where the event
21:30 includes down so now we're looking at
21:33 links that have gone down it's not there
21:35 anymore
21:37 pipe through the next one where hours
21:40 ago time less than 24 so we're saying what
21:44 has happened in the last 24 hours have
21:46 any of these links gone down
21:50 pipe through again and we've got a big
21:52 long select so we're sort of combining
21:55 everything that we've covered in these
21:56 Labs here so we're selecting ID Source
21:58 but we're renaming that to From
22:01 we're selecting port source renaming
22:04 that to From Port we're selecting ID
22:06 destination and calling that To we're
22:09 selecting port destination and calling
22:12 that To Port and we're selecting time
22:15 and calling it Time and same with
22:17 protocol selecting protocol calling it
22:19 Protocol with a capital P and you can
22:21 see there the resulting data set and
22:23 table that we've got out of that
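The link-events report above (down events in the last 24 hours, with friendlier column names) can be sketched in Python. The event rows, the field names such as `id_src` and `port_dst`, and the function `recent_down_events` are hypothetical; this is not the product's query syntax.

```python
from datetime import datetime, timedelta


def recent_down_events(events, now, window=timedelta(hours=24)):
    """Keep 'down' events newer than the window and rename the columns."""
    return [
        {
            "From": event["id_src"],
            "From Port": event["port_src"],
            "To": event["id_dst"],
            "To Port": event["port_dst"],
            "Time": event["time"],
            "Protocol": event["protocol"],
        }
        for event in events
        if "down" in event["event"] and now - event["time"] < window
    ]


NOW = datetime(2024, 1, 1, 12, 0)  # fixed clock so the example is repeatable
EVENTS = [
    {"event": "down", "id_src": "scada-ws", "port_src": 49152, "id_dst": "plc",
     "port_dst": 502, "time": NOW - timedelta(hours=2), "protocol": "modbus"},
    {"event": "up", "id_src": "scada-ws", "port_src": 49152, "id_dst": "plc",
     "port_dst": 502, "time": NOW - timedelta(hours=1), "protocol": "modbus"},
    {"event": "down", "id_src": "hmi", "port_src": 49200, "id_dst": "plc",
     "port_dst": 502, "time": NOW - timedelta(hours=30), "protocol": "modbus"},
]

if __name__ == "__main__":
    for row in recent_down_events(EVENTS, NOW):
        print(row)
```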
22:25 so that's my three my three tips for
22:27 this this session if you're coming into
22:31 OT from the IT world some of this might
22:34 seem a little unusual or a little
22:36 different if you're working on a campus
22:38 you're working in a large campus
22:40 how often a device communicates or or
22:43 even in some cases how much bandwidth
22:45 it's using isn't often that important
22:49 um it might not matter if a given
22:51 endpoint goes from 50 megabits a second
22:54 to greater than 60. who cares it could
22:57 be a use case thing but in the OT world
23:00 that could be indicative of some
23:02 problems in your network or or things
23:04 that could go on to affect production or
23:07 compliance so it's pretty important that
23:10 we keep an eye on those kind of things
23:17 so that brings us to the end
23:19 um we'll look forward to catching up
23:20 with you next time and I'm really
23:23 interested can we get some feedback from
23:25 from new people out there would you find
23:28 it interesting to learn about some core
23:31 OT concepts such uh you know design or
23:36 um
23:37 digital and analog inputs and outputs
23:40 versus
23:41 communication devices versus you know a
23:44 bit more technical detail on how OT
23:46 systems actually work rather than just
23:49 how we're using it within the Nozomi
23:52 world the two things fit together really
23:54 well we could have a lot of fun
23:55 discussing some of those things if
23:57 you're interested in knowing more about
23:59 that let us know we'll see you next time
24:02 [Music]