Think outside the computer
One CTO believes in making self-programming machines that have the capacity to learn from data.
Michael Kagan, Chief Technology Officer of Mellanox Technologies, said: “We are in the midst of a huge networking transformation all over the world, and we'll talk about this. It all starts with the move from the phone being static to being mobile. I got my first mobile about 30 years ago, maybe a little more, and I thought it was great because I could talk to my colleagues in the United States on my way to work, which opened up new ways of operating and communicating.
“But I didn't realise in those days that 30 years later this machine would become something that you never leave home without. It's actually what Steve Jobs told his engineers: ‘I want you guys to design a machine that nobody would like to leave home without.’ Now we use the phone for news, emails, gaming and social media. Sometimes we use it as a phone, and you can do it all without even leaving the toilet.”
Kagan said that in 2007 Nokia was a very large company, worth about $150 billion. Nokia decided to get into the geo-navigation business and to build a navigation system that monitors traffic. To make it work, it needed traffic sensors, so it went to Navteq, which had about five million traffic cameras across Europe. With traffic cameras you know where the traffic is and when it builds up, and by connecting this data to navigation software and navigation gadgets you can navigate people around the traffic.
In the same year, 2007, Kagan said, an Israeli company called Waze was started. It figured out that it could get the same traffic data for free by writing application software: every phone that runs the software generates data for Waze, and the rest is history. Within five years Nokia's valuation fell to less than it had paid for Navteq, while Waze was sold to Google for more than $1 billion. Once you figure out that the data is out there, and you find an efficient way to get it into your hands, you can do a lot of good things, says Kagan.
But there is one problem: the growth of data and the growth of CPU power are not aligned. Data is growing faster than Moore's Law, and CPU performance is not keeping up with the data; it is barely keeping up with Moore's Law.
With the gap growing, incremental evolution is no longer enough. Just as electric light did not come from improving candles, we need to think outside the box of what can and cannot be done in order to take advantage of this data, says Kagan.
Kagan: “So what do we need to do? We need to use the data to generate the services itself. This is the artificial intelligence model, the neural network model: how do you take the data and actually create the service that uses the data? The machine learns from the data to do the job, and the applications or services are created by the data itself.
“It's not a single individual processing unit, a single machine, anymore. It's connecting different machines by the network and making them work together to create the services. It means optimising across the data centre, optimising across the cluster.
“If you do the right things you can do new things that you couldn't do before. It's not just doing the same thing faster, it's doing new things that we haven't done before. One of the analogies is ants. A single insect like this can't do much. But many of them working together, in a way that people don't really understand, can build huge things. They build cities with ventilation and irrigation systems. So by connecting to each other and working together efficiently, you can do things which are not just a linear extrapolation of what one individual can do.
“You need to make machines that can learn from data. You need to make machines that program themselves. You need to enable data processing, or computing, in the network and in the storage. Computing and storage as separate entities is going to end. Last but not least, security is a really interesting topic that we will touch on a little, because once we have more data we can understand more and know more, and that affects security everywhere, from crime to privacy.
“And so to the Cloud. Cloud is a different approach; it is moving from the server approach to the service approach. When you ask people whether they have electricity in the home, the answer is usually yes, but what they mean is that they have a plug in the wall; the electricity is actually generated by a power plant far away.
“Until recently, when you asked people whether they had a computer or a storage disk at home, the answer was also usually yes, but what they meant is that they had a box under the table that they needed to buy, manage and so on, and that was their computer, that was their disk. But essentially there is not much difference between electricity as a service and computing as a service, and that's what Cloud computing is all about.
“Now, when you want to create such a service, it means that you need to take multiple pieces of hardware, or multiple resources, and assign them or chain them together (these are microservices) to create the application, or to create the response to the client's request. Now it becomes fairly tricky.
“If you think about the analogy of autonomous cars, today the car spends about 95 per cent of its life parked. But it's a very clear process. I leave my car, I lock it, it stays in the parking lot, and then I come back and pick the car up. So most of the time the car does nothing. When we go to autonomous cars, or on-demand cars, I ask for a car and it shows up within a minute at the place where I am. So now you need to connect these things and manage these resources.
“It's the same thing in the data centre. When I am looking for some sort of service, there are multiple resources that need to work together in a chain to provide the service. That's what the data centre is about. It's taking multiple resources and configuring them in a fraction of a second to respond to the request from the consumer.
“Recently Oracle re-wrote their rack system and, by moving from 10 Gig networking to 40 Gig networking, they got a 50X performance boost. Not 4X, 50X. The reason is that they used their resources more efficiently.
“The other thing, which is more recent, is that by using our network, with Nvidia GPUs and Mellanox interconnect, a machine was built that for the first time classifies or recognises images better than humans. Today, computers can see better than humans can. This first happened in 2015, when Baidu built this machine using our network, and they couldn't achieve such results with any other network.
“One thing that I will touch on a little is storage. SNAP is Software-defined Network Accelerated Processing. If you look at the internet cloud fabric, you have multiple machines connected to the network and each one of these machines has its own storage. The problem is that not all the computers and not all the platforms are actually ready and can take advantage of this. So it works, but you need advanced technologies on each machine to take advantage of it.
“So what if you could have storage services? You would be able to represent anything on your cloud as a truly local device. Not a network-attached device with some driver that presents APIs you can use, which implies some changes on the host. Instead, your machine still runs the legacy operating system, but this local device somehow magically can touch everything.
“We deliver this with our SmartNIC technologies. BlueField can present a local device to the host operating system, allocate resources from different machines on the network and make them available as storage resources on the local machine. This way, the utilisation of the storage goes up. You can even over-subscribe your physical storage resources and it still works. We have a pilot project with the major cloud providers right now and hopefully they will move to production with this, either this year or next year.
“These things are usually called virtualisation. We virtualise the resources, one way or another, by abstracting the physical infrastructure away from the application, as with the analogy of the power plants.
“Fortunately enough, all these things can be optimised by off-loading a fairly significant part of the virtualisation operation to the hardware, to the network adaptors or to the storage. Some part is still out there running on the compute server, but this is not a big deal because it is a really small part.
“There are going to be many more applications and, again, the data: more and more data is being generated, more and more data will be available, and there are more and more things we can do. It's a $4 trillion revenue opportunity. It's all out there, we just need to take advantage of it.
“We have known Nvidia for quite a while. Actually, they have the fastest artificial intelligence machine in the world, the Oak Ridge Lab supercomputer, which is built on Mellanox interconnect technology and Nvidia GPUs. IBM built the machine, but if you look at most of the computing in the AI space, it's done on Nvidia. So they have known us for a decade. Our technology enables much more efficient scaling out of GPU computing; it has been proven time and again, and the Oak Ridge machine is one of these proofs.
“I think that these types of opportunities actually drive cloud providers outside of the data centre. One of these is obviously 5G. There are various reasons, starting from simple physics. If we want decision-making in fractions of a second, for a self-driving car or other things, then the decision-making point needs to be physically closer to the place where the decision takes effect. Say you have multiple sensors, one in the traffic lights, one in the car, one in the paving and one on top of the roof, and the data coming from all these sensors is aggregated at some point a few metres away. If you want to make a decision based on this data, you actually need to build an instance of the data centre that makes the decision out there. It can be in the base of the antenna, it can be in the car itself, it can be anywhere. It drives the data centre services and the data centre paradigm outside.
“Now the point is that in order to build these things, you don't need to reinvent the wheel all over again. The efficiency of accessing the network and the security topics that we discussed are pretty much the same no matter where you are physically located, whether it's the antenna base, the autonomous car or a huge data centre like Google, Facebook, Amazon or Azure.
“It means that you can take the same architecture and the same software, or a subset of it, that is used to run the data centre. Companies like Microsoft, Google and Amazon are actually providing their data centre management tools. You can buy them as a service, and now you can run your data centre inside the car, at the base of the antenna or at any other aggregation point using the same tools. Now it's a matter of implementing different pieces of hardware to fit the dimensions, the power and the cost of that particular implementation.”
But it turns out that software is much harder and much more complicated than hardware. Being able to take advantage of data centre development all the way down to the edge is what IoT and 5G are doing. For companies like Mellanox this presents a great opportunity, because they have a great engine for the data centre operating system, and it creates a huge opportunity for those types of products.
Michael Kagan spoke at NetEvents EMEA in Barcelona.
The full transcript of the session is available at https://www.netevents.org/wp-content/uploads/2019/01/Day-2-Keynote-Presentation-QA-Mellanox-draft.pdf