odysseus2000 wrote: As I have typed several times there is no certainty as to what will be needed for a full level 5 system. It may require Lidar and a host of other stuff, it may never be possible or it may be possible using sensors that are similar to a human and suitable neural nets.
Apologies for jumping in ... I've been skimming through the thread, but quite distracted so have probably missed much
I've quoted the above, but to be fair, my response could be applicable to a number of comments by different posters...
There are a few considerations that I hope the industry - and regulators - will be working out...
The Minimum
I've seen a video from one research group where a (toy) car is driven autonomously (for obstacle avoidance) using nothing more than a monocular camera. The purpose of the deep learning network behind it (and I believe the sole purpose of that specific research) was to demonstrate what could be done with a single camera and a network trained to infer depth from 'familiarity' - i.e. the semantic information in each single image. For example, by inferring scale and distance from the expected size of things in the scene - like people, trees, buildings, and so on.
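To make the 'expected size' idea concrete, here is a minimal sketch using the ordinary pinhole camera relation (distance = focal length x real height / height in the image). This is not how the network in that research works, which learns the relationship rather than being given it; the function name and the numbers below are my own assumptions, purely for illustration.

```python
def distance_from_known_height(focal_length_px: float,
                               real_height_m: float,
                               pixel_height: float) -> float:
    """Estimate distance to an object of assumed real-world height.

    Pinhole camera model: an object of height H at distance d appears
    f * H / d pixels tall, so d = f * H / pixel_height.
    All parameter names here are hypothetical, for illustration only.
    """
    if pixel_height <= 0:
        raise ValueError("object must have a positive height in the image")
    return focal_length_px * real_height_m / pixel_height


# Assumed numbers: a pedestrian taken to be 1.7 m tall appears 170 px tall
# through a camera with a focal length of 1000 px.
estimate_m = distance_from_known_height(1000.0, 1.7, 170.0)
print(f"estimated distance: {estimate_m:.1f} m")
```

The obvious weakness is the assumed height: a child mistaken for an adult comes out much further away than they really are, which is one reason I wouldn't want depth to rest on this alone.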
I've just googled to try and find the above one again - unsuccessfully so far - but have found this, which makes specific use of monocular video for estimating the drivable area in front of the vehicle (just one aspect of what would need to be inferred from the images):
http://www.cs.toronto.edu/~yaojian/freeSpace.pdf
"In this paper we propose a novel algorithm for estimating the drivable collision-free space for autonomous navigation of on-road and on-water vehicles. In contrast to previous approaches that use stereo cameras or LIDAR, we show a method to solve this problem using a single camera."
All in all, I would say there is already enough research out there to be confident that you could in theory achieve a self-driving vehicle using only a single camera.
But you would never find me riding in such a vehicle. For many reasons, but also importantly (and I don't think this has been mentioned, though I could easily have missed it)...
Fail Safety
Stuff fails. Sensors get obstructed.
(Human) Drivers tend to know their vehicle. They can feel when something isn't right. They can adapt - if they know their brakes are feeling soft, they drive more slowly and allow more time when they observe a 'risk' ahead (e.g. children playing at the side of the road, etc).
With the driver out of the control loop - no longer feeling how the car responds to their tap on the accelerator, etc - that in itself is going to require additional work on the 'intelligence' side. The AD system will need to monitor how well its demands get translated into actions - acceleration, steering or braking - and adapt accordingly (since there is always going to be some variance, not just over time in a single vehicle, but also from unit to unit off the production line). But the AD system will also need to determine when expected 'adaptation' becomes a cause for concern requiring a trip to the garage.
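As a rough sketch of what that monitoring might look like: track the ratio of achieved to demanded response with a running average, adapt while the drift is small, and flag the garage when it isn't. Everything here (the class name, the thresholds, the smoothing factor) is made up by me for illustration and is not taken from any real AD system.

```python
from dataclasses import dataclass


@dataclass
class ActuatorMonitor:
    """Tracks how well demands (e.g. braking) translate into actions.

    All thresholds are hypothetical, chosen only for illustration.
    """
    adapt_band: float = 0.10    # drift beyond this: compensate in software
    service_band: float = 0.30  # drift beyond this: needs a trip to the garage
    smoothing: float = 0.2      # weight given to each new observation
    gain: float = 1.0           # running estimate of achieved / demanded

    def update(self, demanded: float, achieved: float) -> str:
        """Fold in one observation and return the current status."""
        if demanded != 0:
            ratio = achieved / demanded
            # Exponential moving average, so one odd reading doesn't trip it.
            self.gain += self.smoothing * (ratio - self.gain)
        return self.status()

    def status(self) -> str:
        drift = abs(self.gain - 1.0)
        if drift >= self.service_band:
            return "service"
        if drift >= self.adapt_band:
            return "adapt"
        return "ok"


monitor = ActuatorMonitor()
for _ in range(50):
    # Brakes consistently delivering only 85% of the demanded deceleration.
    state = monitor.update(demanded=1.0, achieved=0.85)
print(state, round(monitor.gain, 3))
```

The running average is there because a single soft response (a wet patch, a gust of wind) shouldn't send the car to the garage; it is the persistent drift that matters.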
But equally importantly, the system will need to be able to identify whether the sensors are operational - and when they need maintenance. It will need to distinguish fog, spray and mist from dirt on the lens that needs human intervention. It will need to identify smearing wipers on the camera's 'windscreen'. And so on.
Now, how you achieve this could be on a sliding scale. At the simplest end: have 2 redundant sensors, and if they don't agree, then you stop the car, and tell the occupants to call out the breakdown truck.
But I suspect (hope!) that socially (i.e. through regulation) that wouldn't be acceptable. There'd just be too many vehicles pulled up on the side of the road. And if one of two sensors fails - how do you know which? Could you even safely pull over to the side of the road if all you know is that the sensors don't agree? You don't actually know which one you can trust in order to manoeuvre to a safe position.
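The 'which one failed?' problem is easy to show in a few lines. With two readings you can only detect a disagreement; with three or more you can take the median and point at the outlier. This is a minimal sketch of that voting idea, with a function name and tolerance that are my own assumptions, and it assumes at most one sensor is wrong at a time.

```python
from statistics import median
from typing import List, Optional, Tuple


def check_redundant(readings: List[float],
                    tolerance: float) -> Tuple[Optional[float], List[int]]:
    """Return (trusted value or None, indices of suspect sensors).

    Hypothetical helper for illustration only.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to cross-check")
    if len(readings) == 2:
        a, b = readings
        if abs(a - b) <= tolerance:
            return (a + b) / 2, []
        # They disagree, but there is no way to tell which one is wrong.
        return None, [0, 1]
    # Three or more: the median survives a single bad sensor.
    mid = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]
    return mid, suspects


# Two sensors measuring distance to the car in front (metres, made-up values).
print(check_redundant([10.0, 25.0], tolerance=0.5))
# A third sensor breaks the tie and identifies the faulty unit.
print(check_redundant([10.0, 10.2, 25.0], tolerance=0.5))
```

With two sensors the only safe answer is 'stop'; with three the car still has a value it can use to get itself off the road, which is really the argument for regulators demanding more than the bare minimum.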
I truly hope the regulators demand quite a high level of redundancy, purely to minimise the 'need to stop'. Far more sensors than the theoretical minimum needed to actually achieve AD.
Expectations
AD only needs to match humans to theoretically justify itself. But the public would never buy that. The Daily Mail and other scaremongering newspapers would report every accident as though AD were something to be afraid of. They wouldn't care about whether it matches humans or not.
And I think most people - me included - would really expect some degree of safety improvement from AD compared to humans. After all, that is part of the AD promise. Alongside the ability to put up your feet and read a book, browse the web, watch a movie, etc, on your commute home in your own personal car, we are also being promised that AD cars will always be on the lookout, never get tired, never get distracted.
So there is definitely a perception and (quite genuine) expectation that AD will surpass human drivers in terms of safety. So again, it's not down to what a human can do - we want AD to do better. It's not down to what sensors would match a human - it is a question of what sensors give a realistic safety improvement.
To throw around some numbers plucked completely out of thin air to illustrate what I mean... If you could match a human with (say) 2 sensors and a couple of mirrors, but - at reasonable cost - could halve the accident rate per mile by using 20 sensors, then I hope the regulators across the globe would look towards making the latter the minimum requirement for AD.
Though only where reasonably practical and cost effective. Even if AD only equalled humans, and that was the best it could do economically, then OK it's still justified. But if a small extra investment in sensors-per-vehicle could slash accident rates, then that surely has to be considered as the minimum required by regulation.
Modularity
One other, slightly tangential, consideration is what about modularity?
I'd be really curious to know already how the different manufacturers are architecting their systems for modularity.
I mean, a straightforward ANN (artificial neural net) is trained with a fixed number of inputs and a fixed number of outputs. I suppose the convolutional aspect (i.e. that the same 'weights' are repeatedly applied across all nodes in a layer) does perhaps allow for a degree of flexibility, though that is likely only going to be limited to the first layer.
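A toy illustration of that difference, in plain Python so nothing is hidden: a convolution slides one small set of weights along the input, so it doesn't care how long the input is, whereas a fully connected layer has one weight per input and simply breaks if the number of inputs changes. The function names are mine, and the 'convolution' here is the sliding dot product that deep learning libraries usually mean by the word.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the same kernel weights across the signal (no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]]) -> List[float]:
    """Fully connected layer: one weight per (output, input) pair."""
    for row in weights:
        if len(row) != len(inputs):
            raise ValueError("layer was built for a different number of inputs")
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


kernel = [1.0, -1.0]                    # shared weights: a simple edge detector
print(conv1d([1, 2, 3, 4], kernel))         # works on 4 inputs
print(conv1d([1, 2, 3, 4, 5, 6], kernel))   # same weights, 6 inputs, still fine

weights = [[0.5, 0.5, 0.5, 0.5],        # built ("trained") for exactly 4 inputs
           [1.0, 0.0, 0.0, -1.0]]
print(dense([1, 2, 3, 4], weights))
# dense([1, 2, 3, 4, 5, 6], weights) would raise ValueError
```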
But then how do you deal with multiple sensors?
I think in practice, it's almost certain that any ANNs related to vision are going to be specific to a single camera input - if you have two cameras, each will have its own ANNs, and then they perhaps feed into more conventional algorithms that build up a more conventional 3D model of the surroundings upon which the driving decisions are then taken (e.g. the sort of thing that Waymo show when illustrating AD systems, and I get the impression that Tesla provide some sort of similar representation of what the car 'sees' around it).
But it does raise the question of what standards will emerge. I mean, simplistically, you can't just plug a lidar into an ANN that was trained for vision/camera.
To be flexible with sensors - to do the sort of experimentation suggested by some - needs designing in up front! I mean, if you just gave someone the remit 'hey, use 20 sensors, 10 lidar - get it working', and didn't make it clear that you wanted what they produced to be flexible enough to experiment with reducing the number of sensors, then just to get something going they might simply feed all the sensors into a single big ANN and let that learn to handle all of them. The problem is that if you were to take one of the sensors out, then you'd basically have to start the entire training again!
It's a challenge!
It's going to be very interesting to see how the industry adapts to modularisation with AD.
I can see that the industry will need it. I can't imagine any company will last long without it. I mean, different groups are good at different things. Ultimately modularity will win the day.
But how?
AD needs to be real time. I could imagine multiple 'modules' taking the input image/video from each camera.
- You might have ANNs for object recognition (pedestrians, cyclists, other vehicles, etc) in order to reason about the 3d space - work out who's going where, what might collide etc.
- Another ANN connected to the same camera but developed by a different team that concentrates on 'drivable' area - where's the road edge, where are the potholes, etc.
- You might have another (still connected to the same camera) ANN for looking for road markings, and other highway code signs related to directions, bus lanes, emergency vehicle lanes, no entry signs, etc.
- You might have another ANN related to identifying traffic lights, pedestrian crossings and other scenarios which relate to the actionable rules of driving - where you might have to yield way in certain circumstances.
- You might have another related to reading sign posts - diversions, speed limit signs, road works, etc.
- You might have another looking out for policemen/policewomen directing traffic (this is something that Google were already demonstrating their systems could do before they split off into Waymo)
But then how do you standardize this information so that you could interchange modules? Certainly a very interesting technical challenge.
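Just to sketch what 'standardized' might mean in software terms: every module, whoever built it, reports in one agreed record format, and the central controller only ever sees that format. No such standard exists as far as I know; the `Detection` fields, the `PerceptionModule` interface and the two dummy modules below are all invented by me for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Detection:
    """One agreed output record (hypothetical fields)."""
    kind: str                    # e.g. "pedestrian", "traffic_light_red"
    bearing_deg: float           # direction relative to straight ahead
    distance_m: Optional[float]  # None if the module cannot judge distance
    confidence: float            # 0.0 to 1.0


class PerceptionModule(Protocol):
    """Anything with a name and a process() method can be plugged in."""
    name: str

    def process(self, frame: object) -> List[Detection]: ...


class PedestrianSpotter:
    name = "pedestrians"

    def process(self, frame: object) -> List[Detection]:
        # A real module would run its ANN on the frame; this is a stand-in.
        return [Detection("pedestrian", -5.0, None, 0.9)]


class TrafficLightReader:
    name = "traffic_lights"

    def process(self, frame: object) -> List[Detection]:
        return [Detection("traffic_light_red", 0.0, None, 0.8)]


def collect(modules: List[PerceptionModule], frame: object) -> List[Detection]:
    """The controller treats every module identically."""
    results: List[Detection] = []
    for module in modules:
        results.extend(module.process(frame))
    return results


for detection in collect([PedestrianSpotter(), TrafficLightReader()], frame=None):
    print(detection)
```

Swapping one team's pedestrian module for another's then only means honouring the record format. The hard part, of course, is agreeing what goes in the record.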
A lidar won't be able to recognise a pedestrian as a pedestrian. So the output of a lidar sensor / associated ANN isn't going to be the same semantic information as would come from the ANNs related to a camera - you couldn't swap a lidar module (including ANNs) with a camera module (including ANNs).
In fact, in reality the lidar is likely to complement the camera (plus ANNs) outputs - a central AD processing unit would likely fuse these together in a complementary way - overlaying the lidar information onto the elements identified by the camera / ANNs in order to add reliable depth information to the semantic information of the camera / ANN outputs.
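A much simplified sketch of that overlay: the camera side says 'pedestrian at bearing X', the lidar side supplies (bearing, range) returns, and the fusion step attaches the nearest return that falls within a small angular window of each object. The names, the window size and the sample numbers are all my assumptions; a real system would work with full 3D geometry and calibration between the sensors.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CameraObject:
    """Semantic output of the camera / ANN side (hypothetical type)."""
    label: str
    bearing_deg: float
    range_m: Optional[float] = None  # filled in by fusion when lidar covers it


def fuse(objects: List[CameraObject],
         lidar_returns: List[Tuple[float, float]],
         window_deg: float = 2.0) -> List[CameraObject]:
    """Attach a lidar range to each camera object where one is available.

    lidar_returns holds (bearing_deg, range_m) pairs. When several returns
    fall inside the window, the closest is used, as the cautious choice.
    """
    fused = []
    for obj in objects:
        ranges = [rng for bearing, rng in lidar_returns
                  if abs(bearing - obj.bearing_deg) <= window_deg]
        fused.append(replace(obj, range_m=min(ranges) if ranges else None))
    return fused


camera = [CameraObject("pedestrian", 10.0), CameraObject("cyclist", -30.0)]
lidar = [(9.5, 12.0), (10.5, 12.4), (40.0, 3.0)]
for obj in fuse(camera, lidar):
    print(obj)
```

The cyclist comes back with no range because nothing in the lidar data lines up with it. That is exactly the case where the controller has to fall back on whatever the camera alone can infer.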
When you look at it that way, I really can't imagine why anyone would dream of not fusing the two together. I mean, OK, lidar is only valid in certain circumstances - it doesn't cover everywhere. So yes, true, there would always be some areas of the images (the outputs from the camera / ANN) which wouldn't have lidar data, and the central AD processing unit would have to know how to deal with these. And you could argue: well, if it can deal with those, it can deal without any lidar... in theory, yes...
..but really... lidar (and radar, ultrasound, etc) all work well at short to medium distances - i.e. where you are most likely to hit pedestrians, cyclists, animals, etc. It just seems insane not to include this extra dimension of information to make a far more robust model from which to act.
Anyway, it really does feel like we are in the midst of a revolution.
In another 5 years, when AD becomes more widespread and mainstream, it'll be interesting to watch what become the new buzzwords. What will the technical aficionados talk about? Today the geeks talk about cylinders, exhausts, catalytic converters, fuel injection...
.. just imagine in 5 years' time, I could easily see the language change completely - the talk being of what modules your car has fitted - what neural modules you have - what processing units you have processing this information - who manufactured them, etc.
To bring it on topic, I think this could trip Tesla up somewhat. They currently claim their cars have all the hardware they need for AD. I think in a few years, Tesla - and the cars' owners - will look back and think that was naïve.
Even if Tesla manage to get some software update AD retrofitted to their current hardware, I think the owners will see where the industry has moved to, and most probably won't make the software upgrade. I suspect most will rather go with the better modularized, better standardized vehicles that are still currently at the design / concept stage.
Companies like Waymo and nVidia, who aren't actually producing their own cars, are entirely reliant upon their offering being something that can be fitted as a module to other companies' vehicles.
And quite rightly, different companies will demand different sensors and sensor configurations to differentiate themselves from the competition.
Consumers will demand different things too - I mean, those paying large sums - the BMW / Merc / Audi drivers of today - will demand far better, far more powerful sensors than the bottom end models.
And the AD processing units from Waymo and nVidia will have to adapt to this different demand.
Moreover, they will ultimately need to build their systems so that other sensor manufacturers / developers can fuse their sensors into the nVidia and Waymo systems.
If a company like Tesla has already built cars not designed for this flexibility, then I suspect they'll be left behind.
I suspect that the commercial reality will quickly mean Tesla (and the rest) moving towards more open standards. But that won't be good for cars built before those interchangeable standards were finalised. If I were buying a Tesla car now, I wouldn't get too excited about the promises of full automation on the existing hardware from future software updates.
I think ultimately - once the dust starts to settle, and the regulations start to catch up - the complexity of the AD modules (at the module level, with all the different ANNs serving different functions, getting fused into additional processing modules, etc) could match or surpass the complexity of the mechanical components within an engine.
And with the need for these modules to be hard real time ( https://whatis.techtarget.com/definitio ... ime-system ) / safety critical, I just cannot see how you could hope to achieve this from a purely software upgrade.
My current bet would be that the ANNs get manufactured into dedicated neural network circuitry that is self contained within the module (there are already prototypes that show very fast, very low power function of neural networks if you put them onto a chip dedicated for that purpose - for AD it isn't going to be something that is computed on a regular general purpose CPU).
It's tempting to think you wouldn't buy a camera, you'd instead buy a camera + ANN as a single package, with a standardised output that you plug into your AD controller.
But even that is probably not enough, given the various functions that you could perform on the image from a camera (see list above).
And modularity doesn't necessarily mean consumer level modularity. It may be that the modularity is like different integrated circuits that the manufacturer can choose to include in the circuitry for a specific model. But it doesn't change the need for some degree of standardization of the inputs and outputs of the various modules.
Anyhow, sorry for rambling.... if only this technology were coming around 15 to 20yrs ago, shortly after I left uni.... I'd love the technical challenge of working properly on a serious AD system (not a toy hobby system) ... I was creating a neural net (before deep learning came around) to navigate a robot around a maze back when I was at uni... I would have loved to have progressed onto working on proper, real world AD systems.
For what it's worth (probably not a lot, though it may give an idea how I see the industry) ... I'd more than happily go to work for Waymo / nVidia or any of the established car manufacturers to work on AD systems.
I'd be more hesitant at working for Tesla / Uber.
Why? Gut feeling.
Put simply - Waymo and nVidia seem to be serious about doing proper 'offline' (away from consumers) development before unleashing on the real world consumer. And that is good. Similarly for the established car manufacturers, they already have experience of developing 'enhanced' systems (like antilock brakes, etc) that are of a safety critical nature, and they understand the need for properly engineered development.
Tesla / Uber - purely in my own personal opinion - seem to be a little too cavalier at pushing out their attempts too soon. It has come as no surprise to me that they are the first to have resulted in fatalities. I don't think my style of working / my approach to development and design would suit working at either of these companies for AD.
And for the same reason, I also suspect that Tesla and Uber are inadvertently setting themselves up to get wrong-footed by regulations.
If the other manufacturers can demonstrate substantially better safety and better reliability with their lidar (or other additional sensor fusion), etc, then I suspect - hope even - that the regulators will look to make those kinds of systems the model for regulations.
And that could really throw a spanner in the works to other efforts that have tried to cut corners, do things on the cheap, skimp on sensor inputs, etc.
Sorry .... I'll stop now...