Neural network image recognition. Developing an image recognition system based on artificial neural networks

Quite a lot has already been said about artificial neural networks as a tool for solving hard-to-formalize problems, including brief demonstrations of how to apply them to image recognition and to captcha breaking. There are, however, many kinds of neural networks. Is the classical fully connected neural network (FCNN) a good fit for the task of image recognition (classification)?

1. The problem

So, we have chosen to tackle the problem of image recognition. This could mean recognizing objects, characters, and so on; here I will consider the task of recognizing handwritten digits. The task is a good choice for several reasons:

    Recognizing a handwritten character is a genuinely hard-to-formalize (intelligent) problem; this becomes obvious once you look at the same digit written by different people

    The task has practical relevance and is related to OCR (optical character recognition)

    A freely distributed database of handwritten characters exists, available for download and experiments

    Many articles have been written on this topic, so it is easy to compare one's own results against them

As input data we will use the MNIST database. It contains 60,000 training pairs (image - label) and 10,000 test pairs. The images are normalized in size and centered: each digit fits within a 20x20 box and is inscribed in a 28x28 image. The first 12 digits of the MNIST training set are shown in the figure:
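The normalization just described (a digit box of at most 20x20, inscribed in a 28x28 frame) can be sketched as follows; the digit array here is synthetic, for illustration only:

```python
import numpy as np

def center_in_frame(digit, frame_size=28):
    """Place a small 2-D array in the center of a square frame of zeros,
    mimicking MNIST-style normalization."""
    h, w = digit.shape
    frame = np.zeros((frame_size, frame_size), dtype=digit.dtype)
    top = (frame_size - h) // 2
    left = (frame_size - w) // 2
    frame[top:top + h, left:left + w] = digit
    return frame

digit = np.ones((20, 20))          # stand-in for a 20x20 digit image
image = center_in_frame(digit)     # 28x28, as in the MNIST base
print(image.shape)                 # (28, 28)
```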

Thus the task is formulated as follows: create and train a neural network to recognize handwritten digits, which takes their images as input and activates one of its 10 outputs. Activation here means a value of 1 at the output; the values of the remaining outputs should (ideally) equal -1. Why a [0, 1] scale is not used, I will explain later.
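The ±1 output convention can be sketched in a few lines; `encode_target` is a hypothetical helper name, not from the original program:

```python
import numpy as np

def encode_target(label, n_classes=10):
    """Encode a class label as a vector of -1s with +1 at the label index,
    matching the ±1 convention used with tanh output neurons."""
    t = -np.ones(n_classes)
    t[label] = 1.0
    return t

print(encode_target(3))
# Decoding: the predicted class is the index of the maximal output.
outputs = np.array([-0.9, -0.7, 0.2, 0.95, -0.8, -1.0, -0.6, -0.99, -0.5, -0.7])
print(int(np.argmax(outputs)))  # 3
```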

2. "Primary" neurotransmitters.

By a "conventional" or "classical" neural network, most people mean a fully connected multilayer feed-forward network trained with backpropagation of error:

As the name suggests, in such a network every neuron is connected to every neuron of the adjacent layer, the signal travels only forward from the input layer to the output layer, and there are no recursions. For brevity, we will call such a network an FCNN.

First we need to decide how to feed the data in. The simplest and essentially alternative-free solution for an FCNN is to unroll the two-dimensional image matrix into a one-dimensional vector. So for a 28x28 image of a handwritten digit we get 784 inputs, which is already not small. Next comes the part for which many conservative scientists dislike neural networks and their practitioners: the choice of architecture. They dislike it because the choice of architecture is pure shamanism. To this day there are no methods that let one unambiguously determine the structure and composition of a neural network from a description of the problem. In their defense, I will say that for hard-to-formalize problems such a method is hardly ever going to be created. There are, however, various network-reduction techniques (for example, OBD [1]), as well as assorted heuristics and rules of thumb. One such rule states that the number of neurons in a hidden layer should be at least an order of magnitude larger than the number of inputs. Also take into account that the transformation from an image to a class indicator is quite complex and essentially non-linear, so a single layer will not do. Based on all of the above, a rough estimate puts the number of neurons in the hidden layers at around 15,000 (10,000 in the second layer and 5,000 in the third). For the configuration with two hidden layers, the number of tunable, trainable connections comes to roughly 10 million between the inputs and the first hidden layer + 50 million between the first and the second + 50 thousand between the second and the output layer (we have 10 outputs, each of which indicates a digit from 0 to 9). In total, roughly 60,000,000 connections.
It is not for nothing that I mentioned that they are tunable: for each of them, on every training example, a gradient of the error must be computed.
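The rough connection count given above is easy to verify:

```python
# Rough connection count for the fully connected configuration discussed
# above: 784 inputs, two hidden layers of 10,000 and 5,000 neurons, 10 outputs.
inputs, h1, h2, outputs = 784, 10_000, 5_000, 10
connections = inputs * h1 + h1 * h2 + h2 * outputs
print(connections)  # 57890000 -- roughly 60 million trainable weights
```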

So what is wrong here, what sacrifices does the beauty of artificial intelligence demand? Just think about it: by transforming the image into a linear chain of bytes we irreversibly lose something, and with each subsequent layer the loss becomes harder to recover. And so it is: we lose the topology of the image, i.e. the spatial relationships between its parts. Moreover, the recognition task implicitly requires the neural network to be robust to small shifts, rotations and scale changes of the image, i.e. it must extract from the data certain invariants that do not depend on the handwriting of one person or another. So what kind of neural network should it be, to be not too computationally heavy and, at the same time, invariant to various distortions of the image?

3. Convolutional neural networks

A solution to this problem was found by the American scientist of French origin Yann LeCun, inspired by the work of the Nobel laureates in medicine Torsten Nils Wiesel and David H. Hubel. These scientists studied the visual cortex of the cat brain and showed that it contains so-called simple cells, which react particularly strongly to straight lines at different angles, and complex cells, which react to the movement of lines in one direction. Drawing on these ideas, Yann LeCun proposed the use of so-called convolutional neural networks [2].

6. Results

The program posted on matlabcentral includes a file with an already trained network, as well as a GUI for demonstrating the results. Examples of recognition are shown below:



For the curious, here is a comparison table of recognition methods on MNIST. First place belongs to convolutional neural networks, with a recognition error of 0.39% [4]. Many of those misrecognized images would not be recognized correctly by every human. In addition, that work used elastic distortions of the input images, as well as unsupervised pre-training. But those methods will have to wait for another article.

References

  1. Yann LeCun, J. S. Denker, S. Solla, R. E. Howard and L. D. Jackel: Optimal Brain Damage, in Touretzky, David (Ed.), Advances in Neural Information Processing Systems 2 (NIPS*89), Morgan Kaufmann, Denver, CO, 1990
  2. Y. LeCun and Y. Bengio: Convolutional Networks for Images, Speech, and Time-Series, in Arbib, M. A. (Ed.), The Handbook of Brain Theory and Neural Networks, MIT Press, 1995
  3. Y. LeCun, L. Bottou, G. Orr and K. Muller: Efficient BackProp, in Orr, G. and Muller K. (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998
  4. Ranzato Marc'Aurelio, Christopher Poultney, Sumit Chopra and Yann LeCun: Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds.), Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, 2006

A neural network is a mathematical model, together with its software or hardware implementation, built on the principle of the organization and functioning of biological neural networks, that is, the networks of nerve cells of a living organism. Scientific interest in these structures arose because studying such a model makes it possible to obtain information about the underlying system. The model also finds practical application in many branches of modern science and technology. This article examines the issues involved in applying neural networks to building image identification systems, which are widely used in security systems. The questions related to the image recognition algorithm and its application are examined in detail. Brief information is also given on a method of training neural networks.

neural networks

training of neural networks

image recognition

paradigm of local perception

security systems

1. Yann LeCun, J. S. Denker, S. Solla, R. E. Howard and L. D. Jackel: Optimal Brain Damage, in Touretzky, David (Ed.), Advances in Neural Information Processing Systems 2 (NIPS*89). - 2000. - 100 p.

2. Zhigalov K.Yu. Method of photorealistic vectorization of laser location data for subsequent use in GIS // News of Higher Educational Institutions. Geodesy and Aerial Photography. - 2007. - No. 6. - P. 285-287.

3. Ranzato Marc'Aurelio, Christopher Poultney, Sumit Chopra and Yann LeCun: Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds.), Advances in Neural Information Processing Systems (NIPS 2006). - 2010. - 400 p.

4. Zhigalov K.Yu. Preparation of equipment for use in automated control systems for road construction // Natural and Technical Sciences. - M., 2014. - No. 1 (69). - P. 285-287.

5. Y. LeCun and Y. Bengio: Convolutional Networks for Images, Speech, and Time-Series, in Arbib, M. A. (Ed.) // The Handbook of Brain Theory and Neural Networks. - 2005. - 150 p.

6. Y. LeCun, L. Bottou, G. Orr and K. Muller: Efficient BackProp, in Orr, G. and Muller K. (Eds.) // Neural Networks: Tricks of the Trade. - 2008. - 200 p.

Today, technological and scientific progress is opening up ever new horizons and advancing rapidly. One such direction is the modeling of the surrounding natural world by means of mathematical algorithms. In this field there are trivial tasks, for example modeling sea waves, and extremely complex, non-trivial, multi-component tasks, for example modeling the functioning of the human brain. In the course of studying this question, a separate concept emerged: the neural network. A neural network is a mathematical model, together with its software or hardware implementation, built on the principle of the organization and functioning of biological neural networks, that is, the networks of nerve cells of a living organism. Scientific interest in these structures arose because studying such a model makes it possible to obtain information about the underlying system. The model also finds practical application in many branches of modern science and technology.

A brief history of the development of neural networks

It is worth noting that the concept of the "neural network" originates in the work of the American mathematicians, neurolinguists and neuropsychologists W. McCulloch and W. Pitts (1943), where the authors first mention it, define it, and make the first attempt to build a neural network model. Already in 1949, D. Hebb proposed the first learning algorithm. After a long period of slow progress in neural network learning, the first working prototypes appeared around 1990-1991. However, the computing power of that time was insufficient for fast operation of neural networks. By 2010 the power of GPUs had grown considerably, and the possibility of programming directly on graphics cards appeared, which significantly (by a factor of 3-4) increased the performance of computers. In 2012, neural networks won the ImageNet championship for the first time, which marked the start of their further rapid development and the emergence of the term Deep Learning.

In today's world, neural networks have colossal reach; specialists consider research in the field of studying their behavioral characteristics and states to be very promising. The list of areas in which neural networks have found application is vast: recognition and classification of images, forecasting, approximation problems, certain aspects of data compression, data analysis and, of course, application in security systems.

Research on neural networks is actively conducted today in the scientific communities of various countries. Viewed this way, they can be regarded as a generalization of a number of methods of pattern recognition, discriminant analysis and clustering.

It should also be noted that over the past year, startups in the field of image recognition systems received more funding than in the previous 5 years, which speaks to the growing popularity of this kind of development in the market.

Application of neural networks to image recognition

Let us consider the standard tasks that neural networks solve when applied to images:

● object identification;

● recognition of object parts (for example, faces, hands, feet, etc.);

● semantic delineation of objects (allows selecting only the boundaries of objects in a picture);

● semantic segmentation (allows dividing an image into distinct groups of objects);

● extraction of surface normals (allows reconstructing three-dimensional images from two-dimensional ones);

● extraction of attention maps (allows determining what a person would pay attention to in a given image).

It is worth noting that the task of image recognition is strikingly complex, and solving it is a non-trivial process. The object being recognized can be a human face, a handwritten digit, or any of many other objects characterized by a set of unique features, which facilitates the identification process.

Here we will study the algorithm by which a neural network creates and recognizes handwritten characters. The image will be supplied to the input of the neural network, and one of the outputs will be used to indicate the result.

At this point it is necessary to briefly describe the classification of neural networks. Today three main types are distinguished:

● convolutional neural networks (CNN);

● recurrent networks (deep learning);

● reinforcement learning.

One of the most common examples of a neural network topology is the classical one. Such a neural network can be represented as a fully connected graph, whose characteristic feature is the forward propagation of information and the backward propagation of the error signal. This technology does not possess recursive properties. An illustrative neural network with the classical topology is shown in Fig. 1.

Fig. 1. Neural network with the simplest topology

Fig. 2. Neural network with 4 layers of hidden neurons

One of the clearly significant drawbacks of this network topology is redundancy. Because of this redundancy, when data is supplied in the form of, say, a two-dimensional matrix at the input, it must be converted to a one-dimensional vector. Thus, for an image of a handwritten Latin letter described by a 34x34 matrix, 1156 inputs are required, to say nothing of the fact that the computational load involved in a hardware-software implementation of this algorithm would be excessive.

The problem was solved by the American scientist Yann LeCun, drawing on an analysis of the work of the Nobel Prize laureates in medicine T. Wiesel and D. Hubel. In their research, the object of study was the visual cortex of the cat's brain. Analysis of the results showed that the cortex contains a number of simple cells as well as a number of complex cells. The simple cells reacted to straight lines received by the visual receptors, and the complex cells to translational movement in one direction. As a result, the principle of constructing convolutional neural networks was developed. The idea of this principle is that the network alternates three kinds of layers: convolutional layers, conventionally denoted C-layers; subsampling layers, S-layers; and fully connected layers, F-layers, at the output of the network.
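A minimal sketch of the C/S/F layer chain, written in plain NumPy rather than in any particular framework; the kernel and weights are random (untrained), so the sketch only demonstrates the shapes and the order of the layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):                      # C-layer: valid convolution
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.tanh(out)

def subsample(fmap, k=2):                       # S-layer: 2x2 local averaging
    h, w = fmap.shape[0] // k * k, fmap.shape[1] // k * k
    return fmap[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

image = rng.standard_normal((28, 28))
c = conv2d(image, rng.standard_normal((5, 5)))  # -> 24x24 feature map
s = subsample(c)                                # -> 12x12
f = np.tanh(s.ravel() @ rng.standard_normal((s.size, 10)))  # F-layer: 10 outputs
print(c.shape, s.shape, f.shape)                # (24, 24) (12, 12) (10,)
```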

At the foundation of this kind of network lie three paradigms: the paradigm of local perception, the paradigm of shared weights, and the paradigm of subsampling.

The essence of the paradigm of local perception is that each input neuron receives not the entire image matrix but only a part of it; the other parts are fed to other input neurons. In this way a parallelization mechanism can be realized, and the topology of the image is preserved from layer to layer while being processed repeatedly, so that several layers of the network can take part in the processing.

The paradigm of shared weights holds that a large number of connections can use only a small set of weights. Such sets are called "kernels". Shared weights have a positive effect on the properties of the neural network with respect to the final result of image processing: the network's ability to find invariants in images and to filter out noise components improves.

From the foregoing one can conclude that when the image is convolved with a kernel, a new image is produced whose elements are the main characteristic of the filter's response; in other words, a feature map is generated. The algorithm is shown in Fig. 3.

Fig. 3. Algorithm for generating a feature map
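The shared-weights idea behind the feature map can be illustrated with a single hand-written kernel; the vertical-edge filter below is a textbook example, not a kernel from this article. Note that the number of trainable parameters is just the kernel size, independent of the image size:

```python
import numpy as np

kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])   # a vertical-edge filter, for illustration

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # left half dark, right half bright

# Slide the same kernel over the whole image -> a feature map.
fmap = np.empty((6, 6))
for i in range(6):
    for j in range(6):
        fmap[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(kernel.size)        # 9 shared weights for all 36 map positions
print(fmap.max())         # strongest response sits at the vertical edge
```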

The paradigm of subsampling consists in reducing the spatial dimension of the mathematical equivalent of the input image, an n-dimensional matrix. The need for subsampling arises from the requirement of invariance to the scale of the input image. By alternating the layer types, new feature maps are generated from existing ones, so that in the practical implementation of this method the processing of a multidimensional matrix is reduced to a vector, and then to a scalar value.
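A minimal sketch of the subsampling step, assuming 2x2 max pooling for brevity (the article also describes local averaging, which shares the dimension-reduction property, though not the exact shift equality shown here):

```python
import numpy as np

def max_pool(fmap, k=2):
    """Reduce each k x k neighbourhood of a feature map to one value,
    halving the spatial dimensions for k=2."""
    h, w = fmap.shape[0] // k * k, fmap.shape[1] // k * k
    return fmap[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0     # a feature detected at (0, 0)
b = np.zeros((4, 4)); b[1, 1] = 1.0     # the same feature shifted by one pixel
print(max_pool(a).shape)                # (2, 2): spatial size halved
print(np.array_equal(max_pool(a), max_pool(b)))  # True: robust to the shift
```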

Implementation of neural network training

In terms of training, the main networks are divided into 3 classes:

● supervised learning (perceptron);

● unsupervised learning (adaptive resonance networks);

● mixed learning (radial basis function networks).

One of the most important criteria for evaluating the work of a neural network in image recognition is the quality of recognition. It should be noted that for a quantitative assessment of the quality of recognition by a neural network, the mean squared error algorithm is most often used:

Ep = (Dp - O(Ip, W))²   (1)

In this expression, Ep is the p-th recognition error for the p-th pair of neurons,

Dp is the desired (target) output of the neural network (ideally the network should recognize the input with 100% certainty), and O(Ip, W) is the network output, which depends on the p-th input Ip and the set of weight coefficients W. This set includes the kernels of the convolutional layers and the weights of all layers. The total error is computed as the arithmetic mean of Ep over all training pairs.
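The mean-squared-error criterion of (1) can be sketched directly; the target and output vectors below are made-up numbers, for illustration only:

```python
import numpy as np

def mse(desired, outputs):
    """Mean over training pairs of the squared output error,
    as in criterion (1): Ep = (Dp - O(Ip, W))^2, averaged over p."""
    desired, outputs = np.asarray(desired), np.asarray(outputs)
    return np.mean(np.sum((desired - outputs) ** 2, axis=1))

D = np.array([[1.0, -1.0], [-1.0, 1.0]])   # desired ±1 targets, two pairs
O = np.array([[0.8, -0.9], [-1.0, 0.6]])   # actual network outputs
print(mse(D, O))                           # (0.05 + 0.16) / 2 = 0.105
```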

As a result of analysis, a regularity was found by which the optimal value of the weights, at which the error is minimal, can be computed from relation (2):

wopt = w - (∂E/∂w) / (∂²E/∂w²)   (2)

From this relation it can be seen that the optimal weight is computed as the ratio of the first-order derivative of the error function with respect to the weight to its second-order derivative.

This relation makes it possible to compute the error trivially for the output layer. Computing the error in the hidden layers of neurons can be implemented by the well-known method of error backpropagation. The key idea of the method is to propagate the error signals from the network outputs back to its inputs, that is, in the direction opposite to the forward propagation of signals through the network.
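A minimal numerical sketch of error backpropagation on a toy problem (XOR with ±1 targets); the network size, learning rate and tanh activations are illustrative choices, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[-1.], [1.], [1.], [-1.]])        # XOR with ±1 targets

W1 = rng.standard_normal((3, 4)) * 0.5          # 2 inputs + bias -> 4 hidden
W2 = rng.standard_normal((5, 1)) * 0.5          # 4 hidden + bias -> 1 output

def with_bias(A):
    return np.hstack([A, np.ones((len(A), 1))])

def predict(X):
    return np.tanh(with_bias(np.tanh(with_bias(X) @ W1)) @ W2)

mse0 = float(np.mean((predict(X) - T) ** 2))    # error before training
lr = 0.1
for _ in range(5000):
    H = np.tanh(with_bias(X) @ W1)              # forward pass
    Y = np.tanh(with_bias(H) @ W2)
    dY = (Y - T) * (1 - Y ** 2)                 # error at the output layer
    dH = (dY @ W2[:-1].T) * (1 - H ** 2)        # error propagated backwards
    W2 -= lr * with_bias(H).T @ dY              # weight corrections
    W1 -= lr * with_bias(X).T @ dH

mse1 = float(np.mean((predict(X) - T) ** 2))    # error after training
print(mse0, "->", mse1)                         # the error decreases
```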

It should also be noted that training is carried out on specially prepared image databases, classified into a large number of classes, and takes considerable time.
Today the largest database is ImageNet (www.image_net.org). Access is free for academic institutions.

Conclusion

From all of the above it follows that neural networks, and algorithms built on the principles of their functioning, can find application in fingerprint recognition systems. It is often the software component of the hardware-software complex, aimed at recognizing such a complex and unique image as a papillary pattern serving as identification data, that solves this task in full. A program implemented using algorithms based on a neural network will be far more effective.

Summing up, we can conclude the following:

● neural networks can find application in recognizing both images and texts;

● the theory makes it possible to speak of the creation of a new promising class of models, namely models based on intelligent modeling;

● neural networks are capable of learning, which speaks to the possibility of optimizing the process of their functioning. This capability is an extremely important option for the practical implementation of the algorithm;

● the evaluation of an image recognition algorithm based on neural networks can take a large number of values, thanks to the mechanism of tuning the parameters to the required value by computing the necessary coefficients.

Today, the further study of neural networks is a promising area of research that will certainly be applied successfully in ever more branches of science and technology, as well as of human activity. The main emphasis in the development of modern recognition systems is shifting toward the semantic segmentation of 3D images in geodesy, medicine, prototyping and other areas of human activity; these require complex algorithms, and the difficulties are connected with:

● the lack of a sufficient number of databases of reference images;

● the lack of a sufficient number of qualified experts for the initial training of systems;

● images not being stored pixel-by-pixel, which requires additional resources from both the computer and the developer.

It should also be noted that today there is a large number of standard neural network architectures, which substantially eases the task of building a neural network from scratch and reduces it to selecting a network structure suited to the specific task.

At present, a large number of innovative companies on the market are engaged in image recognition using neural network technologies in their systems. It is reliably known that they have achieved image recognition accuracy in the region of 95% using a database of 10,000 images. Nevertheless, all these achievements concern static images; with video sequences, everything is considerably more complicated for now.

Bibliographic reference

Markova S.V., Zhigalov K.Yu. USE OF A NEURAL NETWORK TO CREATE AN IMAGE RECOGNITION SYSTEM // Fundamental Research. - 2017. - No. 8-1. - P. 60-64;
URL: http://fundamental-research.ru/ru/article/view?id=41621 (accessed: 03/24/2020).

This section reviews the neural network methods used in image recognition. Neural network methods are methods based on the application of various types of neural networks (NN). The main directions of application of various NNs to the recognition of images and patterns are:

  • application for extracting key characteristics or features of the given images,
  • classification of the images themselves or of the characteristics already extracted from them (in the first case, the extraction of key characteristics happens implicitly inside the network),
  • solution of optimization problems.

The architecture of artificial NNs bears a certain resemblance to natural neural networks. NNs designed to solve various tasks may differ substantially in their functioning algorithms, but their main properties are as follows.

An NN consists of elements called formal neurons, which are themselves quite simple and are connected to other neurons. Each neuron transforms the set of signals arriving at its input into an output signal. It is the connections between neurons, encoded by weights, that play the key role. One advantage of NNs (as well as a drawback when implementing them on sequential architectures) is that all elements can function in parallel, thereby substantially raising the efficiency of the solution, especially in image processing. Besides the fact that NNs allow many problems to be solved efficiently, they provide powerful, flexible and universal learning mechanisms, which is their main advantage over other methods (probabilistic methods, linear discriminators, decision trees, etc.). Learning removes the need to choose key features, their significance and the relations between features by hand. Nevertheless, the choice of the initial representation of the input data (vector in n-dimensional space, frequency characteristics, wavelets, etc.) substantially influences the quality of the solution and is a separate topic. NNs have good generalization ability (better than that of decision trees), i.e. they can successfully extend the experience gained on a finite training set to the entire space of images.

Let us describe the application of NNs to image recognition, noting the possibilities of applying them to face recognition.

1. Multilayer neural networks

The architecture of a multilayer neural network (MNN) consists of sequentially connected layers, where each neuron of a layer is connected by its inputs to the neurons of the preceding layer, and by its outputs to those of the next. An NN with two decision layers can approximate any multidimensional function with arbitrary accuracy. An NN with one decision layer is capable of forming only linear separating surfaces, which strongly narrows the range of problems it can solve; in particular, such a network cannot solve problems of the "exclusive or" type. An NN with a non-linear activation function and two decision layers allows the formation of any convex regions in the solution space, and with three decision layers, regions of any complexity, including non-convex ones. At the same time, the MNN does not lose its generalization ability. MNNs are trained with the error backpropagation algorithm, which is a gradient descent method in weight space aimed at minimizing the total error of the network. In this process the error (more precisely, the magnitudes of the weight corrections) propagates in the reverse direction, from the outputs to the inputs, through the weights connecting the neurons.

The simplest application is a single-layer NN, known as auto-associative memory, trained to reconstruct the presented images. By feeding a test image to the input and computing the quality of the reconstructed image, one can estimate how well the network recognized the input image. The positive property of this method is that the network can restore distorted and noisy images, but it is not suitable for more serious purposes.

Fig. 1. Multilayer neural network for image classification. The neuron with maximal activity (here the first one) indicates membership in the recognized class.

The MNN is also used for direct classification of images: the input receives either the image itself in some representation or a set of previously extracted key characteristics of the image; at the output, the neuron with maximal activity indicates membership in the recognized class (Fig. 1). If this activity is below a certain threshold, it is considered that the presented image does not belong to any of the known classes. The training process establishes the correspondence between images presented at the input and membership in a particular class. This is called supervised learning. Applied to face recognition, this approach is good for access-control tasks with a small group of persons. It provides direct comparison of the images themselves by the network, but with a growing number of classes, the training time and the size of the network grow exponentially. Therefore, tasks such as searching for a similar person in a large database require a compact set of key characteristics on which the search can be based.
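The decision rule just described (maximal output names the class, rejection below a threshold) can be sketched as follows; `classify` and the threshold value are illustrative, not from the cited work:

```python
import numpy as np

def classify(outputs, threshold=0.5):
    """Return the index of the output neuron with maximal activity,
    or None if that activity is below the rejection threshold."""
    outputs = np.asarray(outputs)
    k = int(np.argmax(outputs))
    return k if outputs[k] >= threshold else None   # None = "unknown class"

print(classify([-0.8, 0.9, -0.7]))          # 1: confident assignment
print(classify([-0.2, 0.1, -0.3]))          # None: below the threshold
```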

An approach to classification based on the frequency characteristics of the entire image is described in . A single-layer NN based on multi-valued neurons was applied. 100% recognition on the MIT database was reported, but the recognition was carried out among images on which the network had been trained.

The application of an MNN for classifying face images based on characteristics such as the distances between certain specific parts of the face (nose, mouth, eyes) is described in . In this case, these distances were fed to the input of the NN. Hybrid methods were also used: in the first, the results of processing by a hidden Markov model were fed to the NN, and in the second, the result of the NN's work was fed to the input of the Markov model. In the second case no advantage was observed, which suggests that the classification result of the NN alone is sufficient.

The application of an NN for classifying images whose input receives the results of the decomposition of the image by the principal component method is shown in .

In the classical MNN, the interlayer neural connections are fully connected, and the image is represented as a one-dimensional vector, although it is two-dimensional. The architecture of the convolutional NN is aimed at overcoming these shortcomings. It uses local receptor fields (providing local two-dimensional connectivity of neurons), shared weights (providing detection of certain features anywhere in the image) and a hierarchical organization with spatial subsampling. The convolutional NN (CNN) provides partial robustness to changes of scale, shifts, rotations and distortions. The architecture of the CNN consists of many layers, each of which has several planes, and the neurons of the next layer are connected only with a small number of neurons of the previous layer within a local area (as in the human visual cortex). The weights at every point of one plane are the same (convolutional layers). A convolutional layer is followed by a layer that reduces its dimension by local averaging. Then again a convolutional layer, and so on. In this way a hierarchical organization is achieved. Later layers extract more general characteristics that depend less on the particular picture. The CNN is trained by the standard backpropagation method. A comparison of the MNN and the CNN showed substantial advantages of the latter both in speed and in reliability of classification. A useful property of the CNN is that the characteristics formed at the upper layers of the hierarchy can be used for classification by the nearest-neighbour method (for example, computing the Euclidean distance), and the CNN can successfully extract such characteristics even for images absent from the training set. CNNs are characterized by fast training and operation.
Testing the CNN on the ORL database, containing face images with small changes of lighting, scale, spatial rotations, position and various emotions, showed approximately 98% recognition accuracy, and for the known faces, variants of their images absent from the training set were presented. This result makes the architecture promising for further development in the field of recognition of images of spatial objects.

Multilayer networks are also applied to detect objects of a given type. Besides the fact that a trained network can determine whether an image belongs to one of "its" classes, it can be trained as a detector of a particular class. In this case the output classes are images that do and do not belong to the given type. In one application, a neural-network detector detected faces in an input image. The image was scanned with a 20x20-pixel window, which was fed to the input of a network that decides whether the region belongs to the class of faces. Training used both positive examples (various images of faces) and negative ones (images that are not faces). To improve the reliability of detection, an ensemble of networks trained with different initial weights was used; the networks made errors in different ways, and the final decision was taken by a vote of the whole ensemble.

Fig. 2. Principal components (eigenfaces) and the decomposition of an image into principal components.

Neural networks are also used to extract key characteristics of an image, which are then used for subsequent classification. A neural-network implementation of the principal component analysis (PCA) method is known. The essence of PCA is to obtain maximally decorrelated coefficients that characterize the input images. These coefficients are called principal components and are used for statistical image compression, in which a small number of coefficients represent the whole image. A network with one hidden layer containing N neurons (where N is much smaller than the dimensionality of the image), trained by backpropagation to reproduce the input image at the output, forms at the outputs of the hidden neurons the coefficients of the first N principal components, which are used for compression. Usually from 10 to 200 principal components are used. As the index of a component grows, its representativeness falls sharply, and there is no point in using components with larger indices. With nonlinear activation functions of the neural elements, a nonlinear decomposition into principal components is possible. The nonlinearity allows variations in the input data to be represented more accurately. Applying principal component analysis to the decomposition of face images, we obtain principal components called eigenfaces ("holons" in some works), which have a useful property: there exist components that mainly reflect such essential characteristics of a face as gender, race, and emotion. When reconstructed, the components have a face-like appearance; the first ones reflect the most general shape of the face, the last ones the differences between individual faces (Fig. 2). This method is well suited to searching for similar face images in large databases.
The possibility of further reducing the dimensionality of the principal components with the help of a neural network has also been shown. By assessing the quality of reconstruction of the input image, one can very accurately determine whether it belongs to the class of faces.
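The decomposition into principal components described above can also be computed directly with linear algebra rather than a trained network. The NumPy sketch below, with random vectors standing in for face images, keeps the first N components and reconstructs an "image" from its N coefficients, which is exactly the compression the paragraph describes; all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(size=(100, 64))    # 100 "images" of 64 pixels each (stand-in data)
mean = images.mean(axis=0)
centered = images - mean

# Principal components = eigenvectors of the covariance matrix,
# obtained here via SVD; the rows of vt are the components ("eigenfaces").
u, s, vt = np.linalg.svd(centered, full_matrices=False)

n = 10                                  # keep only the first N principal components
coeffs = centered @ vt[:n].T            # N decorrelated coefficients per image
reconstructed = coeffs @ vt[:n] + mean  # approximate image from N numbers

err_10 = np.mean((images - reconstructed) ** 2)
full = (centered @ vt.T) @ vt + mean    # all components -> exact reconstruction
assert np.allclose(full, images)
print(err_10 > 0)  # lossy with only 10 of 64 components
```

The neural-network version in the text learns the same subspace by gradient descent instead of computing the SVD in closed form.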

AlexNet is a convolutional neural network that has greatly influenced the development of machine learning, especially computer vision algorithms. The network won the ImageNet LSVRC-2012 image recognition competition in 2012 by a large margin (15.3% error vs. 26.2% for second place).

The AlexNet architecture is similar to LeNet, created by Yann LeCun. However, AlexNet has more filters per layer and stacked convolutional layers. The network includes convolutions, max pooling, dropout, data augmentation, ReLU activation functions, and stochastic gradient descent.

Features of AlexNet

  1. ReLU is used as the activation function instead of the hyperbolic tangent to introduce non-linearity into the model. Thanks to this, at the same accuracy the network trains six times faster.
  2. Using dropout instead of ordinary regularization addresses the overfitting problem. However, the training time doubles with a dropout rate of 0.5.
  3. Overlapping pooling is performed to reduce the size of the network. Thanks to it, the top-1 and top-5 error rates are reduced by 0.4% and 0.3% respectively.

ImageNet dataset

ImageNet is a set of 15 million labeled high-resolution images divided into 22,000 categories. The images were collected on the Internet and labeled by hand using Amazon's Mechanical Turk crowdsourcing service. Since 2010, the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held as part of the Pascal Visual Object Challenge. The challenge uses a subset of the ImageNet dataset with 1,000 images in each of 1,000 categories: in total 1.2 million training images, 50,000 validation images, and 150,000 test images. ImageNet consists of images of varying resolution, while the competition requires a fixed one, so the images are downscaled to a fixed resolution of 256x256. If an image was rectangular, it is cropped to a square at the center of the image.
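The preprocessing just described — rescale so the shorter side is 256, then cut the central 256x256 square — can be sketched as follows. This is a NumPy-only illustration: a real pipeline would use an image library with proper resampling, which is reduced here to nearest-neighbour index selection.

```python
import numpy as np

def resize_short_side(img, target=256):
    """Nearest-neighbour rescale so the shorter side equals `target`."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    nh, nw = round(h * scale), round(w * scale)
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def center_crop(img, size=256):
    """Cut a size x size square from the centre of a rectangular image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((480, 640, 3), dtype=np.uint8)   # a rectangular "photo"
out = center_crop(resize_short_side(img))
print(out.shape)   # (256, 256, 3)
```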

Architecture

Figure 1

The architecture of the network is shown in Figure 1. AlexNet contains eight layers with weight coefficients. The first five are convolutional and the last three are fully connected. The output is passed through the softmax function, which produces a distribution over 1000 class labels. The network maximizes multinomial logistic regression, which is equivalent to maximizing, averaged over the training examples, the logarithm of the probability of the correct label under the predicted distribution. The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps of the previous layer that reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps of the second layer. The neurons of the fully connected layers are connected to all neurons of the previous layer.

Thus, AlexNet contains 5 convolutional layers and 3 fully connected layers. ReLU is applied after every convolutional and fully connected layer. Dropout is applied before the first and second fully connected layers. The network contains 62.3 million parameters and requires 1.1 billion computations in a forward pass. The convolutional layers, which account for 6% of all the parameters, perform 95% of the computation.
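The figure of roughly 62.3 million parameters can be checked by simple arithmetic over the layer shapes. The sketch below uses the layer dimensions of the original AlexNet paper (kernel sizes, channel counts, the 4096-unit fully connected layers); the exact total depends on the counting convention — here the two-GPU split is ignored, which gives a number close to the one quoted above.

```python
# (out_channels, in_channels, kernel_h, kernel_w) for the five conv layers,
# ignoring the two-GPU split (i.e. full cross-connections everywhere)
convs = [(96, 3, 11, 11), (256, 96, 5, 5),
         (384, 256, 3, 3), (384, 384, 3, 3), (256, 384, 3, 3)]
# (out_features, in_features) for the three fully connected layers;
# fc6 sees the flattened 6*6*256 output of the last pooling stage
fcs = [(4096, 6 * 6 * 256), (4096, 4096), (1000, 4096)]

conv_params = sum(o * i * kh * kw + o for o, i, kh, kw in convs)  # weights + biases
fc_params = sum(o * i + o for o, i in fcs)
total = conv_params + fc_params

print(round(total / 1e6, 1))             # ~62.4 million parameters
print(round(100 * conv_params / total))  # conv layers: ~6% of the parameters
```

The same arithmetic also confirms the claim in the text that the convolutional layers hold only about 6% of the parameters: almost all the weights sit in the fully connected layers.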

Training

AlexNet was trained for 90 epochs. Training took 6 days on two Nvidia GeForce GTX 580 GPUs, which is why the network is split into two parts. Stochastic gradient descent was used with a learning rate of 0.01, momentum of 0.9, and weight decay of 0.0005. The update scheme for the weights w looks like:

v_{i+1} = 0.9 · v_i − 0.0005 · ε · w_i − ε · ⟨∂L/∂w⟩|_{w_i}
w_{i+1} = w_i + v_{i+1}

where i is the iteration number, v is the momentum variable, ε is the learning rate, and ⟨∂L/∂w⟩ is the gradient of the loss averaged over the batch. Throughout training the learning rate was kept equal for all layers and adjusted manually: the heuristic used was to divide it by 10 whenever the validation error rate stopped improving, and it was reduced 3 times over the course of training.
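The update rule above is easy to state in code. The sketch below applies it to a toy one-parameter quadratic loss L(w) = w²/2 (so dL/dw = w), purely to make the momentum and weight-decay terms concrete; it is not tied to any particular network.

```python
lr, momentum, weight_decay = 0.01, 0.9, 0.0005  # AlexNet's hyperparameters

def sgd_step(w, grad, v):
    """One update: v <- 0.9*v - 0.0005*lr*w - lr*grad ; w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w, v = 1.0, 0.0
for _ in range(1000):
    grad = w          # gradient of the toy loss L(w) = w**2 / 2
    w, v = sgd_step(w, grad, v)

print(abs(w) < 1e-3)  # True: the weight converges toward the minimum at 0
```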

Applications and implementations

The results show that a large, deep neural network is capable of achieving record results on very complex datasets using purely supervised learning. After the publication of AlexNet, all the participants of the ImageNet competition began to use convolutional neural networks for the classification task. AlexNet was one of the first fast GPU implementations of convolutional neural networks and opened a new era of their success. Implementing AlexNet has become easier with deep learning libraries: PyTorch, TensorFlow, Keras.

Results

The network achieves the following top-1 and top-5 error rates: 37.5% and 17.0% respectively. The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2%, with an approach that averages the predictions of six sparse-coding models trained on different feature vectors. Since then, results of 45.7% and 25.7% have been achieved with an approach that averages the predictions of two classifiers trained on Fisher vectors. The ILSVRC-2010 results are given in Table 1.


Left: eight ILSVRC-2010 test images and the five labels the model considered most probable. The correct label is written under each image, and the probability assigned to it is shown with a red bar if it falls within the top five. Right: five ILSVRC-2010 test images in the first column; the remaining columns show six training images closest to each test image.

Introduction

The topic of this research is the development of an image recognition system based on artificial neural networks. The task of image recognition is very important, since the ability of a computer to recognize images automatically can open many new possibilities for the development of science and technology, such as systems that search for people and other objects in photographs, quality control of products without human participation, self-driving transport, and many others.

As for artificial neural networks, in recent years machine learning has made a large step forward due to the growth of the computing power of personal computers and the use of graphics cards for computation, which makes it possible to train much deeper and more complex neural networks than before; these, in turn, have shown significantly better results than other algorithms on many tasks, especially image recognition. This direction in the development of neural networks has received the name deep learning and is one of the most successful and rapidly developing today. For example, according to the results of the ImageNet-2014 image recognition competition, most of the successful algorithms used deep neural networks.

Since the task of image recognition is very broad and in most cases requires a separate approach for different types of images, it is practically impossible to consider the task of image recognition as a whole within one study, so it was decided to take as an example one particular subtask of image recognition: the recognition of road signs.

Thus, the main goal of this research is the development of an image recognition system based on artificial neural networks for images of road signs. To reach this goal, the following tasks were formulated:

Carrying out an analytical review of the literature on artificial neural networks and their application to image recognition

Development of an algorithm for recognizing road signs using artificial neural networks

Development of a prototype image recognition system using the developed algorithm. The result of this task should be a software system that accepts an image as input and outputs the predicted class of the image.

Conducting experimental studies. It is necessary to carry out experiments and evaluate the accuracy of the developed algorithm

In the course of the research, all the tasks set were completed. The specific results of each of them will be described in the main part of the work.

1. Review of the literature

1.1 Machine learning

Neural networks, which are considered in this work, are one of the varieties of machine learning algorithms. Machine learning is one of the areas of artificial intelligence. The main property of machine learning algorithms is their ability to learn in the process of work. For example, a decision tree construction algorithm, without any prior information about what the data means or what regularities it contains, receives as input a set of objects together with the values of certain features and the class of each object; in the process of construction, the tree itself discovers the hidden regularities, that is, it learns, and afterwards it is able to predict the class of new objects it has not seen before.

There are two main types of machine learning: supervised learning and unsupervised learning. Supervised learning means that, besides the data itself, the algorithm receives additional information about it that it can use for learning. The most popular supervised learning tasks are classification and regression. For example, the classification task can be formulated as follows: given a set of objects, each belonging to one of several classes, determine to which of these classes a new object belongs. The task of recognizing road signs considered in this work is a typical classification task: there is a certain number of types of road signs — the classes — and the task of the algorithm is to "recognize" the sign, that is, to assign it to one of these classes.

Unsupervised learning differs from supervised learning in that the algorithm receives no additional information other than the input data set itself. The most popular example of unsupervised learning is clustering. The essence of the clustering task is the following: a set of objects belonging to different classes is given to the algorithm as input, and it must partition this set into subsets of "similar" objects, those likely to belong to one class.

Among the many algorithms of machine learning, several main families can be distinguished. For the classification task, the most popular such families include, for example:

· Rule-based classifiers – the main idea of such classifiers is the search for rules that assign objects to a class, in the "IF – THEN" form. To find such rules, various statistical metrics are usually used; the rules are also often derived from a decision tree.

· Logistic regression – the main idea is to find a linear hyperplane that divides the feature space as accurately as possible into two half-spaces so that objects of different classes lie in different half-spaces. Here the probability of the target class is modeled as a function of a linear combination of the input parameters. To train such a classifier one can use, for example, the method of gradient descent.
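As a small illustration of the last point, the following NumPy sketch fits a logistic regression by plain gradient descent on a tiny, linearly separable toy set; the data, learning rate, and iteration count are arbitrary choices made for the example.

```python
import numpy as np

# Toy 1-feature data: class 1 for positive x, class 0 for negative x
x = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.hstack([x, np.ones((len(x), 1))])     # add a bias column

w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))         # sigmoid of the linear combination
    grad = X.T @ (p - y) / len(y)            # gradient of the log-loss
    w -= 0.5 * grad                          # gradient-descent step

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
print(pred.tolist())   # [0, 0, 0, 1, 1, 1]
```

The learned hyperplane here is a single threshold on x; with more features the same code finds a separating hyperplane in higher dimensions.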

· Bayesian classifier – as the name suggests, this classifier is based on Bayes' theorem, which is written in the form

P(c | x) = P(x | c) · P(c) / P(x),

where c is a class and x = (x_1, …, x_n) is the feature vector of the object being classified.

The idea of the classifier is to find the class with the maximum posterior probability given the values of all the parameters of the instance being classified. In the general case this requires prior knowledge of a very large number of conditional probabilities and, accordingly, an enormous training set and very complex computation, so in practice a variety of the Bayesian classifier called the naive Bayes classifier is most often used. It assumes that all parameters are independent of one another, so that P(x | c) = P(x_1 | c) · … · P(x_n | c); the formula then becomes much simpler, and to use it one needs to know only a small number of conditional probabilities.


Although this assumption is usually far from reality, the naive Bayes classifier often shows quite good results.
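A minimal categorical naive Bayes classifier, illustrating the independence assumption in code; the weather-style toy data is invented for the example, and Laplace smoothing is added so that unseen feature values do not zero out the product.

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (feature_tuple, label). Returns count tables."""
    class_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)          # (position, class) -> value counts
    for feats, label in samples:
        for i, v in enumerate(feats):
            feat_counts[(i, label)][v] += 1
    return class_counts, feat_counts, len(samples)

def predict_nb(model, feats):
    class_counts, feat_counts, n = model
    best, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n                           # prior P(c)
        for i, v in enumerate(feats):            # naive step: multiply P(x_i | c)
            score *= (feat_counts[(i, c)][v] + 1) / (cc + 2)  # Laplace smoothing
        if score > best_score:
            best, best_score = c, score
    return best

data = [(("sunny", "warm"), "walk"), (("sunny", "cold"), "walk"),
        (("rainy", "warm"), "stay"), (("rainy", "cold"), "stay")]
model = train_nb(data)
print(predict_nb(model, ("sunny", "warm")))   # walk
```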

· Decision trees – in the simplest case, this algorithm works by building a tree in which each internal node corresponds to a certain test over the parameters of an object, and each leaf to a predicted class. There are many variants of decision trees and algorithms for building them. For example, one of the most popular algorithms is C4.5.

· Neural networks – a model represented as a set of elements (neurons) and connections between them, which in the general case can be directed or undirected and can carry weights. During the operation of a neural network, some of its neurons, called input neurons, receive a signal (the input data), which propagates and is transformed in a certain order, and at the outputs of the network (the output neurons) the result of the network's work can be obtained, for example, the probabilities of the individual classes. Neural networks will be examined in more detail in the next section.

· Support vector machines – this algorithm, like logistic regression, searches for a separating hyperplane (or several hyperplanes), but the way the hyperplane is sought differs: the search is for a hyperplane such that the distance from it to the nearest points of each class is as large as possible; for this, quadratic optimization methods are used.

· Lazy learners (instance-based classifiers) – a special variety of classification algorithms which, instead of building a model in advance, make the decision about assigning an object to a class based on the assumption that similar objects most often have the same class. When such an algorithm receives an object to classify, it looks for previously seen objects similar to the new one and uses information about their classes to form its own prediction of the target class of the object.
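A nearest-neighbour classifier, the textbook lazy learner, can be written in a few lines; note that no model is built at all — all the work happens at prediction time. The 2-D points below are toy data invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label). Classify `query` by majority vote
    among the k stored examples closest in Euclidean distance."""
    by_dist = sorted(train, key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1)))   # a
print(knn_predict(train, (5, 4)))   # b
```

The ease of computing the distance metric is exactly the condition mentioned below under which such classifiers are a good fit.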

It can be seen that classification algorithms can be based on very different ideas and, accordingly, show different efficiency for different types of tasks. Thus, for a task with a small number of input features, systems based on rules may perform well; where a similarity metric between input objects can be computed quickly and conveniently, lazy learners are suitable; and where the knowledge about the task is hard to identify or interpret, as in recognition of images or speech, the most appropriate classification method is neural networks.

1.2 Neural networks

Artificial neural networks are one of the most widely used models of machine learning. The idea of artificial neural networks is based on imitation of the nervous system of living beings.

The nervous system of animals, in a simplified model, is a system of cells, each of which has processes of two types: dendrites and axons. At a certain moment a cell receives signals from other cells through its dendrites, and if these signals are strong enough, the cell is excited and transmits a signal through its axon to the other cells it is connected to. In this way the signal (excitation) propagates through the whole nervous system. The model of neural networks is built similarly. A neural network consists of neurons and directed connections between them, where each connection carries a weight. Some of the neurons are input neurons; they receive data from the external environment. Then each neuron receives the signals from all of its incoming connections, computes a weighted sum of the signals, applies an activation function to it, and transmits the result to each of its outputs. There is also a certain number of output neurons, whose values form the result of the network's work. Thus, in the classification task the output values of these neurons can denote the predicted probability of each class for the input object. Training a neural network consists in choosing such weights for the connections between neurons that the output values for all the training inputs are as close as possible to the true ones.
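The weighted sum plus activation function just described is, in code, one line per neuron. Below is a sketch of a forward pass through a single layer, with arbitrary weights and tanh as the activation; none of the numbers come from the text.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals, then a nonlinear activation (tanh)."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(s)

def layer(inputs, weight_rows, biases):
    """Every neuron of the layer sees the same inputs, with its own weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

x = [0.5, -1.0, 2.0]                       # signals from input neurons
out = layer(x, [[0.1, 0.4, -0.2], [1.0, 0.0, 0.5]], [0.0, -1.0])
print(len(out))                            # 2 output signals
print(all(-1.0 < v < 1.0 for v in out))    # tanh keeps outputs in (-1, 1)
```

Stacking several such layers, with the outputs of one serving as the inputs of the next, gives exactly the feed-forward network described in the list that follows.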

There are a few main types of neural network architectures:

· Feed-forward networks – have no cycles, so the neurons and the connections between them form an acyclic graph, and signals propagate in only one direction. These networks are the most popular and well studied, and they are the easiest to train.

· Recurrent neural networks – in such networks, unlike feed-forward ones, signals can be transmitted in both directions and can reach the same neuron several times in the course of processing one input value. A particular variety of recurrent neural networks is, for example, the Boltzmann machine. The main difficulty in working with such networks is training them: creating an efficient training algorithm is a very complex task that still has no universal solution.

· Self-organizing Kohonen maps – a neural network intended primarily for clustering and visualization of data.

Historically, the development of neural networks is divided into 3 main periods. The first studies in the area of artificial neural networks date back to the 1940s. In 1943, W. McCulloch and W. Pitts published the work "A Logical Calculus of the Ideas Immanent in Nervous Activity", which laid out the basic principles of building artificial neural networks. In 1949, D. Hebb's book "The Organization of Behavior" was published, in which the author examined the theoretical foundations of learning in neural networks and first formulated the concept of learning in neural networks as the adjustment of connections between neurons. In 1954, W. Clark first tried to implement an analogue of a Hebbian network on a computer. In 1958, F. Rosenblatt proposed the model of the perceptron, which was essentially a neural network with one hidden layer. A schematic view of the Rosenblatt perceptron is shown in Figure 1.

Figure 1. The Rosenblatt perceptron

This model was trained by the error-correction method: the weights remain unchanged as long as the output value of the perceptron is correct, and in case of an error a weight changes by 1 in the direction opposite to the sign of the error. This algorithm, as Rosenblatt proved, always converges. Using such a model, it proved possible to build a computer that recognized some letters of the Latin alphabet, which was without a doubt a great success at the time.
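The error-correction rule just described — change the weights only on a mistake, by ±1 against the sign of the error — can be sketched for a single threshold unit. The logical AND function below is merely a toy target chosen for the example.

```python
def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Learn logical AND with the perceptron error-correction rule:
# weights change only when the output is wrong, by +-1 per input that fired.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = [0, 0], 0
for _ in range(10):                      # a few passes over the training set
    for x, target in samples:
        error = target - predict(weights, bias, x)
        if error != 0:
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error

print([predict(weights, bias, x) for x, _ in samples])   # [0, 0, 0, 1]
```

For a linearly separable target such as AND the rule converges, in line with Rosenblatt's proof; for a non-separable one such as XOR it never does, which is exactly the limitation Minsky and Papert later emphasized.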

However, interest in neural networks decreased significantly after the publication of the book "Perceptrons" by M. Minsky and S. Papert in 1969, in which the authors described fundamental limitations of the perceptron model, in particular its inability to represent certain functions, and pointed out the high computational demands that training neural networks placed on the computers of the time. Since these scientists had very high prestige in the scientific community, neural networks were for some time considered an unpromising technology. The situation changed only after the creation of the backpropagation algorithm in 1974.

The backpropagation algorithm was proposed in 1974 simultaneously and independently by two scientists, P. Werbos and A. Galushkin. This algorithm is based on the method of gradient descent. The main idea of the algorithm is the propagation of information about the error from the outputs of the network to its inputs, that is, in the direction opposite to the usual forward pass. As this happens, the weights of the connections are corrected on the basis of the error information that has reached them. The main restriction the algorithm imposes on the network is that the activation function of the neurons must be differentiable, since gradient descent, unsurprisingly, is computed from the gradient.

The backpropagation algorithm makes it possible to train networks containing several hidden layers, which allows bypassing the perceptron limitations that had blocked the development of the field earlier. From the mathematical point of view, the algorithm reduces to successive matrix multiplications, for which well-studied and optimized implementations exist. In addition, the algorithm parallelizes well, which shortens the training time. All this led to a new wave of development of neural networks and much active research in this direction.
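A minimal backpropagation loop for a network with one hidden layer, written as the matrix multiplications the paragraph mentions. Learning XOR here is only a toy demonstration; the hidden-layer size, learning rate, and random seed are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])          # XOR, the classic toy target

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # backward pass: the error flows output -> hidden -> input weights;
    # each factor uses the derivative of the (differentiable) sigmoid
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(losses[-1] < losses[0])   # True: the error decreased during training
```

Note that both weight updates are plain matrix products, which is what makes the algorithm easy to optimize and to parallelize.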

The backpropagation algorithm, at the same time, has a number of problems. Thus, the use of gradient descent carries the risk of convergence to a local minimum. Another important problem is slow training in the presence of a large number of layers: as the error propagates backwards through the network, its magnitude decreases more and more on the way toward the initial layers, so the initial layers of the network are hardly trained at all. Yet another shortcoming, inherent in neural networks in general, is the difficulty of interpreting the results of their work. A trained neural network model is like a black box: an object is submitted at the input and a prediction comes out at the output, but which features of the input object were taken into account, and why the network decided as it did, is often impossible to say. This makes neural networks much less attractive compared, for example, with decision trees, where the trained model itself represents the quintessence of the knowledge about the subject area and makes it easy for a person to understand why an object was assigned to one class or another.

Because of these shortcomings, and because at the time neural networks showed results merely comparable to those of other classifiers — for example the support vector machines then gaining popularity, which were much simpler to train and to interpret — the development of neural networks again went into a certain decline.

This decline ended only in the 2000s, when the concept of deep learning appeared and spread. The revival of neural networks was brought about by the appearance of new architectures, such as convolutional networks, restricted Boltzmann machines, stacked autoencoders, etc., which made it possible to achieve significantly better results in such areas of machine learning as image and speech recognition. The most significant factor in this development was the appearance and spread of general-purpose computing on video cards and its application to training. Video cards, having many more cores than a processor, even if each core is less powerful, are ideally suited for the task of training neural networks. In addition, the performance of computers in general has increased significantly in recent years, and the spread of computing clusters has made it possible to train much more complex and deep architectures of neural networks than before.

1.3 Deep learning

One of the most important problems arising in the application of machine learning algorithms is the problem of choosing the correct features on which training is based. The problem is especially significant for such tasks as image recognition, speech recognition, and natural language processing, that is, those where there are no obvious features that could be used for learning. Usually the choice of the feature set is performed by the researcher through some analytical work, and the success of the algorithm largely depends on this choice. Thus, for the task of image recognition, such features could be the predominant color in the image, the degree of its variation, the presence of clear edges in the image, and so on. The questions of image recognition and of the choice of the correct features for it will be considered in more detail in the next chapter.

However, such an approach has shortcomings. Firstly, it implies a substantial amount of work on identifying the features, and this work is done manually by the researcher and may require a great deal of time. Secondly, identifying features on the basis of which a good algorithm can be obtained is often far from simple, since only features comprehensible to a human will be taken into account, while features reflecting the internal structure of the image but invisible to humans will be missed. Thus, the idea of determining the features automatically is especially attractive, and this very possibility is provided by the deep learning approach.

From the point of view of the theory of machine learning, deep learning is a subset of so-called representation learning. The main concept of representation learning is the automatic search for features on the basis of which a practical algorithm, such as classification, is then applied.

On the other hand, one of the most important problems faced in the use of machine learning is the factors of variation: factors that can strongly affect the external appearance of the input data but have no relation to its essence. Thus, in the task of image recognition, such factors can be the angle at which the object in the image is turned toward the observer, the time of day, the lighting, and so on. For example, depending on the viewpoint and the lighting, a red car may have quite different shape and color in a photograph. Therefore, for tasks like identifying the object depicted in a photograph, it seems reasonable to take into account not specific low-level facts, such as the color of a particular pixel, but characteristics of a higher level of abstraction, for example, the presence of wheels. However, determining from the image whether there are wheels in it is obviously a non-trivial task in itself, and its solution can be very complex. In addition, the presence of wheels is only one of an enormous number of possible features, and defining each of them and writing algorithms that check the image for its presence does not look realistic. Here the deep learning approach shows all its advantages. Deep learning is based on representing the input object as a hierarchical structure of features, such that each next level is formed from elements of the previous level. Thus, if we speak about images, the lowest level would be the raw pixels of the image, next the edges that can be distinguished among these pixels, then the corners and other geometric figures composed of the edges.
At the next level, objects recognizable to a person, for example wheels, are assembled from these figures, and the last level of the hierarchy corresponds to specific objects in the image, for example, a car.

To implement the deep learning approach, modern research mostly uses neural networks of various architectures. Neural networks are ideally suited for tasks that require a hierarchical representation of features, since, in essence, a neural network is a set of neurons, each of which is activated if the input data matches certain criteria — that is, each neuron denotes a certain feature — and the rules of activation of the neurons, i.e. the features themselves, are learned automatically. At the same time, neural networks in their most widespread form are themselves a hierarchical structure: the next layer of neurons uses as its input the outputs of the neurons of the previous layer — or, in other words, features of a higher level are formed on the basis of features of a lower level.

The spread of this approach, and with it the rapid flowering of neural networks, was served by three mutually related causes:

· The appearance of new architectures of neural networks designed for solving particular tasks (convolutional networks, Boltzmann machines, etc.)

· The growth and availability of computing power, in particular GPUs and parallel computing in general

· The appearance and spread of layer-wise pretraining of neural networks, in which each layer is first trained separately using the standard backpropagation algorithm on unlabeled data, the layers are then combined into a single network, and the whole network is fine-tuned on labeled data for a specific task (fine-tuning). This approach has two substantial advantages. First, it improves the effectiveness of training, since at any given moment not a deep structure but a network with a single hidden layer is being trained — thereby avoiding the problem of the error gradient vanishing as the depth of the network grows. Second, this approach to training allows unlabeled data to be used, which are usually far more plentiful and cheaper to obtain than labeled data. Labeled data in this approach are needed only at the very end, to fine-tune the network for a specific task such as classification; and since the general structure describing the data has already been learned during pretraining, fine-tuning takes little time. Besides reducing the amount of labeled data required, this approach makes it possible to pretrain a network once on a large amount of unlabeled data and then reuse the learned structure for different classification tasks, fine-tuning the network on different labeled datasets — in far less time than would be needed each time to train a network from scratch.

Let us briefly review three main neural network architectures commonly used in the context of deep learning.

· The multilayer perceptron is an ordinary fully-connected neural network with a large number of layers. There is no unambiguous answer to the question of how many layers count as "large", but networks with 5–7 layers are usually already called "deep". This architecture contains nothing fundamentally new compared with what existed before the notion of deep learning became widespread, yet with sufficient computing power it turns out to be quite effective, since slow training was previously the main obstacle to working with such networks. Today this problem is addressed by training on graphics cards, which greatly accelerates learning and therefore permits many more training iterations, or by the layer-wise pretraining mentioned above. Thus, in 2012 Ciresan and colleagues published the article "Deep big multilayer perceptrons for digit recognition", in which they showed that a multilayer perceptron with a large number of layers, given sufficient training time and a sufficient amount of training data (augmented by deformations of the images), can be no less effective than more complex models. Their model, a neural network with 5 hidden layers, achieved an error rate of 0.35% when classifying digits from the MNIST dataset, which is better than previously published results of more complex models. Furthermore, by combining several networks trained in this way into a single committee, they reduced the error rate to 0.31%. Thus, despite its simplicity, the multilayer perceptron is a quite successful representative of deep learning algorithms.
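To make the structure of such a fully-connected network concrete, here is a minimal NumPy sketch of a forward pass; the layer sizes and tanh activation are illustrative choices, not the exact configuration of Ciresan et al.:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass through a fully-connected network with tanh hidden
    units and a linear output layer (an illustrative sketch)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)            # hidden layers
    return a @ weights[-1] + biases[-1]   # output layer: 10 class scores

# Toy example: 784 inputs (a 28x28 image), two hidden layers, 10 outputs.
rng = np.random.default_rng(0)
sizes = [784, 100, 50, 10]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
scores = forward(rng.standard_normal(784), weights, biases)
print(scores.shape)  # (10,)
```

Training such a network consists of adjusting the `weights` and `biases` by backpropagation so that the correct output neuron receives the highest score.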

· The stacked autoencoder is a model closely related to the multilayer perceptron and, more broadly, to the training of deep neural networks: the most common use of the stacked autoencoder is precisely the layer-wise pretraining of deep networks. However, this model is of value not only as an aid to other models but often has great practical significance in its own right. To describe the essence of the stacked autoencoder, let us first consider the ordinary autoencoder. The autoencoder is an unsupervised learning algorithm in which the input values of the neural network also serve as its target output values. Schematically, the autoencoder model is shown in Figure 2:

Figure 2. Classic autoencoder

Obviously, training such a model would be trivial if the number of neurons in the hidden layer equaled the number of input neurons — it would suffice for the hidden layer simply to pass its input values through to the output. Therefore, when training autoencoders, additional constraints are introduced: for example, the number of neurons in the hidden layer is set significantly lower than in the input layer, or special regularization methods are applied that enforce sparsity of the hidden-layer activations. One of the most widespread uses of the autoencoder in its pure form is obtaining a compressed representation of the input data. For example, an autoencoder with 30 neurons in the hidden layer, trained on the MNIST dataset, can reproduce the digits in its output layer practically unchanged, which means that each of the input images can be accurately described with only 30 numbers. For this reason autoencoders are often considered an alternative to principal component analysis. A stacked autoencoder is essentially a combination of several ordinary autoencoders trained layer by layer: the hidden-layer values of the first trained autoencoder serve as the input values for the second one, and so on.
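The idea of compressing the input through a narrow hidden layer can be sketched with a minimal linear autoencoder trained by gradient descent; real autoencoders add nonlinearities and bias terms, and all sizes here are illustrative:

```python
import numpy as np

# A minimal linear autoencoder: 8 inputs squeezed through a 3-unit
# bottleneck and reconstructed back to 8 outputs (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))          # 200 samples, 8 features
k = 3                                      # hidden layer smaller than input
W_enc = rng.standard_normal((8, k)) * 0.1
W_dec = rng.standard_normal((k, 8)) * 0.1

for _ in range(1000):
    H = X @ W_enc                          # encode: compressed representation
    X_hat = H @ W_dec                      # decode: reconstruction
    err = X_hat - X                        # reconstruction error
    # Gradient descent on mean squared reconstruction error
    W_dec -= 0.05 * H.T @ err / len(X)
    W_enc -= 0.05 * X.T @ (err @ W_dec.T) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"reconstruction MSE: {mse:.3f}")
```

After training, the 3 hidden values per sample carry most of the information needed to reconstruct the 8 inputs — the same principle that lets a 30-unit hidden layer describe an MNIST digit.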

· Convolutional networks are one of the most popular deep learning models of recent years, used first and foremost for image recognition. The concept of a convolutional network rests on three main ideas:

o Local receptive fields — in the context of image recognition, this means that for recognizing a given element in an image, what matters first of all is its immediate neighborhood; pixels located far from this element are generally not connected with it and carry no information that would help to identify it correctly

o Shared weights — the use of shared weights in the model reflects the fact that the same object can occur in any part of the image, so all parts of the image are searched with one and the same template (set of weights)

o Subsampling — a concept that makes the model more robust to insignificant variations in the sought pattern, including those caused by small deformations, changes in illumination, and so on. The idea of subsampling is that when matching a pattern, one uses not the exact value of a given pixel but an aggregation over a neighborhood of pixels, for example the average or the maximum value.

From the mathematical point of view, the basis of a convolutional network is the convolution operation, in which a matrix representing a small patch of the input image (for example, 7×7 pixels) is multiplied element-wise by a kernel matrix of the same size, and the results are summed into a single value. The kernel is, in effect, a template, and the resulting number characterizes the degree of similarity of the given image patch to that template. Accordingly, each layer of a convolutional network consists of a certain number of templates, and the task of training the network is to select the right values for these templates — so that they capture the most informative characteristics of the images. During processing, each template is applied in turn to all patches of the image, in accordance with the idea of shared weights described above. Layers of this type are called convolutional layers. Convolutional layers alternate with subsampling layers, which replace small areas of the image with a single number, simultaneously reducing the input size for the next layer and making the network more robust to small changes in the data. The last layers of a convolutional network are usually one or more fully-connected layers, which perform the final classification. Today the use of convolutional networks has effectively become the standard for image classification and achieves the best results in this field.
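The two core operations — matching a patch against a template and subsampling — can be sketched directly in NumPy (the 2×2 edge kernel and toy image are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image and, at
    each position, sum the element-wise products — a similarity score
    between the image patch and the template."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Subsampling: replace each size x size block with its maximum."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# A dark-to-bright edge template applied to a toy image
image = np.zeros((6, 6)); image[:, 3:] = 1.0   # left half dark, right half bright
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # responds to vertical edges
fmap = conv2d(image, kernel)                   # strongest response at the edge
pooled = max_pool(fmap)
print(fmap.shape, pooled.shape)  # (5, 5) (2, 2)
```

The feature map peaks exactly where the patch matches the template, and pooling then halves each spatial dimension while preserving the strongest responses.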

· Restricted Boltzmann Machines are another family of deep learning models; unlike convolutional networks, they are applied first of all to speech recognition tasks. The Boltzmann machine in its classical form is an undirected graph whose edges reflect the dependencies between nodes (neurons). Some of the neurons are visible, and some are hidden. From the point of view of neural networks, the Boltzmann machine is essentially a recurrent neural network; from the point of view of statistics, a Markov random field. Important notions for Boltzmann machines are the energy of the network and its equilibrium state. The energy of the network is low when neurons strongly connected to each other are simultaneously in the active state; training such a network amounts to driving it to converge to an equilibrium state in which the energy is minimal. The main shortcoming of such networks is that they are very difficult to train. To solve this problem, G. Hinton and colleagues proposed the Restricted Boltzmann Machine, which imposes a restriction on the structure of the network, representing it as a bipartite graph, one part of which contains only visible neurons and the other only hidden ones; consequently, connections exist only between visible and hidden neurons. This restriction made it possible to develop efficient training algorithms for networks of this type, thanks to which substantial progress was achieved in speech recognition, where this model practically supplanted the previously dominant Markov-model-based approaches.
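The energy function and the effect of the bipartite restriction can be sketched as follows (weights and states here are random illustrative values, not a trained model):

```python
import numpy as np

# Energy of a Restricted Boltzmann Machine with binary units:
#   E(v, h) = -a.v - b.h - v.W.h
# Lower energy means the visible/hidden configuration is more consistent
# with the connection weights.
def rbm_energy(v, h, W, a, b):
    return float(-(a @ v) - (b @ h) - v @ W @ h)

def sample_hidden(v, W, b, rng):
    """One Gibbs sampling step. Because the graph is bipartite, the hidden
    units are conditionally independent given the visible ones, so they
    can all be sampled at once — the key to efficient training."""
    p = 1.0 / (1.0 + np.exp(-(v @ W + b)))   # sigmoid activation probabilities
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # 4 visible units, 3 hidden units
a, b = np.zeros(4), np.zeros(3)
v = np.array([1.0, 0.0, 1.0, 0.0])
h = sample_hidden(v, W, b, rng)
print(rbm_energy(v, h, W, a, b))
```

Training algorithms such as contrastive divergence alternate such sampling steps between the visible and hidden layers, adjusting `W` to lower the energy of configurations seen in the data.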

Now, having reviewed the main concepts and principles of deep learning, let us briefly survey the main principles and evolution of the field of image recognition, and the place that deep learning occupies in it.

1.4 Image recognition

There are many formulations of the image recognition task, and it is rather difficult to define it unambiguously. For example, image recognition can be defined as the task of searching for and identifying certain logical objects in the input image.

Image recognition is a difficult task for a computer algorithm. This is connected first of all with the high variability in the appearance of individual objects. Thus, finding a car in an image is easy for the human brain, which automatically identifies in the object the features characteristic of a car (wheels, a specific shape) and, if necessary, "completes" the picture in its imagination by supplying the missing details; yet it is difficult for a computer, since there is an enormous number of different car makes and models, which can differ greatly in shape, and moreover the shape of the object in the image depends on the shooting point and the angle at which it is captured. The role of lighting is also important: it affects the color of the captured image and can obscure details or, conversely, reveal additional ones.

Thus, the main difficulties of image recognition are caused by:

· Intra-class variability of objects

· Variability of shape, size, orientation, and position in the image

· Variability of lighting

To combat these difficulties, ever more sophisticated methods have been devised over the history of image recognition, and by now the field has made undeniable progress.

The first studies in the field of image recognition were published in 1963 by L. Roberts in the article "Machine Perception Of Three-Dimensional Solids", in which the author tried to abstract away from possible variations in the shape of an object and concentrated on recognizing images of simple geometric forms. He developed a computer program that could identify images of geometric objects of several simple forms and construct their three-dimensional models on a computer.

In 1987, S. Ullman and D. Huttenlocher published the article "Object Recognition Using Alignment", in which they likewise attempted to recognize objects of simple shapes, organizing the recognition process in two stages: first, the region of the image that could contain the object is found and its possible position and orientation are determined ("alignment") with the help of a small set of characteristic features; then the candidate is verified by comparing the potential image of the object with a template.

However, pixel-wise comparison with a template has many substantial shortcomings, such as its computational cost, the need to have a template for objects of every possible class, and also the fact that a pixel-wise comparison is tied to one specific object. In some situations it is applicable, but in most cases what is needed is a search not for one specific object but for any object of a given class.

One of the most important directions in the further development of image recognition was recognition based on contour detection. In many cases the contours alone carry most of the information about the image, and at the same time representing the image as a set of contours greatly reduces the dimensionality of the problem. For finding contours in an image, the classic and most popular approach is the Canny edge detector, whose operation is based on finding local maxima of the gradient.
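The gradient computation underlying this approach can be sketched as follows; this shows only the first stage of a Canny-style detector (the full algorithm adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding), and the Sobel kernels are a standard choice rather than Canny's specific operator:

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude via Sobel kernels: large where intensity changes
    sharply (edges), near zero in flat regions."""
    gx_k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gy_k = gx_k.T
    h, w = img.shape[0] - 2, img.shape[1] - 2
    gx = np.empty((h, w)); gy = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = img[i:i+3, j:j+3]
            gx[i, j] = np.sum(patch * gx_k)
            gy[i, j] = np.sum(patch * gy_k)
    return np.hypot(gx, gy)  # gradient magnitude

img = np.zeros((8, 8)); img[:, 4:] = 1.0   # a single vertical step edge
mag = sobel_gradient(img)
print(mag[3])  # the response peaks at the edge columns and is zero elsewhere
```

The Canny detector then keeps only pixels where this magnitude is a local maximum along the gradient direction, producing thin, one-pixel-wide contours.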

An important direction in image analysis is also the development of mathematical methods such as frequency filtering and spectral analysis. These methods are used, for example, for image compression (JPEG compression) or smoothing (the Gaussian filter). However, since these methods are not directly related to image recognition, they will not be considered in detail here.

Another task often considered in connection with image recognition is segmentation. The main goal of segmentation is to detect individual objects in the image, each of which can then be analyzed and classified separately. The segmentation task is significantly simpler when the image is binary, i.e., consists of pixels of only two colors. In this case, the segmentation task is often solved by methods of mathematical morphology. The essence of mathematical morphology is to represent the image as a boolean set of two values and to apply logical set operations to it, the main ones being dilation (logical addition) and erosion (logical multiplication). Using these operations and their derivatives, such as opening and closing, one can, for example, remove noise in the image or extract boundaries. When such methods are applied to the segmentation task, their main role is to remove noise and to form more or less homogeneous connected regions in the image, which are then easy to find with algorithms such as connected-component search.
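The two basic operations and their use for noise removal can be sketched with a minimal example (3×3 structuring element, toy binary image — both illustrative):

```python
import numpy as np

# Binary dilation and erosion with a 3x3 structuring element.
def dilate(img):
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # A pixel becomes 1 if any pixel in its neighborhood is 1 (logical OR)
            out[i, j] = img[max(i-1, 0):i+2, max(j-1, 0):j+2].max()
    return out

def erode(img):
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(1, h-1):
        for j in range(1, w-1):
            # A pixel stays 1 only if its whole neighborhood is 1 (logical AND)
            out[i, j] = img[i-1:i+2, j-1:j+2].min()
    return out

img = np.zeros((7, 7), int); img[2:5, 2:5] = 1  # a 3x3 square blob
img[0, 0] = 1                                   # an isolated noise pixel
opened = dilate(erode(img))                     # "opening" = erosion, then dilation
print(int(opened[0, 0]), int(opened[3, 3]))     # 0 1: noise gone, blob kept
```

Opening removes features smaller than the structuring element (the noise pixel) while restoring larger regions (the blob) to their original extent, exactly the cleanup step described above.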

When it comes to segmentation of full-color (RGB) images, one of the most important sources of information for segmentation can be texture. For describing image texture the Gabor filter is often used, which was devised following experiments investigating how textures are perceived by the human eye. The filter is based on a frequency-domain transformation of the image.
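A Gabor kernel is a sinusoidal wave modulated by a Gaussian envelope; convolving an image with a bank of such kernels at different orientations and frequencies yields texture responses. A sketch of kernel construction (all parameter values are illustrative):

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma, gamma=0.5, psi=0.0):
    """Gabor kernel: Gaussian envelope times a cosine carrier.
    theta = orientation, lam = wavelength of the carrier,
    sigma = envelope width, gamma = aspect ratio, psi = phase offset."""
    half = size // 2
    y, x = np.mgrid[-half:half+1, -half:half+1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

k = gabor_kernel(size=15, theta=0.0, lam=6.0, sigma=3.0)
print(k.shape)  # (15, 15)
```

With `theta=0` the carrier varies horizontally, so this kernel responds most strongly to vertical stripe textures with a period of about `lam` pixels; varying `theta` and `lam` produces the filter bank used for texture segmentation.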

Another important family of image recognition algorithms are algorithms based on local features. Local features are distinctive points (or regions) of the image which make it possible to match the image against a model (the sought object), to determine whether the given image corresponds to the model and, if it does, to determine the parameters of the model (scale, rotation angle, distortion, etc.). To perform their function well, local features must be stable under affine transformations, shifts, and so on. A classic example of a local feature is a corner, which is most often present on the boundaries of various objects. The most popular corner detection algorithm is the Harris detector.
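The Harris response can be sketched directly from its definition: build the structure tensor from image gradients over a local window and compute R = det(M) − k·trace(M)²; R is large and positive where the gradient varies strongly in two directions at once, i.e., at a corner. Window size and `k` below are conventional illustrative choices:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response over a 3x3 window (illustrative sketch)."""
    gy, gx = np.gradient(img.astype(float))
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy
    h, w = img.shape
    R = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Structure tensor entries summed over the window
            sxx = ixx[i-1:i+2, j-1:j+2].sum()
            syy = iyy[i-1:i+2, j-1:j+2].sum()
            sxy = ixy[i-1:i+2, j-1:j+2].sum()
            R[i, j] = (sxx * syy - sxy**2) - k * (sxx + syy)**2
    return R

img = np.zeros((10, 10)); img[4:, 4:] = 1.0  # a bright square with a corner
R = harris_response(img)
print(np.unravel_index(np.argmax(R), R.shape))  # maximum lands near the corner
```

Along a straight edge the tensor is nearly rank-one (det ≈ 0, so R ≤ 0), while at the corner both eigenvalues are large, which is exactly why the response singles out corners rather than edges.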

In recent years, image recognition methods based on neural networks and deep learning have been gaining ever more popularity. The main recognition for these methods came with the appearance, at the end of the 20th century, of convolutional networks (LeCun), which showed significantly better results in image recognition compared with other methods. Thus, most of the leading (and not only leading) algorithms in the ImageNet-2014 image recognition competition used convolutional networks in one form or another.

1.5 Recognition of road signs

Road sign recognition is one of the classic applied tasks of recognizing objects in images or, in some settings, in video streams. The task has great practical importance: road sign recognition algorithms are used, for example, in systems for automating the driving of a car. The task of recognizing road signs has many variants — for example, determining the presence of a road sign in a photograph, extracting from the image the region that contains the road sign, determining the specific type of sign depicted in the photograph. Three global subtasks associated with road sign recognition can be distinguished: detection of a sign within the surrounding scene; its actual recognition, or classification; and tracking, i.e., keeping an already found road sign in the focus of a "watching" algorithm. Each of these subtasks is a research topic in its own right and has its own history and traditional approaches. In this work attention is paid to the task of classifying the road sign depicted in a photograph, so let us consider it in more detail.

This is a classification task with unbalanced class frequencies. This means that images belong to the various classes with different probability, since some classes occur much more often than others — for example, on Russian roads the speed limit sign "40" is met significantly more often than the "No through passage" sign. In addition, road signs form several groups of classes such that the classes within one group are very similar to one another: for example, all speed limit signs look very much alike and differ only in the digits inside them, which obviously complicates classification. On the other hand, road signs have a clear geometric shape and a small set of possible colors, which could considerably simplify the classification procedure — were it not for the fact that real photographs of road signs can be taken from different angles and under different lighting. Thus, the task of classifying road signs, although it can be considered a typical image recognition task, requires a special approach to achieve the best result.

Until a certain moment, achievements in this area were scattered and hard to compare with one another, since each team posed the task in its own way and used its own data, so there was no possibility of directly comparing results. Thus, in 2005 Bahlmann and colleagues, within a complex road sign recognition system supporting all three subtasks mentioned above, implemented a sign classification algorithm with an accuracy of about 94% for road signs belonging to 23 different classes. Training was performed on 40,000 images, with the number of images per class varying from 30 to 600. For detecting road signs, the AdaBoost algorithm and Haar wavelets were used, and for classification, an expectation-maximization (EM) based approach. A road sign recognition system of comparable quality, developed by Moutarde and colleagues in 2007, reached an accuracy of about 90% and was trained on only 281 images. In this system, detectors of circles and squares were used to detect road signs in the image (for European and American signs respectively); within the detected signs each digit was then extracted and classified with a neural network. In 2010, Ruta and colleagues developed a system for detecting and classifying 48 different types of road signs with a classification accuracy of 85.3%. Their approach was based on searching the image for circles and polygonal regions and extracting from them a small number of distinctive regions that distinguish a given sign from the rest. In addition, a special color transformation of the image was applied, called by the authors the Color Distance Transform, which reduces the number of colors present in the image and thereby increases robustness to lighting variation and reduces the amount of data to be processed.
Broggi and colleagues in 2007 proposed a three-stage algorithm for detecting and classifying road signs, consisting of color segmentation, shape determination, and classification with a neural network; however, their publication did not report quantitative results for the algorithm. Gao et al. in 2006 presented a road sign recognition system based on analysis of the color and shape of the presented sign, which showed a recognition accuracy of 95% on a set of 98 images of road signs.

The situation in the field of road sign recognition changed in 2011, when a road sign recognition competition was held within the IJCNN (International Joint Conference on Neural Networks) conference. For this purpose, the GTSRB (German Traffic Sign Recognition Benchmark) dataset was prepared, containing over 50,000 images of road signs installed on the roads of Germany and belonging to 43 different classes. On this dataset a competition consisting of two stages was held. Following the results of the second stage, the article "Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition" was published, which reviewed the results of the competition and described the approaches used by the most successful teams. A number of articles were also published by the authors of the winning algorithms themselves — the participants of the competition — and this dataset eventually became the basic benchmark for road sign recognition algorithms, much as MNIST is for handwritten digit recognition.

The most successful algorithms in this competition were the committee of convolutional networks (IDSIA team), the multi-scale convolutional network (Multi-Scale CNN, Sermanet's team) and random forests (Random Forests, CAOR team). Let us consider each of these algorithms in more detail.

The committee of neural networks proposed by the IDSIA team from the Italian Dalle Molle Institute for Artificial Intelligence Research, led by D. Ciresan, achieved a sign classification accuracy of 99.46%, which is higher than human accuracy (99.22%) as estimated within the same competition. The algorithm is described in more detail in the article "Multi-Column Deep Neural Network for Traffic Sign Classification". The main idea of the approach is as follows: 4 different normalization methods were applied to the input data — Image Adjustment, Histogram Equalization, Adaptive Histogram Equalization and Contrast Normalization. Then, for each dataset obtained as a result of normalization, as well as for the original data, 5 convolutional networks with randomly initialized weights were trained, each with 8 layers; during training, random transformations (shifts, rotations and scalings) were applied to the images to increase the size and variability of the training set. The resulting prediction of the committee was formed by averaging the predictions of each of the convolutional networks. Training of these networks was implemented using GPU computation.
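The committee's final step — averaging per-network predictions — can be sketched as follows; the probability values below are made up for illustration (in the IDSIA approach, 25 trained convolutional networks were averaged this way):

```python
import numpy as np

def committee_predict(per_model_probs):
    """Average the class-probability outputs of several models and pick
    the class with the highest mean probability.
    per_model_probs: array of shape (n_models, n_classes)."""
    return int(np.argmax(per_model_probs.mean(axis=0)))

probs = np.array([
    [0.6, 0.3, 0.1],   # model 1 favors class 0
    [0.2, 0.5, 0.3],   # model 2 favors class 1
    [0.5, 0.4, 0.1],   # model 3 favors class 0
])
print(committee_predict(probs))  # 0
```

Averaging smooths out the individual errors of networks trained on differently normalized data, which is precisely why the committee outperforms any single member.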

The multi-scale convolutional network algorithm was proposed by the team formed by P. Sermanet and Y. LeCun of New York University. The algorithm is described in the article "Traffic Sign Recognition with Multi-Scale Convolutional Networks". In this algorithm, all images were scaled to a size of 32×32 pixels and converted to grayscale, after which contrast normalization was applied to them. In addition, the original training set was increased 5-fold by applying small random transformations to the images. The resulting network consisted of two stages, as shown in Figure 3; notably, the final classification used the outputs not only of the second stage but of the first as well. The network showed an accuracy of 98.31%.

Figure 3. Multi-scale convolutional network

The third successful algorithm, based on random forests, was developed by the CAOR team from MINES ParisTech. A detailed description of the algorithm was published in the article "Real-time traffic sign recognition using spatially weighted HOG trees". The algorithm is based on 500 random decision trees, each trained on a randomly selected subset of the training set, with the final value of the classifier determined by majority vote. Unlike the previous approaches, this classifier considered not the image as a set of pixels but its HOG representation (histograms of oriented gradients), provided by the competition organizers together with the images. The final result of the algorithm was 96.14% of correctly classified images, which shows that in the task of road sign recognition methods not involving neural networks and deep learning can also be quite successful, although they are still inferior in effectiveness to convolutional networks.

1.6 Analysis of existing libraries

To implement the neural network algorithms in the system being developed, it was decided to use one of the existing libraries. For this purpose, an analysis of existing solutions for implementing deep learning algorithms was carried out, and based on the results of this analysis a choice was made. The analysis of existing solutions consisted of two phases: theoretical and practical.

During the theoretical phase, the libraries Deeplearning4j, Theano, Pylearn2, Torch and Caffe were reviewed. Let us consider each of them in more detail.

· Deeplearning4j (www.deeplearning4j.org) — an open-source library for implementing neural networks and deep learning algorithms, written in Java. It can be used from Java, Scala and Clojure, and supports integration with Hadoop, Apache Spark, Akka and AWS. The library is developed and maintained by Skymind, which also provides commercial support for it. Internally the library uses ND4J, a library for fast work with n-dimensional arrays developed by the same company. Deeplearning4j supports many network types, including multilayer perceptrons, convolutional networks, Restricted Boltzmann Machines, Stacked Denoising Autoencoders, Deep Autoencoders, Recursive Autoencoders, Deep Belief Networks, recurrent networks and others. An important feature of the library is its ability to run on a cluster. The library also supports training on GPUs.

· Theano (www.github.com/Theano/Theano) — an open-source Python library that allows one to efficiently define, evaluate and optimize mathematical expressions over multidimensional arrays. The NumPy library is used to represent multidimensional arrays and operate on them. The library is intended first of all for scientific research and was created by a group of scientists at the University of Montreal. Theano's applicability is very broad, and work with neural networks is only a small part of it. At the same time, it is one of the most popular and most frequently mentioned libraries in discussions of deep learning.

· Pylearn2 (www.github.com/lisa-lab/pylearn2) — an open-source Python library built on top of Theano, but providing a more convenient and simple interface for machine learning: it offers ready-made implementations of common algorithms and allows simple configuration through files in the YAML format. It is developed by a group of scientists at the LISA laboratory of the University of Montreal.

· Torch (www.torch.ch) — a library for scientific computing and the implementation of machine learning algorithms, implemented in C, which lets users work with it through the Lua scripting language. The library provides efficient implementations of operations on matrices and multidimensional arrays, as well as computation on the GPU. It allows new layers to be implemented and combined into networks. It is open source.

· Caffe (www.caffe.berkeleyvision.org) — a library focused on the efficient implementation of deep learning algorithms, developed first of all by the Berkeley Vision and Learning Center and, like all of the above, open source. The library is implemented in C++ and also provides convenient interfaces for Python and Matlab. It supports fully-connected and convolutional networks, allows a network to be described declaratively as a set of layers in the .prototxt format, and supports computation on the GPU. Among the advantages of the library are also the availability of a large number of ready-made, pre-trained models and examples, which, together with its other characteristics, makes the library one of the simplest to start with.

Based on a set of criteria, 3 libraries were selected for closer examination: Deeplearning4j, Theano and Caffe. These 3 libraries were installed and tested in practice.

Of these, Deeplearning4j turned out to be the most problematic to install; moreover, errors occurred in the demonstration examples shipped with the library, which raised doubts about the library's reliability and made it much harder to get started with. Taking into account also the generally lower performance of Java compared with C++, in which Caffe is implemented, this library was excluded from further consideration.

The Theano library also turned out to be rather difficult to install and set up; however, for this library there is a large amount of good and well-structured documentation and examples of working code demonstrating its use, including on graphics cards. As it turned out, though, implementing even an elementary neural network with this library requires writing a considerable amount of custom code, and describing and modifying the structure of the network is correspondingly complex. Therefore, despite the potential power of the library, Caffe was chosen for carrying out this study, as the library best suited to the set of tasks at hand.

1.7 The Caffe library

The Caffe library aims to provide a simple and easy-to-use interface that allows neural networks to be easily configured and trained. To work with the library, one creates a description of the network in the prototxt format (the protocol buffer definition file format — a language-independent data description format created by Google), which is similar to JSON, well structured and human-readable. The description of the network is, in essence, a sequential description of each of its layers. As input data the library can work with databases (leveldb or lmdb), in-memory data, HDF5 files and image files. It is also possible to use a special data type called DummyData for development and testing purposes.

The library supports layers of the following types: InnerProduct (fully connected layer), Splitting (duplicating data for passing to several output layers), Flattening (transforming a multidimensional matrix into a vector), Reshape (changing the dimensions of the data), Concatenation (merging data from several input layers into one output), Slicing and several others. There are also special layer types for convolutional networks - Convolution (convolutional layer), Pooling (subsampling layer) and Local Response Normalization (a layer for local data normalization). In addition, a number of loss functions used when training a network (Softmax, Euclidean, Hinge, Sigmoid Cross-Entropy, Infogain and Accuracy) and neuron activation functions (Rectified-Linear, Sigmoid, Hyperbolic Tangent, Absolute Value, Power and BNLL) are also configured as separate layers.

Thus the network is described declaratively in a fairly simple form. Examples of complete configuration files are given in Appendix 1. Also, in order to use the library's standard scripts, one must create a solver.prototxt file describing the training configuration - the number of training iterations, whether computation runs on the CPU or GPU, the learning rate and so on.
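A hedged sketch of what such a solver file might contain (all values here are illustrative, not the settings used in this work):

```protobuf
# Hypothetical solver.prototxt.
net: "train_val.prototxt"     # path to the network description
base_lr: 0.01                 # initial learning rate
max_iter: 10000               # number of training iterations
test_interval: 500            # evaluate accuracy every 500 iterations
snapshot: 1000                # save a model snapshot every 1000 iterations
snapshot_prefix: "snapshots/model"
solver_mode: CPU              # CPU or GPU
```

The `test_interval` and `snapshot` settings correspond to the periodic accuracy evaluation and snapshot saving described below.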

Training a model can be performed either with the bundled scripts (after adapting them to the specific task) or manually, by writing code against the Python or Matlab APIs that the library provides. The bundled scripts allow one not only to train models but also, for example, to build a database from a list of images; in the process, each image is resized to a fixed size and normalized before being added to the database. The scripts that perform training also encapsulate additional steps - for example, evaluating the current accuracy of the model every given number of iterations and saving the current state of the trained model to a snapshot file. Snapshot files make it possible to resume training later instead of starting from scratch if the need arises, and also, after some number of iterations, to change the model configuration - for example, to add a new layer - while the weights of the previously trained layers keep their values, which makes it possible to implement the layer-wise pre-training mechanism described earlier.

Overall, the library proved convenient to work with and made it possible to implement all the desired models as well as to evaluate the classification accuracy of each of them.

2. Development of a prototype image recognition system

2.1 Image classification algorithm

In the course of studying the theoretical material on the topic and carrying out practical experiments, the following set of ideas was formed, which could be incorporated into the final algorithm:

· Use of deep convolutional neural networks. Convolutional networks consistently show the best results in image recognition, including road sign recognition, so using them in the algorithm being developed looks logical.

· Use of multilayer perceptrons. Despite the great effectiveness of convolutional networks, for certain types of images a multilayer perceptron shows better results, so it was decided to use both kinds of networks in the algorithm.

· Combining the results of several models with an additional classifier. Since at least two types of neural networks are used, a method is needed to form a single overall classification result taking into account the output of each of them. For this purpose an additional classifier, not based on neural networks, is used: its input values are the classification results of each network, and its output is the final prediction of the image class.

· Applying additional transformations to the input data. To improve the suitability of the input images for recognition and, consequently, the effectiveness of the classifier, several kinds of transformations can be applied to the input data, and the results of each of them should be compared.

On the basis of all the ideas listed above, the following concept of the image classifier was formed. The classifier is an ensemble of 6 independently functioning neural networks: 2 perceptrons and 4 convolutional networks. Networks of the same type differ from one another in the kind of transformation applied to the input data. The input data are scaled so that each network always receives data of a single fixed size, while different networks may be configured for different sizes. To aggregate the results of all the networks an additional classical classifier is used; two variants were tried in this role: the J48 algorithm, based on a decision tree, and the KStar algorithm, which is a lazy instance-based classifier. The transformations used in the classifier are:

· Binarization - the image is replaced with a new one consisting only of black and white pixels. The adaptive thresholding method is used to perform the binarization. The essence of the method is that for each pixel of the image the mean value over some neighbourhood of that pixel is computed (the image is assumed to contain only shades of grey, for which the source images are first converted accordingly), and then, on the basis of the computed mean, it is decided whether the pixel should be considered black or white.
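A minimal sketch of this adaptive thresholding step on a greyscale image stored as a list of rows of 0-255 values; the window radius and the exact comparison rule are assumptions, since the text does not specify them:

```python
# Sketch of adaptive thresholding: each pixel is compared with the mean of
# its local neighbourhood (window radius is an assumed parameter).
def adaptive_threshold(img, radius=1):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # neighbourhood of the pixel, clipped at the image borders
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[yy][xx] for yy in ys for xx in xs]
            mean = sum(vals) / len(vals)
            # pixel becomes white if it is at least as bright as the local mean
            out[y][x] = 255 if img[y][x] >= mean else 0
    return out
```

In the real system this would be done with PIL or scikit-image (see section 3.2); the loop above only shows the principle.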

· Histogram equalization - the essence of the method is to apply to the image histogram a function such that the values on the resulting histogram are distributed as evenly as possible. To this end the function is computed from the cumulative distribution function of the colour intensities in the source image. Examples of applying such functions to image histograms are shown in Figure 4. This method can be applied both to black-and-white and to colour images - in the latter case separately to each colour component. In this study both variants were used.
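The CDF-based mapping described above can be sketched as follows for a flat list of greyscale pixel values; this is the classic equalization formula, given here as an illustration rather than the exact implementation used in the work:

```python
# Sketch of histogram equalization for greyscale pixels in the range 0-255.
def equalize(pixels):
    n = len(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    # cumulative distribution function of the intensities
    cdf, total = [0] * 256, 0
    for i in range(256):
        total += hist[i]
        cdf[i] = total
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                 # uniform image: nothing to equalize
        return list(pixels)
    # map each intensity through the scaled CDF
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in pixels]
```

For a colour image the same function would be applied to each colour channel separately, as the text notes.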

Figure 4. Results of applying histogram equalization to an image

· Contrast enhancement - for each pixel of the image a local minimum and maximum are found in some neighbourhood, and the pixel is then replaced with the local maximum or the local minimum, depending on which value it is closer to. It is applied to black-and-white images.
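A sketch of this local min/max rule; the window radius is an assumption, since the text does not specify it:

```python
# Sketch of contrast enhancement: each pixel is snapped to the local minimum
# or maximum of its neighbourhood, whichever is closer.
def enhance_contrast(img, radius=1):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[yy][xx] for yy in ys for xx in xs]
            lo, hi = min(vals), max(vals)
            p = img[y][x]
            # replace the pixel with the nearer of the two local extremes
            out[y][x] = hi if hi - p <= p - lo else lo
    return out
```

The effect is to push every pixel toward one of the two local extremes, sharpening edges between the sign and the background.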

Schematically, the overall structure of the resulting classifier is shown in Figure 5:

Figure 5. Overall scheme of the classifier

The part of the model that performs the transformation of the input data, as well as the neural networks themselves, is implemented in Python using the Caffe library. Let us describe the structure of each network in detail.

Both multilayer perceptrons contain 4 hidden layers, and overall their configuration is described as follows:

· Input layer

· Layer 1: 1500 neurons

· Layer 2: 500 neurons

· Layer 3: 300 neurons

· Layer 4: 150 neurons

· Output layer
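To give a feel for the scale of this configuration, the following sketch counts its weights, assuming a 45*45 = 2025-pixel input (the size used for binarized data in section 3.2) and 43 output classes; bias terms are ignored for simplicity, and both assumptions are mine rather than stated in the text:

```python
# Rough weight count for a fully connected network: each pair of adjacent
# layers contributes (size of layer A) * (size of layer B) weights.
def count_weights(layer_sizes):
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

sizes = [45 * 45, 1500, 500, 300, 150, 43]   # assumed input and output sizes
total = count_weights(sizes)                 # roughly 4 million weights
```

Most of the weights sit between the input and the first hidden layer, which is typical for perceptrons applied directly to pixels.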

An example of the Caffe configuration file describing this network is given in the appendix. A brief description of the network looks as follows:

The overall scheme of the network is shown in Figure 6.

Figure 6

Each of the neural networks included in the model is trained independently. After training, a special Python script runs each network on every instance of the training set, obtains the classification result as a list of class probabilities, selects the two most probable classes, and writes them together with the true class value to a file. The resulting file is then passed as the training set to the final classifiers (J48 and KStar) implemented in the Weka library. The subsequent classification is carried out by means of that library.
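The extraction step performed by that script can be sketched as follows; the probability vectors here are illustrative, and the exact file format fed to Weka is not reproduced:

```python
# Sketch of forming one training row for the final classifier: the two most
# probable classes from each of the 6 networks plus the true class label.
def top2(probs):
    # indices of the two most probable classes, highest first
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]

def make_row(per_network_probs, true_class):
    row = []
    for probs in per_network_probs:
        row.extend(top2(probs))
    return row + [true_class]   # 12 attributes + the class label
```

With 6 networks this yields exactly the 12 attributes per instance mentioned in section 3.3.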

2.2 System architecture

Having considered the algorithm for recognizing road signs using neural networks and an additional classifier, let us move on to the description of the developed system that uses this algorithm.

The system was developed as an application with a web interface that allows the user to upload an image of a road sign and receive the classification result produced for that sign by the algorithm described above. The application consists of 4 modules: the web application itself, the neural network module, the classification module and the administrator interface. The interaction of the modules is shown schematically in Figure 7.

Figure 7. Scheme of the classification system's operation

The numbers in the diagram indicate the sequence of actions when a user works with the system. The user uploads an image. The request is processed by the web server, and the uploaded image is passed to the neural network module, where all the necessary transformations (scaling, changing the colour scheme, and so on) are applied to it, after which each neural network produces its own prediction. The logic managing this module then selects, for each network, the two most probable predictions and returns the data to the web server. The web server passes the received class predictions to the classification module, which aggregates them and forms the final decision about the predicted image class; this decision is returned to the web server and shown to the user. All interaction between the web server and the neural network and classification modules is performed via REST requests over HTTP. The image is transmitted in multipart form data format, and the data about the classifier results in JSON format. The logic of each module is thus isolated from the others, which allows the modules to be developed independently, including in different programming languages, and also, if necessary, makes it easy to change the internal logic of any one module without affecting the others.
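As an illustration of the JSON exchanged between the neural network module and the web server, the following shape is a plausible sketch; the field names and network identifiers are assumptions, since the text only states that the results are passed as JSON over REST:

```python
# Hypothetical shape of the neural network module's JSON response.
import json

predictions = {
    "predictions": [
        {"network": "mlp_binarized", "top_classes": [14, 13]},
        {"network": "conv_grey_equalized", "top_classes": [14, 12]},
        # ... one entry per network, 6 in total
    ]
}

payload = json.dumps(predictions)   # serialized for the HTTP response
restored = json.loads(payload)      # the classification module parses it back
```

Keeping the payload flat like this makes it trivial to turn into the 12-attribute row the final classifier expects.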

The user interface of the system is implemented in HTML and JavaScript, the web server and the classification module in Java, and the neural network module in Python. The general view of the system's interface is shown in Figure 8.

Figure 8. User interface

Using this system assumes that the neural network and classification modules already contain trained models. When the models need to be trained, the administrator interface is used, which in fact consists of Python scripts for training the neural networks and a Java console utility for training the final classifier. It should be noted that these tools are not intended to be used frequently or by non-professional users, so a richer interface is not necessary for them.

Overall, the developed application successfully handles all the tasks set before it, in particular allowing the user to obtain a predicted class for an image of his choice. The question of the practical quality of the results produced by the classifier used in this algorithm remains, and it is considered in Section 3.

3. Results of experimental studies

3.1 Input data

The input data for this study was the already mentioned GTSRB (German Traffic Sign Recognition Benchmark) dataset. The dataset consists of 51,840 images belonging to 43 classes. The number of images belonging to the different classes varies. The distribution of the number of images across the classes is shown in Figure 9.

Figure 9. Distribution of the number of images by class

The sizes of the input images also vary. The width of the smallest image is 15 pixels, of the largest - 250 pixels. The overall distribution of image sizes is shown in Figure 10.

Figure 10

In addition, the images are supplied in the PPM format, in which each pixel in the file is described by three numbers - the intensity values of the red, green and blue components of the colour.

3.2 Preprocessing

Before the start of the work the data underwent preparation: conversion from the PPM format to the JPEG format supported by the Caffe library, an initial split into training and test sets in the ratio 80:20, and scaling. The classification algorithm uses images of two sizes - 45*45 (for training the multilayer perceptron on binarized data) and 60*60 (for training the other networks) - so two copies of each image of the training and test sets were created. The transformations described earlier (binarization, histogram equalization, contrast enhancement) were then applied to each image, and the resulting images were saved in an LMDB database (Lightning Memory-Mapped Database), which is a fast and efficient key-value store. This way of storing the data ensures the fastest operation of the Caffe library. The Python Imaging Library (PIL) and scikit-image libraries were used for the image transformations. Examples of images after each transformation are shown in Figure 11. The images saved in the databases were then used directly for training the neural networks.
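The 80:20 split mentioned above can be sketched as follows; the fixed seed and the shuffling strategy are my assumptions for reproducibility, not details stated in the text:

```python
# Sketch of a reproducible 80:20 train/test split of a list of dataset items.
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)   # shuffle a copy, deterministically
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]      # (training set, test set)
```

For GTSRB one would typically split per sign track rather than per frame to avoid near-duplicate images leaking across the split, but the text does not say which was done.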

Figure 11

As for the training of the neural networks, each network was first trained separately and the results of its work were evaluated, and only then was the final classifier built and trained. Before all that, however, the simplest network - a perceptron with one hidden layer - was built and trained. Building such a network is useful in two ways: it is a simple introductory example of working with the Caffe library, and it provides a baseline against which the results of the other networks can be assessed more meaningfully. Below, each of the models and the results of its work are considered in detail.

3.3 Results of the models

The following models were implemented in the course of this study:

· A neural network with one hidden layer

· A multilayer neural network trained on the original (colour) data

· A multilayer neural network trained on binarized data

· A convolutional network trained on the original data

· A convolutional network trained on RGB data after histogram equalization

· A convolutional network trained on greyscale data after histogram equalization

· A convolutional network trained on greyscale data after contrast enhancement

· A combined model consisting of the two multilayer neural networks and the 4 convolutional networks.

Let us consider each of them in detail.

The neural network with one hidden layer, although it does not belong to deep learning models, nevertheless needed to be implemented: first, as introductory material for working with the library, and second, as a baseline algorithm to compare the other models against. The undoubted advantages of this model are its simplicity and high training speed.

This model was trained on the original colour images of size 45*45 pixels, with a hidden layer of 500 neurons. Training the network took about 30 minutes, and the resulting prediction accuracy reached 59.7%.

The next model built was the multilayer fully connected neural network. This model was trained on both the binarized and the colour versions of the images of the smaller size and contained 4 hidden layers. The configuration of the network is described as follows:

· Input layer

· Layer 1: 1500 neurons

· Layer 2: 500 neurons

· Layer 3: 300 neurons

· Layer 4: 150 neurons

· Output layer

Schematically, the model of the network is shown in Figure 12.

Figure 12. Scheme of the multilayer perceptron

The measured accuracy of the trained model was 66.1% for binarized images and 81.5% for colour images. Nevertheless, the model trained on binarized images, despite its lower accuracy, correctly recognized a number of images for which the colour model could not determine the correct class. In addition, the model based on colour images required much more time for training - about 5 hours compared with 1.5 hours for the binarized version.

The remaining models are based on convolutional neural networks, since such networks have shown the greatest effectiveness in image recognition tasks. The architecture of the network was based on the LeNet network, developed for image classification on the ImageNet dataset. However, since the images considered here are of significantly smaller size, the network was modified. A brief description of its architecture:

· 3 convolutional layers with kernel sizes 9, 3 and 3 respectively

· 3 subsampling (pooling) layers

· 3 fully connected layers with 100, 100 and 43 neurons
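A rough calculation of the feature-map sizes this stack produces for a 60*60 input, assuming stride-1 convolutions without padding and 2*2 non-overlapping pooling (these details are not stated in the text):

```python
# Sketch: feature-map side length after each conv/pool pair, for a square
# input, valid (no-padding) convolutions of the given kernel sizes, and
# 2x2 subsampling after each convolution.
def feature_map_sizes(input_size, conv_kernels, pool=2):
    sizes = [input_size]
    for k in conv_kernels:
        sizes.append(sizes[-1] - k + 1)   # valid convolution: n - k + 1
        sizes.append(sizes[-1] // pool)   # subsampling layer
    return sizes

sizes = feature_map_sizes(60, [9, 3, 3])
# 60 -> 52 -> 26 -> 24 -> 12 -> 10 -> 5
```

Under these assumptions the convolutional part ends with 5*5 feature maps, which the fully connected layers (100, 100, 43) then classify.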

The network was trained separately on the original images of the larger size, on images after histogram equalization (with colour preserved), on images after histogram equalization converted to greyscale and, finally, on greyscale images after contrast enhancement. The training results are presented in Table 1:

Table 1

As can be seen, the best results were shown by the network trained on black-and-white images after histogram equalization. This can be explained by the fact that histogram equalization improves the quality of the image - the differences between the object and the background become sharper and the overall brightness level is evened out - while at the same time the colour information is discarded: colour carries no essential meaning for the signs themselves (they remain perfectly recognizable in black and white) and only adds noise and complicates classification.

The final classifier was trained using the following procedure:

1. Run each trained network on the training set (one and the same set for all networks, but with the appropriate transformation applied to the images beforehand).

2. For each instance of the training set, take the two most probable classes from each network, in decreasing order of probability, and save the resulting set of values (12 values in total) together with the true class label.

3. Use the collected data - 12 attributes and a class label - as the training set for the final classifier.

4. Evaluate the accuracy of the resulting model: for each instance of the test set, take the two most probable classes from each network in the same way and classify the instance with the trained final classifier.

Based on the results of the tests under this scheme, the overall accuracy of the combined algorithm was computed: 93% with the J48 algorithm and 94.8% with the KStar algorithm. Although the tree-based algorithm showed slightly lower accuracy, it has two important advantages. First, the tree produced by the algorithm clearly demonstrates the logic of the classification and allows one to better understand the real structure of the data (for example, which network gives the most accurate predictions for a certain type of sign, so that its forecast unambiguously determines the result). Second, once the model is built, the algorithm classifies new instances very quickly, since classification requires only a single pass down the tree from the root to a leaf. As for the KStar algorithm, it does not actually build a model during training; classification is based on searching for the most similar instances within the training set. This algorithm therefore classifies instances slightly more accurately, but provides no insight into the data, and - most importantly - classifying each instance can take a considerable amount of time, which may be a decisive drawback for systems that must recognize road signs in real time while a car is driving.

Table 2 presents an overall comparison of the results of all the analyzed algorithms.

Table 2. Comparison of the results of the algorithms

Figure 13 shows the training curve for the convolutional network on greyscale data after histogram equalization (the x-axis shows the number of iterations, the y-axis the accuracy).

Figure 13

To sum up, it is useful to examine the classification results and find out which signs were the easiest to classify and which, on the contrary, are recognized poorly. Let us take the J48 algorithm and obtain the confusion matrix for its output (see Appendix 3). It can be seen that for some of the signs the classification accuracy is 100% or close to it - for example, the signs "Stop" (class 14), "Give way" (class 13), "Priority road" (class 12), "End of all restrictions" (class 32) and "No vehicles" (class 15) (Fig. 12). Most of these signs either have a distinctive shape ("Priority road") or contain special graphic elements that no other signs have ("End of all restrictions").

Figure 12. Examples of road signs that are easy to recognize

Other signs are often confused with one another - for example, "keep left" and "keep right", or different speed limit signs (Figure 13).

Figure 13. Examples of signs that are often confused

A striking regularity is that the neural networks often confuse mutually symmetric signs. Symmetry is a specific difficulty for convolutional networks, which react to local features of the image and do not analyze the image as a whole; for classifying such images the multilayer perceptrons are helpful.

Summing up, we can say that the combined algorithm, built on deep convolutional neural networks and an additional classifier that merges the results of the individual networks, still has many opportunities for further improvement.

Conclusion

In this work, the problem of image recognition using artificial neural networks was studied. The most relevant current approaches to image recognition were considered, including the deep neural networks in use today, and an image recognition algorithm was developed for the example problem of road sign recognition using deep networks. Based on the results of the work, it can be stated that all the goals and tasks set at the beginning of the work were fulfilled:

An analytical review of the literature on the use of artificial neural networks for image recognition was carried out. Based on this review, it was determined that the most effective and widespread approaches to image recognition in recent years are those based on deep convolutional networks.

An image recognition algorithm was developed for the example problem of road sign recognition; it uses an ensemble of neural networks consisting of two multilayer perceptrons and 4 deep convolutional networks, and includes two variants of the additional classifier - KStar and J48.

A prototype of an image recognition system for road signs was built on the basis of the developed algorithm.

The developed algorithm was trained and tested using the GTSRB dataset; the results of each network included in it were evaluated, and the overall accuracy of the algorithm was computed for both variants of the additional classifier. According to the results of the experiments, the highest recognition accuracy, equal to 94.8%, is achieved by the full ensemble of neural networks with the KStar classifier, and among the individual networks the best result - an accuracy of 89.1% - was shown by the convolutional network trained on greyscale images after histogram equalization.

Overall, this study confirmed that at present deep artificial neural networks, especially convolutional networks, are the most effective and promising approach to image classification, which is supported both by the results of numerous published studies and by the image recognition experiments performed here.

References

1. Al-Azawi M. A. N. Neural Network Based Automatic Traffic Signs Recognition // International Journal of Digital Information and Wireless Communications (IJDIWC). 2011. Vol. 1. No. 4. pp. 753-766.

2. Baldi P. Autoencoders, Unsupervised Learning, and Deep Architectures // ICML Unsupervised and Transfer Learning. 2012. Vol. 27. pp. 37-50.

3. Bahlmann C. et al. A system for traffic sign detection, tracking, and recognition using color, shape, and motion information // Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, 2005. pp. 255-260.

4. Bastien F. et al. Theano: new features and speed improvements // arXiv preprint arXiv:1211.5590. 2012.

5. Bengio Y., Goodfellow I., Courville A. Deep Learning. MIT Press, book in preparation.

6. Bergstra J. et al. Theano: a CPU and GPU math compiler in Python // Proc. 9th Python in Science Conf. 2010. pp. 1-7.

7. Broggi A. et al. Real time road signs recognition // Intelligent Vehicles Symposium, 2007. IEEE, 2007. pp. 981-986.

8. Canny J. A computational approach to edge detection // IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986. No. 6. pp. 679-698.

9. Ciresan D., Meier U., Schmidhuber J. Multi-column deep neural networks for image classification // Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. pp. 3642-3649.

10. Ciresan D. et al. A committee of neural networks for traffic sign classification // Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE, 2011. pp. 1918-1921.

11. Ciresan D. C. et al. Deep big multilayer perceptrons for digit recognition // Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. pp. 581-598.

12. Daugman J. G. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression // IEEE Transactions on Acoustics, Speech and Signal Processing. 1988. Vol. 36. No. 7. pp. 1169-1179.

13. Gao X. W. et al. Recognition of traffic signs based on their colour and shape features extracted using human vision models // Journal of Visual Communication and Image Representation. 2006. Vol. 17. No. 4. pp. 675-685.

14. Goodfellow I. J. et al. Pylearn2: a machine learning research library // arXiv preprint arXiv:1308.4214. 2013.

15. Han J., Kamber M., Pei J. Data mining: Concepts and techniques. Morgan Kaufmann, 2006.

16. Harris C., Stephens M. A combined corner and edge detector // Alvey Vision Conference. 1988. Vol. 15. p. 50.

17. Houben S. et al. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark // Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013. pp. 1-8.

18. Huang F. J., LeCun Y. Large-scale learning with SVM and convolutional nets for generic object categorization // 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2006.

19. Huttenlocher D. P., Ullman S. Object recognition using alignment // Proc. ICCV. 1987. Vol. 87. pp. 102-111.

20. Jia Y. Caffe: An open source convolutional architecture for fast feature embedding // http://caffe.berkeleyvision.org. 2013.

21. Krizhevsky A., Sutskever I., Hinton G. E. ImageNet classification with deep convolutional neural networks // Advances in Neural Information Processing Systems. 2012. pp. 1097-1105.

22. Lafuente-Arroyo S. et al. Traffic sign classification invariant to rotations using support vector machines // Proceedings of Advanced Concepts for Intelligent Vision Systems, Brussels, Belgium. 2004.

23. LeCun Y., Bengio Y. Convolutional networks for images, speech, and time series // The Handbook of Brain Theory and Neural Networks. 1995. Vol. 3361. p. 310.

24. LeCun Y. et al. Learning algorithms for classification: A comparison on handwritten digit recognition // Neural Networks: The Statistical Mechanics Perspective. 1995. Vol. 261. p. 276.

25. Masci J. et al. Stacked convolutional auto-encoders for hierarchical feature extraction // Artificial Neural Networks and Machine Learning - ICANN 2011. Springer Berlin Heidelberg, 2011. pp. 52-59.

26. Matan O. et al. Handwritten character recognition using neural network architectures // Proceedings of the 4th USPS Advanced Technology Conference. 1990. pp. 1003-1011.

27. McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity // Bulletin of Mathematical Biophysics. 1943. Vol. 5. No. 4. pp. 115-133.

28. Minsky M., Papert S. Perceptrons. 1969.

29. Mitchell T. Generative and discriminative classifiers: naive Bayes and logistic regression. 2005.